YouTube Trending Video Analysis
A full-stack analytics solution that uncovers patterns in content performance, audience engagement, and channel behavior across global YouTube trends. The project analyzes more than 100,000 trending videos and gives content strategists, media analysts, and creators predictive insights and interactive dashboards to support publishing decisions.
GitHub Project Repository
Click to view YouTube-Analysis
Project Overview
Understanding what makes a video trend is key to maximizing reach and engagement. This project delivers an end-to-end analytics platform that enables:
- Performance forecasting based on video metadata
- Engagement-driven content strategy
- Country-specific trend analysis
- Dashboard-driven insights for creators and media teams
Key Objectives
- Clean and preprocess multi-country YouTube data
- Engineer features for view prediction and dashboarding
- Build regression models to estimate video views
- Deploy interactive dashboards for strategic content planning
Project Structure
| File Name | Description |
| --- | --- |
| youtube.csv | Raw dataset with trending video records |
| cleaned_youtube.csv | Preprocessed dataset with feature engineering |
| youtube.sql | SQL queries for data extraction and filtering |
| sqlconnect.py | Python script for SQL database connection |
| youtube analysis.ipynb | Jupyter notebook with EDA, modeling, and insights |
| trending_model.pkl | Trained regression model for view prediction |
| trending_features.pkl | Feature list used in modeling |
| channel_encoder.pkl | Label encoder for channel names |
| country_encoder.pkl | Label encoder for country names |
| app.py | Streamlit app for dashboard deployment |
| YOUTUBE ANALYSIS DASHBOARD.accdb | MS Access dashboard visualizing engagement and performance trends |
Data Preprocessing
- Encoded categorical variables (`channel_title`, `country`, `category_id`)
- Extracted publish time features (`hour`, `day`, `month`)
- Engineered engagement metrics (`likes`, `dislikes`, `comment_count`)
- Removed duplicates and handled missing values
- Normalized continuous variables for modeling
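The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the `publish_time` column name, the `_enc` and `_scaled` suffixes, the zero-fill policy for missing counts, and the reading of `day` as day of week are all assumptions made for the example.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Clean a trending-video frame and derive model-ready columns."""
    # Remove exact duplicate rows.
    out = df.drop_duplicates().copy()

    # Missing values (assumed policy): drop rows without a target,
    # treat missing engagement counts as zero.
    out = out.dropna(subset=["views"])
    count_cols = ["likes", "dislikes", "comment_count"]
    out[count_cols] = out[count_cols].fillna(0)

    # Publish-time features; `publish_time` is an assumed column name.
    published = pd.to_datetime(out["publish_time"], utc=True)
    out["publish_hour"] = published.dt.hour
    out["publish_day"] = published.dt.dayofweek  # Monday = 0
    out["publish_month"] = published.dt.month

    # Label-encode the categorical columns into integer codes.
    for col in ["channel_title", "country", "category_id"]:
        out[col + "_enc"] = LabelEncoder().fit_transform(out[col].astype(str))

    # Standardize the continuous engagement columns (zero mean, unit variance).
    scaled = StandardScaler().fit_transform(out[count_cols].astype(float))
    for i, col in enumerate(count_cols):
        out[col + "_scaled"] = scaled[:, i]

    return out.reset_index(drop=True)
```

In a real pipeline the fitted encoders would be kept and saved (the repository ships `channel_encoder.pkl` and `country_encoder.pkl`) so the same mapping can be reused at prediction time; the sketch discards them for brevity.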
Exploratory Data Analysis
- View distribution across countries and categories
- Engagement metrics over time
- Channel behavior and upload consistency
- Correlation matrix of features influencing views
- Publish time impact on video performance
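Two of the analyses listed above, the correlation of features with views and the effect of publish time, could be computed along these lines. The helper names are hypothetical, and the `publish_hour` column is assumed to have been derived during preprocessing. Plotting is left out so the sketch depends only on pandas.

```python
import pandas as pd


def view_correlations(df: pd.DataFrame, target: str = "views") -> pd.Series:
    """Pearson correlation of every numeric column with the target,
    ordered by absolute strength."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    return corr.sort_values(key=lambda s: s.abs(), ascending=False)


def median_views_by_hour(df: pd.DataFrame) -> pd.Series:
    """Median views per publish hour; the median is less sensitive
    to a handful of viral outliers than the mean."""
    return df.groupby("publish_hour")["views"].median()
```

The full matrix from `numeric.corr()` can be passed to a heatmap function in Seaborn or Plotly to reproduce a correlation-matrix chart.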
Modeling Approach
- Target Variable: `views`
- Algorithms Used: Linear Regression, Random Forest Regressor, XGBoost Regressor
- Evaluation Metrics: MAE, RMSE, R² Score
- Top Features: `likes`, `comment_count`, `publish_hour`, `channel_title`, `country`
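As an illustration of the comparison described above, the following sketch fits two of the listed algorithms on a held-out split and reports MAE, RMSE, and R². The function name, the 80/20 split, and the hyperparameters are assumptions chosen for the example, not the settings used in the notebook. XGBoost is omitted to keep the sketch limited to scikit-learn; `xgboost.XGBRegressor` follows the same `fit`/`predict` interface and could be added to the `models` dictionary.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def compare_regressors(X, y, seed: int = 42) -> dict:
    """Fit each candidate on a training split and score it on held-out data."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        "linear_regression": LinearRegression(),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        scores[name] = {
            "mae": mean_absolute_error(y_test, pred),
            # RMSE is the square root of the mean squared error.
            "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
            "r2": r2_score(y_test, pred),
        }
    return scores
```

View counts are typically heavy-tailed, so fitting on `np.log1p(views)` and comparing errors on that scale may be worth trying as well.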
Dashboard Overview
MS Access Dashboard
Visualizes YouTube performance metrics and engagement trends:
- View distribution by country and category
- Engagement metrics over time
- KPI cards for average views, likes, and comments
Streamlit App
Interactive dashboard for real-time video performance prediction:
- Input video metadata to forecast views
- Feature importance visualization
- Country and category filters
- Engagement-driven prediction interface
Deployment
- Model serialized with `joblib` as `trending_model.pkl`
- Dashboard deployed via Streamlit Cloud
- SQL integration for dynamic video filtering
- Git LFS used for large file management
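The serialization step can be illustrated with a short round trip. The file names match the project structure, but the helper functions are hypothetical, and storing the feature list with `joblib` as well is an assumption. The point of keeping the feature list next to the model is that prediction inputs can be arranged in the column order used at training time.

```python
import os

import joblib
import numpy as np


def save_artifacts(model, feature_names, directory: str):
    """Persist the fitted model and the ordered feature list."""
    model_path = os.path.join(directory, "trending_model.pkl")
    features_path = os.path.join(directory, "trending_features.pkl")
    joblib.dump(model, model_path)
    joblib.dump(list(feature_names), features_path)
    return model_path, features_path


def load_and_predict(directory: str, row: dict) -> float:
    """Reload both artifacts and predict views for one video.

    `row` maps feature name to value; it is reordered to match
    the feature order stored at training time.
    """
    model = joblib.load(os.path.join(directory, "trending_model.pkl"))
    features = joblib.load(os.path.join(directory, "trending_features.pkl"))
    x = np.array([[row[name] for name in features]], dtype=float)
    return float(model.predict(x)[0])
```

A Streamlit front end could call a function like `load_and_predict` with values collected from its input widgets. Files written by `joblib` are pickle-based, so they should only be loaded from trusted sources.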
Business Impact
- Forecasts video performance for content planning
- Identifies high-engagement formats and categories
- Supports regional strategy with country-specific insights
- Enhances creator decision-making with predictive analytics
Tech Stack
- Python: Pandas, NumPy, Scikit-learn, Streamlit
- SQL: Data extraction and filtering
- Visualization: MS Access, Matplotlib, Seaborn, Plotly
- Deployment: Streamlit Cloud, GitHub, Git LFS
Future Enhancements
- Integrate NLP for sentiment analysis from video descriptions
- Add clustering for channel behavior segmentation
- Enable user-uploaded video metadata for prediction
- Expand dashboard to include subscriber growth forecasting
Author
Anesh Raj
GitHub Profile