Amazon Customer Analysis
An analytics project for understanding customer behavior, purchasing patterns, and churn risk in an Amazon-style e-commerce setting. It gives marketing teams, product managers, and business analysts churn predictions and interactive dashboards.
GitHub Project Repository
Click to view Amazon-Customer-Analysis
Project Overview
Understanding customer behavior is key to driving retention, personalization, and profitability. This project analyzes 250,000+ customer transactions to uncover purchasing trends, churn indicators, and demographic influences.
Key Objectives:
- Clean and preprocess customer transaction data
- Engineer features for churn modeling and dashboarding
- Build classification models to predict churn
- Deploy interactive dashboards for business decision-making
Project Structure
| File Name | Description |
|---|---|
| ecommerce_customer_data_large.csv | Raw dataset with customer transactions |
| cleaned_ecommerce.csv | Preprocessed dataset with feature engineering |
| churn_model.pkl | Trained model for churn prediction |
| feature_names.pkl | Feature list used in model training |
| label_encoders.pkl | Encoders for categorical variables |
| ecommerce.sql | SQL queries for data extraction and filtering |
| sqlconnect.py | Python script for SQL database connection |
| app.py | Streamlit app for dashboard deployment |
| E_COMMERCE.ipynb | Jupyter notebook with EDA, modeling, and insights |
| ecommerce_customer_analysis_dashboard | Interactive dashboard file (Streamlit or Power BI) |
Data Preprocessing
- Imputed missing values in `Returns` and `Product Price`
- Normalized continuous variables (`Total Purchase Amount`, `Customer Age`)
- Encoded categorical features (`Payment Method`, `Product Category`, `Gender`)
- Removed outliers in `Quantity` and `Price`
- Verified data types and optimized memory usage
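The preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the fill values, the 1.5 × IQR outlier rule, the use of `StandardScaler` for "normalization", and the helper name `preprocess` are all assumptions, and "Price" is assumed to mean the `Product Price` column. The tiny `raw` frame is made-up demo data.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler


def preprocess(df: pd.DataFrame) -> tuple[pd.DataFrame, dict[str, LabelEncoder]]:
    """Impute, filter outliers, scale, and encode a raw transactions frame."""
    df = df.copy()

    # Impute: a missing Returns flag is treated as "no return" (assumption),
    # and a missing Product Price is filled with the median price (assumption).
    df["Returns"] = df["Returns"].fillna(0)
    df["Product Price"] = df["Product Price"].fillna(df["Product Price"].median())

    # Drop outliers with the 1.5 * IQR rule (assumed; the source names no rule).
    for col in ["Quantity", "Product Price"]:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        df = df[df[col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)].copy()

    # Scale continuous variables to zero mean and unit variance.
    num_cols = ["Total Purchase Amount", "Customer Age"]
    df[num_cols] = StandardScaler().fit_transform(df[num_cols])

    # Label-encode categoricals and keep the fitted encoders for inference.
    encoders: dict[str, LabelEncoder] = {}
    for col in ["Payment Method", "Product Category", "Gender"]:
        enc = LabelEncoder()
        df[col] = enc.fit_transform(df[col])
        encoders[col] = enc

    return df.reset_index(drop=True), encoders


# Made-up demo rows; the last row carries an extreme price.
raw = pd.DataFrame({
    "Returns": [0, 1, np.nan, 0, 1, 0],
    "Product Price": [10.0, 12.0, np.nan, 11.0, 13.0, 500.0],
    "Quantity": [1, 2, 3, 2, 1, 2],
    "Total Purchase Amount": [10, 24, 33, 22, 13, 1000],
    "Customer Age": [25, 34, 45, 52, 29, 61],
    "Payment Method": ["Cash", "PayPal", "Cash", "Credit Card", "PayPal", "Cash"],
    "Product Category": ["Books", "Home", "Books", "Electronics", "Home", "Books"],
    "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
})
clean, encoders = preprocess(raw)
```

Returning the fitted encoders alongside the frame mirrors the idea behind `label_encoders.pkl`: the same category-to-integer mapping has to be reused when scoring new customers.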
Exploratory Data Analysis
- Purchase trends across product categories and payment methods
- Age and gender segmentation of customers
- Return behavior and its impact on churn
- Correlation matrix of purchase features and churn
- Seasonal and temporal purchase patterns
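A few of the analyses above reduce to short pandas aggregations. The sketch below uses a handful of made-up rows; the `Purchase Date` column name is an assumption (the source does not name the date column), and the real notebook may compute these views differently.

```python
import pandas as pd

# Made-up demo rows standing in for the cleaned transactions table.
df = pd.DataFrame({
    "Product Category": ["Books", "Books", "Electronics", "Electronics", "Home", "Home"],
    "Returns": [1, 0, 1, 1, 0, 0],
    "Churn": [1, 0, 1, 0, 0, 0],
    "Total Purchase Amount": [100, 200, 300, 400, 500, 600],
    # "Purchase Date" is a hypothetical column name.
    "Purchase Date": pd.to_datetime([
        "2023-01-05", "2023-01-20", "2023-02-10",
        "2023-02-15", "2023-03-01", "2023-03-03",
    ]),
})

# Return rate and churn rate per product category.
by_category = df.groupby("Product Category")[["Returns", "Churn"]].mean()

# Pearson correlation of purchase features with the churn flag.
corr_with_churn = df[["Returns", "Total Purchase Amount", "Churn"]].corr()["Churn"]

# Monthly purchase totals, to look for seasonal patterns.
monthly = df.groupby(df["Purchase Date"].dt.to_period("M"))["Total Purchase Amount"].sum()
```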
Modeling Approach
- Target Variable: `Churn`
- Algorithms Used: Logistic Regression, Random Forest, XGBoost
- Evaluation Metrics: Accuracy, Precision, Recall, F1 Score
- Feature Importance: `Returns`, `Total Purchase Amount`, `Product Category`, `Payment Method`, `Customer Age`
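The train-and-compare loop implied by this section might look like the sketch below. It runs on synthetic data from `make_classification` as a stand-in for the real features, and the hyperparameters and the 80/20 stratified split are assumptions. XGBoost is left out to keep the example dependency-free; `xgboost.XGBClassifier` exposes the same `fit`/`predict` interface and could be added to the `models` dict.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered features; roughly 20% positives (churners).
X, y = make_classification(
    n_samples=1000, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.8, 0.2], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Fit each model and collect the four metrics named above.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
    }

# Impurity-based importances from the forest; they sum to 1 across features.
importances = models["Random Forest"].feature_importances_
```

With an imbalanced target such as churn, accuracy alone can look good for a model that rarely predicts the positive class, which is why precision, recall, and F1 are reported alongside it.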
Dashboard Overview
Power BI Dashboard
Visualizes key customer metrics and churn insights:
- Purchase behavior segmented by category and payment method
- Return rate analysis and churn correlation
- Customer demographics: age and gender distribution
- Monthly purchase trends and seasonal patterns
- Feature importance from churn prediction model

Streamlit App
Interactive dashboard with real-time filtering and model predictions:
- Dynamic filtering by product category, payment method, and demographics
- Churn probability predictions for selected customer segments
- Visual breakdown of feature importance and churn drivers
- SQL-integrated querying for custom customer views
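As an illustration of the SQL-backed filtering described above, the sketch below runs a parameterized query against an in-memory SQLite database. The database engine, the `transactions` table, its column names, and the helper `customers_by_segment` are all hypothetical; the project's `sqlconnect.py` and `ecommerce.sql` may target a different engine and schema.

```python
import sqlite3


def customers_by_segment(conn: sqlite3.Connection, category: str, payment_method: str):
    """Return transactions for one category/payment segment, largest first."""
    query = """
        SELECT customer_id, product_category, payment_method, total_purchase_amount
        FROM transactions
        WHERE product_category = ? AND payment_method = ?
        ORDER BY total_purchase_amount DESC
    """
    # Placeholders keep user-selected filter values out of the SQL string.
    return conn.execute(query, (category, payment_method)).fetchall()


# In-memory database with made-up rows, standing in for the real data source.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "customer_id INTEGER, product_category TEXT, "
    "payment_method TEXT, total_purchase_amount REAL)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        (1, "Books", "PayPal", 50.0),
        (2, "Books", "Cash", 20.0),
        (3, "Books", "PayPal", 80.0),
        (4, "Home", "PayPal", 120.0),
    ],
)

segment_rows = customers_by_segment(conn, "Books", "PayPal")
```

In a Streamlit app, the two arguments would typically come from widgets such as `st.selectbox`, which is one reason to prefer bound parameters over string formatting.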

Deployment
- Model serialized with `joblib` as `churn_model.pkl`
- Dashboard deployed via Streamlit Cloud
- SQL integration for dynamic customer querying
- Git LFS used for large file management
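The serialization step can be sketched as a `joblib` round trip. The toy model and single-feature data below are made up, and writing to a temporary directory is only for the demo; saving `feature_names.pkl` with `joblib` as well is an assumption, since the source only confirms it for `churn_model.pkl`.

```python
import os
import tempfile

import joblib
from sklearn.linear_model import LogisticRegression

# Toy model standing in for the trained churn classifier.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
model = LogisticRegression().fit(X, y)
feature_names = ["Returns"]

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "churn_model.pkl")
    features_path = os.path.join(tmp, "feature_names.pkl")

    # Persist the model and the feature order it was trained with.
    joblib.dump(model, model_path)
    joblib.dump(feature_names, features_path)

    # Load them back, as the dashboard app would at startup.
    restored_model = joblib.load(model_path)
    restored_features = joblib.load(features_path)

# The restored model should reproduce the original predictions.
same_predictions = bool((restored_model.predict(X) == model.predict(X)).all())
```

Pickle-based files like these should only be loaded from trusted sources, and they are generally tied to the scikit-learn version that produced them.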
Business Impact
- Identifies high-risk churn segments for retention campaigns
- Improves targeting strategies based on purchase behavior
- Enhances product bundling and pricing decisions
- Supports personalized marketing with demographic insights
Tech Stack
- Python: Pandas, NumPy, Scikit-learn, Streamlit
- SQL: Data extraction and filtering
- Visualization: Matplotlib, Seaborn, Plotly
- Deployment: Streamlit Cloud, GitHub, Git LFS
Future Enhancements
- Integrate NLP for customer review sentiment analysis
- Add lifetime value prediction for customer cohorts
- Enable user-uploaded transaction logs for analysis
- Expand dashboard to include product recommendation engine
Author
Anesh Raj
GitHub Profile