Merchant KYC Analysis
A full-stack compliance analytics solution designed to evaluate merchant onboarding, KYC verification, and risk profiling. This project analyzes 100,000+ merchant records to empower financial institutions and compliance teams with predictive insights and interactive dashboards for fraud prevention and regulatory adherence.
GitHub Project Repository
Merchant-KYC-analysis
Project Overview
KYC compliance is a cornerstone of financial integrity and fraud mitigation. This project delivers an end-to-end analytics platform that enables:
- Verification tracking and document validation
- Risk profiling and review prediction
- Onboarding timeline analysis
- Executive dashboards for compliance monitoring
Key Objectives
- Clean and preprocess merchant KYC data
- Engineer features for review status and risk modeling
- Build classification models to predict compliance outcomes
- Deploy interactive dashboards for stakeholder decision-making
Project Structure
| File Name | Description |
|---|---|
| merchant_kyc_100000.csv | Raw dataset with merchant KYC records |
| cleaned_merchant_kyc.csv | Preprocessed dataset with feature engineering |
| kyc_review_model.pkl | Trained model for predicting review status |
| merchant_kyc.sql | SQL queries for data extraction and filtering |
| sqlconnect.py | Python script for SQL database connection |
| app.py | Streamlit app for dashboard deployment |
| merchant_kyc.ipynb | Jupyter notebook with EDA, modeling, and insights |
| merchant_kyc_dashboard | Power BI or Streamlit dashboard visualizing compliance metrics |
Data Preprocessing
- Verified PAN and GST formats
- Encoded categorical features (`address_proof_type`, `kyc_status`, `risk_level`)
- Converted `onboarding_date` to datetime format
- Normalized `compliance_score`
- Removed duplicates and ensured type consistency
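The steps above can be sketched with pandas roughly as follows. This is a minimal illustration, not the notebook's actual code: the `pan_number` and `gst_number` column names, the `_code` and `_valid` suffixes, and the choice of min-max scaling and integer label encoding are all assumptions. The regular expressions follow the publicly documented PAN and GSTIN layouts and check format only, not whether an identifier is actually registered.

```python
import pandas as pd

# PAN layout: 5 letters, 4 digits, 1 letter.
# GSTIN layout: 2-digit state code, the 10-character PAN, an entity code,
# the letter "Z", and a final check character.
PAN_PATTERN = r"[A-Z]{5}[0-9]{4}[A-Z]"
GST_PATTERN = r"[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][1-9A-Z]Z[0-9A-Z]"

CATEGORICAL_COLUMNS = ["address_proof_type", "kyc_status", "risk_level"]


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the raw KYC frame (column names are assumed)."""
    out = df.drop_duplicates().reset_index(drop=True)

    # Format checks only; a True flag does not prove the identifier exists.
    out["pan_valid"] = out["pan_number"].astype(str).str.fullmatch(PAN_PATTERN)
    out["gst_valid"] = out["gst_number"].astype(str).str.fullmatch(GST_PATTERN)

    # Unparseable dates become NaT instead of raising.
    out["onboarding_date"] = pd.to_datetime(out["onboarding_date"], errors="coerce")

    # Min-max scaling to [0, 1] (assumed normalization method).
    score = out["compliance_score"].astype(float)
    span = score.max() - score.min()
    out["compliance_score"] = (score - score.min()) / span if span else 0.0

    # Integer label encoding, kept in separate columns (assumed encoding).
    for col in CATEGORICAL_COLUMNS:
        out[col + "_code"] = out[col].astype("category").cat.codes

    return out


# Hypothetical sample rows: one exact duplicate and one malformed record.
raw = pd.DataFrame({
    "pan_number": ["AAPFU0939F", "AAPFU0939F", "1234"],
    "gst_number": ["27AAPFU0939F1ZV", "27AAPFU0939F1ZV", "BAD"],
    "address_proof_type": ["Passport", "Passport", "Utility Bill"],
    "kyc_status": ["Verified", "Verified", "Pending"],
    "risk_level": ["Low", "Low", "High"],
    "onboarding_date": ["2024-01-15", "2024-01-15", "not a date"],
    "compliance_score": [80, 80, 40],
})
clean = preprocess(raw)
print(clean[["pan_valid", "gst_valid", "onboarding_date", "compliance_score"]])
```

Keeping the encoded values in separate `_code` columns leaves the original labels available for dashboards and filters.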
Exploratory Data Analysis
- Distribution of KYC status across address proof types
- Compliance score trends by risk level
- Review status breakdown by document validity
- Onboarding timeline and volume analysis
- Correlation matrix of compliance features
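A few of these views can be approximated with plain pandas aggregations. The sketch below uses a small made-up frame in place of `cleaned_merchant_kyc.csv`; the values are illustrative only, and the column names are taken from this README where available.

```python
import pandas as pd

# Hypothetical sample standing in for the cleaned dataset.
df = pd.DataFrame({
    "address_proof_type": ["Passport", "Passport", "Utility Bill", "Utility Bill"],
    "kyc_status": ["Verified", "Pending", "Verified", "Verified"],
    "risk_level": ["Low", "High", "Low", "High"],
    "compliance_score": [0.9, 0.3, 0.8, 0.5],
    "onboarding_date": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-02-11", "2024-02-25"]
    ),
})

# KYC status counts for each address proof type.
status_by_proof = pd.crosstab(df["address_proof_type"], df["kyc_status"])

# Mean compliance score for each risk level.
score_by_risk = df.groupby("risk_level")["compliance_score"].mean()

# Number of merchants onboarded per calendar month.
monthly_volume = df.groupby(df["onboarding_date"].dt.to_period("M")).size()

print(status_by_proof, score_by_risk, monthly_volume, sep="\n\n")
```

Each result is a regular pandas object, so it can be passed straight to Matplotlib, Seaborn, or Plotly for charting.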
Modeling Approach
- Target Variable: `review_status`
- Algorithms Used: Logistic Regression, Random Forest, XGBoost
- Evaluation Metrics: Accuracy, Precision, Recall, F1 Score
- Top Features: `compliance_score`, `kyc_status`, `address_proof_valid`, `risk_level`
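As a rough sketch of how such a comparison might be set up with scikit-learn, the example below trains two of the listed algorithms and reports the four metrics. The data is synthetic and the labelling rule is invented purely so the script runs end to end; it says nothing about the project's real results. XGBoost is left out only to avoid an extra dependency, and the hyperparameters shown are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in features, already numerically encoded.
rng = np.random.default_rng(42)
n = 1000
X = pd.DataFrame({
    "compliance_score": rng.uniform(0, 1, n),
    "kyc_status": rng.integers(0, 2, n),
    "address_proof_valid": rng.integers(0, 2, n),
    "risk_level": rng.integers(0, 3, n),
})

# Hypothetical rule: low score or highest risk level triggers a review.
y = ((X["compliance_score"] < 0.5) | (X["risk_level"] == 2)).astype(int)
y.name = "review_status"

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred),
    }

# Impurity-based feature importances from the fitted random forest.
importances = pd.Series(
    models["Random Forest"].feature_importances_, index=X.columns
).sort_values(ascending=False)

print(pd.DataFrame(results).T.round(3))
print(importances.round(3))
```

Stratifying the split keeps the share of flagged merchants the same in the training and test sets, which matters when review cases are a minority.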
Dashboard Overview
Power BI Dashboard
Visualizes KYC performance and risk segmentation:
- KYC verification summary
- Compliance score distribution
- Risk level segmentation
- Onboarding trends and review status
- Document validity and fraud flags


Streamlit App
Interactive dashboard for real-time compliance insights:
- Predict review status based on merchant attributes
- Feature importance visualization
- KYC status and document validity filters
- Risk profiling and fraud flagging


Deployment
- Model serialized with `joblib` as `kyc_review_model.pkl`
- Dashboard deployed via Streamlit Cloud
- SQL integration for dynamic merchant querying
- Git LFS used for large file management
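The serialization and SQL steps might fit together along these lines. This is a sketch under stated assumptions: the tiny model, the two-feature layout, the `merchants` table, and the use of an in-memory SQLite database are all hypothetical stand-ins, since the README does not specify the database engine or schema used by `sqlconnect.py`.

```python
import sqlite3
import tempfile
from pathlib import Path

import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression

FEATURES = ["compliance_score", "risk_level"]  # assumed feature subset

# Tiny stand-in model; the real one is trained in the notebook.
train = pd.DataFrame({
    "compliance_score": [0.9, 0.8, 0.3, 0.2],
    "risk_level": [0, 0, 2, 2],
})
model = LogisticRegression().fit(train[FEATURES], [0, 0, 1, 1])

# Round trip through joblib, using the file name from this README.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "kyc_review_model.pkl"
    joblib.dump(model, model_path)
    loaded = joblib.load(model_path)

# In-memory SQLite stands in for the project's database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE merchants (merchant_id TEXT, compliance_score REAL, risk_level INTEGER)"
)
conn.executemany(
    "INSERT INTO merchants VALUES (?, ?, ?)",
    [("M001", 0.95, 0), ("M002", 0.10, 2), ("M003", 0.15, 2)],
)

# Parameterized query: the filter value is bound, not concatenated into SQL.
batch = pd.read_sql_query(
    "SELECT * FROM merchants WHERE risk_level = ? ORDER BY merchant_id",
    conn,
    params=(2,),
)
conn.close()

batch["predicted_review"] = loaded.predict(batch[FEATURES])
print(batch)
```

Binding the filter value as a parameter avoids SQL injection when the value comes from dashboard input.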
Business Impact
- Flags high-risk merchants during onboarding
- Improves compliance tracking and audit readiness
- Reduces manual review workload with predictive insights
- Enhances fraud detection and regulatory reporting
Tech Stack
- Python: Pandas, NumPy, Scikit-learn, XGBoost, joblib, Streamlit
- SQL: Data extraction and filtering
- Visualization: Power BI, Matplotlib, Seaborn, Plotly
- Deployment: Streamlit Cloud, GitHub, Git LFS
Future Enhancements
- Integrate real-time document verification APIs
- Add explainability via SHAP for compliance decisions
- Enable user-uploaded KYC records for review simulation
- Expand dashboard to include fraud scoring and alerts
Author
Anesh Raj
GitHub Profile