MzansiScore Credit Risk Model
MzansiScore is a synthetic South African credit risk model pack for probability-of-default scoring, affordability-aware decisioning, and explainable API deployment.
Repository contents
This repository is expected to contain:
- best_tree_model.joblib: best-performing tree model selected during training
- logistic_regression.joblib: interpretable regulatory baseline
- feature_cols.joblib: ordered feature columns used by the tree model
- label_encoders.joblib: fitted encoders for categorical features
- model_meta.json: metrics, governance metadata, fairness summaries, and rationale catalog
- README.md: this model card
Model summary
- Primary selected model artifact: XGBoost
- Published scoring artifact summary: XGBoost_cal
- Holdout AUC: 0.9243
- Holdout Gini: 0.8486
- Holdout Brier score: 0.0883
- Training rows: 133108
- Holdout rows: 33278
- Train default rate: 23.46%
- Holdout default rate: 23.46%
- Positive-class weighting: 3.26
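The summary figures can be cross-checked against each other with a few lines of arithmetic. The sketch below assumes that Gini is derived as 2 × AUC − 1 and that the positive-class weighting is the ratio of non-defaults to defaults in the training set (the usual convention for XGBoost's `scale_pos_weight`); the model card does not state either derivation explicitly.

```python
# Figures copied from the model summary above.
holdout_auc = 0.9243
train_rows = 133108
holdout_rows = 33278
train_default_rate = 0.2346

# Assumption: Gini is the standard rescaling of AUC.
gini = 2 * holdout_auc - 1

# Assumption: the class weight is negatives / positives in the training data.
neg_pos_ratio = (1 - train_default_rate) / train_default_rate

# Share of all rows held out for evaluation.
holdout_share = holdout_rows / (train_rows + holdout_rows)

print(f"Gini from AUC:       {gini:.4f}")
print(f"Neg/pos ratio:       {neg_pos_ratio:.2f}")
print(f"Holdout share:       {holdout_share:.2%}")
```

Under these assumptions the derived values agree with the reported Gini (0.8486) and positive-class weighting (3.26), and the split works out to roughly 80/20.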
Regulatory and governance context
- Geography: South Africa
- Training data: synthetic loan application and affordability data
- Policy overlay: nca_affordability_pass
- Policy description: Applicants failing affordability are declined outside the model.
- Proxy score features are excluded from regulated training by default
- Fairness summaries are recorded in model_meta.json
- Adverse-action rationale templates are recorded in model_meta.json
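One way to picture the policy overlay is as a gate that runs before the model score is consulted. The following is a minimal sketch, not the repository's implementation: the function name `decide`, the returned dictionary shape, and the `pd_cutoff` value of 0.20 are all hypothetical.

```python
def decide(pd_estimate, nca_affordability_pass, pd_cutoff=0.20):
    """Combine the affordability policy overlay with a probability of default.

    pd_cutoff is a hypothetical threshold chosen for illustration only;
    it is not a value published in this model card.
    """
    if not nca_affordability_pass:
        # Policy decline: applied outside the model, regardless of the score.
        return {"decision": "decline", "source": "policy:nca_affordability_pass"}
    if pd_estimate >= pd_cutoff:
        # Model decline: affordability passed, but predicted risk is too high.
        return {"decision": "decline", "source": "model"}
    return {"decision": "approve", "source": "model"}


# A low-risk score does not override a failed affordability check.
print(decide(pd_estimate=0.03, nca_affordability_pass=False))
print(decide(pd_estimate=0.03, nca_affordability_pass=True))
```

Recording the source of each decline keeps policy declines distinguishable from model declines, which matters when selecting an adverse-action rationale.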
Top feature drivers
| Feature | Importance |
|---|---|
| affordability_surplus | 0.7421 |
| combined_risk | 0.5698 |
| credit_score | 0.3508 |
| expense_understatement_flag | 0.3242 |
| worst_status_12m | 0.2179 |
| affordability_margin | 0.2073 |
| worst_status_3m | 0.1864 |
| debit_order_returns_3m | 0.1527 |
| affordability_surplus_ratio | 0.1526 |
| age | 0.1197 |
Intended use
Use this model pack for:
- Demo scoring APIs
- Explainable credit-risk dashboard prototypes
- Internal experimentation on calibrated probability-of-default workflows
Limitations
- The underlying data is synthetic and should not be treated as live production applicant data.
- This repository is not a substitute for formal model validation, governance approval, or legal review.
- Production credit decisions should include affordability, compliance, monitoring, and human oversight controls.
Example deployment flow
- Download artifacts from the Hugging Face model repository.
- Load best_tree_model.joblib, feature_cols.joblib, and label_encoders.joblib.
- Rebuild engineered features before inference.
- Apply policy rules alongside predicted default probability.
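The loading and scoring steps above might look roughly like the sketch below. It rests on several assumptions that the model card does not confirm: the artifacts have already been downloaded to a local directory, `label_encoders.joblib` holds a dict mapping column names to fitted encoders with a `transform` method, the saved model exposes a scikit-learn style `predict_proba`, and the applicant record already contains every engineered feature. The helper names are hypothetical.

```python
from pathlib import Path


def load_artifacts(artifact_dir):
    """Load the three artifacts named in the deployment flow."""
    import joblib  # third-party; imported lazily so the helpers below need no extras

    d = Path(artifact_dir)
    model = joblib.load(d / "best_tree_model.joblib")
    feature_cols = joblib.load(d / "feature_cols.joblib")
    label_encoders = joblib.load(d / "label_encoders.joblib")
    return model, feature_cols, label_encoders


def build_row(applicant, feature_cols, label_encoders):
    """Return feature values in training order, encoding categorical columns.

    Assumption: label_encoders is a dict of {column name: fitted encoder}.
    Extra keys in `applicant` are ignored; missing keys raise KeyError.
    """
    row = []
    for col in feature_cols:
        value = applicant[col]
        encoder = label_encoders.get(col)
        if encoder is not None:
            value = encoder.transform([value])[0]
        row.append(value)
    return row


def predict_default_probability(model, row, feature_cols):
    """Score one applicant; assumes the model has a predict_proba method."""
    import pandas as pd  # third-party; a named frame keeps the column order explicit

    frame = pd.DataFrame([row], columns=feature_cols)
    return float(model.predict_proba(frame)[0, 1])
```

Typical use would be `model, cols, encs = load_artifacts("artifacts/")`, then `predict_default_probability(model, build_row(applicant, cols, encs), cols)`, where `"artifacts/"` is a placeholder path. Policy rules are then applied by the caller alongside the returned probability.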
Training notes
The current training pipeline compares Logistic Regression, XGBoost, calibrated XGBoost, LightGBM, and calibrated LightGBM. Calibration is included because credit decisioning depends on reliable probability-of-default estimates, not only rank ordering.
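To illustrate why rank ordering alone is not enough, the toy example below scores the same eight applicants twice: once with well-spread probabilities and once after a monotone squashing that leaves the ordering untouched. The labels and scores are invented for illustration and do not come from this model; the helper functions are plain-Python sketches of AUC and the Brier score.

```python
def auc(y_true, scores):
    """Probability that a random defaulter outscores a random non-defaulter."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, probs):
    """Mean squared error between predicted probabilities and outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


# Invented toy data: 1 = default, 0 = no default.
y = [0, 0, 0, 1, 0, 1, 1, 1]
spread = [0.05, 0.10, 0.20, 0.40, 0.45, 0.70, 0.85, 0.95]

# Strictly increasing transform: same ranking, probabilities squeezed near 0.5.
squashed = [0.5 + 0.1 * p for p in spread]

print("AUC   spread / squashed:", auc(y, spread), auc(y, squashed))
print("Brier spread / squashed:", brier_score(y, spread), brier_score(y, squashed))
```

Both score sets produce the same AUC because the ordering is identical, but the squashed probabilities have a clearly worse Brier score. A cut-off or expected-loss calculation based on them would be misleading, which is the gap calibration is meant to close.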
Evaluation results
- roc_auc on Synthetic South African credit applications (self-reported): 0.924
- gini on Synthetic South African credit applications (self-reported): 0.849
- brier_score on Synthetic South African credit applications (self-reported): 0.088