MzansiScore Credit Risk Model

MzansiScore is a synthetic South African credit risk model pack for probability-of-default scoring, affordability-aware decisioning, and explainable API deployment.

Repository contents

This repository is expected to contain:

best_tree_model.joblib — best-performing tree model selected during training
logistic_regression.joblib — interpretable regulatory baseline
feature_cols.joblib — ordered feature columns used by the tree model
label_encoders.joblib — fitted encoders for categorical features
model_meta.json — metrics, governance metadata, fairness summaries, and rationale catalog
README.md — this model card

Model summary

Primary selected model artifact: XGBoost
Published scoring artifact summary: XGBoost_cal
Holdout AUC: 0.9243
Holdout Gini: 0.8486
Holdout Brier score: 0.0883
Training rows: 133108
Holdout rows: 33278
Train default rate: 23.46%
Holdout default rate: 23.46%
Positive-class weighting: 3.26
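
The figures above can be cross-checked with a few lines of arithmetic. The sketch below assumes that Gini is derived from AUC as 2 × AUC − 1 and that the positive-class weighting is the ratio of non-defaults to defaults in the training data. The second point is an assumption: this card does not state how the 3.26 value was chosen, only that the numbers are consistent with that reading.

```python
# Values copied from the model summary above.
auc = 0.9243
train_rows, holdout_rows = 133108, 33278
default_rate = 0.2346

# Gini coefficient as commonly derived from ROC AUC.
gini = 2 * auc - 1

# Share of all rows held out for evaluation.
holdout_share = holdout_rows / (train_rows + holdout_rows)

# Assumed reading of the weighting: negatives per positive in training data.
neg_to_pos = (1 - default_rate) / default_rate

print(f"Gini from AUC:        {gini:.4f}")
print(f"Holdout share:        {holdout_share:.1%}")
print(f"Negatives per default: {neg_to_pos:.2f}")
```

Under these assumptions the derived values match the reported Gini (0.8486), indicate a holdout of roughly one fifth of the rows, and reproduce the 3.26 weighting.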

Regulatory and governance context

Geography: South Africa
Training data: synthetic loan application and affordability data
Policy overlay: nca_affordability_pass
Policy description: Applicants failing affordability are declined outside the model.
Proxy score features are excluded from regulated training by default.
Fairness summaries are recorded in model_meta.json.
Adverse-action rationale templates are recorded in model_meta.json.
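
One way to read the policy overlay is as a gate that runs before the model score is considered. The following is a minimal sketch of that idea, not the repository's implementation: the `Decision` type, the `decide` function, the reason codes, and the `pd_cutoff` value of 0.20 are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    outcome: str  # "approve" or "decline"
    reason: str   # hypothetical reason code, not taken from model_meta.json


def decide(pd_estimate: float, nca_affordability_pass: bool,
           pd_cutoff: float = 0.20) -> Decision:
    """Apply the affordability gate first, then a score cutoff.

    pd_cutoff is an illustrative threshold, not a published policy value.
    """
    if not nca_affordability_pass:
        # Declined outside the model: the score plays no part in this outcome.
        return Decision("decline", "affordability_policy")
    if pd_estimate >= pd_cutoff:
        return Decision("decline", "model_risk")
    return Decision("approve", "within_risk_appetite")


print(decide(0.05, nca_affordability_pass=False))
print(decide(0.05, nca_affordability_pass=True))
```

Keeping the affordability check separate from the score makes it possible to report the decline reason independently of the model output.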

Top feature drivers

Feature	Importance
affordability_surplus	0.7421
combined_risk	0.5698
credit_score	0.3508
expense_understatement_flag	0.3242
worst_status_12m	0.2179
affordability_margin	0.2073
worst_status_3m	0.1864
debit_order_returns_3m	0.1527
affordability_surplus_ratio	0.1526
age	0.1197

Intended use

Use this model pack for:

Demo scoring APIs
Explainable credit-risk dashboard prototypes
Internal experimentation on calibrated probability-of-default workflows

Limitations

The underlying data is synthetic and should not be treated as live production applicant data.
This repository is not a substitute for formal model validation, governance approval, or legal review.
Production credit decisions should include affordability, compliance, monitoring, and human oversight controls.

Example deployment flow

Download artifacts from the Hugging Face model repository.
Load best_tree_model.joblib, feature_cols.joblib, and label_encoders.joblib.
Rebuild engineered features before inference.
Apply policy rules alongside predicted default probability.
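
The first three steps might look roughly like the sketch below. It rests on several assumptions that this card does not confirm: that label_encoders.joblib holds a dict mapping column names to fitted encoders with a `transform` method, that feature_cols.joblib holds a list of column names, and that the model exposes scikit-learn style `predict_proba`. The helper names are hypothetical, and the engineered features themselves must be rebuilt by the caller because their definitions are not given here.

```python
ARTIFACTS = ("best_tree_model.joblib", "feature_cols.joblib", "label_encoders.joblib")


def download_artifacts(repo_id):
    """Fetch the artifacts from the Hub and return their local paths."""
    from huggingface_hub import hf_hub_download  # third-party, imported lazily
    return [hf_hub_download(repo_id=repo_id, filename=name) for name in ARTIFACTS]


def load_artifacts(paths):
    """Load (model, feature_cols, encoders) from the downloaded files."""
    import joblib  # third-party, imported lazily
    return tuple(joblib.load(path) for path in paths)


def build_feature_row(applicant, feature_cols, encoders):
    """Encode categorical values and order them as the model expects.

    `applicant` is a dict that already contains the engineered features.
    Assumes `encoders` maps column name -> fitted encoder (an assumption).
    """
    row = []
    for col in feature_cols:
        if col not in applicant:
            raise KeyError(f"missing feature: {col}")
        value = applicant[col]
        if col in encoders:
            value = encoders[col].transform([value])[0]
        row.append(value)
    return row


def predict_pd(model, row):
    """Return the positive-class probability for a single feature row.

    If the model was fitted with feature names, it may be preferable to wrap
    the row in a pandas DataFrame with feature_cols as its columns.
    """
    return float(model.predict_proba([row])[0][1])
```

A call sequence would then be `download_artifacts(repo_id)`, `load_artifacts(paths)`, `build_feature_row(...)`, and `predict_pd(...)`, with policy rules applied to the resulting probability afterwards.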

Training notes

The current training pipeline compares Logistic Regression, XGBoost, calibrated XGBoost, LightGBM, and calibrated LightGBM. Calibration is included because credit decisioning depends on reliable probability-of-default estimates, not only rank ordering.
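
To illustrate why rank ordering alone is not enough, the toy example below compares two sets of scores that order four applicants identically. It is a sketch with made-up labels and probabilities, unrelated to the MzansiScore data, and the `auc` and `brier_score` helpers are plain-Python stand-ins written for this example.

```python
def brier_score(y_true, probs):
    """Mean squared error between predicted probabilities and outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def auc(y_true, probs):
    """ROC AUC as the share of (default, non-default) pairs ranked correctly."""
    pos = [p for y, p in zip(y_true, probs) if y == 1]
    neg = [p for y, p in zip(y_true, probs) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = default, 0 = no default.
y = [0, 0, 1, 1]
scores_a = [0.10, 0.20, 0.80, 0.90]
scores_b = [0.40, 0.45, 0.55, 0.60]  # same ordering, different probabilities

for name, scores in (("A", scores_a), ("B", scores_b)):
    print(f"{name}: AUC={auc(y, scores):.3f}  Brier={brier_score(y, scores):.3f}")
```

Both score sets have the same AUC because the ordering is identical, yet their Brier scores differ. A metric that only looks at ranking cannot distinguish them, which is why the probabilities themselves need checking before they feed pricing or cutoff decisions.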


Spaces using Pieter182/mzansiscore-credit-risk: 1

Evaluation results

Self-reported on synthetic South African credit applications:

Metric	Value
roc_auc	0.924
gini	0.849
brier_score	0.088