MzansiScore Credit Risk Model

MzansiScore is a synthetic South African credit risk model pack for probability-of-default scoring, affordability-aware decisioning, and explainable API deployment.

Repository contents

This repository is expected to contain:

  • best_tree_model.joblib - best-performing tree model selected during training
  • logistic_regression.joblib - interpretable regulatory baseline
  • feature_cols.joblib - ordered feature columns used by the tree model
  • label_encoders.joblib - fitted encoders for categorical features
  • model_meta.json - metrics, governance metadata, fairness summaries, and rationale catalog
  • README.md - this model card
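Before loading anything, it can help to confirm that a local copy of the repository actually contains every file listed above. The sketch below is a minimal, hypothetical helper (the names EXPECTED_ARTIFACTS, missing_artifacts and load_model_meta are illustrative, not part of this pack). It only checks file presence and parses model_meta.json as generic JSON, because the card does not document that file's key structure.

```python
import json
from pathlib import Path
from typing import Any

# File names taken from the repository contents list in this model card.
EXPECTED_ARTIFACTS = (
    "best_tree_model.joblib",
    "logistic_regression.joblib",
    "feature_cols.joblib",
    "label_encoders.joblib",
    "model_meta.json",
    "README.md",
)


def missing_artifacts(artifact_dir: str) -> list[str]:
    """Return the expected artifact files that are absent from artifact_dir."""
    base = Path(artifact_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (base / name).is_file()]


def load_model_meta(artifact_dir: str) -> Any:
    """Parse model_meta.json as plain JSON (its schema is not documented here)."""
    with open(Path(artifact_dir) / "model_meta.json", encoding="utf-8") as fh:
        return json.load(fh)
```

The directory could, for example, be the local path returned by huggingface_hub's snapshot_download for this repository; an empty result from missing_artifacts means all six files are present.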

Model summary

  • Primary selected model artifact: XGBoost
  • Published scoring artifact: XGBoost_cal (calibrated XGBoost)
  • Holdout AUC: 0.9243
  • Holdout Gini: 0.8486
  • Holdout Brier score: 0.0883
  • Training rows: 133108
  • Holdout rows: 33278
  • Train default rate: 23.46%
  • Holdout default rate: 23.46%
  • Positive-class weight: 3.26 (ratio of non-defaults to defaults in the training data)
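Two of the summary figures follow from simple arithmetic on the others. The Gini coefficient reported for credit models is 2 × AUC − 1, and a positive-class weight of (1 − default rate) / default rate is the usual value for parameters such as XGBoost's scale_pos_weight. The card does not state how the pipeline derived 3.26, so treat the second function as an assumption that happens to reproduce the published number; both helper names are illustrative.

```python
def gini_from_auc(auc: float) -> float:
    """Gini coefficient (accuracy ratio) implied by a ROC AUC: G = 2 * AUC - 1."""
    return 2.0 * auc - 1.0


def positive_class_weight(default_rate: float) -> float:
    """Non-defaulters per defaulter, a common choice for scale_pos_weight."""
    return (1.0 - default_rate) / default_rate


print(round(gini_from_auc(0.9243), 4))          # 0.8486
print(round(positive_class_weight(0.2346), 2))  # 3.26
```

With a 23.46% default rate there are roughly 3.26 non-defaulters per defaulter, so weighting each default by that factor balances the two classes' total contribution to the training loss.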

Regulatory and governance context

  • Geography: South Africa
  • Training data: synthetic loan application and affordability data
  • Policy overlay: nca_affordability_pass
  • Policy description: Applicants failing affordability are declined outside the model.
  • Proxy score features are excluded from regulated training by default
  • Fairness summaries are recorded in model_meta.json
  • Adverse-action rationale templates are recorded in model_meta.json
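The policy overlay means affordability is checked before, and independently of, the model score. The sketch below shows one way to order those checks. The PD cutoff is a hypothetical parameter (this card publishes no approval threshold), and the reason strings are placeholders standing in for the adverse-action templates kept in model_meta.json.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    outcome: str  # "approve" or "decline"
    source: str   # "policy" when the overlay decides, "model" otherwise
    reason: str   # placeholder text; real templates live in model_meta.json


def decide(pd_estimate: float, nca_affordability_pass: bool, pd_cutoff: float) -> Decision:
    """Apply the affordability overlay first; only affordable applicants are judged on PD."""
    if not nca_affordability_pass:
        # Declined outside the model: the PD estimate plays no part in this outcome.
        return Decision("decline", "policy", "affordability assessment failed")
    if pd_estimate >= pd_cutoff:
        return Decision("decline", "model", "probability of default at or above cutoff")
    return Decision("approve", "model", "probability of default below cutoff")
```

Recording the decision source separately from the outcome keeps policy declines distinguishable from model declines, which matters when adverse-action reasons and fairness summaries are reported.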

Top feature drivers

Feature                        Importance
affordability_surplus          0.7421
combined_risk                  0.5698
credit_score                   0.3508
expense_understatement_flag    0.3242
worst_status_12m               0.2179
affordability_margin           0.2073
worst_status_3m                0.1864
debit_order_returns_3m         0.1527
affordability_surplus_ratio    0.1526
age                            0.1197

Intended use

Use this model pack for:

  1. Demo scoring APIs
  2. Explainable credit-risk dashboard prototypes
  3. Internal experimentation on calibrated probability-of-default workflows

Limitations

  • The underlying data is synthetic and should not be treated as live production applicant data.
  • This repository is not a substitute for formal model validation, governance approval, or legal review.
  • Production credit decisions should include affordability, compliance, monitoring, and human oversight controls.

Example deployment flow

  1. Download the artifacts from the Hugging Face model repository Pieter182/mzansiscore-credit-risk.
  2. Load best_tree_model.joblib, feature_cols.joblib, and label_encoders.joblib.
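  3. Rebuild the engineered features before inference, so that every column listed in feature_cols is present in the same order.
  4. Apply policy rules, such as the nca_affordability_pass overlay, alongside the predicted default probability.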
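Steps 2 and 3 can be sketched as follows. Several details are assumptions not confirmed by this card: label_encoders.joblib is taken to be a dict mapping column name to a fitted encoder with a transform method (for example a scikit-learn LabelEncoder), the applicant rows are dicts that already contain the rebuilt engineered features, and column 1 of predict_proba is the default class. The helper names load_artifacts, build_matrix and score are illustrative.

```python
from pathlib import Path
from typing import Any, Mapping, Sequence


def load_artifacts(artifact_dir: str) -> tuple[Any, list[str], dict[str, Any]]:
    """Load the tree model, its ordered feature columns and the categorical encoders."""
    import joblib  # deferred so build_matrix/score can be used without joblib installed

    base = Path(artifact_dir)
    model = joblib.load(base / "best_tree_model.joblib")
    feature_cols = list(joblib.load(base / "feature_cols.joblib"))
    # Assumption: a dict mapping column name -> fitted encoder with .transform().
    encoders = dict(joblib.load(base / "label_encoders.joblib"))
    return model, feature_cols, encoders


def build_matrix(rows: Sequence[Mapping[str, Any]], feature_cols: Sequence[str],
                 encoders: Mapping[str, Any]) -> list[list[float]]:
    """Encode categorical values and emit columns in exactly the training order."""
    matrix = []
    for row in rows:
        vector = []
        for col in feature_cols:
            value = row[col]  # a KeyError here means an engineered feature was not rebuilt
            if col in encoders:
                value = encoders[col].transform([value])[0]
            vector.append(float(value))
        matrix.append(vector)
    return matrix


def score(rows: Sequence[Mapping[str, Any]], model: Any, feature_cols: Sequence[str],
          encoders: Mapping[str, Any]) -> list[float]:
    """Predicted probability of default (class 1) for each applicant row."""
    matrix = build_matrix(rows, feature_cols, encoders)
    # Assumption: the model accepts a 2-D list; if it was fit on a pandas DataFrame,
    # pass pandas.DataFrame(matrix, columns=feature_cols) so feature names match.
    return [float(probs[1]) for probs in model.predict_proba(matrix)]
```

Note that scikit-learn's LabelEncoder.transform raises a ValueError for categories it did not see during fitting, so a deployed API needs an explicit rule for new category values before calling score.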

Training notes

The current training pipeline compares Logistic Regression, XGBoost, calibrated XGBoost, LightGBM, and calibrated LightGBM. Calibration is included because credit decisioning depends on reliable probability-of-default estimates, not only rank ordering.
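The difference between rank ordering and calibration can be shown on toy data (the four labels and scores below are invented for illustration, not taken from the holdout set). AUC depends only on how scores are ordered, so shrinking every probability leaves it unchanged, while the Brier score, the mean squared error between predicted PD and the 0/1 outcome, penalises probabilities that are systematically too low.

```python
def roc_auc(y_true: list[int], p_default: list[float]) -> float:
    """Rank-based AUC: share of (defaulter, non-defaulter) pairs ordered correctly; ties count 0.5."""
    pos = [p for p, y in zip(p_default, y_true) if y == 1]
    neg = [p for p, y in zip(p_default, y_true) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true: list[int], p_default: list[float]) -> float:
    """Mean squared error between predicted PD and the observed 0/1 default outcome."""
    return sum((p - y) ** 2 for p, y in zip(p_default, y_true)) / len(y_true)


y = [0, 0, 1, 1]
original = [0.1, 0.2, 0.6, 0.8]
shrunk = [p / 4 for p in original]  # same ordering, but probabilities far too low

print(roc_auc(y, original), roc_auc(y, shrunk))  # 1.0 1.0
print(round(brier_score(y, original), 4), round(brier_score(y, shrunk), 4))  # 0.0625 0.3414
```

Both score sets rank every defaulter above every non-defaulter, yet only the first gives usable default probabilities, which is why the Brier score is reported alongside AUC and Gini.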


Evaluation results

  • roc_auc on Synthetic South African credit applications (self-reported): 0.924
  • gini on Synthetic South African credit applications (self-reported): 0.849
  • brier_score on Synthetic South African credit applications (self-reported): 0.088