# T2.1: Compressed Crop Disease Classifier (INT8 ONNX, 4.34 MB)
A MobileNetV3-Small classifier compressed to 4.34 MB INT8 ONNX, trained for the AIMS KTT Hackathon Tier 2 brief (T2.1). It takes a 224×224 JPEG leaf image and returns one of five labels:

- `bean_spot`: bean angular leaf spot
- `cassava_mosaic`: Cassava Mosaic Disease (CMD)
- `healthy`: healthy maize leaf
- `maize_blight`: maize Northern Leaf Blight
- `maize_rust`: maize common rust
Intended for low-bandwidth, edge-device deployment in rural agricultural contexts. It ships with a FastAPI/ONNX Runtime service and a USSD/SMS fallback pathway for farmers on feature phones.
GitHub: DrUkachi/ktt-crop-disease-classifier
## Evaluation
| Split | Macro-F1 | Notes |
|---|---|---|
| Clean test (150 imgs) | 1.0000 | balanced 30 per class |
| Field-noisy test (150 imgs) | 0.9867 | same images, blur σ ∈ [0, 1.5] + JPEG q ∈ [50, 85] + brightness jitter |
| Δ clean → field | 1.33 pp | brief budget: < 12 pp ✓ |
| INT8 vs FP32 delta | 0.00 pp | MatMul/Gemm-only INT8 is lossless on this backbone |
Per-class confusion matrices and Grad-CAM overlays are in `notebooks/01_train_eval.ipynb`.
Honest caveat on clean F1 = 1.00. PlantVillage (the source for the three maize classes) is a studio-lit dataset with consistent per-class backgrounds, and the five labels span three plant species with very different leaf morphology. ImageNet-pretrained features separate those distributions trivially. The more meaningful number is the 1.33 pp drop on the field-noisy set, which measures generalisation under blur, JPEG re-compression, and brightness jitter.
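For concreteness, a minimal sketch of that field-noise recipe, assuming PIL. The blur and JPEG bounds come from the table above; the ±20% brightness jitter range is an assumption, as is using PIL's blur `radius` for σ.

```python
# Sketch of the field-noise recipe: Gaussian blur with sigma in [0, 1.5],
# JPEG re-compression at quality in [50, 85], and brightness jitter.
# The +/-20% jitter range is an assumption; blur/JPEG bounds are from the table.
import io
import random

from PIL import Image, ImageEnhance, ImageFilter

def field_noise(img: Image.Image) -> Image.Image:
    img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.0, 1.5)))
    img = ImageEnhance.Brightness(img).enhance(random.uniform(0.8, 1.2))
    # Round-trip through an in-memory JPEG to simulate re-compression
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=random.randint(50, 85))
    buf.seek(0)
    return Image.open(buf).convert("RGB")
```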
## Model details
- Architecture: MobileNetV3-Small, ImageNet pretrained, classifier head replaced with a `Linear(576 → 1024 → 5)` stack
- Input: 224 × 224 × 3 RGB, ImageNet mean/std normalization
- Output: 5 logits in this fixed class ordering: `bean_spot`, `cassava_mosaic`, `healthy`, `maize_blight`, `maize_rust`
- Quantization: ONNX Runtime dynamic INT8 on MatMul/Gemm nodes only (the classifier head), preceded by `quant_pre_process` (BN fusion, shape inference). The convolutional backbone stays FP32. See the sketch after this list.
- Why not full-graph INT8: MobileNetV3's Hardswish activations and Squeeze-and-Excitation blocks regress badly under ORT static INT8 (clean F1 ≈ 0.73) and collapse entirely under full-graph dynamic INT8 (clean F1 ≈ 0.07, always-one-class). QAT would likely fix this, but it was out of scope for the 4-hour brief cap. Full empirical details are in `process_log.md`.
- Inference: CPU-only via ONNX Runtime (`CPUExecutionProvider`), with observed latency of ~3–5 ms per image
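A minimal sketch of that MatMul/Gemm-only dynamic INT8 step, using the standard ONNX Runtime quantization API. The file names are illustrative; the repo's script is the authoritative version.

```python
# Sketch of the MatMul/Gemm-only dynamic INT8 quantization described above.
# File names are illustrative; see the repo for the authoritative script.
from onnxruntime.quantization import QuantType, quantize_dynamic
from onnxruntime.quantization.shape_inference import quant_pre_process

# BN fusion and shape inference before quantization
quant_pre_process("model_fp32.onnx", "model_prep.onnx")

# Quantize only the MatMul/Gemm nodes (the classifier head);
# the convolutional backbone stays FP32
quantize_dynamic(
    "model_prep.onnx",
    "model.onnx",
    weight_type=QuantType.QInt8,
    op_types_to_quantize=["MatMul", "Gemm"],
)
```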
## Training
- Hardware: NVIDIA L4 (23 GB)
- Run time: full 15-epoch training took 40.2 seconds
- Optimiser: AdamW, LR 5e-4, weight decay 1e-4, cosine annealing over 15 epochs
- Loss: class-weighted cross-entropy
- Batch size: 64
- Train-time augmentation: horizontal flip, ±10° rotation, mild colour jitter (brightness/contrast 0.2, saturation 0.1)
- Best epoch: 2
Blur and JPEG re-compression were deliberately excluded from training so the clean → field gap remains an honest robustness check.
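A minimal sketch of the setup described above, assuming torch/torchvision. The exact head composition, transform order, and class-weight computation are assumptions; the hyperparameters are from the list above.

```python
# Sketch of the training setup (head swap, AdamW + cosine, augmentation).
# Head composition and transform order are assumptions.
import torch
from torch import nn
from torchvision import models, transforms

model = models.mobilenet_v3_small(weights="IMAGENET1K_V1")
# Replace the final classifier layer so the head is Linear(576 -> 1024 -> 5)
model.classifier[3] = nn.Linear(1024, 5)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=15)

# class_weights would come from the training-set label counts (placeholder here)
class_weights = torch.ones(5)
criterion = nn.CrossEntropyLoss(weight=class_weights)

train_tfms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),  # ±10°
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.1),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
```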
## Training data

Assembled by `generate_dataset.py` from three public Hugging Face dataset mirrors:

| Class | HF dataset | Label |
|---|---|---|
| `bean_spot` | `AI-Lab-Makerere/beans` | idx 0 `angular_leaf_spot` |
| `cassava_mosaic` | `dpdl-benchmark/cassava` | idx 3 `CMD` |
| `healthy` | `BrandonFors/Plant-Diseases-PlantVillage-Dataset` | idx 10 `Corn_(maize)___healthy` |
| `maize_blight` | same | idx 9 `Corn_(maize)___Northern_Leaf_Blight` |
| `maize_rust` | same | idx 8 `Corn_(maize)___Common_rust_` |
There are 300 images per class, with an 80/10/10 train/val/test split using seed 1337. Full provenance (per-image source IDs) is recorded in `data/manifest.json` after the generator runs.
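As an illustration of the assembly step, here is roughly how one class is pulled from its mirror, assuming the `datasets` library. The `labels` column name is an assumption; `generate_dataset.py` is the authoritative implementation.

```python
# Illustrative pull of the bean_spot class (label idx 0) from its HF mirror.
# The "labels" column name is an assumption; generate_dataset.py is authoritative.
from datasets import load_dataset

beans = load_dataset("AI-Lab-Makerere/beans", split="train")
bean_spot = beans.filter(lambda ex: ex["labels"] == 0)  # angular_leaf_spot
print(f"{len(bean_spot)} candidate bean_spot images")
```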
## Usage

### With ONNX Runtime directly
```python
import numpy as np
import onnxruntime as ort
from PIL import Image

CLASSES = ["bean_spot", "cassava_mosaic", "healthy", "maize_blight", "maize_rust"]
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Load, resize, and normalise exactly as in training (ImageNet mean/std)
img = Image.open("maize_rust.jpg").convert("RGB").resize((224, 224))
arr = (np.asarray(img, dtype=np.float32) / 255.0 - MEAN) / STD

# HWC -> NCHW with a batch dimension of 1
arr = arr.transpose(2, 0, 1)[None, ...].astype(np.float32)

logits = sess.run(None, {sess.get_inputs()[0].name: arr})[0][0]
print(CLASSES[int(logits.argmax())])
```
### As a FastAPI service
```bash
git clone https://github.com/DrUkachi/ktt-crop-disease-classifier.git
cd ktt-crop-disease-classifier
pip install -r service/requirements.txt
uvicorn service.app:app --host 0.0.0.0 --port 8000

curl -X POST -F 'image=@samples/maize_rust_1.jpg' http://localhost:8000/predict
```
The service returns `{ label, confidence, top3, latency_ms, rationale }` and adds `escalation: "second_photo_different_angle"` when confidence < 0.6.
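For example, a minimal Python client against that endpoint; the field names follow the response shape above, and using `requests` (rather than the `curl` call shown) is just one option.

```python
# Minimal client for POST /predict; response fields follow the schema above.
import requests

with open("samples/maize_rust_1.jpg", "rb") as f:
    body = requests.post(
        "http://localhost:8000/predict", files={"image": f}
    ).json()

print(body["label"], body["confidence"], body["latency_ms"])
if body["confidence"] < 0.6:
    # Low-confidence responses also carry an escalation hint
    print("escalate:", body.get("escalation"))
```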
## Limitations and intended use
- Trained on ~1,200 studio-lit and smartphone-quality images
- Performance on microscope, UV, or non-leaf substrate images is not characterised
- The five classes do not cover all realistic field scenarios
- The service exposes `top3` and an `escalation` field so the consuming PWA can route low-confidence cases to a human extension officer
- Training data provenance is inherited from the upstream Hugging Face mirrors
- The model card does not evaluate fairness across cultivars, soil types, or geographies
## License
MIT, matching the GitHub repo.
## Citation

```bibtex
@misc{osisiogu2026ktt,
  author       = {Osisiogu, Ukachi},
  title        = {Compressed Crop Disease Classifier (AIMS KTT T2.1)},
  year         = {2026},
  howpublished = {\url{https://github.com/DrUkachi/ktt-crop-disease-classifier}},
}
```
Upstream dataset credits: PlantVillage (Mohanty et al. 2016), Cassava Leaf Disease (Mwebaze et al. 2019, Kaggle 2020), and iBeans (Makerere AI Lab 2020).