---
license: cc-by-nc-sa-4.0
library_name: keras
pipeline_tag: image-classification
language: en
tags:
  - medical-imaging
  - ct
  - lung-cancer
  - efficientnet-b0
  - transfer-learning
  - grad-cam
model-index:
  - name: EfficientNetB0 Lung CT Classifier (4-class)
    results:
      - task:
          type: image-classification
          name: Image Classification
        dataset:
          name: Hany Lung Cancer CT (derived; cleaned)
          type: custom
          split: test
        metrics:
          - type: accuracy
            value: TODO:0.XX
          - type: precision
            value: TODO:0.XX
          - type: recall
            value: TODO:0.XX
          - type: f1
            value: TODO:0.XX
---

Attribution

Original Source:

Hany H. (2020). Chest CT-Scan Images Dataset. Kaggle.
https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset

Original License:

Database: Open Database Commons Open Database License (ODbL v1.0)
https://opendatacommons.org/licenses/odbl/1-0/

Derived Dataset Author:

Ashley Blackwell (2025). Chest CT-Scan Images (Cleaned, Derived from Hany et al.). Hugging Face Datasets.
https://huggingface.co/datasets/ashleyblackwell/lung-ct-cleaned-hany


Cleaning & Preprocessing Summary

The original dataset was processed and curated to ensure consistency, quality, and reproducibility for use in deep-learning experiments such as the EfficientNet-B0 Lung CT Classifier.

Steps Performed

  1. Integrity Checks: Removed corrupted or unreadable .jpg and .png files.
  2. Resolution Standardization: Resized all images to 224 × 224 pixels (3 channels).
  3. Color Normalization: Converted grayscale scans to RGB format.
  4. Class Organization: Verified folder structure for four diagnostic categories:
    • Adenocarcinoma
    • Large-Cell Carcinoma
    • Squamous-Cell Carcinoma
    • Normal
  5. Stratified Splits:
    • Train: 70%
    • Validation: 20%
    • Test: 10%
  6. Metadata File: Generated metadata.csv containing filename, class label, and original resolution for traceability.
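
The stratified 70/20/10 split in step 5 can be sketched as follows. This is a minimal illustration, not the script that produced the published splits: the function name `stratified_split`, the seed, and the toy label list are all assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.7, 0.2, 0.1), seed=42):
    """Return (train, val, test) index lists, splitting each class separately."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed (an assumption) so the split is repeatable
    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(fractions[0] * len(idxs))
        n_val = round(fractions[1] * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # remainder goes to the test split
    return train, val, test

# Toy labels (hypothetical): 10 images per class.
labels = (["Adenocarcinoma"] * 10 + ["Large-Cell Carcinoma"] * 10
          + ["Squamous-Cell Carcinoma"] * 10 + ["Normal"] * 10)
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```

Splitting within each class keeps the class proportions the same in all three subsets, which matters when some categories have far fewer scans than others.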

Dataset Overview

| Split | Approx. Images | Notes |
|---|---|---|
| Train | ~TODO | Stratified by class |
| Validation | ~TODO | For hyperparameter tuning |
| Test | ~TODO | Final evaluation set |
| Total | ~TODO | All cleaned and standardized |

Intended Use

  • Purpose:
    Designed for research, coursework, and educational demonstrations in medical image classification, model interpretability (Grad-CAM), and reproducible machine learning pipelines.

  • Out of Scope:
    This dataset must not be used for clinical diagnosis, treatment decisions, or commercial medical software development.


Legal & License Information

License

This dataset is distributed under the Open Data Commons Open Database License (ODbL v1.0).
You are free to:

  • Share: Copy, distribute, and use the database.
  • Create: Produce works from the database.
  • Adapt: Modify, transform, and build upon the database.

Full legal text:
https://opendatacommons.org/licenses/odbl/1-0/


Scope

  • Intended: Research, UMGC coursework, model-interpretability demos (Grad-CAM), benchmarking.

  • Out-of-scope: Clinical diagnosis, patient triage, or any safety-critical application.

Model Architecture

  • Backbone: EfficientNet-B0 (ImageNet-initialized, fine-tuned)
  • Input size: 224 × 224 × 3
  • Head: GlobalAveragePooling → Dropout (TODO: rate) → Dense(4, softmax)
  • Loss: Categorical Cross-Entropy
  • Optimizer: TODO (e.g., Adam, lr = 1e-4 with decay)
  • Epochs / Batch size: TODO
  • Class labels (index):
    • 0: Adenocarcinoma
    • 1: Large-Cell Carcinoma
    • 2: Squamous-Cell Carcinoma
    • 3: Normal
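
The index-to-label mapping above can be stored and used as sketched below. The layout of `class_map.json` is an assumption inferred from the loading code in "How to Use", which looks labels up with `idx_to_label[str(i)]`; the helper `decode` and the probability vector are hypothetical.

```python
import json

CLASS_NAMES = [
    "Adenocarcinoma",
    "Large-Cell Carcinoma",
    "Squamous-Cell Carcinoma",
    "Normal",
]

# JSON object keys are always strings, so indices are stored as "0".."3".
class_map = {str(i): name for i, name in enumerate(CLASS_NAMES)}
class_map_json = json.dumps(class_map, indent=2)

def decode(probs, idx_to_label):
    """Return (label, probability) for the highest-scoring class."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return idx_to_label[str(best)], probs[best]

idx_to_label = json.loads(class_map_json)
# Made-up softmax output, for illustration only.
label, confidence = decode([0.05, 0.10, 0.15, 0.70], idx_to_label)
print(label, confidence)
```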

Data & Preprocessing

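  • Source: Derived from the Hany Lung Cancer CT Scan dataset (Kaggle). Corrupted and irregular-resolution images were removed, and all remaining images were standardized to 224 × 224.
  • Split: Train/Val/Test = 70/20/10 (stratified).
  • Transforms: Resize → RGB conversion → scale to [0, 1] or apply preprocess_input. Use the same option at inference as in training. The two are not interchangeable: the Keras EfficientNet models rescale inputs internally, and tf.keras.applications.efficientnet.preprocess_input is a pass-through that expects pixel values in [0, 255].
  • Artifacts logged: Confusion matrix, classification report, Grad-CAM overlays.
  • Attribution: Credit the original dataset per its license when sharing or publishing.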


Evaluation

Test set size: TODO:N

Metrics (macro): Accuracy, Precision, Recall, F1

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Adenocarcinoma | TODO | TODO | TODO | TODO |
| Large-Cell | TODO | TODO | TODO | TODO |
| Squamous | TODO | TODO | TODO | TODO |
| Normal | TODO | TODO | TODO | TODO |
| Macro Avg | TODO | TODO | TODO | N |
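
For reference, the per-class and macro-averaged metrics named above can be derived from a confusion matrix as in this sketch. The function names are illustrative, and the 4 × 4 matrix is a toy input chosen to make the arithmetic easy to follow; it is not a result from this model.

```python
def per_class_report(confusion):
    """confusion[i][j] = number of samples with true class i predicted as class j."""
    n = len(confusion)
    rows = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(confusion[k]) - tp                       # true k, predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        rows.append({"precision": precision, "recall": recall,
                     "f1": f1, "support": tp + fn})
    return rows

def macro_average(rows):
    """Unweighted mean over classes, so every class counts equally."""
    return {m: sum(r[m] for r in rows) / len(rows)
            for m in ("precision", "recall", "f1")}

def accuracy(confusion):
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / sum(sum(row) for row in confusion)

# Toy confusion matrix (NOT model results): one class-2 sample predicted as class 3.
toy = [
    [2, 0, 0, 0],
    [0, 2, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 2],
]
rows = per_class_report(toy)
macro = macro_average(rows)
print(accuracy(toy), macro)
```

Macro averaging is the unweighted mean across the four classes, so a weak class lowers the score regardless of how many test images it has.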

Suggested Environment

  • tensorflow==2.15.0
  • keras==2.15.0
  • huggingface_hub>=0.23.0
  • numpy>=1.24
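
One way to compare a local environment against these suggestions is to print the installed versions next to the pins. This is a small convenience sketch using only the standard library; the `PINS` dictionary and the function name `installed_versions` are introduced here for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Suggested pins copied from the list above.
PINS = {
    "tensorflow": "==2.15.0",
    "keras": "==2.15.0",
    "huggingface_hub": ">=0.23.0",
    "numpy": ">=1.24",
}

def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, installed in installed_versions(PINS).items():
    print(f"{name}: installed={installed}, suggested {PINS[name]}")
```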


Explainability (Grad-CAM)

Last conv layer: top_conv for EfficientNet-B0. Tip: Use Grad-CAM to overlay heatmaps and validate that the model focuses on pathologically relevant regions.
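
A minimal Grad-CAM sketch follows, assuming the loaded model exposes `top_conv` directly through `model.get_layer`. If the EfficientNet backbone is nested as a sub-model, the layer has to be fetched from that sub-model instead. The function names `cam_from_arrays` and `grad_cam` are illustrative and not part of this repository.

```python
import numpy as np

def cam_from_arrays(conv_out, grads):
    """Combine feature maps and gradients (both H x W x K) into an H x W heatmap."""
    weights = grads.mean(axis=(0, 1))                          # one weight per channel
    cam = np.maximum((conv_out * weights).sum(axis=-1), 0.0)   # weighted sum, then ReLU
    peak = cam.max()
    return cam / peak if peak > 0 else cam                     # scale to [0, 1] for overlays

def grad_cam(model, x, class_index=None, layer_name="top_conv"):
    """Grad-CAM heatmap for one preprocessed batch x of shape (1, 224, 224, 3)."""
    import tensorflow as tf  # imported lazily so the NumPy helper works on its own

    layer = model.get_layer(layer_name)
    grad_model = tf.keras.Model(model.inputs, [layer.output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(x)
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))  # explain the predicted class
        score = preds[:, class_index]
    # Gradient of the class score with respect to the last conv feature maps.
    grads = tape.gradient(score, conv_out)
    return cam_from_arrays(conv_out[0].numpy(), grads[0].numpy())
```

For a 224 × 224 input, `top_conv` in EfficientNet-B0 produces a 7 × 7 spatial map, so the heatmap returned by `grad_cam(model, x)` needs to be resized to the image size before it is blended over the CT slice.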

Limitations, Bias & Ethical Considerations

  • Domain shift: CT protocols and scanners vary, which may affect generalization.
  • Label noise: Community datasets can contain mislabels.
  • Generalization: The model is not clinically validated.
  • Mitigation: Use Grad-CAM audits and external validation before any applied use.


Training & Reproducibility

  • Hardware: TODO (e.g., NVIDIA T4 / A100 / local GPU)
  • Training time: TODO
  • Seed / Determinism: TODO
  • Reproduction steps: TODO (link to notebook or script if available)

License

  • Model weights & code: CC BY-NC-SA 4.0 (non-commercial, share-alike, with attribution).
  • Dataset (derived): Follow the original dataset's license terms and provide credit to the creator.

Citation

If you use this model, please cite:

Blackwell, A. (2025). EfficientNet-B0 Lung CT Classifier (4-class) [Computer software]. Hugging Face.
https://huggingface.co/TODO

@software{blackwell2025lungct,
  author    = {Blackwell, Ashley},
  title     = {EfficientNet-B0 Lung CT Classifier (4-class)},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/TODO}
}

Maintainers

Ashley Blackwell. Questions and feedback are welcome via the Hugging Face Discussions tab.

Changelog

  • 2025-10-06: Initial public release (.keras weights); added model card, class map, and metric placeholders.


Citation

If you use this dataset, please cite both the original source and the derived version:

Original dataset:

Hany H. (2020). Chest CT-Scan Images Dataset. Kaggle.
https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset

Derived version:

Blackwell, A. (2025). Chest CT-Scan Images (Cleaned, Derived from Hany et al.) [Dataset]. Hugging Face.
https://huggingface.co/datasets/ashleyblackwell/lung-ct-cleaned-hany

@dataset{hany2020chestct,
  author    = {Hany, H.},
  title     = {Chest CT-Scan Images Dataset},
  year      = {2020},
  publisher = {Kaggle},
  url       = {https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset}
}

@dataset{blackwell2025lungctcleaned,
  author    = {Blackwell, Ashley},
  title     = {Chest CT-Scan Images (Cleaned, Derived from Hany et al.)},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/ashleyblackwell/lung-ct-cleaned-hany}
}

---

## How to Use (Load & Inference)
**Option A: Download from the Hub**

```python
import json

import numpy as np
import tensorflow as tf
from huggingface_hub import hf_hub_download

REPO_ID = "TODO:your-username/efficientnetb0-lung-ct-4class"

# Fetch the weights and the index-to-label map from the Hub (cached locally).
model_path = hf_hub_download(repo_id=REPO_ID, filename="model.keras")
class_map_path = hf_hub_download(repo_id=REPO_ID, filename="class_map.json")

model = tf.keras.models.load_model(model_path, compile=False)
with open(class_map_path) as f:
    idx_to_label = json.load(f)  # looked up below with string indices: "0", "1", ...

def preprocess(img_path):
    # load_img returns a 3-channel RGB image by default, even for grayscale files.
    img = tf.keras.utils.load_img(img_path, target_size=(224, 224))
    x = tf.keras.utils.img_to_array(img)  # float32, values in [0, 255]
    x = np.expand_dims(x, 0)              # add batch dimension: (1, 224, 224, 3)
    # Match the scaling used in training. Skip this division if the model was
    # trained on raw [0, 255] inputs: Keras EfficientNet rescales internally and
    # tf.keras.applications.efficientnet.preprocess_input is a pass-through.
    return x / 255.0

x = preprocess("path/to/ct_slice.png")
probs = model.predict(x, verbose=0)[0]
for i, p in enumerate(probs):
    print(f"{idx_to_label[str(i)]}: {p:.3f}")
print("Predicted:", idx_to_label[str(int(np.argmax(probs)))])
```
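
**Option B: Snapshot Download (Local Folder)**

```python
import json, pathlib, tensorflow as tf
from huggingface_hub import snapshot_download

local_dir = pathlib.Path(snapshot_download(repo_id="TODO:your-username/efficientnetb0-lung-ct-4class"))
# Load model.keras and class_map.json from the downloaded snapshot folder.
model = tf.keras.models.load_model(str(local_dir / "model.keras"), compile=False)
idx_to_label = json.loads((local_dir / "class_map.json").read_text())
```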

---