# SAM2 ID Segmenter
Lightweight wrapper and fine‑tuning scaffold around Meta's Segment Anything 2 (SAM2), adapted to segment structured regions in ID / document images (e.g. portrait, number field, security areas). The repository currently focuses on: (1) reproducible loading of a fine‑tuned SAM2 checkpoint, (2) automatic multi‑mask generation + tight cropping, and (3) configuration‑file‑driven training/inference settings.
> Status: Inference wrapper implemented (`SamSegmentator`). End‑to‑end training loop is a planned addition. Config already anticipates training hyper‑parameters.
---
## Contents
1. Motivation & Scope
2. Intended Use & Non‑Goals
3. Repository Structure
4. Configuration (`config.json`)
5. Installation
6. Inference Usage (`SamSegmentator`)
7. Dataset & Mask Format (planned training)
8. Checkpoints & Auto‑Download
9. Metrics (recommended)
10. Limitations & Risks
11. Roadmap
12. License & Citation
---
## 1. Motivation & Scope
Document / ID workflows often need fast class‑agnostic region extraction (for OCR, redaction, or downstream classifiers). SAM2 provides strong general mask proposals; this project wraps it to directly yield cropped image + mask pairs ordered by area and optionally padded.
## 2. Intended Use & Non‑Goals
Intended:
- Pre‑segmentation of ID / document fields prior to OCR.
- Selective anonymization / redaction pipelines (masking faces, MRZ, barcodes, etc.).
- Rapid prototyping for custom fine‑tuning of SAM2 on a small set of document classes.
Non‑Goals:
- Biometric identity verification or authoritative fraud detection.
- Legal decision making without human review.
- Full multi‑modal extraction (text recognition is out of scope here).
## 3. Repository Structure
```
model_repo/
  config.json              # Central hyper-parameter & path config
  README.md                # (this file)
  checkpoints/             # Local downloaded / fine-tuned checkpoints
  samples/
    sample_us_passport.jpg
  src/
    sam_segmentator.py     # Inference wrapper (SamSegmentator)
    main.py                # Placeholder entry point
```
Planned: `train/` scripts for fine‑tuning (not yet implemented).
## 4. Configuration (`model_repo/config.json`)
Key fields (example values included in the repo):
- `model_type`: Always `sam2` here.
- `checkpoint_path`: Path relative to project root or absolute; if omitted and `auto_download=True` the code will attempt remote download.
- `image_size`: Target square size used during training (future). Inference wrapper accepts raw image size.
- `num_classes`, `class_names`: For supervised training (future); not required by the current automatic mask generator, but kept for consistency.
- `augmentation`, `loss`, `optimizer`, `lr_scheduler`: Reserved for training loop integration.
- `paths`: Expected dataset layout for training: `data/train/images`, `data/train/masks`, etc.
- `mixed_precision`: Will enable `torch.autocast` during training.
Even if not all fields are consumed now, keeping them centralized avoids future breaking refactors.
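As a sketch, a minimal config matching the fields above might look like the dictionary below (all concrete values are illustrative, not the repo's actual settings); loading `config.json` is a one‑liner with the standard `json` module:

```python
import json

# Illustrative values only; the repo's actual config.json may differ.
example_config = {
    "model_type": "sam2",
    "checkpoint_path": "checkpoints/sam2.1_hiera_base_plus_ft_ids.pt",
    "image_size": 1024,
    "num_classes": 4,  # 3 document classes + background
    "class_names": ["ID1", "ID3", "IDCOVER"],
    "mixed_precision": True,
}

def load_config(path: str = "model_repo/config.json") -> dict:
    # Read the JSON config once at startup and pass values explicitly.
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

Keeping the whole config JSON‑serializable means the same file can drive both training and inference without code changes.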
## 5. Installation
### Prerequisites
- Python 3.10+ (recommended)
- CUDA GPU (optional but recommended for speed)
### Using uv (preferred fast resolver)
If `pyproject.toml` is present (it is), you can do:
```
uv sync
```
This creates / updates the virtual environment and installs dependencies.
### Using pip (alternative)
```
python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS / Linux
source .venv/bin/activate
pip install -U pip
pip install -e .
```
If SAM2 is not available as a published package in your environment, you may need to install it from source; exact instructions depend on the upstream SAM2 repository and will be added here once finalized.
## 6. Inference Usage (`SamSegmentator`)
Minimal example using the sample passport image:
```python
import cv2
from pathlib import Path

from src.sam_segmentator import SamSegmentator

image_path = Path("samples/sample_us_passport.jpg")
img_bgr = cv2.imread(str(image_path))  # BGR (OpenCV)
assert img_bgr is not None, f"Could not read {image_path}"

segmentator = SamSegmentator(
    checkpoint_path="checkpoints/sam2.1_hiera_base_plus_ft_ids.pt",  # or None to auto-download if configured
    pred_iou_thresh=0.88,            # forwarded to SAM2AutomaticMaskGenerator
    stability_score_thresh=0.90,
)

segments = segmentator.infer(img_bgr, pad_percent=0.05)
print(f"Total segments: {len(segments)}")

# Each segment is (crop_bgr, mask_255)
Path("outputs").mkdir(exist_ok=True)
for i, (crop, mask) in enumerate(segments[:3]):
    cv2.imwrite(f"outputs/segment_{i}_crop.png", crop)
    cv2.imwrite(f"outputs/segment_{i}_mask.png", mask)
```
Output: pairs of tightly cropped images and their binary masks (0 background, 255 foreground), sorted by mask area descending.
### Parameter Notes
- `pad_percent`: Relative padding (default 5%) added around each tight bounding box.
- Deprecated `pad` (absolute pixels) still accepted but will warn.
- All additional kwargs go to `SAM2AutomaticMaskGenerator` (e.g., `box_nms_thresh`, `min_mask_region_area`).
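To make the `pad_percent` behavior concrete, the arithmetic can be sketched as below. This is a standalone illustration, not the actual implementation in `sam_segmentator.py`; `pad_bbox` is a hypothetical helper name:

```python
def pad_bbox(x, y, w, h, pad_percent, img_w, img_h):
    # Expand a tight (x, y, w, h) bounding box by a fraction of its own
    # width/height, clamping to the image borders. pad_percent=0.05 adds
    # roughly 5% of the box size on each side.
    px = int(round(w * pad_percent))
    py = int(round(h * pad_percent))
    x0 = max(0, x - px)
    y0 = max(0, y - py)
    x1 = min(img_w, x + w + px)
    y1 = min(img_h, y + h + py)
    return x0, y0, x1 - x0, y1 - y0
```

Because the padding is relative, small fields get small margins and large regions get proportionally larger ones, which is usually what OCR pre-processing wants.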
## 7. Dataset & Mask Format (For Future Training)
Expected layout (mirrors `paths` in config):
```
data/
  train/
    images/*.jpg|png
    masks/*.png            # Single-channel, integer class indices (0 = background)
  val/
    images/
    masks/
```
Class index mapping (example):
```
class_names = ["ID1", "ID3", "IDCOVER"]
0 -> background
1 -> ID1
2 -> ID3
3 -> IDCOVER
```
Store masks as lossless PNGs and resize them only with nearest‑neighbor interpolation, so class indices are never blended. Avoid palette (indexed‑color) PNGs; explicit integer pixel values are recommended.
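A quick sanity check for mask files can be sketched with NumPy alone (an illustrative helper, not part of the repo; assumes masks are already loaded as arrays):

```python
import numpy as np

def valid_mask(mask: np.ndarray, num_classes: int = 4) -> bool:
    # A well-formed mask is 2-D (single channel) with integer class
    # indices in [0, num_classes), e.g. 0..3 for background + 3 classes.
    return (
        mask.ndim == 2
        and np.issubdtype(mask.dtype, np.integer)
        and int(mask.min()) >= 0
        and int(mask.max()) < num_classes
    )
```

Running such a check over `data/train/masks` before training catches palette PNGs and stray pixel values early, when they are cheap to fix.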
## 8. Checkpoints & Auto‑Download
`SamSegmentator` will:
1. Use provided `checkpoint_path` if it exists.
2. If none is provided and `auto_download=True`, download the default checkpoint to `checkpoints/` using an environment configured URL (`SAM2_CHECKPOINT_URL`).
3. (Optional) Validate SHA256 if `SAM2_CHECKPOINT_SHA256` is set.
Environment variables:
```
SAM2_CHECKPOINT_URL=<direct_download_url>
SAM2_CHECKPOINT_SHA256=<hex>
SAM2_CHECKPOINT_DIR=checkpoints
```
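A typical SHA256 check for step 3 can be sketched as follows (the wrapper's internal logic may differ; `sha256_of` and `verify_checkpoint` are hypothetical names):

```python
import hashlib
import os

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    # Stream the file in 1 MiB chunks so large checkpoints are never
    # loaded into memory at once.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

def verify_checkpoint(path: str) -> bool:
    # Compare against SAM2_CHECKPOINT_SHA256 if set; accept otherwise.
    expected = os.environ.get("SAM2_CHECKPOINT_SHA256")
    return expected is None or sha256_of(path) == expected.lower()
```

Making the hash check opt‑in via the environment variable keeps local experimentation frictionless while still allowing strict verification in deployment.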
## 9. Metrics (Recommended When Training Added)
- Mean IoU (per class & macro average)
- Dice coefficient
- Pixel accuracy
- Class frequency distribution (to inform potential class weighting)
Store per‑epoch metrics as JSON for reproducibility.
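For reference, per‑class IoU and Dice reduce to a few lines of NumPy (an illustrative sketch under the mask format above; `iou_dice` is a hypothetical name):

```python
import numpy as np

def iou_dice(pred: np.ndarray, target: np.ndarray, cls: int):
    # Binarize both maps for one class, then compute IoU and Dice.
    p = pred == cls
    t = target == cls
    inter = np.logical_and(p, t).sum()
    union = np.logical_or(p, t).sum()
    total = p.sum() + t.sum()
    iou = inter / union if union else float("nan")
    dice = 2 * inter / total if total else float("nan")
    return iou, dice
```

Returning `nan` when a class is absent from both maps keeps the macro average honest: such classes can be skipped rather than counted as perfect scores.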
## 10. Limitations & Risks
Technical:
- Current version does not include a fine‑tuning script; only inference wrapper.
- Automatic mask generator is class‑agnostic; without fine‑tuning it may over‑segment or miss tiny fields.
Ethical / Compliance:
- Processing ID documents may involve PII; ensure secure storage and compliant handling.
- Not intended for biometric decisions nor identity verification pipelines without human oversight.
## 11. Roadmap
- [ ] Add training script (supervised fine‑tuning using `config.json`).
- [ ] Optional class‑guided prompting (points / boxes) pipeline.
- [ ] Export to ONNX / TorchScript.
- [ ] CLI interface for batch folder inference.
- [ ] Lightweight web demo (Gradio / FastAPI).
## 12. License & Citation
Specify a license in a top‑level `LICENSE` file (e.g., MIT or Apache‑2.0), and make sure it is compatible with SAM2's original license.
Please cite SAM / SAM2 in academic work. Example (placeholder):
```
@article{kirillov2023segmentanything,
  title   = {Segment Anything},
  author  = {Kirillov, Alexander and others},
  journal = {arXiv preprint arXiv:2304.02643},
  year    = {2023}
}
```
Add the SAM2 citation once the official reference is finalized.
## Acknowledgments
- Meta AI for releasing Segment Anything & SAM2.
- OpenCV, PyTorch, and the broader CV community.
---
If you have questions or need feature prioritization, open an Issue or start a Discussion.