# SAM2 ID Segmenter

Lightweight wrapper and fine‑tuning scaffold around Meta's Segment Anything 2 (SAM2), adapted to segment structured regions in ID / document images (e.g. portrait, number field, security areas). The repository currently focuses on: (1) reproducible loading of a fine‑tuned SAM2 checkpoint, (2) automatic multi‑mask generation plus tight cropping, and (3) configuration‑file‑driven training/inference settings.

> Status: Inference wrapper implemented (`SamSegmentator`). An end‑to‑end training loop is a planned addition; the config already anticipates training hyper‑parameters.

---

## Contents

1. Motivation & Scope
2. Intended Use & Non‑Goals
3. Repository Structure
4. Configuration (`config.json`)
5. Installation
6. Inference Usage (`SamSegmentator`)
7. Dataset & Mask Format (planned training)
8. Checkpoints & Auto‑Download
9. Metrics (recommended)
10. Limitations & Risks
11. Roadmap
12. License & Citation

---

## 1. Motivation & Scope

Document / ID workflows often need fast, class‑agnostic region extraction (for OCR, redaction, or downstream classifiers). SAM2 provides strong general mask proposals; this project wraps it to directly yield cropped image + mask pairs, ordered by area and optionally padded.

## 2. Intended Use & Non‑Goals

Intended:

- Pre‑segmentation of ID / document fields prior to OCR.
- Selective anonymization / redaction pipelines (masking faces, MRZ, barcodes, etc.).
- Rapid prototyping for custom fine‑tuning of SAM2 on a small set of document classes.

Non‑Goals:

- Biometric identity verification or authoritative fraud detection.
- Legal decision making without human review.
- Full multi‑modal extraction (text recognition is out of scope here).
## 3. Repository Structure

```
model_repo/
  config.json            # Central hyper‑parameter & path config
  README.md              # (this file)
  checkpoints/           # Locally downloaded / fine‑tuned checkpoints
  samples/
    sample_us_passport.jpg
  src/
    sam_segmentator.py   # Inference wrapper (SamSegmentator)
    main.py              # Placeholder entry point
```

Planned: `train/` scripts for fine‑tuning (not yet implemented).

## 4. Configuration (`model_repo/config.json`)

Key fields (example values included in the repo):

- `model_type`: Always `sam2` here.
- `checkpoint_path`: Path relative to the project root, or absolute; if omitted and `auto_download=True`, the code will attempt a remote download.
- `image_size`: Target square size used during training (future). The inference wrapper accepts the raw image size.
- `num_classes`, `class_names`: For supervised training (future); not required by the current automatic mask generator, but kept for consistency.
- `augmentation`, `loss`, `optimizer`, `lr_scheduler`: Reserved for training loop integration.
- `paths`: Expected dataset layout for training: `data/train/images`, `data/train/masks`, etc.
- `mixed_precision`: Will enable `torch.autocast` during training.

Even if not all fields are consumed now, keeping them centralized avoids future breaking refactors.

## 5. Installation

### Prerequisites

- Python 3.10+ (recommended)
- CUDA GPU (optional but recommended for speed)

### Using uv (preferred fast resolver)

If `pyproject.toml` is present (it is), you can run:

```
uv sync
```

This creates or updates the virtual environment and installs dependencies.

### Using pip (alternative)

```
python -m venv .venv
.venv\Scripts\activate
pip install -U pip
pip install -e .
```

If SAM2 is not a published package in your environment, you may need to install it from source (instructions depend on the upstream SAM2 repository; add them here when finalized).
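After installation, the centralized configuration from section 4 can be consumed with a small helper. The sketch below is illustrative only: the `load_config` function and its fallback defaults are assumptions for this README, not part of the current `SamSegmentator` API.

```python
import json
from pathlib import Path

# Fallback values applied when a key is absent from config.json.
# These defaults are illustrative; adjust them to your setup.
DEFAULTS = {
    "model_type": "sam2",
    "checkpoint_path": None,
    "auto_download": True,
    "image_size": 1024,
    "mixed_precision": False,
}


def load_config(path="model_repo/config.json"):
    """Read config.json and fill in missing keys with defaults."""
    cfg = dict(DEFAULTS)
    config_file = Path(path)
    if config_file.exists():
        cfg.update(json.loads(config_file.read_text(encoding="utf-8")))
    if cfg["model_type"] != "sam2":
        raise ValueError(f"Unsupported model_type: {cfg['model_type']}")
    return cfg
```

Keeping all hyper‑parameters behind one loader like this is what makes the "centralized config" promise above cheap to honor later, when the training loop starts consuming the reserved fields.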
## 6. Inference Usage (`SamSegmentator`)

Minimal example using the sample passport image:

```python
import cv2
from pathlib import Path

from src.sam_segmentator import SamSegmentator

image_path = Path("samples/sample_us_passport.jpg")
img_bgr = cv2.imread(str(image_path))  # BGR (OpenCV)

segmentator = SamSegmentator(
    checkpoint_path="checkpoints/sam2.1_hiera_base_plus_ft_ids.pt",  # or None to auto-download if configured
    pred_iou_thresh=0.88,             # forwarded to SAM2AutomaticMaskGenerator
    stability_score_thresh=0.90,
)

segments = segmentator.infer(img_bgr, pad_percent=0.05)
print(f"Total segments: {len(segments)}")

# Each segment is (crop_bgr, mask_255)
for i, (crop, mask) in enumerate(segments[:3]):
    cv2.imwrite(f"outputs/segment_{i}_crop.png", crop)
    cv2.imwrite(f"outputs/segment_{i}_mask.png", mask)
```

Output: pairs of tightly cropped images and their binary masks (0 background, 255 foreground), sorted by mask area in descending order.

### Parameter Notes

- `pad_percent`: Relative padding (default 5%) added around each tight bounding box.
- The deprecated `pad` (absolute pixels) is still accepted but will emit a warning.
- All additional kwargs are forwarded to `SAM2AutomaticMaskGenerator` (e.g., `box_nms_thresh`, `min_mask_region_area`).

## 7. Dataset & Mask Format (For Future Training)

Expected layout (mirrors `paths` in the config):

```
data/
  train/
    images/*.jpg|png
    masks/*.png   # Single‑channel, integer indices (0=background)
  val/
    images/
    masks/
```

Class index mapping (example):

```
class_names = ["ID1", "ID3", "IDCOVER"]
0 -> background
1 -> ID1
2 -> ID3
3 -> IDCOVER
```

Masks should use lossless, nearest‑neighbor‑safe compression (PNG). Avoid palette mismatches; explicit integer pixel values are recommended.

## 8. Checkpoints & Auto‑Download

`SamSegmentator` will:

1. Use the provided `checkpoint_path` if it exists.
2. If none is provided and `auto_download=True`, download the default checkpoint to `checkpoints/` using an environment‑configured URL (`SAM2_CHECKPOINT_URL`).
3. (Optional) Validate the SHA256 if `SAM2_CHECKPOINT_SHA256` is set.
Environment variables:

```
SAM2_CHECKPOINT_URL=
SAM2_CHECKPOINT_SHA256=
SAM2_CHECKPOINT_DIR=checkpoints
```

## 9. Metrics (Recommended Once Training Is Added)

- Mean IoU (per class & macro average)
- Dice coefficient
- Pixel accuracy
- Class frequency distribution (to inform potential class weighting)

Store per‑epoch metrics as JSON for reproducibility.

## 10. Limitations & Risks

Technical:

- The current version does not include a fine‑tuning script, only the inference wrapper.
- The automatic mask generator is class‑agnostic; without fine‑tuning it may over‑segment or miss tiny fields.

Ethical / Compliance:

- Processing ID documents may involve PII; ensure secure storage and compliant handling.
- Not intended for biometric decisions or identity verification pipelines without human oversight.

## 11. Roadmap

- [ ] Add training script (supervised fine‑tuning using `config.json`).
- [ ] Optional class‑guided prompting (points / boxes) pipeline.
- [ ] Export to ONNX / TorchScript.
- [ ] CLI interface for batch folder inference.
- [ ] Lightweight web demo (Gradio / FastAPI).

## 12. License & Citation

Specify a license in a top‑level `LICENSE` file (e.g., MIT or Apache‑2.0), ensuring compatibility with SAM2's original license. Please cite SAM / SAM2 in academic work. Example (placeholder):

```
@article{kirillov2023segmentanything,
  title={Segment Anything},
  author={Kirillov, Alexander and others},
  journal={arXiv preprint arXiv:2304.02643},
  year={2023}
}
```

Add the updated SAM2 citation once the official reference is finalized.

## Acknowledgments

- Meta AI for releasing Segment Anything & SAM2.
- OpenCV, PyTorch, and the broader CV community.

---

If you have questions or need feature prioritization, open an Issue or start a Discussion.
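As a complement to section 8, the optional SHA256 validation can be done with the standard library alone. The helper below is a sketch under stated assumptions: `sha256_of` and `verify_checkpoint` are illustrative names invented for this README (the wrapper's actual internals may differ), and only the `SAM2_CHECKPOINT_SHA256` environment variable comes from the documented contract.

```python
import hashlib
import os
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large checkpoints never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path: Path) -> bool:
    """Return True when no reference hash is configured, or when the hashes match."""
    expected = os.environ.get("SAM2_CHECKPOINT_SHA256", "").strip().lower()
    if not expected:
        return True  # validation is optional, per section 8
    return sha256_of(path) == expected
```

Treating a missing `SAM2_CHECKPOINT_SHA256` as a pass (rather than a failure) matches the "(Optional) Validate" wording in section 8; a stricter deployment could invert that choice.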