# SAM2 ID Segmenter
Lightweight wrapper and fine‑tuning scaffold around Meta's Segment Anything 2 (SAM2), adapted to segment structured regions in ID / document images (e.g. portrait, number field, security areas). The repository currently focuses on: (1) reproducible loading of a fine‑tuned SAM2 checkpoint, (2) automatic multi‑mask generation + tight cropping, and (3) configuration‑file‑driven training/inference settings.
> Status: Inference wrapper implemented (`SamSegmentator`). End‑to‑end training loop is a planned addition. Config already anticipates training hyper‑parameters.
---
## Contents
1. Motivation & Scope
2. Intended Use & Non‑Goals
3. Repository Structure
4. Configuration (`config.json`)
5. Installation
6. Inference Usage (`SamSegmentator`)
7. Dataset & Mask Format (planned training)
8. Checkpoints & Auto‑Download
9. Metrics (recommended)
10. Limitations & Risks
11. Roadmap
12. License & Citation
---
## 1. Motivation & Scope
Document / ID workflows often need fast class‑agnostic region extraction (for OCR, redaction, or downstream classifiers). SAM2 provides strong general mask proposals; this project wraps it to directly yield cropped image + mask pairs ordered by area and optionally padded.
## 2. Intended Use & Non‑Goals
Intended:
- Pre‑segmentation of ID / document fields prior to OCR.
- Selective anonymization / redaction pipelines (masking faces, MRZ, barcodes, etc.).
- Rapid prototyping for custom fine‑tuning of SAM2 on a small set of document classes.
Non‑Goals:
- Biometric identity verification or authoritative fraud detection.
- Legal decision making without human review.
- Full multi‑modal extraction (text recognition is out of scope here).
## 3. Repository Structure
```
model_repo/
  config.json              # Central hyper-parameter & path config
  README.md                # (this file)
  checkpoints/             # Local downloaded / fine-tuned checkpoints
  samples/
    sample_us_passport.jpg
  src/
    sam_segmentator.py     # Inference wrapper (SamSegmentator)
    main.py                # Placeholder entry point
```
Planned: `train/` scripts for fine‑tuning (not yet implemented).
## 4. Configuration (`model_repo/config.json`)
Key fields (example values included in the repo):
- `model_type`: Always `sam2` here.
- `checkpoint_path`: Path relative to project root or absolute; if omitted and `auto_download=True` the code will attempt remote download.
- `image_size`: Target square size used during training (future). Inference wrapper accepts raw image size.
- `num_classes`, `class_names`: For supervised training (future); not required by the current automatic mask generator, but kept for consistency.
- `augmentation`, `loss`, `optimizer`, `lr_scheduler`: Reserved for training loop integration.
- `paths`: Expected dataset layout for training: `data/train/images`, `data/train/masks`, etc.
- `mixed_precision`: Will enable `torch.autocast` during training.
Even if not all fields are consumed now, keeping them centralized avoids future breaking refactors.
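As a sketch, a minimal config matching the fields above might look like the dictionary below (all concrete values are illustrative, not the repo's actual settings); loading `config.json` is a one‑liner with the standard `json` module:

```python
import json

# Illustrative values only; the repo's actual config.json may differ.
example_config = {
    "model_type": "sam2",
    "checkpoint_path": "checkpoints/sam2.1_hiera_base_plus_ft_ids.pt",
    "image_size": 1024,
    "num_classes": 4,  # 3 document classes + background
    "class_names": ["ID1", "ID3", "IDCOVER"],
    "mixed_precision": True,
}

def load_config(path: str = "model_repo/config.json") -> dict:
    # Read the JSON config once at startup and pass values explicitly.
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

Keeping the whole config JSON‑serializable means the same file can drive both training and inference without code changes.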
## 5. Installation
### Prerequisites
- Python 3.10+ (recommended)
- CUDA GPU (optional but recommended for speed)
### Using uv (preferred fast resolver)
If `pyproject.toml` is present (it is), you can do:
```
uv sync
```
This creates / updates the virtual environment and installs dependencies.
### Using pip (alternative)
```
python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS / Linux
source .venv/bin/activate
pip install -U pip
pip install -e .
```
If SAM2 is not available as a published package in your environment, you may need to install it from source; exact instructions depend on the upstream SAM2 repository and will be added here once finalized.
## 6. Inference Usage (`SamSegmentator`)
Minimal example using the sample passport image:
```python
import cv2
from pathlib import Path

from src.sam_segmentator import SamSegmentator

image_path = Path("samples/sample_us_passport.jpg")
img_bgr = cv2.imread(str(image_path))  # BGR (OpenCV)
assert img_bgr is not None, f"Could not read {image_path}"

segmentator = SamSegmentator(
    checkpoint_path="checkpoints/sam2.1_hiera_base_plus_ft_ids.pt",  # or None to auto-download if configured
    pred_iou_thresh=0.88,            # forwarded to SAM2AutomaticMaskGenerator
    stability_score_thresh=0.90,
)

segments = segmentator.infer(img_bgr, pad_percent=0.05)
print(f"Total segments: {len(segments)}")

# Each segment is (crop_bgr, mask_255)
Path("outputs").mkdir(exist_ok=True)
for i, (crop, mask) in enumerate(segments[:3]):
    cv2.imwrite(f"outputs/segment_{i}_crop.png", crop)
    cv2.imwrite(f"outputs/segment_{i}_mask.png", mask)
```
Output: pairs of tightly cropped images and their binary masks (0 background, 255 foreground), sorted by mask area descending.
### Parameter Notes
- `pad_percent`: Relative padding (default 5%) added around each tight bounding box.
- Deprecated `pad` (absolute pixels) still accepted but will warn.
- All additional kwargs go to `SAM2AutomaticMaskGenerator` (e.g., `box_nms_thresh`, `min_mask_region_area`).
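To make the `pad_percent` behavior concrete, the arithmetic can be sketched as below. This is a standalone illustration, not the actual implementation in `sam_segmentator.py`; `pad_bbox` is a hypothetical helper name:

```python
def pad_bbox(x, y, w, h, pad_percent, img_w, img_h):
    # Expand a tight (x, y, w, h) bounding box by a fraction of its own
    # width/height, clamping to the image borders. pad_percent=0.05 adds
    # roughly 5% of the box size on each side.
    px = int(round(w * pad_percent))
    py = int(round(h * pad_percent))
    x0 = max(0, x - px)
    y0 = max(0, y - py)
    x1 = min(img_w, x + w + px)
    y1 = min(img_h, y + h + py)
    return x0, y0, x1 - x0, y1 - y0
```

Because the padding is relative, small fields get small margins and large regions get proportionally larger ones, which is usually what OCR pre-processing wants.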
## 7. Dataset & Mask Format (For Future Training)
Expected layout (mirrors `paths` in config):
```
data/
  train/
    images/*.jpg|png
    masks/*.png            # Single-channel, integer class indices (0 = background)
  val/
    images/
    masks/
```
Class index mapping (example):
```
class_names = ["ID1", "ID3", "IDCOVER"]
0 -> background
1 -> ID1
2 -> ID3
3 -> IDCOVER
```
Store masks as lossless PNGs and resize them only with nearest‑neighbor interpolation, so class indices are never blended. Avoid palette (indexed‑color) PNGs; explicit integer pixel values are recommended.
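A quick sanity check for mask files can be sketched with NumPy alone (an illustrative helper, not part of the repo; assumes masks are already loaded as arrays):

```python
import numpy as np

def valid_mask(mask: np.ndarray, num_classes: int = 4) -> bool:
    # A well-formed mask is 2-D (single channel) with integer class
    # indices in [0, num_classes), e.g. 0..3 for background + 3 classes.
    return (
        mask.ndim == 2
        and np.issubdtype(mask.dtype, np.integer)
        and int(mask.min()) >= 0
        and int(mask.max()) < num_classes
    )
```

Running such a check over `data/train/masks` before training catches palette PNGs and stray pixel values early, when they are cheap to fix.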
## 8. Checkpoints & Auto‑Download
`SamSegmentator` will:
1. Use provided `checkpoint_path` if it exists.
2. If none is provided and `auto_download=True`, download the default checkpoint to `checkpoints/` using an environment configured URL (`SAM2_CHECKPOINT_URL`).
3. (Optional) Validate SHA256 if `SAM2_CHECKPOINT_SHA256` is set.
Environment variables:
```
SAM2_CHECKPOINT_URL=<direct_download_url>
SAM2_CHECKPOINT_SHA256=<hex>
SAM2_CHECKPOINT_DIR=checkpoints
```
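A typical SHA256 check for step 3 can be sketched as follows (the wrapper's internal logic may differ; `sha256_of` and `verify_checkpoint` are hypothetical names):

```python
import hashlib
import os

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    # Stream the file in 1 MiB chunks so large checkpoints are never
    # loaded into memory at once.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

def verify_checkpoint(path: str) -> bool:
    # Compare against SAM2_CHECKPOINT_SHA256 if set; accept otherwise.
    expected = os.environ.get("SAM2_CHECKPOINT_SHA256")
    return expected is None or sha256_of(path) == expected.lower()
```

Making the hash check opt‑in via the environment variable keeps local experimentation frictionless while still allowing strict verification in deployment.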
## 9. Metrics (Recommended When Training Added)
- Mean IoU (per class & macro average)
- Dice coefficient
- Pixel accuracy
- Class frequency distribution (to inform potential class weighting)
Store per‑epoch metrics as JSON for reproducibility.
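For reference, per‑class IoU and Dice reduce to a few lines of NumPy (an illustrative sketch under the mask format above; `iou_dice` is a hypothetical name):

```python
import numpy as np

def iou_dice(pred: np.ndarray, target: np.ndarray, cls: int):
    # Binarize both maps for one class, then compute IoU and Dice.
    p = pred == cls
    t = target == cls
    inter = np.logical_and(p, t).sum()
    union = np.logical_or(p, t).sum()
    total = p.sum() + t.sum()
    iou = inter / union if union else float("nan")
    dice = 2 * inter / total if total else float("nan")
    return iou, dice
```

Returning `nan` when a class is absent from both maps keeps the macro average honest: such classes can be skipped rather than counted as perfect scores.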
## 10. Limitations & Risks
Technical:
- Current version does not include a fine‑tuning script; only inference wrapper.
- Automatic mask generator is class‑agnostic; without fine‑tuning it may over‑segment or miss tiny fields.
Ethical / Compliance:
- Processing ID documents may involve PII; ensure secure storage and compliant handling.
- Not intended for biometric decisions nor identity verification pipelines without human oversight.
## 11. Roadmap
- [ ] Add training script (supervised fine‑tuning using `config.json`).
- [ ] Optional class‑guided prompting (points / boxes) pipeline.
- [ ] Export to ONNX / TorchScript.
- [ ] CLI interface for batch folder inference.
- [ ] Lightweight web demo (Gradio / FastAPI).
## 12. License & Citation
Specify a license in a top‑level `LICENSE` file (e.g., MIT or Apache‑2.0), and make sure it is compatible with SAM2's original license.
Please cite SAM / SAM2 in academic work. Example (placeholder):
```
@article{kirillov2023segmentanything,
  title   = {Segment Anything},
  author  = {Kirillov, Alexander and others},
  journal = {arXiv preprint arXiv:2304.02643},
  year    = {2023}
}
```
Add the SAM2 citation once the official reference is finalized.
## Acknowledgments
- Meta AI for releasing Segment Anything & SAM2.
- OpenCV, PyTorch, and the broader CV community.
---
If you have questions or need feature prioritization, open an Issue or start a Discussion.