# SAM2 ID Segmenter

Lightweight wrapper and fine‑tuning scaffold around Meta's Segment Anything 2 (SAM2), adapted to segment structured regions in ID / document images (e.g. portrait, number field, security areas). The repository currently focuses on: (1) reproducible loading of a fine‑tuned SAM2 checkpoint, (2) automatic multi‑mask generation plus tight cropping, and (3) configuration‑file‑driven training/inference settings.

> Status: Inference wrapper implemented (`SamSegmentator`). An end‑to‑end training loop is a planned addition; the config already anticipates training hyper‑parameters.

---

## Contents

1. Motivation & Scope
2. Intended Use & Non‑Goals
3. Repository Structure
4. Configuration (`config.json`)
5. Installation
6. Inference Usage (`SamSegmentator`)
7. Dataset & Mask Format (planned training)
8. Checkpoints & Auto‑Download
9. Metrics (recommended)
10. Limitations & Risks
11. Roadmap
12. License & Citation

---

## 1. Motivation & Scope

Document / ID workflows often need fast, class‑agnostic region extraction (for OCR, redaction, or downstream classifiers). SAM2 provides strong general mask proposals; this project wraps it to directly yield cropped image + mask pairs, ordered by area and optionally padded.

## 2. Intended Use & Non‑Goals

Intended:

- Pre‑segmentation of ID / document fields prior to OCR.
- Selective anonymization / redaction pipelines (masking faces, MRZ, barcodes, etc.).
- Rapid prototyping for custom fine‑tuning of SAM2 on a small set of document classes.

Non‑Goals:

- Biometric identity verification or authoritative fraud detection.
- Legal decision making without human review.
- Full multi‑modal extraction (text recognition is out of scope here).
## 3. Repository Structure

```
model_repo/
  config.json            # Central hyper‑parameter & path config
  README.md              # (this file)
  checkpoints/           # Locally downloaded / fine‑tuned checkpoints
  samples/
    sample_us_passport.jpg
  src/
    sam_segmentator.py   # Inference wrapper (SamSegmentator)
    main.py              # Placeholder entry point
```

Planned: `train/` scripts for fine‑tuning (not yet implemented).

## 4. Configuration (`model_repo/config.json`)

Key fields (example values included in the repo):

- `model_type`: Always `sam2` here.
- `checkpoint_path`: Path relative to the project root, or absolute; if omitted and `auto_download=True`, the code will attempt a remote download.
- `image_size`: Target square size used during training (future). The inference wrapper accepts the raw image size.
- `num_classes`, `class_names`: For supervised training (future); not required by the current automatic mask generator, but kept for consistency.
- `augmentation`, `loss`, `optimizer`, `lr_scheduler`: Reserved for training loop integration.
- `paths`: Expected dataset layout for training: `data/train/images`, `data/train/masks`, etc.
- `mixed_precision`: Will enable `torch.autocast` during training.

Even if not all fields are consumed now, keeping them centralized avoids future breaking refactors.

## 5. Installation

### Prerequisites

- Python 3.10+ (recommended)
- CUDA GPU (optional but recommended for speed)

### Using uv (preferred fast resolver)

If `pyproject.toml` is present (it is), you can run:

```
uv sync
```

This creates or updates the virtual environment and installs dependencies.

### Using pip (alternative)

```
python -m venv .venv
.venv\Scripts\activate
pip install -U pip
pip install -e .
```

If SAM2 is not a published package in your environment, you may need to install it from source (instructions depend on the upstream SAM2 repository; add them here when finalized).
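After installation, the centralized configuration from section 4 can be consumed with a small helper. The sketch below is illustrative only: the `load_config` function and its fallback defaults are assumptions for this README, not part of the current `SamSegmentator` API.

```python
import json
from pathlib import Path

# Fallback values applied when a key is absent from config.json.
# These defaults are illustrative; adjust them to your setup.
DEFAULTS = {
    "model_type": "sam2",
    "checkpoint_path": None,
    "auto_download": True,
    "image_size": 1024,
    "mixed_precision": False,
}


def load_config(path="model_repo/config.json"):
    """Read config.json and fill in missing keys with defaults."""
    cfg = dict(DEFAULTS)
    config_file = Path(path)
    if config_file.exists():
        cfg.update(json.loads(config_file.read_text(encoding="utf-8")))
    if cfg["model_type"] != "sam2":
        raise ValueError(f"Unsupported model_type: {cfg['model_type']}")
    return cfg
```

Keeping all hyper‑parameters behind one loader like this is what makes the "centralized config" promise above cheap to honor later, when the training loop starts consuming the reserved fields.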
## 6. Inference Usage (`SamSegmentator`)

Minimal example using the sample passport image:

```python
import cv2
from pathlib import Path

from src.sam_segmentator import SamSegmentator

image_path = Path("samples/sample_us_passport.jpg")
img_bgr = cv2.imread(str(image_path))  # BGR (OpenCV)

segmentator = SamSegmentator(
    checkpoint_path="checkpoints/sam2.1_hiera_base_plus_ft_ids.pt",  # or None to auto-download if configured
    pred_iou_thresh=0.88,             # forwarded to SAM2AutomaticMaskGenerator
    stability_score_thresh=0.90,
)

segments = segmentator.infer(img_bgr, pad_percent=0.05)
print(f"Total segments: {len(segments)}")

# Each segment is (crop_bgr, mask_255)
for i, (crop, mask) in enumerate(segments[:3]):
    cv2.imwrite(f"outputs/segment_{i}_crop.png", crop)
    cv2.imwrite(f"outputs/segment_{i}_mask.png", mask)
```

Output: pairs of tightly cropped images and their binary masks (0 background, 255 foreground), sorted by mask area in descending order.

### Parameter Notes

- `pad_percent`: Relative padding (default 5%) added around each tight bounding box.
- The deprecated `pad` (absolute pixels) is still accepted but will emit a warning.
- All additional kwargs are forwarded to `SAM2AutomaticMaskGenerator` (e.g., `box_nms_thresh`, `min_mask_region_area`).

## 7. Dataset & Mask Format (For Future Training)

Expected layout (mirrors `paths` in the config):

```
data/
  train/
    images/*.jpg|png
    masks/*.png   # Single‑channel, integer indices (0=background)
  val/
    images/
    masks/
```

Class index mapping (example):

```
class_names = ["ID1", "ID3", "IDCOVER"]
0 -> background
1 -> ID1
2 -> ID3
3 -> IDCOVER
```

Masks should use lossless, nearest‑neighbor‑safe compression (PNG). Avoid palette mismatches; explicit integer pixel values are recommended.

## 8. Checkpoints & Auto‑Download

`SamSegmentator` will:

1. Use the provided `checkpoint_path` if it exists.
2. If none is provided and `auto_download=True`, download the default checkpoint to `checkpoints/` using an environment‑configured URL (`SAM2_CHECKPOINT_URL`).
3. (Optional) Validate the SHA256 if `SAM2_CHECKPOINT_SHA256` is set.
Environment variables:

```
SAM2_CHECKPOINT_URL=
SAM2_CHECKPOINT_SHA256=
SAM2_CHECKPOINT_DIR=checkpoints
```

## 9. Metrics (Recommended Once Training Is Added)

- Mean IoU (per class & macro average)
- Dice coefficient
- Pixel accuracy
- Class frequency distribution (to inform potential class weighting)

Store per‑epoch metrics as JSON for reproducibility.

## 10. Limitations & Risks

Technical:

- The current version does not include a fine‑tuning script, only the inference wrapper.
- The automatic mask generator is class‑agnostic; without fine‑tuning it may over‑segment or miss tiny fields.

Ethical / Compliance:

- Processing ID documents may involve PII; ensure secure storage and compliant handling.
- Not intended for biometric decisions or identity verification pipelines without human oversight.

## 11. Roadmap

- [ ] Add training script (supervised fine‑tuning using `config.json`).
- [ ] Optional class‑guided prompting (points / boxes) pipeline.
- [ ] Export to ONNX / TorchScript.
- [ ] CLI interface for batch folder inference.
- [ ] Lightweight web demo (Gradio / FastAPI).

## 12. License & Citation

Specify a license in a top‑level `LICENSE` file (e.g., MIT or Apache‑2.0), ensuring compatibility with SAM2's original license. Please cite SAM / SAM2 in academic work. Example (placeholder):

```
@article{kirillov2023segmentanything,
  title={Segment Anything},
  author={Kirillov, Alexander and others},
  journal={arXiv preprint arXiv:2304.02643},
  year={2023}
}
```

Add the updated SAM2 citation once the official reference is finalized.

## Acknowledgments

- Meta AI for releasing Segment Anything & SAM2.
- OpenCV, PyTorch, and the broader CV community.

---

If you have questions or need feature prioritization, open an Issue or start a Discussion.
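As a complement to section 8, the optional SHA256 validation can be done with the standard library alone. The helper below is a sketch under stated assumptions: `sha256_of` and `verify_checkpoint` are illustrative names invented for this README (the wrapper's actual internals may differ), and only the `SAM2_CHECKPOINT_SHA256` environment variable comes from the documented contract.

```python
import hashlib
import os
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large checkpoints never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path: Path) -> bool:
    """Return True when no reference hash is configured, or when the hashes match."""
    expected = os.environ.get("SAM2_CHECKPOINT_SHA256", "").strip().lower()
    if not expected:
        return True  # validation is optional, per section 8
    return sha256_of(path) == expected
```

Treating a missing `SAM2_CHECKPOINT_SHA256` as a pass (rather than a failure) matches the "(Optional) Validate" wording in section 8; a stricter deployment could invert that choice.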