# SAM2 ID Segmenter
Lightweight wrapper and fine‑tuning scaffold around Meta's Segment Anything 2 (SAM2), adapted to segment structured regions in ID / document images (e.g. portrait, number field, security areas). The repository currently focuses on: (1) reproducible loading of a fine‑tuned SAM2 checkpoint, (2) automatic multi‑mask generation + tight cropping, and (3) configuration‑file‑driven training/inference settings.
> Status: Inference wrapper implemented (`SamSegmentator`). An end‑to‑end training loop is a planned addition; the config already anticipates training hyper‑parameters.
---
## Contents
1. Motivation & Scope
2. Intended Use & Non‑Goals
3. Repository Structure
4. Configuration (`config.json`)
5. Installation
6. Inference Usage (`SamSegmentator`)
7. Dataset & Mask Format (planned training)
8. Checkpoints & Auto‑Download
9. Metrics (recommended)
10. Limitations & Risks
11. Roadmap
12. License & Citation
---
## 1. Motivation & Scope
Document / ID workflows often need fast, class‑agnostic region extraction (for OCR, redaction, or downstream classifiers). SAM2 provides strong general mask proposals; this project wraps it to directly yield cropped image + mask pairs, ordered by area and optionally padded.
## 2. Intended Use & Non‑Goals
Intended:
- Pre‑segmentation of ID / document fields prior to OCR.
- Selective anonymization / redaction pipelines (masking faces, MRZ, barcodes, etc.).
- Rapid prototyping for custom fine‑tuning of SAM2 on a small set of document classes.
Non‑Goals:
- Biometric identity verification or authoritative fraud detection.
- Legal decision making without human review.
- Full multi‑modal extraction (text recognition is out of scope here).
## 3. Repository Structure
```
model_repo/
  config.json           # Central hyper‑parameter & path config
  README.md             # (this file)
  checkpoints/          # Local downloaded / fine‑tuned checkpoints
  samples/
    sample_us_passport.jpg
  src/
    sam_segmentator.py  # Inference wrapper (SamSegmentator)
    main.py             # Placeholder entry point
```
Planned: `train/` scripts for fine‑tuning (not yet implemented).
## 4. Configuration (`model_repo/config.json`)
Key fields (example values included in the repo):
- `model_type`: Always `sam2` here.
- `checkpoint_path`: Path relative to the project root, or absolute; if omitted and `auto_download=True`, the code will attempt a remote download.
- `image_size`: Target square size used during training (future). The inference wrapper accepts the raw image size.
- `num_classes`, `class_names`: For supervised training (future); not required by the current automatic mask generator, but kept for consistency.
- `augmentation`, `loss`, `optimizer`, `lr_scheduler`: Reserved for training loop integration.
- `paths`: Expected dataset layout for training: `data/train/images`, `data/train/masks`, etc.
- `mixed_precision`: Will enable `torch.autocast` during training.
Even if not all fields are consumed yet, keeping them centralized avoids breaking refactors later.
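A minimal illustrative `config.json` sketch combining the fields above (all values are placeholders, not the repo's actual defaults; the `paths` key names are assumptions):

```json
{
  "model_type": "sam2",
  "checkpoint_path": "checkpoints/sam2.1_hiera_base_plus_ft_ids.pt",
  "auto_download": true,
  "image_size": 1024,
  "num_classes": 3,
  "class_names": ["ID1", "ID3", "IDCOVER"],
  "mixed_precision": true,
  "paths": {
    "train_images": "data/train/images",
    "train_masks": "data/train/masks",
    "val_images": "data/val/images",
    "val_masks": "data/val/masks"
  }
}
```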
## 5. Installation
### Prerequisites
- Python 3.10+ (recommended)
- CUDA GPU (optional but recommended for speed)
### Using uv (preferred fast resolver)
If `pyproject.toml` is present (it is), you can run:
```
uv sync
```
This creates or updates the virtual environment and installs dependencies.
### Using pip (alternative)
```
python -m venv .venv
.venv\Scripts\activate   # Windows; on Unix: source .venv/bin/activate
pip install -U pip
pip install -e .
```
If SAM2 is not a published package in your environment, you may need to install it from source (instructions depend on the upstream SAM2 repository; add here when finalized).
## 6. Inference Usage (`SamSegmentator`)
Minimal example using the sample passport image:
```python
import cv2
from pathlib import Path
from src.sam_segmentator import SamSegmentator

image_path = Path("samples/sample_us_passport.jpg")
img_bgr = cv2.imread(str(image_path))  # BGR (OpenCV)

segmentator = SamSegmentator(
    checkpoint_path="checkpoints/sam2.1_hiera_base_plus_ft_ids.pt",  # or None to auto-download if configured
    pred_iou_thresh=0.88,  # forwarded to SAM2AutomaticMaskGenerator
    stability_score_thresh=0.90,
)
segments = segmentator.infer(img_bgr, pad_percent=0.05)
print(f"Total segments: {len(segments)}")

# Each segment is (crop_bgr, mask_255)
Path("outputs").mkdir(exist_ok=True)  # cv2.imwrite fails silently if the directory is missing
for i, (crop, mask) in enumerate(segments[:3]):
    cv2.imwrite(f"outputs/segment_{i}_crop.png", crop)
    cv2.imwrite(f"outputs/segment_{i}_mask.png", mask)
```
Output: pairs of tightly cropped images and their binary masks (0 background, 255 foreground), sorted by mask area descending.
### Parameter Notes
- `pad_percent`: Relative padding (default 5%) added around each tight bounding box.
- The deprecated `pad` (absolute pixels) is still accepted but emits a warning.
- All additional kwargs are forwarded to `SAM2AutomaticMaskGenerator` (e.g., `box_nms_thresh`, `min_mask_region_area`).
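The relative padding behaviour can be sketched as follows (a standalone illustration of the idea, not the repo's exact implementation; `pad_bbox` is a hypothetical helper):

```python
def pad_bbox(x, y, w, h, img_w, img_h, pad_percent=0.05):
    """Expand a tight bounding box by a relative margin, clamped to the image."""
    px = int(round(w * pad_percent))  # horizontal margin in pixels
    py = int(round(h * pad_percent))  # vertical margin in pixels
    x0 = max(0, x - px)
    y0 = max(0, y - py)
    x1 = min(img_w, x + w + px)
    y1 = min(img_h, y + h + py)
    return x0, y0, x1 - x0, y1 - y0

# A 100x60 box at (10, 20) inside a 640x480 image, 5% padding:
print(pad_bbox(10, 20, 100, 60, 640, 480))  # (5, 17, 110, 66)
```

Clamping to the image bounds matters for fields near the document edge, where a naive expansion would produce negative coordinates.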
## 7. Dataset & Mask Format (For Future Training)
Expected layout (mirrors `paths` in config):
```
data/
  train/
    images/*.jpg|png
    masks/*.png   # Single‑channel, integer indices (0=background)
  val/
    images/
    masks/
```
Class index mapping (example):
```
class_names = ["ID1", "ID3", "IDCOVER"]
0 -> background
1 -> ID1
2 -> ID3
3 -> IDCOVER
```
Store masks losslessly as PNG with explicit integer pixel values; avoid palette (indexed‑color) files whose indices may not match the class mapping.
## 8. Checkpoints & Auto‑Download
`SamSegmentator` will:
1. Use the provided `checkpoint_path` if it exists.
2. If none is provided and `auto_download=True`, download the default checkpoint to `checkpoints/` using an environment‑configured URL (`SAM2_CHECKPOINT_URL`).
3. (Optional) Validate the SHA256 if `SAM2_CHECKPOINT_SHA256` is set.
Environment variables:
```
SAM2_CHECKPOINT_URL=<direct_download_url>
SAM2_CHECKPOINT_SHA256=<hex>
SAM2_CHECKPOINT_DIR=checkpoints
```
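The download-and-verify flow can be sketched like this (a minimal standalone illustration; `fetch_checkpoint` is a hypothetical helper and the repo's internals may differ):

```python
import hashlib
import urllib.request
from pathlib import Path

def fetch_checkpoint(url, dest_dir="checkpoints", expected_sha256=None):
    """Download a checkpoint if missing and optionally verify its SHA256."""
    dest = Path(dest_dir) / Path(url).name
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():
        urllib.request.urlretrieve(url, dest)
    if expected_sha256:
        digest = hashlib.sha256(dest.read_bytes()).hexdigest()
        if digest != expected_sha256.lower():
            dest.unlink()  # discard the corrupt download
            raise ValueError(f"Checksum mismatch for {dest}: {digest}")
    return dest
```

Deleting the file on a checksum mismatch keeps a corrupt download from being silently reused on the next run.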
## 9. Metrics (Recommended When Training Is Added)
- Mean IoU (per class & macro average)
- Dice coefficient
- Pixel accuracy
- Class frequency distribution (to inform potential class weighting)
Store per‑epoch metrics as JSON for reproducibility.
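Per-class IoU and Dice over integer-index masks can be computed as follows (a minimal sketch, not the repo's evaluation code; empty classes yield NaN so they can be excluded from macro averages):

```python
import numpy as np

def iou_dice(pred, target, num_classes):
    """Per-class IoU and Dice from integer-index masks (0 = background)."""
    ious, dices = [], []
    for c in range(num_classes):
        p, t = (pred == c), (target == c)
        inter = np.logical_and(p, t).sum()
        union = np.logical_or(p, t).sum()
        ious.append(inter / union if union else float("nan"))
        denom = p.sum() + t.sum()
        dices.append(2 * inter / denom if denom else float("nan"))
    return ious, dices
```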
## 10. Limitations & Risks
Technical:
- The current version includes only the inference wrapper; there is no fine‑tuning script yet.
- The automatic mask generator is class‑agnostic; without fine‑tuning it may over‑segment or miss tiny fields.
Ethical / Compliance:
- Processing ID documents may involve PII; ensure secure storage and compliant handling.
- Not intended for biometric decisions or identity verification pipelines without human oversight.
## 11. Roadmap
- [ ] Add training script (supervised fine‑tuning using `config.json`).
- [ ] Optional class‑guided prompting (points / boxes) pipeline.
- [ ] Export to ONNX / TorchScript.
- [ ] CLI interface for batch folder inference.
- [ ] Lightweight web demo (Gradio / FastAPI).
## 12. License & Citation
Specify a license in a top‑level `LICENSE` file (e.g., MIT or Apache‑2.0), ensuring compatibility with SAM2's original license.
Please cite SAM / SAM2 in academic work. Example (placeholder):
```
@article{kirillov2023segmentanything,
  title={Segment Anything},
  author={Kirillov, Alexander and others},
  journal={arXiv preprint arXiv:2304.02643},
  year={2023}
}
```
Add the updated SAM2 citation once the official reference is finalized.
## Acknowledgments
- Meta AI for releasing Segment Anything & SAM2.
- OpenCV, PyTorch, and the broader CV community.
---
If you have questions or need feature prioritization, open an Issue or start a Discussion.