---
license: mit
language:
- en
metrics:
- accuracy
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
library_name: transformers
---

# RigelClauseNet: BERT-Based Fraud Clause Detector

**RigelClauseNet** is a fine-tuned BERT-based binary classifier that detects **fraudulent, high-risk, or suspicious clauses** in legal and policy-related documents, including privacy policies, loan agreements, and terms of service.

It is designed to help:

- Legal analysts
- Fintech systems
- Regulatory auditors
- End users seeking clarity in digital contracts

---

## 🔍 Use Case

Given a clause or paragraph from a document, the model outputs:

- A binary risk label (`SAFE`, `RISKY`)
- A probability confidence score
- A breakdown of class probabilities

This enables organizations to **flag suspicious clauses early**, audit contracts, and build smarter compliance pipelines.

---

## 🧠 Model Details

- **Base Model**: `google-bert/bert-base-uncased`
- **Architecture**: BERT + sequence classification head
- **Training Data**: 5,000 semi-synthetic and curated clauses (labeled as `SAFE` or `RISKY`)
- **Classes**:
  - `0` → Safe clause
  - `1` → Fraudulent/risky clause
- **Trained On**: Google Colab with Hugging Face Transformers
- **Performance** (on the validation set):
  - Accuracy: **98.47%**
  - Precision: **99.19%**
  - Recall: **99.19%**
  - F1 Score: **99.19%**

---

## 📌 Examples

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch.nn.functional as F
import torch

model = AutoModelForSequenceClassification.from_pretrained("nitinsri/RigelClauseNet")
tokenizer = AutoTokenizer.from_pretrained("nitinsri/RigelClauseNet")

def predict_clause(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = F.softmax(logits, dim=1)
    label = torch.argmax(probs).item()
    return {
        "label": "RISKY" if label == 1 else "SAFE",
        "confidence": round(probs[0][label].item(), 4),
        "probabilities": probs.tolist(),
    }

# Example
predict_clause("Late payments will incur a 25% monthly penalty.")
```

---

## 🧠 Intended Usage

You can use this model for:

- Scanning uploaded PDFs, contracts, or policies (a batching sketch is included in the appendix at the end of this card)
- Highlighting or flagging suspicious legal language
- Powering backend systems in legal-tech and compliance

---

## 🚫 Limitations

- Trained on **semi-synthetic clauses**, not real-world legal corpora.
- Binary classifier only; it does not explain why a clause is risky.
- Contextual or nested document logic is not yet supported.

---

## 📂 Files

| File | Description |
|------|-------------|
| `model.safetensors` | Fine-tuned model weights |
| `config.json` | BERT classification head config |
| `tokenizer.json` | Tokenizer for preprocessing |
| `vocab.txt` | BERT vocabulary |

---

## 💡 Future Plans

- Multi-class classification (`safe`, `risky`, `ambiguous`)
- Explanation layer (highlight key tokens that trigger risk)
- Full document-level context scanning
- Integration with Hugging Face Spaces (with UI)

---

## 👨‍💻 Author

Built by **Nithin Sri**

🚀 Hugging Face: [https://huggingface.co/nitinsri](https://huggingface.co/nitinsri)
📧 Email: nitinjuttuka63@gmail.com

---

## 📜 License

MIT License

---

> “Clarity and transparency in digital contracts are not luxuries; they are rights. RigelClauseNet helps enforce that.”

---
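
## 🧩 Appendix: Usage Sketches

*The snippets below are illustrative sketches, not part of the released model.*

**Quick start with the `pipeline` API.** The model should also load through the high-level `text-classification` pipeline, assuming the hosted `config.json` carries an `id2label` mapping; if it does not, labels will surface as `LABEL_0` / `LABEL_1`, matching the class mapping in the Model Details above.

```python
from transformers import pipeline

# Quick-start sketch; label names depend on the id2label mapping in config.json.
clf = pipeline("text-classification", model="nitinsri/RigelClauseNet")
print(clf("Late payments will incur a 25% monthly penalty."))
# -> [{'label': ..., 'score': ...}]  (index 1 corresponds to RISKY per the class mapping)
```

**Scanning a whole document.** The *Intended Usage* section mentions scanning uploaded contracts or policies; the sketch below shows one way to do that clause by clause. The blank-line clause splitter, the `scan_document` helper, and the 0.5 threshold are illustrative assumptions (PDF text extraction is out of scope here); the `0 → SAFE`, `1 → RISKY` mapping follows the Model Details above.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

MODEL_ID = "nitinsri/RigelClauseNet"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

def scan_document(text: str, threshold: float = 0.5):
    """Split a document into clauses and return the ones flagged as RISKY."""
    # Naive splitter: treat blank-line-separated paragraphs as clauses.
    # Swap in a proper contract/PDF parser for real documents.
    clauses = [c.strip() for c in text.split("\n\n") if c.strip()]
    inputs = tokenizer(clauses, return_tensors="pt", truncation=True,
                       padding=True, max_length=128)
    with torch.no_grad():
        probs = F.softmax(model(**inputs).logits, dim=1)
    flagged = []
    for clause, p in zip(clauses, probs):
        risky = p[1].item()  # index 1 == RISKY per the class mapping above
        if risky >= threshold:
            flagged.append({"clause": clause, "risk": round(risky, 4)})
    return flagged

# Example
document = (
    "Either party may terminate this agreement with 30 days written notice.\n\n"
    "Late payments will incur a 25% monthly penalty, compounded weekly."
)
print(scan_document(document))
```

Batching every clause through the tokenizer in a single call keeps the scan fast enough for interactive use; clauses longer than 128 tokens are truncated, consistent with the example earlier in this card.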