---
license: mit
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
library_name: transformers
---
# RigelClauseNet: BERT-Based Fraud Clause Detector

**RigelClauseNet** is a fine-tuned BERT-based binary classifier that detects **fraudulent, high-risk, or suspicious clauses** in legal and policy-related documents, including privacy policies, loan agreements, and terms of service.

It is designed to help:

- Legal analysts
- Fintech systems
- Regulatory auditors
- End users seeking clarity in digital contracts

---
## 🔍 Use Case

Given a clause or paragraph from a document, the model outputs:

- A binary risk label (`SAFE` or `RISKY`)
- A confidence score (the probability of the predicted label)
- The probability of each class

This enables organizations to **flag suspicious clauses early**, audit contracts, and build smarter compliance pipelines.
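
All three outputs are derived from the two raw logits that the classification head produces. The stdlib-only sketch below shows this post-processing step on its own. It assumes index `0` = `SAFE` and index `1` = `RISKY`, as listed under Model Details. `postprocess_logits` is a hypothetical helper, and the logit values are made up for illustration.

```python
import math

LABELS = ("SAFE", "RISKY")  # assumed mapping: index 0 = SAFE, index 1 = RISKY

def postprocess_logits(logits):
    """Hypothetical helper: turn two raw logits into label, confidence and per-class probabilities."""
    # Numerically stable softmax: subtract the largest logit before exponentiating
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Predicted class = index of the largest probability
    label = max(range(len(probs)), key=probs.__getitem__)
    return {
        "label": LABELS[label],
        "confidence": round(probs[label], 4),  # probability of the predicted label
        "probabilities": dict(zip(LABELS, probs)),
    }

print(postprocess_logits([-1.2, 2.3]))
```

With the real model, the two numbers come from `model(**inputs).logits`. The Examples section performs the same softmax-and-argmax step with PyTorch.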

---

## 🧠 Model Details

- **Base Model**: `google-bert/bert-base-uncased`
- **Architecture**: BERT + sequence classification head
- **Training Data**: 5,000 semi-synthetic and curated clauses, each labeled `SAFE` or `RISKY`
- **Classes**:
  - `0` → Safe clause (`SAFE`)
  - `1` → Fraudulent/risky clause (`RISKY`)
- **Trained On**: Google Colab with Hugging Face Transformers
- **Performance** (validation set):
  - Accuracy: **98.47%**
  - Precision: **99.19%**
  - Recall: **99.19%**
  - F1 Score: **99.19%** (the harmonic mean of precision and recall, so it equals both when they are equal)
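
For reference, here is a minimal sketch of the standard definitions behind these four numbers, computed from confusion-matrix counts. It treats `RISKY` (class `1`) as the positive class. That is an assumption, since the card does not say which class precision and recall refer to. `classification_metrics` is a hypothetical helper, and the counts in the usage line are illustrative only, not the model's real results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Hypothetical helper: binary metrics with RISKY (class 1) assumed to be the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)  # share of clauses flagged RISKY that really are risky
    recall = tp / (tp + fn)     # share of truly risky clauses that get flagged
    # F1 is the harmonic mean of precision and recall, so it never exceeds the larger of the two
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Illustrative counts only (not the model's actual confusion matrix)
print(classification_metrics(tp=90, fp=5, fn=5, tn=100))
```

Whenever precision and recall are equal, as in the figures above, F1 equals that same value.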

---

## 📌 Examples

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub
model = AutoModelForSequenceClassification.from_pretrained("nitinsri/RigelClauseNet")
tokenizer = AutoTokenizer.from_pretrained("nitinsri/RigelClauseNet")
model.eval()  # inference mode: disables dropout

LABELS = {0: "SAFE", 1: "RISKY"}  # class indices as listed under Model Details

def predict_clause(text):
    # Tokenize one clause; anything beyond 128 tokens is truncated
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits  # shape (1, 2)
    probs = F.softmax(logits, dim=-1)[0]  # class probabilities for this clause
    label = int(torch.argmax(probs))
    return {
        "label": LABELS[label],
        "confidence": round(probs[label].item(), 4),
        "probabilities": probs.tolist(),  # [P(SAFE), P(RISKY)]
    }

# Example
print(predict_clause("Late payments will incur a 25% monthly penalty."))
```

## 🧠 Intended Usage

You can use this model for:

- Scanning uploaded PDFs, contracts, or policies (once their text has been extracted)
- Highlighting or flagging suspicious legal language
- Powering backend systems in legal-tech and compliance
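
The model scores one clause at a time. Scanning a whole document therefore means splitting it into clauses first and then keeping those classified as risky. The sketch below assumes a naive split on blank lines and sentence-ending punctuation. It also assumes a `classify` callable that returns a dict with `"label"` and `"confidence"` keys, such as `predict_clause` from the Examples section. `split_clauses`, `flag_risky_clauses`, `keyword_stub`, and the 0.5 threshold are hypothetical choices, not part of the released model.

```python
import re
from typing import Callable, Dict, List

def split_clauses(document: str) -> List[str]:
    """Naive splitter (an assumption): break on blank lines and after . ; ! ? followed by whitespace."""
    pieces = re.split(r"\n\s*\n|(?<=[.;!?])\s+", document)
    return [p.strip() for p in pieces if p.strip()]

def flag_risky_clauses(document: str, classify: Callable[[str], Dict],
                       min_confidence: float = 0.5) -> List[Dict]:
    """Hypothetical helper: classify each clause and keep those predicted RISKY with enough confidence."""
    flagged = []
    for i, clause in enumerate(split_clauses(document)):
        result = classify(clause)  # expects "label" and "confidence" keys, like predict_clause
        if result["label"] == "RISKY" and result["confidence"] >= min_confidence:
            flagged.append({"index": i, "clause": clause, "confidence": result["confidence"]})
    return flagged

# Hypothetical keyword-based stand-in for demonstration only; pass predict_clause in practice
def keyword_stub(clause: str) -> Dict:
    risky = "penalty" in clause.lower()
    return {"label": "RISKY" if risky else "SAFE", "confidence": 0.9}

doc = ("Payments are due monthly. Late payments will incur a 25% monthly penalty.\n\n"
       "You may cancel at any time.")
for hit in flag_risky_clauses(doc, keyword_stub):
    print(hit["index"], hit["clause"])
```

A real deployment would likely need a more robust clause segmenter. Abbreviations such as "e.g." will fool this regex.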
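
---

## 🚫 Limitations

- Trained on **semi-synthetic clauses** rather than a corpus of real-world legal documents.
- Binary classifier only: it does not explain why a clause is risky.
- Each input is classified in isolation, so logic that depends on document context or nested structure (for example, a clause that refers to a definition elsewhere) is not supported yet.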

---

## 📂 Files

| File | Description |
|------|-------------|
| `model.safetensors` | Fine-tuned model weights |
| `config.json`       | Model configuration (BERT architecture and classification head) |
| `tokenizer.json`    | Tokenizer for preprocessing |
| `vocab.txt`         | BERT vocabulary |

---

## 💡 Future Plans

- Multi-class classification (`safe`, `risky`, `ambiguous`)
- Explanation layer (highlight key tokens that trigger risk)
- Full document-level context scanning
- Integration with Hugging Face Spaces (with UI)
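
Until an explanation layer exists, occlusion (leave-one-word-out) is one common model-agnostic way to approximate "key tokens that trigger risk". The idea is to delete each word in turn and measure how much the probability of `RISKY` drops. This is a swapped-in technique, not the planned layer itself. `occlusion_scores` and `toy_risky_prob` are hypothetical. `risky_prob` stands for any callable that returns P(`RISKY`) for a string, such as a wrapper around `predict_clause` from the Examples section.

```python
from typing import Callable, List, Tuple

def occlusion_scores(text: str, risky_prob: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Hypothetical helper: score each word by how much removing it lowers P(RISKY)."""
    words = text.split()
    baseline = risky_prob(text)
    scores = []
    for i, word in enumerate(words):
        occluded = " ".join(words[:i] + words[i + 1:])  # the text with word i removed
        scores.append((word, baseline - risky_prob(occluded)))  # positive = word pushes toward RISKY
    # Most influential words first
    return sorted(scores, key=lambda ws: ws[1], reverse=True)

# Toy stand-in scorer for demonstration only; replace with a wrapper around the real model
def toy_risky_prob(text: str) -> float:
    lowered = text.lower()
    return min(1.0, 0.1 + 0.6 * ("penalty" in lowered) + 0.2 * ("25%" in lowered))

print(occlusion_scores("Late payments will incur a 25% monthly penalty.", toy_risky_prob)[:2])
```

Occlusion costs one extra forward pass per word. That is acceptable for single clauses, but the cost grows with input length.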

---

## 👨‍💻 Author

Built by Nithin Sri

🚀 Hugging Face: [https://huggingface.co/nitinsri](https://huggingface.co/nitinsri)

📧 Email: [email protected]

---

## 📜 License

MIT License

---

> “Clarity and transparency in digital contracts are not luxuries; they are rights. RigelGuard helps enforce that.”

---