---
license: mit
language:
- en
metrics:
- accuracy
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
library_name: transformers
---
# RigelClauseNet: BERT-Based Fraud Clause Detector
**RigelClauseNet** is a fine-tuned BERT-based binary classifier that detects **fraudulent, high-risk, or suspicious clauses** in legal and policy-related documents, including privacy policies, loan agreements, and terms of service.
It is designed to help:
- Legal analysts
- Fintech systems
- Regulatory auditors
- End users seeking clarity in digital contracts
---
## 🔍 Use Case
Given a clause or paragraph from a document, the model outputs:
- A binary risk label (`SAFE`, `RISKY`)
- A probability confidence score
- A breakdown of class probabilities
This enables organizations to **flag suspicious clauses early**, audit contracts, and build smarter compliance pipelines.
---
## 🧠 Model Details
- **Base Model**: `google-bert/bert-base-uncased`
- **Architecture**: BERT + Sequence Classification Head
- **Training Data**: 5,000 semi-synthetic and curated clauses (labeled as `SAFE` or `RISKY`)
- **Classes**:
  - `0` → Safe clause
  - `1` → Fraudulent/risky clause
- **Trained On**: Google Colab with Hugging Face Transformers
- **Performance**:
  - Accuracy: **98.47%**
  - Precision: **99.19%**
  - Recall: **99.19%**
  - F1 Score: **99.19%** *(on validation set)*
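For reference, the metrics above follow the standard definitions over the confusion matrix of the `RISKY` (positive) class. The sketch below is a minimal pure-Python illustration of those definitions. The counts are made-up placeholders, not the model's actual validation results, and the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 for the positive (RISKY) class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = (2 * precision * recall / denom) if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Placeholder counts for illustration only (not taken from the real validation set).
metrics = binary_metrics(tp=90, fp=10, fn=30, tn=870)
print(metrics)
```

Because F1 is a harmonic mean, it always lies between precision and recall, and it equals them when the two are identical.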
---
## 📌 Examples
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch.nn.functional as F
import torch

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = AutoModelForSequenceClassification.from_pretrained("nitinsri/RigelClauseNet")
tokenizer = AutoTokenizer.from_pretrained("nitinsri/RigelClauseNet")

def predict_clause(text):
    # Tokenize a single clause, truncating to at most 128 tokens
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = F.softmax(logits, dim=1)
    # Index of the most probable class: 0 = SAFE, 1 = RISKY
    label = torch.argmax(probs, dim=1).item()
    return {
        "label": "RISKY" if label == 1 else "SAFE",
        "confidence": round(probs[0][label].item(), 4),
        "probabilities": probs.tolist()
    }

# Example
print(predict_clause("Late payments will incur a 25% monthly penalty."))
```
## 🧠 Intended Usage
You can use this model for:
- Scanning uploaded PDFs, contracts, or policies
- Highlighting or flagging suspicious legal language
- Powering backend systems in legal-tech and compliance
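The scanning and flagging workflow above could be sketched roughly as follows. This is a minimal illustration, not part of the released model: `split_into_clauses`, `flag_risky_clauses`, and the `threshold` parameter are hypothetical names, the clause splitter is a naive regex, and `predict_fn` is assumed to be any callable returning a dict with `label` and `confidence` keys (the shape returned by `predict_clause` in the Examples section). Text extraction from PDFs is assumed to have happened already.

```python
import re
from typing import Callable, Dict, List


def split_into_clauses(document: str) -> List[str]:
    """Naively split text on sentence-ending punctuation or line breaks."""
    parts = re.split(r"(?<=[.!?])\s+|\n+", document)
    return [p.strip() for p in parts if p.strip()]


def flag_risky_clauses(
    document: str,
    predict_fn: Callable[[str], Dict],
    threshold: float = 0.5,
) -> List[Dict]:
    """Return clauses labeled RISKY with confidence at or above the threshold."""
    flagged = []
    for clause in split_into_clauses(document):
        result = predict_fn(clause)
        if result["label"] == "RISKY" and result["confidence"] >= threshold:
            flagged.append({"clause": clause, "confidence": result["confidence"]})
    return flagged


# Stand-in classifier for demonstration only; swap in a model-backed function.
def dummy_predict(text: str) -> Dict:
    risky = "penalty" in text.lower()
    return {"label": "RISKY" if risky else "SAFE", "confidence": 0.9}


sample = "Payments are due monthly. Late payments will incur a 25% monthly penalty."
flagged = flag_risky_clauses(sample, dummy_predict, threshold=0.8)
for item in flagged:
    print(item)
```

Passing the classifier in as a callable keeps the scanning logic independent of how the model is loaded, so the same loop can be exercised with a stub before wiring in the real model.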
---
## 🚫 Limitations
- Trained on **semi-synthetic clauses**, not actual legal corpora.
- Binary classifier only — it does not explain why a clause is risky.
- Contextual or nested document logic is not supported (yet).
---
## 📂 Files
| File | Description |
|------|-------------|
| `model.safetensors` | Fine-tuned model weights |
| `config.json` | BERT classification head config |
| `tokenizer.json` | Tokenizer for preprocessing |
| `vocab.txt` | BERT vocabulary |
---
## 💡 Future Plans
- Multi-class classification (`safe`, `risky`, `ambiguous`)
- Explanation layer (highlight key tokens that trigger risk)
- Full document-level context scanning
- Integration with Hugging Face Spaces (with UI)
---
## 👨‍💻 Author
Built by [Nithin Sri]
🚀 Hugging Face: [https://huggingface.co/nitinsri](https://huggingface.co/nitinsri)
📧 Email: [email protected]
---
## 📜 License
MIT License
---
> “Clarity and transparency in digital contracts are not luxuries — they are rights. RigelGuard helps enforce that.”
---