---
license: mit
language:
- en
metrics:
- accuracy
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
library_name: transformers
---
# RigelClauseNet: BERT-Based Fraud Clause Detector
**RigelClauseNet** is a fine-tuned BERT-based binary classifier that detects **fraudulent, high-risk, or suspicious clauses** in legal and policy-related documents, including privacy policies, loan agreements, and terms of service.
It is designed to help:
- Legal analysts
- Fintech systems
- Regulatory auditors
- End users seeking clarity in digital contracts
---
## 🔍 Use Case
Given a clause or paragraph from a document, the model outputs:
- A binary risk label (`SAFE` or `RISKY`)
- A confidence score for the predicted label
- A breakdown of the class probabilities
This enables organizations to **flag suspicious clauses early**, audit contracts, and build smarter compliance pipelines.
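For a quick, single-call check, the generic `transformers` text-classification pipeline can also be used. This is a minimal sketch, assuming the repository's `config.json` carries an `id2label` mapping for the two classes (otherwise the pipeline reports `LABEL_0`/`LABEL_1`):
```python
# Minimal quick-start sketch using the generic text-classification pipeline.
# If the repo's config.json does not define id2label, the labels appear as
# LABEL_0 (safe) and LABEL_1 (risky) instead of SAFE / RISKY.
from transformers import pipeline

clause_checker = pipeline("text-classification", model="nitinsri/RigelClauseNet")
print(clause_checker("We may share your personal data with any third party without notice."))
# -> [{'label': ..., 'score': ...}]
```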
---
## 🧠 Model Details
- **Base Model**: `google-bert/bert-base-uncased`
- **Architecture**: BERT + Sequence Classification Head
- **Training Data**: 5,000 semi-synthetic and curated clauses (labeled as `SAFE` or `RISKY`)
- **Classes**:
- `0` → Safe clause
- `1` → Fraudulent/risky clause
- **Trained On**: Google Colab with Hugging Face Transformers
- **Performance**:
- Accuracy: **98.47%**
- Precision: **99.19%**
- Recall: **99.19%**
  - F1 Score: **99.19%** *(all metrics on the validation set; see the evaluation sketch below)*
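The training and evaluation script is not included in this repository. The sketch below shows one common way such metrics can be computed with `scikit-learn` on a held-out validation split; `val_texts` and `val_labels` are placeholders for data that is not released with this model, and `predict_fn` can be any clause-level prediction helper such as the one in the Examples section.
```python
# Illustrative sketch (not the author's original training script): computing
# accuracy, precision, recall and F1 on a held-out validation split with
# scikit-learn. val_texts and val_labels stand in for unreleased data.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(predict_fn, val_texts, val_labels):
    # predict_fn maps a clause to {"label": "SAFE" or "RISKY", ...}
    preds = [1 if predict_fn(t)["label"] == "RISKY" else 0 for t in val_texts]
    precision, recall, f1, _ = precision_recall_fscore_support(
        val_labels, preds, average="binary"
    )
    return {
        "accuracy": accuracy_score(val_labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```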
---
## 📌 Examples
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

# Load the fine-tuned classifier and its tokenizer from the Hub
model = AutoModelForSequenceClassification.from_pretrained("nitinsri/RigelClauseNet")
tokenizer = AutoTokenizer.from_pretrained("nitinsri/RigelClauseNet")

def predict_clause(text):
    # Tokenize the clause (inputs longer than 128 tokens are truncated)
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Convert logits to class probabilities and pick the most likely class
    probs = F.softmax(logits, dim=1)
    label = torch.argmax(probs).item()
    return {
        "label": "RISKY" if label == 1 else "SAFE",
        "confidence": round(probs[0][label].item(), 4),
        "probabilities": probs.tolist(),
    }

# Example
print(predict_clause("Late payments will incur a 25% monthly penalty."))
```
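The helper returns the predicted label, the confidence assigned to that label, and the full two-class probability distribution, so downstream systems can apply their own risk thresholds rather than relying on the argmax alone.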
## 🧠 Intended Usage
You can use this model for:
- Scanning uploaded PDFs, contracts, or policies (see the sketch after this list)
- Highlighting or flagging suspicious legal language
- Powering backend systems in legal-tech and compliance
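As a rough illustration of the document-scanning use case, the sketch below splits a text into clauses and keeps the ones the model flags as risky. It reuses the `predict_clause` helper from the Examples section; the line-based splitting and the `min_confidence` threshold are assumptions for illustration, not part of the released model.
```python
# Hypothetical document-scanning sketch built on the predict_clause helper
# defined in the Examples section. Splitting on line breaks is a naive
# placeholder; real PDFs and contracts need proper text extraction and
# clause segmentation.
def flag_risky_clauses(document_text, min_confidence=0.5):
    clauses = [c.strip() for c in document_text.splitlines() if c.strip()]
    flagged = []
    for clause in clauses:
        result = predict_clause(clause)
        if result["label"] == "RISKY" and result["confidence"] >= min_confidence:
            flagged.append({"clause": clause, **result})
    return flagged

# Example
contract = """Late payments will incur a 25% monthly penalty.
You may cancel your subscription at any time."""
for hit in flag_risky_clauses(contract):
    print(hit["confidence"], hit["clause"])
```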
---
## 🚫 Limitations
- Trained on **semi-synthetic clauses**, not real-world legal corpora.
- Binary classifier only — it does not explain why a clause is risky.
- Contextual or nested document logic is not supported (yet).
---
## 📂 Files
| File | Description |
|------|-------------|
| `model.safetensors` | Fine-tuned model weights |
| `config.json` | BERT classification head config |
| `tokenizer.json` | Tokenizer for preprocessing |
| `vocab.txt` | BERT vocabulary |
---
## 💡 Future Plans
- Multi-class classification (`safe`, `risky`, `ambiguous`)
- Explanation layer (highlight key tokens that trigger risk)
- Full document-level context scanning
- Integration with Hugging Face Spaces (with UI)
---
## 👨‍💻 Author
Built by Nithin Sri
🚀 Hugging Face: [https://huggingface.co/nitinsri](https://huggingface.co/nitinsri)
📧 Email: [email protected]
---
## 📜 License
MIT License
---
> “Clarity and transparency in digital contracts are not luxuries — they are rights. RigelClauseNet helps enforce that.”
---