---
license: mit
language:
- en
metrics:
- accuracy
base_model:
- google-bert/bert-base-uncased
pipeline_tag: zero-shot-classification
library_name: adapter-transformers
---

# RigelClauseNet: BERT-Based Fraud Clause Detector

**RigelClauseNet** is a fine-tuned BERT-based binary classifier that detects **fraudulent, high-risk, or suspicious clauses** in legal and policy-related documents, including privacy policies, loan agreements, and terms of service.

It is designed to help:

- Legal analysts
- Fintech systems
- Regulatory auditors
- End users seeking clarity in digital contracts

---

## 🔍 Use Case

Given a clause or paragraph from a document, the model outputs:

- A binary risk label (`SAFE` or `RISKY`)
- A confidence score (the probability of the predicted class)
- A breakdown of class probabilities

This enables organizations to **flag suspicious clauses early**, audit contracts, and build smarter compliance pipelines.
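
To make the three outputs concrete, here is a minimal sketch of how two raw class scores (logits) could be turned into a label, a confidence score, and a probability breakdown. It uses only the standard library; the function name `to_prediction`, the flat probability list, and the sample logits are assumptions made for illustration, not part of the model's API.

```python
import math

# Class order follows the model card: index 0 = SAFE, index 1 = RISKY
LABELS = ["SAFE", "RISKY"]

def to_prediction(logits):
    """Turn two raw class scores into label, confidence, and probabilities."""
    # Numerically stable softmax: subtract the max before exponentiating
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # The predicted class is the one with the highest probability
    best = max(range(len(probs)), key=probs.__getitem__)
    return {
        "label": LABELS[best],
        "confidence": round(probs[best], 4),
        "probabilities": [round(p, 4) for p in probs],
    }

# Hypothetical logits, chosen only for illustration
result = to_prediction([-1.2, 2.3])
print(result)
```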

---

## 🧠 Model Details

- **Base Model**: `google-bert/bert-base-uncased`
- **Architecture**: BERT + Sequence Classification Head
- **Training Data**: 5,000 semi-synthetic and curated clauses (labeled as `SAFE` or `RISKY`)
- **Classes**:
  - `0` → Safe clause
  - `1` → Fraudulent/risky clause
- **Trained On**: Google Colab with Hugging Face Transformers
- **Performance**:
  - Accuracy: **98.47%**
  - Precision: **99.19%**
  - Recall: **99.19%**
  - F1 Score: **99.19%** *(on validation set)*
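
For readers who want to see how these four metrics relate, the sketch below computes them from confusion-matrix counts, treating `RISKY` (class `1`) as the positive class. F1 is the harmonic mean of precision and recall, so it always lies between the two. The function name `classification_metrics` and the counts are hypothetical and chosen only for illustration; they are not the model's actual validation results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 for the positive (RISKY) class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical counts for illustration only (not the model's real results)
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```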

---

## 📌 Examples

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = AutoModelForSequenceClassification.from_pretrained("nitinsri/RigelClauseNet")
tokenizer = AutoTokenizer.from_pretrained("nitinsri/RigelClauseNet")
model.eval()  # ensure inference mode (dropout disabled)

def predict_clause(text):
    # Tokenize one clause; inputs longer than 128 tokens are truncated
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Convert logits to class probabilities (index 0 = SAFE, index 1 = RISKY)
    probs = F.softmax(logits, dim=1)
    label = torch.argmax(probs, dim=1).item()
    return {
        "label": "RISKY" if label == 1 else "SAFE",
        "confidence": round(probs[0][label].item(), 4),
        "probabilities": probs.tolist()
    }

# Example
predict_clause("Late payments will incur a 25% monthly penalty.")
```

---

## 🧠 Intended Usage

You can use this model for:

- Scanning uploaded PDFs, contracts, or policies
- Highlighting or flagging suspicious legal language
- Powering backend systems in legal-tech and compliance
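
As a rough illustration of the scanning and flagging workflow, the sketch below splits already-extracted document text into clauses and collects the ones a classifier marks as `RISKY` above a chosen confidence threshold. Everything here is an assumption for illustration: the helper names `split_into_clauses` and `flag_risky_clauses`, the naive punctuation-based splitter, the `threshold` parameter, and the keyword-based `stub_classifier`, which only stands in for the real model so the sketch runs offline. In practice you would pass a function that returns the same dictionary shape as the model's prediction output. PDF text extraction is out of scope.

```python
import re

def split_into_clauses(document):
    """Naive splitter: break after '.' or ';' when followed by whitespace."""
    parts = re.split(r"(?<=[.;])\s+", document.strip())
    return [p.strip() for p in parts if p.strip()]

def flag_risky_clauses(document, classify, threshold=0.5):
    """Return clauses labeled RISKY with confidence at or above the threshold."""
    flagged = []
    for clause in split_into_clauses(document):
        result = classify(clause)
        if result["label"] == "RISKY" and result["confidence"] >= threshold:
            flagged.append({"clause": clause, **result})
    return flagged

def stub_classifier(clause):
    # Hypothetical stand-in for the real model, used only so this runs offline
    risky = "penalty" in clause.lower()
    return {"label": "RISKY" if risky else "SAFE", "confidence": 0.9}

doc = "Payments are due monthly. Late payments will incur a 25% monthly penalty."
flagged = flag_risky_clauses(doc, stub_classifier, threshold=0.8)
for item in flagged:
    print(item["clause"], item["confidence"])
```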

---

## 🚫 Limitations

- Trained on **semi-synthetic clauses**, not actual legal corpora.
- Binary classifier only: it does not explain why a clause is risky.
- Contextual or nested document logic is not supported (yet).

---

## 📂 Files

| File | Description |
|------|-------------|
| `model.safetensors` | Fine-tuned model weights |
| `config.json` | BERT classification head config |
| `tokenizer.json` | Tokenizer for preprocessing |
| `vocab.txt` | BERT vocabulary |

---

## 💡 Future Plans

- Multi-class classification (`safe`, `risky`, `ambiguous`)
- Explanation layer (highlight key tokens that trigger risk)
- Full document-level context scanning
- Integration with Hugging Face Spaces (with UI)

---

## 👨‍💻 Author

Built by [Nithin Sri]

🚀 Hugging Face: [https://huggingface.co/nitinsri](https://huggingface.co/nitinsri)

📧 Email: [email protected]

---

## 📜 License

MIT License

---

> “Clarity and transparency in digital contracts are not luxuries; they are rights. RigelGuard helps enforce that.”

---