---
license: mit
language:
- en
metrics:
- accuracy
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
library_name: transformers
---

# RigelClauseNet: BERT-Based Fraud Clause Detector

**RigelClauseNet** is a fine-tuned BERT-based binary classifier that detects **fraudulent, high-risk, or suspicious clauses** in legal and policy-related documents, including privacy policies, loan agreements, and terms of service.

It is designed to help:
- Legal analysts
- Fintech systems
- Regulatory auditors
- End users seeking clarity in digital contracts

---

## 🔍 Use Case

Given a clause or paragraph from a document, the model outputs:
- A binary risk label (`SAFE` or `RISKY`)
- A confidence score (the probability of the predicted class)
- A breakdown of class probabilities

This enables organizations to **flag suspicious clauses early**, audit contracts, and build smarter compliance pipelines.

---

## 🧠 Model Details

- **Base Model**: `google-bert/bert-base-uncased`
- **Architecture**: BERT + Sequence Classification Head
- **Training Data**: 5,000 semi-synthetic and curated clauses (labeled as `SAFE` or `RISKY`)
- **Classes**:
  - `0` → Safe clause
  - `1` → Fraudulent/risky clause
- **Trained On**: Google Colab with Hugging Face Transformers
- **Performance**:
  - Accuracy: **98.47%**
  - Precision: **99.19%**
  - Recall: **99.19%**
  - F1 Score: **99.19%** *(on validation set)*
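
For readers who want to recompute these metrics on their own labeled clauses, the sketch below shows the standard formulas with `RISKY` as the positive class. The function name `classification_metrics` and the counts are hypothetical and chosen only to illustrate the arithmetic; they are not the model's validation results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 for the positive (RISKY) class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical confusion-matrix counts, for illustration only.
metrics = classification_metrics(tp=90, fp=10, fn=10, tn=90)
print(metrics)
```

Because F1 is a harmonic mean, it always lies between precision and recall, and it equals both when they coincide.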

---

## 📌 Examples

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch.nn.functional as F
import torch

model = AutoModelForSequenceClassification.from_pretrained("nitinsri/RigelClauseNet")
tokenizer = AutoTokenizer.from_pretrained("nitinsri/RigelClauseNet")

def predict_clause(text):
    # Tokenize a single clause; inputs longer than 128 tokens are truncated.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**inputs).logits
        probs = F.softmax(logits, dim=1)  # shape: (1, 2)
        label = torch.argmax(probs, dim=1).item()  # index of the most likely class
        return {
            "label": "RISKY" if label == 1 else "SAFE",
            "confidence": round(probs[0][label].item(), 4),
            "probabilities": probs.tolist()
        }

# Example
predict_clause("Late payments will incur a 25% monthly penalty.")
```
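
The post-processing step in `predict_clause` (softmax, argmax, confidence) can be illustrated without downloading the model. The following is a minimal pure-Python sketch: `postprocess` and `LABELS` are hypothetical names, the logits are made up, and the probabilities are returned as a flat list rather than the nested list that `probs.tolist()` produces.

```python
import math

# Class mapping as described in Model Details: 0 -> SAFE, 1 -> RISKY.
LABELS = {0: "SAFE", 1: "RISKY"}

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def postprocess(logits):
    """Turn raw two-class logits into a label, confidence, and probabilities."""
    probs = softmax(logits)
    # Index of the largest probability (the argmax).
    label = max(range(len(probs)), key=probs.__getitem__)
    return {
        "label": LABELS[label],
        "confidence": round(probs[label], 4),
        "probabilities": [round(p, 4) for p in probs],
    }

# Made-up logits for illustration; not real model output.
result = postprocess([-1.0, 2.0])
print(result)
```

Here the second logit is larger, so the clause is labeled `RISKY` and the confidence is the softmax probability of that class.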

## 🧠 Intended Usage

You can use this model for:

- Scanning uploaded PDFs, contracts, or policies
- Highlighting or flagging suspicious legal language
- Powering backend systems in legal-tech and compliance
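
One way such a scanning flow could be wired up is sketched below, assuming the document text has already been extracted (PDF parsing is out of scope here). The helpers `split_clauses`, `flag_risky_clauses`, and `keyword_scorer` are hypothetical; `keyword_scorer` is a stand-in so the example runs without the model, and in practice it would be replaced by a function that returns the model's `RISKY` probability for a clause.

```python
import re

def split_clauses(document):
    """Naive splitter: break on blank lines or after '.' and ';'."""
    parts = re.split(r"\n\s*\n|(?<=[.;])\s+", document)
    return [p.strip() for p in parts if p.strip()]

def flag_risky_clauses(document, classify, threshold=0.5):
    """Return (clause, risky_probability) pairs at or above the threshold.

    `classify` is any callable mapping a clause string to its RISKY probability.
    """
    flagged = []
    for clause in split_clauses(document):
        risky_prob = classify(clause)
        if risky_prob >= threshold:
            flagged.append((clause, risky_prob))
    return flagged

# Stand-in scorer for demonstration only; swap in a model-backed function.
def keyword_scorer(clause):
    return 0.9 if "penalty" in clause.lower() else 0.1

doc = "Payments are due monthly. Late payments will incur a 25% monthly penalty."
flagged = flag_risky_clauses(doc, keyword_scorer, threshold=0.8)
print(flagged)
```

Raising the threshold trades fewer false alarms for more missed clauses, so the value should be tuned on labeled examples from the target domain.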

---

## 🚫 Limitations

- Trained on **semi-synthetic clauses**, not real-world legal corpora.
- Binary classifier only — it does not explain why a clause is risky.
- Contextual or nested document logic is not supported (yet).

---

## 📂 Files

| File | Description |
|------|-------------|
| `model.safetensors` | Fine-tuned model weights |
| `config.json`       | BERT classification head config |
| `tokenizer.json`    | Tokenizer for preprocessing |
| `vocab.txt`         | BERT vocabulary |

---

## 💡 Future Plans

- Multi-class classification (`safe`, `risky`, `ambiguous`)
- Explanation layer (highlight key tokens that trigger risk)
- Full document-level context scanning
- Integration with Hugging Face Spaces (with UI)

---

## 👨‍💻 Author

Built by Nithin Sri  
🚀 Hugging Face: [https://huggingface.co/nitinsri](https://huggingface.co/nitinsri)  

---

## 📜 License

MIT License

---

> “Clarity and transparency in digital contracts are not luxuries — they are rights. RigelGuard helps enforce that.”

---