nitinsri committed on
Commit 9da1194 · verified · 1 Parent(s): c2fe0a1

Update README.md

  1. README.md +133 -3
README.md CHANGED
@@ -1,3 +1,133 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ language:
+ - en
+ metrics:
+ - accuracy
+ base_model:
+ - google-bert/bert-base-uncased
+ pipeline_tag: text-classification
+ library_name: transformers
+ ---
+
+ # RigelClauseNet: BERT-Based Fraud Clause Detector
+
+ **RigelClauseNet** is a fine-tuned BERT-based binary classifier that detects **fraudulent, high-risk, or suspicious clauses** in legal and policy-related documents, including privacy policies, loan agreements, and terms of service.
+
+ It is designed to help:
+ - Legal analysts
+ - Fintech systems
+ - Regulatory auditors
+ - End users seeking clarity in digital contracts
+
+ ---
+
+ ## 🔍 Use Case
+
+ Given a clause or paragraph from a document, the model outputs:
+ - A binary risk label (`SAFE` or `RISKY`)
+ - A confidence score (the probability of the predicted label)
+ - A breakdown of class probabilities
+
+ This enables organizations to **flag suspicious clauses early**, audit contracts, and build smarter compliance pipelines.
+
+ ---
+
+ ## 🧠 Model Details
+
+ - **Base Model**: `google-bert/bert-base-uncased`
+ - **Architecture**: BERT + Sequence Classification Head
+ - **Training Data**: 5,000 semi-synthetic and curated clauses (labeled as `SAFE` or `RISKY`)
+ - **Classes**:
+   - `0` → Safe clause
+   - `1` → Fraudulent/risky clause
+ - **Trained On**: Google Colab with Hugging Face Transformers
+ - **Performance**:
+   - Accuracy: **98.47%**
+   - Precision: **99.19%**
+   - Recall: **99.19%**
+   - F1 Score: **99.19%** *(on validation set)*
+
+ ---
+
+ ## 📌 Examples
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+ model = AutoModelForSequenceClassification.from_pretrained("nitinsri/RigelClauseNet")
+ tokenizer = AutoTokenizer.from_pretrained("nitinsri/RigelClauseNet")
+
+ def predict_clause(text):
+     """Classify one clause and return its label, confidence, and class probabilities."""
+     inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
+     with torch.no_grad():  # inference only, no gradients needed
+         logits = model(**inputs).logits
+     probs = F.softmax(logits, dim=1)  # shape (1, 2): [P(SAFE), P(RISKY)]
+     label = torch.argmax(probs, dim=1).item()
+     return {
+         "label": "RISKY" if label == 1 else "SAFE",
+         "confidence": round(probs[0][label].item(), 4),
+         "probabilities": probs.tolist(),
+     }
+
+ # Example
+ print(predict_clause("Late payments will incur a 25% monthly penalty."))
+ ```
+
+ ## 🧠 Intended Usage
+
+ You can use this model for:
+
+ - Scanning text extracted from uploaded PDFs, contracts, or policies
+ - Highlighting or flagging suspicious legal language
+ - Powering backend systems in legal-tech and compliance
+
+ ---
+
+ ## 🚫 Limitations
+
+ - Trained on **semi-synthetic clauses**, not actual legal corpora.
+ - Binary classifier only; it does not explain why a clause is risky.
+ - Contextual or nested document logic is not supported (yet).
+
+ ---
+
+ ## 📂 Files
+
+ | File | Description |
+ |------|-------------|
+ | `model.safetensors` | Fine-tuned model weights |
+ | `config.json` | BERT classification head config |
+ | `tokenizer.json` | Tokenizer for preprocessing |
+ | `vocab.txt` | BERT vocabulary |
+
+ ---
+
+ ## 💡 Future Plans
+
+ - Multi-class classification (`safe`, `risky`, `ambiguous`)
+ - Explanation layer (highlight key tokens that trigger risk)
+ - Full document-level context scanning
+ - Integration with Hugging Face Spaces (with UI)
+
+ ---
+
+ ## 👨‍💻 Author
+
+ Built by Nithin Sri
+ 🚀 Hugging Face: [https://huggingface.co/nitinsri](https://huggingface.co/nitinsri)
+ 📧 Email: [email protected]
+
+ ---
+
+ ## 📜 License
+
+ MIT License
+
+ ---
+
+ > “Clarity and transparency in digital contracts are not luxuries; they are rights. RigelGuard helps enforce that.”
+
+ ---
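The Performance figures listed in the README (accuracy, precision, recall, F1) all derive from the same four confusion-matrix counts. As a minimal sketch of how they relate, the snippet below computes them in plain Python. The function name `classification_metrics` and the counts passed to it are hypothetical, chosen only for illustration; they are not the model's actual validation results, and the README does not say how its figures were computed.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 for the positive (RISKY) class.

    tp, fp, fn, tn are confusion-matrix counts with RISKY (label 1) as positive.
    Any ratio with a zero denominator is reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only (not the model's real results).
metrics = classification_metrics(tp=90, fp=10, fn=30, tn=70)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because F1 is a harmonic mean, it always lies between precision and recall, and it equals both when the two are identical. That makes it a quick consistency check on any reported set of metrics.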