---
license: mit
language:
- en
metrics:
- accuracy
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
library_name: transformers
---
# RigelClauseNet: BERT-Based Fraud Clause Detector
**RigelClauseNet** is a fine-tuned BERT-based binary classifier that detects **fraudulent, high-risk, or suspicious clauses** in legal and policy-related documents, including privacy policies, loan agreements, and terms of service.
It is designed to help:
- Legal analysts
- Fintech systems
- Regulatory auditors
- End users seeking clarity in digital contracts
---
## 🔍 Use Case
Given a clause or paragraph from a document, the model outputs:
- A binary risk label (`SAFE` or `RISKY`)
- A confidence score for the predicted label
- A breakdown of the class probabilities
This enables organizations to **flag suspicious clauses early**, audit contracts, and build smarter compliance pipelines.
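For a quick, single-call check, the generic `transformers` text-classification pipeline can also be used. This is a minimal sketch, assuming the repository's `config.json` carries an `id2label` mapping for the two classes (otherwise the pipeline reports `LABEL_0`/`LABEL_1`):
```python
# Minimal quick-start sketch using the generic text-classification pipeline.
# If the repo's config.json does not define id2label, the labels appear as
# LABEL_0 (safe) and LABEL_1 (risky) instead of SAFE / RISKY.
from transformers import pipeline

clause_checker = pipeline("text-classification", model="nitinsri/RigelClauseNet")
print(clause_checker("We may share your personal data with any third party without notice."))
# -> [{'label': ..., 'score': ...}]
```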
---
## 🧠 Model Details
- **Base Model**: `google-bert/bert-base-uncased`
- **Architecture**: BERT + Sequence Classification Head
- **Training Data**: 5,000 semi-synthetic and curated clauses (labeled as `SAFE` or `RISKY`)
- **Classes**:
- `0` → Safe clause
- `1` → Fraudulent/risky clause
- **Trained On**: Google Colab with Hugging Face Transformers
- **Performance**:
- Accuracy: **98.47%**
- Precision: **99.19%**
- Recall: **99.19%**
  - F1 Score: **99.19%** *(all metrics on the validation set; see the evaluation sketch below)*
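The training and evaluation script is not included in this repository. The sketch below shows one common way such metrics can be computed with `scikit-learn` on a held-out validation split; `val_texts` and `val_labels` are placeholders for data that is not released with this model, and `predict_fn` can be any clause-level prediction helper such as the one in the Examples section.
```python
# Illustrative sketch (not the author's original training script): computing
# accuracy, precision, recall and F1 on a held-out validation split with
# scikit-learn. val_texts and val_labels stand in for unreleased data.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(predict_fn, val_texts, val_labels):
    # predict_fn maps a clause to {"label": "SAFE" or "RISKY", ...}
    preds = [1 if predict_fn(t)["label"] == "RISKY" else 0 for t in val_texts]
    precision, recall, f1, _ = precision_recall_fscore_support(
        val_labels, preds, average="binary"
    )
    return {
        "accuracy": accuracy_score(val_labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```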
---
## 📌 Examples
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

# Load the fine-tuned classifier and its tokenizer from the Hub
model = AutoModelForSequenceClassification.from_pretrained("nitinsri/RigelClauseNet")
tokenizer = AutoTokenizer.from_pretrained("nitinsri/RigelClauseNet")

def predict_clause(text):
    # Tokenize the clause (inputs longer than 128 tokens are truncated)
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Convert logits to class probabilities and pick the most likely class
    probs = F.softmax(logits, dim=1)
    label = torch.argmax(probs).item()
    return {
        "label": "RISKY" if label == 1 else "SAFE",
        "confidence": round(probs[0][label].item(), 4),
        "probabilities": probs.tolist(),
    }

# Example
print(predict_clause("Late payments will incur a 25% monthly penalty."))
```
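The helper returns the predicted label, the confidence assigned to that label, and the full two-class probability distribution, so downstream systems can apply their own risk thresholds rather than relying on the argmax alone.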
## 🧠 Intended Usage
You can use this model for:
- Scanning uploaded PDFs, contracts, or policies (see the sketch after this list)
- Highlighting or flagging suspicious legal language
- Powering backend systems in legal-tech and compliance
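As a rough illustration of the document-scanning use case, the sketch below splits a text into clauses and keeps the ones the model flags as risky. It reuses the `predict_clause` helper from the Examples section; the line-based splitting and the `min_confidence` threshold are assumptions for illustration, not part of the released model.
```python
# Hypothetical document-scanning sketch built on the predict_clause helper
# defined in the Examples section. Splitting on line breaks is a naive
# placeholder; real PDFs and contracts need proper text extraction and
# clause segmentation.
def flag_risky_clauses(document_text, min_confidence=0.5):
    clauses = [c.strip() for c in document_text.splitlines() if c.strip()]
    flagged = []
    for clause in clauses:
        result = predict_clause(clause)
        if result["label"] == "RISKY" and result["confidence"] >= min_confidence:
            flagged.append({"clause": clause, **result})
    return flagged

# Example
contract = """Late payments will incur a 25% monthly penalty.
You may cancel your subscription at any time."""
for hit in flag_risky_clauses(contract):
    print(hit["confidence"], hit["clause"])
```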
---
## 🚫 Limitations
- Trained on **semi-synthetic clauses**, not real-world legal corpora.
- Binary classifier only — it does not explain why a clause is risky.
- Contextual or nested document logic is not supported (yet).
---
## 📂 Files
| File | Description |
|------|-------------|
| `model.safetensors` | Fine-tuned model weights |
| `config.json` | BERT classification head config |
| `tokenizer.json` | Tokenizer for preprocessing |
| `vocab.txt` | BERT vocabulary |
---
## 💡 Future Plans
- Multi-class classification (`safe`, `risky`, `ambiguous`)
- Explanation layer (highlight key tokens that trigger risk)
- Full document-level context scanning
- Integration with Hugging Face Spaces (with UI)
---
## 👨‍💻 Author
Built by Nithin Sri
🚀 Hugging Face: [https://huggingface.co/nitinsri](https://huggingface.co/nitinsri)
📧 Email: [email protected]
---
## 📜 License
MIT License
---
> “Clarity and transparency in digital contracts are not luxuries — they are rights. RigelClauseNet helps enforce that.”
---