Update README.md
README.md
CHANGED
@@ -4,8 +4,6 @@ language:
 tags:
 - text2text-generation
 - summarization
-- legal-ai
-- italian-law
 license: mit
 datasets:
 - joelniklaus/Multi_Legal_Pile
@@ -28,6 +26,8 @@ They build upon **BART-IT** ([`morenolq/bart-it`](https://huggingface.co/morenol
 - **Trained on legal documents** such as **statutes, case law, and contracts**
 - **Not fine-tuned for specific tasks** (requires further adaptation)
 
+⚠️ This specific model is pre-trained on general-purpose Italian text! Please select the best model from the table below.
+
 ## Available Models
 
 | Model | Description | Link |
@@ -38,8 +38,8 @@ They build upon **BART-IT** ([`morenolq/bart-it`](https://huggingface.co/morenol
 | **LEGIT-SCRATCH-BART** | Trained from scratch on **Italian legal texts** | [Link](https://huggingface.co/morenolq/LEGIT-SCRATCH-BART) |
 | **LEGIT-SCRATCH-BART-LSG-4096** | Trained from scratch with **LSG attention**, supporting **4,096 tokens** | [Link](https://huggingface.co/morenolq/LEGIT-SCRATCH-BART-LSG-4096) |
 | **LEGIT-SCRATCH-BART-LSG-16384** | Trained from scratch with **LSG attention**, supporting **16,384 tokens** | [Link](https://huggingface.co/morenolq/LEGIT-SCRATCH-BART-LSG-16384) |
-| **BART-IT-LSG-4096** | `morenolq/bart-it` with **LSG attention**, supporting **4,096 tokens** (no legal adaptation) | [Link](https://huggingface.co/morenolq/BART-IT-LSG-4096)
-| **BART-IT-LSG-16384** | `morenolq/bart-it` with **LSG attention**, supporting **16,384 tokens** (no legal adaptation) | [Link](https://huggingface.co/morenolq/BART-IT-LSG-16384) |
+| **BART-IT-LSG-4096** | `morenolq/bart-it` with **LSG attention**, supporting **4,096 tokens** (⚠️ no legal adaptation) | [Link](https://huggingface.co/morenolq/BART-IT-LSG-4096)
+| **BART-IT-LSG-16384** | `morenolq/bart-it` with **LSG attention**, supporting **16,384 tokens** (⚠️ no legal adaptation) | [Link](https://huggingface.co/morenolq/BART-IT-LSG-16384) |
 
 ---
 
@@ -66,13 +66,13 @@ They build upon **BART-IT** ([`morenolq/bart-it`](https://huggingface.co/morenol
 from transformers import BartForConditionalGeneration, AutoTokenizer
 
 # Load tokenizer and model
-model_name = "morenolq/
+model_name = "morenolq/BART-IT-LSG-16384"
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 model = BartForConditionalGeneration.from_pretrained(model_name)
 
 # Example input
 input_text = "<mask> 1234: Il contratto si intende concluso quando..."
-inputs = tokenizer(input_text, return_tensors="pt", max_length=
+inputs = tokenizer(input_text, return_tensors="pt", max_length=16384, truncation=True)
 
 # Generate summary
 summary_ids = model.generate(inputs.input_ids, max_length=150, num_beams=4, early_stopping=True)
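
The last hunk hard-codes `max_length=16384` next to `model_name = "morenolq/BART-IT-LSG-16384"`, so the two values have to be kept in sync by hand. As a minimal sketch, and assuming the `-LSG-<tokens>` suffix seen in the model table reliably encodes the supported input length, the limit could be derived from the name instead. The helper `lsg_max_length` and the `tokenizer_kwargs` dictionary are hypothetical names introduced here for illustration; they are not part of the README or of `transformers`.

```python
import re
from typing import Optional

# Hypothetical helper (not in the README): matches the "-LSG-<tokens>"
# suffix used by the LSG model names listed in the table.
_LSG_SUFFIX = re.compile(r"-LSG-(\d+)$")


def lsg_max_length(model_name: str) -> Optional[int]:
    """Return the token limit encoded in an LSG model name, or None if absent."""
    match = _LSG_SUFFIX.search(model_name)
    return int(match.group(1)) if match else None


model_name = "morenolq/BART-IT-LSG-16384"
max_length = lsg_max_length(model_name)

# Keyword arguments that could be passed to the tokenizer call in the README
# snippet; max_length is only set when the name actually carries a limit.
tokenizer_kwargs = {"return_tensors": "pt", "truncation": True}
if max_length is not None:
    tokenizer_kwargs["max_length"] = max_length
```

For names without an LSG suffix, such as `morenolq/LEGIT-SCRATCH-BART`, the helper returns `None` rather than guessing a limit, since the diff does not state one for those models.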