Mira190/Euler-Legal-Embedding-V1

@@ -20,16 +20,16 @@ language:
 - multilingual
 extra_gated_eu_disallowed: true
 ---
 <h1 align="center">Euler-Legal-Embedding-V1</h1>
 <p align="center">
-  <a href="https://huggingface.co/LawRank/Euler-Legal-Embedding-V1">
     <img src="https://img.shields.io/badge/%F0%9F%A4%97_HuggingFace-Model-ffbd45.svg" alt="HuggingFace">
   </a>
 </p>
 ## Short Description
-Euler-Legal-Embedding-V1  is a specialized embedding model for the legal domain, fine-tuned on [Qwen/Qwen3-Embedding-8B](https://huggingface.co/Qwen/Qwen3-Embedding-8B). It achieves strong performance on legal retrieval and reasoning tasks within the MTEB benchmark.
 ## Model Details
 - **Base Model**: Qwen/Qwen3-Embedding-8B
@@ -38,26 +38,36 @@ Euler-Legal-Embedding-V1  is a specialized embedding model for the legal domain,
 - **Max Input Tokens**: 1536
 - **Pooling**: Last token pooling (Standard for Qwen-Embedding)
 - **Training Data**: Legal domain specific dataset (`final-data-new-anonymized-grok4-filtered.jsonl`)
 ## Usage
 ### sentence-transformers support
 Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
 ```bash
 pip install -U sentence-transformers
 You can use the model like this:
 from sentence_transformers import SentenceTransformer
 import torch
 # Load the model
 # trust_remote_code=True is required for Qwen-based models
 model = SentenceTransformer(
-    "LawRank/Euler-Legal-Embedding-V1",
     trust_remote_code=True,
     model_kwargs={
         "torch_dtype": torch.bfloat16,
         "attn_implementation": "flash_attention_2",  # Optional, requires flash-attn installed
     },
 )
 model.max_seq_length = 1536
 sentences = [
     "The plaintiff filed a motion for summary judgment.",
     "The court granted the motion based on lack of genuine dispute of material fact."
@@ -70,13 +80,22 @@ embeddings = model.encode(
     batch_size=16,
     show_progress_bar=True,
 )
 print(embeddings.shape)
-Transformers support
-You can also use the model directly with the transformers library:
 import torch
 from transformers import AutoModel, AutoTokenizer
-model_id = "LawRank/Euler-Legal-Embedding-V1"
 tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
 model = AutoModel.from_pretrained(
     model_id,
@@ -84,30 +103,51 @@ model = AutoModel.from_pretrained(
     torch_dtype=torch.bfloat16,
     device_map="auto"
 )
 sentences = ["This is a legal document.", "This is another legal document."]
-inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True, max_length=1536)
 with torch.no_grad():
     outputs = model(**inputs)
-    # Last token pooling
     embeddings = outputs.last_hidden_state[:, -1]
     # Normalize embeddings
     embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
-print(embeddings)
-Training Details
 The model was fine-tuned using LoRA (Low-Rank Adaptation) via the Swift framework.
-Framework: Swift
-Loss Function: InfoNCE (Temperature: 0.03)
-Batch Size: 4 (per device)
-Learning Rate: 2e-5
-LoRA Config: Rank 8, Alpha 32, Dropout 0.05
-Citation
 If you find this model useful, please consider citing:
 @misc{euler2025legal,
       title={Euler-Legal-Embedding: Advanced Legal Representation Learning},
       author={LawRank Team},
       year={2025},
       publisher={Hugging Face}
-}

 - multilingual
 extra_gated_eu_disallowed: true
 ---
 <h1 align="center">Euler-Legal-Embedding-V1</h1>
 <p align="center">
+  <a href="https://huggingface.co/Mira190/Euler-Legal-Embedding-V1">
     <img src="https://img.shields.io/badge/%F0%9F%A4%97_HuggingFace-Model-ffbd45.svg" alt="HuggingFace">
   </a>
 </p>
 ## Short Description
+Euler-Legal-Embedding-V1 is a specialized embedding model for the legal domain, fine-tuned on [Qwen/Qwen3-Embedding-8B](https://huggingface.co/Qwen/Qwen3-Embedding-8B). It achieves strong performance on legal retrieval and reasoning tasks within the MTEB benchmark.
 ## Model Details
 - **Base Model**: Qwen/Qwen3-Embedding-8B
 - **Max Input Tokens**: 1536
 - **Pooling**: Last token pooling (Standard for Qwen-Embedding)
 - **Training Data**: Legal domain specific dataset (`final-data-new-anonymized-grok4-filtered.jsonl`)
 ## Usage
 ### sentence-transformers support
 Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
 ```bash
 pip install -U sentence-transformers
+```
 You can use the model like this:
+```python
 from sentence_transformers import SentenceTransformer
 import torch
 # Load the model
 # trust_remote_code=True is required for Qwen-based models
 model = SentenceTransformer(
+    "Mira190/Euler-Legal-Embedding-V1",
     trust_remote_code=True,
     model_kwargs={
         "torch_dtype": torch.bfloat16,
         "attn_implementation": "flash_attention_2",  # Optional, requires flash-attn installed
     },
 )
 model.max_seq_length = 1536
 sentences = [
     "The plaintiff filed a motion for summary judgment.",
     "The court granted the motion based on lack of genuine dispute of material fact."
     batch_size=16,
     show_progress_bar=True,
 )
 print(embeddings.shape)
+# Output: (2, 4096)
+```
+### Transformers support
+You can also use the model directly with the `transformers` library:
+```python
 import torch
 from transformers import AutoModel, AutoTokenizer
+model_id = "Mira190/Euler-Legal-Embedding-V1"
+# Load tokenizer and model
 tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
 model = AutoModel.from_pretrained(
     model_id,
     torch_dtype=torch.bfloat16,
     device_map="auto"
 )
 sentences = ["This is a legal document.", "This is another legal document."]
+# Tokenize sentences
+inputs = tokenizer(
+    sentences,
+    return_tensors="pt",
+    padding=True,
+    truncation=True,
+    max_length=1536
+)
+# Move inputs to the same device as the model
+inputs = {k: v.to(model.device) for k, v in inputs.items()}
 with torch.no_grad():
     outputs = model(**inputs)
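+    # Last token pooling (standard for Qwen-Embedding): take the hidden state at the final position.
+    # Note: [:, -1] selects each sequence's last real token only if the tokenizer pads on the left;
+    # with right padding, shorter sequences would yield a pad-position state instead.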
     embeddings = outputs.last_hidden_state[:, -1]
     # Normalize embeddings
     embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
+print(embeddings.shape)
+# Output: torch.Size([2, 4096])
+```
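The `[:, -1]` indexing in the example above assumes the batch is left-padded. A minimal sketch of a pooling helper that also handles right padding could look like the following. The helper name `last_token_pool` and the toy tensors are illustrative assumptions, not part of this model card.

```python
import torch


def last_token_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Return the hidden state of each sequence's last non-padding token."""
    # If every row's final position is a real token, the batch is left-padded
    # (or unpadded), so plain [:, -1] indexing is already correct.
    left_padded = bool(attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padded:
        return last_hidden_state[:, -1]
    # Right padding: the last real token sits at index (number of real tokens - 1).
    last_index = attention_mask.sum(dim=1) - 1
    batch_index = torch.arange(last_hidden_state.shape[0], device=last_hidden_state.device)
    return last_hidden_state[batch_index, last_index]


# Toy check with made-up values: 2 sequences, 3 positions, hidden size 2.
hidden = torch.arange(12, dtype=torch.float32).reshape(2, 3, 2)
right_mask = torch.tensor([[1, 1, 0], [1, 1, 1]])  # first row has one pad on the right
pooled = last_token_pool(hidden, right_mask)
print(pooled)
```

In the Transformers example, this would stand in for the direct indexing, e.g. `last_token_pool(outputs.last_hidden_state, inputs["attention_mask"])`, so the result does not depend on the tokenizer's padding side.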
+## Training Details
 The model was fine-tuned using LoRA (Low-Rank Adaptation) via the Swift framework.
+- **Framework**: Swift
+- **Loss Function**: InfoNCE (Temperature: 0.03)
+- **Batch Size**: 4 (per device)
+- **Learning Rate**: 2e-5
+- **LoRA Config**: Rank 8, Alpha 32, Dropout 0.05
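To illustrate what the temperature of 0.03 listed above does, here is a small sketch of the standard single-query InfoNCE loss in plain Python. It is not Swift's implementation, which is not shown here; the function name `info_nce` and the similarity values are made up for illustration.

```python
import math


def info_nce(similarities, positive_index, temperature=0.03):
    """InfoNCE loss for one query, given its similarity to each candidate."""
    logits = [s / temperature for s in similarities]
    # Subtract the max logit before exponentiating for numerical stability.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    # loss = -log(exp(pos) / sum(exp(all))) = log_denominator - pos
    return log_denominator - logits[positive_index]


# Made-up cosine similarities: candidate 0 is the positive, the rest are negatives.
sims = [0.80, 0.70, 0.20]
loss_sharp = info_nce(sims, positive_index=0, temperature=0.03)
loss_soft = info_nce(sims, positive_index=0, temperature=1.0)
print(f"temperature=0.03: {loss_sharp:.4f}")
print(f"temperature=1.00: {loss_soft:.4f}")
```

A low temperature magnifies small similarity gaps, so in this toy case the positive's 0.10 margin over the hardest negative is already enough to drive the loss much lower than it would be at temperature 1.0.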
+## Citation
 If you find this model useful, please consider citing:
+```bibtex
 @misc{euler2025legal,
       title={Euler-Legal-Embedding: Advanced Legal Representation Learning},
       author={LawRank Team},
       year={2025},
       publisher={Hugging Face}
+}
+```