Commit 64cb362
Parent(s): a4ba7b8
update readme

- README.md +34 -15
- de_evaluation_results.png +0 -0

README.md
CHANGED
@@ -3,7 +3,6 @@ tags:
 - sentence-transformers
 - feature-extraction
 - sentence-similarity
-- mteb
 language:
 - de
 - en
@@ -3109,7 +3108,7 @@ model-index:
 <br><br>
 
 <p align="center">
-<img src="https://aeiljuispo.cloudimg.io/v7/https://cdn-uploads.huggingface.co/production/uploads/603763514de52ff951d89793/AFoybzd5lpBQXEBrQHuTt.png?w=200&h=200&f=face" alt="
 </p>
 
 
@@ -3117,6 +3116,9 @@ model-index:
 <b>The text embedding set trained by <a href="https://jina.ai/"><b>Jina AI</b></a>.</b>
 </p>
 
 
 ## Intended Usage & Model Info
 
@@ -3135,13 +3137,17 @@ Des Weiteren stellen wir folgende Embedding-Modelle bereit:
 
 - [`jina-embeddings-v2-small-en`](https://huggingface.co/jinaai/jina-embeddings-v2-small-en): 33 million parameters.
 - [`jina-embeddings-v2-base-en`](https://huggingface.co/jinaai/jina-embeddings-v2-base-en): 137 million parameters.
-- [`jina-embeddings-v2-base-zh`](): Chinese-English Bilingual embeddings
-- [`jina-embeddings-v2-base-de`](): German-English Bilingual embeddings
-- [`jina-embeddings-v2-base-es`](): Spanish-English Bilingual embeddings (soon).
 
 ## Data & Parameters
 
-
 
 ## Usage
 
@@ -3204,9 +3210,29 @@ embeddings = model.encode(
 )
 ```
 
-
 
-
 
 ## Use Jina Embeddings for RAG
 
@@ -3216,13 +3242,6 @@ According to the latest blog post from [LLamaIndex](https://blog.llamaindex.ai/b
 
 <img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/1*ZP2RVejCZovF3FDCg-Bx3A.png" width="780px">
 
-
-## Plans
-
-1. Bilingual embedding models supporting more European & Asian languages, including Spanish, French, Italian and Japanese.
-2. Multimodal embedding models enable Multimodal RAG applications.
-3. High-performt rerankers.
-
 ## Contact
 
 Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.

 - sentence-transformers
 - feature-extraction
 - sentence-similarity
 language:
 - de
 - en

 <br><br>
 
 <p align="center">
+<img src="https://aeiljuispo.cloudimg.io/v7/https://cdn-uploads.huggingface.co/production/uploads/603763514de52ff951d89793/AFoybzd5lpBQXEBrQHuTt.png?w=200&h=200&f=face" alt="Jina AI logo: Jina AI is your Portal to Multimodal AI" width="150px">
 </p>
 
 

 <b>The text embedding set trained by <a href="https://jina.ai/"><b>Jina AI</b></a>.</b>
 </p>
 
+## Quick Start
+
+The easiest way to start using `jina-embeddings-v2-base-de` is to use Jina AI's [Embedding API](https://jina.ai/embeddings/).
 
 ## Intended Usage & Model Info
 

 
 - [`jina-embeddings-v2-small-en`](https://huggingface.co/jinaai/jina-embeddings-v2-small-en): 33 million parameters.
 - [`jina-embeddings-v2-base-en`](https://huggingface.co/jinaai/jina-embeddings-v2-base-en): 137 million parameters.
+- [`jina-embeddings-v2-base-zh`](https://huggingface.co/jinaai/jina-embeddings-v2-base-zh): 161 million parameters, Chinese-English bilingual embeddings.
+- [`jina-embeddings-v2-base-de`](https://huggingface.co/jinaai/jina-embeddings-v2-base-de): 161 million parameters, German-English bilingual embeddings **(you are here)**.
+- _[`jina-embeddings-v2-base-es`](): Spanish-English bilingual embeddings (soon)._
+- _Bilingual embedding models in other world languages (soon)._
+- _Multimodal-input embedding model (soon)._
+- _High-performing reranking model (soon)._
 
 ## Data & Parameters
 
+We will publish a report with technical details about the training of the bilingual models soon.
+The training of the English model is described in this [technical report](https://arxiv.org/abs/2310.19923).
 
 ## Usage
 

 )
 ```
 
+If you want to use the model together with the [sentence-transformers package](https://github.com/UKPLab/sentence-transformers/), make sure that you have installed the latest release and set `trust_remote_code=True` as well:
+
+```
+!pip install -U sentence-transformers
+from sentence_transformers import SentenceTransformer
+from numpy.linalg import norm
+
+cos_sim = lambda a,b: (a @ b.T) / (norm(a)*norm(b))
+model = SentenceTransformer('jinaai/jina-embeddings-v2-base-de', trust_remote_code=True)
+embeddings = model.encode(['How is the weather today?', 'Wie ist das Wetter heute?'])
+print(cos_sim(embeddings[0], embeddings[1]))
+```
+
+## Alternatives to Using Transformers Package
+
+1. _Managed SaaS_: Get started with a free key on Jina AI's [Embedding API](https://jina.ai/embeddings/).
+2. _Private and high-performance deployment_: Get started by picking from our suite of models and deploying them on [AWS SageMaker](https://aws.amazon.com/marketplace/seller-profile?id=seller-stch2ludm6vgy).
+
+## Benchmark Results
 
+We evaluated our bilingual model on all German and English evaluation tasks available on the [MTEB benchmark](https://huggingface.co/blog/mteb). In addition, we evaluated the models against a couple of other German, English, and multilingual models on additional German evaluation tasks:
+
+<img src="de_evaluation_results.png" width="780px">
 
 ## Use Jina Embeddings for RAG
 

 
 <img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/1*ZP2RVejCZovF3FDCg-Bx3A.png" width="780px">
 
 ## Contact
 
 Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.
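
The Quick Start and "Alternatives to Using Transformers Package" sections added in this commit point readers to Jina AI's Embedding API without showing a request. Below is a minimal standard-library Python sketch of such a call. The endpoint `https://api.jina.ai/v1/embeddings`, the JSON fields `model` and `input`, the Bearer-token header, and the OpenAI-style response shape are assumptions to check against the API documentation at https://jina.ai/embeddings/. The helper names and the `JINA_API_KEY` environment variable are hypothetical.

```python
import json
import os
import urllib.request

# Assumed endpoint and model name; verify against the Embedding API docs before use.
API_URL = "https://api.jina.ai/v1/embeddings"
MODEL = "jina-embeddings-v2-base-de"


def build_embedding_request(texts, api_key, model=MODEL):
    """Build (but do not send) a POST request asking the API to embed `texts`."""
    payload = {"model": model, "input": list(texts)}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def embed(texts, api_key):
    """Send the request and return one embedding (list of floats) per input text."""
    req = build_embedding_request(texts, api_key)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Assumed OpenAI-style response: {"data": [{"embedding": [...]}, ...]}
    return [item["embedding"] for item in body["data"]]


if __name__ == "__main__":
    key = os.environ.get("JINA_API_KEY")  # hypothetical variable name
    if key:
        vectors = embed(["How is the weather today?", "Wie ist das Wetter heute?"], key)
        print(len(vectors), len(vectors[0]))
```

Building the request separately from sending it lets you inspect the payload without a network call or an API key. `embed` only runs when a key is present.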
de_evaluation_results.png
ADDED
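
The `cos_sim` lambda in the added sentence-transformers snippet scores the English and German sentences by cosine similarity, a·b / (‖a‖·‖b‖). Below is a minimal NumPy sketch of the same formula, plus a batched variant. It uses small hand-made vectors in place of real model output, so it runs without downloading the model. `cos_sim_matrix` is a hypothetical helper and is not part of sentence-transformers.

```python
import numpy as np


def cos_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two 1-D vectors (same formula as the README's lambda)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def cos_sim_matrix(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities between the rows of A (n x d) and B (m x d)."""
    A_unit = A / np.linalg.norm(A, axis=1, keepdims=True)
    B_unit = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A_unit @ B_unit.T


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real model vectors are much longer.
    en = np.array([1.0, 2.0, 2.0])
    de = np.array([2.0, 4.0, 4.0])      # same direction as `en`
    other = np.array([2.0, -1.0, 0.0])  # orthogonal to `en`
    print(cos_sim(en, de))     # 1.0
    print(cos_sim(en, other))  # 0.0
    print(cos_sim_matrix(np.stack([en, other]), np.stack([de, other])))
```

Normalizing the rows first turns all pairwise comparisons into one matrix product, which is convenient when scoring many sentences at once. For a single pair it gives the same value as `cos_sim`, up to floating-point rounding.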
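
The README's "Use Jina Embeddings for RAG" section (unchanged context in this diff) relies on the retrieval step of RAG: embed the query, score every stored passage, and keep the best few. Below is a minimal NumPy sketch of that step. It assumes the passage and query embeddings were already produced, for example with `model.encode` as in the sentence-transformers snippet. The toy 3-dimensional vectors and the `top_k` helper are hypothetical.

```python
import numpy as np


def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 2) -> list:
    """Return (doc_index, cosine_score) pairs for the k documents most similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    D = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = D @ q                   # one cosine score per document
    order = np.argsort(-scores)[:k]  # document indices by descending score
    return [(int(i), float(scores[i])) for i in order]


if __name__ == "__main__":
    # Toy vectors standing in for embeddings of three passages and one query.
    docs = np.array([
        [1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.6, 0.8, 0.0],
    ])
    query = np.array([0.0, 1.0, 0.0])
    for idx, score in top_k(query, docs, k=2):
        print(idx, round(score, 3))
```

In a RAG pipeline, the returned indices select the passages that are passed to the language model as context.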