Serega6678 committed on
Commit 21a334b · 1 Parent(s): e1a79ba

Update README.md

  1. README.md +39 -3
README.md CHANGED
@@ -15,7 +15,43 @@ inference: false
  tags:
  - mBERT
  - BERT
- - feature extraction
- - entity recognition
  - generic
- ---
+ - entity-recognition
+ ---
+
+ ## Model
+
+ [Multilingual BERT](https://huggingface.co/bert-base-multilingual-cased) fine-tuned on an artificially annotated multilingual subset of the [OSCAR dataset](https://huggingface.co/datasets/oscar-corpus/OSCAR-2201). The model provides domain- and language-independent embeddings for the entity recognition task. The embeddings can be used out of the box or fine-tuned on specific datasets.
+
+ ## Usage
+
+ ```python
+ import torch
+ import transformers
+
+
+ model = transformers.AutoModel.from_pretrained(
+     'numind/entity-recognition-multilingual-general-sota-v1',
+     output_hidden_states=True,
+ )
+ tokenizer = transformers.AutoTokenizer.from_pretrained(
+     'numind/entity-recognition-multilingual-general-sota-v1',
+ )
+
+ text = [
+     "NuMind is an AI company based in Paris and USA.",
+     "NuMind est une entreprise d'IA basée à Paris et aux États-Unis.",
+     "Check other awesome models from NuMind on https://huggingface.co/numind"
+ ]
+ encoded_input = tokenizer(text, return_tensors='pt', padding=True, truncation=True)
+ output = model(**encoded_input)
+
+ # for better quality
+ emb = torch.cat(
+     (output.hidden_states[-1], output.hidden_states[-7]),
+     dim=2
+ )
+
+ # for better speed
+ # emb = output.hidden_states[-1]
+ ```
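
Note on the Usage snippet: the README says the embeddings can be used out of the box for entity recognition, but does not show what that might look like. The sketch below is one possible way to do it, not the authors' method: a linear layer that maps each token embedding to label logits, with a loss that skips padding positions. It assumes the model keeps the BERT-base hidden size of 768 (so the concatenated `emb` has 1536 features per token). `TokenClassifier`, `masked_token_loss`, `NUM_LABELS`, and the random stand-in tensors are all hypothetical names and data introduced for illustration.

```python
import torch
from torch import nn

# Assumption: BERT-base hidden size is 768, so two concatenated layers give 1536.
HIDDEN_SIZE = 768
EMB_DIM = 2 * HIDDEN_SIZE
NUM_LABELS = 5  # illustrative only, e.g. O, B-ORG, I-ORG, B-LOC, I-LOC


class TokenClassifier(nn.Module):
    """Hypothetical linear probe: one label distribution per token."""

    def __init__(self, emb_dim: int, num_labels: int) -> None:
        super().__init__()
        self.head = nn.Linear(emb_dim, num_labels)

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        # emb: (batch, seq_len, emb_dim) -> logits: (batch, seq_len, num_labels)
        return self.head(emb)


def masked_token_loss(
    logits: torch.Tensor, labels: torch.Tensor, attention_mask: torch.Tensor
) -> torch.Tensor:
    """Cross-entropy over real tokens; padding positions are ignored."""
    labels = labels.masked_fill(attention_mask == 0, -100)
    return nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels.reshape(-1),
        ignore_index=-100,
    )


torch.manual_seed(0)
# Random stand-ins for `emb` and `encoded_input["attention_mask"]`,
# so the sketch runs without downloading the model.
emb = torch.randn(3, 12, EMB_DIM)
attention_mask = torch.ones(3, 12, dtype=torch.long)
attention_mask[1, 8:] = 0  # pretend the second sentence is padded
labels = torch.randint(0, NUM_LABELS, (3, 12))

classifier = TokenClassifier(EMB_DIM, NUM_LABELS)
logits = classifier(emb)
loss = masked_token_loss(logits, labels, attention_mask)
```

With the real model, `emb` and `attention_mask` would come from the Usage snippet, and `labels` from a token-level annotated dataset. Because `padding=True` pads shorter sentences in the batch, masking the loss keeps padding tokens from influencing training.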