---
license: cc-by-nc-4.0
---
<div style="width: auto; margin-left: auto; margin-right: auto; background-color:black">
<img src="https://assets-global.website-files.com/6423879a8f63c1bb18d74bfa/648818d56d04c3bdf36d71ab_Refuel_rev8-01_ts-p-1600.png" alt="Refuel.ai" style="width: 100%; min-width: 400px; display: block; margin: auto;">
</div>

## Model Details

We’re thrilled to introduce RefuelLLM-2 and RefuelLLM-2-small, the next generation of our large language models purpose-built for data labeling, enrichment, and cleaning.

1. RefuelLLM-2 (83.82%) outperforms all state-of-the-art LLMs, including GPT-4-Turbo (80.88%), Claude-3-Opus (79.19%) and Gemini-1.5-Pro (74.59%), across a benchmark of ~30 data labeling tasks.
2. RefuelLLM-2 is built on a Mixtral-8x7B base model and trained on a corpus of 2750+ datasets, spanning tasks such as classification, reading comprehension, structured attribute extraction and entity resolution.
3. RefuelLLM-2-small (79.67%), aka Llama-3-Refueled, outperforms all comparable LLMs including Claude-3-Sonnet (70.99%), Claude-3-Haiku (69.23%) and GPT-3.5-Turbo (68.13%). The model was trained with the same recipe as RefuelLLM-2, but on top of a Llama3-8B base.

As part of this announcement, we are open-sourcing RefuelLLM-2-small for the community to build on top of.

**Model developers** Refuel AI

**Input** Models input text only.

**Output** Models generate text only.

**Model Architecture** RefuelLLM-2-small is built on top of Llama-3-8B-instruct, an auto-regressive language model that uses an optimized transformer architecture.

**Model Release Date** May 8, 2024.

## How to use

This repository contains weights for RefuelLLM-2-small that are compatible with the Hugging Face Transformers library.

### Use with transformers

See the snippet below for usage with Transformers:

```python
>>> import torch
>>> from transformers import AutoModelForCausalLM, AutoTokenizer

>>> # Load the tokenizer and model weights from the Hugging Face Hub
>>> model_id = "refuelai/Llama-3-Refueled"
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

>>> # Format the labeling request as a single-turn chat message
>>> messages = [{"role": "user", "content": "Is this comment toxic or non-toxic: RefuelLLM is the new way to label text data!"}]

>>> # Apply the chat template and move the input ids to the GPU
>>> inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to("cuda")

>>> # Generate the label and decode it back to text
>>> outputs = model.generate(inputs, max_new_tokens=20)
>>> print(tokenizer.decode(outputs[0]))
```
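
The call above prints the full decoded sequence, including the prompt and any special tokens. If you only want the generated label, a minimal follow-up (assuming the `inputs` and `outputs` tensors from the snippet above) is to slice off the prompt tokens before decoding:

```python
>>> # Keep only the newly generated tokens (everything after the prompt)
>>> response_ids = outputs[0][inputs.shape[-1]:]
>>> print(tokenizer.decode(response_ids, skip_special_tokens=True))
```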

## Training Data

RefuelLLM-2 and RefuelLLM-2-small were both trained on over 4 billion tokens, spanning 2750+ NLP tasks. Our training collection consists mainly of:
1. Human-annotated datasets like Flan, Task Source, and the Aya collection
2. Synthetic datasets like OpenOrca, OpenHermes and WizardLM
3. Proprietary datasets developed or licensed by Refuel

## Benchmarks

In this section, we report the results for Refuel models on our benchmark of labeling tasks. For details on the methodology, see [here](https://refuel.ai/blog-posts/announcing-refuel-llm-2).

<table>
<tr><th rowspan="2">Provider</th><th rowspan="2">Model</th><th colspan="5" style="text-align: center">LLM Output Quality (by task type)</th></tr>
<tr><th>Overall</th><th>Classification</th><th>Reading Comprehension</th><th>Structure Extraction</th><th>Entity Matching</th></tr>
<tr><td>Refuel</td><td>RefuelLLM-2</td><td>83.82%</td><td>84.94%</td><td>76.03%</td><td>88.16%</td><td>92.00%</td></tr>
<tr><td>OpenAI</td><td>GPT-4-Turbo</td><td>80.88%</td><td>81.77%</td><td>72.08%</td><td>84.79%</td><td>97.20%</td></tr>
<tr><td>Refuel</td><td>RefuelLLM-2-small</td><td>79.67%</td><td>81.72%</td><td>70.04%</td><td>84.28%</td><td>92.00%</td></tr>
<tr><td>Anthropic</td><td>Claude-3-Opus</td><td>79.19%</td><td>82.49%</td><td>67.30%</td><td>88.25%</td><td>94.96%</td></tr>
<tr><td>Meta</td><td>Llama3-70B-Instruct</td><td>78.20%</td><td>79.38%</td><td>66.03%</td><td>85.96%</td><td>94.13%</td></tr>
<tr><td>Google</td><td>Gemini-1.5-Pro</td><td>74.59%</td><td>73.52%</td><td>60.67%</td><td>84.27%</td><td>98.48%</td></tr>
<tr><td>Anthropic</td><td>Claude-3-Sonnet</td><td>70.99%</td><td>79.91%</td><td>45.44%</td><td>78.10%</td><td>96.34%</td></tr>
<tr><td>Anthropic</td><td>Claude-3-Haiku</td><td>69.23%</td><td>77.27%</td><td>50.19%</td><td>84.97%</td><td>54.08%</td></tr>
<tr><td>OpenAI</td><td>GPT-3.5-Turbo</td><td>68.13%</td><td>74.39%</td><td>53.21%</td><td>69.40%</td><td>80.41%</td></tr>
<tr><td>Mistral</td><td>Mixtral-8x7B-Instruct</td><td>62.87%</td><td>79.11%</td><td>45.56%</td><td>47.08%</td><td>86.52%</td></tr>
<tr><td>Meta</td><td>Llama3-8B-Instruct</td><td>62.30%</td><td>68.52%</td><td>49.16%</td><td>65.09%</td><td>63.61%</td></tr>
</table>

## Limitations

RefuelLLM-2-small does not have any moderation mechanisms. We look forward to engaging with the community on ways to make the model respect guardrails, allowing for deployment in environments requiring moderated outputs.
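
In the meantime, one possible stopgap (purely illustrative and not shipped with the model) is to wrap generation in an external moderation check of your own choosing. The `is_allowed` predicate and deny-list below are hypothetical placeholders you would replace with a real moderation classifier or policy service:

```python
# A minimal sketch of an external guardrail around model outputs.
# `BLOCKED_TERMS` and `is_allowed` are hypothetical placeholders; swap in
# your own moderation classifier or policy check before deploying.
BLOCKED_TERMS = {"example-blocked-term"}

def is_allowed(text: str) -> bool:
    """Return False if the text contains any term on the deny-list."""
    return not any(term in text.lower() for term in BLOCKED_TERMS)

def moderated_label(generate_fn, prompt: str, fallback: str = "REVIEW_REQUIRED") -> str:
    """Run the model, then suppress outputs that fail the moderation check."""
    output = generate_fn(prompt)
    return output if is_allowed(output) else fallback
```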