because it is a well-known model of similar size and structure to mine: it is 8B while mine is 7B, and it is an Instruct model like mine. Additionally, it performs well when generating text, which is an essential baseline capability for my model. I chose the other comparison model [deepseek-ai/DeepSeek-R1-0528-Qwen3-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B) for similar reasons: its size is approximately the same, and it is built off of the Qwen base model, just like mine. This allows me to see how well my fine-tuning performed compared to other models that use Qwen as a baseline.
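As a concrete illustration (a sketch, not the exact evaluation setup used for the results below), a comparison model can be pulled from the Hugging Face Hub with `transformers` and prompted side by side with mine; the E2E-NLG-style prompt here is hypothetical:

```python
# Illustrative sketch only: load one comparison model and generate text for a
# side-by-side check. The prompt is a made-up E2E-NLG-style input, not taken
# from the actual evaluation data.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "name[The Mill], food[Italian], priceRange[cheap]"  # hypothetical input
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```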
Overall, my model does not perform better than the baseline model on the testing split, but its high BERTScore results on the testing split of the training data still indicate that it generates accurate text and performs well with my dataset. My model did perform better than the Llama model on HumanEval and the E2E NLG Challenge, and it also performed better than DeepSeek's Qwen3 model on the E2E NLG Challenge and the testing split. In general, my model has mixed results in its evaluation, but it performs close to the comparison models.
Additionally, the actual outputs of the model are coherent and relevant, indicating that even though some benchmark numbers are low, the model still performs its task well.
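For reference, this is a minimal sketch of how such a BERTScore comparison can be computed, assuming the `bert-score` package (`pip install bert-score`); the candidate and reference strings are placeholders rather than actual data from the testing split:

```python
# Minimal BERTScore sketch using the bert-score package.
# The candidate/reference strings are placeholders, not the real eval data.
from bert_score import score

candidates = ["The Mill is a cheap Italian restaurant."]      # model outputs
references = ["The Mill serves Italian food at low prices."]  # gold references

# Returns per-pair precision, recall, and F1 tensors.
P, R, F1 = score(candidates, references, lang="en")
print(f"BERTScore F1: {F1.mean().item():.4f}")
```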
# Usage and Intended Use