ocbyram committed · commit 553b802 · verified · 1 parent: 16b416a

Update README.md

Files changed (1): README.md (+2 −1)
README.md CHANGED
@@ -100,8 +100,9 @@ If my model performs poorly, I know that my synthetic data overfit the model and
  because it is a well known model of similar size and structure to mine. It is 8B while mine is 7B, and is also Instruct like mine is. Additionally, it performs well when generating text, which is an essential baseline
  capability of my model. I chose the other comparison model [deepseek-ai/DeepSeek-R1-0528-Qwen3-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B) for similar reasons, the size is approximately the same and it is a version built of off the base model Qwen, just like mine is.
  This will allow me to see how well my finetuning performed as compared to other models that use Qwen as a baseline. Overall, my model does not perform better than the baseline model for the testing split, but the high bert scores for
- the testing split of training data still indicate that my model generates accurate text and performs well with my dataset. My model did perform better than the llama model when t came to HumanEval and E2E NLG Challenge,
+ the testing split of training data still indicate that my model generates accurate text and performs well with my dataset. My model did perform better than the llama model when it came to HumanEval and E2E NLG Challenge,
  it also performed better against deepseek's Qwen3 model when it came to E2E NLG Challenge and the testing split. In general, my model has mixed results in its evaluation, but it performs closely to the comparison models.
+ Additionally, the actual outputs of the model are coherent and relevant, indicating that while the benchmarks are low, the model still performs its task well.
 
  # Usage and Intended Use
 