charent/Phi2-Chinese-0.2B


charent committed on Jan 4, 2024

Commit 7700418 · 1 Parent(s): a5f918f

Update README.md

Files changed (1)

README.md +11 -1


@@ -8,6 +8,15 @@ library_name: transformers
 tags:
 - text-generation-inference
 pipeline_tag: text-generation
 ---
 # Phi2-Chinese-0.2B: Train your own small Chinese Phi2 model from scratch
@@ -62,7 +71,8 @@ text = f"##提问:\n{example['instruction']}\n##回答:\n{example['output'][EOS]
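 Remember to append the `EOS` end-of-sentence special token; otherwise the model will not know when to stop at `decode` time. The `BOS` beginning-of-sentence token is optional.
-# 5. 📝DPO preference optimization
 Code: [dpo.ipynb](https://github.com/charent/Phi2-mini-Chinese/blob/main/4.dpo.ipynb)
 Fine-tune the SFT model according to personal preference. The dataset needs three columns: `prompt`, `chosen`, and `rejected`. Part of the `rejected` column was generated by an early-stage model from the SFT phase (for example, if SFT runs for 4 `epoch`s, take the checkpoint at 0.5 `epoch`). If a generated `rejected` answer has a similarity of 0.9 or higher with `chosen`, that sample is discarded.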

 tags:
 - text-generation-inference
 pipeline_tag: text-generation
+widget:
+- text: "##提问:\n感冒了要怎么办？\n##回答:\n"
+  example_title: "感冒了要怎么办？"
+- text: "##提问:\n介绍一下Apple公司\n##回答:\n"
+  example_title: "介绍一下Apple公司"
+- text: "##提问:\n现在外面天气怎么样\n##回答:\n"
+  example_title: "现在外面天气怎么样"
+- text: "##提问:\n推荐一份可口的午餐\n##回答:\n"
+  example_title: "推荐一份可口的午餐"
 ---
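 # Phi2-Chinese-0.2B: Train your own small Chinese Phi2 model from scratch
 Remember to append the `EOS` end-of-sentence special token; otherwise the model will not know when to stop at `decode` time. The `BOS` beginning-of-sentence token is optional.
+# 5. 📝RLHF optimization
+This project uses the DPO optimization method.
 Code: [dpo.ipynb](https://github.com/charent/Phi2-mini-Chinese/blob/main/4.dpo.ipynb)
 Fine-tune the SFT model according to personal preference. The dataset needs three columns: `prompt`, `chosen`, and `rejected`. Part of the `rejected` column was generated by an early-stage model from the SFT phase (for example, if SFT runs for 4 `epoch`s, take the checkpoint at 0.5 `epoch`). If a generated `rejected` answer has a similarity of 0.9 or higher with `chosen`, that sample is discarded.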
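
The filtering rule in the README text above (drop a sample when `rejected` is too close to `chosen`) can be sketched roughly as follows. This is a minimal illustration, not the code from `dpo.ipynb`: the README does not say which similarity metric was used, so `difflib.SequenceMatcher` stands in as an assumption, and the names `similarity`, `build_dpo_rows`, and `SIMILARITY_THRESHOLD` are hypothetical.

```python
from difflib import SequenceMatcher

# Assumption: the README does not name its similarity metric.
# difflib's character-level ratio is used here purely for illustration.
SIMILARITY_THRESHOLD = 0.9


def similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1] between two strings."""
    return SequenceMatcher(None, a, b).ratio()


def build_dpo_rows(samples, threshold=SIMILARITY_THRESHOLD):
    """Build the three-column DPO dataset (prompt, chosen, rejected).

    Samples whose rejected answer is at least `threshold` similar to the
    chosen answer are dropped, since they carry little preference signal.
    """
    rows = []
    for sample in samples:
        if similarity(sample["chosen"], sample["rejected"]) >= threshold:
            continue  # too similar: discard this sample
        rows.append(
            {
                "prompt": sample["prompt"],
                "chosen": sample["chosen"],
                "rejected": sample["rejected"],
            }
        )
    return rows


# Hypothetical samples: `rejected` would come from an early SFT checkpoint.
samples = [
    {
        "prompt": "##提问:\n感冒了要怎么办？\n##回答:\n",
        "chosen": "多喝水，注意休息。",
        "rejected": "多喝水，注意休息。",  # identical to chosen, so it is dropped
    },
    {
        "prompt": "##提问:\n介绍一下Apple公司\n##回答:\n",
        "chosen": "Apple is a technology company.",
        "rejected": "I do not know.",
    },
]
dpo_rows = build_dpo_rows(samples)
```

An embedding-based or n-gram metric could be swapped in for `similarity` without changing the rest of the sketch; only the 0.9 cutoff comes from the README.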