Kedreamix/Linly-Talker

Kedreamix committed on May 2, 2024

Commit dca1f25 (verified) · 1 parent: 7a9f647

Update README.md


README.md +199 -2


@@ -16,6 +16,7 @@
 </div>
 **2023.12 Update** 📆
 **Users can upload any image to hold a conversation.**
@@ -34,6 +35,12 @@
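 - **Added the GPT-SoVITS voice cloning model, which can clone a voice by fine-tuning on about one minute of the target speaker's audio. The results are quite good, and it is recommended.**
 - **Integrated a WebUI that makes Linly-Talker easier to run.**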
 ---
 <details>
@@ -93,11 +100,41 @@ The design philosophy of Linly-Talker is to create a new form of human-computer interaction, not only
 ![The system architecture of multimodal human–computer interaction.](https://github.com/Kedreamix/Linly-Talker/raw/main/docs/HOI.png)
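 > Watch our introduction video: [demo video](https://www.bilibili.com/video/BV1rN4y1a76x/)
 ###### For model files and weights, please visit the “Model Files” page.
 **Downloading from HuggingFace**
 If the download is too slow, consider using a mirror; see [A Quick and Easy Way to Get Hugging Face Models (Using a Mirror Site)](https://kedreamix.github.io/2024/01/05/Note/HuggingFace/?highlight=镜像)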
@@ -129,7 +166,7 @@ model_dir = snapshot_download('Kedreamix/Linly-Talker')
 ```bash
 # Move all models into the current directory
 # checkpoints contains the SadTalker and Wav2Lip weights
-mv Linly-Talker/chechpoints/* ./checkpoints/
 # GFPGAN enhancement for SadTalker
 # pip install gfpgan
@@ -142,4 +179,164 @@ mv Linly-Talker/GPT_SoVITS/pretrained_models/* ./GPT_SoVITS/pretrained_models/
 mv Linly-Talker/Qwen ./
 ```

 </div>
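 **2023.12 Update** 📆
 **Users can upload any image to hold a conversation.**
 - **Added the GPT-SoVITS voice cloning model, which can clone a voice by fine-tuning on about one minute of the target speaker's audio. The results are quite good, and it is recommended.**
 - **Integrated a WebUI that makes Linly-Talker easier to run.**
+**2024.04 Update** 📆
+- **Added Paddle TTS as an offline option in addition to Edge TTS.**
+- **Added ER-NeRF as one of the options for avatar generation.**
+- **Updated app_talk.py, which generates video from freely uploaded audio and images or video, without requiring a conversation scenario.**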
 ---
 <details>
 ![The system architecture of multimodal human–computer interaction.](https://github.com/Kedreamix/Linly-Talker/raw/main/docs/HOI.png)
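 > Watch our introduction video: [demo video](https://www.bilibili.com/video/BV1rN4y1a76x/)
+>
+> I have recorded a series of videos on Bilibili that document each update and how to use it. For details, see the [Digital Human Intelligent Dialogue System - Linly-Talker collection](https://space.bilibili.com/241286257/channel/collectiondetail?sid=2065753)
+>
+> - [🔥🔥🔥Digital human dialogue system Linly-Talker🔥🔥🔥](https://www.bilibili.com/video/BV1rN4y1a76x/)
+> - [🚀The future of digital humans: how Linly-Talker + GPT-SoVIT voice cloning empowers them](https://www.bilibili.com/video/BV1S4421A7gh/)
+> - [Deploying Linly-Talker on the AutoDL platform (a very detailed tutorial for complete beginners)](https://www.bilibili.com/video/BV1uT421m74z/)
+> - [Linly-Talker update: offline TTS integration and custom digital human solutions](https://www.bilibili.com/video/BV1Mr421u7NN/)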
+## TO DO LIST
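+- [x] Completed the basic dialogue pipeline, enabling `voice conversation`
+- [x] Added large language models, including `Linly`, `Qwen`, and `GeminiPro`
+- [x] Support for uploading `any digital human photo` for conversation
+- [x] Added a `FastAPI` interface for calling Linly
+- [x] Added advanced options for Microsoft `TTS`, so the voice, pitch, and other parameters can be set for greater vocal variety
+- [x] Added `subtitles` to generated videos for better visualization
+- [x] GPT `multi-turn dialogue` system (improves the interactivity and realism of the digital human and makes it more intelligent)
+- [x] Optimized the Gradio interface and added more models, such as Wav2Lip and FunASR
+- [x] `Voice cloning` with GPT-SoVITS, requiring only a simple fine-tune on one minute of speech (cloning your own voice improves the realism and interactive experience of your digital avatar)
+- [x] Added offline TTS and NeRF-based methods and models
+- [ ] `Real-time` speech recognition (so that people and digital humans can converse by voice)
+🔆 The Linly-Talker project is ongoing, and pull requests are welcome! If you have suggestions about new models, methods, research, or techniques, or if you find a bug, feel free to edit and submit a PR. You can also open an issue or contact me directly by email. 📩⭐ If you find this GitHub project useful, please give it a star! 🤩
+> If you run into any problems during deployment, see [常见问题汇总.md](https://github.com/Kedreamix/Linly-Talker/blob/main/常见问题汇总.md) (the FAQ summary), where I have collected the problems that may occur. The discussion group is also listed there, and I update it regularly. Thank you for your interest and support!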
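 ###### For model files and weights, please visit the “Model Files” page.
+Next, download the required models using one of the options below. After downloading, place the files according to the folder structure described at the end of this document.
+- [Baidu (Baidu Netdisk)](https://pan.baidu.com/s/1eF13O-8wyw4B3MtesctQyg?pwd=linl) (Password: `linl`)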
+- [huggingface](https://huggingface.co/Kedreamix/Linly-Talker)
+- [modelscope](https://www.modelscope.cn/models/Kedreamix/Linly-Talker/summary)
 **Downloading from HuggingFace**
 If the download is too slow, consider using a mirror; see [A Quick and Easy Way to Get Hugging Face Models (Using a Mirror Site)](https://kedreamix.github.io/2024/01/05/Note/HuggingFace/?highlight=镜像)
 ```bash
 # Move all models into the current directory
 # checkpoints contains the SadTalker and Wav2Lip weights
+mv Linly-Talker/checkpoints/* ./checkpoints
 # GFPGAN enhancement for SadTalker
 # pip install gfpgan
 mv Linly-Talker/Qwen ./
 ```
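
For readers who prefer to script this step, the `mv` commands above can be approximated in Python. This is only a sketch, not part of the repository: the helper name `move_contents` is hypothetical, and it assumes the downloaded `Linly-Talker/` folder sits next to the project folders. Unlike a bare `mv`, it creates the destination folder first and skips entries that already exist.

```python
import shutil
from pathlib import Path


def move_contents(src_dir, dst_dir):
    """Move every entry in src_dir into dst_dir, creating dst_dir if needed.

    Hypothetical helper, roughly equivalent to `mv src_dir/* dst_dir`.
    Returns the names of the entries that were moved.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"source folder not found: {src}")
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for entry in sorted(src.iterdir()):
        target = dst / entry.name
        if target.exists():
            # Skip rather than overwrite weights that are already in place.
            continue
        shutil.move(str(entry), str(target))
        moved.append(entry.name)
    return moved
```

For example, `move_contents("Linly-Talker/checkpoints", "./checkpoints")` mirrors the first `mv` line shown above.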
+To make deployment easier, a `configs.py` file has been added; edit it to change the settings below.
+```python
+# Port the app runs on
+port = 7860
+# API port and IP
+mode = 'api' # 'api' mode requires running Linly-api-fast.py first; currently it only works with Linly
+# Local only: '127.0.0.1' (localhost); to listen on all interfaces: "0.0.0.0"
+ip = '127.0.0.1'
+api_port = 7871
+# Linly model path
+mode = 'offline' # this later assignment overrides mode = 'api' above
+model_path = 'Qwen/Qwen-1_8B-Chat'
+# SSL certificate; required for microphone conversation
+# Absolute paths are recommended
+ssl_certfile = "./https_cert/cert.pem"
+ssl_keyfile = "./https_cert/key.pem"
+```
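
As an illustration of how these settings might be consumed, the sketch below turns them into keyword arguments for Gradio's `launch()` (`server_name`, `server_port`, `ssl_certfile`, and `ssl_keyfile` are real `launch()` parameters). The `ServerConfig` class and `build_launch_kwargs` function are hypothetical names used only for this example; how `webui.py` actually reads `configs.py` is not shown in the README, so treat this as an assumption.

```python
import os
from dataclasses import dataclass


@dataclass
class ServerConfig:
    """Hypothetical container mirroring the fields shown in configs.py."""
    port: int = 7860
    ip: str = "127.0.0.1"
    ssl_certfile: str = "./https_cert/cert.pem"
    ssl_keyfile: str = "./https_cert/key.pem"


def build_launch_kwargs(cfg, use_ssl=True):
    """Build keyword arguments suitable for a Gradio launch() call."""
    kwargs = {"server_name": cfg.ip, "server_port": cfg.port}
    if use_ssl:
        # The README recommends absolute paths for the certificate files.
        kwargs["ssl_certfile"] = os.path.abspath(cfg.ssl_certfile)
        kwargs["ssl_keyfile"] = os.path.abspath(cfg.ssl_keyfile)
    return kwargs
```

With a Gradio app object named `demo` (also an assumption), the result could be passed as `demo.launch(**build_launch_kwargs(ServerConfig()))`.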
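+## Launching the WebUI
+Previously the different versions were kept as separate scripts, and running several of them was cumbersome, so I combined them into a single WebUI where everything can be tried in one interface. It will continue to be updated.
+The WebUI currently includes the following features:
+- [x] Text/voice digital human conversation (fixed digital humans, with male and female characters)
+- [x] Digital human conversation with any image (upload any digital human)
+- [x] Multi-turn GPT conversation (incorporates dialogue history to maintain context)
+- [x] Voice cloning conversation (voice cloning based on GPT-SoVITS settings, with a built-in smoky voice; can clone from the voice used in the conversation)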
+```bash
+# WebUI
+python webui.py
+```
+![](https://github.com/Kedreamix/Linly-Talker/raw/main/docs/WebUI.png)
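+There are currently several launch modes; choose the one that fits your scenario.
+The first mode supports Q&A with a fixed character only. The character is preset, which saves preprocessing time.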
+```bash
+python app.py
+```
+![](https://github.com/Kedreamix/Linly-Talker/raw/main/docs/UI.png)
+The first mode was recently updated with a version that uses the Wav2Lip model for conversation.
+```bash
+python appv2.py
+```
+The second mode lets you upload any image for conversation.
+```bash
+python app_img.py
+```
+![](https://github.com/Kedreamix/Linly-Talker/raw/main/docs/UI2.png)
+The third mode builds on the first by adding a large language model for multi-turn GPT conversation.
+```bash
+python app_multi.py
+```
+![](https://github.com/Kedreamix/Linly-Talker/raw/main/docs/UI3.png)
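+Voice cloning has now been added, so you can freely switch between your own cloned voice models and matching portrait images. Here I chose a smoky voice and a picture of a man.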
+```bash
+python app_vits.py
+```
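+A fourth mode has been added that is not tied to a conversation scenario: provide audio directly, or generate it, to produce a digital human video. SadTalker, Wav2Lip, and ER-NeRF are built in.
+> ER-NeRF is trained on video of a single person, so you need to swap in the model for that specific person to render correct results. Weights for Obama are included and can be used directly.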
+```bash
+python app_talk.py
+```
+![](https://github.com/Kedreamix/Linly-Talker/raw/main/docs/UI4.png)
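
The launch modes described above can be summarized as a small lookup table. The script names come from the README; the short mode keys and the `launch_command` helper are hypothetical labels introduced here for illustration, not something the repository provides.

```python
import sys

# Script names are from the README; the mode keys are hypothetical labels.
LAUNCH_SCRIPTS = {
    "webui": "webui.py",          # combined WebUI
    "fixed": "app.py",            # fixed-character Q&A
    "fixed_wav2lip": "appv2.py",  # fixed character with Wav2Lip
    "image": "app_img.py",        # upload any image
    "multi": "app_multi.py",      # multi-turn GPT conversation
    "clone": "app_vits.py",       # voice cloning
    "talk": "app_talk.py",        # free-form audio/image/video generation
}


def launch_command(mode):
    """Return the argv list that would start the script for the given mode."""
    if mode not in LAUNCH_SCRIPTS:
        raise ValueError(
            f"unknown mode {mode!r}; choose from {sorted(LAUNCH_SCRIPTS)}"
        )
    return [sys.executable, LAUNCH_SCRIPTS[mode]]
```

From the repository root, the returned list could be handed to `subprocess.run(launch_command("talk"), check=True)`.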
+## Folder Structure
+All the weights can be downloaded from:
+- [Baidu (Baidu Netdisk)](https://pan.baidu.com/s/1eF13O-8wyw4B3MtesctQyg?pwd=linl) (Password: `linl`)
+- [huggingface](https://huggingface.co/Kedreamix/Linly-Talker)
+- [modelscope](https://www.modelscope.cn/models/Kedreamix/Linly-Talker/files) coming soon
+The weights folder structure is as follows:
+```bash
+Linly-Talker/
+├── checkpoints
+│   ├── hub
+│   │   └── checkpoints
+│   │       └── s3fd-619a316812.pth
+│   ├── lipsync_expert.pth
+│   ├── mapping_00109-model.pth.tar
+│   ├── mapping_00229-model.pth.tar
+│   ├── SadTalker_V0.0.2_256.safetensors
+│   ├── visual_quality_disc.pth
+│   ├── wav2lip_gan.pth
+│   └── wav2lip.pth
+├── gfpgan
+│   └── weights
+│       ├── alignment_WFLW_4HG.pth
+│       └── detection_Resnet50_Final.pth
+├── GPT_SoVITS
+│   └── pretrained_models
+│       ├── chinese-hubert-base
+│       │   ├── config.json
+│       │   ├── preprocessor_config.json
+│       │   └── pytorch_model.bin
+│       ├── chinese-roberta-wwm-ext-large
+│       │   ├── config.json
+│       │   ├── pytorch_model.bin
+│       │   └── tokenizer.json
+│       ├── README.md
+│       ├── s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt
+│       ├── s2D488k.pth
+│       ├── s2G488k.pth
+│       └── speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch
+├── Qwen
+│   └── Qwen-1_8B-Chat
+│       ├── assets
+│       │   ├── logo.jpg
+│       │   ├── qwen_tokenizer.png
+│       │   ├── react_showcase_001.png
+│       │   ├── react_showcase_002.png
+│       │   └── wechat.png
+│       ├── cache_autogptq_cuda_256.cpp
+│       ├── cache_autogptq_cuda_kernel_256.cu
+│       ├── config.json
+│       ├── configuration_qwen.py
+│       ├── cpp_kernels.py
+│       ├── examples
+│       │   └── react_prompt.md
+│       ├── generation_config.json
+│       ├── LICENSE
+│       ├── model-00001-of-00002.safetensors
+│       ├── model-00002-of-00002.safetensors
+│       ├── modeling_qwen.py
+│       ├── model.safetensors.index.json
+│       ├── NOTICE
+│       ├── qwen_generation_utils.py
+│       ├── qwen.tiktoken
+│       ├── README.md
+│       ├── tokenization_qwen.py
+│       └── tokenizer_config.json
+└── README.md
+```
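
Before launching, it can help to confirm that the weights landed where the tree above expects them. The sketch below checks a subset of the listed files; `EXPECTED_FILES` and `missing_weights` are hypothetical names, and the list is deliberately incomplete, so extend it to cover the models you actually use.

```python
from pathlib import Path

# A subset of the files listed in the folder tree; extend as needed.
EXPECTED_FILES = [
    "checkpoints/lipsync_expert.pth",
    "checkpoints/SadTalker_V0.0.2_256.safetensors",
    "checkpoints/wav2lip.pth",
    "checkpoints/wav2lip_gan.pth",
    "gfpgan/weights/alignment_WFLW_4HG.pth",
    "gfpgan/weights/detection_Resnet50_Final.pth",
    "GPT_SoVITS/pretrained_models/s2D488k.pth",
    "GPT_SoVITS/pretrained_models/s2G488k.pth",
    "Qwen/Qwen-1_8B-Chat/config.json",
]


def missing_weights(root, expected=EXPECTED_FILES):
    """Return the expected relative paths that are not files under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]
```

Calling `missing_weights("Linly-Talker")` returns an empty list when every listed file is present, and otherwise the relative paths that still need to be downloaded or moved.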