Update README.md
README.md CHANGED
@@ -13,23 +13,30 @@ inference: false
 # CogVLM2-Llama3-Caption
 
 <div align="center">
 <img src=https://raw.githubusercontent.com/THUDM/CogVLM2/cf9cb3c60a871e0c8e5bde7feaf642e3021153e6/resources/logo.svg>
 </div>
 
+
 # Introduction
 
 Typically, most video data does not come with corresponding descriptive text, so it is necessary to convert the video
 data into textual descriptions to provide the essential training data for text-to-video models.
+CogVLM2-Caption is a video captioning model used to generate training data for the CogVideoX model.
+
+<div align="center">
+<img width="600px" height="auto" src="./CogVLM2-Caption-example.png">
+</div>
 
 ## Usage
 
 ```python
 import io
+
+import argparse
 import numpy as np
 import torch
 from decord import cpu, VideoReader, bridge
 from transformers import AutoModelForCausalLM, AutoTokenizer
-import argparse
 
 MODEL_PATH = "THUDM/cogvlm2-llama3-caption"
 
@@ -77,7 +84,6 @@ def load_video(video_data, strategy='chat'):
 tokenizer = AutoTokenizer.from_pretrained(
     MODEL_PATH,
     trust_remote_code=True,
-    # padding_side="left"
 )
 
 model = AutoModelForCausalLM.from_pretrained(
@@ -132,7 +138,6 @@ def test():
 
 if __name__ == '__main__':
     test()
-
 ```
 
 ## License
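The hunks above only show the changed parts of the Usage snippet, so the surrounding code is easy to lose track of. Below is a minimal sketch of how the imports, the `load_video`-style frame sampling, and the tokenizer/model loading shown in the hunks fit together. The uniform 24-frame sampling, the bfloat16 dtype, the `cuda` device, and the `demo.mp4` path are illustrative assumptions, not the model card's exact settings; the card's own `load_video` selects frames according to its `chat`/`base` strategy, and caption generation goes through the checkpoint's `trust_remote_code` conversation API, which is omitted here.

```python
import io

import numpy as np
import torch
from decord import cpu, VideoReader, bridge
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "THUDM/cogvlm2-llama3-caption"


def sample_frames(video_bytes, num_frames=24):
    """Decode mp4 bytes with decord and return a (C, T, H, W) frame tensor.

    Uniform sampling is an assumption for brevity; the model card's
    load_video picks frames according to its 'chat'/'base' strategy.
    """
    bridge.set_bridge('torch')                       # have decord return torch tensors
    vr = VideoReader(io.BytesIO(video_bytes), ctx=cpu(0))
    indices = np.linspace(0, len(vr) - 1, num_frames, dtype=int)
    frames = vr.get_batch(indices)                   # (T, H, W, C)
    return frames.permute(3, 0, 1, 2)                # channel-first video layout


tokenizer = AutoTokenizer.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,                      # assumption: bf16 weights on a CUDA GPU
    trust_remote_code=True,
).eval().to('cuda')

with open('demo.mp4', 'rb') as f:                    # hypothetical local clip
    video = sample_frames(f.read())

# Prompt construction and model.generate(...) follow the checkpoint's
# remote-code conversation API, as in the full Usage section of the README.
```

Reading the clip as raw bytes mirrors the `video_data` argument of `load_video(video_data, strategy='chat')` referenced in the second hunk's header.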