> 📝 Click on the language section to expand / 言語をクリックして展開
## Dataset Configuration
Please create a TOML file for dataset configuration.
Image and video datasets are supported. A single configuration file can contain multiple datasets, image or video, each using either caption text files or a metadata JSONL file.
The cache directory must be different for each dataset.
<details>
<summary>日本語</summary>
データセットの設定を行うためのTOMLファイルを作成してください。
画像データセットと動画データセットがサポートされています。設定ファイルには、画像または動画データセットを複数含めることができます。キャプションテキストファイルまたはメタデータJSONLファイルを使用できます。
キャッシュディレクトリは、各データセットごとに異なるディレクトリである必要があります。
</details>
### Sample for Image Dataset with Caption Text Files
```toml
# resolution, caption_extension, batch_size, num_repeats, enable_bucket, bucket_no_upscale should be set in either general or datasets
# otherwise, the default values will be used for each item
# general configurations
[general]
resolution = [960, 544]
caption_extension = ".txt"
batch_size = 1
enable_bucket = true
bucket_no_upscale = false
[[datasets]]
image_directory = "/path/to/image_dir"
cache_directory = "/path/to/cache_directory"
num_repeats = 1 # optional, default is 1. Number of times to repeat the dataset. Useful to balance the multiple datasets with different sizes.
# other datasets can be added here. each dataset can have different configurations
```
`cache_directory` is optional; the default is None, which uses the same directory as the image directory. However, we recommend setting a cache directory to avoid accidentally sharing cache files between different datasets.
`num_repeats` is also available. It is optional and defaults to 1 (no repeat). It repeats the images (or videos) that many times to expand the dataset. For example, with `num_repeats = 2` and 20 images in the dataset, each image appears twice (with the same caption), for a total of 40 images. This is useful for balancing multiple datasets of different sizes.
<details>
<summary>日本語</summary>
`cache_directory` はオプションです。デフォルトは画像ディレクトリと同じディレクトリに設定されます。ただし、異なるデータセット間でキャッシュファイルが共有されるのを防ぐために、明示的に別のキャッシュディレクトリを設定することをお勧めします。
`num_repeats` はオプションで、デフォルトは 1 です(繰り返しなし)。画像(や動画)を、その回数だけ単純に繰り返してデータセットを拡張します。たとえば`num_repeats = 2`としたとき、画像20枚のデータセットなら、各画像が2枚ずつ(同一のキャプションで)計40枚存在した場合と同じになります。異なるデータ数のデータセット間でバランスを取るために使用可能です。
resolution, caption_extension, batch_size, num_repeats, enable_bucket, bucket_no_upscale は general または datasets のどちらかに設定してください。省略時は各項目のデフォルト値が使用されます。
`[[datasets]]`以下を追加することで、他のデータセットを追加できます。各データセットには異なる設定を持てます。
</details>
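To see how `num_repeats` balances datasets of different sizes, here is a small Python sketch (the dataset names and counts are made up for illustration) that computes how many samples each dataset contributes per epoch:

```python
# Hypothetical example: two datasets of different sizes.
# num_repeats simply multiplies how often each item appears per epoch.
datasets = [
    {"name": "style_images", "num_images": 20, "num_repeats": 5},
    {"name": "general_images", "num_images": 100, "num_repeats": 1},
]

for ds in datasets:
    effective = ds["num_images"] * ds["num_repeats"]
    print(f'{ds["name"]}: {ds["num_images"]} images x {ds["num_repeats"]} repeats = {effective} samples per epoch')
```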
### Sample for Image Dataset with Metadata JSONL File
```toml
# resolution, batch_size, num_repeats, enable_bucket, bucket_no_upscale should be set in either general or datasets
# caption_extension is not required for metadata jsonl file
# cache_directory is required for each dataset with metadata jsonl file
# general configurations
[general]
resolution = [960, 544]
batch_size = 1
enable_bucket = true
bucket_no_upscale = false
[[datasets]]
image_jsonl_file = "/path/to/metadata.jsonl"
cache_directory = "/path/to/cache_directory" # required for metadata jsonl file
num_repeats = 1 # optional, default is 1. Same as above.
# other datasets can be added here. each dataset can have different configurations
```
JSONL file format for metadata:
```json
{"image_path": "/path/to/image1.jpg", "caption": "A caption for image1"}
{"image_path": "/path/to/image2.jpg", "caption": "A caption for image2"}
```
<details>
<summary>日本語</summary>
resolution, batch_size, num_repeats, enable_bucket, bucket_no_upscale は general または datasets のどちらかに設定してください。省略時は各項目のデフォルト値が使用されます。
metadata jsonl ファイルを使用する場合、caption_extension は必要ありません。また、cache_directory は必須です。
キャプションによるデータセットと同様に、複数のデータセットを追加できます。各データセットには異なる設定を持てます。
</details>
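If your images currently use caption text files and you want to switch to the metadata JSONL format shown above, a small script along these lines can generate the file. This is only a sketch, not part of this repository; the paths are placeholders:

```python
import json
from pathlib import Path

image_dir = Path("/path/to/image_dir")      # placeholder
output = Path("/path/to/metadata.jsonl")    # placeholder
extensions = {".jpg", ".jpeg", ".png", ".webp"}

with output.open("w", encoding="utf-8") as f:
    for image_path in sorted(image_dir.iterdir()):
        if image_path.suffix.lower() not in extensions:
            continue
        caption_file = image_path.with_suffix(".txt")
        if not caption_file.exists():
            continue  # skip images without a caption file
        caption = caption_file.read_text(encoding="utf-8").strip()
        f.write(json.dumps({"image_path": str(image_path), "caption": caption}, ensure_ascii=False) + "\n")
```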
### Sample for Video Dataset with Caption Text Files
```toml
# resolution, caption_extension, target_frames, frame_extraction, frame_stride, frame_sample,
# batch_size, num_repeats, enable_bucket, bucket_no_upscale should be set in either general or datasets
# num_repeats is also available for video dataset, example is not shown here
# general configurations
[general]
resolution = [960, 544]
caption_extension = ".txt"
batch_size = 1
enable_bucket = true
bucket_no_upscale = false
[[datasets]]
video_directory = "/path/to/video_dir"
cache_directory = "/path/to/cache_directory" # recommended to set cache directory
target_frames = [1, 25, 45]
frame_extraction = "head"
# other datasets can be added here. each dataset can have different configurations
```
__In HunyuanVideo and Wan2.1, each value in `target_frames` must be "N*4+1" (N=0,1,2,...).__
<details>
<summary>日本語</summary>
resolution, caption_extension, target_frames, frame_extraction, frame_stride, frame_sample, batch_size, num_repeats, enable_bucket, bucket_no_upscale は general または datasets のどちらかに設定してください。
__HunyuanVideoおよびWan2.1では、target_framesの数値は「N*4+1」である必要があります。__
他の注意事項は画像データセットと同様です。
</details>
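A quick way to check that your `target_frames` values satisfy the N*4+1 rule for HunyuanVideo and Wan2.1 is a small check like this sketch:

```python
def is_valid_frame_count(frames: int) -> bool:
    # Valid lengths are N*4 + 1 (N = 0, 1, 2, ...): 1, 5, 9, 13, ...
    return frames >= 1 and (frames - 1) % 4 == 0

target_frames = [1, 25, 45]
assert all(is_valid_frame_count(f) for f in target_frames)

# First few valid values:
print([n * 4 + 1 for n in range(12)])  # [1, 5, 9, ..., 45]
```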
### Sample for Video Dataset with Metadata JSONL File
```toml
# resolution, target_frames, frame_extraction, frame_stride, frame_sample,
# batch_size, num_repeats, enable_bucket, bucket_no_upscale should be set in either general or datasets
# caption_extension is not required for metadata jsonl file
# cache_directory is required for each dataset with metadata jsonl file
# general configurations
[general]
resolution = [960, 544]
batch_size = 1
enable_bucket = true
bucket_no_upscale = false
[[datasets]]
video_jsonl_file = "/path/to/metadata.jsonl"
target_frames = [1, 25, 45]
frame_extraction = "head"
cache_directory = "/path/to/cache_directory_head"
# same metadata jsonl file can be used for multiple datasets
[[datasets]]
video_jsonl_file = "/path/to/metadata.jsonl"
target_frames = [1]
frame_stride = 10
cache_directory = "/path/to/cache_directory_stride"
# other datasets can be added here. each dataset can have different configurations
```
JSONL file format for metadata:
```json
{"video_path": "/path/to/video1.mp4", "caption": "A caption for video1"}
{"video_path": "/path/to/video2.mp4", "caption": "A caption for video2"}
```
<details>
<summary>日本語</summary>
resolution, target_frames, frame_extraction, frame_stride, frame_sample, batch_size, num_repeats, enable_bucket, bucket_no_upscale は general または datasets のどちらかに設定してください。
metadata jsonl ファイルを使用する場合、caption_extension は必要ありません。また、cache_directory は必須です。
他の注意事項は今までのデータセットと同様です。
</details>
### frame_extraction Options
For each value N in `target_frames`:
- `head`: Extract the first N frames from the video.
- `chunk`: Extract frames by splitting the video into consecutive chunks of N frames.
- `slide`: Extract N-frame windows from the video, moving the start position forward by `frame_stride` each time.
- `uniform`: Extract `frame_sample` windows of N frames, spaced uniformly across the video.
For example, consider a video with 40 frames. The following diagrams illustrate each extraction method:
<details>
<summary>日本語</summary>
- `head`: 動画から最初のNフレームを抽出します。
- `chunk`: 動画をNフレームずつに分割してフレームを抽出します。
- `slide`: `frame_stride`に指定したフレームごとに動画からNフレームを抽出します。
- `uniform`: 動画から一定間隔で、`frame_sample`個のNフレームを抽出します。
例えば、40フレームの動画を例とした抽出について、以下の図で説明します。
</details>
```
Original video, 40 frames: x = extracted frame, o = frame not extracted
oooooooooooooooooooooooooooooooooooooooo
head, target_frames = [1, 13, 25] -> extract head frames:
xooooooooooooooooooooooooooooooooooooooo
xxxxxxxxxxxxxooooooooooooooooooooooooooo
xxxxxxxxxxxxxxxxxxxxxxxxxooooooooooooooo
chunk, target_frames = [13, 25] -> extract frames by splitting the video into chunks of 13 and 25 frames:
xxxxxxxxxxxxxooooooooooooooooooooooooooo
oooooooooooooxxxxxxxxxxxxxoooooooooooooo
ooooooooooooooooooooooooooxxxxxxxxxxxxxo
xxxxxxxxxxxxxxxxxxxxxxxxxooooooooooooooo
NOTE: Please do not include 1 in target_frames when using frame_extraction "chunk"; it will cause all frames to be extracted.
注: frame_extraction "chunk" を使用する場合、target_frames に 1 を含めないでください。全てのフレームが抽出されてしまいます。
slide, target_frames = [1, 13, 25], frame_stride = 10 -> extract N frames with a stride of 10:
xooooooooooooooooooooooooooooooooooooooo
ooooooooooxooooooooooooooooooooooooooooo
ooooooooooooooooooooxooooooooooooooooooo
ooooooooooooooooooooooooooooooxooooooooo
xxxxxxxxxxxxxooooooooooooooooooooooooooo
ooooooooooxxxxxxxxxxxxxooooooooooooooooo
ooooooooooooooooooooxxxxxxxxxxxxxooooooo
xxxxxxxxxxxxxxxxxxxxxxxxxooooooooooooooo
ooooooooooxxxxxxxxxxxxxxxxxxxxxxxxxooooo
uniform, target_frames = [1, 13, 25], frame_sample = 4 -> extract `frame_sample` samples uniformly, N frames each:
xooooooooooooooooooooooooooooooooooooooo
oooooooooooooxoooooooooooooooooooooooooo
oooooooooooooooooooooooooxoooooooooooooo
ooooooooooooooooooooooooooooooooooooooox
xxxxxxxxxxxxxooooooooooooooooooooooooooo
oooooooooxxxxxxxxxxxxxoooooooooooooooooo
ooooooooooooooooooxxxxxxxxxxxxxooooooooo
oooooooooooooooooooooooooooxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxooooooooooooooo
oooooxxxxxxxxxxxxxxxxxxxxxxxxxoooooooooo
ooooooooooxxxxxxxxxxxxxxxxxxxxxxxxxooooo
oooooooooooooooxxxxxxxxxxxxxxxxxxxxxxxxx
```
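The following Python sketch reproduces the behaviour shown in the diagrams above, so you can preview which frame windows each `frame_extraction` mode would take for a given video length. It is an illustration of the documented behaviour, not the actual training code, and details such as rounding in `uniform` mode may differ:

```python
def extract_windows(total_frames: int, target_frames: list[int],
                    frame_extraction: str = "head",
                    frame_stride: int = 1, frame_sample: int = 1) -> list[range]:
    """Return the frame windows each mode would extract (illustrative sketch)."""
    windows = []
    for n in target_frames:
        if frame_extraction == "head":
            # first N frames only
            if n <= total_frames:
                windows.append(range(0, n))
        elif frame_extraction == "chunk":
            # consecutive, non-overlapping chunks of N frames
            for start in range(0, total_frames - n + 1, n):
                windows.append(range(start, start + n))
        elif frame_extraction == "slide":
            # N-frame windows, start position moved by frame_stride
            for start in range(0, total_frames - n + 1, frame_stride):
                windows.append(range(start, start + n))
        elif frame_extraction == "uniform":
            # frame_sample windows of N frames, start positions spread evenly
            last_start = total_frames - n
            for i in range(frame_sample):
                start = round(i * last_start / max(frame_sample - 1, 1))
                windows.append(range(start, start + n))
    return windows

# Reproduce the "uniform" example above: 40-frame video, 4 samples per length.
for w in extract_windows(40, [1, 13, 25], "uniform", frame_sample=4):
    print("".join("x" if i in w else "o" for i in range(40)))
```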
## Specifications
```toml
# general configurations
[general]
resolution = [960, 544] # optional, [W, H], default is None. This is the default resolution for all datasets
caption_extension = ".txt" # optional, default is None. This is the default caption extension for all datasets
batch_size = 1 # optional, default is 1. This is the default batch size for all datasets
num_repeats = 1 # optional, default is 1. Number of times to repeat the dataset. Useful to balance the multiple datasets with different sizes.
enable_bucket = true # optional, default is false. Enable bucketing for datasets
bucket_no_upscale = false # optional, default is false. Disable upscaling for bucketing. Ignored if enable_bucket is false
### Image Dataset
# sample image dataset with caption text files
[[datasets]]
image_directory = "/path/to/image_dir"
caption_extension = ".txt" # required for caption text files, if general caption extension is not set
resolution = [960, 544] # required if general resolution is not set
batch_size = 4 # optional, overwrite the default batch size
num_repeats = 1 # optional, overwrite the default num_repeats
enable_bucket = false # optional, overwrite the default bucketing setting
bucket_no_upscale = true # optional, overwrite the default bucketing setting
cache_directory = "/path/to/cache_directory" # optional, default is None to use the same directory as the image directory. NOTE: caching is always enabled
# sample image dataset with metadata **jsonl** file
[[datasets]]
image_jsonl_file = "/path/to/metadata.jsonl" # includes pairs of image files and captions
resolution = [960, 544] # required if general resolution is not set
cache_directory = "/path/to/cache_directory" # required for metadata jsonl file
# caption_extension is not required for metadata jsonl file
# batch_size, num_repeats, enable_bucket, bucket_no_upscale are also available for metadata jsonl file
### Video Dataset
# sample video dataset with caption text files
[[datasets]]
video_directory = "/path/to/video_dir"
caption_extension = ".txt" # required for caption text files, if general caption extension is not set
resolution = [960, 544] # required if general resolution is not set
target_frames = [1, 25, 79] # required for video dataset. List of video lengths (in frames) to extract; each element must be N*4+1 (N=0,1,2,...)
# NOTE: Please do not include 1 in target_frames when using frame_extraction "chunk"; it will cause all frames to be extracted.
frame_extraction = "head" # optional, "head" or "chunk", "slide", "uniform". Default is "head"
frame_stride = 1 # optional, default is 1, available for "slide" frame extraction
frame_sample = 4 # optional, default is 1 (same as "head"), available for "uniform" frame extraction
# batch_size, num_repeats, enable_bucket, bucket_no_upscale, cache_directory are also available for video dataset
# sample video dataset with metadata jsonl file
[[datasets]]
video_jsonl_file = "/path/to/metadata.jsonl" # includes pairs of video files and captions
target_frames = [1, 79]
cache_directory = "/path/to/cache_directory" # required for metadata jsonl file
# frame_extraction, frame_stride, frame_sample are also available for metadata jsonl file
```
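Before caching latents or training, it can help to sanity-check a configuration against the rules above (resolution set in `general` or per dataset, `cache_directory` present when a metadata JSONL file is used, `target_frames` values of the form N*4+1). The following sketch uses Python's standard `tomllib` and is not a validator shipped with this repository; the filename is a placeholder:

```python
import tomllib  # Python 3.11+; use the third-party "tomli" package on older versions

with open("dataset_config.toml", "rb") as f:  # placeholder filename
    config = tomllib.load(f)

general = config.get("general", {})
for i, ds in enumerate(config.get("datasets", [])):
    merged = {**general, **ds}  # per-dataset values override general values
    assert "resolution" in merged, f"dataset {i}: resolution must be set in general or datasets"
    if "image_jsonl_file" in ds or "video_jsonl_file" in ds:
        assert "cache_directory" in ds, f"dataset {i}: cache_directory is required with a metadata JSONL file"
    if "video_directory" in ds or "video_jsonl_file" in ds:
        # N*4+1 rule applies to HunyuanVideo and Wan2.1
        for n in merged.get("target_frames", []):
            assert (n - 1) % 4 == 0, f"dataset {i}: target_frames value {n} is not N*4+1"
print("config looks consistent")
```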
<!--
# sample image dataset with lance
[[datasets]]
image_lance_dataset = "/path/to/lance_dataset"
resolution = [960, 544] # required if general resolution is not set
# batch_size, enable_bucket, bucket_no_upscale, cache_directory are also available for lance dataset
-->
Metadata in .json file format will be supported in the near future.
<!--
```toml
# general configurations
[general]
resolution = [960, 544] # optional, [W, H], default is None. This is the default resolution for all datasets
caption_extension = ".txt" # optional, default is None. This is the default caption extension for all datasets
batch_size = 1 # optional, default is 1. This is the default batch size for all datasets
enable_bucket = true # optional, default is false. Enable bucketing for datasets
bucket_no_upscale = false # optional, default is false. Disable upscaling for bucketing. Ignored if enable_bucket is false
# sample image dataset with caption text files
[[datasets]]
image_directory = "/path/to/image_dir"
caption_extension = ".txt" # required for caption text files, if general caption extension is not set
resolution = [960, 544] # required if general resolution is not set
batch_size = 4 # optional, overwrite the default batch size
enable_bucket = false # optional, overwrite the default bucketing setting
bucket_no_upscale = true # optional, overwrite the default bucketing setting
cache_directory = "/path/to/cache_directory" # optional, default is None to use the same directory as the image directory. NOTE: caching is always enabled
# sample image dataset with metadata **jsonl** file
[[datasets]]
image_jsonl_file = "/path/to/metadata.jsonl" # includes pairs of image files and captions
resolution = [960, 544] # required if general resolution is not set
cache_directory = "/path/to/cache_directory" # required for metadata jsonl file
# caption_extension is not required for metadata jsonl file
# batch_size, enable_bucket, bucket_no_upscale are also available for metadata jsonl file
# sample video dataset with caption text files
[[datasets]]
video_directory = "/path/to/video_dir"
caption_extension = ".txt" # required for caption text files, if general caption extension is not set
resolution = [960, 544] # required if general resolution is not set
target_frames = [1, 25, 79] # required for video dataset. list of video lengths to extract frames. each element must be N*4+1 (N=0,1,2,...)
frame_extraction = "head" # optional, "head" or "chunk", "slide", "uniform". Default is "head"
frame_stride = 1 # optional, default is 1, available for "slide" frame extraction
frame_sample = 4 # optional, default is 1 (same as "head"), available for "uniform" frame extraction
# batch_size, enable_bucket, bucket_no_upscale, cache_directory are also available for video dataset
# sample video dataset with metadata jsonl file
[[datasets]]
video_jsonl_file = "/path/to/metadata.jsonl" # includes pairs of video files and captions
target_frames = [1, 79]
cache_directory = "/path/to/cache_directory" # required for metadata jsonl file
# frame_extraction, frame_stride, frame_sample are also available for metadata jsonl file
```
# sample image dataset with lance
[[datasets]]
image_lance_dataset = "/path/to/lance_dataset"
resolution = [960, 544] # required if general resolution is not set
# batch_size, enable_bucket, bucket_no_upscale, cache_directory are also available for lance dataset
The metadata with .json file will be supported in the near future.
-->