zer0int/CLIP-adversarial-typographic-attack_text-image
Love ❤️ this CLIP?
ᐅ Buy me a coffee on Ko-Fi ☕
3PscBrWYvrutXedLmvpcnQbE12Py8qLqMK
The original CLIP model accepts at most 77 input tokens, but its effective length is only ~20 tokens. See the original Long-CLIP paper for details. HunyuanVideo demo:
69 tokens, normal scene:
52 tokens, OOD (Out-of-Distribution) scene: superior consistency and prompt-following despite the OOD concept.
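To make the 77-token input limit concrete, here is a minimal sketch of how a token sequence gets fitted into a fixed context window. It is not this model's code: `fit_to_context` and `prompt_budget` are hypothetical helper names, and the zero-padding and truncation behaviour is assumed to mirror the reference `clip.tokenize` of the original CLIP release (other implementations pad differently). The start/end token IDs are those of the original CLIP vocabulary.

```python
from typing import List

BOS_ID = 49406        # <|startoftext|> in the original CLIP vocabulary
EOS_ID = 49407        # <|endoftext|> in the original CLIP vocabulary
CONTEXT_LENGTH = 77   # max input length of the original CLIP text encoder


def fit_to_context(token_ids: List[int], context_length: int = CONTEXT_LENGTH) -> List[int]:
    """Wrap token ids in BOS/EOS, then truncate or zero-pad to context_length.

    Hypothetical helper for illustration; padding with 0 is an assumption
    modelled on the reference CLIP tokenizer.
    """
    ids = [BOS_ID] + list(token_ids) + [EOS_ID]
    if len(ids) > context_length:
        # Anything past the window is dropped; the last slot is forced to EOS.
        ids = ids[:context_length]
        ids[-1] = EOS_ID
    return ids + [0] * (context_length - len(ids))


def prompt_budget(context_length: int = CONTEXT_LENGTH) -> int:
    """Number of slots left for prompt tokens after reserving BOS and EOS."""
    return context_length - 2


if __name__ == "__main__":
    long_prompt = list(range(1, 101))  # 100 stand-in token ids
    fitted = fit_to_context(long_prompt)
    print(len(fitted), prompt_budget())
```

Fitting into the window only guarantees that the tokens are accepted as input. It says nothing about how many of them the text encoder actually uses, which is the effective-length issue described above.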
Base model
BeichenZhang/LongCLIP-L