🧠 Ariadne
This is the official model checkpoint for the paper:
Ariadne: A Controllable Framework for Probing and Extending VLM Reasoning Boundaries
🎬 Example
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

MODEL_ID = "..."  # path to the model checkpoint

# Load model and processor
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",
    low_cpu_mem_usage=True,
)
processor = AutoProcessor.from_pretrained(MODEL_ID)
# Build the conversation for an image + question example
SYSTEM_PROMPT = "..."
img = None  # replace with a real image (see the image-loading sketch after the example)
conversation = [
    {"role": "system", "content": [{"type": "text", "text": SYSTEM_PROMPT}]},
    {
        "role": "user",
        "content": [
            {"type": "image", "image": img},
            {"type": "text", "text": "..."},
        ],
    },
]
# Render the chat template, preprocess, and generate
prompt_text = processor.apply_chat_template(
    conversation, add_generation_prompt=True, tokenize=False
)
inputs = processor(text=prompt_text, images=img, return_tensors="pt")
inputs = inputs.to(model.device)  # move inputs to the same device as the model
with torch.inference_mode():
    gen_out = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        return_dict_in_generate=True,
        output_scores=False,
    )
# Decode only the newly generated tokens (strip the prompt portion)
sequences = gen_out.sequences
input_len = inputs["input_ids"].shape[1]
gen_ids = sequences[0, input_len:]
resp_text = processor.tokenizer.decode(
    gen_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True
).strip()
print(resp_text)
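To run the example end to end, img must be an actual image rather than None. A minimal sketch, assuming a local file at a hypothetical path example.jpg:

from PIL import Image

# Load a local image and convert it to RGB before handing it to the processor
img = Image.open("example.jpg").convert("RGB")

With a real image assigned to img, the conversation entry and the processor call above work unchanged.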
Base model: Qwen/Qwen2.5-VL-7B-Instruct