🧠 Ariadne
This is the official model checkpoint for the paper:
Ariadne: A Controllable Framework for Probing and Extending VLM Reasoning Boundaries
🎬 Example
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

MODEL_ID = "..."  # path to the model checkpoint

# Load model and processor
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",
    low_cpu_mem_usage=True,
)
processor = AutoProcessor.from_pretrained(MODEL_ID)
# Build the conversation for an image + question example
SYSTEM_PROMPT = "..."
img = None  # replace with a real image (see the image-loading sketch after the example)
conversation = [
    {"role": "system", "content": [{"type": "text", "text": SYSTEM_PROMPT}]},
    {
        "role": "user",
        "content": [
            {"type": "image", "image": img},
            {"type": "text", "text": "..."},
        ],
    },
]
# Render the chat template, preprocess, and generate
prompt_text = processor.apply_chat_template(
    conversation, add_generation_prompt=True, tokenize=False
)
inputs = processor(text=prompt_text, images=img, return_tensors="pt")
inputs = inputs.to(model.device)  # move inputs to the same device as the model
with torch.inference_mode():
    gen_out = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        return_dict_in_generate=True,
        output_scores=False,
    )
# Decode only the newly generated tokens (strip the prompt portion)
sequences = gen_out.sequences
input_len = inputs["input_ids"].shape[1]
gen_ids = sequences[0, input_len:]
resp_text = processor.tokenizer.decode(
    gen_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True
).strip()
print(resp_text)
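To run the example end to end, img must be an actual image rather than None. A minimal sketch, assuming a local file at a hypothetical path example.jpg:

from PIL import Image

# Load a local image and convert it to RGB before handing it to the processor
img = Image.open("example.jpg").convert("RGB")

With a real image assigned to img, the conversation entry and the processor call above work unchanged.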
Base model: Qwen/Qwen2.5-VL-7B-Instruct