# CLaRa: Continuous Latent Reasoning Agent
This is a trained CLaRa model based on Qwen/Qwen3-4B-Instruct-2507.
## Model Description
CLaRa (Bridging Retrieval and Generation with Continuous Latent Reasoning) is a framework that unifies retrieval and generation in a single shared continuous latent space: documents are compressed into continuous latent representations that the model can retrieve over, rerank, and condition on during generation.
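To make the shared-space idea concrete, here is a minimal, purely illustrative sketch of retrieval in a continuous latent space: the question and each candidate document are represented as vectors in the same space, so retrieval reduces to a similarity search. The random vectors below are hypothetical stand-ins for latents a compressor would produce; this is not CLaRa's actual implementation.

```python
import torch
import torch.nn.functional as F

# Hypothetical latents; in CLaRa these would come from the trained compressor.
question_latent = torch.randn(1, 256)    # one latent vector for the question
document_latents = torch.randn(4, 256)   # one latent vector per candidate document

# Shared continuous space: rank candidates by cosine similarity to the question
scores = F.cosine_similarity(question_latent, document_latents, dim=-1)
topk = scores.topk(k=2).indices          # indices of the best-matching documents
print(scores, topk)
```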
## Usage
Since this model relies on custom modeling code, you must pass `trust_remote_code=True` when loading it.
```python
from transformers import AutoModel, AutoTokenizer
import torch

model_id = "bagpipejerry/clara-qwen3-4b"

# Load model (device_map="auto" is recommended for large models)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True, device_map="auto")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Example 1: Single-document QA
question = "What is CLaRa?"
documents = [["CLaRa is a framework that bridges retrieval and generation with continuous latent reasoning."]]

outputs = model.generate_from_text(
    questions=[question],
    documents=documents,
    tokenizer=tokenizer,
    max_new_tokens=100,
)
print(outputs[0])
```
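The plural parameter names (`questions`, `documents`) suggest that `generate_from_text` also accepts batched inputs, pairing each question with its own document list. A hedged sketch under that assumption (not verified against the custom modeling code):

```python
# Assumption: generate_from_text batches over questions, pairing each
# question with the document list at the same position.
questions = ["What is CLaRa?", "What does the compressor produce?"]
documents = [
    ["CLaRa is a framework that bridges retrieval and generation with continuous latent reasoning."],
    ["The compressor is trained to compress documents into continuous latent representations."],
]
outputs = model.generate_from_text(
    questions=questions,
    documents=documents,
    tokenizer=tokenizer,
    max_new_tokens=100,
)
for q, a in zip(questions, outputs):
    print(f"Q: {q}\nA: {a}\n")
```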
```python
# Example 2: Multi-document retrieval & reranking (Stage 3 capability)
print("\n" + "=" * 60)
print("Example 2: Multi-Document Retrieval & Reranking")
print("=" * 60)

test_questions_multi = ["How does CLaRa compress documents?"]
test_documents_multi = [[
    "CLaRa uses a compression pretraining approach called KPCP framework with QA pairs and paraphrases.",
    "The compressor is trained to compress documents into continuous latent representations while retaining key semantics.",
    "CLaRa achieves 32x-64x compression rates through its three-stage training pipeline.",
    "The framework uses DeepSpeed ZeRO-2 for distributed training across multiple GPUs.",
]]

outputs_multi, topk_indices = model.generate_from_questions(
    questions=test_questions_multi,
    documents=test_documents_multi,
    tokenizer=tokenizer,
    max_new_tokens=100,
    temperature=0.7,
)

print(f"Question: {test_questions_multi[0]}")
print(f"Candidate Documents: {len(test_documents_multi[0])} docs")
print(f"Top-K Selected Indices: {topk_indices[0].tolist() if torch.is_tensor(topk_indices[0]) else topk_indices[0]}")
print(f"CLaRa Response: {outputs_multi[0]}")
```
## Citation
If you use this model, please cite the CLaRa paper/project.