---
license: apache-2.0
tags:
  - rag
  - clara
  - qwen
base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: transformers
---

# CLaRa: Continuous Latent Reasoning Agent

This is a trained CLaRa model based on Qwen/Qwen3-4B-Instruct-2507.

## Model Description

CLaRa (Bridging Retrieval and Generation with Continuous Latent Reasoning) is a system that unifies retrieval and generation in a shared continuous space.

## Usage

Because this model relies on custom modeling code, you must load it with `trust_remote_code=True` (see the note on pinning a revision at the end of this card).

```python
from transformers import AutoModel, AutoTokenizer
import torch

model_id = "bagpipejerry/clara-qwen3-4b"

# Load the model.
# Note: device_map="auto" is recommended for large models.
model = AutoModel.from_pretrained(model_id, trust_remote_code=True, device_map="auto")

# Load the tokenizer.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Example 1: Single-document QA
question = "What is CLaRa?"
documents = [["CLaRa is a framework that bridges retrieval and generation with continuous latent reasoning."]]

outputs = model.generate_from_text(
    questions=[question],
    documents=documents,
    tokenizer=tokenizer,
    max_new_tokens=100,
)
print(outputs[0])

# Example 2: Multi-document retrieval & reranking (Stage 3 capability)
print("\n" + "=" * 60)
print("📝 Example 2: Multi-Document Retrieval & Reranking")
print("=" * 60)

test_questions_multi = ["How does CLaRa compress documents?"]
test_documents_multi = [[
    "CLaRa uses a compression pretraining approach called KPCP framework with QA pairs and paraphrases.",
    "The compressor is trained to compress documents into continuous latent representations while retaining key semantics.",
    "CLaRa achieves 32x-64x compression rates through its three-stage training pipeline.",
    "The framework uses DeepSpeed ZeRO-2 for distributed training across multiple GPUs.",
]]

outputs_multi, topk_indices = model.generate_from_questions(
    questions=test_questions_multi,
    documents=test_documents_multi,
    tokenizer=tokenizer,
    max_new_tokens=100,
    temperature=0.7,
)

print(f"Question: {test_questions_multi[0]}")
print(f"Candidate Documents: {len(test_documents_multi[0])} docs")
print(f"Top-K Selected Indices: {topk_indices[0].tolist() if torch.is_tensor(topk_indices[0]) else topk_indices[0]}")
print(f"CLaRa Response: {outputs_multi[0]}")
```

## Citation

If you use this model, please cite the CLaRa paper/project.
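
## Pinning a Revision

Because `trust_remote_code=True` downloads and executes this repository's custom modeling code, you may want to pin the exact revision you have vetted rather than tracking `main`. A minimal sketch using the standard `revision` argument of `from_pretrained` (the value below is a placeholder, not a specific vetted commit):

```python
from transformers import AutoModel, AutoTokenizer

model_id = "bagpipejerry/clara-qwen3-4b"
revision = "main"  # placeholder: replace with a vetted commit hash from the model repo

# Pinning `revision` ensures the same remote code is loaded on every run.
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",
    revision=revision,
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True, revision=revision)
```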