LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning Paper • 2503.04812 • Published Mar 4, 2025 • 17
From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective Paper • 2205.04733 • Published May 10, 2022 • 3
Article Follow the White Rabbit: Using Embeddings So You Never Get Lost in Translation 2 days ago • 6
Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup Paper • 2101.06983 • Published Jan 18, 2021 • 2
Qwen3 Voice Embedding Collection Standalone ECAPA-TDNN x-vector speaker encoders extracted from Qwen3-TTS. 1024-dim (0.6B) and 2048-dim (1.7B). • 4 items • Updated 3 days ago • 24
Article GGML and llama.cpp join HF to ensure the long-term progress of Local AI 6 days ago • 428
ColBERT-Zero 🐶 Collection First large-scale fully pre-trained ColBERT model using only public data, outperforming GTE-ModernColBERT and GTE-ModernBERT • 10 items • Updated 6 days ago • 16
Article ColBERT-Zero: To Pre-train Or Not To Pre-train ColBERT models? 6 days ago • 16
jina-embeddings-v5-text: Task-Targeted Embedding Distillation Paper • 2602.15547 • Published 8 days ago • 21
jina-embeddings-v5-text Collection Our 5th-gen embeddings: two lightweight multilingual models with SOTA performance in retrieval, matching, clustering, and classification. • 27 items • Updated 1 day ago • 31
Article LateOn-Code & ColGrep: LightOn unveils state-of-the-art code retrieval models and code search tooling 13 days ago • 46
LateOn-Code 💻 Collection State-of-the-art late interaction code retrieval models • 6 items • Updated 6 days ago • 13