sammsun

AI & ML interests

None yet

Recent Activity

reacted to codelion's post with 🔥 1 day ago

Introducing Dhara-70M: a diffusion language model that achieves 3.8x higher throughput than autoregressive models!

Key findings from our research on optimal architectures for small language models:

→ Depth beats width: 32 layers outperform 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% more parameters but improve reasoning

We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.

Blog: https://huggingface.co/blog/codelion/optimal-model-architecture
Model: https://huggingface.co/codelion/dhara-70m

liked a model 7 months ago

tencent/Hunyuan-A13B-Instruct

upvoted a paper about 1 year ago

Scaling Laws for Floating Point Quantization Training

liked a model 7 months ago

tencent/Hunyuan-A13B-Instruct

Text Generation • 80B • Updated Aug 21, 2025 • 8.07k downloads • 679 likes

liked a model about 1 year ago

tencent/Tencent-Hunyuan-Large

Text Generation • Updated Jan 19, 2025 • 178 downloads • 616 likes

liked a model over 1 year ago

Undi95/Meta-Llama-3-8B-hf

Text Generation • 8B • Updated May 10, 2024 • 112 downloads • 30 likes