sammsun
2 followers · 4 following
AI & ML interests
None yet
Recent Activity
reacted to codelion's post with 🔥 · 1 day ago
Introducing Dhara-70M: A diffusion language model that achieves 3.8x higher throughput than autoregressive models!

Key findings from our research on optimal architectures for small language models:
→ Depth beats width: 32 layers outperforms 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% parameters but improve reasoning

We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.

Blog: https://huggingface.co/blog/codelion/optimal-model-architecture
Model: https://huggingface.co/codelion/dhara-70m
liked a model · 7 months ago
tencent/Hunyuan-A13B-Instruct
upvoted a paper · about 1 year ago
Scaling Laws for Floating Point Quantization Training
Organizations
sammsun's activity
liked a model · 7 months ago
tencent/Hunyuan-A13B-Instruct
Text Generation • 80B • Updated Aug 21, 2025 • 8.07k • 679
liked a model · about 1 year ago
tencent/Tencent-Hunyuan-Large
Text Generation • Updated Jan 19, 2025 • 178 • 616
liked a model · over 1 year ago
Undi95/Meta-Llama-3-8B-hf
Text Generation • 8B • Updated May 10, 2024 • 112 • 30