kaizuberbuehler's Collections: LM Training
• Rho-1: Not All Tokens Are What You Need (arXiv:2404.07965, 94 upvotes)
• VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time (arXiv:2404.10667, 24 upvotes)
• Instruction-tuned Language Models are Better Knowledge Learners (arXiv:2402.12847, 26 upvotes)
• DoRA: Weight-Decomposed Low-Rank Adaptation (arXiv:2402.09353, 32 upvotes)
• QLoRA: Efficient Finetuning of Quantized LLMs (arXiv:2305.14314, 59 upvotes)
• GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection (arXiv:2403.03507, 189 upvotes)
• Reverse Training to Nurse the Reversal Curse (arXiv:2403.13799, 13 upvotes)
• SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling (arXiv:2312.15166, 61 upvotes)
• ReFT: Representation Finetuning for Language Models (arXiv:2404.03592, 101 upvotes)
• Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences (arXiv:2404.03715, 62 upvotes)
• Learn Your Reference Model for Real Good Alignment (arXiv:2404.09656, 90 upvotes)
• Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length (arXiv:2404.08801, 66 upvotes)
• Pre-training Small Base LMs with Fewer Tokens (arXiv:2404.08634, 36 upvotes)
• JetMoE: Reaching Llama2 Performance with 0.1M Dollars (arXiv:2404.07413, 38 upvotes)
• MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies (arXiv:2404.06395, 24 upvotes)
• SambaLingo: Teaching Large Language Models New Languages (arXiv:2404.05829, 13 upvotes)
• Advancing LLM Reasoning Generalists with Preference Trees (arXiv:2404.02078, 46 upvotes)
• Poro 34B and the Blessing of Multilinguality (arXiv:2404.01856, 15 upvotes)
• Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone (arXiv:2404.14219, 259 upvotes)
• The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions (arXiv:2404.13208, 40 upvotes)
• Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence (arXiv:2404.05892, 40 upvotes)
• Mamba: Linear-Time Sequence Modeling with Selective State Spaces (arXiv:2312.00752, 150 upvotes)
• OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework (arXiv:2404.14619, 126 upvotes)
• Jamba: A Hybrid Transformer-Mamba Language Model (arXiv:2403.19887, 112 upvotes)
• Make Your LLM Fully Utilize the Context (arXiv:2404.16811, 55 upvotes)
• Tele-FLM Technical Report (arXiv:2404.16645, 18 upvotes)
• PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning (arXiv:2404.16994, 37 upvotes)
• LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report (arXiv:2405.00732, 122 upvotes)
• Iterative Reasoning Preference Optimization (arXiv:2404.19733, 49 upvotes)
• What matters when building vision-language models? (arXiv:2405.02246, 103 upvotes)
• MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning (arXiv:2405.12130, 50 upvotes)
• Your Transformer is Secretly Linear (arXiv:2405.12250, 157 upvotes)
• Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models (arXiv:2405.20541, 24 upvotes)
• How Do Large Language Models Acquire Factual Knowledge During Pretraining? (arXiv:2406.11813, 31 upvotes)
• Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing (arXiv:2406.08464, 72 upvotes)
• The Llama 3 Herd of Models (arXiv:2407.21783, 117 upvotes)
• Gemma 2: Improving Open Language Models at a Practical Size (arXiv:2408.00118, 78 upvotes)
• MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts (arXiv:2407.21770, 22 upvotes)
• LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs (arXiv:2408.07055, 68 upvotes)
• Data curation via joint example selection further accelerates multimodal learning (arXiv:2406.17711, 3 upvotes)
• Jamba-1.5: Hybrid Transformer-Mamba Models at Scale (arXiv:2408.12570, 32 upvotes)
• OLMoE: Open Mixture-of-Experts Language Models (arXiv:2409.02060, 80 upvotes)
• Training Language Models to Self-Correct via Reinforcement Learning (arXiv:2409.12917, 140 upvotes)
• GRIN: GRadient-INformed MoE (arXiv:2409.12136, 16 upvotes)
• Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey (arXiv:2409.11564, 20 upvotes)
• NVLM: Open Frontier-Class Multimodal LLMs (arXiv:2409.11402, 74 upvotes)
• Instruction Following without Instruction Tuning (arXiv:2409.14254, 29 upvotes)
• Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models (arXiv:2409.17146, 121 upvotes)
• Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale (arXiv:2409.17115, 64 upvotes)
• Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data (arXiv:2406.14546, 3 upvotes)
• Thinking LLMs: General Instruction Following with Thought Generation (arXiv:2410.10630, 20 upvotes)
• Untitled paper (arXiv:2412.08905, 122 upvotes)
• Offline Reinforcement Learning for LLM Multi-Step Reasoning (arXiv:2412.16145, 38 upvotes)
• RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response (arXiv:2412.14922, 88 upvotes)
• Diving into Self-Evolving Training for Multimodal Reasoning (arXiv:2412.17451, 42 upvotes)
• Untitled paper (arXiv:2412.16720, 37 upvotes)
• TÜLU 3: Pushing Frontiers in Open Language Model Post-Training (arXiv:2411.15124, 67 upvotes)
• Natural Language Reinforcement Learning (arXiv:2411.14251, 31 upvotes)
• OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models (arXiv:2411.04905, 127 upvotes)
• Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models (arXiv:2411.04996, 50 upvotes)
• 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining (arXiv:2501.00958, 109 upvotes)
• Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought (arXiv:2501.04682, 99 upvotes)
• Scaling Laws for Floating Point Quantization Training (arXiv:2501.02423, 26 upvotes)
• Virgo: A Preliminary Exploration on Reproducing o1-like MLLM (arXiv:2501.01904, 33 upvotes)
• O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning (arXiv:2501.06458, 31 upvotes)
• Enhancing Human-Like Responses in Large Language Models (arXiv:2501.05032, 61 upvotes)
• Do generative video models learn physical principles from watching videos? (arXiv:2501.09038, 34 upvotes)
• DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (arXiv:2501.12948, 440 upvotes)
• Kimi k1.5: Scaling Reinforcement Learning with LLMs (arXiv:2501.12599, 126 upvotes)
• Demystifying Long Chain-of-Thought Reasoning in LLMs (arXiv:2502.03373, 58 upvotes)
• SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training (arXiv:2501.17161, 124 upvotes)
• Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback (arXiv:2501.12895, 61 upvotes)
• Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback (arXiv:2501.10799, 15 upvotes)
• Qwen2.5-1M Technical Report (arXiv:2501.15383, 72 upvotes)
• Baichuan-Omni-1.5 Technical Report (arXiv:2501.15368, 60 upvotes)
• Optimizing Large Language Model Training Using FP4 Quantization (arXiv:2501.17116, 36 upvotes)
• Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling (arXiv:2501.16975, 32 upvotes)
• Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate (arXiv:2501.17703, 59 upvotes)
• SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model (arXiv:2502.02737, 255 upvotes)
• LIMO: Less is More for Reasoning (arXiv:2502.03387, 62 upvotes)
• QuEST: Stable Training of LLMs with 1-Bit Weights and Activations (arXiv:2502.05003, 44 upvotes)
• CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference (arXiv:2502.04416, 12 upvotes)
• Scaling Pre-training to One Hundred Billion Data for Vision Language Models (arXiv:2502.07617, 29 upvotes)
• Gemstones: A Model Suite for Multi-Faceted Scaling Laws (arXiv:2502.06857, 24 upvotes)
• Typhoon T1: An Open Thai Reasoning Model (arXiv:2502.09042, 16 upvotes)
• Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (arXiv:2502.11089, 168 upvotes)
• How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM? (arXiv:2502.14502, 91 upvotes)
• Continuous Diffusion Model for Language Modeling (arXiv:2502.11564, 53 upvotes)
• Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models (arXiv:2502.13533, 13 upvotes)
• LongRoPE2: Near-Lossless LLM Context Window Scaling (arXiv:2502.20082, 36 upvotes)
• Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam (arXiv:2502.17055, 20 upvotes)
• Visual-RFT: Visual Reinforcement Fine-Tuning (arXiv:2503.01785, 86 upvotes)
• Predictive Data Selection: The Data That Predicts Is the Data That Teaches (arXiv:2503.00808, 56 upvotes)
• Gemini Robotics: Bringing AI into the Physical World (arXiv:2503.20020, 31 upvotes)
• Large-Scale Data Selection for Instruction Tuning (arXiv:2503.01807, 14 upvotes)
• Unified Reward Model for Multimodal Understanding and Generation (arXiv:2503.05236, 123 upvotes)
• Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond (arXiv:2503.10460, 30 upvotes)
• TTRL: Test-Time Reinforcement Learning (arXiv:2504.16084, 120 upvotes)
• Learning from Failures in Multi-Attempt Reinforcement Learning (arXiv:2503.04808, 18 upvotes)
• TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation (arXiv:2503.04872, 15 upvotes)
• Self-Taught Self-Correction for Small Language Models (arXiv:2503.08681, 15 upvotes)
• Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning (arXiv:2503.15558, 50 upvotes)
• SkyLadder: Better and Faster Pretraining via Context Window Scheduling (arXiv:2503.15450, 12 upvotes)
• Untitled paper (arXiv:2503.19786, 55 upvotes)
• Modifying Large Language Model Post-Training for Diverse Creative Writing (arXiv:2503.17126, 36 upvotes)
• FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models (arXiv:2503.17287, 11 upvotes)
• ZClip: Adaptive Spike Mitigation for LLM Pre-Training (arXiv:2504.02507, 88 upvotes)
• RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning (arXiv:2505.15034, 5 upvotes)
• Improved Visual-Spatial Reasoning via R1-Zero-Like Training (arXiv:2504.00883, 67 upvotes)
• Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model (arXiv:2503.24290, 62 upvotes)
• JudgeLRM: Large Reasoning Models as a Judge (arXiv:2504.00050, 62 upvotes)
• Inference-Time Scaling for Generalist Reward Modeling (arXiv:2504.02495, 58 upvotes)
• Understanding R1-Zero-Like Training: A Critical Perspective (arXiv:2503.20783, 59 upvotes)
• Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback (arXiv:2503.22230, 45 upvotes)
• Unicorn: Text-Only Data Synthesis for Vision Language Model Training (arXiv:2503.22655, 39 upvotes)
• Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources (arXiv:2504.00595, 37 upvotes)
• Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme (arXiv:2504.02587, 32 upvotes)
• Scaling Analysis of Interleaved Speech-Text Language Models (arXiv:2504.02398, 31 upvotes)
• Scaling Language-Free Visual Representation Learning (arXiv:2504.01017, 32 upvotes)
• RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy (arXiv:2503.24388, 29 upvotes)
• Z1: Efficient Test-time Scaling with Code (arXiv:2504.00810, 26 upvotes)
• Command A: An Enterprise-Ready Large Language Model (arXiv:2504.00698, 29 upvotes)
• Expanding RL with Verifiable Rewards Across Diverse Domains (arXiv:2503.23829, 24 upvotes)
• ActionStudio: A Lightweight Framework for Data and Training of Large Action Models (arXiv:2503.22673, 12 upvotes)
• Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL (arXiv:2503.23157, 10 upvotes)
• Untitled paper (arXiv:2504.07491, 137 upvotes)
• Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought (arXiv:2504.05599, 85 upvotes)
• Rethinking Reflection in Pre-Training (arXiv:2504.04022, 80 upvotes)
• OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens (arXiv:2504.07096, 77 upvotes)
• Scaling Laws for Native Multimodal Models (arXiv:2504.07951, 30 upvotes)
• VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks (arXiv:2504.05118, 26 upvotes)
• A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility (arXiv:2504.07086, 21 upvotes)
• SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement (arXiv:2504.07934, 21 upvotes)
• VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning (arXiv:2504.06958, 13 upvotes)
• Efficient Reinforcement Finetuning via Adaptive Curriculum Learning (arXiv:2504.05520, 11 upvotes)
• InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models (arXiv:2504.10479, 306 upvotes)
• CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training (arXiv:2504.13161, 93 upvotes)
• xVerify: Efficient Answer Verifier for Reasoning Model Evaluations (arXiv:2504.10481, 85 upvotes)
• BitNet b1.58 2B4T Technical Report (arXiv:2504.12285, 83 upvotes)
• Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning (arXiv:2504.08672, 55 upvotes)
• VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning (arXiv:2504.08837, 43 upvotes)
• Heimdall: test-time scaling on the generative verification (arXiv:2504.10337, 33 upvotes)
• SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models (arXiv:2504.11468, 30 upvotes)
• NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation (arXiv:2504.13055, 19 upvotes)
• A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce (arXiv:2504.11343, 19 upvotes)
• DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training (arXiv:2504.09710, 19 upvotes)
• DataDecide: How to Predict Best Pretraining Data with Small Experiments (arXiv:2504.11393, 18 upvotes)
• Breaking the Data Barrier -- Building GUI Agents Through Task Generalization (arXiv:2504.10127, 17 upvotes)
• M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models (arXiv:2504.10449, 15 upvotes)
• SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL (arXiv:2504.11455, 14 upvotes)
• Efficient Process Reward Model Training via Active Learning (arXiv:2504.10559, 13 upvotes)
• Exploring Expert Failures Improves LLM Agent Tuning (arXiv:2504.13145, 12 upvotes)
• Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models (arXiv:2504.15271, 67 upvotes)
• ToolRL: Reward is All Tool Learning Needs (arXiv:2504.13958, 49 upvotes)
• OTC: Optimal Tool Calls via Reinforcement Learning (arXiv:2504.14870, 35 upvotes)
• QuaDMix: Quality-Diversity Balanced Data Selection for Efficient LLM Pretraining (arXiv:2504.16511, 22 upvotes)
• LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities (arXiv:2504.16078, 21 upvotes)
• Efficient Pretraining Length Scaling (arXiv:2504.14992, 20 upvotes)