Omar Sanseviero
osanseviero
AI & ML interests
Llamas, model merging, massive ASR for data collection, 3D ML, on-device ML, quantization, model judging, ML in browser, healthcare applications, education, intersection of art and ML. 🦙
Recent Activity
- New activity, about 1 month ago: google/medgemma-4b-it: Fix model parameter count
- Liked a model, about 2 months ago: Qwen/Qwen3-4B-SafeRL
- Liked a Space, 2 months ago: multimodalart/nano-banana
MoEs papers reading list
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
  Paper • 1701.06538 • Published • 7
- Sparse Networks from Scratch: Faster Training without Losing Performance
  Paper • 1907.04840 • Published • 3
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
  Paper • 1910.02054 • Published • 7
- A Mixture of h-1 Heads is Better than h Heads
  Paper • 2005.06537 • Published • 2
OS Week Highlights - Oct 16 - 22
OS Week Highlights - Oct 2 - 8
OS Week Highlights - Sept 18 - 24
- IllusionDiffusion
  Running on Zero • Featured • 5.32k
  Generate stunning high quality illusion artwork
- XTTS 🐸
  Runtime error • Featured • 2.77k
  Generate speech from text using a reference voice
- Nougat Transformers 🍫
  Paused • Featured • 73
  Convert PDFs to markup language using OCR
- monster-labs/control_v1p_sd15_qrcode_monster
  Updated • 26.1k • 1.43k
Mistral Instruct Merges
Merge of Mistral Instruct 1 and 2 using different mergekit techniques
Instruction Pre-Training
- Instruction Pre-Training: Language Models are Supervised Multitask Learners
  Paper • 2406.14491 • Published • 95
- Instruction Synthesizer
  Runtime error • Featured • 86
  Generate instruction-response pairs from text
- instruction-pretrain/InstructLM-1.3B
  Text Generation • 1B • Updated • 49 • 43
- instruction-pretrain/InstructLM-500M
  Text Generation • 0.6B • Updated • 57 • 34
Model Merging
Model merging is a very popular technique for LLMs nowadays. Here is a chronological list of papers in this space that will help you get started with it!
- Qualitatively characterizing neural network optimization problems
  Paper • 1412.6544 • Published • 4
- Convergent Learning: Do different neural networks learn the same representations?
  Paper • 1511.07543 • Published • 2
- Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models
  Paper • 1909.11299 • Published • 2
- Model Fusion via Optimal Transport
  Paper • 1910.05653 • Published • 1
ML for Tools
A collection of papers about ML models that use tools!
- Internet-Augmented Dialogue Generation
  Paper • 2107.07566 • Published • 2
- Multi-hop Question Answering via Reasoning Chains
  Paper • 1910.02610 • Published • 2
- LaMDA: Language Models for Dialog Applications
  Paper • 2201.08239 • Published • 5
- WebGPT: Browser-assisted question-answering with human feedback
  Paper • 2112.09332 • Published • 2
OS Week Highlights - Oct 9 - 15
OS Week Highlights - Sept 25 - Oct 1
Historical - Spaces of the Week
All Spaces of the Week...from all weeks
Papers I want to read
Papers in my to-read list
- RLHF Workflow: From Reward Modeling to Online RLHF
  Paper • 2405.07863 • Published • 71
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 132
- Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
  Paper • 2405.15574 • Published • 55
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 90
Papers I've read
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 109
- Large Language Models Cannot Self-Correct Reasoning Yet
  Paper • 2310.01798 • Published • 36
- Premise Order Matters in Reasoning with Large Language Models
  Paper • 2402.08939 • Published • 28
- Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
  Paper • 2402.12875 • Published • 13