- Language agents achieve superhuman synthesis of scientific knowledge
  Paper • 2409.13740 • Published
- Nougat: Neural Optical Understanding for Academic Documents
  Paper • 2308.13418 • Published • 41
- datalab-to/chandra
  Image-to-Text • 9B • Updated • 87.1k • 409
- Denario
  💻 25 • GUI for Denario
Collections including paper arxiv:2308.13418

- LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers
  Paper • 2507.04404 • Published • 21
- 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float
  Paper • 2504.11651 • Published • 31
- A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone
  Paper • 2505.12781 • Published • 2
- A Survey of Context Engineering for Large Language Models
  Paper • 2507.13334 • Published • 259

- LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding
  Paper • 2306.17107 • Published • 11
- On the Hidden Mystery of OCR in Large Multimodal Models
  Paper • 2305.07895 • Published • 1
- Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
  Paper • 2308.12966 • Published • 11
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Paper • 2401.15947 • Published • 53

- Language models are weak learners
  Paper • 2306.14101 • Published • 10
- Large Language Models as Tax Attorneys: A Case Study in Legal Capabilities Emergence
  Paper • 2306.07075 • Published • 10
- TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT
  Paper • 2307.08674 • Published • 48
- Nougat: Neural Optical Understanding for Academic Documents
  Paper • 2308.13418 • Published • 41

- Nougat: Neural Optical Understanding for Academic Documents
  Paper • 2308.13418 • Published • 41
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
  Paper • 2307.02499 • Published • 15
- Text Rendering Strategies for Pixel Language Models
  Paper • 2311.00522 • Published • 12