LLMs
TinyLlama: An Open-Source Small Language Model
Paper
• 2401.02385
• Published
• 95
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper
• 2401.13601
• Published
• 48
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Paper
• 2401.15024
• Published
• 73
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
Paper
• 2401.16380
• Published
• 51
Weaver: Foundation Models for Creative Writing
Paper
• 2401.17268
• Published
• 45
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
Paper
• 2402.00159
• Published
• 65
BlackMamba: Mixture of Experts for State-Space Models
Paper
• 2402.01771
• Published
• 25
Chain-of-Thought Reasoning Without Prompting
Paper
• 2402.10200
• Published
• 109
Nomic Embed: Training a Reproducible Long Context Text Embedder
Paper
• 2402.01613
• Published
• 15
OLMo: Accelerating the Science of Language Models
Paper
• 2402.00838
• Published
• 85
SPAR: Personalized Content-Based Recommendation via Long Engagement Attention
Paper
• 2402.10555
• Published
• 35
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Paper
• 2402.13753
• Published
• 116
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper
• 2402.17764
• Published
• 627
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Paper
• 2403.05530
• Published
• 65
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper
• 2403.09611
• Published
• 129
LLM Agent Operating System
Paper
• 2403.16971
• Published
• 73
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Paper
• 2404.07839
• Published
• 48
Understanding the planning of LLM agents: A survey
Paper
• 2402.02716
• Published
• 1
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Paper
• 2201.11903
• Published
• 15
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace
Paper
• 2303.17580
• Published
• 15
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Paper
• 2404.14619
• Published
• 126
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Paper
• 2405.01535
• Published
• 124
Better & Faster Large Language Models via Multi-token Prediction
Paper
• 2404.19737
• Published
• 81
Octopus v4: Graph of language models
Paper
• 2404.19296
• Published
• 118
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Paper
• 2407.03320
• Published
• 94
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper
• 2404.14219
• Published
• 259
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
Paper
• 2405.00732
• Published
• 122
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Paper
• 2405.04434
• Published
• 25
Large Language Diffusion Models
Paper
• 2502.09992
• Published
• 126
Less is More: Recursive Reasoning with Tiny Networks
Paper
• 2510.04871
• Published
• 509