-
A Tale of Tails: Model Collapse as a Change of Scaling Laws
Paper • 2402.07043 • Published • 16 -
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT
Paper • 2509.19284 • Published • 22 -
OnePiece: Bringing Context Engineering and Reasoning to Industrial Cascade Ranking System
Paper • 2509.18091 • Published • 33 -
Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLM
Paper • 2509.18058 • Published • 12
Collections
Discover the best community collections!
Collections including paper arxiv:2402.07043
-
Scaling Laws for Downstream Task Performance of Large Language Models
Paper • 2402.04177 • Published • 20 -
A Tale of Tails: Model Collapse as a Change of Scaling Laws
Paper • 2402.07043 • Published • 16 -
Scaling Laws for Fine-Grained Mixture of Experts
Paper • 2402.07871 • Published • 14 -
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
Paper • 2402.17193 • Published • 26
-
YAYI 2: Multilingual Open-Source Large Language Models
Paper • 2312.14862 • Published • 15 -
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Paper • 2312.15166 • Published • 60 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 69 -
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 58
-
Chain-of-Verification Reduces Hallucination in Large Language Models
Paper • 2309.11495 • Published • 39 -
EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation
Paper • 2310.08185 • Published • 8 -
The Consensus Game: Language Model Generation via Equilibrium Search
Paper • 2310.09139 • Published • 14 -
In-Context Pretraining: Language Modeling Beyond Document Boundaries
Paper • 2310.10638 • Published • 30
-
Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
Paper • 2211.04325 • Published • 1 -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 24 -
On the Opportunities and Risks of Foundation Models
Paper • 2108.07258 • Published • 1 -
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
Paper • 2204.07705 • Published • 2
-
Efficient Tool Use with Chain-of-Abstraction Reasoning
Paper • 2401.17464 • Published • 21 -
Transforming and Combining Rewards for Aligning Large Language Models
Paper • 2402.00742 • Published • 12 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 138 -
Specialized Language Models with Cheap Inference from Limited Domain Data
Paper • 2402.01093 • Published • 47
-
Orca 2: Teaching Small Language Models How to Reason
Paper • 2311.11045 • Published • 77 -
ToolTalk: Evaluating Tool-Usage in a Conversational Setting
Paper • 2311.10775 • Published • 10 -
Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning
Paper • 2311.11077 • Published • 29 -
MultiLoRA: Democratizing LoRA for Better Multi-Task Learning
Paper • 2311.11501 • Published • 37
-
A Tale of Tails: Model Collapse as a Change of Scaling Laws
Paper • 2402.07043 • Published • 16 -
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT
Paper • 2509.19284 • Published • 22 -
OnePiece: Bringing Context Engineering and Reasoning to Industrial Cascade Ranking System
Paper • 2509.18091 • Published • 33 -
Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLM
Paper • 2509.18058 • Published • 12
-
Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
Paper • 2211.04325 • Published • 1 -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 24 -
On the Opportunities and Risks of Foundation Models
Paper • 2108.07258 • Published • 1 -
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
Paper • 2204.07705 • Published • 2
-
Scaling Laws for Downstream Task Performance of Large Language Models
Paper • 2402.04177 • Published • 20 -
A Tale of Tails: Model Collapse as a Change of Scaling Laws
Paper • 2402.07043 • Published • 16 -
Scaling Laws for Fine-Grained Mixture of Experts
Paper • 2402.07871 • Published • 14 -
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
Paper • 2402.17193 • Published • 26
-
Efficient Tool Use with Chain-of-Abstraction Reasoning
Paper • 2401.17464 • Published • 21 -
Transforming and Combining Rewards for Aligning Large Language Models
Paper • 2402.00742 • Published • 12 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 138 -
Specialized Language Models with Cheap Inference from Limited Domain Data
Paper • 2402.01093 • Published • 47
-
YAYI 2: Multilingual Open-Source Large Language Models
Paper • 2312.14862 • Published • 15 -
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Paper • 2312.15166 • Published • 60 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 69 -
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 58
-
Orca 2: Teaching Small Language Models How to Reason
Paper • 2311.11045 • Published • 77 -
ToolTalk: Evaluating Tool-Usage in a Conversational Setting
Paper • 2311.10775 • Published • 10 -
Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning
Paper • 2311.11077 • Published • 29 -
MultiLoRA: Democratizing LoRA for Better Multi-Task Learning
Paper • 2311.11501 • Published • 37
-
Chain-of-Verification Reduces Hallucination in Large Language Models
Paper • 2309.11495 • Published • 39 -
EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation
Paper • 2310.08185 • Published • 8 -
The Consensus Game: Language Model Generation via Equilibrium Search
Paper • 2310.09139 • Published • 14 -
In-Context Pretraining: Language Modeling Beyond Document Boundaries
Paper • 2310.10638 • Published • 30