Collections

Collections including paper arxiv:2402.07043

A Tale of Tails: Model Collapse as a Change of Scaling Laws

Paper • 2402.07043 • Published Feb 10, 2024 • 16
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT

Paper • 2509.19284 • Published Sep 23 • 22
OnePiece: Bringing Context Engineering and Reasoning to Industrial Cascade Ranking System

Paper • 2509.18091 • Published Sep 22 • 33
Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLM

Paper • 2509.18058 • Published Sep 22 • 12

A Tale of Tails: Model Collapse as a Change of Scaling Laws

Paper • 2402.07043 • Published Feb 10, 2024 • 16
How to Train Data-Efficient LLMs

Paper • 2402.09668 • Published Feb 15, 2024 • 42

Scaling Laws for Downstream Task Performance of Large Language Models

Paper • 2402.04177 • Published Feb 6, 2024 • 20
A Tale of Tails: Model Collapse as a Change of Scaling Laws

Paper • 2402.07043 • Published Feb 10, 2024 • 16
Scaling Laws for Fine-Grained Mixture of Experts

Paper • 2402.07871 • Published Feb 12, 2024 • 14
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method

Paper • 2402.17193 • Published Feb 27, 2024 • 26

YAYI 2: Multilingual Open-Source Large Language Models

Paper • 2312.14862 • Published Dec 22, 2023 • 15
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

Paper • 2312.15166 • Published Dec 23, 2023 • 60
TrustLLM: Trustworthiness in Large Language Models

Paper • 2401.05561 • Published Jan 10, 2024 • 69
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Paper • 2401.06066 • Published Jan 11, 2024 • 58

Chain-of-Verification Reduces Hallucination in Large Language Models

Paper • 2309.11495 • Published Sep 20, 2023 • 39
EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation

Paper • 2310.08185 • Published Oct 12, 2023 • 8
The Consensus Game: Language Model Generation via Equilibrium Search

Paper • 2310.09139 • Published Oct 13, 2023 • 14
In-Context Pretraining: Language Modeling Beyond Document Boundaries

Paper • 2310.10638 • Published Oct 16, 2023 • 30

A collection of arXiv papers from Chip Huyen's AI Engineering, organized by chapter and ordered by when each appears in the book.

Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning

Paper • 2211.04325 • Published Oct 26, 2022 • 1
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 24
On the Opportunities and Risks of Foundation Models

Paper • 2108.07258 • Published Aug 16, 2021 • 1
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Paper • 2204.07705 • Published Apr 16, 2022 • 2

A Tale of Tails: Model Collapse as a Change of Scaling Laws

Paper • 2402.07043 • Published Feb 10, 2024 • 16

Efficient Tool Use with Chain-of-Abstraction Reasoning

Paper • 2401.17464 • Published Jan 30, 2024 • 21
Transforming and Combining Rewards for Aligning Large Language Models

Paper • 2402.00742 • Published Feb 1, 2024 • 12
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 138
Specialized Language Models with Cheap Inference from Limited Domain Data

Paper • 2402.01093 • Published Feb 2, 2024 • 47

Research Papers

A collection of papers focused on LLMs

Orca 2: Teaching Small Language Models How to Reason

Paper • 2311.11045 • Published Nov 18, 2023 • 77
ToolTalk: Evaluating Tool-Usage in a Conversational Setting

Paper • 2311.10775 • Published Nov 15, 2023 • 10
Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning

Paper • 2311.11077 • Published Nov 18, 2023 • 29
MultiLoRA: Democratizing LoRA for Better Multi-Task Learning

Paper • 2311.11501 • Published Nov 20, 2023 • 37
