LongDPO: Unlock Better Long-form Generation Abilities for LLMs via Critique-augmented Stepwise Information
Abstract
Long-form generation is crucial for tasks such as academic paper writing and repository-level code generation. Despite this, current models, including GPT-4o, still exhibit unsatisfactory performance. Existing methods that rely on preference learning with outcome supervision often fail to provide detailed feedback for extended contexts. This shortcoming can lead to content that does not fully satisfy the query's requirements, resulting in issues such as length deviations and diminished quality. In this paper, we propose enhancing long-form generation by incorporating process supervision. We employ Monte Carlo Tree Search to gather stepwise preference pairs, using a global memory pool to maintain consistency. To address suboptimal candidate selection, we integrate external critiques to refine and improve the quality of the preference pairs. Finally, we apply step-level DPO to the collected stepwise preference pairs. Experimental results show that our method improves length control and quality on long-form generation benchmarks, with almost lossless performance on general benchmarks across various model backbones.
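To make the pipeline concrete, here is a minimal Python sketch of the idea described in the abstract: candidate next steps are proposed and scored (standing in for MCTS rollouts), a weak pair is refined via an external critique, and a step-level DPO loss is computed per preference pair. All names here (`propose_steps`, `critique_and_refine`, `collect_stepwise_pairs`, `step_dpo_loss`) are hypothetical stand-ins with toy stub logic, not the paper's actual implementation.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    score: float  # value estimate, e.g. from tree-search rollouts

def propose_steps(prefix: str, n: int = 4) -> list[Candidate]:
    """Stub: sample n candidate continuations and score them.
    A real system would query the policy LLM and run MCTS here."""
    return [Candidate(f"{prefix} -> step{i}", random.random()) for i in range(n)]

def critique_and_refine(cand: Candidate) -> Candidate:
    """Stub: ask an external critic to refine a suboptimal candidate.
    The paper uses external critiques; this toy version just nudges the score."""
    return Candidate(cand.text + " (refined)", min(1.0, cand.score + 0.2))

def collect_stepwise_pairs(prompt: str, num_steps: int = 3):
    """Sketch of stepwise preference-pair collection: per step, take the
    best candidate as 'chosen' and the worst as 'rejected', refine when the
    pair quality is low, then extend the prefix with the winner."""
    memory_pool: list[str] = []  # global memory pool for consistency
    pairs = []
    prefix = prompt
    for _ in range(num_steps):
        cands = sorted(propose_steps(prefix), key=lambda c: c.score, reverse=True)
        chosen, rejected = cands[0], cands[-1]
        if chosen.score - rejected.score < 0.1:  # weak margin: invoke critique
            chosen = critique_and_refine(chosen)
        pairs.append((prefix, chosen.text, rejected.text))
        memory_pool.append(chosen.text)  # record chosen step for later steps
        prefix = chosen.text
    return pairs

def step_dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Step-level DPO objective on one pair:
    -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

if __name__ == "__main__":
    for prefix, chosen, rejected in collect_stepwise_pairs("Write a survey intro."):
        print(f"chosen={chosen!r} vs rejected={rejected!r}")
    print("example step-level DPO loss:", step_dpo_loss(-1.0, -2.0, -1.5, -1.8))
```

The `memory_pool` here only records chosen steps; in the paper it serves to keep later steps consistent with earlier content, which this sketch does not model.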
Community
LongDPO focuses on improving long-form generation abilities for LLMs
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- Seed-CTS: Unleashing the Power of Tree Search for Superior Performance in Competitive Coding Tasks (2024)
- A Systematic Examination of Preference Learning through the Lens of Instruction-Following (2024)
- Leveraging Metamemory Mechanisms for Enhanced Data-Free Code Generation in LLMs (2025)
- CORD: Balancing COnsistency and Rank Distillation for Robust Retrieval-Augmented Generation (2024)
- Think&Cite: Improving Attributed Text Generation with Self-Guided Tree Search and Progress Reward Modeling (2024)
- LongViTU: Instruction Tuning for Long-Form Video Understanding (2025)
- CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis (2025)
Models citing this paper: 1
Datasets citing this paper: 1
Spaces citing this paper: 0