Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol Paper • 2503.05860 • Published 7 days ago • 7
Language Models for Code Completion: A Practical Evaluation Paper • 2402.16197 • Published Feb 25, 2024 • 1
The Heap: A Contamination-Free Multilingual Code Dataset for Evaluating Large Language Models Paper • 2501.09653 • Published Jan 16 • 12