- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism
  Paper • 2407.10457 • Published • 23
- Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations
  Paper • 2411.00640 • Published • 3
- Law of the Weakest Link: Cross Capabilities of Large Language Models
  Paper • 2409.19951 • Published • 54
Vignesh
Vigneshwaran
AI & ML interests
None yet
Recent Activity
liked a dataset 4 days ago: teknium/OpenHermes-2.5
updated a collection 17 days ago: RLHF
updated a collection 23 days ago: RLHF
Collections
5