Failed orca_mini_v8_* Evaluation
Opening a new discussion, as suggested in a previous comment on another discussion:
Hi @alozowski ,
Happy Monday! I'm just reaching out to make sense of the following eval request commits for the model "pankajmathur/orca_mini_v8_0_70b". The commit below shows a file rename and a change from the wrong "params": 35.277:
https://huggingface.co/datasets/open-llm-leaderboard/requests/commit/5660c4c4b9156fa0f15d99be7eee061d5de24764#d2h-741276
Did the model fail to evaluate, and do these changes reflect a re-submission for evaluation?
If so, can we submit "pankajmathur/orca_mini_v8_1_70b" again too, as it appears to have failed as well?
https://huggingface.co/datasets/open-llm-leaderboard/requests/commit/8b40ba212c48dc470be4f661b67cc085ed456477#d2h-702908
Is there any reason they are failing? Just for background, I successfully evaluated both of them on my own servers before submitting them to the HF Open LLM Leaderboard, using:
lm_eval --model hf --model_args pretrained=pankajmathur/orca_mini_v8_1_70b,dtype=bfloat16,parallelize=True --tasks leaderboard --output_path lm_eval_results/leaderboard --batch_size auto
and these results are now updated for both model cards:
https://huggingface.co/pankajmathur/orca_mini_v8_0_70b
https://huggingface.co/pankajmathur/orca_mini_v8_1_70b
Thanks again for helping out on this; it's really appreciated.
Regards,
Pankaj
Happy New Year, folks!
Just reaching out to see if you can help me understand the failures above; many more model evaluations listed below have failed as well. Here are the request JSONs:
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/pankajmathur/orca_mini_v9_3_70B_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/pankajmathur/orca_mini_v9_2_70b_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/pankajmathur/orca_mini_v7_72b_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/pankajmathur/orca_mini_v9_5_1B-Instruct_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/pankajmathur/orca_mini_v9_6_3B-Instruct_eval_request_False_bfloat16_Original.json
Again, just for background: I was able to successfully evaluate all of them on my own servers using the reproducibility steps. Please advise if anything needs to be done with these models for evaluation.
Thanks for all the work on the Open LLM Leaderboard.
Pankaj
Hi @alozowski and Team,
Is there any update on this? Mainly: why did all of the above model evaluations fail, and is there any way to rerun them?