Failed orca_mini_v8_* Evaluation
Opening a new discussion, as suggested in a previous comment on another discussion:
Hi @alozowski ,
Happy Monday! I'm just reaching out to make sense of the following eval request commits for the model "pankajmathur/orca_mini_v8_0_70b". The commit below shows a file rename and a change from the wrong "params": 35.277:
https://huggingface.co/datasets/open-llm-leaderboard/requests/commit/5660c4c4b9156fa0f15d99be7eee061d5de24764#d2h-741276
Did the model fail to evaluate, and do these changes reflect a re-submission for evaluation?
If so, can we submit "pankajmathur/orca_mini_v8_1_70b" again too, as it appears to have failed as well?
https://huggingface.co/datasets/open-llm-leaderboard/requests/commit/8b40ba212c48dc470be4f661b67cc085ed456477#d2h-702908
Is there any reason they are failing? Just for background, I successfully evaluated both of them on my own servers before submitting them to the HF Open LLM Leaderboard, using:
lm_eval --model hf --model_args pretrained=pankajmathur/orca_mini_v8_1_70b,dtype=bfloat16,parallelize=True --tasks leaderboard --output_path lm_eval_results/leaderboard --batch_size auto
and these results are now updated for both model cards:
https://huggingface.co/pankajmathur/orca_mini_v8_0_70b
https://huggingface.co/pankajmathur/orca_mini_v8_1_70b
Thanks again for helping out on this; it's really appreciated.
Regards,
Pankaj
Happy New Year, folks!
Just reaching out to see if you can help me understand the failures above; many more model evaluations listed below have failed as well. Here are the request JSONs:
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/pankajmathur/orca_mini_v9_3_70B_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/pankajmathur/orca_mini_v9_2_70b_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/pankajmathur/orca_mini_v7_72b_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/pankajmathur/orca_mini_v9_5_1B-Instruct_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/pankajmathur/orca_mini_v9_6_3B-Instruct_eval_request_False_bfloat16_Original.json
Again, just for background: I was able to successfully evaluate all of them on my own servers using the reproducibility steps. Please advise if anything needs to be done with these models for evaluation.
Thanks for all the work on the Open LLM Leaderboard.
Pankaj
Hi @alozowski and Team,
Is there any update on this? Mainly: why did all of the above model evaluations fail, and is there any way to rerun them?