https://huggingface.co/DazzlingXeno/FusedSlushDumpling-32b

#637
by Nurburgring - opened

@mradermacher I just tried to queue this, but it says it already exists. However, it exists neither as an uploaded model nor in the queue. I assume this means it was queued in the past and failed. That is somewhat strange given that it's a Qwen2ForCausalLM model, but maybe there is a wrong tokenizer/pre-tokenizer or some other issue. Is there a way for me to check why it failed? I tried finding it in the audit, but it didn't happen today, so it was likely already nuked.

nico1 ~# llmc add -2007 si https://huggingface.co/DazzlingXeno/FusedSlushDumpling-32b
submit tokens: ["-2007","static","imatrix","https://huggingface.co/DazzlingXeno/FusedSlushDumpling-32b"]
https://huggingface.co/DazzlingXeno/FusedSlushDumpling-32b
["https://huggingface.co/DazzlingXeno/FusedSlushDumpling-32b",["s","i","-2000"],1738871223],
https://huggingface.co/DazzlingXeno/FusedSlushDumpling-32b already in llmjob.submit.txt

I just force added it so I will then hopefully see in the audit why it failed.

@mradermacher I'm not getting it. I saw the model in the queue after I force-queued it, but now it is gone from the queue without any error being shown on the status page, I cannot locate the uploaded model, and llmc audit shows nothing. llmc audit actually just froze a few minutes ago, and now when I run it, it terminates without showing anything, likely because you nuked all the errors in the meantime.

It failed the first time I submitted it, with incompatible tensor shapes during imatrix measuring. There are currently no permanent logs of this event, and since it is a very rare occurrence, I wasn't bothered enough to rectify this (the symptom is usually the same as you described: "it has been submitted? but it never failed and doesn't exist? what? why?", with the added problem of you not having access to the fail log that audit writes).

It should have worked fine with force... I suspect I simply removed it during my breakfast, together with the two other failed imatrix ones.

llmc audit probably didn't freeze, but waited for the lock (which, kind of, freezes it). When there are no failed models, llmc audit simply ends up with no output.

It should be safe to ^C it at any time. In rare cases the lock can be held for a very long time (10-20 minutes), e.g. if a sync() is running on one of the machines (I have waited for almost a minute for sync to finish on nico1; imagine how long it can take on some of the poorer nodes :)

Or somebody else ran llmc audit at the same time as you; the lock is global. That could be lifted, but...
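
For anyone wondering why a command that waits on a global lock looks frozen, here is a rough sketch of the pattern in Python. It uses an advisory flock with a timeout. How llmc actually takes its lock is not stated in this thread, so the function names (try_lock, unlock) and the lock path are purely hypothetical.

```python
import fcntl
import os
import time


def try_lock(path, timeout=5.0, poll=0.1):
    """Try to take an exclusive advisory lock on `path`.

    Returns the open file descriptor on success, or None if the lock
    is still held by someone else when `timeout` seconds have passed.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    deadline = time.monotonic() + timeout
    while True:
        try:
            # LOCK_NB makes flock fail immediately instead of blocking,
            # so the caller can give up (or be interrupted) cleanly.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd
        except BlockingIOError:
            if time.monotonic() >= deadline:
                os.close(fd)
                return None
            time.sleep(poll)


def unlock(fd):
    """Release the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

A caller that simply blocks without a timeout behaves like the "frozen" audit described above: nothing is wrong, it is only waiting for whoever holds the lock to finish.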

mradermacher changed discussion status to closed

I just force added it so I will then hopefully see in the audit why it failed.

Ah, and you would not have, because the imatrix logs are currently off-limits. Or actually, you can llmc shell kaos and find them in /tmp, as MODEL.slog, probably.

But audit does nothing with imatrix failures at the moment. They practically only happen when tensors are missing or tensor shapes are wrong.
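
Since wrong tensor shapes are the usual cause, a quick look at the shapes before queueing can save a round trip. The following is a minimal sketch, not what imatrix or llama.cpp actually checks. It reads only the JSON header of a single .safetensors file and reports per-layer tensors whose shape differs between layers. The helper names are made up for illustration, and the assumption that equally named tensors should have the same shape in every layer is a heuristic. Sharded models would need this run over every shard.

```python
import json
import re
import struct
from collections import defaultdict


def read_safetensors_shapes(path):
    """Return {tensor_name: shape} from a .safetensors header.

    The file starts with an 8-byte little-endian header length,
    followed by a JSON header; no weights are loaded.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return {
        name: tuple(info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    }


def inconsistent_layer_shapes(shapes):
    """Report tensors whose shape is not the same in every layer.

    Heuristic (an assumption, see above): after masking the layer
    index, all tensors with the same name should share one shape.
    """
    groups = defaultdict(set)
    for name, shape in shapes.items():
        key = re.sub(r"\.layers\.\d+\.", ".layers.N.", name)
        groups[key].add(shape)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}
```

An empty result does not mean the model will quantize fine. It only rules out one obvious kind of mismatch, and missing tensors are not detected at all.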
