Quantum Entanglement and the Sentient Toaster: Revolutionizing LLM Training

#3
by mradermacher - opened

I'm downloading the Q6_K for snowflake - remember, it often scores better at the correct_token metric than the source model :) But if you insist on the Q8_0 we can do that as well.

-rw------- 1 root root 509G Dec 7 13:01 snowflake-arctic-instruct.Q8_0.gguf

I assume that is in GB and not GiB. In that case it is about 474 GiB, which might fit as we have 503 GiB of RAM (after subtracting RAM reserved for hardware), but it would be extremely tight given the RAM required for context.
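For reference, here is that conversion made explicit (my own arithmetic, only restating the numbers above):

```python
# 509 G reported in decimal gigabytes vs. the binary GiB that RAM is counted in.
size_bytes = 509 * 10**9          # ~509 GB (decimal), as listed
size_gib = size_bytes / 2**30     # convert to GiB (binary)
print(f"{size_gib:.0f} GiB")      # ~474 GiB, to compare against the ~503 GiB of usable RAM
```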

I'm downloading the Q6_K for snowflake - remember, it often scores better at the correct_token metric than the source model :) But if you insist on the Q8_0 we can do that as well.

Q6_K is fine for me. Q8_0 might not fit without offloading and it is unclear if offloading is even possible. I don't think it's worth using RPC if Q6_K fits. As a bonus, there will be enough RAM left to keep quantization tasks running if we do Q6_K. If you already have Q8_0 locally, give it a try and see if it fits, but if not, Q6_K is fine for me.

I just checked and you do have it locally under /tmp/snowflake-arctic-instruct.Q8_0.gguf so please give it a try to see if it fits. I believe it should fit if nothing else is running as the model has such a small number of layers. If it doesn't fit use Q6_K instead.

474G Dec 7 13:01 snowflake-arctic-instruct.Q8_0.gguf

I'll try an offload of 1 and 0, then Q6. Hopefully it does not crash.

I think you have to finish or kill the frozen quantisation tasks first. They are using a lot of reserved RAM (not cached RAM that can be taken away).
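A minimal sketch (my own, not part of any tooling mentioned here) of how to see that difference on the box by reading /proc/meminfo:

```python
# Page cache ("Cached") can be reclaimed under memory pressure, but the
# anonymous memory held by the frozen quantisation tasks ("AnonPages") cannot.
def meminfo():
    fields = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            fields[key] = int(rest.split()[0])   # values are reported in kiB
    return fields

m = meminfo()
for key in ("MemTotal", "MemAvailable", "Cached", "AnonPages"):
    print(f"{key:13s} {m[key] / 2**20:7.1f} GiB")
```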

So, despite it listing both GPUs, it only allocated something on GPU0 (19GB). Otherwise, top says the process uses 435.6g, which is good, because I forgot to resume/stop the running quantize. I'd say we can even quantize, and if I manipulate the job a bit more, we might even do small imatrix calculations.

457.4g after warming up.

So, despite it listing both GPUs, it only allocated something on GPU0 (19GB)

llama.cpp uses both GPUs for imatrix but only offloaded to one because you set -ngl 1 and it can only offload on a per-layer basis. Also, since when are quantisation tasks using the GPUs?

[screenshot: grafik.png]
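To make the per-layer point concrete, a rough sketch (my own; the layer count and the assumption of equally sized layers are illustrative, not numbers from this thread):

```python
# llama.cpp offloads whole layers, so -ngl N moves roughly N/num_layers of the
# weights to VRAM; even -ngl 1 is a sizable chunk for a model this large.
def vram_for_ngl(model_size_gib: float, num_layers: int, ngl: int) -> float:
    per_layer_gib = model_size_gib / num_layers   # crude: assumes equal-sized layers
    return per_layer_gib * ngl

# e.g. a ~474 GiB quant with a hypothetical 35 layers:
print(f"{vram_for_ngl(474, 35, 1):.1f} GiB for -ngl 1")   # plus context/output buffers
```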

I'd say we can even quantize, and if I manipulate the job a bit more, we might even do small imatrix calculations.

I'm not so sure about that. Keep in mind that imatrix uses mmap memory that can be taken away by other processes like quantisation tasks that use reserved memory.

[screenshot: grafik.png]

dstat shows a relatively high disk read rate so imatrix might now be streaming from SSD:

[screenshot: grafik.png]

Yes it is clearly streaming from SSD now:

[screenshot: grafik.png]

Once the quantisation tasks are interrupted it should work without SSD streaming again.
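For completeness, a rough stand-in for that kind of check (dstat is what was actually used; this sketch and the device name are my own assumptions):

```python
# Print read throughput once per second from /proc/diskstats; a sustained high
# rate while imatrix runs suggests the model is being streamed from SSD.
import time

def sectors_read(dev="nvme0n1"):               # device name is an assumption
    with open("/proc/diskstats") as f:
        for line in f:
            parts = line.split()
            if parts[2] == dev:
                return int(parts[5])           # field 6: sectors read (512 bytes each)
    raise ValueError(f"device {dev} not found")

prev = sectors_read()
while True:
    time.sleep(1)
    cur = sectors_read()
    print(f"{(cur - prev) * 512 / 2**20:8.1f} MiB/s read")
    prev = cur
```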

ssh (2222 => 22) and (more importantly) wireguard (7103 => 7103) forwardings are missing. The latter is required for nico1 to get a reliable connection to rich1.

This is fixed now. Sorry for the networking issues. ifupdown wasn't installed, as it wasn't required with the old networking setup, so the /etc/network/interfaces set by the Proxmox host got ignored. The container instead used systemd-networkd, which resulted in it getting a random IP over DHCP, breaking the port forwarding rules pointing to 192.168.1.101. I have now installed ifupdown and enabled the networking service, so this shouldn't happen again.
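A quick way to confirm the TCP forwarding from the outside (just an illustrative sketch; the hostname is a placeholder, and the WireGuard forward is UDP, so it is best verified by whether the tunnel handshake completes):

```python
import socket

def tcp_open(host, port, timeout=3):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# "rich1.example.org" is a placeholder for the host's public address.
print("ssh forward 2222 -> 22:", tcp_open("rich1.example.org", 2222))
```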

On the 28th at 00:02 (likely), /etc/tmpfiles.d/tmp.conf was removed, causing /tmp to be deleted, which caused the loss of all models and jobs. I was able to restore most jobs with some work from a backup, but I don't always have a backup. It is important to find out what happened so it can be prevented in the future.

No idea who or what deleted this config. /tmp was empty after Richard stopped the container on the 26th of January. I don't think it will happen again as we are now using Proxmox to manage the container instead of LXC. It is very unfortunate that we lost the entirety of /tmp.

resolv.conf also changed weirdly

That makes sense as Proxmox is injecting its own network configuration into LXC containers so nothing to worry about.

If I can trust the mtime, then the only obvious change outside of /usr is the tmpfiles.d/tmp.conf deletion.

You should be able to trust it as the container is still pointing to the same rootfs folder on the same disk. We didn't copy or move the container at all.

I'd rather start from a fresh Debian than work with a partially corrupted VM with such surprises.

If you want to start fresh just let me know and I can easily give you a new container. It takes 1 minute for me to create a new one and doing so would be cleaner.

Also, it seems I have 500GB less space - du says 694GB, df says 1192GB is in use.

This is because the same disk contains a backup of Richard's website and rich1, just in case. The rich1 backup was unfortunately made when /tmp was already gone. I will delete the backups as soon as Richard confirms I can do so.
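As an aside, this is also why du and df disagree: du only sums what the container can see, while df reports allocation for the whole filesystem, which also held the backups. A small sketch (my own) of the two views:

```python
import os
import shutil

def du_bytes(path):
    """Roughly what du does: sum the sizes of files reachable under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass
    return total

usage = shutil.disk_usage("/")   # what df reports: total/used/free of the filesystem
print(f"df used: {usage.used / 2**30:.0f} GiB")
print(f"du /   : {du_bytes('/') / 2**30:.0f} GiB")
```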

It all looks fine from my side, thanks for your work. The chattr +i should prevent accidental deletion in the future, but it is very weird. I could chalk it up to my script messing up and forgetting about it, but then it would have happened on previous reboots, and the directory had an mtime of when it was down. Very strange.

Also, it seems I have 500GB less space - du says 694GB, df says 1192GB is in use.

I deleted the backups half an hour ago so all the storage should now be available for you to use again.

It all looks fine from my side, thanks for your work.

Thanks. Great to finally see rich1 working again.

I could chalk it up to my script messing up and forgetting about it, but then it would have happened on previous reboots

I don't think we ever rebooted rich1 after we had to reinstall it after the LXC corruption incident.

I don't think we ever rebooted rich1 after we had to reinstall it after the LXC corruption incident.

No, but rich1 and the VM rebooted multiple times before, and once after, and the only time that file was created was when I initially ran my script to configure wireguard and other stuff (i.e. twice only). I can only imagine some script went around and deleted either all 0-size files or any file starting with tmp.* - just very weird. But who knows, maybe whatever script was run to essentially destroy rich1 also ran a find over the whole disk.

The only evidence is that something mucked with that directory on jan 28th, so it's unlikely to have been something that happened before. I was lucky that I made a copy of the queue just in case when it went down, otherwise restoring the jobs would be... difficult.

Thanks. Great to finally see rich1 working again.

Yeah, I was getting a bit desperate - nico1 much less than 50% usable for weeks, rich1 gone, and an unprecedented number of models, big ones too (I mean 70B..130B, not deepseek), made for very tense moments. All in all, it's making good progress despite everything, and we even made a tiny bit of progress on the nice 1000+ models.

Why does it say:

0   66 si Virtuoso-Medium-v2                           error/255 repo create

The repository clearly exists under https://huggingface.co/mradermacher/Virtuoso-Medium-v2-GGUF - it is supposed to do static quants to that repo as the status shows si.

Edit: Now that the imatrix is done it shows sI as status but is still stuck at error/255 repo create. Luckily it just skips this task and works on other tasks in the meantime.
Edit: Ah nice, it either fixed itself or you manually fixed it. In any case, the model is now getting quantized.

Last night and also this morning, hf had enormous timeout problems. Everything was affected, including web page loading. It's not fully fixed yet, but it's much better. I need to manually retry when it fails at this step.
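For the repo-create step, a generic retry sketch (not the actual pipeline code; huggingface_hub's create_repo is a real API, but the repo id and retry policy here are only illustrative):

```python
import time
import requests
from huggingface_hub import create_repo
from huggingface_hub.utils import HfHubHTTPError

def create_repo_with_retry(repo_id, attempts=5, delay=30):
    """Retry repo creation; transient HF timeouts are the usual cause of failure."""
    for attempt in range(1, attempts + 1):
        try:
            # exist_ok=True makes this a no-op if the repo already exists
            return create_repo(repo_id, repo_type="model", exist_ok=True)
        except (HfHubHTTPError, requests.RequestException) as err:
            print(f"attempt {attempt}/{attempts} failed: {err}")
            time.sleep(delay)
    raise RuntimeError(f"giving up on creating {repo_id}")

create_repo_with_retry("mradermacher/Virtuoso-Medium-v2-GGUF")
```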

Ah, and yes, if "s" is in the flags, it will never try imatrix quanting first.

Oh, and btw, Hetzner sometimes has good offers, which might or might not be something to consider for Richard, if he actually pays €250/month. I can't see an obvious candidate, but I didn't look long, and the offers change considerably over time, e.g.

https://www.hetzner.com/sb/#price_from=180&price_to=250&cpuType=AMD&search=threadripper

All of these are a bit faster than his box, and cheaper, afaics.
