Mom i want AGI at home

#647
by GuilhermeNaturaUmana - opened

Mom: No we actually have AGI at home

the AGI at home: GuilhermeNaturaUmana/Nature-Reason-1-AGI

please quantize this lol

actually let me change some things first, don't quantize, at least not now

GuilhermeNaturaUmana changed discussion status to closed

would have quantised it now if you hadn't stopped me.

You can quantize now, dw i was just checking some things

GuilhermeNaturaUmana changed discussion status to open

and yes im Guilherme34 lol

I suspected you from the beginning :) But we quantize models from anybody :=)

mradermacher changed discussion status to closed

Oh, but the same applies - it's queued, but please provide a url next time :)

@mradermacher Are you aware that this is a 405B model? This will likely require some manual handling. This is the best 405B model we ever created and it turned out absolutely amazing.

I was not, because the sneaky rat didn't follow the naming scheme sane people use.

Well, a 405B model will not do anything bad, it will simply get stuck at the imatrix step. However, there are now four 405B models in the queue, and another one kind of stuck on nico1. I don't think we can do them, with the queue as it is. Too many, too big new models at the moment.

bro it's AGI, what would you expect 😜

I see AGI models every day :)

this one is different for sure @nicoboss agree

that's what they all say. make it in 70B, and I might be interested :)

70b can't be AGI, change my mind lol, need to go now, good work mradermacher you are an awesome guy!

haha :) llms can't be agi. see who changes his mind first :)

anyway, i manually pushed it to nico1, but then adjusted its priority down to be above the poor CALM-405B. massive nepotism here :)

seriously, i don't know if i can stomach any 405b at the moment, certainly not when doing imatrix on f16.

Looking for a product to teach baby AGI like a GodFather, what do we have?

It's better than sliced bread!

Sup whats new

ok lemme help you... we can't wait!

    try:
        AGI().evolve()
    except NotEnoughComputeError:
        push_priority_up(nico1)

I was not, because the sneaky rat didn't follow the naming scheme sane people use.

The name is indeed very unfortunate, as nobody knows it is the first reasoning-finetuned 405B model and you don't even find it when searching for 405B. I already tried to convince Guilherme to change it, but he wants to stick with the current name.

Sup whats new

GuilhermeNaturaUmana/Nature-Reason-1-AGI is a reasoning-finetuned version of nicoboss/Hermes-3-Llama-3.1-405B-Uncensored, trained using H200 GPUs. It is similar to DeepSeek-R1, which is based on DeepSeek-V3-Base, but for Llama-3.1-405B. This is really interesting because, unlike DeepSeek-V3-Base which uses MoE, Llama-3.1-405B is the largest monolithic openly available model, and this is the first reasoning finetune of it.

why so slow

It's not slow at all. Just run https://huggingface.co/GuilhermeNaturaUmana/Nature-Reason-1-AGI-AWQ on 4x A100 using vLLM and you get over 500 tokens/second. I already spent quite some time testing the model and it is awesome. It achieved 0.9477 on gsm8k, which is an amazing result showing that reasoning finetuning of existing large models makes them really good.
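
A minimal sketch of what that vLLM setup might look like, assuming the OpenAI-compatible server on 4 GPUs (the exact flags can differ between vLLM versions):

    # serve the AWQ checkpoint tensor-parallel across 4 GPUs
    python -m vllm.entrypoints.openai.api_server \
        --model GuilhermeNaturaUmana/Nature-Reason-1-AGI-AWQ \
        --quantization awq \
        --tensor-parallel-size 4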

If you mean why this model is taking so long to quant: CALM-405B is currently hogging all the storage of nico1, so it really seems like we have to complete it first. I ordered 72 TB storage at new year but I'm still waiting for it due to delivery delays; it is scheduled to arrive in around a month, should they not postpone it again. If you can't wait, we created and uploaded the Q4_K_M static GGUF to https://huggingface.co/GuilhermeNaturaUmana/Nature-Reason-1-AGI-GGUFs/tree/main
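
If you go that route, a rough sketch of pulling the repo and loading it with llama.cpp (the file name is a placeholder - check the repo for the actual parts, and depending on how the quant is split you may need to point -m at the first split part or concatenate the parts first):

    huggingface-cli download GuilhermeNaturaUmana/Nature-Reason-1-AGI-GGUFs --local-dir ./nature-reason-q4km
    # placeholder file name; use the actual Q4_K_M file (or its first split part) from the repo
    ./llama-cli -m ./nature-reason-q4km/<q4_k_m-file>.gguf -p "Hello"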

ok lemme help you... we can't wait!

We are doing our best to quant this beast as quickly as possible, but there is really not much we can do given the current storage constraints.

When looking at the next 3 entries in the wait queue I'm terrified - I believe we should lower their priority as otherwise they are going to clog up nico1 completely:

    0  812 si Llama-3.1-Tulu-3-405B-SFT                   
    0 1624 si Llama-3.1-Tulu-3-405B                       
    0  812 si Llama-3.1-Tulu-3-405B-DPO  

I lowered the priority of Llama-3.1-Tulu-3-405B-SFT, Llama-3.1-Tulu-3-405B and Llama-3.1-Tulu-3-405B-DPO from 0 to 10, to prevent them from clogging up nico1 and because the models I queued in the 1 to 9 range are more important than the Tulu-3-405B models for sure. It would be nice to get to at least some of them before the queue is flooded again with 0-priority models, as all models in that range are currently trending on HuggingFace.

They would not have been queued to nico1 due to the queue settings. What's clogging up nico1 (temporarily) is the long stretches of unavailability of imatrix calculations, causing the queue to fill up. I manually queued up static-only models for a while because of this (and other scheduling reasons). And Nature-Reason-1-AGI, of course :)

Anyway, nico1 is in good shape nevertheless, and I think we are making positive progress on the queue for the last two days.

For Llama-3.1-Tulu-3-405B, I think I will need your help, because that would require 2-3TB of disk space to download and convert.

we created and uploaded the Q4_K_M static GGUF

I hear mradermacher has a Q2_K and Q4_K_S also.

72 TB storage at new year

I ordered 160TB december last year, and they arrived within 4 days. Relatively cheap, too.

muahahaha.

(just a day before my existing 140TB raid corrupted. it was very handy for recovery)

They would not have been queued to nico1 due to the queue settings.

Great to know. I was worried they all would be getting downloaded to nico1 and then clogging up all remaining storage but you are right the budget would likely have prevented it.

What's clogging up nico1 (temporarily) is the long stretches of unavailability of imatrix calculations, causing the queue to fill up.

Sorry, yesterday all the materials to switch back from the AiO cooler to the water block required for the outdoor water-cooling system arrived, so I spent the evening doing maintenance on StormPeak. Other than that, I'm currently juggling between finetuning and imatrix tasks. I'm aware that I really need to find a better way to finetune models without severely impacting the imatrix queue. Currently, during daytime, I just wait for the imatrix queue to run dry, then let axolotl run until the imatrix queue contains a few models, and repeat. This manual process is not just annoying but also not something I can do during nighttime.

The RTX 3080 would be available during nighttime, but our current setup only supports imatrix computation on the RTX 4090 GPUs, which also happen to be the only GPUs supported by axolotl. Maybe an easy temporary fix would be a flag like /tmp/ngl to disable GPU layer offloading, so we can try running axolotl and imatrix computation on the same GPUs at the same time. I'm not sure how this would impact imatrix/axolotl performance, but it would be worth a try. Further, I will try to configure axolotl so I can do small models on a single GPU, because the 14B models I am currently finetuning would easily fit on a single one; that way one GPU would still be available for imatrix computation even while I finetune 14B and smaller models (see the sketch below). There really is a lot we can improve in this regard.
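
A minimal sketch of how pinning axolotl to a single GPU could look, assuming the usual accelerate launcher and a hypothetical config.yml for the 14B finetune:

    # keep GPU 1 free for imatrix work; run the finetune on GPU 0 only
    CUDA_VISIBLE_DEVICES=0 accelerate launch -m axolotl.cli.train config.yml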

I manually queued up static-only models for a while because of this (and other scheduling reasons). And Nature-Reason-1-AGI, of course :)

Oh no you should have mentioned this earlier and we could have found a better solution.

Anyway, nico1 is in good shape nevertheless, and I think we are making positive progress on the queue for the last two days.

We are making really fast progress at the moment and I'm really happy we finally came to the 1 to 9 priority range. The 0-priority range was absolutely massive.

As soon as the new PSU arrives (which should happen within the next 2 weeks), we can consider creating nico2 and nico3 for even faster throughput.

For Llama-3.1-Tulu-3-405B, I think I will need your help, because that would require 2-3TB of disk space to download and convert.

I think so as well. I could again do the NFS thing, but maybe I should just pause nico1 again for a night to complete the 4 TB worth of tasks still scheduled to be processed by the performance measurement project. Just very bad timing to do so at the moment: while I installed the new water block yesterday, it still needs further testing. I might also need to first replace the superglued but now no longer leaking water pump. The new one arrived yesterday, but exchanging it is not as easy as one would think, as I need to make sure the cooling loop doesn't fill up with air. Maybe I will really just end up using NFS, because we have storage available there right now and I know it works. I will start preparing the source GGUF of allenai/Llama-3.1-Tulu-3-405B once CALM-405B is done.

I hear mradermacher has a Q2_K and Q4_K_S also.

Oh wow you are right. That's awesome. I somehow completely missed them getting generated: https://hf.tst.eu/model#Nature-Reason-1-AGI-GGUF

I ordered 160TB december last year, and they arrived within 4 days. Relatively cheap, too.

Mine are cheaper than cheap. All ST18000NM003D-FR for around $200 each (around $11 per TB). I don't care that they are factory recertified, as I have redundancy and backups, but I got extremely unlucky with delivery time. Someone bought up the entire European stock for Christmas; I ordered at the end of December, then my order got canceled, and I ordered again in early January. Consider yourself really lucky.

Great to know. I was worried they all would be getting downloaded to nico1 and then clogging up all remaining storage but you are right the budget would likely have prevented it.

I was not complaining, btw. - it's all good and fine to experiment like this.

I'm currently juggling between finetuning and imatrix tasks.

I was not complaining here, either, btw. - you are doing a stellar job using what are effectively lulls in gpu use. And all that doesn't matter once the <50 nice models are done. We had a day like that a week ago, too, btw., so we are not that far behind.

The RTX 3080 would be available during nighttime

If it helps, probably all I need is a different binary with a different CUDA API version compiled in (haven't tried). Or simply compile CUDA support for the 3080 - not sure what the 4xxx actually adds. Pretty sure that could permanently free up a 4090.
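
A minimal sketch of what such a build might look like, assuming a recent llama.cpp with the GGML_CUDA CMake option (the 3080 is compute capability 8.6, the 4090 is 8.9; building for both keeps one binary usable on either card):

    # build llama.cpp CUDA kernels for both the RTX 3080 (8.6) and RTX 4090 (8.9)
    cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="86;89"
    cmake --build build --config Release -j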

Oh no you should have mentioned this earlier and we could have found a better solution.

Nature-AGI is not an issue - in fact, due to the "clogging", nico1 already started making some static quants for it tonight. I reversed calm and nature-agi nice levels, however, to free up some space (for tulu, potentially), but we are not in a problematic situation.

What I mean is that the work needs to be done eventually, it's just that I wish the big models to be last, probably because the least amount of people are waiting for that single model, compared to the 100 smaller ones. But even if nico1 for some reason ran "dry", it would simply quant calm and nature-agi.

We are making really fast progress at the moment and I'm really happy we finally came to the 1 to 9 priority range. The 0-priority range was absolutely massive.

I am sorry to say, but that's because I pushed most static models first, to free some queue space up. Also, small static models tend to go to rain/back/..., but that is actually not a good use of their resources. While they are a good match, they are not good at I/O, so letting them do small imatrix models is more efficient.

That's all just because of the current situation.

Just very bad timing to do so at the moment

Kind of the perfect storm :-) But it turns out the whole thing is pretty resilient. Despite all odds, too (the amount of issues you had is pretty legendary). Reminds me that I once fixed a leaking acrylic waterblock with epoxy glue. Worked for a few months. But never again acrylic.

Mine are cheaper than cheap. All ST18000NM003D-FR for around $200 each (around $11 per TB).

Ok, that's a really good price. Mine were €14, but I don't go for 18TB anymore. The situation has sucked for a few years now, with Seagate/WD not getting to the promised 40TB drives and cheap shuckable drives simply not being sold anymore.

Ah, and yes, mine are recertified. Bet you didn't know the recertified ones are the better ones :-)

untested code added, try echo 0 >/tmp/max-ngl and delete it later.

# cap the number of GPU layers to offload: if /tmp/max-ngl exists, lower NGL to at most that value
if read ngl </tmp/max-ngl; then
   [ "$ngl" -lt "$NGL" ] && NGL="$ngl"
fi
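
Presumably NGL then ends up as the offload argument of the actual imatrix run; the invocation below is only an assumption about the wrapper, with placeholder file names:

    # with /tmp/max-ngl set to 0, NGL becomes 0, no layers are offloaded and the GPUs stay free for axolotl
    llama-imatrix -m model.gguf -f calibration.txt -ngl "$NGL"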

untested code added, try echo 0 >/tmp/max-ngl and delete it later.

Thanks a lot! You are awesome. I just set this flag and started axolotl again. I'm excited to see what happens with the next imatrix task.

Awesome it works as expected! This is so cool.

Great to see you so happy :) And do you already know how it affects performance? (of axolotl)

Sam Altman 2.0 announcement here (me): We achieved AGI publicly, if you don't agree GO F@#%¨&&**&&

just kidding, this is your opinion, but mine is that it is 😎😎

We got AGI before GTA 6 lets goooo

So, uhm, I assume, using a Q8_0 quant for the imatrix, or reusing llama-405b's imatrix is not going to cut it this time?

dang!

Great to see you so happy :) And do you already know how it affects performance? (of axolotl)

There is no performance impact for axolotl. Each training step still takes exactly the same time. There also seems to be no major performance impact on imatrix computation besides what it loses by not offloading layers. I was training with axolotl with parameters offloaded to RAM, so it likely does something very similar to imatrix computation. I'm now a bit curious to see what performance impact running two non-offloaded imatrix tasks on the same GPU would have. I might test it when I have time.

So, uhm, I assume, using a Q8_0 quant for the imatrix, or reusing llama-405b's imatrix is not going to cut it this time?

It really is a fantastic model that deserves an RPC setup, but I'm still lacking a PSU for CastlePeak, and until I have a replacement there unfortunately won't be any RPC setup. If we are lucky the new PSU will come sometime next week. We should consider delaying the imatrix quants, but then I would need to find a way to temporarily archive the source GGUF. The other 405B models in the queue can have their imatrix computed in 8-bit for sure.

sup guuuys how its going
we're still slow
who's gonna be the first?

sup guuuys how its going

The queue looks relatively healthy. We completed almost all the high priority models.

we're still slow

For this model, yes, but generally I would say we are quite fast. All user requests for reasonably sized models are getting completed within hours.

who's gonna be the first?

Don't worry, I ordered a 20 TB HDD today. It will arrive tomorrow evening and should make managing such massive models way easier.

Sounds potentially good, best of luck. Thanks for your work.

" GuilhermeNaturaUmana/Nature-Reason-1-AGI"

Any ideas how to make a Decentralized ASI Child, a.k.a. "Blockchain free baby"? On every 4090 in the world
