AI & ML interests

Dark Data Makes Stronger Outcomes

Recent Activity

p3nGu1nZz updated a dataset about 20 hours ago
DataTonic/cablegate-pdf-dataset
p3nGu1nZz updated a dataset about 21 hours ago
DataTonic/cablegate-pdf-dataset
p3nGu1nZz updated a dataset about 21 hours ago
DataTonic/cablegate-pdf-dataset

DataTonic's activity

not-lain posted an update about 14 hours ago
Tonic posted an update 5 days ago
Microsoft just released Phi-4, check it out here: Tonic/Phi-4

Hope you like it :-)
not-lain posted an update about 2 months ago
Ever wondered how you can make an API call to a visual-question-answering model without sending an image URL? 👀

You can do that by converting your local image to base64 and sending it to the API.

Recently I made some changes to my library "loadimg" that make converting images to base64 a breeze.
🔗 https://github.com/not-lain/loadimg

API request example 🛠️:
from loadimg import load_img
from huggingface_hub import InferenceClient

# load_img accepts a local file path, a URL, a PIL image, or a NumPy array
image_source = "path/to/your_image.png"  # placeholder: point this at your own image
my_b64_img = load_img(image_source, output_type="base64")

client = InferenceClient(api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Describe this image in one sentence."
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": my_b64_img  # base64 allows using images without uploading them to the web
                }
            }
        ]
    }
]

stream = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    messages=messages,
    max_tokens=500,
    stream=True
)

# Print text as it streams in; some chunks carry no text (content is None)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="")
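If you'd rather see what the base64 step does without the extra dependency, here is a minimal standard-library sketch. It assumes the endpoint accepts a data URI of the form `data:<mime>;base64,<payload>` in the `image_url` field, as OpenAI-style chat APIs generally do; the helper name `image_file_to_data_uri` is hypothetical, and loadimg's exact output format is not shown in the post, so treat this as an illustration rather than a drop-in replacement for `load_img`.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Optional


def image_file_to_data_uri(path: str, mime_type: Optional[str] = None) -> str:
    """Hypothetical helper: read a local image and return it as a base64 data URI.

    Assumption: the target API accepts "data:<mime>;base64,<payload>" in
    place of a public https:// image URL.
    """
    if mime_type is None:
        # Guess the MIME type from the file extension, e.g. ".png" -> "image/png"
        mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        raise ValueError(f"could not infer MIME type for {path!r}; pass mime_type explicitly")
    raw = Path(path).read_bytes()
    # b64encode returns bytes; decode to ASCII so it can be embedded in JSON
    payload = base64.b64encode(raw).decode("ascii")
    return f"data:{mime_type};base64,{payload}"
```

Usage: `my_b64_img = image_file_to_data_uri("photo.jpg")`, then pass it as the `"url"` value in the `messages` list exactly as in the example above. Because the image bytes travel inside the request body, large images make correspondingly large requests.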
Tonic posted an update 2 months ago
🙋🏻‍♂️ Hey there folks,

Periodic reminder: if you are experiencing ⚠️ 500 errors ⚠️ or ⚠️ abnormal Spaces behavior on load or launch ⚠️

we have a thread 👉🏻 https://discord.com/channels/879548962464493619/1295847667515129877

If you can record the problem and share it there, or in your own post on the forums, please don't be shy: I'm not certain, but I do think it helps 🤗🤗🤗
  • 2 replies
Β·
Tonic posted an update 3 months ago
Boomers still pick zenodo.org instead of Hugging Face??? Absolutely clownish nonsense: my random datasets have 30x more downloads and views than front-page Zenodo datasets... Gonna write a comparison blog, but yeah... cringe.
  • 1 reply
Β·
Tonic posted an update 3 months ago
🙋🏻‍♂️ Hey there folks,

Really enjoying sharing cool genomics and protein datasets on the Hub these days. Check out our cool new org: https://huggingface.co/seq-to-pheno

Scroll down for the datasets. I'm still figuring out how to optimize for discoverability; I do think that on that front it will be better than zenodo[dot]org. It would be nice to write a tutorial about that and compare: we already have more downloads than most Zenodo datasets from famous researchers!
Tonic posted an update 3 months ago
Hey there folks,

Twitter is awful, isn't it? Just getting into the habit of using hf/posts for shares 🦙🦙

Tonic/on-device-granite-3.0-1b-a400m-instruct

New Granite on-device instruct model demo, hope you like it 🚀🚀
Tonic posted an update 3 months ago
Tonic posted an update 3 months ago
Tonic posted an update 3 months ago
🙋🏻‍♂️ Hey there folks,

🦎 The Salamandra release by @mvillegas and team at @BSC_CNS (https://huggingface.co/BSC-LT) is absolutely impressive so far!

It uses perhaps the largest single training dataset of high-quality text to date: 7.8 trillion tokens in 35 European languages plus code.

The best part: the data was correctly licensed, so it's actually future-proof!

The completions model is really creative, and the instruct fine-tuned version is also very good.

Now you can use these models for multilingual enterprise applications with further fine-tuning; long response generation and structured outputs (coding) also work.

Check out 👇🏻
the collection: BSC-LT/salamandra-66fc171485944df79469043a
the repo: https://github.com/langtech-bsc/salamandra
7B-Instruct demo: Tonic/Salamandra-7B
Tonic posted an update 3 months ago
@mlabonne hey there 🙋🏻‍♂️ I kinda got obsessed with your great model, and I found the endpoint for it on Lambda Labs, but I basically got rate-limited/banned while trying to build my DPO dataset project. I was wondering if you all had an OpenAI-compatible solution for me to make a great "thinking" SFT + DPO dataset with all the splits 🙏🏻🙏🏻 Kinda desperate, it's true, but I was looking forward to a nice write-up 🚀🚀🚀
  • 1 reply
Β·
Tonic posted an update 3 months ago
Tonic posted an update 4 months ago
🙋🏻‍♂️ Hey there folks,

stepfun-ai/GOT-OCR2_0 is in top trending and Spaces of the week for the second week straight!!

This is madness 😱

🚀🚀 Check out my demo here: Tonic/GOT-OCR
Tonic posted an update 4 months ago
Tonic posted an update 4 months ago
🙋🏻‍♂️ Hey there folks,

@ucaslcl released a new OCR model, that's 👏🏻👏🏻 fantastic: https://huggingface.co/ucaslcl/GOT-OCR2_0

GPU demo: Tonic/GOT-OCR
Gradio demo (Image Edit): Tonic1/ImageEdit-GOT-OCR

Model: https://huggingface.co/ucaslcl/GOT-OCR2_0
Official demo: https://huggingface.co/spaces/ucaslcl/GOT_online
GitHub: https://github.com/Ucas-HaoranWei/GOT-OCR2.0
Tonic posted an update 4 months ago
🙋🏻‍♂️ Hey there folks,

Made an image similarity demo to test out the mistral-community/pixtral-12b-240910 model.

If anyone knows how to generate captions with it, please do let me know x 🚀

Here's the demo: Tonic/Pixtral

Hope you like it 🤗
Tonic posted an update 4 months ago
So awesome: now I can deploy a JupyterLab on Hugging Face and deploy Gradio from the JupyterLab.
Tonic posted an update 4 months ago
Tonic posted an update 4 months ago
🙋🏻‍♂️ Hey there folks,

✒️ InkubaLM has been trained from scratch using 1.9 billion tokens of data for five African languages, along with English and French data, totaling 2.4 billion tokens of data. It is capable of understanding and generating content in five African languages (Swahili, Yoruba, Hausa, isiZulu, and isiXhosa) as well as English and French.

Model: lelapa/InkubaLM-0.4B
Demo: Tonic/Inkuba-0.4B