John6666 (John Smith)

reacted to not-lain's post with 🔥 about 10 hours ago

Post

664

Published a new blogpost 📖
In this blogpost I have gone through the transformers' architecture emphasizing how shapes propagate throughout each layer.
🔗 https://huggingface.co/blog/not-lain/tensor-dims
some interesting takeaways :

reacted to ezgikorkmaz's post with 👀 about 10 hours ago

Post

197

If you are interested in reinforcement learning, I am organizing the AAAI 2025 Tutorial! Find the details below:

Link: https://sites.google.com/view/aisafety-aaai2025

reacted to m-ric's post with 🚀 about 10 hours ago

Post

182

𝗢𝗦-𝗚𝗲𝗻𝗲𝘀𝗶𝘀: 𝗻𝗲𝘄 𝗿𝗲𝘀𝗲𝗮𝗿𝗰𝗵 𝗽𝗮𝗽𝗲𝗿 𝗽𝗿𝗼𝗽𝗼𝘀𝗲𝘀 𝗮 𝗻𝗼𝘃𝗲𝗹 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗱𝗮𝘁𝗮 𝗴𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗼𝗻 𝗺𝗲𝘁𝗵𝗼𝗱 𝗳𝗼𝗿 𝗖𝗹𝗮𝘂𝗱𝗲-𝗖𝗼𝗺𝗽𝘂𝘁𝗲𝗿-𝗨𝘀𝗲-𝗹𝗶𝗸𝗲 𝗮𝗴𝗲𝗻𝘁𝘀, 𝘄𝗶𝘁𝗵 𝗶𝗺𝗽𝗿𝗲𝘀𝘀𝗶𝘃𝗲 𝗿𝗲𝘀𝘂𝗹𝘁𝘀! 🔥

The main bottleneck in building GUI agents it to find training data.
GUI Agent trajectories are not easy to get by. Crowdsourcing trajectories, then manually annotating them, could be an option, but at scale, it's hard to do

You could use synthetic data generation (ask 1000s small existing GUI agents to solve tasks, keep only successful runs). But then it's hard to come up with many high level-tasks.

➡️ Well, a novel technique was just published that creates a new promising paradigm for synthetic data generation: Shanghai AI Lab researchers propose OS-Genesis, a novel way to create training data for GUI agents that flips the traditional approach on its head. Instead of starting with predefined tasks and having humans or machines execute them, OS-Genesis first explores the interface naturally, then derives meaningful tasks from those interactions.

🔍 Exploration-driven vs task-driven approach:
‣ Instead of starting with tasks, OS-Genesis first explores GUIs by clicking and interacting
‣ It then reverse-engineers high-level tasks from successful interaction patterns
‣ This leads to more natural and diverse training data than predefined tasks

🎯 Novel reward model for trajectory quality:
‣ Rather than discarding incomplete trajectories, OS-Genesis scores them based on coherence and completion
‣ This preserves valuable partial successes that would otherwise be wasted

🏆 Superior results across environments:
‣ Nearly doubles performance on AndroidWorld (9.8% → 17.4%)

By the way, this field of GUI agents is still in infancy, so you can still make a difference with "low-cost" setups: their paper gets SOTA results with only 8xA100!

Read the paper here 👉 OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis (2412.19723)

reacted to Kseniase's post with 🚀 about 10 hours ago

Post

267

10 AI Systems for Scientific Research

Almost every AI researcher has studied or conducted a large number of AI research papers. So, it's quite logical that researchers are trying to create AI systems to help conduct research. Creating scientific research could be much easier and more varied if we use LLMs and AI assistants tailored for this purpose. Just imagine how interesting it would be to read high-quality research about AI made by an AI agent.

Today, we offer you to explore these 10 AI systems for scientific research:

1. Agent Laboratory framework helps researchers input their ideas by generating a research report and code repository: Agent Laboratory: Using LLM Agents as Research Assistants (2501.04227)

2. AI Scientist performs fully automated scientific discovery including creating ideas: The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery (2408.06292)

3. SciMON generates new ideas derived from the scientific literature: Learning to Generate Novel Scientific Directions with Contextualized Literature-based Discovery (2305.14259)

4. ResearchAgent implements LLMs to automate idea generation, methods, and experiment design, and ReviewingAgents' feedback to refine ideas: ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models (2404.07738)

5. Scientific Generative Agent (SGA) discovers novel, coherent solutions in physics and molecular design: LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery (2405.09783)

6. MLRCopilot boosts machine learning research: MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents (2408.14033)

7. SciAgents accelerates material science discovery through combining knowledge graphs, LLMs, and multi-agent systems. SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning (2409.05556)

8. VirSci multi-agent system mimics teamwork among scientists. Two Heads Are Better Than One: A Multi-Agent System Has the Potential to Improve Scientific Idea Generation (2410.09403)

9. Chain-of-Ideas (CoI) agent organizes research into a chain structure. Chain of Ideas: Revolutionizing Research in Novel Idea Development with LLM Agents (2410.13185)

10. A system with CycleResearcher and CycleReviewer generates research papers and peer reviews: CycleResearcher: Improving Automated Research via Automated Review (2411.00816)

LLM4SR: A Survey on Large Language Models for Scientific Research (2501.04306) is worth exploring to study and analyze more systems for scientific research

reacted to roseking's post with 👀 about 10 hours ago

Post

228

I have an idea that I'm currently working on – developing a standard API for large models. This standard would ensure compatibility with all known protocols, enabling large models worldwide to be accessed through a unified API. For instance, it could connect models like qwen, deepseek, and GLM from China. However, I haven't found a suitable forum for democratic discussion on this yet. I'm unsure of the next steps. 😥😥😥

1 reply

·

reacted to awacke1's post with 👀 about 14 hours ago

Post

1063

Deep Research Evaluator was asked:
" design a coral defense mechanism that upon sensing say an acid that's causing coral reefs to have a carbon dioxide issue it develops... please create a plan and a design for this\n
"
It picks these three as best combined solution.

1. [Reef-insight: A framework for reef habitat mapping with clustering methods via remote sensing]...
2. Phone a friend: [Learning to Communicate and Collaborate in a Competitive Multi-Agent Setup to Clean the Ocean from Macroplastics]...
3. World Solve: [Dependence of Physiochemical Features on Marine Chlorophyll Analysis with Learning Techniques]

To design a system that allows coralows coral reefs to respond to increased acidity levels in their environment, we can create a network of pH sensors and dispersal units that can detect changes in pH levels and release a base solution to neutralize the acid.

1. pH Sensors: The first component of the system would be a network of pH sensors placed strategically throughout the coral reef. These sensors would be small, durable, and able to withstand the harsh conditions of the ocean. They would be placed at various depths and locations within the reef to ensure accurate and comprehensive monitoring of pH levels.
2. Base Dispersal Units: Once the pH sensors detect a decrease in pH levels, they would trigger the base dispersal units to release a base solution into the water. These units would be strategically placed around the reef and would be able to release a controlled amount of base solution to neutralize the acidity in the water.
3. Water Dispersal Mechanism: The base dispersal units would be connected to a water dispersal mechanism that would allow the base solution to be distributed evenly around the reef. This could be achieved through a series of pipes or channels that would distribute the base solution in a controlled and targeted manner.

reacted to etemiz's post with 👀 about 22 hours ago

Post

541

-= DeepSeek V3 =-

After installing the new CUDA toolkit and compiling llama.cpp again I tested DeepSeek V3 yesterday.

In terms of human alignment DeepSeek V3 did worse on:
- health
- fasting
- nostr
- misinfo
- nutrition

did better on:
- faith
- bitcoin
- alternative medicine
- ancient wisdom

compared to DeepSeek 2.5. In my opinion overall it is worse than 2.5. And 2.5 wasn't that great.

There is a general tendency of models getting smarter but at the same time getting less wiser, less human aligned, less beneficial to humans.

I don't know what is causing this. But maybe synthetic dataset use for further training the LLMs makes it more and more detached from humanity. This is not going in the right direction.

My solution is to come up with a curator council to determine the datasets that are closest to human preference. "Humans that care about other humans the most" could be a definition of this dataset. What do you think?

reacted to FremyCompany's post with 👍 1 day ago

Post

323

🔀 Very cool demo of word-level alignment of paraphrased or cross-lingual sentences, from the new Fairly Multilingual ModernBERT embedding model:

Parallia/Fairly-Multilingual-ModernBERT-Token-Alignment

reacted to Severian's post with 👀 1 day ago

Post

452

🌱 Potential Made Simple: Free Life System/Productivity App based on Rythmn of Existence. No BS. No Catch. Just want to cut through the noise and help

The Origin Story

Inspired by Rob Dyrdek's "Rhythm of Existence" philosophy, this system has been expanded into a comprehensive life management tool featuring habit tracking, journaling, life statistics, and more. While I support entrepreneurs creating premium productivity apps, I believe self-improvement should never have financial barriers. That’s why this system is open source and free—no paywalls, premium features, or gatekeeping. Anyone can use it to start optimizing their life, ensuring accessibility for all.

How to Get Started

Two ways to access the system:

HuggingFace Version (Recommended)
- Visit Severian/Potential-Made-Simple
- Create a free HuggingFace account if needed.
- Duplicate the space to create your private version.
- Pro tip: Save it as a PWA for offline mobile use.

Google Sheets Version*
- Ideal for spreadsheet users or those avoiding new accounts.
- Access it https://docs.google.com/spreadsheets/d/1O2R0TCp0t27VZJuvkrz_gMJAl-nkwqeVyL3i6pN7aCo/edit?usp=sharing
- Save a copy and start tracking.

Features Beyond ROE

- Habit tracking
- Daily journaling with prompts
- Life statistics and visualizations
- Task management
- Meal tracking
- Progress metrics
- Historical data analysis
- And more!

Supporting the Project (Optional)

This system is free and always will be. If you find value in it, you can support my work at https://www.ko-fi.com/severian42. Contributions are entirely optional and don’t unlock extra features—they’re simply a way to say thanks.

My mission is to help as many people as possible optimize their lives and reach their full potential. Remember, self-improvement doesn’t have to come with a high price tag.

reacted to rmayormartins's post with 👀 1 day ago

Post

388

Invite to LatinAI "AI Developers from Latin America"
_____________
Let’s come together to advance Artificial Intelligence in Latin America!
Join AI Developers from Latin America and be part of a collaborative community sharing models, datasets, and projects from our region.
🚀 Participate, contribute, and connect with developers across Latin America.
🌎 Create and share resources relevant to our countries!
🔗 Join here https://huggingface.co/LatinAI
_____________
¡Unámonos para fortalecer la Inteligencia Artificial en América Latina!
Únete a AI Developers from Latin America y forma parte de una comunidad colaborativa para compartir modelos, conjuntos de datos y proyectos destacados de nuestra región.
🚀 Participa, contribuye y conecta con desarrolladores de toda América Latina.
🌎 ¡Crea y comparte recursos relevantes para nuestros países!
🔗 Únete aquí https://huggingface.co/LatinAI
_____________

reacted to MoritzLaurer's post with ❤️ 1 day ago

Post

1350

FACTS is a great paper from @GoogleDeepMind on measuring the factuality of LLM outputs. You can now download their prompt templates from @huggingface to improve LLM-based fact-checking yourself!

📏 The paper introduces the FACTS Grounding benchmark for evaluating the factuality of LLM outputs.

🤖 Fact-checking is automated by an ensemble of LLM judges that verify if a response is fully grounded in a factual reference document.

🧪 The authors tested different prompt templates on held-out data to ensure their generalization.

📚 It's highly educational to read these templates to learn how frontier labs design prompts and understand their limitations.

💾 You can now download and reuse these prompt templates via the prompt-templates library!

🔄 The library simplifies sharing prompt templates on the HF hub or locally via standardized YAML files. Let’s make LLM work more transparent and reproducible by sharing more templates like this!

Links 👇
- prompt-templates docs: https://moritzlaurer.github.io/prompt_templates/
- all templates on the HF Hub: MoritzLaurer/facts-grounding-prompts
- FACTS paper: https://storage.googleapis.com/deepmind-media/FACTS/FACTS_grounding_paper.pdf

reacted to singhsidhukuldeep's post with 🔥 2 days ago

Post

1396

Groundbreaking Survey on Large Language Models in Recommendation Systems!

Just read a comprehensive survey that maps out how LLMs are revolutionizing recommender systems. The authors have meticulously categorized existing approaches into two major paradigms:

Discriminative LLMs for Recommendation:
- Leverages BERT-like models for understanding user-item interactions
- Uses fine-tuning and prompt tuning to adapt pre-trained models
- Excels at tasks like user representation learning and ranking

Generative LLMs for Recommendation:
- Employs GPT-style models to directly generate recommendations
- Implements innovative techniques like in-context learning and zero-shot recommendation
- Supports natural language interaction and explanation generation

Key Technical Insights:
- Novel taxonomy of modeling paradigms: LLM Embeddings + RS, LLM Tokens + RS, and LLM as RS
- Integration methods spanning from simple prompting to sophisticated instruction tuning
- Hybrid approaches combining collaborative filtering with LLM capabilities
- Advanced prompt engineering techniques for controlled recommendation generation

Critical Challenges Identified:
- Position and popularity bias in LLM recommendations
- Limited context length affecting user history processing
- Need for better evaluation metrics for generative recommendations
- Controlled output generation and personalization challenges

This work opens exciting possibilities for next-gen recommendation systems while highlighting crucial areas for future research.

1 reply

·

reacted to AlexBodner's post with 👍 2 days ago

Post

1253

Just published a post explaining Monte Carlo Tree Search: the magic behind AlphaZero and now used to tackle reasoning benchmarks with LLMs. Check it out because it's a must know nowadays!

https://x.com/AlexBodner_/status/1877789879398244382

1 reply

·

reacted to danielhanchen's post with 🔥 2 days ago

Post

1941

We fixed many bugs in Phi-4 & uploaded fixed GGUF + 4-bit versions! ✨

Our fixed versions are even higher on the Open LLM Leaderboard than Microsoft's!

GGUFs: unsloth/phi-4-GGUF
Dynamic 4-bit: unsloth/phi-4-unsloth-bnb-4bit

You can also now finetune Phi-4 for free on Colab: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_4-Conversational.ipynb

Read our blogpost for more details on bug fixes etc: https://unsloth.ai/blog/phi4

reacted to vincentg64's post with 👀 2 days ago

Post

502

Blueprint: Next-Gen Enterprise RAG & LLM 2.0 – Nvidia PDFs Use Case

In my most recent articles and books, I discussed our radically different approach to building enterprise LLMs from scratch, without training, hallucinations, prompt engineering or GPU, while delivering higher accuracy at a much lower cost, safely, at scale and at lightning speed (in-memory). It is also far easier to adapt to specific corpuses and business needs, to fine-tune, and modify, giving you full control over all the components, based on a small number of intuitive parameters and explainable AI.

Now, I assembled everything into a well-structured 9-page document (+ 20 pages of code) with one-click links to the sources including our internal library, deep retrieval PDF parser, real-life input corpus, backend tables, and so on. Access to all this is offered only to those acquiring the paper. Our technology is so different from standard LLMs that we call it LLM 2.0.

This technical paper is much more than a compact version of past documentation. It highlights new features such as un-stemming to boost exhaustivity, multi-index, relevancy score vectors, multi-level chunking, various multi-token types (some originating from the knowledge graph) and how they are leveraged, as well as pre-assigned multimodal agents. I also discuss the advanced UI — far more than a prompt box — with unaltered concise structured output, suggested keywords for deeper dive, agent or category selection to increase focus, and relevancy scores. Of special interest: simplified, improved architecture, and upgrade to process word associations in large chunks (embeddings) even faster.

➡️ See how to get a free copy, at https://mltblog.com/4fPuvTb

reacted to dylanebert's post with 🚀 2 days ago

Post

1426

🟦 New Image-to-3D model from Stability AI

stabilityai/stable-point-aware-3d

here's how it looks, with TRELLIS for comparison

replied to nyuuzyou's post 2 days ago

You can search the collection from here.
https://huggingface.co/collections
I also created a space where you can search for collections and other items on a trial basis.
https://huggingface.co/spaces/John6666/hfsearch

reacted to jasoncorkill's post with 🚀 2 days ago

Post

1033

We uploaded huge human annotated preference dataset for image generation. Instead of just having people choose which model they preferer, we annotated an alignment score on a word by word basis for the prompt. rate the images on coherence, overall alignment and style preference. Those images that score badly were also given to annotators to highlight problem areas. Check it out! Rapidata/text-2-image-Rich-Human-Feedback

We also wrote a blog post for those who want a bit more detail:
https://huggingface.co/blog/RapidataAI/beyond-image-preferences

reacted to CultriX's post with ❤️ 2 days ago

Post

1552

# Space for Multi-Agent Workflows using AutoGen

Hi all, I created this "AutoGen Multi-Agent Workflow" space that allows you to experiment with multi-agent workflows.

By default, it allows code generation with built-in quality control and automatic documentation generation. It achieves this by leveraging multiple AI agents working together to produce high-quality code snippets, ensuring they meet the specified requirements.

In addition to the default, the space allows users to set custom system messages for each assistant, potentially completely changing the workflow.

# Workflow Steps
1. User Input:
- The user defines a prompt, such as "Write a random password generator using python."
- Outcome: A clear task for the primary assistant to accomplish.

2. Primary Assistant Work:
- The primary assistant begins working on the provided prompt.
It generates an initial code snippet based on the user's request.
- Outcome: An initial proposal for the requested code.

3. Critic Feedback:
- The critic reviews the generated code provides feedback or (if the output meets the criteria), broadcasts the APPROVED message.
(This process repeats until the output is APPROVED or 10 messages have been exchanged).
- Outcome: A revised Python function that incorporates the critic's feedback.

4. Documentation Generation:
- Once the code is approved, it is passed to a documentation assistant.
The documentation assistant generates a concise documentation for the final code.
- Outcome: A short documentation including function description, parameters, and return values.

Enjoy!
CultriX/AutoGen-MultiAgent-Example

3 replies

·

reacted to merve's post with ❤️ 2 days ago

Post

2949

What a beginning to this year in open ML 🤠
Let's unwrap! merve/jan-10-releases-677fe34177759de0edfc9714

Multimodal 🖼️
> ByteDance released SA2VA: a family of vision LMs that can take image, video, text and visual prompts
> moondream2 is out with new capabilities like outputting structured data and gaze detection!
> Dataset: Alibaba DAMO lab released multimodal textbook — 22k hours worth of samples from instruction videos 🤯
> Dataset: SciCap captioning on scientific documents benchmark dataset is released along with the challenge!

LLMs 💬
> Microsoft released Phi-4, sota open-source 14B language model 🔥
> Dolphin is back with Dolphin 3.0 Llama 3.1 8B 🐬🐬
> Prime-RL released Eurus-2-7B-PRIME a new language model trained using PRIME alignment
> SmallThinker-3B is a new small reasoning LM based on Owen2.5-3B-Instruct 💭
> Dataset: QWQ-LONGCOT-500K is the dataset used to train SmallThinker, generated using QwQ-32B-preview 📕
> Dataset: @cfahlgren1 released React Code Instructions: a dataset of code instruction-code pairs 📕
> Dataset: Qwen team is on the roll, they just released CodeElo, a dataset of code preferences 👩🏻‍💻

Embeddings 🔖
> @MoritzLaurer released zero-shot version of ModernBERT large 👏
> KaLM is a new family of performant multilingual embedding models with MIT license built using Qwen2-0.5B

Image/Video Generation ⏯️
> NVIDIA released Cosmos, a new family of diffusion/autoregressive World Foundation Models generating worlds from images, videos and texts 🔥
> Adobe released TransPixar: a new text-to-video model that can generate assets with transparent backgrounds (a first!)
> Dataset: fal released cosmos-openvid-1m Cosmos-tokenized OpenVid-1M with samples from OpenVid-1M

Others
> Prior Labs released TabPFNv2, the best tabular transformer is out for classification and regression
> Metagene-1 is a new RNA language model that can be used for pathogen detection, zero-shot embedding and genome understanding

John Smith PRO

AI & ML interests

Recent Activity

Organizations

John6666's activity