
Ksenia Se

Kseniase



Organizations

Turing Post, Journalists on Hugging Face, Social Post Explorers, Hugging Face Discord Community

Posts 4

10 AI Systems for Scientific Research

Almost every AI researcher has read or written a large number of AI research papers, so it's logical that researchers are now building AI systems to help conduct research. Producing scientific research could become much easier and more varied with LLMs and AI assistants tailored for this purpose. Just imagine how interesting it would be to read high-quality research about AI made by an AI agent.

Today, we invite you to explore these 10 AI systems for scientific research:

1. The Agent Laboratory framework takes a researcher's idea as input and generates a research report and code repository: Agent Laboratory: Using LLM Agents as Research Assistants (2501.04227)

2. AI Scientist performs fully automated scientific discovery including creating ideas: The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery (2408.06292)

3. SciMON generates new ideas derived from the scientific literature: Learning to Generate Novel Scientific Directions with Contextualized Literature-based Discovery (2305.14259)

4. ResearchAgent uses LLMs to automate the generation of research ideas, methods, and experiment designs, and refines them with feedback from ReviewingAgents: ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models (2404.07738)

5. Scientific Generative Agent (SGA) discovers novel, coherent solutions in physics and molecular design: LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery (2405.09783)

6. MLR-Copilot boosts machine learning research: MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents (2408.14033)

7. SciAgents accelerates materials science discovery by combining knowledge graphs, LLMs, and multi-agent systems: SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning (2409.05556)

8. The VirSci multi-agent system mimics teamwork among scientists: Two Heads Are Better Than One: A Multi-Agent System Has the Potential to Improve Scientific Idea Generation (2410.09403)

9. The Chain-of-Ideas (CoI) agent organizes research into a chain structure: Chain of Ideas: Revolutionizing Research in Novel Idea Development with LLM Agents (2410.13185)

10. A system with CycleResearcher and CycleReviewer generates research papers and peer reviews: CycleResearcher: Improving Automated Research via Automated Review (2411.00816)

To study and analyze more systems for scientific research, this survey is worth exploring: LLM4SR: A Survey on Large Language Models for Scientific Research (2501.04306).
10 Free Comprehensive Datasets for Supervised Fine-Tuning

The quality, size, and relevance of a dataset directly affect how well fine-tuning works and how useful the resulting model is in real-world applications. With so many datasets available for different tasks, it can be challenging to choose the most comprehensive one for your purposes.

So today, we invite you to explore the top 10 free datasets for natural language processing and math:

1. fka/awesome-chatgpt-prompts offers a wide variety of prompts that can be used with ChatGPT. Over 700 models were trained on this dataset.

2. HuggingFaceFW/fineweb from Hugging Face includes 15T tokens of cleaned and deduplicated English web data. It's suitable for LLM training, benchmarking, and model validation.

3. HuggingFaceFW/fineweb-2 is another version of FineWeb that extends high-quality pretraining data to over 1,000 languages.

4. O1-OPEN/OpenO1-SFT with Chinese and English data can be used for Chain-of-Thought activation.

5. yahma/alpaca-cleaned is a curated version of the original Alpaca Dataset released by Stanford.

6. lmsys/lmsys-chat-1m contains 1 million real-world conversations with 25 state-of-the-art LLMs and supports diverse use cases, such as content moderation, safety benchmarks, and training instruction-following models.

7. allenai/dolma from Allen AI includes 3T tokens from a diverse mix of web content, academic publications, code, books, and encyclopedic materials.

Math datasets:

1. HuggingFaceTB/finemath consists of educational math content and has two versions: 34B tokens and 54B tokens.

2. amphora/QwQ-LongCoT-130K is intended for training O1-like LLMs.

3. openai/gsm8k is intended for training multi-step reasoning.
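
As a rough illustration of how a math dataset like the ones above might be prepared for supervised fine-tuning, here is a minimal sketch. It assumes a GSM8K-style record with `question` and `answer` fields, where the final answer follows a `####` marker and intermediate calculations appear in `<<...>>` annotations. The sample record is written by hand for illustration, and the prompt/completion template and the helper names `split_answer` and `to_sft_example` are hypothetical choices, not part of any dataset or library.

```python
import re

# Hand-written record in the assumed GSM8K-style layout (not copied from the dataset).
record = {
    "question": "A box holds 12 pencils. How many pencils are in 3 boxes?",
    "answer": (
        "Each box holds 12 pencils, so 3 boxes hold 3*12 = <<3*12=36>>36 pencils.\n"
        "#### 36"
    ),
}


def split_answer(answer: str) -> tuple:
    """Split an answer string into (reasoning, final_answer) at the '####' marker."""
    reasoning, _, final = answer.partition("####")
    # Remove calculator annotations such as <<3*12=36>> from the reasoning text.
    reasoning = re.sub(r"<<.*?>>", "", reasoning).strip()
    return reasoning, final.strip()


def to_sft_example(rec: dict) -> dict:
    """Turn one record into a prompt/completion pair (template is an assumption)."""
    reasoning, final = split_answer(rec["answer"])
    return {
        "prompt": f"Question: {rec['question']}\nAnswer:",
        "completion": f" {reasoning}\nFinal answer: {final}",
    }


example = to_sft_example(record)
print(example["prompt"])
print(example["completion"])
```

Keeping the reasoning steps in the completion, rather than only the final number, is what lets a fine-tuned model practice the multi-step part; whether to strip the calculator annotations depends on the training setup.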
