Tools such as those from OpenAI can on occasion give the impression that they are able to prove theorems and even generalize them. Whether this is a sign of real (artificial) intelligence, or simply a matter of retrieving facts from technical papers and putting them together without advanced logic, is irrelevant. A correct proof is a correct proof, no matter how the author, human or bot, came to it.
My experience is that tools like those from OpenAI make subtle mistakes, sometimes hard to detect. Some call them hallucinations. But the real explanation is that the tool usually blends different results together and adds transition words between sentences in the answer, to make it read as continuous text. Sometimes, this creates artificial connections between items that are loosely related, if at all, without providing a precise reference to the source or the exact location within each reference. This makes it hard to double-check the answer and make the necessary corrections. However, the new generation of LLMs (see https://mltblog.com/4g2sKTv) offers that capability: deep, precise references.
Likewise, mathematicians usually make mistakes in the first proof of a new, challenging problem. Sometimes these are glitches that you can fix, sometimes the proof is fundamentally wrong and not fixable. It usually takes a few iterations to get everything right.
➡️ Read full article and learn how I proved a difficult result with the help of AI, at https://mltblog.com/4jqUiUD
9 Tips to Design Hallucination-Free RAG/LLM Systems
And in our case (see https://mltblog.com/4fPuvTb), with no training and zero parameters! By zero parameters, I mean no neural network parameters (the typical 40B you see in many LLMs stands for 40 billion parameters, also called weights). We do have a few intuitive parameters that you can fine-tune in real time.
Tips to make your system hallucination-free:
- We use sub-LLMs specific to each topic (each covering part of a large corpus), so mixing unrelated items is much less likely to happen.
- In the base version, the output returned is unaltered rather than reworded. Rewording can cause hallucinations.
- It shows a high-level structured summary first, with the category, tags, and agents attached to each item. The user can then click on the items of most interest based on that summary, reducing the risk of a misfit.
- The user can specify agents, tags, or categories in the UI: it is much more than a prompt box. The user can also include negative keywords, specify joint keywords that must appear together in the corpus, put a higher weight on the first keyword in the prompt, or favor the most recent material in the results.
- Python libraries can cause hallucinations. For instance, a stemmer maps project and projected to the same stem, although the two words can mean different things. We use these libraries, but with workarounds to avoid such issues.
- We return a relevancy score for each item in the prompt results, ranging from 0 to 10. If we cannot find highly relevant information in your augmented corpus, despite using a synonyms dictionary, the score will be low, telling you that the system knows that this particular item is not great. You can choose not to show items with a low score, though sometimes they contain unexpectedly interesting information (the reason to keep them).
- We show links and references, all coming from reliable sources. The user can double-check in case of doubt.
- We suggest alternate keywords to use in your next prompts (related concepts).
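
To make a few of these tips more concrete, here is a minimal Python sketch of how negative keywords, joint keywords, a heavier first keyword, a stemming exception list, and a 0 to 10 relevancy score could fit together. It is not the actual implementation behind the system described above. Every name in it (`PROTECTED_WORDS`, `naive_stem`, `Query`, `score`, `search`) is hypothetical, the toy stemmer stands in for a real stemming library, and "joint keywords" is assumed to mean that all of them must appear in the same item.

```python
from dataclasses import dataclass, field

# Hypothetical exception list: words the stemmer must leave untouched,
# so that "projected" is not merged with "project".
PROTECTED_WORDS = {"projected"}

def naive_stem(word: str) -> str:
    """Toy suffix stripper standing in for a real stemming library."""
    word = word.lower()
    if word in PROTECTED_WORDS:
        return word  # workaround: protected words are never stemmed
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text: str) -> set[str]:
    """Split on whitespace, strip punctuation, stem, and drop empty tokens."""
    stripped = (w.strip(".,;:!?()") for w in text.split())
    return {naive_stem(w) for w in stripped if w}

@dataclass
class Query:
    keywords: list[str]                                 # first one weighs more
    negative: list[str] = field(default_factory=list)   # must not appear
    joint: list[str] = field(default_factory=list)      # must all appear

def score(item_text: str, query: Query, first_weight: float = 2.0) -> float:
    """Return a relevancy score between 0 and 10 for one corpus item."""
    tokens = tokenize(item_text)
    if any(naive_stem(k) in tokens for k in query.negative):
        return 0.0
    if not all(naive_stem(k) in tokens for k in query.joint):
        return 0.0
    weights = [first_weight] + [1.0] * (len(query.keywords) - 1)
    matched = sum(
        w for k, w in zip(query.keywords, weights) if naive_stem(k) in tokens
    )
    return round(10 * matched / sum(weights), 1)

def search(corpus: list[str], query: Query, min_score: float = 0.0):
    """Return (score, item) pairs, best first, hiding items below min_score."""
    scored = [(score(text, query), text) for text in corpus]
    kept = [pair for pair in scored if pair[0] >= min_score]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)

corpus = [
    "The projected revenue grows.",
    "The project uses Python libraries.",
    "Cooking recipes for pasta.",
]
for s, text in search(corpus, Query(keywords=["project", "python"])):
    print(s, text)
```

Items are returned unaltered, each with its score, and the `min_score` argument lets the user hide low-scoring items or keep them in case they turn out to be useful.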