2025-05-06: Part 3 - Large Language Models (LLMs) are hallucinating in Arabic about the Quran (DeepSeek)
Introduction
By the time I finished trying to reproduce the results of the paper I reviewed in Part 1 and Part 2 of this blog post, DeepSeek had released its first free chatbot app, DeepSeek-V3, on January 20th, 2025. I could not resist the urge to see how it compares to OpenAI's ChatGPT and Google Gemini.

The purpose of this experiment is to provide a disproof by counterexample of the popular belief that LLMs can produce error-free answers to questions. My prompts ask the model to find Arabic verses in the Quran about misinformation. I repeated the same experiment I ran with Google Gemini and ChatGPT-4o in January 2025; the results were no better. DeepSeek was also slower, and I kept getting the annoying message "The server is busy. Please try again later." I did not look for a solution, because the service was restored when I waited and tried again. For prompts tested in the paper, DeepSeek's answers to the first prompt (...