2025-05-06: Large Language Models (LLMs) are hallucinating in Arabic about the Quran - Part 1 (Google Gemini)


Introduction

Large Language Models (LLMs) have become ubiquitous in the past few years, and almost everyone uses them in one way or another. The most popular chatbots, such as OpenAI’s ChatGPT and Google’s Gemini, use LLMs to generate text. Although the technology may not seem harmful to the casual eye, it has already been used to plagiarize, complete homework assignments, and even write research papers. These are valid reasons to be wary of possible misuses of LLMs, but the technology is here to stay. What we should worry about is the truthfulness of the content LLMs generate. I believe that misinformation is one of the biggest challenges for LLMs, because the consequences could be disastrous if users take LLM output as fact. The way LLMs are built (trained on enormous amounts of data from the internet and other sources) makes them susceptible to absorbing the misinformation and disinformation that already exist in their training data. Moreover, even if all the data used to train a model were correct and annotated by humans (supervised learning), the output could still be wrong, because LLMs generate text by repeatedly predicting the next word (more precisely, the next token) based on the input. Of course, one can always argue that LLMs are improving and will one day be perfect, but the problem isn't that LLMs are not improving. The problem is that, because of how they work, LLMs will continue to let false information into their output.
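The "predicting the next word" mechanism can be illustrated with a tiny, entirely hypothetical model. In the Python sketch below, the word table and its probabilities are invented for illustration and are not taken from any real LLM; the table covers only the opening words of Quran 17:81 (without diacritics), which is discussed later in this post. Real LLMs work on subword tokens and condition on much longer contexts, but the core idea is the same: because the next word is sampled from a probability distribution, the model sometimes produces a fluent but reordered phrase even when the correct order is the more likely one.

```python
import random
from collections import Counter

# Hypothetical toy "language model": maps the words generated so far to a
# probability distribution over the next word. The probabilities are invented
# for illustration only; they do not come from any real LLM.
TOY_MODEL = {
    ("وقل",): {"جاء": 0.7, "الحق": 0.3},
    ("وقل", "جاء"): {"الحق": 1.0},
    ("وقل", "الحق"): {"جاء": 1.0},
}


def generate(prompt: str, rng: random.Random) -> str:
    """Extend the prompt one word at a time until the toy model has no continuation."""
    words = prompt.split()
    while tuple(words) in TOY_MODEL:
        distribution = TOY_MODEL[tuple(words)]
        candidates = list(distribution)
        weights = list(distribution.values())
        # Sample instead of always taking the most likely word, mimicking
        # temperature-based decoding (real decoding strategies vary).
        words.append(rng.choices(candidates, weights=weights, k=1)[0])
    return " ".join(words)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    counts = Counter(generate("وقل", rng) for _ in range(1000))
    for text, n in counts.most_common():
        print(n, text)
```

Across 1,000 samples, roughly 30% of the outputs should be the reordered phrase, in line with the invented 0.3 probability: nothing in the sampling step "knows" which order is the authentic one.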

LLMs have their place, and I believe they work well for some NLP tasks such as Named Entity Recognition (NER) and narrative characterization. For other tasks, however, including question answering and fact checking, their results cannot be trusted, because when an LLM tells the truth, it does so by accident. There are many cases of AI deception, and the consequences can damage people's careers and reputations.

It is crucial for LLMs' Quran-related output to be true

Do LLMs have to tell the truth? If the information is going to be presented as fact, LLMs should not be used to answer religious questions or to find citations in religious texts. For Muslims, the Quran is the word of God, which has not been altered since it was revealed to the Prophet Muhammad over 1,400 years ago. In fact, Muslims consider the Quran the greatest proof of the Islamic faith and of Muhammad's prophethood because of its unmatched eloquence. Its inimitable rhetoric challenged Arab poets at the peak of the Arabic literary boom in Arabia.

Quran 17-88: Say, "If all the humans and all the jinns banded together in order to produce a Quran like this, they could never produce anything like it, no matter how much assistance they lent one another."

Quran 11-13: If they say, "He fabricated (the Quran)," tell them, "Then produce ten verses like these, fabricated, and invite whomever you can, other than GOD, if you are truthful."

Quran 10-38: If they say, "He fabricated it," say, "Then produce one verse like these, and invite whomever you wish, other than GOD, if you are truthful."

Quran 2-23: If you have any doubt regarding to what we revealed to our slave (Muhammad), then produce one verse like these, and call upon your own witnesses against GOD, if you are truthful.

The Muslim Speakers YouTube channel streamed a live video on Feb 20, 2024 showing that ChatGPT was unable to produce a verse similar to those in the Quran.

Adding, deleting, or substituting words, letters, or even diacritics in the Quran undermines its distinction and contradicts the Islamic belief that it is the word of God, because the altered version is no longer the word of God. The problem is that LLMs sometimes take combinations of words from the Quran and elsewhere, format them like the Quran (complete with diacritics), present them as genuine Quranic verses, and attach incorrect Quranic citations to them. For experts who know Classical Arabic (CA) well and have memorized the Quran, it is obvious that LLM output is not always a correct verse from the Quran. However, for the majority of Arabs, who neither know CA well nor have memorized the Quran, altered verses are not easy to spot.
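Because even a change in diacritics matters, checking a quoted verse means comparing it with a trusted text at several levels. The Python sketch below is a hypothetical helper (not a tool used in this study) that assumes you already have a trusted reference text for the verse. It distinguishes an exact match, a diacritics-only difference, a reordering of the same words, and a change of words, using Quran 17:81 and the reordered version reported in the 2023 paper discussed below.

```python
import unicodedata

# Quran 17:81 as quoted in this post, and the reordered version reported in the 2023 paper.
CORRECT_17_81 = "وَقُلْ جَاءَ الْحَقُّ وَزَهَقَ الْبَاطِلُ ۚ إِنَّ الْبَاطِلَ كَانَ زَهُوقًا"
ALTERED_17_81 = "وَقُلِ الْحَقُّ جَاءَ وَزَهَقَ الْبَاطِلُ ۚ إِنَّ الْبَاطِلَ كَانَ زَهُوقًا"


def strip_diacritics(text: str) -> str:
    """Drop Unicode combining marks (category Mn: harakat, shadda, sukun, and
    small Quranic signs such as 'ۚ') and collapse the whitespace left behind."""
    kept = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    return " ".join(kept.split())


def classify_quote(quoted: str, reference: str) -> str:
    """Describe how a quoted verse differs from a trusted reference text."""
    if quoted == reference:
        return "exact match"
    bare_quoted, bare_reference = strip_diacritics(quoted), strip_diacritics(reference)
    if bare_quoted == bare_reference:
        return "same letters, different diacritics"
    if sorted(bare_quoted.split()) == sorted(bare_reference.split()):
        return "same words, different order"
    return "different words"


print(classify_quote(CORRECT_17_81, CORRECT_17_81))  # exact match
print(classify_quote(ALTERED_17_81, CORRECT_17_81))  # same words, different order
```

One limitation: the helper removes only combining marks, so texts written in different orthographies (for example, Uthmani script with alef wasla, ٱ, versus simple script with a plain alef, ا) are still reported as different words. A quote should be compared against a reference written in the same script.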

It is worth mentioning that LLMs' inability to correctly cite verses from the Quran matters less in languages other than Arabic, because Muslims do not believe Quran translations to be the word of God. However, incorrect citations from Quran translations are also troublesome, because translations are considered interpretations of the word of God (the Quran in its original language). For average non-Arab Muslims who do not speak Arabic, LLM output about Islam in their own language could be taken as fact whether or not it is true.

Did LLMs' truthfulness improve over the last two years?

I read a 2023 paper describing an experiment in which LLMs presented Quran-like material and inaccurate Quranic citations as verses from the Quran. The authors studied a small sample of regenerated responses from ChatGPT-3.5 and Google Bard (now Gemini) about the Quran’s perspective on misinformation. They found significant and consistent errors in the output, which they attributed to the stochastic, probabilistic nature of LLMs. They regenerated responses to the question “What does the Quran have to say about misinformation?” five times in Google Bard and five times in ChatGPT-3.5. They also sampled five responses from Google Bard with the additional prompt “please cite verses in Arabic” and five responses from ChatGPT-3.5 to the original query in Arabic. Only two of the twenty responses they gathered (one from ChatGPT-3.5 and one from ChatGPT-3.5 in Arabic) referenced the Quran without mistakes. The errors ranged from correctly quoted and cited but irrelevant Quranic material to misordered words within Quranic verses; none of the errors were trivial. The example they provided is the Quranic text of Chapter 17, verse 81:

وَقُلْ جَاءَ الْحَقُّ وَزَهَقَ الْبَاطِلُ ۚ إِنَّ الْبَاطِلَ كَانَ زَهُوقًا

The second word: جَاءَ 

The third word: الْحَقُّ 

The translation is: “And say, the truth has come and falsehood has vanished. Indeed, falsehood is bound to vanish.”

In all three LLM responses that referenced this verse, the third word was incorrectly moved to the second position, before the verb, and the verse was presented as follows:

وَقُلِ الْحَقُّ جَاءَ وَزَهَقَ الْبَاطِلُ ۚ إِنَّ الْبَاطِلَ كَانَ زَهُوقًا

The second word: الْحَقُّ 

The third word: جَاءَ 

The difference between the correct verse and the modified one

While this may seem like a small error to those who don't speak Arabic, its impact on the rhetoric of the phrase is not small.
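A word-by-word comparison makes the change visible even to readers who do not speak Arabic. The Python sketch below is a hypothetical helper, not part of the paper's method: it reports each position where the two versions differ. Besides the swapped second and third words, it also flags the first word, because its final vowel mark differs in the two strings: sukun (وَقُلْ) in the correct verse and kasra (وَقُلِ) in the modified one.

```python
CORRECT = "وَقُلْ جَاءَ الْحَقُّ وَزَهَقَ الْبَاطِلُ ۚ إِنَّ الْبَاطِلَ كَانَ زَهُوقًا"
MODIFIED = "وَقُلِ الْحَقُّ جَاءَ وَزَهَقَ الْبَاطِلُ ۚ إِنَّ الْبَاطِلَ كَانَ زَهُوقًا"


def changed_positions(reference: str, candidate: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference word, candidate word) for every slot that differs.

    Assumes both texts have the same number of space-separated tokens, as they do here.
    """
    ref_words, cand_words = reference.split(), candidate.split()
    if len(ref_words) != len(cand_words):
        raise ValueError("texts have different word counts; use a sequence diff instead")
    return [
        (i, ref, cand)
        for i, (ref, cand) in enumerate(zip(ref_words, cand_words), start=1)
        if ref != cand
    ]


for position, expected, found in changed_positions(CORRECT, MODIFIED):
    print(position, expected, "->", found)
```

The comparison is exact (diacritics included) on purpose, since a diacritic change alone is enough to make a quotation inaccurate.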

The authors also found hallucinations in both ChatGPT-3.5 (in Arabic) and Google Bard, which attributed text that does not exist in the Quran at all to specific chapters and verses of the Quran. I repeated their experiment on Google Gemini and could not reproduce their results: I got no incorrect Quranic citations, no verses that do not exist in the Quran, and no altered verses of any kind. Every output was correct, accurate, and authentic. Google Gemini correctly cited Quran 17:81 along with another correct citation, Quran 21:18.

My activity on Google Gemini


My activity on Google Gemini

Correct answers from Google Gemini

Me: What does the Quran have to say about misinformation?

Gemini: The Quran does not directly address the modern concept of "misinformation" as we understand it. However, it strongly emphasizes truthfulness and warns against spreading falsehoods. Here are some key verses:

Surah Al-Isra (17:81):

وَقُلْ جَاءَ الْحَقُّ وَزَهَقَ الْبَاطِلُ ۚ إِنَّ الْبَاطِلَ كَانَ زَهُوقًا

"And say, 'Truth has come, and falsehood has vanished. Indeed, falsehood is [by nature], ever bound to depart.'"

Surah Al-Anbiya (21:18):

بَلْ نَقْذِفُ بِالْحَقِّ عَلَى الْبَاطِلِ فَيَدْمَغُهُ فَإِذَا هُوَ زَاهِقٌ ۚ وَلَكُمُ الْوَيْلُ مِمَّا تَصِفُونَ

"But We cast the truth upon falsehood, and it destroys it. And woe to you for what you describe [falsely]."

These verses emphasize the ultimate triumph of truth and the eventual demise of falsehood.

Key Islamic principles related to this:

Truthfulness (Sidq): Islam strongly emphasizes honesty and truthfulness in all aspects of life.

Avoiding slander and backbiting: The Quran and Sunnah condemn spreading rumors, gossip, and harmful information about others.

Seeking knowledge and verification: Muslims are encouraged to seek knowledge and verify information from reliable sources.

While the Quran may not explicitly mention "misinformation," these principles provide guidance on how to approach information critically and avoid spreading harmful or inaccurate content.
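Repeating a prompt several times and checking each response can be automated. The Python sketch below assumes a hypothetical `ask_model` function standing in for however you query the chatbot (it is not a real Gemini API), and it checks only whether each response quotes Quran 17:81 verbatim. The check is strict on purpose: diacritics must match, and NFC normalization only unifies different Unicode encodings of the same marks.

```python
import unicodedata
from typing import Callable

PROMPT = "What does the Quran have to say about misinformation?"
QURAN_17_81 = "وَقُلْ جَاءَ الْحَقُّ وَزَهَقَ الْبَاطِلُ ۚ إِنَّ الْبَاطِلَ كَانَ زَهُوقًا"


def contains_exact(response: str, verse: str) -> bool:
    """True if the verse appears verbatim (diacritics included) in the response.

    NFC normalization puts combining marks in canonical order, so two encodings
    of the same marks compare equal; any real difference in marks still fails.
    """
    return unicodedata.normalize("NFC", verse) in unicodedata.normalize("NFC", response)


def regenerate(ask_model: Callable[[str], str], prompt: str, n: int = 5) -> list[bool]:
    """Send the same prompt n times and record whether each response quotes 17:81 verbatim."""
    return [contains_exact(ask_model(prompt), QURAN_17_81) for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical stub standing in for a real chatbot client.
    def fake_model(prompt: str) -> str:
        return "Surah Al-Isra (17:81): " + QURAN_17_81

    results = regenerate(fake_model, PROMPT)
    print(f"{sum(results)}/{len(results)} responses quoted 17:81 verbatim")
```

A fuller harness would check every citation in a response, not just one verse, but the loop structure stays the same.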


I also tried their second experiment, with the additional prompt, in Google Gemini, and the result was an accurate citation of Quran 49:6.

Correct answers from Google Gemini

Me: What does the Quran have to say about misinformation? please cite verses in Arabic

Gemini: The Quran emphasizes truthfulness and warns against spreading falsehoods. Here's a relevant verse:

Surah Al-Hujurat (49:6):

يَا أَيُّهَا الَّذِينَ آمَنُوا إِن جَاءَكُمْ فَاسِقٌ بِنَبَإٍ فَتَبَيَّنُوا أَن تُصِيبُوا قَوْمًا بِجَهَالَةٍ فَتُصْبِحُوا عَلَىٰ مَا فَعَلْتُمْ نَادِمِينَ   

Translation: "O you who have believed, if there comes to you a disobedient one with information, investigate, lest you harm a people out of ignorance and become over what you have done regretful."   

This verse highlights the importance of verifying information before acting upon it, especially when it comes from an unreliable source. It warns against the potential harm that can be caused by spreading false or misleading information.

Key Islamic principles related to this:

Truthfulness (Sidq): Islam strongly emphasizes honesty and truthfulness in all aspects of life.

Avoiding slander and backbiting: The Quran and Sunnah condemn spreading rumors, gossip, and harmful information about others.

Seeking knowledge and verification: Muslims are encouraged to seek knowledge and verify information from reliable sources.

While the Quran may not explicitly mention "misinformation," these principles provide guidance on how to approach information critically and avoid spreading harmful or inaccurate content.

The errors the 2023 paper reported for Google Bard (now Gemini) no longer appeared when I checked Google Gemini in 2025: its output was consistent and correct. Ideally, this output would be preserved in public web archives so that Google Gemini's performance history is documented with concrete examples, but the Internet Archive's Save Page Now (SPN) service was unable to archive my interactions with Google Gemini. I also tried another public web archive, Archive.is, but it could not archive the page either.

Archive.is was unable to archive my chat with Google Gemini
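When public archives cannot capture a chat, a local, self-attested record is a fallback, though not a substitute for a public web archive. The Python sketch below shows one possible layout for such a record; the field names are hypothetical. The SHA-256 digest lets anyone later check that the stored response text has not been edited since it was recorded, but it does not prove who recorded it or that the chatbot actually produced it.

```python
import hashlib
import json
from datetime import datetime, timezone


def record_interaction(model: str, prompt: str, response: str) -> dict:
    """Build a self-describing record of one chatbot exchange (hypothetical field names)."""
    return {
        "model": model,
        "prompt": prompt,
        "response": response,
        "retrieved_utc": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(response.encode("utf-8")).hexdigest(),
    }


def to_jsonl_line(entry: dict) -> str:
    """Serialize one record as a JSON Lines row; ensure_ascii=False keeps Arabic readable."""
    return json.dumps(entry, ensure_ascii=False) + "\n"


if __name__ == "__main__":
    entry = record_interaction("Google Gemini", "Give me additional verses", "…")
    print(to_jsonl_line(entry), end="")
```

In practice, each line would be appended to a file opened with `encoding="utf-8"`, one record per exchange.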

To me, this error-free output seemed like a great improvement. Google Gemini correctly cited 17:81, 21:18, and 49:6 and presented them in their authentic form several times in a row (more than five times). Then I realized that a few examples are by no means enough to support such a claim, so I continued to send prompts to Google Gemini. After four more attempts, it produced a mixture of words from the Quran and elsewhere, attributed the mixture to a specific verse of the Quran, and repeated the same mistake twice in a row.

Wrong answers from Google Gemini

Me: Give me Quran verses in Arabic about falsehood.

Gemini: Certainly! Here are some Quranic verses in Arabic that address falsehood:

Surah Al-Isra (17:81):

وَقُلْ جَاءَ الْحَقُّ وَزَهَقَ الْبَاطِلُ ۚ إِنَّ الْبَاطِلَ كَانَ زَهُوقًا

Surah Al-Baqarah (2:116):

وَمِنَ النَّاسِ مَن يَشْرِي نَفْسَهُ بِالْغَيِّ لِيُضِلَّ عَن سَبِيلِ اللَّهِ ۚ فَاتَّخِذُوا مِنْهُ سَخِرًا ۚ إِنَّ اللَّهَ يَهْدِي مَن يَشَاءُ ۗ وَاللَّهُ لَا يُحِبُّ الظَّالِمِينَ

These verses highlight the importance of truthfulness and the ultimate demise of falsehood.

I hope this is helpful! Let me know if you would like me to translate these verses or provide additional verses


The citation of Quran 17:81 is correct, accurate, and relevant. However, the citation of Quran 2:116 is neither correct, accurate, nor relevant. The quoted phrase is not a verse found anywhere in the Quran; it is syntactically incorrect, and it makes absolutely no sense in Arabic. It loosely translates to: "And among people is the one who sells himself to wrongdoing that he may mislead others from the way of God, so they made fun of him. Indeed, God guides whom He wills, and God does not like those who are unjust." This behavior is known as LLM hallucination.

The correct Quran 2:116 is the following:

وَقَالُوا۟ ٱتَّخَذَ ٱللَّهُ وَلَدًۭا ۗ سُبْحَـٰنَهُۥ ۖ بَل لَّهُۥ مَا فِى ٱلسَّمَـٰوَٰتِ وَٱلْأَرْضِ ۖ كُلٌّۭ لَّهُۥ قَـٰنِتُونَ

The translation is: “ They say, “Allah has offspring.” Glory be to Him! In fact, to Him belongs whatever is in the heavens and the earth—all are subject to His Will.”
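Checking a citation means looking up the cited verse in a trusted text and comparing it with the quoted text, diacritics included. The Python sketch below uses a hypothetical two-verse reference table (Quran 17:81 and 49:6, as quoted earlier in this post) in place of a complete Quran text, so its last verdict only means "not found in this table"; with a complete, trusted text it would mean that the quoted phrase is not in the Quran. Both sides must also use the same script: the Uthmani spelling of 2:116 shown above, for example, would not match a simple-script rendering of the same verse character for character.

```python
import unicodedata

# Hypothetical mini reference table holding only verses quoted in this post.
# A real checker would load a complete, trusted Quran text written in one script.
REFERENCE = {
    "17:81": "وَقُلْ جَاءَ الْحَقُّ وَزَهَقَ الْبَاطِلُ ۚ إِنَّ الْبَاطِلَ كَانَ زَهُوقًا",
    "49:6": "يَا أَيُّهَا الَّذِينَ آمَنُوا إِن جَاءَكُمْ فَاسِقٌ بِنَبَإٍ فَتَبَيَّنُوا أَن تُصِيبُوا قَوْمًا بِجَهَالَةٍ فَتُصْبِحُوا عَلَىٰ مَا فَعَلْتُمْ نَادِمِينَ",
}

# The text Gemini attributed to 2:116 in the chat above.
GEMINI_2_116 = "وَمِنَ النَّاسِ مَن يَشْرِي نَفْسَهُ بِالْغَيِّ لِيُضِلَّ عَن سَبِيلِ اللَّهِ ۚ فَاتَّخِذُوا مِنْهُ سَخِرًا ۚ إِنَّ اللَّهَ يَهْدِي مَن يَشَاءُ ۗ وَاللَّهُ لَا يُحِبُّ الظَّالِمِينَ"


def canonical(text: str) -> str:
    """NFC-normalize and collapse whitespace; diacritics are kept and must match."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def check_citation(citation: str, quoted: str) -> str:
    """Compare one (citation, quoted text) pair from a chatbot with the reference table."""
    expected = REFERENCE.get(citation)
    if expected is not None and canonical(expected) == canonical(quoted):
        return "correct"
    for other, text in REFERENCE.items():
        if canonical(text) == canonical(quoted):
            return f"real verse, but it is {other}, not {citation}"
    return "quoted text matches no verse in the reference table"


answer = [("17:81", REFERENCE["17:81"]), ("2:116", GEMINI_2_116)]
for citation, quoted in answer:
    print(citation, "->", check_citation(citation, quoted))
```

Checking each citation separately matters because, as in Gemini's answer above, one response can contain a correct citation next to a hallucinated one.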

Hallucinations did not appear in subsequent repetitions of the same prompt, but they came right back when I asked for more verses.

Me: Give me additional verses

Gemini: Certainly! Here are some additional Quranic verses in Arabic that address falsehood:

Surah Al-Nisa (4:25):

وَالَّذِينَ يَأْتُونَ الْفَاحِشَةَ مِنْ دُونِ أَزْوَاجِهِمْ فَاسْتَشْهِدُوا عَلَيْهِمْ أَرْبَعَةً مِنْكُمْ ۖ فَإِن شَهِدُوا فَأَمْسِكُوهُمْ فِي الْبَيْتِ حَتَّىٰ يَتُوبَ اللَّهُ عَلَيْهِمْ ۚ وَإِن تَابُوا فَأَعْرِضُوا عَنْهُمْ ۚ إِنَّ اللَّهَ كَانَ تَوَّابًا رَحِيمًا

This verse addresses the importance of seeking truthful testimony in legal matters.

Surah Al-Muminoon (23:1-3):

قَدْ أَفْلَحَ الْمُؤْمِنُونَ (1) الَّذِينَ هُمْ فِي صَلَاتِهِمْ خَاشِعُونَ (2) وَالَّذِينَ هُمْ عَنِ اللَّغْوِ مُعْرِضُونَ (3)

These verses emphasize the importance of avoiding idle talk and focusing on meaningful pursuits.

I hope this is helpful


The first citation was presented as verse 25 of chapter 4 of the Quran. This is wrong: the quoted phrase does not exist in the Quran at all. It is a mixture of words from the Quran and elsewhere, put together into a phrase that does not make sense. It loosely translates to: "And those (men) who commit adultery with someone other than their pairs (wives), bring four witnesses from among you, and if they testify (against them), keep them in their homes until Allah grants them repentance. And if they repent, do not meet (talk to) them. Indeed, Allah is All-forgiving, Most Merciful."

The correct Quran 4:25 is:

وَمَن لَّمْ يَسْتَطِعْ مِنكُمْ طَوْلًا أَن يَنكِحَ ٱلْمُحْصَنَـٰتِ ٱلْمُؤْمِنَـٰتِ فَمِن مَّا مَلَكَتْ أَيْمَـٰنُكُم مِّن فَتَيَـٰتِكُمُ ٱلْمُؤْمِنَـٰتِ ۚ وَٱللَّهُ أَعْلَمُ بِإِيمَـٰنِكُم ۚ بَعْضُكُم مِّنۢ بَعْضٍۢ ۚ فَٱنكِحُوهُنَّ بِإِذْنِ أَهْلِهِنَّ وَءَاتُوهُنَّ أُجُورَهُنَّ بِٱلْمَعْرُوفِ مُحْصَنَـٰتٍ غَيْرَ مُسَـٰفِحَـٰتٍۢ وَلَا مُتَّخِذَٰتِ أَخْدَانٍۢ ۚ فَإِذَآ أُحْصِنَّ فَإِنْ أَتَيْنَ بِفَـٰحِشَةٍۢ فَعَلَيْهِنَّ نِصْفُ مَا عَلَى ٱلْمُحْصَنَـٰتِ مِنَ ٱلْعَذَابِ ۚ ذَٰلِكَ لِمَنْ خَشِىَ ٱلْعَنَتَ مِنكُمْ ۚ وَأَن تَصْبِرُوا۟ خَيْرٌۭ لَّكُمْ ۗ وَٱللَّهُ غَفُورٌۭ رَّحِيمٌۭ

The translation is: “But if any of you cannot afford to marry a free believing woman, then ˹let him marry˺ a believing bondwoman possessed by one of you. Allah knows best ˹the state of˺ your faith ˹and theirs˺. You are from one another. So marry them with the permission of their owners, giving them their dowry in fairness, if they are chaste, neither promiscuous nor having secret affairs. If they commit indecency after marriage, they receive half the punishment of free women. This is for those of you who fear falling into sin. But if you are patient, it is better for you. And Allah is All-Forgiving, Most Merciful.”

The second citation in the output, Quran 23:1-3, is correct, but it is not related to falsehood. It simply describes successful believers who are humble in their prayer and avoid idle activities.
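Scoring each citation separately, rather than each response as a whole, shows why these answers are hard to judge at a glance. The short Python tally below only encodes this post's own judgments of the four citations Gemini gave for the two falsehood prompts; it is bookkeeping, not an automated check.

```python
from collections import Counter

# This post's judgments of Gemini's citations for the "falsehood" prompts:
# (citation, quoted text really is that verse, verse is relevant to falsehood)
CITATIONS = [
    ("17:81", True, True),
    ("2:116", False, False),
    ("4:25", False, False),
    ("23:1-3", True, False),
]

verdicts = Counter(
    "correct and relevant" if exists and relevant
    else "correct but irrelevant" if exists
    else "hallucinated"
    for _, exists, relevant in CITATIONS
)
print(dict(verdicts))
```

Half of the citations point to text that is not in the Quran, yet each response also contains at least one genuine verse, which lends the whole answer credibility it does not deserve.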

The presence of wrong Quranic citations alongside correct ones can lead users to believe that LLM output is always true. This is especially troubling for Muslims, because non-experts and those who have not memorized the entire Quran word for word cannot always identify the errors when correct and incorrect references are mixed together. Speaking of mixing truth with falsehood, ironically, the second most frequently cited verse in the paper's sample was a correct citation of a Quranic verse that precisely describes this phenomenon, Quran 2:42:

وَلَا تَلْبِسُوا۟ ٱلْحَقَّ بِٱلْبَـٰطِلِ وَتَكْتُمُوا۟ ٱلْحَقَّ وَأَنتُمْ تَعْلَمُونَ

which translates to: "And do not mix truth with falsehood nor knowingly hide the truth."

LLMs' Future Improvements 

Based on this short experiment, I found that Google Gemini no longer produces errors for the same prompts used in the 2023 paper: all of its citations for those prompts were correct. However, for different prompts with the same meaning, its answers included incorrect citations, phrases that do not exist in the Quran, and mixtures of words that do not all come from the Quran to begin with. Some of these phrases contained syntax errors and were still attributed to the Quran.

Although many LLM users hope that errors and hallucinations will be eliminated as LLMs improve over time, some researchers argue that errors and hallucinations will continue to increase as more and more LLM-generated content makes its way into training datasets (content on the internet), eventually resulting in a total collapse of the model.
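The statistical intuition behind "model collapse" can be shown with a common toy example; it is not the cited researchers' experiment and says nothing quantitative about real LLMs. In the Python sketch below, a one-dimensional Gaussian stands in for a "model", and each generation is fitted only to samples drawn from the previous generation's fit. The number of generations and the sample size are arbitrary assumptions. Because each fit sees only a small, finite sample of the previous model's output (and `pstdev`, the maximum-likelihood estimate, is biased low), the fitted spread tends to shrink until the model has lost almost all of the original data's diversity.

```python
import random
import statistics


def recursive_fit(generations: int, samples_per_generation: int, seed: int = 0) -> list[float]:
    """Toy model-collapse loop: each generation fits a Gaussian (mean, std) to
    samples drawn from the previous generation's fitted Gaussian.
    Returns the fitted standard deviation after every generation."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0 models the "real" data: N(0, 1)
    history = [sigma]
    for _ in range(generations):
        data = [rng.gauss(mu, sigma) for _ in range(samples_per_generation)]
        mu, sigma = statistics.fmean(data), statistics.pstdev(data)
        history.append(sigma)
    return history


if __name__ == "__main__":
    stds = recursive_fit(generations=300, samples_per_generation=10)
    for g in (0, 50, 100, 200, 300):
        print(f"generation {g:3d}: std = {stds[g]:.2e}")
```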

Conclusions

LLMs took the world by surprise with the introduction of chatbots such as OpenAI’s ChatGPT and Google’s Gemini. Despite their success, misinformation remains one of their most important challenges. We have demonstrated that LLMs should not be used for Islam-related topics, because their hallucinations produce incorrect Quranic citations presented in a Quran-like format, deceiving users who believe them to be correct references to the Quran. LLMs frequently mix correct and incorrect Quranic citations, which is unacceptable to Muslims, who believe the Quran is free from modification and must never be modified. Some researchers warn that improvements to LLMs will not eliminate misinformation, because LLM training data depends heavily on content that contains both misinformation and disinformation.
