
Quick guide: What are the risks of AI hallucinations, and how to mitigate them?

Generative AI (GenAI) tools, like ChatGPT, can be asked to generate a wide range of outputs. These models are often trained on extensive data sets and prompted to help users with seemingly unlimited tasks.  

Sometimes, model output does not align with the prompt, training data, or truth. Recently, the media has been covering such AI hallucinations widely. Despite the organisational benefits of AI models in a wide array of applications, the possibility of hallucinations necessitates caution, especially in high-risk applications.

By reading this article, you’ll learn:

  • What a hallucination is in this context
  • Why AI systems hallucinate
  • Why hallucinations are a problem
  • What the ethical concerns with hallucinations are
  • How hallucinations can be measured
  • How to manage the risks associated with hallucinations
  • Methods of mitigating the risks associated with hallucinations

What is a hallucination in the AI context?

It is broadly agreed that hallucination describes an AI model’s output that looks coherent and believable but is factually incorrect, nonsensical or not rooted in the training data. This phenomenon can emerge in different ways, including as false information and irrelevant responses. This incorrect information often blends in well with factually correct information, making hallucinations difficult to identify. Use of the term hallucination itself is debated, as it ascribes a human characteristic to artificial intelligence. Researchers sometimes use alternatives, including confabulation and fabrication, though the term hallucination is still widely used.

In large language models (LLMs), AI hallucinations take the form of text that looks accurate but is false or not supported by training data. Predictions of what the next word or sentence should be are based on patterns, not on fact verification. In vision models, hallucinations look like details or objects in a generated image that were not part of the original input. Hallucinations in both types of models are output that looks convincing but is not rooted in reality.

In specific use cases, hallucinations can be leveraged for creative purposes. The unpredictable nature of patterns learned from training data can cause hallucinations that unlock creative potential in artistic endeavours, such as creative writing and visual arts, data visualisation, and game development. Unfortunately, in most use cases, hallucinations remain a risk to manage.

Hallucinations could look like:

  • An LLM is asked to edit a research paper draft, and the output contains fictional citations, crediting knowledge to sources that do not exist.
  • A customer-service chatbot responds to a customer’s question with false information, such as a non-existent clause in the organisation’s terms and conditions.
  • An image generation model is asked to create an image of a famous person and adds a sixth finger.

Why do AI systems hallucinate?  

AI models rely on patterns identified in a large set of training data to make predictions. These learned patterns allow the model to connect inputs and outputs. When we use a large language model (LLM) like ChatGPT, the input – our prompt – is processed through many layers of neural networks, which predict the most likely next word or sentence in the output. In short, LLMs are designed to generate the most likely output based on patterns learned from training data, not factual accuracy.  It’s essentially all probabilities.
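
To make this concrete, here is a deliberately tiny, purely illustrative sketch. It is not how a real LLM works internally (modern models use deep neural networks over tokens), but it shows the same principle: the next word is chosen from patterns observed in training text, with no step that checks whether the result is true.

```python
from collections import Counter, defaultdict

# Toy illustration of pattern-based next-word prediction (not a real LLM):
# the "model" only knows which words tended to follow which in its training
# text, so it continues a sentence whether or not the result is true.
training_text = (
    "the toaster is safe to use in the kitchen . "
    "the kettle is safe to use in the kitchen ."
).split()

bigram_counts = defaultdict(Counter)
for current_word, next_word in zip(training_text, training_text[1:]):
    bigram_counts[current_word][next_word] += 1

def predict_next(word: str) -> str:
    """Return the most frequent follower of `word` in the training data."""
    followers = bigram_counts.get(word)
    return followers.most_common(1)[0][0] if followers else "<unknown>"

# The prediction reflects learned patterns, not verified facts.
print(predict_next("safe"))     # -> "to"
print(predict_next("toaster"))  # -> "is"
```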

Similarly, vision models are trained to predict and classify new images based on patterns learned from analysing large datasets of labelled images. Like LLMs, they rely on learned patterns rather than verified facts when making their predictions. When a model encounters incomplete, ambiguous, or missing data, it can fill in the gaps with invented information that sounds believable but isn’t based on facts – in other words, the model hallucinates.

AI models lack the intrinsic comprehension of the real world that comes naturally to humans, such as contextual knowledge of objects, situations, and other fundamental aspects associated with the prompt, which increases the likelihood of hallucinations. Methods like grounding provide models with some of this necessary real-world context to help bridge the gap between pattern-based predictions and reality. However, even AI models better rooted in facts can hallucinate.

Why are hallucinations a problem?

When models are used in real-world applications, significant consequences can arise from AI hallucinations: a model might make an inaccurate medical diagnosis, or a court might order an organisation to honour a hallucinated clause in its terms and conditions. That’s why it’s important to pay close attention to the truthfulness of generated output.

Let’s say a copywriter asks an LLM to write safety instructions for a new toaster based on a product description and a safety requirements list. The output generated by the model is eloquent, persuasive, and grammatically perfect.

At first glance, it looks like any other set of safety instructions. However, it includes a dangerous suggestion about using the toaster underwater. Perhaps the training data didn’t include enough information relevant to toaster safety instructions, and the model tried to bridge the gap without understanding that electrical appliances should generally not be used in water, even though water use features prominently in the safety instructions for other products, such as swimming bands.

Its convincing appearance makes this type of output even more dangerous, as users are more likely to believe it is true. In the absence of mitigations like sufficient human oversight or prompt filtering, hallucinations can have significant consequences. Aside from immediate use-related risks, not recognising hallucinations as false information can cause errors in future (training) applications.

Mitigating risks associated with hallucinations is most important in high-stakes applications, where unreliable AI advice or decision-making can have real-world consequences.

The ethical concerns with AI hallucinations

The ethical concerns surrounding AI hallucinations are significant. This section dives into the impact of two ethical implications: misinformation and trust.

Hallucinations can add to the existing stream of misinformation if they are believed to be true. Furthermore, the mechanisms that produce AI hallucinations can create disinformation if malicious actors use prompts to intentionally fabricate convincing false content. The prevalence of AI hallucinations casts doubt on the reliability of AI systems and the general accuracy of information.

Misinformation

Misinformation refers to inaccurate information that is spread without the intent to deceive, so it is believed to be true and spread in good faith. Rapid developments in the quality and availability of GenAI models make it increasingly easy to accidentally create and spread misinformation. The quality of generated outputs makes it easy to believe, so users do not always recognise hallucinations for what they are. This adds to the spread of hallucinated content, and thus the spread of misinformation. Given AI models’ potential to exacerbate existing misinformation issues, it is important that model developers install guardrails.

Hallucinations are not intentionally created by AI models, though their ability to hallucinate can be leveraged by malicious actors who intentionally create misleading content, like disinformation. Disinformation refers to inaccurate, intentionally deceitful information that is spread to do harm. To create this, malicious actors use prompts that steer the model to produce false information. The speed at which these models can generate text, audio, images and videos, and the ability to automate these processes all aid the spread of disinformation.  

Though model providers generally restrict prompts to minimise these risks, models remain vulnerable to malicious intent and hallucinations, for example when methods meant to decrease the chance of AI hallucinations are repurposed to intentionally create disinformation. False narratives, whether from misinformation or disinformation, can weaken social cohesion and dilute public trust in democratic institutions.

Examples of misinformation:

  • An AI chatbot claims that taking high doses of vitamin C can cure COVID-19, misleading users presented with this information into trying an ineffective treatment.
  • An AI system incorrectly advises freelancers that a non-existent law includes tax exemptions for freelance income, misleading them to file their taxes based on hallucinated legal information.

Trust and reliability

It is often impossible for humans to understand the intricate workings of the algorithms that make up AI models. Combined with evidence of unreliability, like the prevalence of hallucinations, this harms public trust in AI systems and their output.

As recognising AI-generated content becomes increasingly difficult, this distrust affects most content we engage with online, and it even spills over to (social) media content more broadly. The potential for hallucinations to deepen this distrust is an ethical concern to be mindful of.

The erosion of trust in AI systems and their output alters public perception of the information we encounter daily. This growing uncertainty about the trustworthiness and reliability of information poses risks to fundamental aspects of society, including social cohesion, democratic debate, and the rule of law.


Examples of trust and reliability issues:

  • A viral deepfake video shows a beloved politician making controversial statements. When the video’s artificial origin is revealed, viewers begin to question the authenticity of other videos they encounter online.
  • The vision model in a self-driving car registers an unfamiliar object near a pedestrian crossing and hallucinates pedestrians. These imaginary obstacles lead to erratic driving behaviour, resulting in a loss of user trust in autonomous vehicle technology.


It is important to be mindful of the potential ethical consequences of mis- and disinformation and trust and reliability issues, and to manage the associated risks.

How can hallucinations be measured?

Insights by Lily Battershill, AI Safety Engineer at Saidot

Evaluating the tendency of AI models to output hallucinations involves a range of metrics, typically calculated through human annotation, automatic methods, or a combination of both. The specific approaches chosen depend on the task, the type of hallucination being considered, and the particular use case.  

For text-based models, key metrics include hallucination rate, factual consistency and faithfulness, which quantify how well the generated content aligns with verified information or source material. Typically, model outputs are compared to trusted knowledge bases, or claims are cross-checked with the given context to determine if they can be inferred from it.

For example, in the task of text summarisation using the CNN/Daily Mail dataset, models are trained to generate concise summaries of news articles. One approach to hallucination evaluation is comparing the generated summaries against the source articles to assess hallucination rates. Human annotators may read both the generated summary and the source article to label instances of hallucinations—statements that introduce unsupported facts. Alternatively, automated tools or models to detect discrepancies between the summary and the source article can be used. The automated tool or model assigns a score based on the percentage of unsupported statements, providing a quantitative measure of the model's tendency to produce hallucinated content.
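
As an illustration, the hallucination rate described above can be computed directly from such labels. The sketch below assumes each statement in a generated summary has already been judged supported or unsupported by a human annotator or an automated checker.

```python
def hallucination_rate(labels: list[bool]) -> float:
    """
    `labels` holds one boolean per summary statement, as judged by an
    annotator or automated checker: True if the statement is supported by
    the source article, False if it is unsupported (hallucinated).
    """
    if not labels:
        return 0.0
    unsupported = sum(1 for supported in labels if not supported)
    return unsupported / len(labels)

# Example: 5 statements in a generated summary, 1 judged unsupported.
print(f"{hallucination_rate([True, True, False, True, True]):.0%}")  # 20%
```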

Visual models rely on different evaluation techniques, with perceptual similarity metrics and automated alignment scores being common. These evaluate aspects such as believability, factual consistency, and physical plausibility in generated images, often using models to compare features between generated and reference images. Object detection and segmentation metrics further assist in assessing the accuracy of specific elements within generated visuals. Human-rated assessments frequently complement these automated measures, providing nuanced evaluation of subjective qualities.
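
A minimal sketch of the feature-comparison idea is shown below. It assumes you already have embedding vectors for the generated and reference images (for example, from a pretrained vision encoder, which is outside the scope of the sketch) and simply scores their cosine similarity.

```python
import numpy as np

def feature_similarity(generated: np.ndarray, reference: np.ndarray) -> float:
    """
    Cosine similarity between feature embeddings of a generated image and a
    reference image. How the embeddings are produced (e.g. a pretrained
    vision encoder) is outside the scope of this sketch.
    """
    generated = generated / np.linalg.norm(generated)
    reference = reference / np.linalg.norm(reference)
    return float(np.dot(generated, reference))

# Low similarity to the reference can flag outputs for closer human review.
gen, ref = np.random.rand(512), np.random.rand(512)
print(f"similarity: {feature_similarity(gen, ref):.3f}")
```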

Evaluation frameworks increasingly adopt hybrid approaches, combining multiple metrics and methods for a more comprehensive assessment. These may include benchmark datasets with known ground truths, expert-curated test sets, and adversarial testing designed to challenge models with inputs likely to provoke hallucinations. Task-specific metrics are also important; for instance, ROUGE scores for summarisation tasks and exact match and F1 scores for question-answering.
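
For instance, exact match and a simplified token-overlap F1 for question answering can be computed in a few lines. The snippet below is a sketch of those two task-specific metrics, not a full evaluation harness.

```python
def exact_match(prediction: str, gold: str) -> bool:
    """Exact match after simple normalisation."""
    return prediction.strip().lower() == gold.strip().lower()

def token_f1(prediction: str, gold: str) -> float:
    """Simplified token-overlap F1, as commonly used for question answering."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    common = set(pred_tokens) & set(gold_tokens)
    if not pred_tokens or not gold_tokens or not common:
        return 0.0
    precision = len(common) / len(pred_tokens)
    recall = len(common) / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Helsinki", "helsinki"))                     # True
print(round(token_f1("in Helsinki Finland", "Helsinki"), 2))   # 0.5
```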

How to manage the risks associated with hallucinations

AI risks can be managed by following the six-step process detailed in Saidot's quick guide to managing AI risks. Based on the risk evaluation outcomes, you can create a treatment plan that prioritises risks by their likelihood and impact, and then implement one of the treatment strategies below to address AI hallucinations within an AI system.

Here are four strategies to treat AI risks in your organisation:

  • Avoid: Eliminate the risk of hallucinations by abstaining from using models where they are a concern.
  • Transfer: Allocate the risk to a third party, for example, through insurance policies.
  • Mitigate: Reduce the risk by lowering the probability of hallucinations or their potential impact.
  • Accept: Make an informed decision to accept the risk without further treatment.


Not every treatment option is suitable for every situation. You can often accept low-level risks without further action, while medium- or high-level risks require mitigation, transfer, or avoidance.
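
As a rough illustration of how a treatment choice can follow from evaluation, the sketch below scores a risk as likelihood × impact on a 1–5 scale and maps the result to one of the four strategies. The thresholds and mapping are assumptions for illustration, not Saidot's methodology.

```python
def risk_level(likelihood: int, impact: int) -> str:
    """Both inputs on a 1-5 scale; returns a coarse risk level (illustrative thresholds)."""
    score = likelihood * impact
    if score <= 6:
        return "low"
    if score <= 14:
        return "medium"
    return "high"

def suggested_treatment(level: str) -> str:
    return {
        "low": "accept",
        "medium": "mitigate or transfer",
        "high": "mitigate, transfer, or avoid",
    }[level]

# e.g. hallucinations in a customer-facing chatbot
level = risk_level(likelihood=4, impact=4)
print(level, "->", suggested_treatment(level))  # high -> mitigate, transfer, or avoid
```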

Methods of mitigating the risks associated with hallucinations

To avoid the risks associated with accepting and using AI hallucinations as factually correct information, your organisation can focus on limiting the occurrence of hallucinations and lowering the possibility that users accept them as truth.

AI hallucination mitigation strategies

High-quality and representative training data: The quality and relevance of training data impact AI systems’ performance, accuracy, and reliability. Ensuring high-quality and representative training data is crucial for developing high-quality AI systems. For this reason, organisations should aim to use diverse and representative training data.

Process supervision: Process supervision provides feedback to a model at each step of its reasoning instead of only on the final outcome. It rewards models for following a human-endorsed chain of thought, which helps pinpoint where errors occur.
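
The contrast with outcome-only feedback can be sketched as follows. Here `verify_step` is a hypothetical stand-in for human feedback or a trained step-level reward model; the point is that per-step rewards show where a chain of thought goes wrong, while a single outcome reward does not.

```python
# Illustrative contrast between outcome supervision (one reward for the final
# answer) and process supervision (feedback on every intermediate step).
reasoning_steps = [
    "The toaster draws mains electricity.",
    "Water conducts electricity.",
    "Therefore the toaster is safe to use underwater.",  # faulty step
]

def verify_step(step: str) -> bool:
    # Hypothetical verifier; in practice this is human feedback or a
    # trained reward model.
    return "underwater" not in step

def outcome_reward(final_answer_correct: bool) -> float:
    return 1.0 if final_answer_correct else 0.0

def process_rewards(steps: list[str]) -> list[float]:
    # One reward per step pinpoints *where* the chain of thought went wrong.
    return [1.0 if verify_step(step) else 0.0 for step in steps]

print(outcome_reward(False))             # 0.0 -- only says the answer is wrong
print(process_rewards(reasoning_steps))  # [1.0, 1.0, 0.0] -- flags the faulty step
```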

Use case restriction: You can restrict the allowed use cases to those where incorrect outputs can be tolerated, or limit the scope of specific tasks.

Retrieval augmented generation (RAG): RAG is an architecture that incorporates an information retrieval system into an LLM to provide grounding data. This helps foundation models generate more accurate output for specific topics that were not included in the training data.
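
A minimal RAG-style sketch is shown below. The retrieval step is naive keyword overlap rather than a real vector index, `llm_generate` is a placeholder stub for whatever model API you use, and the product details are invented for illustration.

```python
# Minimal RAG-style sketch: retrieve grounding passages and prepend them to
# the prompt before calling the model.
knowledge_base = [
    "The T-100 toaster must never be used near water or in humid rooms.",
    "The T-100 toaster has a two-slice capacity and a defrost setting.",
]

def retrieve(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Rank documents by how many query words they share (toy retrieval)."""
    query_terms = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda doc: len(query_terms & set(doc.lower().split())),
                    reverse=True)
    return ranked[:top_k]

def llm_generate(prompt: str) -> str:
    """Placeholder: swap in a call to your model provider here."""
    return f"[model response to a {len(prompt)}-character grounded prompt]"

def answer_with_rag(question: str) -> str:
    context = "\n".join(retrieve(question, knowledge_base))
    prompt = ("Answer using only the context below. If the context does not "
              f"contain the answer, say so.\n\nContext:\n{context}\n\n"
              f"Question: {question}")
    return llm_generate(prompt)

print(answer_with_rag("Can I use the toaster near water?"))
```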

Model temperature tuning: The temperature parameter controls the degree of randomness of an LLM’s output. Lowering it makes the model favour its most likely predictions, which typically reduces off-pattern output, while raising it increases variety at the cost of predictability.
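
The effect can be sketched with temperature-scaled sampling over next-token logits; the snippet below is a generic illustration, not a specific vendor's API.

```python
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float = 1.0) -> int:
    """
    Temperature-scaled sampling over next-token logits. Lower temperatures
    concentrate probability on the most likely tokens (more deterministic);
    higher temperatures flatten the distribution (more varied output).
    """
    scaled = logits / max(temperature, 1e-6)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

logits = np.array([2.0, 1.0, 0.2])
print(sample_next_token(logits, temperature=0.2))  # almost always token 0
print(sample_next_token(logits, temperature=1.5))  # noticeably more varied
```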

Explanations and references: Programming a model to produce explanations or references along with generated content allows users to verify the information and understand the system’s reasoning. Aside from enhancing user trust, it empowers users to make informed judgements about the reliability of AI-generated output.

Prompt engineering: Prompt engineering uses specific instructions as input for generative AI tools to produce optimal or desired outcomes. Modifying user input with context and constraints in the prompts allows for the guidance of model behaviour and promotes responsible outputs.
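
The sketch below shows one way such a template might look, wrapping the raw user question with context and explicit constraints. The assistant name and rules are invented for illustration.

```python
# Illustrative prompt template: constraints and context are added around the
# raw user input to steer the model towards grounded, verifiable answers.
PROMPT_TEMPLATE = """You are a customer-support assistant for Acme Appliances.
Rules:
- Answer only from the provided context.
- If the answer is not in the context, reply "I don't know".
- Cite the section of the context you used.

Context:
{context}

Customer question:
{question}
"""

def build_prompt(question: str, context: str) -> str:
    return PROMPT_TEMPLATE.format(context=context, question=question)

print(build_prompt("Can I use the toaster in the bathroom?",
                   "Section 2: Do not use the appliance near water."))
```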

Input or prompt filtering: Careful crafting or filtering of input prompts can guide the model towards desired or specific responses.

Feedback collection of AI-generated outputs: The systematic gathering of user responses can aid the quality and safety evaluations of the model’s output. This information can help inform the implementation of new measures to moderate and improve future output.

Testing and validation: Testing and validating the model ensures that generated outputs align with the intended purpose.

How Saidot simplifies your risk management journey

AI risk management on our platform involves identifying, evaluating, and mitigating specific risks throughout the lifecycle of AI systems. Saidot's risk library contains 135 risks and comprehensive descriptions of their relevant mitigations.

Here are six steps to help you ensure effective AI risk management:

Step 1: Identify risks

Add recommended risks from Saidot’s risk library or record custom risks.

Step 2: Document risks

Assign a risk owner, identify the risk source and risk type, and include a risk description and its consequences. For risks imported from the library, an editable risk description is automatically included.

Step 3: Evaluate risks

Analyse the risk level before treatments and the marginal risk level that results from introducing AI technology. Based on the evaluation, an inherent risk level is selected, simplifying risk prioritisation.

Step 4: Risk treatment

Select a risk treatment strategy based on the inherent risk level and our recommendations. When using a risk from the Saidot risk library, mitigations suitable for risk treatments can be imported.

Step 5: Assess residual risk

After selecting and implementing the treatments, select the treatment status and assess the residual risk.

Step 6: Monitor risks

To ensure good AI governance throughout the system’s lifecycle, organisations should continue monitoring and reviewing risks and their treatments. The Saidot AI governance tool simplifies the risk management process by saving this information in one place and making it easily accessible.
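
For illustration only, the sketch below captures the information each of the six steps records in a simple data structure. It is not Saidot's platform schema, just a way to see how the pieces fit together.

```python
from dataclasses import dataclass, field

# Purely illustrative data structure mirroring the six steps above; not
# Saidot's schema, just a sketch of what each step records.
@dataclass
class RiskRecord:
    # Steps 1-2: identify and document
    name: str
    owner: str
    source: str
    risk_type: str
    description: str
    consequences: str
    # Step 3: evaluate
    inherent_risk_level: str = "unassessed"   # e.g. low / medium / high
    # Step 4: treat
    treatment_strategy: str = "undecided"     # avoid / transfer / mitigate / accept
    mitigations: list[str] = field(default_factory=list)
    # Step 5: residual risk
    treatment_status: str = "not started"
    residual_risk_level: str = "unassessed"
    # Step 6: monitor
    review_notes: list[str] = field(default_factory=list)

risk = RiskRecord(
    name="Chatbot hallucinates policy clauses",
    owner="AI product lead",
    source="LLM-based customer-service chatbot",
    risk_type="Hallucination / misinformation",
    description="The chatbot may invent terms-and-conditions clauses.",
    consequences="Customer harm, legal exposure, loss of trust.",
)
```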

Learn more about Saidot’s AI governance tool

Further Reading and References

  • Saidot Risk Library
  • IBM, 2023 — What are AI hallucinations?
  • Google, n.d. — What are AI hallucinations?
  • Steven M. Williamson & Victor Prybutok, 2024 — The Era of Artificial Intelligence Deception: Unraveling the Complexities of False Realities and Emerging Threats of Misinformation
  • miquido, n.d. — Grounding
  • Yue Zhang et al., 2023 — Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models
  • Kees van Deemter, 2024 — The Pitfalls of Defining Hallucination
  • UN, n.d. — Countering Disinformation
  • Shomit Ghose, UC Berkeley Sutardja Center, 2024 — Why Hallucinations Matter: Misinformation, Brand Safety and Cybersecurity in the Age of Generative AI
  • Julius Endert, DW Akademie, 2024 — Generative AI is the ultimate disinformation amplifier
  • Advisory Body on Artificial Intelligence, UN, 2023 — Interim Report: Governing AI for Humanity
  • Cynthia Dwork & Martha Minow, 2022 — Distrust of Artificial Intelligence: Sources & Responses from Computer Science & Law
  • Negar Maleki, Balaji Padmanabhan & Kaushik Dutta, 2024 — AI Hallucinations: A Misnomer Worth Clarifying
