各位同仁、技术爱好者们,大家好!
今天,我们将深入探讨一个在大型语言模型(LLM)领域中日益突出且至关重要的问题——“幻觉”(Hallucination),并学习如何利用一种巧妙而强大的算法——自洽性(Self-Consistency),来自动检测模型是否在“胡言乱语”。作为编程专家,我们不仅要理解这些概念,更要通过代码将其落地,构建出更可靠、更值得信赖的AI系统。
语言模型的“幻觉”现象及其危害
在人工智能,特别是自然语言处理领域,大型语言模型(LLM)近年来取得了令人瞩目的进展。它们能够生成流畅、连贯、甚至富有创造性的文本,在问答、摘要、翻译、代码生成等多个任务中展现出超乎想象的能力。然而,这些强大的能力背后,隐藏着一个不容忽视的缺陷,我们称之为“幻觉”(Hallucination)。
所谓“幻觉”,是指LLM生成了看似合理但实际上与事实不符、逻辑错误或无从考证的信息。这些信息并非模型刻意编造,而是其在训练过程中学习到的模式和统计规律,在生成时可能被过度泛化、误用或在缺乏真实世界知识约束时“脑补”出来的。
幻觉的危害是多方面的:
- 降低信任度: 用户一旦发现模型频繁出现幻觉,会对其生成内容的真实性和可靠性产生质疑,从而失去对AI系统的信任。
- 误导决策: 在需要高度准确性的应用场景,如医疗诊断、法律咨询、金融分析等,模型生成的错误信息可能导致严重的误判和不良后果。
- 信息污染: 如果幻觉内容被广泛传播或用于生成新的内容,可能导致错误信息蔓延,对社会造成负面影响。
- 工程挑战: 对于开发者而言,幻觉使得模型部署和维护变得更加困难,需要投入大量精力进行人工审查和纠错。
举个例子,你可能会问一个LLM:“请介绍一下2023年诺贝尔物理学奖的获奖者。”如果模型回答说:“2023年诺贝尔物理学奖颁发给了李明教授,以表彰他在量子引力领域取得的突破性进展。”而实际上,2023年的诺贝尔物理学奖获奖者是Anne L’Huillier、Pierre Agostini和Ferenc Krausz,表彰他们在阿秒脉冲领域的贡献。这就是一个典型的幻觉,李明教授可能是一个虚构的人物,量子引力也并非当年的获奖领域。模型在语法上是正确的,但内容却是完全错误的。
因此,开发一种能够自动检测并尽量缓解LLM幻觉的方法,对于构建可靠、安全的AI应用至关重要。
幻觉检测的挑战与传统方法局限
检测语言模型的幻觉并非易事,主要原因在于:
- 模型的“自信”: LLM在生成幻觉内容时,往往表现得非常自信,语气坚定,与生成正确内容时无异。这使得我们很难仅凭其输出的表面特征(如困惑度、概率)来判断其真实性。
- 知识边界模糊: 模型内部的知识存储方式是高度分布式的,我们很难直接“查询”模型是否真的“知道”某个事实,或者它只是在模仿句法模式。
- 开放域的复杂性: 在开放域问答或文本生成中,涉及的知识范围极其广泛,且信息不断更新,这使得外部知识库的维护和实时查询变得极具挑战。
传统的幻觉检测方法通常依赖于外部知识源进行事实核查:
- 知识图谱(Knowledge Graphs): 将模型生成的信息与结构化的知识图谱进行比对,验证实体和关系是否存在且正确。
- 搜索引擎与Web检索: 将模型生成的事实性陈述作为查询,通过搜索引擎在互联网上查找支持证据。
- 人工标注与审查: 最直接但成本最高的方法,由人类专家对模型输出进行逐一核查。
这些方法各有优缺点:
| 方法 | 优点 | 缺点 |
|---|---|---|
| 知识图谱 | 精确度高,结构化信息验证效率高 | 构建和维护成本高,覆盖范围有限,实时性差 |
| 搜索引擎/Web检索 | 覆盖范围广,可获取最新信息 | 依赖搜索结果质量,可能遇到信息冲突或不一致,延迟高 |
| 人工标注与审查 | 准确率最高,可处理复杂语境 | 成本极高,效率低下,不适合大规模实时检测 |
显而易见,传统方法要么成本过高,要么在覆盖范围和实时性上存在局限。我们迫切需要一种内部的、模型驱动的检测机制,能够利用LLM自身的特性来判断其输出的可靠性。这就是我们今天要探讨的“自洽性”算法的用武之地。
引入自洽性算法:从答案改进到幻觉检测
“自洽性”(Self-Consistency)算法最初由Wang et al. (2022) 提出,用于在复杂推理任务中提高LLM的性能。其核心思想是:一个正确的答案,往往可以通过多种不同的思考路径或生成策略得出。如果一个模型在面对同一问题时,能够通过不同的方式反复得出相同或高度相似的答案,那么这个答案很可能是正确的;反之,如果答案飘忽不定、前后矛盾,那么其可靠性就值得怀疑。
这就像一个经验丰富的专家在解决一个难题时,即使让他从不同的角度、不同的思路重新推导一遍,最终的结果也应该是一致的。而一个懵懂的初学者,即使第一次“蒙”对了答案,让他再推导一遍,很可能就会得出不同的甚至错误的结论。
Wang et al. 的研究主要关注如何利用自洽性来选择最佳答案,即通过多数投票的方式,从多个生成结果中选出最一致的那个。而我们今天的重点,则是将其思想反过来应用:如果模型生成的多个答案之间缺乏一致性,我们就将其视为幻觉的潜在指标。
自洽性检测幻觉的假设是:
- 事实性陈述的唯一性: 对于一个客观事实,其描述或答案通常是唯一的或高度趋同的。
- 模型内部知识的稳定性: 如果模型真正“掌握”了某个知识点,那么在面对该知识点相关的问题时,即使输入或解码方式稍有变化,其核心输出也应保持稳定。
- 幻觉的随机性: 幻觉往往是模型在知识边界模糊或推理路径不明确时的“猜测”或“随机编造”,因此在重复生成时,其内容更容易出现偏差和不一致。
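上述假设可以先用一个极简的纯 Python 草图来感受:这里用“答案两两完全匹配的比例”代替语义相似度作为一致性度量(这是一个示意性的简化假设,函数名均为演示用):

```python
from itertools import combinations

def consistency_score(answers):
    """以“两两完全一致的答案对所占比例”作为最简化的一致性度量。"""
    if len(answers) < 2:
        return 1.0
    pairs = list(combinations(answers, 2))
    agree = sum(1 for a, b in pairs if a.strip().lower() == b.strip().lower())
    return agree / len(pairs)

def looks_hallucinated(answers, threshold=0.5):
    """一致性低于阈值时,标记为疑似幻觉。"""
    return consistency_score(answers) < threshold

# 稳定的答案:一致性高,不标记
print(looks_hallucinated(["Paris", "paris", "Paris"]))                 # False
# 飘忽不定的答案:一致性低,标记为疑似幻觉
print(looks_hallucinated(["Xylos", "Glorgon City", "Flibble-de-do"]))  # True
```

后文的完整流程正是把这里的“完全匹配”替换为更稳健的声明抽取与语义相似度。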
利用自洽性检测幻觉的详细方法论
现在,让我们一步步拆解如何利用自洽性算法来自动检测模型幻觉。整个过程可以概括为以下四个核心步骤:
- 生成多条响应 (Generate Multiple Responses)
- 提取关键信息/断言 (Extract Key Information/Assertions)
- 比较并度量一致性 (Compare and Measure Consistency)
- 设定阈值并标记幻觉 (Set Threshold and Flag Hallucinations)
我们将结合代码示例来详细阐述每个步骤。为了简化演示,我们将模拟LLM的响应,而不是直接调用一个大型、可能需要API密钥或大量计算资源的实际LLM。
步骤1:生成多条响应
这是自洽性方法的基础。我们需要让LLM针对同一个问题,生成多条不同的回答。实现多样性主要有两种策略:
- 提示词变体(Prompt Variations): 对原始问题进行重新表述,使用同义词、改变句式结构或添加一些上下文,但不改变问题的核心意图。
- 解码策略变体(Decoding Variations): 在LLM生成文本时,调整其解码参数,如温度(temperature)、top-p、top-k、重复惩罚(repetition_penalty)等。这些参数会影响生成文本的随机性和多样性。
  - temperature:控制采样随机性。温度越高,输出越随机;温度越低,输出越确定。
  - top_p(nucleus sampling):从累积概率达到 p 的最小词集中采样。
  - top_k:只从概率最高的 k 个词中采样。
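temperature 的作用可以用一个纯 Python 的小实验直观感受(下面的 logits 数值是假设的,仅作演示):

```python
import math

def softmax_with_temperature(logits, temperature):
    """温度缩放后的 softmax:temperature 越高,概率分布越平坦。"""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                      # 减去最大值,保证数值稳定
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # 假设三个候选词的打分
low_t = softmax_with_temperature(logits, 0.2)   # 低温:几乎必选得分最高的词
high_t = softmax_with_temperature(logits, 2.0)  # 高温:各词概率更接近
print([round(p, 3) for p in low_t])
print([round(p, 3) for p in high_t])
```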
以下是一个模拟生成多条响应的Python函数示例。
import random
import time
from typing import List, Dict, Any
# 模拟的LLM接口,实际应用中这里会调用OpenAI, Hugging Face Transformers等
class MockLLM:
def __init__(self, model_name: str = "mock-llm-v1"):
self.model_name = model_name
self.knowledge_base = {
"Who won the 2023 Nobel Prize in Physics?": [
"The 2023 Nobel Prize in Physics was awarded to Pierre Agostini, Ferenc Krausz and Anne L'Huillier for experimental methods that generate attosecond pulses of light for the study of electron dynamics in matter.",
"Pierre Agostini, Ferenc Krausz, and Anne L'Huillier received the 2023 Nobel Prize in Physics for their groundbreaking work on attosecond pulses.",
"For their pioneering experiments with attosecond light pulses, Pierre Agostini, Ferenc Krausz, and Anne L'Huillier shared the Nobel Prize in Physics in 2023."
],
"What is the capital of France?": [
"The capital city of France is Paris.",
"Paris is the capital of France.",
"France's capital is Paris."
],
"Tell me about the history of quantum gravity.": [
"Quantum gravity is a field of theoretical physics that seeks to describe gravity according to the principles of quantum mechanics. It addresses the problem of unifying general relativity with quantum field theory. Key approaches include string theory and loop quantum gravity.",
"The history of quantum gravity involves attempts to merge general relativity and quantum mechanics. Early ideas emerged in the 1930s. Notable developments include the Wheeler-DeWitt equation and the formulation of string theory.",
"Developing a quantum theory of gravity has been a major challenge in physics. It began with early insights from figures like Einstein, who recognized the need for such a unification. Loop quantum gravity and string theory are modern frameworks aiming to achieve this."
],
"What did Albert Einstein invent in 1905?": [
"In 1905, Albert Einstein published several groundbreaking papers, including those on the photoelectric effect, Brownian motion, special relativity, and mass-energy equivalence (E=mc²). He didn't 'invent' a single device but revolutionized physics with his theories.",
"Albert Einstein's 'annus mirabilis' in 1905 saw him introduce the theory of special relativity, explain the photoelectric effect, and publish his famous mass-energy equivalence formula. He didn't invent a physical object.",
"Einstein's seminal works of 1905 included his theories on special relativity and the photoelectric effect. He didn't invent a specific device that year, but rather fundamental scientific concepts and theories."
],
"Who invented the internet?": [ # 这是一个容易引起幻觉的问题,因为互联网是逐步发展的
"The internet was invented by Tim Berners-Lee in 1989.", # 部分真实,但过于简化
"Vinton Cerf and Robert Kahn are often credited as 'fathers of the Internet' for their work on TCP/IP protocols.", # 真实但非唯一答案
"The internet evolved from ARPANET, developed by the U.S. Department of Defense. No single person 'invented' it, but many contributed.", # 更准确的回答
"Al Gore famously claimed to have invented the internet, but this is a misconception.", # 带有外部知识的错误回答
"A collective of scientists and engineers, rather than one individual, developed the internet over several decades, building upon ARPANET.",
"The internet was a collaborative effort. While Tim Berners-Lee developed the World Wide Web, the underlying network technology had many contributors like Vinton Cerf."
],
"Describe the capital of imaginary country 'Zorgon'.": [ # 纯粹的幻觉问题
"The capital of Zorgon is Xylos, a bustling metropolis known for its crystalline spires and floating markets.",
"Zorgon's capital, Glorgon City, is famous for its intricate underground tunnel systems and bioluminescent flora.",
"The primary city of Zorgon is called 'Flibble-de-do', characterized by its spherical architecture and telepathic inhabitants.",
"There is no country named Zorgon, and therefore no capital. This is a fictional entity.", # 偶尔也会给出正确答案
"Zorgon is a fictional country. It does not have a capital city."
]
}
def generate(self, prompt: str, temperature: float = 0.7, top_p: float = 0.9, num_responses: int = 1) -> List[str]:
"""
模拟LLM生成文本。根据prompt和解码参数返回不同数量的响应。
对于一些“已知”的问题,会从预设列表中选择。
对于“未知”或“幻觉”问题,会模拟多样性。
"""
print(f" [MockLLM] Generating {num_responses} responses for: '{prompt[:50]}...' with temp={temperature}, top_p={top_p}")
responses = []
for _ in range(num_responses):
time.sleep(0.05) # 模拟网络延迟和计算时间
# 模拟模型根据不同参数和问题返回不同答案
if prompt in self.knowledge_base:
# 对于已知问题,随机从真实答案中选,或引入少量变体
if len(self.knowledge_base[prompt]) > 0:
chosen_response = random.choice(self.knowledge_base[prompt])
# 引入轻微的随机性,模拟解码参数的影响
if temperature > 0.8 or top_p < 0.7:
chosen_response += random.choice([" (slight variation)", " (minor rephrasing)"])
responses.append(chosen_response)
else:
responses.append(f"Sorry, I don't have information on '{prompt}'.")
else:
# 对于未知或容易引起幻觉的问题,生成更随机的文本
mock_text = f"This is a simulated response for '{prompt}'. Generated with temp={temperature}, top_p={top_p}. "
if temperature > 0.7:
mock_text += "It contains some speculative details. "
if top_p < 0.8:
mock_text += "Perhaps a bit more focused. "
# 模拟幻觉的多样性
if "imaginary country" in prompt or "invented the internet" in prompt.lower():
hallucination_options = [
f"The answer to '{prompt}' is a complex one, involving many contributors.",
f"While some attribute '{prompt}' to a single entity, it was a collaborative effort.",
f"The specifics of '{prompt}' are hotly debated, with various theories.",
f"It's widely believed that '{prompt}' has a definitive answer, but reality is more nuanced."
]
responses.append(random.choice(hallucination_options) + " " + mock_text)
else:
responses.append(mock_text + "This is a generic answer.")
return responses
mock_llm = MockLLM()
def generate_multiple_responses(
llm: MockLLM,
original_prompt: str,
num_variants: int = 5,
decoding_params: List[Dict[str, float]] = None
) -> List[str]:
"""
生成针对给定原始提示的多个响应。
可以包含提示词变体和解码参数变体。
"""
all_responses = []
# 默认解码参数,也可以自定义
if decoding_params is None:
decoding_params = [
{"temperature": 0.7, "top_p": 0.9},
{"temperature": 0.8, "top_p": 0.85},
{"temperature": 0.6, "top_p": 0.95},
{"temperature": 0.9, "top_p": 0.8},
{"temperature": 0.75, "top_p": 0.92},
]
# 生成提示词变体
prompt_variants = [original_prompt]
# 实际应用中,这里会使用另一个LLM或规则引擎来生成提示词变体
if "Who won the 2023 Nobel Prize in Physics?" in original_prompt:
prompt_variants.extend([
"Could you tell me the winners of the 2023 Nobel Physics Prize?",
"List the recipients of the 2023 Nobel Prize in Physics.",
"Who were awarded the 2023 Nobel Prize for Physics?",
])
elif "What is the capital of France?" in original_prompt:
prompt_variants.extend([
"France's capital city?",
"Which city is the capital of France?",
])
elif "Who invented the internet?" in original_prompt:
prompt_variants.extend([
"Who created the internet?",
"What individual or group is credited with inventing the internet?",
"Can you tell me about the invention of the internet?",
])
elif "imaginary country" in original_prompt:
prompt_variants.extend([
"What is the main city of the fictional land Zorgon?",
"Tell me about the capital of Zorgon, if it exists.",
])
# 确保至少有num_variants个响应
num_prompts_to_use = min(len(prompt_variants), num_variants)
for i in range(num_variants):
prompt_idx = i % num_prompts_to_use
decoding_idx = i % len(decoding_params)
current_prompt = prompt_variants[prompt_idx]
current_params = decoding_params[decoding_idx]
response = llm.generate(
prompt=current_prompt,
temperature=current_params["temperature"],
top_p=current_params["top_p"],
num_responses=1
)[0]
all_responses.append(response)
return all_responses
# 示例调用
print("--- Generating responses for a factual question ---")
factual_prompt = "Who won the 2023 Nobel Prize in Physics?"
factual_responses = generate_multiple_responses(mock_llm, factual_prompt, num_variants=5)
for i, r in enumerate(factual_responses):
print(f"Response {i+1}: {r}")
print("\n--- Generating responses for a potentially hallucinated question ---")
hallucination_prone_prompt = "Describe the capital of imaginary country 'Zorgon'."
hallucination_responses = generate_multiple_responses(mock_llm, hallucination_prone_prompt, num_variants=5)
for i, r in enumerate(hallucination_responses):
print(f"Response {i+1}: {r}")
print("\n--- Generating responses for a complex historical question ---")
internet_prompt = "Who invented the internet?"
internet_responses = generate_multiple_responses(mock_llm, internet_prompt, num_variants=6)
for i, r in enumerate(internet_responses):
print(f"Response {i+1}: {r}")
代码说明:
- `MockLLM` 模拟了真实的LLM行为:对于一些预设问题,它会返回相对一致的答案;对于容易引起幻觉的问题(如“虚构国家”),它会返回多样性更强的“胡言乱语”。
- `generate_multiple_responses` 函数负责调用 `MockLLM`,并结合提示词变体和解码参数变体来生成多条不同的响应。在实际应用中,生成提示词变体可能需要一个独立的LLM调用或更复杂的规则。
- `num_variants` 控制希望生成的响应总数。
步骤2:提取关键信息/断言
原始的文本响应往往冗长且包含无关信息,直接比较字符串效率低下且容易误判。我们需要从每条响应中抽取出核心的事实、实体或断言,以便后续进行有效比较。这一步是整个流程的关键,其质量直接影响最终的检测效果。
常用的技术包括:
- 命名实体识别(NER)和关系抽取(RE): 识别文本中的关键实体(人名、地名、组织、时间等)及其之间的关系。
- 开放信息抽取(OIE): 自动从非结构化文本中抽取结构化的三元组(主语、谓语、宾语)。
- 问题回答(QA)模型: 针对每条响应,提出一系列辅助问题,让另一个QA模型(或同一个LLM)从响应中提取出特定问题的答案。例如,如果主问题是“谁发明了互联网?”,辅助问题可以是“发明者的名字是什么?”、“发明时间是什么?”。
- 摘要生成(Summarization): 生成每条响应的简短摘要,聚焦于核心信息。
- 关键词或关键短语抽取: 识别最重要的词或短语。
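作为示意,下面是一个玩具级的“实体抽取”草图:只用正则匹配连续首字母大写的词组来近似人名,真实系统应使用 spaCy、Stanza 等预训练 NER 模型(函数名与正则均为演示性假设,无法处理带撇号的名字等情况):

```python
import re

def extract_person_like_entities(text):
    """极简“NER”:用正则抓取连续首字母大写的词组,近似人名/专名。"""
    pattern = r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b"
    return re.findall(pattern, text)

sentence = "Vinton Cerf and Robert Kahn designed the TCP/IP protocols."
print(extract_person_like_entities(sentence))  # ['Vinton Cerf', 'Robert Kahn']
```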
对于本讲座,我们将采用一种相对简化的方法:针对性的答案抽取。我们将假设我们能够定义一些模式或子问题,从LLM的响应中抽取出我们关心的核心答案。
import re
def extract_claims(response: str, query_type: str) -> List[str]:
"""
从LLM响应中提取关键声明或答案。
这个函数需要针对不同的问题类型进行定制。
"""
claims = []
response_lower = response.lower()
if "nobel prize in physics" in query_type.lower():
# 尝试提取人名
names = re.findall(r"(?:[A-Z][a-z]+\s[A-Z][a-z]+(?:-\w+)?(?: and |,\s*|,\s*and\s*))+(?:[A-Z][a-z]+\s[A-Z][a-z]+(?:-\w+)?)", response)
if names:
# 简化:只取第一个匹配的完整名字列表
claims.append(names[0])
elif "pierre agostini" in response_lower and "ferenc krausz" in response_lower and "anne l'huillier" in response_lower:
claims.append("Pierre Agostini, Ferenc Krausz, Anne L'Huillier")
else:
claims.append(response) # 如果无法精确提取,则返回完整响应作为声明
elif "capital of france" in query_type.lower():
if "paris" in response_lower:
claims.append("Paris")
else:
claims.append(response)
elif "invented the internet" in query_type.lower():
# 互联网发明者可能有多个人或组织,提取关键词
if "tim berners-lee" in response_lower:
claims.append("Tim Berners-Lee (World Wide Web)")
if "vinton cerf" in response_lower and "robert kahn" in response_lower:
claims.append("Vinton Cerf and Robert Kahn (TCP/IP)")
if "arpanet" in response_lower or "u.s. department of defense" in response_lower:
claims.append("ARPANET/US DoD (Early Network)")
if "no single person" in response_lower or "collaborative effort" in response_lower:
claims.append("No single inventor / Collaborative effort")
if not claims: # 如果没有匹配到特定实体,则提取核心句子
sentences = re.split(r'[.!?]', response)
if sentences:
claims.append(sentences[0].strip()) # 提取第一句话作为核心声明
else:
claims.append(response)
elif "imaginary country 'zorgon'" in query_type.lower():
# 对于虚构国家的首都,重点是判断它是否承认国家虚构
if "no country named zorgon" in response_lower or "fictional country" in response_lower:
claims.append("Acknowledged as fictional")
else:
# 尝试提取城市名
city_matches = re.findall(r"(?:capital of Zorgon is|primary city of Zorgon is called|Zorgon's capital,)\s*(\w+)", response)
if city_matches:
claims.append(city_matches[0])
else:
claims.append(response) # 无法提取时返回原响应
else:
# 默认情况下,可以返回整个响应或尝试简单的句子分割
claims.append(response)
# 过滤空字符串和重复的声明
return list(set([claim.strip() for claim in claims if claim.strip()]))
# 示例调用
print("\n--- Extracted Claims for Factual Responses ---")
extracted_factual_claims = [extract_claims(r, factual_prompt) for r in factual_responses]
for i, claims in enumerate(extracted_factual_claims):
print(f"Response {i+1} claims: {claims}")
print("\n--- Extracted Claims for Hallucination Responses ---")
extracted_hallucination_claims = [extract_claims(r, hallucination_prone_prompt) for r in hallucination_responses]
for i, claims in enumerate(extracted_hallucination_claims):
print(f"Response {i+1} claims: {claims}")
print("\n--- Extracted Claims for Internet Invention Responses ---")
extracted_internet_claims = [extract_claims(r, internet_prompt) for r in internet_responses]
for i, claims in enumerate(extracted_internet_claims):
print(f"Response {i+1} claims: {claims}")
代码说明:
- `extract_claims` 函数是高度定制化的。在真实场景中,这会是一个更复杂的模块,可能依赖于预训练的NLP模型(如spaCy, Stanza, OpenIE)或更精细的规则。
- 对于“诺贝尔奖”问题,我们尝试抽取人名。
- 对于“法国首都”,我们直接查找“Paris”。
- 对于“互联网发明者”,由于答案的复杂性,我们抽取多个可能的贡献者或核心观点。
- 对于“虚构国家”,我们首先判断模型是否识别出这是虚构的,然后才尝试抽取城市名。
- 如果无法精确抽取,会将整个响应或其首句作为声明,这虽然粗糙,但在语义比较阶段仍能发挥作用。
步骤3:比较并度量一致性
有了提取出的关键声明,下一步就是量化它们之间的一致性。这通常通过计算声明之间的相似度来完成。
相似度度量方法:
- 词法相似度(Lexical Similarity):
- Jaccard 相似度: 比较两个集合(如词集合)的交集与并集的比值。
- 编辑距离(Levenshtein Distance): 将一个字符串转换成另一个字符串所需的最少单字符编辑操作数。
- TF-IDF 向量 + 余弦相似度: 将文本转换为TF-IDF向量,然后计算其余弦相似度。
- 缺点: 无法捕捉语义上的相似性,例如“汽车”和“车辆”在词法上不同但语义相同。
- 语义相似度(Semantic Similarity):
- 词嵌入(Word Embeddings)+ 向量平均 + 余弦相似度: 将文本中的每个词转换为词向量(如Word2Vec, GloVe),然后对所有词向量取平均得到句子向量,再计算余弦相似度。
- 句子嵌入(Sentence Embeddings)+ 余弦相似度: 使用专门训练的句子嵌入模型(如Sentence Transformers, BERT-like models)将整个句子编码成一个固定维度的向量,然后计算向量的余弦相似度。这是目前最常用且效果最好的方法之一。
- LLM-as-a-Judge: 利用另一个强大的LLM来评估两个声明的语义等价性或一致性。例如,给LLM两个声明和一个问题,让它判断这两个声明是否都正确回答了问题,并且是否互相支持。
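在转向句子嵌入之前,可以先看一个词法相似度基线的草图,它也直观暴露了词法方法的局限(函数为演示性实现):

```python
def jaccard_similarity(a, b):
    """词集合的 Jaccard 相似度:|交集| / |并集|。"""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)

# 词面几乎相同 -> 分数高
print(jaccard_similarity("Paris is the capital of France",
                         "Paris is the capital city of France"))
# 语义相同但用词不同 -> 词法分数偏低,这正是词法方法的局限
print(jaccard_similarity("The car is fast", "The automobile is quick"))
```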
本示例将主要采用句子嵌入 + 余弦相似度的方法,因为它在捕捉语义相似性方面表现优异,且计算效率相对较高。我们将使用sentence_transformers库。
from sentence_transformers import SentenceTransformer, util
import numpy as np
# 加载预训练的句子嵌入模型
# 'all-MiniLM-L6-v2' 是一个轻量级但效果不错的模型
print("Loading Sentence Transformer model...")
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
print("Sentence Transformer model loaded.")
def calculate_semantic_similarity(claim1: str, claim2: str) -> float:
"""
使用Sentence Transformers计算两个声明之间的语义相似度(余弦相似度)。
"""
if not claim1 or not claim2:
return 0.0 # 空字符串相似度为0
embeddings = embedding_model.encode([claim1, claim2], convert_to_tensor=True)
similarity = util.cos_sim(embeddings[0], embeddings[1]).item()
return similarity
def calculate_consistency_score(all_extracted_claims: List[List[str]]) -> float:
"""
计算所有响应的声明之间的平均一致性分数。
如果只有一个声明,则一致性为1.0。
"""
flat_claims = [claim for sublist in all_extracted_claims for claim in sublist]
if len(flat_claims) <= 1:
return 1.0 # 单个或没有声明,视为完全一致
similarities = []
# 计算所有声明对之间的相似度
for i in range(len(flat_claims)):
for j in range(i + 1, len(flat_claims)):
sim = calculate_semantic_similarity(flat_claims[i], flat_claims[j])
similarities.append(sim)
if not similarities:
return 0.0
# 我们可以选择平均相似度作为一致性分数
# 或者更复杂的:例如,找到最大的互不矛盾的子集
return np.mean(similarities)
# 示例调用
# 假设我们有以下声明
claim_set_1 = ["The 2023 Nobel Prize in Physics was awarded to Pierre Agostini, Ferenc Krausz and Anne L'Huillier.",
"Pierre Agostini, Ferenc Krausz, and Anne L'Huillier received the 2023 Nobel Prize in Physics."]
claim_set_2 = ["The capital of Zorgon is Xylos.",
"Zorgon's capital, Glorgon City, is famous for its intricate underground tunnel systems."]
claim_set_3 = ["Paris is the capital of France.", "France's capital is Paris."]
claim_set_4 = ["Tim Berners-Lee (World Wide Web)", "Vinton Cerf and Robert Kahn (TCP/IP)"] # 多个正确但不完全相同的贡献者
print("\n--- Calculating Consistency Scores ---")
print(f"Consistency for factual set 1: {calculate_consistency_score([claim_set_1]):.4f}")
print(f"Consistency for hallucination set 2: {calculate_consistency_score([claim_set_2]):.4f}")
print(f"Consistency for factual set 3: {calculate_consistency_score([claim_set_3]):.4f}")
print(f"Consistency for complex historical set 4: {calculate_consistency_score([claim_set_4]):.4f}") # 预期会低一些,因为是不同方面
# 将之前的提取结果代入计算
print("\n--- Consistency Scores for Example Prompts ---")
factual_consistency = calculate_consistency_score(extracted_factual_claims)
print(f"Factual Question ('{factual_prompt}'): Consistency Score = {factual_consistency:.4f}")
hallucination_consistency = calculate_consistency_score(extracted_hallucination_claims)
print(f"Hallucination Question ('{hallucination_prone_prompt}'): Consistency Score = {hallucination_consistency:.4f}")
internet_consistency = calculate_consistency_score(extracted_internet_claims)
print(f"Complex Historical Question ('{internet_prompt}'): Consistency Score = {internet_consistency:.4f}")
代码说明:
- `SentenceTransformer` 模型用于将文本转换为高维向量。
- `calculate_semantic_similarity` 函数计算两个声明的余弦相似度。
- `calculate_consistency_score` 函数遍历所有提取出的声明,计算它们两两之间的平均相似度,并以该平均值作为整体的一致性分数。
- 对于多方面正确但表述不同的问题(如“互联网发明者”),一致性分数可能会相对较低,这需要我们在解释和设置阈值时加以考虑。
步骤4:设定阈值并标记幻觉
得到一致性分数后,我们需要一个阈值来判断这个分数是否足够高,从而认为模型是“自洽”的,进而判断是否存在幻觉。
- 设定阈值: 这是一个经验性的过程,通常需要在验证集上通过实验来确定。你可以尝试不同的阈值,观察其在召回率(Recall,即检测出所有幻觉的比例)和准确率(Precision,即检测出的幻觉中真正是幻觉的比例)之间的权衡。
- 高阈值(更严格): 只有非常一致的响应才被认为是可靠的。可能会导致更多的假阳性(将正确但表述多样化的响应标记为幻觉)。
- 低阈值(更宽松): 即使响应有一些不一致,也可能被认为是可靠的。可能会导致更多的假阴性(未能检测出真正的幻觉)。
- 标记幻觉: 如果一致性分数低于设定的阈值,我们就可以将模型在该问题上的表现标记为可能存在幻觉。
def detect_hallucination(consistency_score: float, threshold: float = 0.75) -> bool:
"""
根据一致性分数和阈值判断是否存在幻觉。
如果分数低于阈值,则认为存在幻觉。
"""
return consistency_score < threshold
# 示例阈值设定
CONSISTENCY_THRESHOLD = 0.75 # 这是一个可调整的参数
print("\n--- Hallucination Detection Results ---")
is_factual_hallucinated = detect_hallucination(factual_consistency, CONSISTENCY_THRESHOLD)
print(f"'{factual_prompt}' -> Hallucinated: {is_factual_hallucinated} (Score: {factual_consistency:.4f})")
is_hallucination_prone_hallucinated = detect_hallucination(hallucination_consistency, CONSISTENCY_THRESHOLD)
print(f"'{hallucination_prone_prompt}' -> Hallucinated: {is_hallucination_prone_hallucinated} (Score: {hallucination_consistency:.4f})")
is_internet_hallucinated = detect_hallucination(internet_consistency, CONSISTENCY_THRESHOLD)
print(f"'{internet_prompt}' -> Hallucinated: {is_internet_hallucinated} (Score: {internet_consistency:.4f})")
# 进一步测试一个明确的幻觉案例
print("\n--- Additional Hallucination Test ---")
hallucination_test_prompt = "What are the main exports of the country Eldoria?"
# 模拟LLM生成完全不一致的幻觉答案
mock_llm.knowledge_base[hallucination_test_prompt] = [
"Eldoria primarily exports rare minerals like vibranium and unobtanium.",
"The main exports of Eldoria include magical artifacts and enchanted textiles.",
"Eldoria, a fictional country, has no exports.",
"Eldoria exports advanced bio-tech components and interstellar navigation systems."
]
test_responses = generate_multiple_responses(mock_llm, hallucination_test_prompt, num_variants=4)
extracted_test_claims = [extract_claims(r, hallucination_test_prompt) for r in test_responses]
test_consistency = calculate_consistency_score(extracted_test_claims)
is_test_hallucinated = detect_hallucination(test_consistency, CONSISTENCY_THRESHOLD)
print(f"'{hallucination_test_prompt}' -> Hallucinated: {is_test_hallucinated} (Score: {test_consistency:.4f})")
for i, claims in enumerate(extracted_test_claims):
print(f" Response {i+1} claims: {claims}")
代码说明:
- `detect_hallucination` 函数简单地比较分数和阈值。`CONSISTENCY_THRESHOLD` 是一个可以根据实际需求调整的参数。
- 通过不同类型问题的测试,我们可以看到该方法在不同场景下的表现。对于虚构国家的首都,由于模型可能给出多种虚构答案或直接承认虚构,导致一致性较低,从而被标记为幻觉。对于互联网发明者这种多方面贡献的问题,分数可能会在中等范围,这需要更细致的阈值设定或多维度的判断。
进阶考量与实际应用
1. 提示工程(Prompt Engineering)对自洽性的影响
为了更好地利用自洽性,我们可以通过精心设计的提示词来引导LLM生成更多样化、更有利于一致性检查的响应:
- “思考链”(Chain-of-Thought, CoT): 引导模型逐步推理,而不是直接给出答案。我们可以比较不同推理路径的最终结论。
- “树状思考”(Tree-of-Thought, ToT): 允许模型探索多个推理分支,并在每个分支上进行回溯和评估。
- 多视角生成: 在提示中要求模型“从三个不同角度阐述”、“给出多种可能的解释”等,强制模型生成多样性。
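这类提示可以通过简单的模板包装来构造,下面是一个示意性草图(具体措辞需针对所用模型调优,函数名为演示用):

```python
def make_prompt_variants(question):
    """为同一问题构造几种引导不同推理路径/视角的提示词。"""
    return [
        f"{question} Let's think step by step.",          # 思考链(CoT)
        f"{question} 请先列出相关事实,再给出最终答案。",      # 分步作答
        f"{question} 请从三个不同角度分析后给出结论。",        # 多视角生成
    ]

for p in make_prompt_variants("Who invented the internet?"):
    print(p)
```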
2. 语义一致性与逻辑一致性
我们目前主要关注的是语义一致性,即不同声明在意义上是否相似。但在某些复杂场景下,我们还需要考虑逻辑一致性:
- 矛盾检测: 两个声明可能在语义上不相似,但它们可能存在直接的逻辑矛盾(例如,“A是B的父亲”和“B是A的父亲”)。这需要更高级的逻辑推理或蕴含识别模型。
- 事实性冲突: 当多个响应提供互相冲突的事实信息时,即使每个信息本身听起来都“合理”,整体也缺乏逻辑一致性。
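矛盾检测通常依赖 NLI(自然语言推理/蕴含)模型;作为示意,这里给出一个基于否定词的玩具级启发式草图(仅为演示性假设,不能替代真正的 NLI 模型):

```python
NEGATION_MARKERS = {"no", "not", "never"}

def naive_contradiction(claim1, claim2):
    """玩具级矛盾检测:两个声明的实词高度重叠,但恰好只有一方含否定词时,
    视为潜在矛盾。真实系统应使用 NLI 模型判断蕴含/矛盾关系。"""
    def words(s):
        return {w.strip(".,!?'\"").lower() for w in s.split()}
    w1, w2 = words(claim1), words(claim2)
    neg1, neg2 = bool(w1 & NEGATION_MARKERS), bool(w2 & NEGATION_MARKERS)
    c1, c2 = w1 - NEGATION_MARKERS, w2 - NEGATION_MARKERS
    overlap = len(c1 & c2) / max(len(c1 | c2), 1)
    return overlap > 0.5 and neg1 != neg2

print(naive_contradiction("Zorgon has a capital city.",
                          "Zorgon does not have a capital city."))  # True
```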
3. 计算成本与效率
自洽性方法需要生成多条响应,并对每条响应进行处理(提取、嵌入、比较),这无疑会增加计算开销。
- 并行化: 多个响应的生成和嵌入计算可以并行进行。
- 模型选择: 选择高效的句子嵌入模型(如 `all-MiniLM-L6-v2`)以降低计算量。
- 响应数量: 根据应用场景权衡生成响应的数量,通常3-7条响应已经能提供不错的效果。
- 缓存机制: 对于重复的查询或声明,可以缓存其嵌入向量。
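缓存机制的一个最小草图如下:按文本缓存嵌入向量,其中 `encode_fn` 用一个假的嵌入函数代替,实际使用时可换成 SentenceTransformer 的 `encode`(类名与示例均为演示性假设):

```python
class CachedEmbedder:
    """按文本缓存嵌入向量,避免对重复出现的声明反复编码。"""
    def __init__(self, encode_fn):
        self.encode_fn = encode_fn  # 实际使用时可传入嵌入模型的 encode 方法
        self.cache = {}
        self.misses = 0

    def encode(self, text):
        if text not in self.cache:
            self.misses += 1
            self.cache[text] = self.encode_fn(text)
        return self.cache[text]

fake_encode = lambda t: [float(len(t))]  # 假嵌入函数,仅作演示
embedder = CachedEmbedder(fake_encode)
embedder.encode("Paris")
embedder.encode("Paris")  # 命中缓存,不再调用 encode_fn
embedder.encode("Lyon")
print(embedder.misses)    # 2
```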
4. 处理模糊和开放性问题
自洽性方法在检测事实性、封闭域问题(有明确正确答案)的幻觉时效果较好。但对于以下情况,其效果可能受限:
- 主观性问题: “你觉得哪部电影最好看?”这类问题没有标准答案,模型给出多样化的响应是正常的,不能被视为幻觉。
- 创意性生成: 在故事创作、诗歌生成等任务中,多样性是期望的,一致性反而不适用。
- 开放性讨论: 对于复杂概念的探讨,不同角度的阐述可能语义不同但都正确,这需要更精细的判断逻辑。
在这种情况下,可能需要结合任务类型和领域知识来调整一致性阈值,甚至完全禁用自洽性检测。
5. 与检索增强生成(RAG)的结合
检索增强生成(RAG)通过从外部知识库检索相关文档来增强LLM的生成能力,有效降低了幻觉。自洽性可以作为RAG的补充:
- 验证检索结果: 如果RAG检索到的文档本身存在错误或不完整,LLM仍可能基于此生成幻觉。自洽性可以用于验证RAG生成的答案,确保其不仅与检索结果一致,而且在模型内部也表现出一致性。
- 增强鲁棒性: 即使RAG失败(未检索到相关文档或检索到无关文档),自洽性仍能提供一层保护,帮助发现模型“瞎编”的情况。
6. 评估指标
在实际部署自洽性幻觉检测系统时,我们需要一套评估指标来衡量其性能:
- 准确率 (Accuracy): (真阳性 + 真阴性) / 总样本数
- 精确率 (Precision): 真阳性 / (真阳性 + 假阳性)
- 召回率 (Recall): 真阳性 / (真阳性 + 假阴性)
- F1分数 (F1-score): Precision 和 Recall 的调和平均值
- 人机一致性: 将检测结果与人工标注的幻觉数据进行比对。
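这些指标可以直接由混淆矩阵的四个计数得到,下面是一个自包含的计算草图(数据为虚构示例,True 表示“被标记/被标注为幻觉”):

```python
def evaluate_detector(predictions, labels):
    """predictions/labels 中 True 表示“幻觉”。由混淆矩阵计数计算各指标。"""
    tp = sum(p and l for p, l in zip(predictions, labels))
    fp = sum(p and not l for p, l in zip(predictions, labels))
    fn = sum(not p and l for p, l in zip(predictions, labels))
    tn = sum(not p and not l for p, l in zip(predictions, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(labels),
            "precision": precision, "recall": recall, "f1": f1}

# 4 个样本:检测结果 [T, F, T, F],人工标注 [T, T, F, F]
print(evaluate_detector([True, False, True, False], [True, True, False, False]))
```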
完整代码演示:将所有模块整合
现在,让我们将所有这些功能整合到一个统一的函数中,以便更清晰地展示整个幻觉检测流程。
import random
import time
import re
from typing import List, Dict, Any
from sentence_transformers import SentenceTransformer, util
import numpy as np
# --- 步骤1: 模拟LLM和响应生成 ---
class MockLLM:
def __init__(self, model_name: str = "mock-llm-v1"):
self.model_name = model_name
self.knowledge_base = {
"Who won the 2023 Nobel Prize in Physics?": [
"The 2023 Nobel Prize in Physics was awarded to Pierre Agostini, Ferenc Krausz and Anne L'Huillier for experimental methods that generate attosecond pulses of light for the study of electron dynamics in matter.",
"Pierre Agostini, Ferenc Krausz, and Anne L'Huillier received the 2023 Nobel Prize in Physics for their groundbreaking work on attosecond pulses.",
"For their pioneering experiments with attosecond light pulses, Pierre Agostini, Ferenc Krausz, and Anne L'Huillier shared the Nobel Prize in Physics in 2023."
],
"What is the capital of France?": [
"The capital city of France is Paris.",
"Paris is the capital of France.",
"France's capital is Paris."
],
"Tell me about the history of quantum gravity.": [
"Quantum gravity is a field of theoretical physics that seeks to describe gravity according to the principles of quantum mechanics. It addresses the problem of unifying general relativity with quantum field theory. Key approaches include string theory and loop quantum gravity.",
"The history of quantum gravity involves attempts to merge general relativity and quantum mechanics. Early ideas emerged in the 1930s. Notable developments include the Wheeler-DeWitt equation and the formulation of string theory.",
"Developing a quantum theory of gravity has been a major challenge in physics. It began with early insights from figures like Einstein, who recognized the need for such a unification. Loop quantum gravity and string theory are modern frameworks aiming to achieve this."
],
"What did Albert Einstein invent in 1905?": [
"In 1905, Albert Einstein published several groundbreaking papers, including those on the photoelectric effect, Brownian motion, special relativity, and mass-energy equivalence (E=mc²). He didn't 'invent' a single device but revolutionized physics with his theories.",
"Albert Einstein's 'annus mirabilis' in 1905 saw him introduce the theory of special relativity, explain the photoelectric effect, and publish his famous mass-energy equivalence formula. He didn't invent a physical object.",
"Einstein's seminal works of 1905 included his theories on special relativity and the photoelectric effect. He didn't invent a specific device that year, but rather fundamental scientific concepts and theories."
],
"Who invented the internet?": [
"The internet was invented by Tim Berners-Lee in 1989.",
"Vinton Cerf and Robert Kahn are often credited as 'fathers of the Internet' for their work on TCP/IP protocols.",
"The internet evolved from ARPANET, developed by the U.S. Department of Defense. No single person 'invented' it, but many contributed.",
"A collective of scientists and engineers, rather than one individual, developed the internet over several decades, building upon ARPANET.",
"The internet was a collaborative effort. While Tim Berners-Lee developed the World Wide Web, the underlying network technology had many contributors like Vinton Cerf."
],
"Describe the capital of imaginary country 'Zorgon'.": [
"The capital of Zorgon is Xylos, a bustling metropolis known for its crystalline spires and floating markets.",
"Zorgon's capital, Glorgon City, is famous for its intricate underground tunnel systems and bioluminescent flora.",
"The primary city of Zorgon is called 'Flibble-de-do', characterized by its spherical architecture and telepathic inhabitants.",
"There is no country named Zorgon, and therefore no capital. This is a fictional entity.",
"Zorgon is a fictional country. It does not have a capital city."
],
"What are the main exports of the country Eldoria?": [ # 额外测试用例
"Eldoria primarily exports rare minerals like vibranium and unobtanium.",
"The main exports of Eldoria include magical artifacts and enchanted textiles.",
"Eldoria, a fictional country, has no exports.",
"Eldoria exports advanced bio-tech components and interstellar navigation systems."
]
}
def generate(self, prompt: str, temperature: float = 0.7, top_p: float = 0.9, num_responses: int = 1) -> List[str]:
responses = []
for _ in range(num_responses):
time.sleep(0.02) # 模拟延迟
if prompt in self.knowledge_base:
chosen_response = random.choice(self.knowledge_base[prompt])
if temperature > 0.8 or top_p < 0.7: # 模拟解码参数的微小影响
chosen_response += random.choice([" (slight variation)", " (minor rephrasing)"])
responses.append(chosen_response)
else:
mock_text = f"This is a simulated response for '{prompt}'. Temp={temperature}, Top_p={top_p}. "
if temperature > 0.7: mock_text += "It contains some speculative details. "
if top_p < 0.8: mock_text += "Perhaps a bit more focused. "
responses.append(mock_text + "This is a generic answer.")
return responses
mock_llm = MockLLM()
def generate_multiple_responses(
llm: MockLLM,
original_prompt: str,
num_variants: int = 5,
decoding_params: List[Dict[str, float]] = None
) -> List[str]:
all_responses = []
if decoding_params is None:
decoding_params = [
{"temperature": 0.7, "top_p": 0.9},
{"temperature": 0.8, "top_p": 0.85},
{"temperature": 0.6, "top_p": 0.95},
{"temperature": 0.9, "top_p": 0.8},
{"temperature": 0.75, "top_p": 0.92},
]
prompt_variants = [original_prompt]
# 实际中这里会用LLM或规则生成更多变体
if "nobel prize in physics" in original_prompt.lower():
prompt_variants.extend(["Could you tell me the winners of the 2023 Nobel Physics Prize?", "List the recipients of the 2023 Nobel Prize in Physics."])
elif "capital of france" in original_prompt.lower():
prompt_variants.extend(["France's capital city?", "Which city is the capital of France?"])
elif "invented the internet" in original_prompt.lower():
prompt_variants.extend(["Who created the internet?", "What individual or group is credited with inventing the internet?"])
elif "imaginary country" in original_prompt.lower():
prompt_variants.extend(["What is the main city of the fictional land Zorgon?", "Tell me about the capital of Zorgon, if it exists."])
elif "exports of the country eldoria" in original_prompt.lower():
prompt_variants.extend(["What goods does Eldoria export?", "Can you list Eldoria's primary exports?"])
for i in range(num_variants):
current_prompt = prompt_variants[i % len(prompt_variants)]
current_params = decoding_params[i % len(decoding_params)]
response = llm.generate(
prompt=current_prompt,
temperature=current_params["temperature"],
top_p=current_params["top_p"],
num_responses=1
)[0]
all_responses.append(response)
return all_responses
# --- 步骤2: 提取关键信息/断言 ---
def extract_claims(response: str, query_type: str) -> List[str]:
claims = []
response_lower = response.lower()
if "nobel prize in physics" in query_type.lower():
names = re.findall(r"(?:[A-Z][a-z]+(?:'[a-z]+)?\s[A-Z][a-z]+(?:-\w+)?(?: and |,\s*|,\s*and\s*))+(?:[A-Z][a-z]+\s[A-Z][a-z]+(?:-\w+)?)", response)
if names: claims.append(names[0])
elif "pierre agostini" in response_lower and "ferenc krausz" in response_lower and "anne l'huillier" in response_lower:
claims.append("Pierre Agostini, Ferenc Krausz, Anne L'Huillier")
else: claims.append(response)
elif "capital of france" in query_type.lower():
if "paris" in response_lower: claims.append("Paris")
else: claims.append(response)
elif "invented the internet" in query_type.lower():
if "tim berners-lee" in response_lower: claims.append("Tim Berners-Lee (World Wide Web)")
if "vinton cerf" in response_lower and "robert kahn" in response_lower: claims.append("Vinton Cerf and Robert Kahn (TCP/IP)")
if "arpanet" in response_lower or "u.s. department of defense" in response_lower: claims.append("ARPANET/US DoD (Early Network)")
if "no single person" in response_lower or "collaborative effort" in response_lower: claims.append("No single inventor / Collaborative effort")
if not claims:
sentences = re.split(r'[.!?]', response)
if sentences: claims.append(sentences[0].strip())
else: claims.append(response)
elif "imaginary country 'zorgon'" in query_type.lower() or "exports of the country eldoria" in query_type.lower():
if "no country named zorgon" in response_lower or "fictional country" in response_lower:
claims.append("Acknowledged as fictional")
else:
city_matches = re.findall(r"(?:capital of Zorgon is|primary city of Zorgon is called|Zorgon's capital,)\s*(\w+)", response)
if city_matches: claims.append(city_matches[0])
else:
# 对于出口等开放性问题,直接返回第一句话
sentences = re.split(r'[.!?]', response)
if sentences: claims.append(sentences[0].strip())
else: claims.append(response)
else:
sentences = re.split(r'[.!?]', response)
if sentences: claims.append(sentences[0].strip())
else: claims.append(response)
return list(set([claim.strip() for claim in claims if claim.strip()]))
# --- 步骤3: 比较并度量一致性 ---
print("Loading Sentence Transformer model...")
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
print("Sentence Transformer model loaded.")
def calculate_semantic_similarity(claim1: str, claim2: str) -> float:
if not claim1 or not claim2: return 0.0
embeddings = embedding_model.encode([claim1, claim2], convert_to_tensor=True)
return util.cos_sim(embeddings[0], embeddings[1]).item()
def calculate_consistency_score(all_extracted_claims: List[List[str]]) -> float:
flat_claims = [claim for sublist in all_extracted_claims for claim in sublist if claim] # 过滤空字符串
if len(flat_claims) <= 1: return 1.0
similarities = []
for i in range(len(flat_claims)):
for j in range(i + 1, len(flat_claims)):
sim = calculate_semantic_similarity(flat_claims[i], flat_claims[j])
similarities.append(sim)
if not similarities: return 0.0
return np.mean(similarities)
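顺带一提,上面的 calculate_consistency_score 每比较一对声明就调用一次 encode,n 条声明要编码 O(n²) 次。下面是一个批量化的示意写法(仅为草图):把编码器抽象成任意“文本列表 → (n, d) 向量矩阵”的函数传入,只编码一次,再用矩阵乘法一次性求出所有两两余弦相似度;与前文对接时传入 embedding_model.encode 即可。

```python
import numpy as np
from typing import Callable, List

def calculate_consistency_score_batched(
    all_extracted_claims: List[List[str]],
    encode_fn: Callable[[List[str]], np.ndarray],
) -> float:
    """与逐对版本等价的批量实现: 只调用一次 encode_fn。"""
    flat_claims = [c for sub in all_extracted_claims for c in sub if c]
    if len(flat_claims) <= 1:
        return 1.0
    emb = np.asarray(encode_fn(flat_claims), dtype=float)
    # 行归一化后, 矩阵乘法的结果即两两余弦相似度
    emb = emb / np.clip(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12, None)
    sim = emb @ emb.T
    # 只取上三角(不含对角线)求均值, 对应原实现中 i < j 的所有配对
    iu = np.triu_indices(len(flat_claims), k=1)
    return float(sim[iu].mean())
```

调用方式如 calculate_consistency_score_batched(extracted_claims, embedding_model.encode)(假设已加载前文的 SentenceTransformer 模型)。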
# --- 步骤4: 设定阈值并标记幻觉 ---
def detect_hallucination(consistency_score: float, threshold: float = 0.75) -> bool:
return consistency_score < threshold
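这里的默认阈值 0.75 只是一个经验值。如果能人工标注少量样本(每条记录一致性分数和它是否真的是幻觉),就可以扫描候选阈值、按 F1 选优。下面是一个示意实现,判定规则与 detect_hallucination 一致(分数低于阈值即判为幻觉);其中的标注数据纯属虚构,仅作演示。

```python
from typing import List, Tuple

def tune_threshold(
    labeled: List[Tuple[float, bool]],  # (一致性分数, 人工标注是否为幻觉)
    candidates: List[float],
) -> Tuple[float, float]:
    """在候选阈值中选出 F1 最高者, 返回 (最优阈值, 对应 F1)。"""
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        # 预测规则与 detect_hallucination 相同: 分数低于阈值 => 判为幻觉
        tp = sum(1 for s, y in labeled if s < t and y)
        fp = sum(1 for s, y in labeled if s < t and not y)
        fn = sum(1 for s, y in labeled if s >= t and y)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# 虚构的标注样本: 低分样本多为幻觉
labeled = [(0.92, False), (0.85, False), (0.70, True), (0.40, True), (0.30, True)]
best_threshold, best_f1 = tune_threshold(labeled, [0.5, 0.75, 0.95])
```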
# --- 主检测函数 ---
def perform_hallucination_detection(
llm: MockLLM,
original_prompt: str,
num_responses_to_generate: int = 7,
consistency_threshold: float = 0.75
) -> Dict[str, Any]:
"""
执行完整的自洽性幻觉检测流程。
"""
    print(f"\n--- Detecting hallucination for: '{original_prompt}' ---")
# 1. 生成多条响应
responses = generate_multiple_responses(llm, original_prompt, num_variants=num_responses_to_generate)
print(f"Generated {len(responses)} responses.")
for i, r in enumerate(responses):
print(f" R{i+1}: {r}")
# 2. 提取关键信息/断言
extracted_claims = [extract_claims(r, original_prompt) for r in responses]
print("Extracted claims:")
for i, claims in enumerate(extracted_claims):
print(f" R{i+1} claims: {claims}")
# 3. 计算一致性分数
consistency_score = calculate_consistency_score(extracted_claims)
print(f"Calculated consistency score: {consistency_score:.4f}")
# 4. 判断是否存在幻觉
is_hallucinated = detect_hallucination(consistency_score, consistency_threshold)
print(f"Is hallucinated (threshold={consistency_threshold}): {is_hallucinated}")
return {
"prompt": original_prompt,
"responses": responses,
"extracted_claims": extracted_claims,
"consistency_score": consistency_score,
"is_hallucinated": is_hallucinated,
"threshold": consistency_threshold
}
# --- 运行示例 ---
if __name__ == "__main__":
# 案例1: 事实性问题,预期一致性高,无幻觉
result_factual = perform_hallucination_detection(
mock_llm,
"Who won the 2023 Nobel Prize in Physics?",
num_responses_to_generate=5
)
print("-" * 50)
# 案例2: 虚构问题,预期一致性低,有幻觉
result_hallucination_prone = perform_hallucination_detection(
mock_llm,
"Describe the capital of imaginary country 'Zorgon'.",
num_responses_to_generate=5
)
print("-" * 50)
# 案例3: 复杂历史问题,答案可能多角度,预期一致性中等偏低,可能被标记为幻觉
result_internet = perform_hallucination_detection(
mock_llm,
"Who invented the internet?",
num_responses_to_generate=6,
consistency_threshold=0.65 # 稍微放宽阈值以适应多角度答案
)
print("-" * 50)
# 案例4: 另一个虚构国家问题,预期一致性低,有幻觉
result_eldoria = perform_hallucination_detection(
mock_llm,
"What are the main exports of the country Eldoria?",
num_responses_to_generate=4
)
print("-" * 50)
局限性与未来展望
尽管自洽性算法为LLM幻觉检测提供了一个有前景的内部机制,但它并非万能药,仍存在一些局限性:
- “一致地错误”: 如果模型在某个错误上表现出高度的自信和一致性(即“集体幻觉”),自洽性方法可能无法检测出来。例如,模型可能在训练数据中反复看到一个错误事实,从而对其深信不疑。
- 计算成本: 生成多条响应和计算语义相似度会增加延迟和计算资源消耗,不适合所有实时性要求极高的场景。
- 阈值选择: 最佳一致性阈值通常是任务和领域相关的,需要仔细调优和验证。
- 歧义和开放性: 对于具有多种正确解释或答案的开放性问题,多样性是可取的,此时低一致性并不意味着幻觉。
- 语义差异与逻辑冲突: 语义相似性并不能完全捕捉逻辑上的冲突。例如,“A是B的父亲”和“B是A的儿子”语义相似且逻辑等价;而“A是B的父亲”和“B是A的父亲”同样语义相似,却逻辑矛盾。目前的简单平均相似度无法区分这两种情况。
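最后一点可以用一个极简实验说明。这里用词集合的 Jaccard 重叠作为嵌入相似度的粗糙替身(仅为示意,并非前文的 SentenceTransformer):两个逻辑矛盾的句子若用词完全相同,表面相似度反而拿到满分,甚至高于一对并不矛盾的改写句。

```python
def jaccard_similarity(a: str, b: str) -> float:
    """词集合的 Jaccard 相似度: 只看用词重叠, 完全忽略词序与逻辑方向。"""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# 逻辑矛盾的一对: 父子方向相反, 但用词完全相同 => 相似度 1.0
sim_contradiction = jaccard_similarity("A is the father of B", "B is the father of A")
# 措辞不同但并不矛盾的一对 => 相似度反而更低
sim_paraphrase = jaccard_similarity("A is the father of B", "A is B's dad")
```

基于嵌入的余弦相似度通常也会给这样的矛盾句对打出很高的分数,这正是“平均相似度无法区分等价与矛盾”的症结所在。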
未来,幻觉检测的研究方向可能包括:
- 结合外部知识: 将自洽性与知识图谱、搜索引擎等外部验证方法相结合,形成更鲁棒的多模态检测系统。
- 探究模型内部状态: 尝试分析LLM生成过程中的注意力模式、层激活等内部状态,寻找幻觉的早期信号。
- 可解释性AI: 开发工具来解释为什么模型会产生幻觉,并提供修复建议。
- 强化学习与反馈循环: 利用检测到的幻觉作为负反馈,通过强化学习等方式对模型进行微调,使其减少幻觉。
- 更复杂的自洽性度量: 不仅仅是平均相似度,而是利用聚类、图论等方法来识别核心观点和异常值。
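其中最后一点可以有一个非常简单的起步版本:把相似度超过阈值的声明连成簇(用并查集求连通分量),以最大簇的占比作为一致性分数——占比高说明存在一个被多数响应支持的核心观点,其余声明则是离群值。下面是一个示意实现,sim_fn 可传入前文的 calculate_semantic_similarity;阈值 0.8 只是假设的默认值。

```python
from collections import Counter
from typing import Callable, List

def majority_cluster_score(
    claims: List[str],
    sim_fn: Callable[[str, str], float],
    link_threshold: float = 0.8,
) -> float:
    """相似度 >= link_threshold 的声明视为同簇, 返回最大簇的占比。"""
    n = len(claims)
    if n <= 1:
        return 1.0
    parent = list(range(n))  # 并查集

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # 路径减半
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if sim_fn(claims[i], claims[j]) >= link_threshold:
                parent[find(j)] = find(i)  # 合并两簇

    sizes = Counter(find(i) for i in range(n))
    return max(sizes.values()) / n
```

相比平均相似度,这一度量对“多数答案一致、个别答案跑偏”的情形更稳健:离群的那一条不会把整体分数拉得过低。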
提升LLM可靠性的关键一步
自洽性算法为我们提供了一个无需外部知识库即可自动检测大型语言模型“胡言乱语”的有效工具。通过生成多条响应、提取关键声明、计算语义一致性并设定合理阈值,我们能够从模型自身的行为中洞察其对知识的掌握程度。尽管存在局限性,但将自洽性融入LLM应用开发流程,无疑是提升模型可信赖性和实用性的关键一步。它帮助我们更好地理解模型的行为边界,从而构建出更加智能、更加可靠的AI系统,赋能更广泛的行业应用。