Research Platform
Updated Mar 2026
LLM Hallucination Benchmark & Ranking
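Using TruthAnchor's multi-layer verification pipeline, we quantitatively analyze hallucination in major LLMs. Models are evaluated along five dimensions: factual accuracy, numerical accuracy, citation reliability, consistency, and uncertainty calibration.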
#1 Ranked: Claude Opus 4.6 (Anthropic)
Overall score: 94.2 / 100
Factual: 96.1
Numerical: 93.5
Citation: 92.8
Consistency: 95.0
Uncertainty: 91.7
Models Evaluated: 12+
Domains Covered: 6
Score Dimensions: 5
Benchmark Questions: 500+
Overall Ranking
Hallucination Defense Score across all domains
Evaluation Framework
Five orthogonal dimensions, each measuring a different aspect of hallucination
Factual Accuracy (35%)
Claim extraction and NLI (natural language inference) verification against ground-truth evidence.
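As a minimal sketch of how claim-level NLI verdicts could be turned into a 0-100 score, assume each answer has already been split into atomic claims and each claim has been labeled by an NLI model (evidence as premise, claim as hypothesis). The NLI model itself is not shown; `NLILabel`, `ClaimVerdict`, and `factual_accuracy` are hypothetical names, and treating "neutral" as a failure is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class NLILabel(Enum):
    ENTAILMENT = "entailment"
    NEUTRAL = "neutral"
    CONTRADICTION = "contradiction"


@dataclass
class ClaimVerdict:
    claim: str       # one atomic claim extracted from the model's answer
    label: NLILabel  # NLI verdict: evidence as premise, claim as hypothesis


def factual_accuracy(verdicts: list[ClaimVerdict]) -> float:
    """Percentage of extracted claims entailed by the ground-truth evidence.

    Neutral (unsupported) and contradicted claims both count as failures.
    """
    if not verdicts:
        raise ValueError("no claims were extracted")
    supported = sum(v.label is NLILabel.ENTAILMENT for v in verdicts)
    return 100.0 * supported / len(verdicts)


# Hypothetical verdicts for a single answer.
verdicts = [
    ClaimVerdict("Revenue grew 12% in 2024", NLILabel.ENTAILMENT),
    ClaimVerdict("Operating margin was 18%", NLILabel.ENTAILMENT),
    ClaimVerdict("The CFO resigned in March", NLILabel.CONTRADICTION),
    ClaimVerdict("The firm has 40 branch offices", NLILabel.NEUTRAL),
]
print(factual_accuracy(verdicts))  # prints 50.0
```

A stricter variant could penalize contradictions more heavily than unsupported claims; the source does not say which convention is used.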
Numerical Accuracy (20%)
Verification of financial calculations using domain-specific tolerances.
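Domain-specific tolerances can be expressed as a relative and an absolute bound per domain, checked with the standard library's `math.isclose`. The domain names and tolerance values below are hypothetical placeholders; the page does not publish its actual tolerances.

```python
import math

# Hypothetical per-domain tolerances (not TruthAnchor's real values).
TOLERANCES: dict[str, dict[str, float]] = {
    "accounting": {"rel_tol": 0.0, "abs_tol": 0.005},  # must match to the cent
    "valuation": {"rel_tol": 0.01, "abs_tol": 0.0},    # within 1%
    "macro": {"rel_tol": 0.05, "abs_tol": 0.0},        # within 5%
}


def number_matches(predicted: float, expected: float, domain: str) -> bool:
    """True if |predicted - expected| <= max(rel_tol * max(|p|, |e|), abs_tol)."""
    tol = TOLERANCES[domain]
    return math.isclose(predicted, expected,
                        rel_tol=tol["rel_tol"], abs_tol=tol["abs_tol"])


def numerical_accuracy(items: list[tuple[float, float, str]]) -> float:
    """Percentage of (predicted, expected, domain) triples within tolerance."""
    if not items:
        raise ValueError("no numeric answers to score")
    correct = sum(number_matches(p, e, d) for p, e, d in items)
    return 100.0 * correct / len(items)


items = [
    (1234.56, 1234.56, "accounting"),  # exact: pass
    (1234.57, 1234.56, "accounting"),  # off by a cent: fail
    (101.0, 100.0, "valuation"),       # about 1% off: pass
    (3.1, 3.0, "macro"),               # about 3% off: pass
]
print(numerical_accuracy(items))  # prints 75.0
```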
Citation Reliability (15%)
Semantic similarity scoring between claims and their source documents.
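The idea of similarity scoring between claims and sources can be sketched as follows: for each claim, take its best cosine similarity against any source passage and count it as grounded if that clears a threshold. A real system would almost certainly use a sentence-embedding model; here a bag-of-words vector stands in so the example runs with the standard library alone. The 0.5 threshold and all function names are assumptions.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for a sentence-embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def citation_reliability(claims: list[str], sources: list[str],
                         threshold: float = 0.5) -> float:
    """Percentage of claims whose best-matching source clears `threshold`."""
    if not claims:
        raise ValueError("no claims to score")
    source_vecs = [embed(s) for s in sources]
    grounded = 0
    for claim in claims:
        vec = embed(claim)
        best = max((cosine(vec, s) for s in source_vecs), default=0.0)
        grounded += best >= threshold
    return 100.0 * grounded / len(claims)


sources = ["Net revenue rose 12 percent in fiscal 2024.",
           "The board approved a dividend of 0.50 per share."]
claims = ["Net revenue rose 12 percent in fiscal 2024",
          "The company opened offices in Tokyo"]
print(citation_reliability(claims, sources))  # prints 50.0
```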
Consistency (15%)
Self-consistency measured across multiple inference samples.
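One simple way to quantify self-consistency is to sample the same question several times and measure how many samples agree with the most common answer. The sketch below assumes exact match after light normalization; the real pipeline may compare answers semantically instead. `normalize` and `self_consistency` are hypothetical names.

```python
from collections import Counter


def normalize(answer: str) -> str:
    # Hypothetical normalization: case-fold and collapse whitespace.
    return " ".join(answer.lower().split())


def self_consistency(samples: list[str]) -> float:
    """Percentage of samples that agree with the most common (normalized) answer."""
    if not samples:
        raise ValueError("need at least one sample")
    counts = Counter(normalize(s) for s in samples)
    _, top_count = counts.most_common(1)[0]
    return 100.0 * top_count / len(samples)


samples = ["42", "42 ", "41", "42", "forty-two"]
print(self_consistency(samples))  # prints 60.0
```

Majority agreement is only one option; mean pairwise agreement between samples is a common alternative that penalizes a long tail of distinct answers more smoothly.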
Uncertainty Calibration (15%)
Entropy-based alignment of model confidence with actual accuracy.
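A sketch of entropy-based calibration, under two assumptions the page does not confirm: (1) confidence for a question is 1 minus the normalized Shannon entropy of the answers sampled for it, and (2) "alignment with actual accuracy" is measured by expected calibration error (ECE) over equal-width confidence bins, reported here as 100 × (1 − ECE). All function names are hypothetical.

```python
import math
from collections import Counter


def entropy_confidence(samples: list[str]) -> float:
    """1 - normalized Shannon entropy of sampled answers (1.0 = all samples agree)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    counts = Counter(samples)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return 1.0 - entropy / math.log(n)  # log(n) is the entropy when all samples differ


def expected_calibration_error(conf: list[float], correct: list[bool],
                               bins: int = 10) -> float:
    """Sample-weighted mean of |accuracy - mean confidence| per confidence bin."""
    if not conf or len(conf) != len(correct):
        raise ValueError("conf and correct must be non-empty and equal length")
    buckets: list[list[int]] = [[] for _ in range(bins)]
    for i, c in enumerate(conf):
        buckets[min(int(c * bins), bins - 1)].append(i)  # c == 1.0 goes in last bin
    ece = 0.0
    for idx in buckets:
        if idx:
            acc = sum(correct[i] for i in idx) / len(idx)
            avg_conf = sum(conf[i] for i in idx) / len(idx)
            ece += len(idx) / len(conf) * abs(acc - avg_conf)
    return ece


def calibration_score(conf: list[float], correct: list[bool], bins: int = 10) -> float:
    """Map ECE onto a 0-100 scale where 100 means perfectly calibrated."""
    return 100.0 * (1.0 - expected_calibration_error(conf, correct, bins))


# Unanimous samples give confidence 1.0; a 2/1/1 split of four samples gives 0.25.
confs = [entropy_confidence(["A", "A", "A", "A"]),
         entropy_confidence(["B", "C", "B", "D"])]
print([round(c, 2) for c in confs])  # prints [1.0, 0.25]
```

Binning-based ECE is sensitive to the bin count on small question sets; a Brier score over the same (confidence, correct) pairs is a bin-free alternative.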