Research Platform | Updated Mar 2026

LLM Hallucination Benchmark & Ranking

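TruthAnchor's multi-layer verification pipeline quantifies hallucination in leading LLMs. Each model is evaluated on five dimensions: factual accuracy, numerical accuracy, citation reliability, consistency, and uncertainty calibration.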

#1 Ranked: Claude Opus 4.6 (Anthropic)
Overall score: 94.2 / 100

Factual: 96.1
Numerical: 93.5
Citation: 92.8
Consistency: 95.0
Uncertainty: 91.7
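The overall score appears to combine the five dimension scores using the weights listed under Evaluation Framework (35%, 20%, 15%, 15%, 15%). The page does not state the aggregation rule, so this minimal sketch assumes a plain weighted average on a 0-100 scale; `WEIGHTS` and `composite_score` are illustrative names, not part of TruthAnchor.

```python
# Weights copied from the Evaluation Framework; they sum to 1.0.
# Assumption: the overall score is a plain weighted average of the sub-scores.
WEIGHTS: dict[str, float] = {
    "factual": 0.35,
    "numerical": 0.20,
    "citation": 0.15,
    "consistency": 0.15,
    "uncertainty": 0.15,
}


def composite_score(subscores: dict[str, float]) -> float:
    """Hypothetical aggregation: weighted average of 0-100 dimension scores."""
    missing = WEIGHTS.keys() - subscores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())
    return sum(w * subscores[dim] for dim, w in WEIGHTS.items()) / total_weight


# Sub-scores as displayed on the #1 card (already rounded to one decimal).
claude_opus_46 = {
    "factual": 96.1,
    "numerical": 93.5,
    "citation": 92.8,
    "consistency": 95.0,
    "uncertainty": 91.7,
}
print(f"{composite_score(claude_opus_46):.2f}")  # 94.26
```

With the rounded sub-scores shown on the card, this assumed formula gives about 94.26 rather than the listed 94.2, so the published overall score may be computed from unrounded values or by a different rule.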
12+ models evaluated
6 domains covered
5 score dimensions
500+ benchmark questions

Evaluation Framework

Five orthogonal dimensions, each measuring a different aspect of hallucination

Factual Accuracy (35%): Claim extraction and NLI (natural language inference) verification against ground-truth evidence.
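The factual-accuracy check can be sketched as "split the answer into claims, then ask an NLI judge whether any evidence passage entails each claim." This is an illustration under stated assumptions, not TruthAnchor's implementation: sentence splitting stands in for a real claim extractor, and `substring_judge` is a hypothetical toy placeholder for a trained NLI model. Any callable returning `"entailment"`, `"neutral"`, or `"contradiction"` can be plugged in.

```python
import re
from typing import Callable, Sequence

# An NLI judge maps (premise, hypothesis) to "entailment", "neutral" or "contradiction".
NliJudge = Callable[[str, str], str]


def extract_claims(answer: str) -> list[str]:
    """Naive claim extraction (assumption): treat each sentence as one claim."""
    sentences = re.split(r"(?<=[.!?])\s+", answer.strip())
    return [s for s in sentences if s]


def factual_accuracy(answer: str, evidence: Sequence[str], nli: NliJudge) -> float:
    """Percentage of claims entailed by at least one evidence passage."""
    claims = extract_claims(answer)
    if not claims:
        raise ValueError("answer contains no checkable claims")
    supported = sum(
        any(nli(passage, claim) == "entailment" for passage in evidence)
        for claim in claims
    )
    return 100.0 * supported / len(claims)


def substring_judge(premise: str, hypothesis: str) -> str:
    """Hypothetical toy stand-in for an NLI model: entailment only on a verbatim match."""
    needle = hypothesis.rstrip(".!?").lower()
    return "entailment" if needle in premise.lower() else "neutral"


evidence = ["Revenue grew 12% in 2024. The CEO is Jane Doe."]
answer = "Revenue grew 12% in 2024. The CEO is John Smith."
print(factual_accuracy(answer, evidence, substring_judge))  # 50.0
```

Treating "neutral" and "contradiction" the same (both unsupported) is a simplifying choice; a stricter variant could penalize contradictions more heavily.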

Numerical Accuracy (20%): Verification of financial calculations against domain-specific tolerances.
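A tolerance-based numeric check might look like the sketch below: recompute the reference value from the question's inputs, then accept the model's number if it falls within a relative tolerance chosen per domain. The tolerance values in `DOMAIN_REL_TOL` and the compound-interest example are assumptions for illustration; the benchmark's actual tolerances are not given on this page.

```python
import math

# Hypothetical relative tolerances per domain (assumed values, not from the benchmark).
DOMAIN_REL_TOL: dict[str, float] = {"finance": 0.005, "general": 0.01}


def compound_value(principal: float, annual_rate: float, years: int) -> float:
    """Reference calculation: value after annual compounding."""
    return principal * (1.0 + annual_rate) ** years


def within_tolerance(predicted: float, reference: float, domain: str) -> bool:
    rel_tol = DOMAIN_REL_TOL.get(domain, DOMAIN_REL_TOL["general"])
    # abs_tol keeps the check meaningful when the reference value is exactly zero.
    return math.isclose(predicted, reference, rel_tol=rel_tol, abs_tol=1e-9)


def numerical_accuracy(pairs: list[tuple[float, float]], domain: str) -> float:
    """Percentage of (predicted, reference) pairs that agree within the domain tolerance."""
    if not pairs:
        raise ValueError("no numeric answers to check")
    hits = sum(within_tolerance(p, r, domain) for p, r in pairs)
    return 100.0 * hits / len(pairs)


reference = compound_value(1000.0, 0.05, 3)  # about 1157.625
# The second answer used simple interest (1000 * 1.15) instead of compounding.
pairs = [(1157.63, reference), (1150.0, reference)]
print(numerical_accuracy(pairs, "finance"))  # 50.0
```

The tighter finance tolerance is what catches the simple-interest answer here: it is off by about 0.66%, which passes at a 1% tolerance but fails at 0.5%.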

Citation Reliability (15%): Semantic-similarity scoring between claims and their source documents.
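Citation scoring by semantic similarity can be sketched as cosine similarity between each claim and the document it cites, counted as supported above a threshold. To stay self-contained, this sketch swaps in bag-of-words vectors where a production system would more likely use sentence embeddings; the 0.5 threshold is likewise an assumption.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-cased token counts; a crude stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z0-9%]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def citation_reliability(citations: list[tuple[str, str]], threshold: float = 0.5) -> float:
    """Percentage of (claim, cited_source) pairs whose similarity reaches the threshold."""
    if not citations:
        raise ValueError("no citations to score")
    supported = sum(
        cosine_similarity(bag_of_words(claim), bag_of_words(source)) >= threshold
        for claim, source in citations
    )
    return 100.0 * supported / len(citations)


citations = [
    ("Revenue grew 12% in 2024", "In 2024, revenue grew 12% year over year."),
    ("The CEO is Jane Doe", "Quarterly dividends were unchanged."),
]
print(citation_reliability(citations))  # 50.0
```

Lexical overlap cannot recognize paraphrases, which is why an embedding model is the more realistic choice for the similarity function.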

Consistency (15%): Self-consistency measured across multiple inference samples.
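Self-consistency is often measured by sampling the same prompt several times and checking how often the answers agree. This minimal sketch assumes "agreement" means matching the most frequent answer after light normalization (case, whitespace, trailing period); real pipelines may instead judge semantic equivalence between free-form answers.

```python
from collections import Counter


def normalize_answer(answer: str) -> str:
    """Case-fold, collapse whitespace and drop trailing periods before comparing."""
    return " ".join(answer.lower().split()).rstrip(".")


def self_consistency(samples: list[str]) -> float:
    """Percentage of samples that agree with the most frequent normalized answer."""
    if not samples:
        raise ValueError("need at least one sample")
    counts = Counter(normalize_answer(s) for s in samples)
    _, top_count = counts.most_common(1)[0]
    return 100.0 * top_count / len(samples)


samples = ["Paris", "paris.", "Paris", "Lyon", "Paris"]
print(self_consistency(samples))  # 80.0
```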

Uncertainty Calibration (15%): Entropy-based alignment of confidence with actual accuracy.
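The page describes calibration only as entropy-based confidence alignment with actual accuracy. One common way to make that concrete, assumed here rather than taken from TruthAnchor, is to derive each question's confidence from the entropy of repeatedly sampled answers, mark the majority answer right or wrong, and measure the gap with binned expected calibration error (ECE). Mapping ECE to a 0-100 score as 100 × (1 − ECE) is also an assumption.

```python
import math
from collections import Counter


def normalized_entropy(samples: list[str]) -> float:
    """Shannon entropy of the sampled answers, scaled to [0, 1] by log(n)."""
    n = len(samples)
    if n < 2:
        return 0.0
    counts = Counter(samples)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(n)


def expected_calibration_error(
    confidences: list[float], correct: list[bool], n_bins: int = 10
) -> float:
    """Binned ECE: weighted mean |avg confidence - accuracy| over confidence bins."""
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(max(int(conf * n_bins), 0), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(ok for _, ok in members) / len(members)
            ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


def uncertainty_calibration(questions: list[tuple[list[str], str]], n_bins: int = 10) -> float:
    """Hypothetical score: 100 * (1 - ECE), with confidence = 1 - normalized entropy."""
    if not questions:
        raise ValueError("no questions to score")
    confidences, correct = [], []
    for samples, gold in questions:
        if not samples:
            raise ValueError("each question needs at least one sampled answer")
        confidences.append(1.0 - normalized_entropy(samples))
        majority_answer = Counter(samples).most_common(1)[0][0]
        correct.append(majority_answer == gold)
    return 100.0 * (1.0 - expected_calibration_error(confidences, correct, n_bins))


questions = [
    (["A", "A", "A", "A"], "A"),  # confident and right
    (["A", "B", "C", "D"], "Z"),  # unsure and wrong
    (["X", "X", "X", "X"], "Y"),  # confident but wrong: penalized
]
print(round(uncertainty_calibration(questions), 1))  # 66.7
```

Under this scheme a model is rewarded for being unsure when it is wrong and confident when it is right; only the confidently wrong third question lowers the score.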