ACR Annual Meeting · 2026

False-Positive–Constrained Evaluation of Artificial Intelligence Lung Nodule Detection at Clinically Relevant Operating Points

Artal, Dalia MD; Alore, Patrick MD; Sinner, Jason MD

Key clinical takeaway

AI maintains 91.7% sensitivity at approximately 1 false positive per scan.

91.7% sensitivity at 1 FP/scan (95% CI 89.6–93.6): clinically optimal operating point

94.9% sensitivity at 2 FP/scan (95% CI 93.1–96.3): higher sensitivity, moderate FP cost

86.8% sensitivity at 0.5 FP/scan: low FP burden, reduced sensitivity

98.4% maximum sensitivity (~9.6 FP/scan): near-maximum sensitivity, impractical FP load

Purpose

False-positive findings remain a primary barrier to clinical adoption of artificial intelligence (AI) lung nodule detection, contributing to workflow inefficiency and reduced radiologist trust. While prior studies often emphasize peak sensitivity or aggregate accuracy metrics, fewer assess performance at operating points aligned with real-world clinical use. This study evaluated whether an AI lung nodule detection system maintains high sensitivity while constraining false-positive burden at clinically relevant operating points.

Methods

An AI lung nodule detection system was trained using low-dose chest CT examinations from a multi-center lung cancer screening cohort. Performance was evaluated on an independent multi-reader dataset with heterogeneous annotations to reflect interpretive variability. Lesion-level sensitivity was assessed across false-positive rates using free-response receiver operating characteristic (FROC) analysis. The primary operating point corresponded to approximately one false positive per scan (FPPS). Secondary analyses examined alternative operating points and nodule size thresholds. Ninety-five percent confidence intervals were estimated using bootstrap resampling.
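The FROC and bootstrap steps described above can be sketched as follows. This is a minimal illustration, not the study's implementation: the data layout (`n_nodules`, `detections`), the function names, scan-level resampling, and the percentile interval are all assumptions, since the abstract does not specify them.

```python
import random

def sensitivity_at_fpps(scans, target_fpps):
    """Lesion-level sensitivity at the most permissive score threshold whose
    false positives per scan (FPPS) do not exceed target_fpps.

    scans: list of dicts (hypothetical layout) with
      'n_nodules'  - number of annotated nodules in the scan
      'detections' - list of (score, nodule_id or None); None marks a false positive
    """
    n_scans = len(scans)
    total_nodules = sum(s["n_nodules"] for s in scans)
    if n_scans == 0 or total_nodules == 0:
        return 0.0
    # Pool detections from all scans and sweep the threshold from high to low.
    dets = sorted(
        ((score, i, nid) for i, s in enumerate(scans) for score, nid in s["detections"]),
        key=lambda d: -d[0],
    )
    found, fp, best, k = set(), 0, 0.0, 0
    while k < len(dets):
        score = dets[k][0]
        # Admit every detection tied at this score before evaluating the point.
        while k < len(dets) and dets[k][0] == score:
            _, i, nid = dets[k]
            if nid is None:
                fp += 1
            else:
                found.add((i, nid))  # each nodule counts once
            k += 1
        if fp / n_scans > target_fpps:
            break
        best = len(found) / total_nodules
    return best

def bootstrap_ci(scans, target_fpps, n_boot=1000, seed=0):
    """Percentile 95% CI, resampling whole scans with replacement (assumed design)."""
    rng = random.Random(seed)
    stats = sorted(
        sensitivity_at_fpps([rng.choice(scans) for _ in scans], target_fpps)
        for _ in range(n_boot)
    )
    lo = stats[int(0.025 * n_boot)]
    hi = stats[min(n_boot - 1, int(0.975 * n_boot))]
    return lo, hi

# Toy input, invented purely to exercise the functions.
toy_scans = [
    {"n_nodules": 2, "detections": [(0.9, 0), (0.8, None), (0.4, 1)]},
    {"n_nodules": 1, "detections": [(0.7, 0), (0.3, None)]},
]
```

Resampling at the scan level keeps all nodules and false positives from one examination together, which is one reasonable way to respect within-scan correlation; other resampling units would give somewhat different intervals.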

Figure: FROC, nodules ≥5 mm (lesion-level sensitivity vs. false positives per scan)

Free-response receiver operating characteristic (FROC) curve showing lesion-level sensitivity of the artificial intelligence lung nodule detection system for nodules ≥5 mm across increasing false-positive rates per scan. Sensitivity is 91.7% at the clinical operating point of approximately 1 false positive per scan. High sensitivity is maintained at clinically relevant operating points, indicating favorable performance under realistic lung cancer screening workflow constraints.

Results

Among 1,009 CT examinations containing 1,303 annotated nodules ≥5 mm, maximum lesion-level sensitivity was 98.4% (95% CI: 97.6–99.1) at permissive thresholds associated with approximately 9.6 FPPS. At the primary operating point of approximately 1 FPPS, sensitivity was 91.7% (95% CI: 89.6–93.6). Sensitivity increased to 94.9% (95% CI: 93.1–96.3) at 2 FPPS and was 86.8% at 0.5 FPPS. The overall FROC score was 0.889 (95% CI: 0.873–0.904). Comparable trends were observed for nodules ≥3 mm, with sensitivity of 90.8% at approximately 1 FPPS.
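The abstract does not state how the overall FROC score is defined. One widely used convention, from the LUNA16 challenge, is the mean sensitivity at seven predefined rates (1/8, 1/4, 1/2, 1, 2, 4, and 8 false positives per scan). The sketch below illustrates that convention only; whether the study used it is an assumption, and the function names and toy curve are hypothetical.

```python
# Assumption: LUNA16-style rates; the abstract does not say which definition it used.
CPM_RATES = (0.125, 0.25, 0.5, 1, 2, 4, 8)

def sensitivity_at(curve, rate):
    """Highest sensitivity among curve points with FPPS <= rate (step interpolation).

    curve: list of (fpps, sensitivity) pairs, sensitivity as a fraction in [0, 1].
    """
    eligible = [sens for fpps, sens in curve if fpps <= rate]
    return max(eligible, default=0.0)

def froc_score(curve, rates=CPM_RATES):
    """Mean sensitivity over the predefined false-positive rates."""
    return sum(sensitivity_at(curve, r) for r in rates) / len(rates)

# Invented toy curve, not the study's data.
toy_curve = [(0.1, 0.60), (0.25, 0.70), (0.5, 0.80), (1, 0.90),
             (2, 0.95), (4, 0.97), (8, 0.98)]
toy_score = froc_score(toy_curve)
```

Averaging over rates well below 1 FPPS means a summary of this kind rewards performance at strict operating points, which is consistent with the study's emphasis on constrained false-positive burden.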

Discussion

The overall FROC score was 0.889 (95% CI 0.873–0.904), reflecting consistent performance across the evaluated operating points.
Sensitivity remained high under stricter false-positive constraints, supporting workflow-conscious evaluation rather than reliance on permissive peak sensitivity metrics.
Returns diminish beyond approximately 2 FPPS: moving from 2 to approximately 9.6 FPPS adds only 3.5 percentage points of sensitivity (94.9% to 98.4%) at a cost of roughly 7.6 additional false positives per scan.
Comparable trends were observed for nodules ≥3 mm, with 90.8% sensitivity at approximately 1 FPPS.
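The diminishing-returns pattern can be checked directly from the four reported operating points for nodules ≥5 mm. The short calculation below computes sensitivity gained per additional false positive per scan between neighboring points; the function name is illustrative, and the 1 and 9.6 FPPS values are approximate in the abstract, so the slopes are approximate too.

```python
# Operating points reported in the abstract for nodules >=5 mm:
# (false positives per scan, lesion-level sensitivity in %).
points = [(0.5, 86.8), (1.0, 91.7), (2.0, 94.9), (9.6, 98.4)]

def marginal_gains(points):
    """Percentage points of sensitivity gained per added FP/scan between neighbors."""
    gains = []
    for (fp0, s0), (fp1, s1) in zip(points, points[1:]):
        gains.append(round((s1 - s0) / (fp1 - fp0), 2))
    return gains

gains = marginal_gains(points)
for (fp0, _), (fp1, _), g in zip(points, points[1:], gains):
    print(f"{fp0} -> {fp1} FP/scan: {g} points per added FP/scan")
```

The gain per added false positive falls from about 9.8 points (0.5 to 1 FPPS) to 3.2 points (1 to 2 FPPS) and then to about 0.46 points (2 to 9.6 FPPS).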

Conclusion

AI-based lung nodule detection demonstrated sustained high sensitivity at clinically relevant operating points while maintaining a constrained false-positive burden. Evaluating AI performance under realistic false-positive constraints provides practice-relevant insight beyond peak accuracy metrics and supports workflow-conscious integration of AI into lung cancer screening.