Artificial Intelligence for Thyroid Ultrasound: Clinical Performance, Pitfalls, and Practice Integration
Abstract
The use of artificial intelligence (AI) in thyroid ultrasound is bringing important changes to endocrine imaging, helping to improve diagnostic accuracy and make the assessment of thyroid nodules more consistent. This review examines the current applications, technological approaches, clinical performance, challenges, and future directions of AI-based systems in thyroid ultrasound. Recent studies suggest that AI technologies hold significant potential, particularly in automated nodule detection, classification, and risk stratification. Deep learning models, especially convolutional neural networks, have reported diagnostic accuracies exceeding 90% in distinguishing benign from malignant nodules, often matching or surpassing human radiologist performance. Current applications include Thyroid Imaging Reporting and Data System-based classification, lymph node metastasis prediction, and real-time diagnostic assistance. However, reproducibility concerns, validation and standardization requirements, clinical workflow integration, and regulatory considerations remain significant barriers to widespread adoption. Future developments should focus on multimodal integration, explainable AI systems, and prospective clinical trials to fully realize the potential of AI in transforming thyroid diagnostics.
INTRODUCTION
Thyroid nodules are among the most common endocrine disorders, detected in up to 68% of the general population on high-resolution ultrasound [1]. Accurate differentiation between malignant and benign nodules is crucial for appropriate patient management, because most nodules are benign and only 7–15% prove malignant [2,3]. With rapid advances in medical imaging, detection rates of thyroid disease and cancer have risen markedly [4,5].
Ultrasound remains the first-line modality because of its real-time capability, safety, and soft-tissue contrast [4,6]. Yet traditional interpretation suffers from inter-observer variability and operator dependency [4]. Inexperienced clinicians are prone to misclassification and unnecessary fine-needle aspiration (FNA) biopsies [4,7].
The growing incidence of thyroid cancer and clinical workload have created an urgent need for objective and efficient diagnostic tools [8]. Artificial intelligence (AI) provides advanced pattern-recognition capabilities for ultrasound imaging [9]. By leveraging deep-learning algorithms, AI serves as a powerful adjunct to enhance diagnostic efficiency and consistency [5,10].
AI applications have evolved from early computer-aided detection or diagnosis (CAD) systems to sophisticated models capable of automated nodule characterization, risk stratification, and decision support [11]. Collectively, these systems improve accuracy and standardization while reducing inter-observer variability [4,6,10].
This review provides an integrated overview of AI technologies, clinical performance, regulatory frameworks, and future opportunities in thyroid ultrasound (Fig. 1).
AI TECHNOLOGIES IN THYROID ULTRASOUND
Machine-Learning (ML) Models
ML constitutes the foundation of AI applications in thyroid ultrasound. Early studies employed classifiers such as support-vector machines (SVMs) and random forests (RFs) to distinguish malignant from benign nodules [10].
RF models aggregate many decision trees, which mainly reduces variance (overfitting) relative to any single tree, and they remain robust on high-dimensional imaging data [11]. Compared with conventional manual feature assessment, ML-based systems markedly improve diagnostic performance [10,12].
Deep-Learning Architectures
Convolutional neural networks (CNNs)
CNNs form the backbone of modern AI for thyroid ultrasound. They extract hierarchical spatial features directly from raw images, eliminating the need for handcrafted feature engineering [8,10].
Widely used architectures (ResNet, DenseNet, VGG, and EfficientNet) offer differing trade-offs between depth and efficiency [13,14]. ResNet-50, for example, achieved an F1-score of 92% in nodule classification [13], and ensemble approaches further improved performance. EfficientNet-B4 excelled in histopathological classification of thyroid carcinomas [14,15].
Image segmentation and feature extraction
Advanced CNNs enable automated region of interest (ROI) segmentation and quantitative feature extraction [16]. They consistently identify hypoechoic patterns and irregular margins as malignant features [17-19]. Training on heterogeneous datasets mitigates variability across scanners and operators, enhancing robustness.
Vision Transformers (ViTs) and Advanced Architectures
Emerging vision-transformer architectures capture global spatial relationships through attention mechanisms [20]. ViTs have achieved performance comparable to CNNs in complex diagnostic tasks.
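To make the attention mechanism concrete, the following is a minimal pure-Python sketch of scaled dot-product attention, the core operation in transformer models. It is illustrative only: the function names and the toy "patch" vectors are assumptions, and real ViTs add learned projections, multiple heads, and positional encodings that are omitted here.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    Each output row is a weighted average of all value vectors, so every
    image patch can draw on information from every other patch.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Hypothetical 2-dimensional embeddings for three image patches.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = attention(patches, patches, patches)  # self-attention
```

Because the weights for each query sum to one, every output vector is a convex combination of the inputs, which is how global context is shared across the image.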
Multi-channel CNNs attain 90.9% accuracy in multi-class thyroid disease classification, while multi-scale detection networks reach 97.5% accuracy in nodule detection [4,20].
AI-Assisted Systems: Real-Time CAD, TI-RADS and K-TIRADS Integration, and Multimodal Approaches
AI-assisted systems combine algorithmic analysis with real-time ultrasound acquisition.
CAD platforms analyze live ultrasound feeds and immediately highlight suspicious nodules, improving inexperienced physicians’ diagnostic accuracy from 73.82% to 76.44% (Table 1) [6,21].
Thyroid Imaging Reporting and Data System (TI-RADS)-based AI automatically classifies nodules according to American College of Radiology (ACR) TI-RADS criteria, achieving area under the curve (AUC) values around 0.91 and outperforming junior radiologists [18].
To reflect regional practice, the Korean Thyroid Imaging Reporting and Data System (K-TIRADS) provides analogous stratification standards widely adopted in East Asia [22,23]. Integration of AI with both ACR TI-RADS and K-TIRADS enhances consistency across international centers.
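The rule layer that sits behind a TI-RADS-based classifier can be sketched as follows. This is a hedged illustration, not the method of any system cited here: it assumes an upstream model has already predicted the sonographic features, and the point values and size thresholds reflect this sketch's reading of the ACR TI-RADS chart, which should be checked against the official chart. All function and dictionary names are hypothetical, and K-TIRADS uses different criteria that are not covered.

```python
# Feature points as understood from the ACR TI-RADS chart (verify before use).
COMPOSITION = {"cystic": 0, "spongiform": 0, "mixed": 1, "solid": 2}
ECHOGENICITY = {"anechoic": 0, "hyperechoic": 1, "isoechoic": 1,
                "hypoechoic": 2, "very_hypoechoic": 3}
SHAPE = {"wider_than_tall": 0, "taller_than_wide": 3}
MARGIN = {"smooth": 0, "ill_defined": 0, "lobulated": 2, "irregular": 2,
          "extrathyroidal_extension": 3}
FOCI = {"none": 0, "large_comet_tail": 0, "macrocalcifications": 1,
        "peripheral": 2, "punctate": 3}

# Minimum nodule size (cm) at which FNA is suggested, per TR level.
FNA_THRESHOLD_CM = {3: 2.5, 4: 1.5, 5: 1.0}

def tirads_points(composition, echogenicity, shape, margin, foci=()):
    """Sum the points of all features; echogenic foci are additive."""
    return (COMPOSITION[composition] + ECHOGENICITY[echogenicity]
            + SHAPE[shape] + MARGIN[margin] + sum(FOCI[f] for f in foci))

def tirads_level(points):
    """Map a point total to a TR level (1 to 5)."""
    if points <= 1:
        return 1  # assumption: a 1-point total is treated as TR1
    if points == 2:
        return 2
    if points == 3:
        return 3
    if points <= 6:
        return 4
    return 5

def recommend_fna(level, size_cm):
    """Return True if the nodule meets the size threshold for its level."""
    threshold = FNA_THRESHOLD_CM.get(level)
    return threshold is not None and size_cm >= threshold

# Hypothetical nodule: solid, hypoechoic, taller-than-wide, irregular margin,
# punctate echogenic foci, 1.2 cm in maximum diameter.
points = tirads_points("solid", "hypoechoic", "taller_than_wide",
                       "irregular", ["punctate"])
level = tirads_level(points)
fna = recommend_fna(level, 1.2)
```

Separating feature prediction from rule-based scoring keeps the final category auditable: a clinician can see which predicted feature contributed which points.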
Commercial implementations—Koios DS and See-Mode Technologies—illustrate successful translation of research into clinical products [24,25]. Koios DS employs semi-automated ROI selection and ensemble CNNs, whereas See-Mode offers fully automated detection and reporting (Table 2).
Multimodal integration further extends diagnostic depth by fusing B-mode, Doppler, and elastography data with clinical variables [26,27]. Collectively, these AI-assisted solutions standardize assessment, reduce inter-reader variability, and improve efficiency in thyroid imaging workflows [28].
CLINICAL APPLICATIONS AND PERFORMANCE
Diagnostic Accuracy and Nodule Classification
AI-assisted ultrasound systems demonstrate substantial gains in diagnostic accuracy, often exceeding 90% and reaching >99% in validation cohorts [6,29,30]. Detection accuracies above 97% and AUCs approaching 0.985 have been reported [4].
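For readers less familiar with the AUC figures quoted here, the sketch below computes the area under the ROC curve directly from its probabilistic definition: the chance that a randomly chosen malignant nodule receives a higher score than a randomly chosen benign one. The labels and scores are invented for illustration and do not come from any cited study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for malignant, 0 for benign.
    scores: model-predicted malignancy scores (higher = more suspicious).
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical example: two malignant and two benign nodules.
example_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])  # 3 of 4 pairs ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why values near 0.985 indicate very strong discrimination on the datasets where they were measured.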
Meta-analyses report a pooled sensitivity of 0.88 and a pooled specificity of 0.81 for benign-malignant differentiation [9]. Integration of AI into clinical workflows benefits both junior and senior radiologists by reducing inter-observer variability and false-positive/negative rates [6,10,17].
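The relationship between these pooled figures and the other metrics used in this review (accuracy, precision, F1-score) can be illustrated with a small confusion-matrix calculation. The cohort below is hypothetical: 100 malignant and 400 benign nodules, with counts chosen so that sensitivity and specificity equal 0.88 and 0.81.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute common diagnostic metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # malignant nodules correctly flagged
    specificity = tn / (tn + fp)   # benign nodules correctly cleared
    precision = tp / (tp + fp)     # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }

# Hypothetical cohort: 100 malignant (88 detected), 400 benign (324 cleared).
metrics = diagnostic_metrics(tp=88, fp=76, tn=324, fn=12)
```

In this invented cohort, precision is only about 0.54 even though sensitivity and specificity look strong, because benign nodules outnumber malignant ones. Accuracy figures should therefore always be read alongside the malignancy prevalence of the study population.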
Lymph Node Metastasis Prediction
Risk Stratification and Therapeutic Decision Support
Radiomic models quantify imaging biomarkers to predict tumor invasiveness and nodal metastasis [6]. AI-assisted TI-RADS/K-TIRADS scoring reduces the rate of unnecessary FNA biopsies from 61.9% to 35.2% while maintaining accuracy [33].
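As a worked reading of the reported drop in unnecessary FNA biopsies from 61.9% to 35.2%, the absolute and relative reductions follow directly from the two rates (no additional data are assumed):

```latex
\begin{align*}
\text{absolute reduction} &= 61.9\% - 35.2\% = 26.7 \text{ percentage points},\\
\text{relative reduction} &= \frac{61.9 - 35.2}{61.9} = \frac{26.7}{61.9} \approx 0.431 \;(43.1\%).
\end{align*}
```

In other words, roughly four in ten of the biopsies previously classed as unnecessary would be avoided under the reported rates.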
Clinical decision-support systems integrate imaging and clinical data to generate treatment recommendations aligned with guidelines [34,35]. In one study, AI recommendations matched surgical decisions in 78.9% of cases [30], underscoring its utility in multidisciplinary care.
Real-World Clinical Validation
Large-scale multicenter validations (>20,000 patients) confirm that AI maintains diagnostic accuracy across heterogeneous ultrasound equipment [36].
A prospective study of 1,500 nodules reported a sensitivity of 96% and a specificity of 95%, with strong concordance with expert sonographers [29]. These findings support the robustness and clinical readiness of AI systems (Table 3).
CHALLENGES AND LIMITATIONS
Data Quality and Generalizability
Reproducibility and generalizability remain the foremost barriers to clinical translation. Nearly 90% of published AI studies rely on proprietary, single-center datasets with limited external access, preventing independent replication [30,34].
Performance often deteriorates on unseen data—for instance, ThyNet accuracy dropped from 89.1% to 64% on external validation [4]. Heterogeneity in equipment, operator technique, and patient populations introduces distribution shifts and spectrum bias [37].
Robust AI models require transparent preprocessing pipelines, diverse multicenter datasets, and open-source code sharing to enable true external validation [38,39].
Interpretability and Explainable AI (XAI)
The black-box nature of deep networks limits clinician trust [7,40]. XAI approaches such as Grad-CAM, saliency mapping, and LIME are increasingly adopted to visualize decision-making processes.
These methods enhance interpretability, allow correlation of algorithmic attention with sonographic features (e.g., microcalcifications, irregular margins), and strengthen physician confidence in AI-assisted diagnosis.
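The idea behind such attribution maps can be sketched with occlusion sensitivity, a simple perturbation-based relative of the methods named above (it is not Grad-CAM or LIME themselves). Each image region is masked in turn, and the drop in the model's score is recorded as that region's importance. The scoring function below is a hypothetical stand-in for a trained classifier.

```python
def occlusion_saliency(image, score_fn, patch=1, fill=0.0):
    """Return a heatmap of score drops when each patch is masked.

    image: 2-D list of pixel intensities.
    score_fn: callable mapping an image to a scalar malignancy score.
    """
    base = score_fn(image)
    h, w = len(image), len(image[0])
    heat = [[0.0] * w for _ in range(h)]
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            occluded = [row[:] for row in image]  # copy so the input is untouched
            rows = range(top, min(top + patch, h))
            cols = range(left, min(left + patch, w))
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            drop = base - score_fn(occluded)
            for r in rows:
                for c in cols:
                    heat[r][c] = drop
    return heat

def toy_score(image):
    """Hypothetical classifier: mean intensity of the central 2x2 region."""
    return sum(image[r][c] for r in (1, 2) for c in (1, 2)) / 4.0

image = [[1.0] * 4 for _ in range(4)]
heatmap = occlusion_saliency(image, toy_score, patch=1)
```

Here only the four central pixels receive non-zero importance, mirroring how a clinician would expect a heatmap to concentrate on the nodule rather than on surrounding tissue.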
Clinical Workflow Integration and Operator Dependency
Integration with picture archiving and communication systems (PACS) and radiology reporting software remains inconsistent [30,41]. Without seamless connectivity, AI tools may lengthen interpretation time or disrupt established workflows.
Successful adoption requires interoperability with existing ultrasound consoles, minimal user interaction, and real-time processing compatible with clinical throughput [18,42].
Operator dependency persists even with AI: image quality still hinges on acquisition skill [37]. Standardized scanning protocols and structured AI training for clinicians are essential to sustain diagnostic reliability.
Regulatory and Ethical Frameworks
Regulatory approval for AI-based thyroid ultrasound systems necessitates rigorous validation to ensure clinical safety, efficacy, and reproducibility across diverse patient populations. These frameworks are essential to translate algorithmic innovation into safe, reliable clinical practice.
Global regulatory pathways
In the United States, the Food and Drug Administration (FDA) has pioneered the evaluation and clearance of AI-enabled imaging devices. FDA-cleared systems such as Koios DS and See-Mode Technologies underwent extensive premarket review processes confirming diagnostic accuracy, reproducibility, and risk management in accordance with Software as a Medical Device (SaMD) principles [24,25].
Similarly, in the European Union, the Medical Device Regulation (MDR) and the Conformité Européenne (CE) marking process emphasize continuous post-market surveillance, traceability, and transparency throughout the product lifecycle. These frameworks collectively ensure that AI-based diagnostic software meets both safety and ethical standards before widespread deployment.
Korean regulatory framework and Digital Medical Products Act (DMPA)
In South Korea, the regulation of AI-based thyroid ultrasound systems falls under the Ministry of Food and Drug Safety (MFDS), which serves as the primary authority for software as a medical device (SaMD) [43]. The MFDS is responsible for ensuring clinical safety and efficacy, and has issued specialized guidance on both regulatory review and approval [44] and clinical trial design [45] for machine learning-enabled diagnostic devices.
This framework was institutionalized through the DMPA, which entered into force on January 24, 2025 [46]. The DMPA establishes a comprehensive, dedicated legal framework for digital medical products, defining requirements for classification, manufacturing, import authorization, and technical documentation [47].
Furthermore, compliance with the stringent Personal Information Protection Act (PIPA) is mandatory for the handling of sensitive health data [43]. The DMPA also mandates adoption of “security-by-design” principles, requiring manufacturers to conduct proactive cybersecurity risk assessments during software development [47].
This integrated regulatory landscape ensures that AI systems deployed in Korean healthcare institutions comply with global quality benchmarks while addressing local challenges, including PACS interoperability, hospital information system integration, and reimbursement structures. Such harmonization between regulatory oversight and clinical infrastructure facilitates both innovation and patient safety in real-world implementation.
Ethical, legal, and data governance considerations
Ethical and legal frameworks governing AI in healthcare remain in rapid evolution. Key issues include liability in cases of AI misdiagnosis, the requirement for informed consent when predictive or automated decision support tools are used, and the delineation of responsibility between human clinicians and algorithmic systems [30,41].
From a governance perspective, data integrity and privacy protection are paramount. AI training and deployment must comply with both local and international data privacy laws, including the Health Insurance Portability and Accountability Act (HIPAA) in the U.S., the European Union General Data Protection Regulation (GDPR), and PIPA/DMPA in Korea. The implementation of secure data pipelines, access control mechanisms, and auditable data logs helps ensure that patient information remains protected during model development and clinical use [26].
Transparency, fairness, and explainability
Algorithmic transparency and fairness are critical to maintaining clinician trust. The so-called “black box” problem limits interpretability, as deep learning models often yield accurate predictions without explaining their reasoning [7,40]. To address this, current research emphasizes XAI methodologies that provide human-understandable rationales for model outputs.
Equitable AI deployment also requires mitigating algorithmic bias, ensuring balanced performance across sex, age, and ethnic subgroups [26]. Independent validation using multi-institutional, demographically diverse datasets is essential for achieving fairness and inclusivity.
Ultimately, regulatory frameworks must evolve toward continuous oversight models that account for adaptive learning, enabling safe updates of AI algorithms post-approval while preserving accountability and ethical compliance.
FUTURE DIRECTIONS AND OPPORTUNITIES
Technological Advances
Enhanced diagnostic tools
Next-generation systems such as AI-SONIC™ are expanding training datasets to encompass diverse imaging and histopathologic data, improving differentiation between malignant and benign nodules [41,48].
Multimodal integration
Integrating B-mode, Doppler, and elastography with clinical, laboratory, and molecular data can provide holistic characterization of thyroid pathology [26,27]. Such fusion addresses the multifactorial nature of disease and may support personalized treatment pathways.
Edge computing and mobile deployment
Edge and mobile implementations will permit point-of-care AI on handheld or portable ultrasound devices, democratizing expert-level interpretation in community and low-resource settings [35,40].
Federated learning
Federated learning enables collaborative model training among hospitals without exchanging raw patient data, thus preserving privacy while enhancing generalizability [7].
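The aggregation step at the heart of this approach can be sketched as a sample-size-weighted average of locally trained parameters, often called federated averaging. This is a minimal illustration under simplifying assumptions: model weights are flat lists of floats, local training has already happened at each hospital, and the hospital names and numbers are invented.

```python
def federated_average(client_weights, client_sizes):
    """Combine locally trained weights into one global model.

    client_weights: list of weight vectors, one per hospital.
    client_sizes: number of training samples at each hospital, used so
    that larger datasets contribute proportionally more.
    Only weights and counts are shared; raw images never leave a site.
    """
    if len(client_weights) != len(client_sizes):
        raise ValueError("one sample count is required per client")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Hypothetical round with two hospitals holding 100 and 300 scans.
hospital_a = [0.2, 1.0]
hospital_b = [0.4, 3.0]
global_weights = federated_average([hospital_a, hospital_b], [100, 300])
```

In a full system this step repeats over many rounds: the server sends the averaged model back, each site continues training on its own data, and the new local weights are averaged again.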
Clinical Research Priorities and Standardization
Prospective clinical trials
Large-scale randomized controlled trials comparing AI-assisted versus conventional workflows are essential to quantify patient-outcome benefits and cost-effectiveness [18,42].
Standardization efforts
Professional societies should establish consensus guidelines on dataset annotation, validation metrics, and TI-RADS/K-TIRADS-aligned reporting to ensure reproducibility across institutions [32]. Uniform preprocessing disclosure will facilitate regulatory review and meta-analysis.
Education, Implementation, and Quality Assurance
Clinician training remains pivotal. Educational programs should emphasize AI interpretation, limitations, and bias awareness [37,42]. Continuous quality-assurance (QA) protocols, including periodic validation, drift monitoring, and system recalibration, are needed to maintain accuracy [18].
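One simple way to operationalize drift monitoring is to track rolling agreement between AI predictions and a later reference standard, and to flag the system when agreement falls below its validated baseline. The sketch below is an assumption-laden illustration: the class name, window size, and tolerance are hypothetical, and the reference label (for example cytology or histopathology) is assumed to become available for each monitored case.

```python
from collections import deque

class DriftMonitor:
    """Flag performance drift using rolling accuracy over recent cases."""

    def __init__(self, baseline_accuracy, window=100, tolerance=0.05):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.results = deque(maxlen=window)  # keeps only the latest cases

    def record(self, prediction, truth):
        """Store whether the AI prediction matched the reference label."""
        self.results.append(prediction == truth)

    def rolling_accuracy(self):
        """Accuracy over the current window, or None if no cases yet."""
        if not self.results:
            return None
        return sum(self.results) / len(self.results)

    def drifted(self):
        """True once a full window falls below baseline minus tolerance."""
        if len(self.results) < self.results.maxlen:
            return False  # avoid alarms on too few cases
        return self.rolling_accuracy() < self.baseline - self.tolerance

# Hypothetical usage: baseline accuracy 0.90, ten most recent cases.
monitor = DriftMonitor(baseline_accuracy=0.90, window=10, tolerance=0.05)
for i in range(10):
    monitor.record(prediction=1, truth=1 if i < 8 else 0)
needs_review = monitor.drifted()
```

A flag from such a monitor would prompt the periodic validation and recalibration steps described above rather than an automatic model change.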
Effective system integration depends on robust information technology (IT) infrastructure and staged deployment with stakeholder engagement across radiology, surgery, and compliance teams [26]. Transparent demonstration of clinical benefit mitigates resistance and supports organizational change management [42].
CONCLUSION
AI is redefining thyroid-ultrasound diagnostics by enhancing accuracy, reproducibility, and workflow efficiency. Deep-learning architectures, particularly CNNs, achieve reported diagnostic accuracies exceeding 90% and often rival expert radiologists in nodule classification and lymph-node-metastasis prediction.
Clinical applications have progressed from basic CAD tools to fully integrated decision-support systems aligned with TI-RADS and K-TIRADS, reducing unnecessary FNAs while maintaining diagnostic fidelity. Remaining challenges include dataset transparency, reproducibility, workflow interoperability, and regulatory harmonization.
Future directions point toward multimodal and federated models, prospective multicenter validation, and standardized reporting frameworks linking AI outputs directly to structured clinical decision pathways. The emergence of FDA- and MFDS-cleared systems demonstrates feasibility but underscores the need for unified international guidelines.
Sustained adoption will depend on education, QA, and evidence of economic value. Ultimately, AI should augment—not replace—clinical expertise, transforming thyroid ultrasound from a subjective art into a reproducible, data-driven science.
Notes
ACKNOWLEDGEMENTS
Figure 1 in this manuscript was generated with the assistance of ChatGPT-5 (OpenAI) based on author-provided prompts.
FUNDING
None.
CONFLICTS OF INTEREST
No potential conflict of interest relevant to this article was reported.
AUTHOR CONTRIBUTIONS
J.A. and J.H.H. designed the study. J.A. and J.H.H. were responsible for the data acquisition. J.K., J.A. and J.H.H. analyzed the data. J.K. and J.A. wrote the first draft of the manuscript. J.A. and J.H.H. critically revised the manuscript. J.H.H. supervised the project. All authors read and approved the final manuscript.
