1Service de Médecine Interne, Hôpital de Hautepierre, Hôpitaux Universitaires de Strasbourg, Strasbourg, France.
2EA: Mitochondrie et Stress Oxydant, Faculté de Médecine, Université de Strasbourg, Strasbourg, France.
*Corresponding author: Emmanuel Andrès
Service de Médecine Interne, Hôpital de Hautepierre, Hôpitaux Universitaires de Strasbourg, France.
Tel: 333-88-11-50-66, Fax: 333-88-11-62-62;
Email: emmanuel.andres@chru-strasbourg.fr
Received: Sep 30, 2025
Accepted: Oct 22, 2025
Published Online: Oct 29, 2025
Journal: Journal of Artificial Intelligence & Robotics
Copyright: © Andrès E (2025). This Article is distributed under the terms of Creative Commons Attribution 4.0 International License.
Citation: Andres E. Digital stethoscope and artificial intelligence: current state, evidence, technical foundations, clinical applications, limitations and future directions. J Artif Intell Robot. 2025; 2(2): 1029.
The stethoscope, a symbol of clinical medicine for two centuries, is undergoing a technological transformation. Modern digital stethoscopes convert acoustic signals to high-resolution digital waveforms and, when combined with signal processing and Machine Learning (ML)/Artificial Intelligence (AI), promise to improve the objectivity, reproducibility and diagnostic yield of auscultation. Accumulating evidence shows AI-assisted auscultation can detect heart murmurs, valvular disease, atrial fibrillation and selected pulmonary pathologies with performance that, in many settings, exceeds unaided clinicians. Large multicenter studies and regulatory clearances have begun to validate commercial algorithms and integrated platforms, and AI-enabled stethoscopes are now being evaluated in primary care and low-resource settings to improve screening and triage. Despite enthusiasm, important challenges remain: heterogeneous recording conditions, variable datasets and labels, algorithm generalizability, clinical integration, liability/regulatory frameworks, privacy, and the risk of over-reliance on algorithms. This review summarizes the technological foundations of digital stethoscopes, the state of AI methods applied to heart and lung sounds, evidence from clinical studies and trials, datasets and evaluation practices, regulatory and ethical considerations, and a pragmatic roadmap for deployment and future research priorities.
Keywords: Digital stethoscope; Electronic stethoscope; Artificial intelligence; Machine learning; Deep learning; Heart sounds; Lung sounds; Murmur detection; Screening; Clinical decision support.
Auscultation remains an essential bedside skill in cardiology, pulmonology, and general practice. Traditional acoustic stethoscopes rely on human auditory perception and clinical experience; considerable inter-observer variability and declining auscultation skills among clinicians are well-documented [1-3]. Multiple studies have shown that even trained physicians frequently miss subtle murmurs or adventitious lung sounds, and diagnostic agreement between clinicians is inconsistent [3-5]. This decline has been attributed to reduced emphasis on physical examination during training and the increased reliance on imaging and laboratory investigations [6,7].
Digitization of auscultatory signals (via electronic stethoscopes, wearable sensors, or smartphone-based microphones) creates the opportunity to apply computational signal processing and Artificial Intelligence (AI) methods to detect, classify, and quantify pathologic sounds objectively [8-11]. Modern digital stethoscopes equipped with noise reduction, frequency filtering, and amplification improve sound clarity over acoustic devices, especially in noisy environments or telemedicine contexts [12-14]. Some devices include spectrogram visualization and wireless connectivity, enabling remote transmission and integration into electronic health records [15-17].
Recent years have seen rapid advances in supervised and deep learning methods applied to auscultation datasets. Convolutional neural networks, recurrent architectures, transformers, and hybrid time–frequency models have been trained on large repositories of labeled heart and lung recordings, achieving clinically relevant performance in controlled settings [18-21]. Algorithms can now detect murmurs, grade their intensity, classify wheezes and crackles, and even infer underlying conditions such as valvular heart disease, heart failure, or pulmonary fibrosis [18-20,22,23]. Some AI systems have demonstrated performance comparable to or superior to general practitioners in murmur detection and classification [18,21,24].
Several reviews and surveys have summarized methodological progress and outlined the path toward scalable AI-based auscultation [11,25-27]. These reviews highlight the importance of standardized datasets, robust validation frameworks, and the clinical heterogeneity of cardiopulmonary sounds [28-31]. Additionally, initiatives focused on remote monitoring, such as tele-auscultation and home-based follow-up for chronic diseases, illustrate the potential of combining digital stethoscopes with AI for improving accessibility and continuity of care [32-37].
This article reviews the technical foundations and clinical evidence for AI-enabled digital auscultation, synthesizes lessons from major studies, discusses datasets and validation practices, and examines regulatory, ethical, and implementation issues. By linking advances in signal acquisition, machine learning, and telemedicine infrastructures, we aim to contextualize the evolution of digital auscultation within the broader transformation of clinical diagnostics and remote care.
Historical evolution of auscultation and digital transition
Auscultation began with Laennec’s invention of the monaural stethoscope in 1816, marking the birth of mediated listening in clinical diagnosis [2,8]. Over the following two centuries, stethoscope design evolved mechanically—binaural tubing, bell/diaphragm heads, lighter materials—but the principle remained unchanged: sound waves transmitted through the air and interpreted solely by human hearing [1,2,38].
Decline in auscultatory proficiency
Despite its symbolic and clinical importance, multiple studies have documented a decline in auscultation skills among medical trainees and practicing physicians [1,3,4]. Mangione & Nieman showed that diagnostic accuracy for heart murmurs was below 50% in many internal medicine residents [3]. A recent systematic review confirmed that the sensitivity of auscultation for detecting valvular disease remains low compared to echocardiography [4]. The reduced teaching time dedicated to physical examination and the widespread availability of imaging technologies have contributed to this trend [6,7].
Emergence of electronic stethoscopes
The first electronic stethoscopes appeared in the latter half of the 20th century but remained niche due to cost, bulk, and low sound fidelity [2,8]. Progress in microelectronics, MEMS sensors, and digital signal processing enabled the development of devices with amplification, frequency filtering, and noise reduction [12,13,39]. Comparative trials demonstrated that modern digital stethoscopes outperform acoustic models in noisy environments and facilitate improved detection of low-intensity murmurs or fine crackles [13,14,38,40].
From sound capture to signal processing
Digital interfaces made it possible to visualize sounds using spectrograms, phonocardiograms, or waveforms, enhancing both diagnostic assessment and teaching [7,15,41]. Standardization efforts for computerized lung sound analysis emerged in the early 2000s [28-30], while parallel work explored digital cardiology applications [38,42,43]. These developments laid the groundwork for machine learning applications in auscultation by providing digitized and shareable datasets.
Tele-auscultation and remote use cases
Even prior to widespread AI, electronic stethoscopes were incorporated into telemedicine platforms for remote cardiac and pulmonary assessments [16,32,34,35]. During the COVID-19 pandemic, demand for remote monitoring accelerated adoption and acceptance of connected stethoscope technologies [10,16,33,37]. Remote auscultation has proven especially valuable in pediatrics, rural medicine, and chronic disease management [34-36,44].
Opening the door to AI
The availability of digitized heart and lung sounds, combined with advances in computational power and deep learning, enabled the emergence of AI-assisted auscultation [9,11,18,19,27]. What began as experimental work on murmur detection and wheeze classification has rapidly evolved into clinical validation studies, commercialization efforts, and early regulatory submissions [11,18,22]. These technologies now aim to supplement clinician interpretation, reduce diagnostic variability, and extend expertise to underserved settings.
Technical architecture of digital stethoscopes and implications for AI
A sensor embedded in the chest piece (typically a condenser microphone or piezoelectric contact sensor) captures the mechanical vibrations produced by cardiac or pulmonary activity. This signal is then routed through pre-amplifiers and anti-aliasing filters that condition the waveform before digitization. An Analog-to-Digital Converter (ADC) samples the signal, with sampling rates generally ranging from 4 kHz to over 40 kHz depending on the intended use and frequency band of interest [9,11,12,42]. The resulting digital data may be stored locally, displayed in real time, or streamed wirelessly to a connected device or cloud service.
Several hardware parameters critically influence signal fidelity and therefore the downstream performance of AI models:
− Sensor type and sensitivity
Condenser microphones and piezoelectric transducers differ in frequency response, susceptibility to motion artefacts, and ambient noise pickup [12,14,39]. Piezoelectric contact sensors often perform better in noisy environments but require consistent skin contact. Comparative studies have shown that the choice of transducer affects not only amplitude and bandwidth, but also the reproducibility of recordings used to train AI systems [11,40].
− Analog filtering and dynamic range
Pre-processing steps such as anti-aliasing filters, gain staging, and amplification circuits determine how well subtle acoustic components—like S3 gallops or fine crackles—are preserved without clipping or distortion [13,42]. Devices with insufficient dynamic range or poorly tuned gain settings may compromise the interpretability of recorded sounds and reduce the accuracy of trained algorithms [38,43].
− Contact vs. non-contact recording
Chest piece-integrated microphones minimize ambient interference and improve signal-to-noise ratios compared with external or smartphone-based microphones [9,12,16]. However, they require consistent placement and adequate contact pressure, which introduces operator-dependent variability. Smartphone-based auscultation, though more accessible, faces challenges in standardization and background noise control [14,16].
− Sampling rate and bit depth
Higher sampling rates (e.g., 44.1 kHz) and bit depths (16-bit or more) allow more accurate spectral representation of heart and lung sounds, which is advantageous for AI models that exploit time–frequency features [11]. Nevertheless, higher resolution recordings generate larger files, increasing storage demands and wireless transmission requirements [9,42].
− Wireless transmission and latency
Bluetooth Low Energy, Wi-Fi, or cellular connectivity allow real-time tele-auscultation and cloud-based processing [16,32,34]. Yet, these links introduce potential latency, packet loss, and cybersecurity considerations that must be addressed to maintain clinical reliability and data integrity [16,37].
− Hardware diversity: Asset and obstacle
The proliferation of digital stethoscope platforms—from FDA-cleared devices to low-cost smartphone chest piece adaptors—broadens access and encourages innovation [2,9,26]. However, this heterogeneity complicates model generalizability. Algorithms trained on recordings from a single device may perform poorly on data acquired from stethoscopes with different acoustic profiles or sampling characteristics [11,14]. Reviews and technical evaluations emphasize the need for device-agnostic datasets, calibration protocols, and standardized acquisition guidelines to support robust AI deployment [9,11,14,42].
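The storage and transmission trade-offs of sampling rate and bit depth discussed above follow from simple arithmetic. A minimal sketch (all parameter values below are illustrative assumptions, not specifications of any particular device):

```python
# Rough storage estimate for an uncompressed auscultation recording.
# Parameter values are illustrative assumptions, not device specifications.

def recording_size_bytes(sample_rate_hz: int, bit_depth: int,
                         channels: int, duration_s: float) -> int:
    """Uncompressed PCM size: rate * (bits / 8) * channels * seconds."""
    return int(sample_rate_hz * (bit_depth // 8) * channels * duration_s)

# A 30-second, single-channel recording at 4 kHz / 16-bit (low end)
low = recording_size_bytes(4_000, 16, 1, 30)     # 240,000 bytes (~234 KiB)

# The same recording at 44.1 kHz / 24-bit (high end)
high = recording_size_bytes(44_100, 24, 1, 30)   # 3,969,000 bytes (~3.8 MiB)

print(low, high)
```

The roughly 16-fold difference illustrates why higher-resolution capture, while beneficial for time–frequency feature extraction, raises storage and wireless bandwidth requirements.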
Signal processing pipelines and AI architectures
The transformation of raw auscultatory recordings into clinically meaningful predictions relies on a multistep computational pipeline. Signal enhancement, segmentation, and feature extraction are typically performed before classification or regression using machine learning or deep learning models. Advances in digital signal processing and neural architectures have enabled substantial improvements in the detection and interpretation of cardiopulmonary sounds [9,11,27] (Table 1).
Preprocessing and noise reduction
Recorded sounds often contain ambient noise, motion artefacts, and speech interference. Common preprocessing steps include bandpass filtering (typically 20-1000 Hz for heart sounds and 100-2000 Hz for lung sounds), wavelet denoising, adaptive filtering, and spectral subtraction [11,13,31,45]. Noise-reduction techniques have been shown to improve diagnostic performance, especially in emergency and telemedicine contexts where ambient noise is high [11,13,16,34]. Some commercial devices incorporate embedded filtering circuits or digital noise cancellation before digitization [12,40,42].
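The band-pass filtering step above can be sketched in a few lines. This is a minimal illustration using the 20-1000 Hz heart-sound band cited in the text; the filter order and synthetic test signal are assumptions, not taken from any cited device or study:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

# Illustrative band-pass preprocessing for heart sounds (20-1000 Hz band).
# Filter order and cutoffs are assumptions for demonstration.
def bandpass(signal: np.ndarray, fs: float,
             lo: float = 20.0, hi: float = 1000.0, order: int = 4) -> np.ndarray:
    """Zero-phase Butterworth band-pass filter."""
    sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)

# Synthetic example: a 50 Hz "heart sound" component plus 5 kHz noise.
fs = 11_025.0
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 5_000 * t)
y = bandpass(x, fs)
# The 5 kHz component is strongly attenuated; the 50 Hz component survives.
```

Zero-phase filtering (`sosfiltfilt`) avoids shifting S1/S2 timing, which matters for downstream segmentation.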
Segmentation and event detection
Accurate segmentation of cardiac cycles (S1, S2, systolic and diastolic intervals) and respiratory phases is a prerequisite for feature extraction. Hidden Markov Models (HMMs), envelope-based methods, and energy thresholding have historically been used for temporal localization. More recently, Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have been trained to perform automated segmentation directly from raw or minimally processed waveforms [18,19,27].
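Of the classical approaches listed above, envelope-based energy thresholding is the simplest to illustrate. A minimal sketch on a synthetic phonocardiogram (window length and threshold factor are assumed values, not from the cited literature):

```python
import numpy as np

# Minimal envelope-based event detection via short-time energy thresholding,
# one of the classical segmentation methods mentioned above.
# Window length and threshold factor are illustrative assumptions.
def detect_events(signal: np.ndarray, fs: float,
                  win_s: float = 0.02, k: float = 2.0) -> np.ndarray:
    """Return a boolean mask marking high-energy frames (candidate S1/S2)."""
    win = max(1, int(win_s * fs))
    # Short-time energy envelope via moving average of the squared signal.
    energy = np.convolve(signal ** 2, np.ones(win) / win, mode="same")
    threshold = k * energy.mean()
    return energy > threshold

# Synthetic phonocardiogram: silence with two short bursts ("S1" and "S2").
fs = 2_000
x = np.zeros(fs)                                      # 1 second of silence
x[200:260] = np.sin(np.linspace(0, 20 * np.pi, 60))   # burst 1
x[900:960] = np.sin(np.linspace(0, 20 * np.pi, 60))   # burst 2
mask = detect_events(x, fs)
# mask is True around samples 200-260 and 900-960, False elsewhere.
```

HMM- and CNN-based segmenters replace this fixed threshold with learned models, but the underlying goal of localizing high-energy cardiac events is the same.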
Time–frequency feature representations
Traditional feature extraction relies on Mel-Frequency Cepstral Coefficients (MFCCs), Short Time Fourier Transforms (STFT), wavelet transforms, and spectrograms to encode the spectral and temporal structure of auscultatory signals [11,29,30]. These representations enable differentiation of murmurs, crackles, wheezes, and respiratory cycles. Time–frequency maps are particularly useful for deep learning pipelines, where 2D CNNs treat spectrograms as images. Several studies have demonstrated high discrimination of murmurs and adventitious lung sounds using log-mel spectrograms and continuous wavelet transforms as model inputs [15,18-20].
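The 2D time–frequency inputs described above can be produced directly from a waveform. A sketch using a plain log-magnitude STFT spectrogram (log-mel and wavelet variants follow the same pattern; window sizes and the toy signal are assumptions):

```python
import numpy as np
from scipy.signal import spectrogram

# Time-frequency representation of the kind fed to 2D CNNs.
# Window parameters and the synthetic signal are illustrative assumptions.
fs = 4_000                          # Hz, assumed sampling rate
t = np.arange(0, 2.0, 1 / fs)
# Toy signal: a 150 Hz tone in the first second, 600 Hz in the second.
x = np.where(t < 1.0, np.sin(2 * np.pi * 150 * t),
                      np.sin(2 * np.pi * 600 * t))

f, times, Sxx = spectrogram(x, fs=fs, nperseg=256, noverlap=128)
log_spec = 10 * np.log10(Sxx + 1e-10)   # dB scale, viewed as a 2D "image"
# log_spec has shape (len(f), len(times)) and can be treated as a
# single-channel image by a convolutional network.
```

The frequency shift between the two halves of the signal appears as a vertical jump in the spectrogram, which is exactly the kind of spatial pattern a 2D CNN can learn to detect.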
Convolutional neural networks (CNNs)
CNNs are the most widely used architecture for supervised classification of heart and lung sounds. By convolving kernels across spectrograms or raw waveforms, CNNs capture spatial and temporal acoustic patterns without hand-crafted features. Landmark studies have reported strong performance in murmur detection, valvular disease screening, and pulmonary pathology classification [15,18-20,27]. For example, Ribeiro et al. demonstrated that convolutional models trained on annotated phonocardiograms could identify valvular defects with clinically meaningful accuracy [19]. CNNs have also been combined with residual or inception modules to improve robustness.
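To make the kernel-convolution step concrete, a minimal numpy sketch follows. This is not a full CNN (a real model stacks many learned kernels with nonlinearities and pooling); the hand-written kernel below simply illustrates how a convolution responds to a spectrogram pattern:

```python
import numpy as np

# A single 2D convolution over a spectrogram-like array: the core
# operation a CNN applies. The kernel here is hand-written to respond to
# horizontal ridges (sustained energy in one frequency band); in a trained
# network such kernels are learned from labeled data.
def conv2d_valid(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# Toy "spectrogram": zeros with one horizontal band of energy.
spec = np.zeros((16, 32))
spec[6, :] = 1.0                      # sustained tone at one frequency bin

# Horizontal-ridge detector: positive center row, negative flanks.
kernel = np.array([[-1.0, -1.0, -1.0],
                   [ 2.0,  2.0,  2.0],
                   [-1.0, -1.0, -1.0]])

response = conv2d_valid(spec, kernel)
# The response peaks along the output row centered on the energy band.
```

In practice frameworks such as PyTorch or TensorFlow perform this operation with optimized batched kernels, but the arithmetic is the same.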
Recurrent architectures (RNN, LSTM, GRU)
Because heart and lung sounds are intrinsically temporal, recurrent neural networks—particularly Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models—are used to model sequential dependencies [17,20]. Hybrid CNN–LSTM pipelines first extract spatial features via convolution and then process temporal dynamics through recurrent layers. These approaches have been effective in classifying breathing cycles, detecting crackles, and grading murmurs over time, especially when datasets are limited or noisy.
Transformer models and attention mechanisms
More recently, transformer-based architectures incorporating self-attention have been applied to auscultation sounds, inspired by their success in speech processing and ECG analysis [11,27]. Attention layers can identify salient acoustic segments, enhance interpretability, and reduce reliance on predefined segmentation. While still emerging, transformer models show promise for multi-class classification and cross-domain generalization, particularly when combined with spectrogram encodings or raw waveform embeddings.
End-to-end vs. feature-based approaches
Two paradigms have emerged. In feature-based models, preprocessing and spectral features feed classical classifiers (e.g., SVMs, random forests). In end-to-end approaches, raw audio or minimally processed signals flow directly into neural networks for joint feature learning and classification. End-to-end models often outperform traditional pipelines when sufficiently large, diverse, and annotated training datasets are available [11,18,20,27].
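The feature-based paradigm can be sketched end to end in a few lines. The two descriptors and the threshold rule below are deliberately simple stand-ins for the richer feature sets (MFCCs, wavelets) and classifiers (SVMs, random forests) cited above; the cutoff value is an assumption for illustration only:

```python
import numpy as np

# Sketch of the feature-based paradigm: hand-crafted descriptors first,
# then a classifier. The features and threshold rule are illustrative
# stand-ins for real feature sets and trained classifiers.
def extract_features(signal: np.ndarray, fs: float) -> np.ndarray:
    rms = np.sqrt(np.mean(signal ** 2))                      # overall energy
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1 / fs)
    centroid = np.sum(freqs * spectrum) / np.sum(spectrum)   # spectral centroid
    return np.array([rms, centroid])

# Trivial rule standing in for an SVM / random forest: flag recordings
# whose spectral centroid exceeds an assumed cutoff.
def classify(features: np.ndarray, centroid_cutoff: float = 400.0) -> str:
    return "high-frequency event" if features[1] > centroid_cutoff else "normal"

fs = 4_000
t = np.arange(0, 1.0, 1 / fs)
low = np.sin(2 * np.pi * 100 * t)     # low-pitched sound
high = np.sin(2 * np.pi * 900 * t)    # high-pitched, wheeze-like sound
print(classify(extract_features(low, fs)))    # normal
print(classify(extract_features(high, fs)))   # high-frequency event
```

An end-to-end model would instead consume `low` and `high` directly, learning both the features and the decision boundary jointly, which is why it benefits more from large, diverse training sets.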
Explainability and interpretability
As AI integration progresses toward regulatory acceptance, model interpretability has become a priority. Techniques such as Grad-CAM, attention weight visualization, and saliency mapping are increasingly used to highlight acoustic regions driving predictions. These tools aim to foster clinician trust and bridge the gap between black-box outputs and bedside decision-making [11,27].
Overall, the convergence of advanced signal processing, deep feature learning and transformer-based architectures is reshaping digital auscultation. However, performance remains sensitive to dataset quality, recording hardware, and annotation standards [11,14], underscoring the need for harmonized protocols and multi-device validation.
| Method / Model | Input / Features | Clinical application | Dataset / Study | Performance | Reference (PubMed) |
|---|---|---|---|---|---|
| Classical ML (SVM, Random Forest, Gradient Boosting) | Handcrafted temporal and spectral features (e.g., wavelet coefficients, peak frequencies) | Heart murmur detection | Wavelet + ANN, 2005 | Sensitivity 64.7%, Specificity 70.5% | Andrisevic N, et al. Detection of heart murmurs using wavelet analysis and artificial neural networks. J Biomech Eng. 2005 Nov;127(6):899-904. doi: 10.1115/1.2049327. PMID: 16438225. |
| CNN | Spectrograms or scalograms | Heart murmur, lung sound classification | 34 h of heart sound recordings | Sensitivity 76.3%, Specificity 91.4% | Chorba JS, et al. Deep Learning Algorithm for Automated Cardiac Murmur Detection via a Digital Stethoscope Platform. J Am Heart Assoc. 2021 May 4;10(9):e019905. doi: 10.1161/JAHA.120.019905. Epub 2021 Apr 26. PMID: 33899504; PMCID: PMC8200722. |
| CRNN (CNN + RNN) | Spectrograms + temporal sequences | Pediatric heart murmur detection | Pediatric cardiac screening dataset | Accuracy 90.0%, Sensitivity 88.8%, Specificity 91.2% | Hsieh YT, et al. Development and validation of an integrated residual-recurrent neural network model for automated heart murmur detection in pediatric populations. Sci Rep. 2025 May 31;15(1):19155. doi: 10.1038/s41598-025-04746-2. PMID: 40450176; PMCID: PMC12126548. |
| Transformer-based | Raw audio / time series | Still's murmur detection | Pediatric auscultation | Sensitivity 90.7-100%, Specificity 75-98.2%, Accuracy 91.3-98.5% | Arjoune Y, et al. Advancing Point-of-Care Still's Murmur Identification: Evaluating the Efficacy of ConvNets and Transformers Using the StethAid Multicenter Heart Sound Database. IEEE Trans Biomed Eng. 2025 Sep 4;PP. doi: 10.1109/TBME.2025.3606341. Epub ahead of print. PMID: 40907040. |
| Hybrid / multimodal (audio + ECG / metadata) | Audio signals + ECG + patient metadata | Still's murmur & wheeze detection | StethAid platform dataset | Murmur: Sens 91.9%, Spec 92.6%; Wheeze: Sens 83.7%, Spec 84.4% | Arjoune Y, et al. StethAid: A Digital Auscultation Platform for Pediatrics. Sensors (Basel). 2023 Jun 20;23(12):5750. doi: 10.3390/s23125750. PMID: 37420914; PMCID: PMC10304273. |
| Self-supervised / contrastive learning | Unlabeled audio pretraining + fine-tuning | Heart / lung sound classification | Large unlabeled auscultation repositories | Improved generalizability; performance gains over supervised-only models | Various studies, ongoing |
Datasets and annotation practices
The development of reliable AI models for digital auscultation is critically dependent on the availability, size, and quality of annotated datasets. Unlike imaging or ECG repositories, cardiopulmonary sound databases remain fragmented, heterogeneous, and often poorly standardized, limiting algorithm generalizability and clinical translation [11,14].
Sources of auscultatory datasets
Existing datasets originate from a mix of clinical trials, academic repositories, device manufacturers, and telemedicine platforms. Early collections focused mainly on heart sounds recorded under controlled conditions using electronic stethoscopes or contact microphones [18,19]. More recent initiatives have incorporated lung sound recordings, often collected in respiratory clinics or pediatric populations [15,20,31]. Low-resource studies, including peri-urban and telehealth implementations, have added smartphone-derived data with greater environmental variability [1-34].
However, few repositories provide balanced representation across age groups, comorbidities, and recording devices. The heterogeneity of sampling rates, sensor types, and recording locations complicates cross-study benchmarking and multi-device deployment.
Annotation modalities and expert labeling
Annotation typically involves manual labeling by clinicians (e.g., cardiologists, pulmonologists) or trained technicians, sometimes supported by consensus panels [1,8]. Labels may include:
− Cardiac phases (S1, S2, systole, diastole);
− Specific murmurs (systolic, diastolic, continuous);
− Grading of murmur intensity;
− Adventitious lung sounds (crackles, wheezes, rhonchi);
− Diagnostic endpoints (e.g., aortic stenosis, mitral regurgitation, ILD, COVID-19).
Multiple studies report substantial inter-observer variability in auscultation labels, even among experienced clinicians [3,4,15]. This variability directly affects model training and performance. Annotation protocols differ in granularity, ranging from binary normal/abnormal labels to sample-level time-stamped markings for segmentation tasks [15,18,20].
Standardization efforts
Attempts to establish consensus criteria for respiratory sound labeling emerged as early as the ERS task force on computerized respiratory sound analysis [29-31]. Similar initiatives for heart sound annotation remain limited. Some investigators have adopted standard murmur classifications from cardiology guidelines or integrated echocardiographic endpoints [18,19,38], but labeling consistency varies widely.
Technical guidelines promoting uniform sampling frequencies (8 kHz or higher), consistent chest placement, and documentation of device specifications are increasingly emphasized in digital auscultation research [14,33]. Yet, interoperability between datasets is still rare.
Automated and semi-automated annotation tools
To address the labor intensity of manual labeling, semi-automated tools using signal energy, envelope detection, or unsupervised clustering have been explored. Deep learning models themselves are now leveraged for pseudo-labeling and segmentation to accelerate annotation pipelines [11,27]. However, expert validation remains necessary to correct misclassifications and reduce bias propagation.
Multichannel annotation workflows combining digital stethoscope, ECG, PPG, or lung ultrasound signals are emerging to improve cardiac cycle alignment and respiratory phase labeling [17,20]. These multimodal approaches enhance labeling precision but require synchronized acquisition systems.
Dataset size, balance, and bias
The size of publicly or commercially available auscultation datasets remains modest compared to other clinical AI domains. Many published models are trained on fewer than a few thousand recordings, often dominated by specific diseases or demographics [18,19,27]. Class imbalance is common, with normal sounds and mild pathologies underrepresented relative to advanced disease.
Biases related to age, sex, body habitus, and device type can significantly alter auscultatory signatures, limiting generalizability when unaddressed [14,16,34]. Few studies implement stratified splits or cross-hospital validation to mitigate these effects.
Data sharing and privacy constraints
Regulatory and ethical considerations restrict open sharing of clinical audio data. Concerns around patient re-identification, consent, and proprietary device datasets hamper the creation of large public libraries [5,37]. Where sharing occurs, metadata (e.g., diagnosis, comorbidities, recording location) is often incomplete, reducing downstream utility.
Emerging privacy-preserving techniques, such as federated learning and anonymized audio compression, are being considered to enable multi-site collaborations without direct data pooling [11,27]. Nonetheless, practical adoption remains limited.
Clinical validation and real-world performance
The translation of AI-augmented digital stethoscopes from proof-of-concept studies to clinical use depends on rigorous validation in both controlled and real-world settings. Reported performance varies considerably depending on study design, population characteristics, comparator standards, and device heterogeneity (Table 2).
Diagnostic accuracy in cardiac applications
Multiple validation studies have evaluated AI-assisted murmur detection and valvular heart disease screening:
− Chorba et al. developed a deep learning algorithm trained on digital stethoscope recordings and reported a murmur detection sensitivity of 76.3% and specificity of 91.4%. When excluding very soft murmurs (grade 1), sensitivity increased to 90.0% [18]. The system achieved high accuracy for moderate-to-severe aortic stenosis (sensitivity 93.2%, specificity 86.0%) and mitral regurgitation (sensitivity 66.2%, specificity 94.6%).
− Ribeiro et al. demonstrated the application of convolutional neural networks to classify valvular heart diseases using heart sound recordings, marking an important step toward automated screening [19].
− Comparative clinical trials found that digital stethoscopes with filtering and amplification could approach or exceed the diagnostic yield of acoustic stethoscopes for valvular lesions [22,38].
− A systematic review by Davidsen et al. highlighted persistent variability in clinicians’ auscultation performance and emphasized the potential of AI to improve the detection of valvular disorders [4].
Several studies noted that AI-assisted auscultation may outperform generalist physicians or trainees in consistency and accuracy, especially in noisy environments or in early-stage disease [21,43].
Performance in pulmonary diagnostics
In pulmonary applications, deep learning applied to digital lung sounds has demonstrated clinically relevant performance:
− Lin et al. showed that spectrogram-enhanced digital auscultation improved concordance between clinicians for detecting fine crackles in interstitial lung disease [15].
− Deep learning approaches have been validated for the detection of adventitious respiratory sounds, with applications in asthma, COPD, COVID-19, and pulmonary fibrosis [20,31].
− Nguyen et al. reported agreement between remote and in-person auscultation for crackles and wheezes in COPD patients, supporting tele-auscultation reliability [44].
These findings demonstrate the feasibility of integrating AI-based lung sound analysis into both acute and chronic respiratory care.
Validation in pediatric and obstetric populations
Clinical validation also extends to vulnerable patient groups:
− Kumar et al. conducted a meta-analysis showing that digital stethoscopes improved the detection of murmurs and wheezes in children compared to standard auscultation [23].
− Adedinsewo et al. demonstrated that AI-guided auscultation significantly increased detection of peripartum cardiomyopathy, nearly doubling the identification of left ventricular dysfunction compared to standard care [34,37].
− Tele-auscultation studies in pediatrics showed strong agreement with in-person assessments, reinforcing their role in remote and primary care [35].
Real-world deployment and telemedicine
The integration of AI-enabled stethoscopes into telehealth workflows has been evaluated in both high- and low-resource settings:
− Patel and Sharma reported improved diagnostic confidence and consultation quality when digital stethoscopes were used in remote cardiopulmonary assessments [34].
− Saraya et al. compared modern digital stethoscopes and demonstrated high-quality auscultation even during remote monitoring, with minimal signal degradation across devices [40].
− Umeh et al. noted that home telemonitoring incorporating digital auscultation could reduce hospitalization risk and improve early detection of decompensation in heart failure [33].
Benchmarking against clinical standards
Echocardiography, CT, and pulmonary function testing remain the diagnostic gold standards against which AI-augmented auscultation is typically compared. While AI models show encouraging sensitivity and specificity, several limitations persist:
− Performance varies significantly based on training data diversity, noise environments, and patient comorbidities [11,40].
− Many validation studies remain single-center, retrospective, or limited in sample size, raising concerns about generalizability [4,36].
− Regulatory submissions (e.g., FDA 510(k)) increasingly require multicenter, prospective trials with device–algorithm pairing [1,8].
Generalizability and external validity
Heterogeneity in devices, sensors, sampling rates, and environmental conditions affects algorithm transferability [11,14]. Models trained on one digital stethoscope may show performance degradation when applied to another platform without calibration. Cross-device harmonization and standardized recording protocols are therefore essential before large-scale deployment.
| Clinical Application | AI Method / Model | Study / Dataset | Key Findings / Performance | Reference (PubMed) |
|---|---|---|---|---|
| Aortic Stenosis (AS) | AI-based auscultation with infrasound | Validation group, moderate/severe AS | Sensitivity: 86%, Specificity: 100%; sensitivity by severity: mild 55%, moderate 76%, severe 93% | Ghanayim T, et al. Artificial Intelligence-Based Stethoscope for the Diagnosis of Aortic Stenosis. Am J Med. 2022 Sep;135(9):1124-1133. doi: 10.1016/j.amjmed.2022.04.032. Epub 2022 May 28. PMID: 35640698. |
| Mitral Regurgitation (MR) | Deep neural network | Cardiac auscultation dataset | Rapid assessment of MR severity; cost-effective for large-scale screening | Zhang L, et al. Developing an AI-assisted digital auscultation tool for automatic assessment of the severity of mitral regurgitation: protocol for a cross-sectional, non-interventional study. BMJ Open. 2024 Mar 29;14(3):e074288. doi: 10.1136/bmjopen-2023-074288. PMID: 38553085; PMCID: PMC10982737. |
| Heart Failure (HFrEF) | Single-lead ECG + AI | Digital stethoscope ECG recordings | AUROC: 0.85, Sensitivity: 84.8%, Specificity: 69.5%; enables point-of-care screening | Bachtiger P, et al. Point-of-care screening for heart failure with reduced ejection fraction using artificial intelligence during ECG-enabled stethoscope examination in London, UK: a prospective, observational, multicentre study. Lancet Digit Health. 2022 Feb;4(2):e117-e125. doi: 10.1016/S2589-7500(21)00256-9. Epub 2022 Jan 5. PMID: 34998740; PMCID: PMC8789562. |
| Lung Sound Analysis | Deep learning (CNN / RNN) | Respiratory sound recordings | Enables storage, sharing, and consultation; improves educational and diagnostic workflow | Huang DM, et al. Deep learning-based lung sound analysis for intelligent stethoscope. Mil Med Res. 2023 Sep 26;10(1):44. doi: 10.1186/s40779-023-00479-3. PMID: 37749643; PMCID: PMC10521503. |
| Pulmonary Disease Diagnostics | AI models for sound classification | Lung sounds in asthma, COVID-19, ILD | Non-invasive diagnosis; supports early detection and monitoring | Lella KK, et al. Artificial intelligence-based framework to identify the abnormalities in the COVID-19 disease and other common respiratory diseases from digital stethoscope data using deep CNN. Health Inf Sci Syst. 2024 Mar 9;12(1):22. doi: 10.1007/s13755-024-00283-w. PMID: 38469455; PMCID: PMC10924857. |
| Telemedicine / Remote Care | AI-assisted auscultation | Remote patient monitoring datasets | Enables remote diagnosis and monitoring in underserved / rural areas | Huang DM, et al. Deep learning-based lung sound analysis for intelligent stethoscope. Mil Med Res. 2023 Sep 26;10(1):44. doi: 10.1186/s40779-023-00479-3. PMID: 37749643; PMCID: PMC10521503. |
| Primary Care Screening | AI-guided auscultation | Pregnancy-related cardiomyopathy dataset | Improved detection of cardiomyopathy in primary care; enhances workflow efficiency | Zhang M, et al. A Low-Cost AI-Empowered Stethoscope and a Lightweight Model for Detecting Cardiac and Respiratory Diseases from Lung and Heart Auscultation Sounds. Sensors (Basel). 2023 Feb 26;23(5):2591. doi: 10.3390/s23052591. PMID: 36904794; PMCID: PMC10007545. |
Explainability and trust
The adoption of AI-augmented digital stethoscopes in clinical practice critically depends on model transparency and interpretability. Clinicians are more likely to rely on AI systems when the decision-making process is understandable. Techniques such as saliency maps, attention mechanisms, and layer-wise relevance propagation have been applied to highlight which segments of heart or lung sound recordings most influence AI predictions, thereby improving interpretability and fostering trust in clinical settings [11,17,27].
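A minimal illustration of one interpretability approach is occlusion sensitivity, a simple cousin of the saliency methods cited above: each time segment of a recording is silenced in turn, and segments whose removal most reduces the model's score are flagged as influential. In the sketch below, `toy_model` is a deliberately trivial stand-in (mean absolute amplitude), not a real murmur detector:

```python
import numpy as np


def occlusion_saliency(signal, model, n_segments=10):
    """Silence each time segment in turn; the drop in the model's
    score measures how much that segment drives the prediction."""
    base = model(signal)
    seg_len = len(signal) // n_segments
    saliency = np.zeros(n_segments)
    for i in range(n_segments):
        occluded = signal.copy()
        occluded[i * seg_len:(i + 1) * seg_len] = 0.0
        saliency[i] = base - model(occluded)
    return saliency


def toy_model(x):
    # Stand-in "murmur detector": mean absolute amplitude as the score.
    return float(np.mean(np.abs(x)))


# Synthetic 1 s recording at 4 kHz with a loud burst in segment 7 of 10.
rng = np.random.default_rng(0)
sig = 0.01 * rng.standard_normal(4000)
sig[2800:3200] += np.sin(2 * np.pi * 200 * np.arange(400) / 4000)

sal = occlusion_saliency(sig, toy_model)
print(int(np.argmax(sal)))  # → 7: the burst segment dominates the saliency profile
```

The same occlude-and-rescore loop applies unchanged to a deep network operating on spectrograms, which is why it is a common first explainability baseline.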
Standardized evaluation frameworks and benchmark datasets, exemplified by PhysioNet/CinC challenges, provide reproducible metrics for assessing algorithm reliability and facilitate comparison across devices and clinical contexts [11,27]. Explainable AI not only increases clinician confidence but also enables the identification of model failure modes, supporting iterative refinement and safe integration into high-stakes scenarios, such as early detection of valvular heart disease or subtle respiratory abnormalities [4,18,31].
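The core metrics behind such benchmark evaluations (sensitivity, specificity, AUROC, as reported in the studies tabulated above) can be computed directly from labels and model scores. The sketch below is a self-contained illustration on invented data; the AUROC uses the Mann-Whitney interpretation (probability that a random positive case outscores a random negative one) rather than trapezoidal ROC integration:

```python
import numpy as np


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true, scores):
    """AUROC via the Mann-Whitney U statistic (ties count half)."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


# Hypothetical screening results: 1 = disease present.
labels = np.array([1, 1, 1, 0, 0, 0, 0, 1])
scores = np.array([0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7])
preds = (scores >= 0.5).astype(int)

sens, spec = sensitivity_specificity(labels, preds)
print(round(float(sens), 2), round(float(spec), 2),
      round(float(auroc(labels, scores)), 4))  # → 0.75 0.75 0.9375
```

Reporting all three quantities together, as the benchmark challenges require, exposes the threshold dependence that a single accuracy figure hides.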
Ethical and regulatory challenges
Integrating AI with digital stethoscopes raises significant ethical and regulatory considerations. Data privacy is paramount, particularly when sensitive physiological recordings are transmitted via wireless or cloud-based platforms; compliance with data protection regulations must be ensured [32,34].
Algorithmic bias is another critical issue. AI models trained on non-representative datasets risk perpetuating healthcare disparities, emphasizing the need for diverse and inclusive training cohorts to ensure equitable diagnostic performance across populations [21,23].
Regulatory approval processes for AI-enabled medical devices are rigorous and vary by jurisdiction. Devices must demonstrate safety, efficacy, and reproducibility through robust clinical validation studies before deployment in practice [18]. Transparent reporting, standardized benchmarks, and continuous post-market monitoring are essential to meet these requirements and maintain clinician and patient trust.
Future directions
The future of AI-augmented digital auscultation is poised for significant advancements across multiple domains.
Multimodal integration: Combining auscultatory audio signals with complementary diagnostic data—such as imaging, electrocardiograms, and electronic health records—may enable a more holistic assessment of patient health, enhancing diagnostic precision and supporting personalized care [17,20,27].
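One simple way such multimodal combination is realized in practice is late fusion: each modality's model produces an independent risk score, and the scores are merged into one. The sketch below is purely illustrative (the scores and weights are invented, and real systems typically learn the fusion rather than fixing weights by hand):

```python
import numpy as np


def late_fusion(modality_scores, weights):
    """Weighted late fusion: each modality's model outputs a risk
    score in [0, 1]; the combined score is their weighted mean."""
    s = np.asarray(modality_scores, dtype=float)
    w = np.asarray(weights, dtype=float)
    return float(np.dot(s, w) / w.sum())


# Hypothetical outputs: auscultation model, ECG model, EHR risk model.
combined = late_fusion([0.8, 0.6, 0.3], weights=[0.5, 0.3, 0.2])
print(round(combined, 2))  # → 0.64
```

Late fusion keeps each modality's pipeline independent, which simplifies validation and lets a device degrade gracefully when one data stream is unavailable.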
Real-time analysis: Improvements in computational efficiency, embedded AI, and cloud-based processing could allow for real-time interpretation of heart and lung sounds at the point of care. Such capabilities would facilitate immediate clinical decision-making, rapid triage, and dynamic monitoring of disease progression [11,24].
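A minimal sketch of such real-time, windowed processing is shown below, assuming a hypothetical `StreamingAnalyzer` that buffers incoming audio chunks and scores each full analysis window; mean absolute amplitude again stands in for an actual on-device classifier:

```python
import numpy as np


class StreamingAnalyzer:
    """Slide a fixed window over incoming audio chunks and run a
    lightweight scorer on each full window, as an embedded device might."""

    def __init__(self, scorer, window=2000, hop=1000):
        self.scorer, self.window, self.hop = scorer, window, hop
        self.buffer = np.empty(0)
        self.scores = []

    def push(self, chunk):
        # Append the new chunk, then emit a score for every complete window.
        self.buffer = np.concatenate([self.buffer, np.asarray(chunk, float)])
        while len(self.buffer) >= self.window:
            self.scores.append(self.scorer(self.buffer[:self.window]))
            self.buffer = self.buffer[self.hop:]


def toy_scorer(w):
    return float(np.mean(np.abs(w)))  # placeholder for an on-device model


analyzer = StreamingAnalyzer(toy_scorer)
# Feed 8000 samples in 16 chunks of 500: quiet first half, loud second half.
for start in range(0, 8000, 500):
    analyzer.push(np.ones(500) * (start // 4000))
print(len(analyzer.scores))  # → 7 windows scored (hop of 1000 samples)
```

Because scoring happens as the buffer fills, latency is bounded by one hop interval rather than the full recording length, which is what makes bedside, chunk-by-chunk interpretation feasible.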
Global health applications: AI-powered digital stethoscopes hold particular promise for addressing healthcare disparities in low-resource and remote settings. By providing accessible, cost-effective, and standardized diagnostic tools, these devices could improve early detection of cardiovascular and pulmonary conditions, reduce reliance on scarce specialist resources, and support telemedicine initiatives [32,34,37].
Continued research should focus on robust multicenter clinical validation, device standardization, and integration of multimodal data streams to fully realize the transformative potential of AI-assisted auscultation across diverse healthcare contexts.
Conclusion

Digital stethoscopes integrated with Artificial Intelligence (AI) are reshaping clinical diagnostics by combining high-fidelity acoustic capture with advanced machine learning and deep learning algorithms [8,25]. These devices augment traditional auscultation, providing objective, reproducible, and accurate detection of cardiovascular and pulmonary abnormalities.
In cardiology, AI-assisted stethoscopes have demonstrated robust performance in detecting valvular heart diseases, including aortic stenosis and mitral regurgitation, as well as functional disorders such as heart failure with reduced ejection fraction [18,38]. Studies have shown that AI algorithms can achieve sensitivity and specificity comparable to—or exceeding—those of generalist clinicians, particularly in noisy environments or in early-stage disease [4,43].
In pulmonary medicine, AI-enabled auscultation supports the identification of wheezes, crackles, and other adventitious sounds, facilitating early diagnosis of conditions such as asthma, chronic obstructive pulmonary disease, interstitial lung disease, and COVID-19-related complications [15,41,44]. The use of spectrograms and time-frequency representations enhances the interpretability and diagnostic concordance of lung sound analysis [15,20].
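As an illustration of the time-frequency representations mentioned above, the sketch below computes a magnitude spectrogram with a Hann-windowed short-time FFT using only NumPy (frame and hop sizes are illustrative defaults, not values taken from the cited studies), and checks that a synthetic 400 Hz "wheeze-like" tone yields a spectral peak near the expected frequency:

```python
import numpy as np


def spectrogram(signal, frame=256, hop=128):
    """Magnitude spectrogram via a Hann-windowed short-time FFT,
    the time-frequency representation commonly fed to lung-sound CNNs."""
    window = np.hanning(frame)
    n_frames = 1 + (len(signal) - frame) // hop
    frames = np.stack([signal[i * hop:i * hop + frame] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq_bins, time)


# Synthetic 1 s "wheeze-like" tone at 400 Hz, sampled at 4 kHz.
rate = 4000
t = np.arange(rate) / rate
spec = spectrogram(np.sin(2 * np.pi * 400 * t))
peak_bin = int(np.argmax(spec.mean(axis=1)))
bin_width = rate / 256  # ~15.6 Hz per frequency bin
print(spec.shape, round(peak_bin * bin_width))  # peak lands near 400 Hz
```

Rendering sound as a 2-D frequency-versus-time image is what lets standard convolutional architectures, originally designed for vision, be reused for auscultation.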
Beyond traditional healthcare settings, AI-enhanced digital stethoscopes enable telemedicine and remote monitoring, allowing real-time auscultation in underserved regions and improving primary care screening for conditions such as peripartum cardiomyopathy and pediatric heart murmurs [32,34].
Despite these advances, challenges remain, including variability in data quality, device heterogeneity, model interpretability, regulatory approval, and ethical concerns such as patient privacy and potential algorithmic bias [14,27,40]. Addressing these issues requires multidisciplinary collaboration, standardized and representative training datasets, and rigorous clinical validation.
Future directions for AI-augmented auscultation include multimodal integration with electrocardiograms, imaging, and electronic health records to provide comprehensive patient assessments, real-time on-device analysis to support immediate clinical decision-making, and expanded global health applications to bridge gaps in low-resource settings [20,27]. With continued innovation, validation, and regulatory oversight, AI-assisted digital auscultation has the potential to transform modern healthcare delivery, enhancing diagnostic accuracy, accessibility, and patient outcomes [8,32,34,37].
Conflict of interest: The author declares that he has no conflicts of interest.
Acknowledgements: The author wishes to acknowledge the contributions of all the researchers and clinicians who participated in academic collaborations with Alcatel as part of institutionally funded projects supported by the Région Grand Est and the French National Research Agency (ANR) “Technologie” program. The author also pays tribute to Mr. Raymond Gass, whose partnership and scientific contributions were instrumental in advancing several of the studies cited in this work.