Abstract
Objectives
A first step in studying diagnostic delays is to select the signs, symptoms and alternative diseases that represent missed diagnostic opportunities. Because this step is labor intensive, requiring exhaustive literature reviews, we developed machine learning approaches to mine administrative data sources and recommend conditions for consideration. We propose a methodological approach to find diagnostic codes that exhibit known patterns of diagnostic delays and apply it to tuberculosis and appendicitis.
Methods
We used the IBM MarketScan Research Databases and considered the initial symptoms of cough before tuberculosis and abdominal pain before appendicitis. We analyzed diagnosis codes recorded during healthcare visits before the index diagnosis and used k-means clustering to recommend conditions that exhibit trends similar to those of the initial symptoms provided. We evaluated the clinical plausibility of the recommended conditions and the corresponding number of possible diagnostic delays identified using these conditions.
Results
For both diseases of interest, the clustering approach suggested a large number of clinically-plausible conditions to consider (e.g., fever, hemoptysis, and pneumonia before tuberculosis). The recommended conditions had a high degree of precision in terms of clinical plausibility: >70% for tuberculosis and >90% for appendicitis. Including these additional clinically-plausible conditions resulted in more than twice the number of possible diagnostic delays identified.
Conclusions
Our approach can mine administrative datasets to detect patterns of diagnostic delay and help investigators avoid under-identifying potential missed diagnostic opportunities. In addition, the methods we describe can be used to discover less-common presentations of diseases that are frequently misdiagnosed.
Introduction
Diagnostic errors are an important cause of avoidable harms and increased healthcare costs [1, 2]. Yet, diagnostic errors represent a challenging area of research that relies on a range of methods (e.g., chart reviews or surveys) and measures (e.g., mortality, costs, or malpractice-claims) [3], [4], [5]. Delays in diagnosing a disease represent an important type of diagnostic error [6]. To study diagnostic delays, a growing body of research has utilized large administrative datasets, such as insurance claims or hospital discharge records generated for institutional or billing purposes. Such data have advantages for studying the diagnostic process. First, they tend to be less costly and easier to analyze compared to surveys or chart reviews that require additional collection and processing. Second, they often represent heterogeneous patient populations covering wide geographic areas. Third, these data often contain longitudinal information spanning multiple institutions and settings. Thus, patients who receive fragmented care can be studied across disconnected health systems.
Studying diagnostic delays, whether through chart review or using administrative records, requires researchers to define the criteria for identifying delays. Specifically, researchers must specify which types of antecedent healthcare visits, defined as visits that precede the index disease diagnosis (i.e., the initial diagnosis of the underlying disease), represent a missed diagnostic opportunity. For example, visits where a patient presented with fever or cough in the weeks prior to a tuberculosis diagnosis may be considered as antecedent conditions signaling a potential missed diagnostic opportunity. Numerous studies have used this approach to study diagnostic delays with different administrative data sources [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24]. However, a challenge for defining antecedent conditions associated with a disease of interest is identifying which symptoms to evaluate. Including too few symptoms may underestimate the number of missed diagnostic opportunities. Failing to include rare symptoms may also lead to systematic biases in research designs that exclude patient populations with atypical disease presentations. Such patients might also be the most susceptible to diagnostic delays [25].
In addition to considering symptoms, patients may be mistakenly diagnosed with diseases that share similar symptoms to the index disease (e.g., pneumonia instead of tuberculosis) [9, 10]; such visits also represent missed diagnostic opportunities and should be considered in the evaluation process. Furthermore, some of the most important missed opportunities to consider may occur when patients present with less common symptoms of disease. Alternatively, missed opportunities may occur because of atypical disease manifestations. If such visits are missed by clinicians, researchers may also fail to consider such antecedent visits as a potential missed opportunity. Because this selection process is intensive (e.g., exhaustive literature reviews) and may miss atypical manifestations, automated approaches may help to identify potential missed opportunities.
The goal of this paper is to demonstrate an exploratory machine-learning approach that can be used to aid the process of selecting the criteria (symptoms and symptomatically similar diseases) used to identify potential missed opportunities in subsequent study designs. Machine learning techniques are used in many applications as “recommender systems” to aid human selection; examples include online shopping and streaming services [26]. In the diagnostic delay context, pattern recognition techniques may detect visits appearing to match common trends of known diagnostic opportunities and suggest similar antecedent conditions for consideration. In this paper we present a methodological framework for using administrative data and unsupervised clustering approaches (i.e., machine learning techniques where patterns are uncovered from data without a defined or labelled outcome) to identify sets of diagnosis codes that may capture potential diagnostic opportunities. We apply our approach to the diseases of tuberculosis and appendicitis to demonstrate how antecedent signs and symptoms of these diseases can be discovered. We then evaluate the antecedent conditions recommended in terms of their clinical plausibility.
Methods
Temporal visit trends prior to diagnosis
A common observation in studies of diagnostic opportunities is an increasing trend of visits with symptoms of a given disease (e.g., chest pain before acute myocardial infarction [AMI]) prior to the index diagnosis/visit. This increasing trend has been found for multiple diseases including stroke [8, 27], AMI [7, 22, 27], tuberculosis [9, 10], endemic fungal infections [16, 18, 19], and many others [11, 17, 20, 21]. Figure 1 depicts an example of this trend for tuberculosis, where visits for cough and fever increase prior to an index diagnosis. When nearing the index date of a given disease diagnosis, one expects to observe more healthcare encounters where patients present with signs and symptoms of the disease. Such symptomatic visits ultimately lead to diagnosis, and many of these encounters represent diagnostic opportunities.

Figure 1: Symptomatic visits for cough and fever prior to tuberculosis diagnosis. The number of patients with a healthcare visit for a given symptom is presented for each day prior to the index tuberculosis diagnosis. Similar increasing patterns of symptomatic visits have been found across a wide range of diseases.
The goal of our methodological approach is to identify the set of antecedent conditions suggesting a given disease is present (i.e., diagnosis codes for symptoms or other diseases with similar symptoms) and where encounters for such conditions may represent diagnostic opportunities. Thus, our objective is to find conditions that satisfy the increasing trends depicted in Figure 1. We do so in a clinically-guided fashion where a single focal condition known to be associated with the disease of interest (e.g., cough prior to tuberculosis) is selected by an expert, then other conditions exhibiting similar trends recommended by the clustering approach are evaluated by the expert.
Data processing and curve fitting
For this study, we utilized data from the IBM MarketScan Research Databases from 2001 to 2015. These data represent longitudinal health insurance claims in the United States covering inpatient and outpatient settings. We identified all patients diagnosed with the diseases of interest below. We computed the number of healthcare visits every day for each of the top 500 most common ICD-9-CM diagnosis codes during the year before the index diagnosis. To remove unrelated observation effects and create comparable model fits across codes we applied two data transformations. First, we converted daily counts of healthcare visits to relative frequencies by dividing the daily counts by the number of patients with a healthcare visit each day prior to diagnosis. Supplemental Figure 1 depicts how this can eliminate unrelated conditions appearing to increase due to patient observation. Second, we normalized the relative frequencies by subtracting the minimum and dividing by the range for each code.
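The two transformations described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code (their analysis used R); the function names and the toy counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def relative_frequency(code_counts, visit_counts):
    """Divide the daily count for one code by the number of patients
    with any healthcare visit on the same day."""
    return [c / v if v > 0 else 0.0 for c, v in zip(code_counts, visit_counts)]


def min_max_normalize(values):
    """Rescale a series to [0, 1]: subtract the minimum, divide by the range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no range; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical toy data: five days before diagnosis, oldest day first.
cough_counts = [2, 3, 3, 6, 12]                  # visits with the code of interest
patients_with_visit = [100, 120, 100, 150, 200]  # patients with any visit that day

rel = relative_frequency(cough_counts, patients_with_visit)
norm = min_max_normalize(rel)
```

Dividing by the daily number of patients with any visit is what keeps a code from appearing to rise simply because more patients are observed near the index date; the min-max step then puts every code on a common 0 to 1 scale so that fitted slopes are comparable across codes.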
Next, we estimated the temporal trends in the daily relative frequencies leading up to diagnosis using a piecewise linear regression model, where separate linear trends are estimated before and after a certain number of days prior to diagnosis (i.e., the change-point). We enforced a continuity requirement on the model so that the fitted trend is continuous at the change-point. We used the Akaike Information Criterion to select the optimal change-point. We applied this procedure to fit a separate model for each of the top ICD-9-CM codes. Each resulting model contains four parameters: a change-point, an intercept, and two slope parameters (before and after the change-point). These estimated parameters were used in the clustering process.
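One way to realize a continuous piecewise linear fit with an AIC-selected change-point is sketched below. It is an illustrative assumption, not the authors' implementation: the model is written as y = b0 + b1*t + b2*max(0, t - cp), which is continuous at cp by construction, candidate change-points are searched over the observed time points, and the Gaussian form of the AIC is used. Here t is assumed to increase toward the index date, and all function names are hypothetical.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def fit_given_change_point(t, y, cp):
    """Least-squares fit of y = b0 + b1*t + b2*max(0, t - cp) for a fixed cp."""
    rows = [[1.0, float(ti), max(0.0, ti - cp)] for ti in t]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    b0, b1, b2 = solve_linear_system(xtx, xty)
    rss = sum((yi - (b0 + b1 * r[1] + b2 * r[2])) ** 2 for r, yi in zip(rows, y))
    return b0, b1, b2, rss


def fit_change_point_model(t, y):
    """Try each interior time point as the change-point and keep the lowest AIC."""
    n = len(t)
    best = None
    # Skip the two earliest and two latest time points so both segments have data.
    for cp in sorted(set(t))[2:-2]:
        b0, b1, b2, rss = fit_given_change_point(t, y, cp)
        # Gaussian AIC with 4 parameters (intercept, two slopes, change-point).
        # The count is the same for every candidate, so this ranks by RSS.
        aic = n * math.log(max(rss, 1e-12) / n) + 2 * 4
        if best is None or aic < best["aic"]:
            best = {"change_point": cp, "intercept": b0,
                    "slope_before": b1, "slope_after": b1 + b2, "aic": aic}
    return best


# Hypothetical noiseless series: flat until t = 12, then rising by 0.05 per step.
t = list(range(20))
y = [0.1 + 0.05 * max(0, ti - 12) for ti in t]
fit = fit_change_point_model(t, y)
```

The four values in `fit` (change-point, intercept, slope before, slope after) correspond to the four parameters per code that the text describes as inputs to the clustering step.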
Guided unsupervised clustering
Unsupervised clustering is a category of machine learning techniques that can be used to identify natural clusters of data that share similar characteristics. A relatively common approach is the k-means algorithm. Figure 2 provides a visualization that demonstrates the theoretical basis for how k-means clustering is applied to visit trends prior to diagnosis.

Figure 2: Visual depiction of the theoretical reasoning behind the clustering algorithm. Panel A depicts the empirical pattern of healthcare visits for related symptoms before the index disease diagnosis (data pictured correspond to tuberculosis). There is an increase in symptomatic healthcare visits before diagnosis. The trend is estimated with two line segments: the first segment (flatter) captures the period where clinical disease is unlikely to be present; the second segment (steeper) captures symptoms of clinical disease and potential missed opportunities. Panel B depicts how the k-means clustering algorithm is applied to the trends for each potential antecedent condition (note, this is an oversimplified depiction where only the two slope parameters are used to identify k=3 clusters). The central plot depicts examples of clusters of conditions identified based on the slope parameters. The plots on either side of the clustering graph depict examples of trends that might fit the patterns of conditions shown in the corresponding cluster colors. The cluster containing the focal condition, such as cough (highlighted in red), has a slope near zero in the first period and a large positive slope in the period right before diagnosis. The other two clusters contain slopes that are near zero in both periods (blue) or have a lesser slope in the second period (green) compared to those in the focal cluster.
For each of the top ICD-9-CM codes under consideration, we applied the k-means clustering algorithm to the estimated parameters from the fitted models. Thus, we used the clustering algorithm to find clusters of ICD-9-CM codes that have trend parameters most similar to one another. We used the kmeans function from the stats package in the R programming language, with the default Hartigan and Wong algorithm [28]. We used 50 random sets of initial cluster centers (random starts) and a maximum of 100 iterations. After identifying clusters for a given value of k, we isolated the cluster containing the focal symptom of each disease, described below. Finally, we used both expert review and prior literature to determine which codes in the focal cluster are clinically plausible signs that the disease may be present.
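The focal-cluster step can be illustrated with a small self-contained sketch. Note the substitutions: it uses plain Lloyd's k-means rather than the Hartigan and Wong algorithm in R's kmeans, it clusters on only two slope parameters (as in the simplified Figure 2 depiction) rather than all four, and every feature value below is a made-up toy number, not a result. The codes `flat_*` and `mid_*` are placeholders.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_kmeans(points, k, rng, max_iter=100):
    """One run of Lloyd's k-means from k randomly sampled starting centers.
    Returns (labels, total within-cluster sum of squares)."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    wss = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, wss


def focal_cluster(codes, features, focal_code, k, n_starts=50, seed=0):
    """Keep the best of n_starts runs, then return the codes that share a
    cluster with the focal code."""
    rng = random.Random(seed)
    best_labels, best_wss = None, float("inf")
    for _ in range(n_starts):
        labels, wss = lloyd_kmeans(features, k, rng)
        if wss < best_wss:
            best_labels, best_wss = labels, wss
    focal_label = best_labels[codes.index(focal_code)]
    return [c for c, lab in zip(codes, best_labels) if lab == focal_label]


# Hypothetical (slope_before, slope_after) features for illustration only.
codes = ["786.2", "780.6", "786.30", "flat_1", "flat_2", "flat_3", "mid_1", "mid_2"]
features = [(0.00, 0.90), (0.01, 0.85), (0.00, 0.95),
            (0.00, 0.02), (0.01, 0.00), (-0.01, 0.01),
            (0.00, 0.40), (0.02, 0.45)]

result = focal_cluster(codes, features, focal_code="786.2", k=3)
```

In this toy setup the focal cluster contains cough and the two codes with a similarly flat-then-steep trend; in practice the returned list is what an expert reviewer would then screen for clinical plausibility.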
In order to provide a reproducible example, we have developed scripts for the R programming language that can be used to replicate all of the general approaches described in this analysis. This code can be found at https://github.com/aarmiller/diagnosis_cluster, along with synthetic data to demonstrate each stage of this analysis.
Example applications
To demonstrate the feasibility of our clustering approach, we selected two diseases with differing characteristics in trends of diagnostic opportunities: one where diagnostic delays occur over multiple weeks (tuberculosis) and the other where delays last multiple days (appendicitis). For each disease, we selected a single well-known and common symptom to initiate the search process and select the focal cluster. We validated each case study population by requiring one inpatient or >2 outpatient diagnoses to identify the index diagnosis.
Tuberculosis: Tuberculosis is a highly infectious communicable disease that is a major cause of morbidity and mortality, and is associated with over 1.4 million annual deaths worldwide [29]. Missed opportunities to diagnose tuberculosis are relatively common, often lasting up to a few months [9, 10, 30, 31], and have important public health implications [32]. Supplemental Table 1 provides the list of ICD-9-CM codes used to identify index cases of tuberculosis. The primary symptom associated with pulmonary tuberculosis selected as our focal condition is cough, ICD-9-CM code 786.2. Because the number of index cases of tuberculosis was relatively low and the delay duration can last multiple months, we aggregated daily counts of each ICD-9-CM code at a weekly level. We considered potential missed opportunities based on antecedent conditions within 90 days prior to the index diagnosis.
Appendicitis: Acute appendicitis is an inflammation of the appendix and one of the most common causes of emergency abdominal surgery [33, 34]. Most diagnostic delays occur within days, or at most a few weeks, of the index diagnosis [23, 35]. Delays in the diagnosis and treatment of appendicitis can be costly, potentially leading to perforated appendicitis [36, 37]. The primary symptom associated with appendicitis is abdominal pain, for which we used the ICD-9-CM code 789.00 (abdominal pain, unspecified site) as the focal condition. We consider potential missed opportunities based on antecedent conditions within 21 days prior to the index diagnosis.
Performance evaluation
To evaluate our approach, we compared different values for k from 2 to 25. For each of the resulting focal clusters we describe (1) the size of the cluster; (2) the number of conditions that were clinically plausible; and (3) the clinically-plausible precision, defined as the percent of conditions in a given cluster considered to be clinically plausible for diagnostic opportunities. Conditions were labelled as clinically plausible if they were either (1) a known sign or symptom of the disease or (2) a disease with similar symptoms (e.g., pneumonia instead of tuberculosis). To evaluate the stability of clusters for different values of k, we repeated our clustering approach 10,000 times for each k and report the 0.05 to 0.95 quantiles of the resulting cluster measures. In addition, we computed the number of potential missed opportunities that may be captured using the cluster-based list of antecedent conditions vs. the single focal symptom suggested by expert review (note, we refer to these as “potential” since the presence of symptoms during visits prior to diagnosis does not necessarily imply a missed opportunity).
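The cluster measures defined above are simple to compute once each focal cluster has been reviewed. The sketch below shows one possible formulation; the helper names, the linear-interpolation quantile rule, and the example codes are assumptions for illustration (the source does not state which quantile definition was used).

```python
def clinically_plausible_precision(cluster_codes, plausible_codes):
    """Percent of codes in a focal cluster labelled clinically plausible."""
    if not cluster_codes:
        return 0.0
    hits = sum(1 for code in cluster_codes if code in plausible_codes)
    return 100.0 * hits / len(cluster_codes)


def quantile(values, q):
    """Quantile by linear interpolation between order statistics (0 <= q <= 1)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


# Hypothetical focal cluster: three plausible codes and one unrelated placeholder.
cluster = ["786.2", "780.6", "486", "unrelated_code"]
plausible = {"786.2", "780.6", "486"}
precision = clinically_plausible_precision(cluster, plausible)

# Hypothetical precision values from five repeated clustering trials.
trial_precisions = [68.0, 69.0, 70.0, 71.0, 72.0]
interval = (quantile(trial_precisions, 0.05), quantile(trial_precisions, 0.95))
```

Repeating the clustering many times and summarizing each measure by its 0.05 and 0.95 quantiles, as in the text, gives a sense of how sensitive the focal cluster is to the random starting centers.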
Results
Example 1: Tuberculosis
We identified 19,423 patients with an index diagnosis of tuberculosis between 2001 and 2015. In total, these patients had 325,039 healthcare visit dates in the year prior to the index tuberculosis diagnosis. Figure 3 depicts the raw counts and relative frequency of visits for the focal symptom of cough prior to the index diagnosis. Figure 3 also depicts the fitted curve using the piecewise linear model, for which the objective of the clustering approach was to identify antecedent conditions with similar model parameters.

Figure 3: Visits for focal condition of cough prior to tuberculosis. The left figure depicts the raw counts. The right figure depicts counts converted to relative visit frequency along with the linear change-point model used to fit the trend and derive parameter estimates used for the cluster analysis.
When applying the clustering approach, it is necessary to select a number of clusters k from which to obtain recommendations from the focal cluster. Supplemental Figure 2 provides a summary of results in terms of the overall size, number of clinically-plausible conditions, and clinically-plausible precision of the resulting focal clusters, across different values of k and clustering trials. We selected k=11 as a type of kink-point in the tradeoff between cluster size and clinically-plausible precision. For k=11, the level of clinically-plausible precision achieves a near maximum at 70.8%, suggesting a potential reviewer would have few unrelated conditions to exclude. This focal cluster also results in a reasonably large number of clinically-plausible conditions (on average 34.1 conditions) recommended to a reviewer.
In total 88 conditions were identified as being clinically plausible across all clustering trials. The top 25 most frequently identified conditions are outlined in Table 1. This table also provides the frequency of each condition appearing in the focal cluster for k=11. As seen in Table 1, many of the commonly associated antecedent conditions, such as fever, hemoptysis, pneumonia or other pulmonary symptoms (pneumothorax, lung abscess, mass or neoplasm), were consistently identified across most focal clusters. Each of these top 25 conditions was selected in every focal cluster for k=11. Figure 4 depicts visit count trends for 9 of the top 25 antecedent conditions in the focal cluster.
Table 1: Top 25 clinically-plausible antecedent conditions for tuberculosis recommended by the “cough” focal cluster.
ICD-9-CM code | Percent of all focal clusters that contained code | Percent of focal clusters for k=11 that contained code | Description |
---|---|---|---|
786.2 | 100 | 100 | Cough |
780.6 | 100 | 100 | Fever and other physiologic disturbances of temperature regulation |
786.30 | 100 | 100 | Hemoptysis, unspecified |
482.9 | 99.96 | 100 | Bacterial pneumonia, unspecified |
423.9 | 99.92 | 100 | Unspecified disease of pericardium |
784.2 | 99.11 | 100 | Swelling, mass, or lump in head and neck |
785.6 | 99.11 | 100 | Enlargement of lymph nodes |
793.1 | 99.11 | 100 | Lung field |
786.09 | 99.11 | 100 | Other respiratory abnormalities |
486 | 99.11 | 100 | Pneumonia, organism unspecified |
786.3 | 99.11 | 100 | Hemoptysis |
793.11 | 99.09 | 100 | Solitary pulmonary nodule |
518.81 | 99.08 | 100 | Acute respiratory failure |
793.19 | 99.05 | 100 | Other nonspecific abnormal finding of lung field |
518.82 | 98.71 | 100 | Other pulmonary insufficiency, not elsewhere classified |
235.7 | 98.69 | 100 | Neoplasm of uncertain behavior of trachea, bronchus, and lung |
511.9 | 98.67 | 100 | Unspecified pleural effusion |
482.89 | 91.91 | 100 | Pneumonia due to other specified bacteria |
239.1 | 91.85 | 100 | Neoplasm of unspecified nature of respiratory system |
780.60 | 85.92 | 100 | Fever, unspecified |
512.8 | 84.41 | 100 | Other pneumothorax and air leak |
518.89 | 84.39 | 100 | Other diseases of lung, not elsewhere classified |
786.6 | 84.39 | 100 | Swelling, mass, or lump in chest |
485 | 83.13 | 100 | Bronchopneumonia, organism unspecified |
513.0 | 80.9 | 100 | Abscess of lung |
Conditions are ordered by the percent of focal clusters in which the antecedent condition appeared, across 10,000 trials and values of k from 2 through 25 (for a total of 240,000 different clusters). See Supplemental Table 2 for the 63 remaining conditions that were identified as clinically plausible.

Figure 4: Examples of trends in top antecedent conditions selected in the “cough” focal cluster prior to tuberculosis. The black dots depict 7-day average counts of visits with the given diagnosis relative to visit frequency. The linear piecewise model used to fit the trend and derive parameter estimates for the cluster analysis is depicted by the red line (see Supplemental Figure 3 for the remaining top 25 conditions).
To evaluate the potential effectiveness of our approach for suggesting criteria to identify potential diagnostic delays, we evaluated the number of visits and patients that would be captured by different sets of antecedent conditions within 90 days of the index diagnosis. Supplemental Figure 4 depicts the range in potential missed opportunities and patients identified using the sets of clinically plausible antecedent conditions recommended by each trial. The number of potential missed opportunities using the focal symptom of “cough” was 4,382 healthcare visits from 2,842 patients. Expanding to the set of conditions identified in the kink-point cluster (k=11) resulted in 31,162 visits from 9,078 patients representing a potential missed opportunity. Using the entire set of 88 plausible antecedent conditions recovered from our cluster-based approach resulted in 49,063 visits from 11,386 patients representing a potential missed opportunity. Thus, the prevalence of potential diagnostic delays identified ranged from 14.6% of patients, using cough alone, to 58.6% using all clinically plausible conditions suggested by the clustering algorithm.
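The prevalence figures in this paragraph follow directly from the patient counts reported above and the 19,423 index tuberculosis cases. The short check below reproduces them; the function name is hypothetical, and the k=11 percentage is derived here from the reported counts rather than quoted from the text.

```python
def prevalence_percent(patients_with_potential_delay, total_index_patients):
    """Share of index patients with at least one potential missed opportunity."""
    return 100.0 * patients_with_potential_delay / total_index_patients


total_tb_patients = 19423  # index tuberculosis diagnoses, 2001 to 2015

cough_only = prevalence_percent(2842, total_tb_patients)      # focal symptom alone
k11_cluster = prevalence_percent(9078, total_tb_patients)     # k=11 focal cluster
all_plausible = prevalence_percent(11386, total_tb_patients)  # all 88 conditions

for label, value in [("cough only", cough_only),
                     ("k=11 cluster", k11_cluster),
                     ("all plausible", all_plausible)]:
    print(f"{label}: {value:.1f}%")
```

Rounded to one decimal place, these are 14.6%, 46.7%, and 58.6% of patients, so moving from cough alone to the full set of plausible conditions roughly quadruples the number of patients flagged with a potential delay.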
Example 2: Appendicitis
We identified 572,836 patients with an index diagnosis of appendicitis between 2001 and 2015 that had over 4.5 million healthcare visit days in the year prior to the index diagnosis. Figure 5 depicts the raw count, relative frequency and fitted curves of visits for the focal symptom of “unspecified abdominal pain” prior to the index appendicitis diagnosis.

Figure 5: Visits for focal condition of unspecified abdominal pain prior to appendicitis. The left figure depicts the raw counts. The right figure depicts counts relative to visit frequency along with the linear change-point model fit to the data. Note: Counts are only depicted over the 180 days prior to diagnosis to better illustrate the trend. The fitted curves rely on all 365 days prior to diagnosis.
Supplemental Figure 5 depicts the results of our cluster analysis across different values of k and clustering trials. For values k≥4 there was a consistently high level of clinically-plausible precision >90%. We selected the value k=4 as the kink-point in the tradeoff between clinically-plausible precision (92.6%) and cluster size (54 total conditions and 50 that were clinically plausible).
A total of 63 clinically plausible conditions were identified across trials; Table 2 presents the top 25 conditions. The clustering approach identified many diagnoses known to be associated with symptoms of appendicitis, including specific sites for abdominal pain, vomiting, fever, nausea, leukocytosis, and intestinal infection, as well as symptomatically-similar diseases such as gastritis and pancreatitis. All of the top 43 clinically plausible conditions appeared in every focal cluster for k=4. Figure 6 depicts visit count trends for 9 of the antecedent conditions selected from the 43 conditions in the focal cluster.
Table 2: Top 25 clinically-plausible antecedent conditions for appendicitis recommended by the “unspecified abdominal pain” focal cluster.
ICD-9-CM code | Percent of all focal clusters that contained code | Percent of focal clusters for k=4 that contained code | Description |
---|---|---|---|
789.00 | 100 | 100 | Abdominal pain, unspecified site |
789.05 | 100 | 100 | Abdominal pain, periumbilic |
789.07 | 100 | 100 | Abdominal pain, generalized |
787.03 | 99.99 | 100 | Vomiting alone |
789.03 | 99.99 | 100 | Abdominal pain, right lower quadrant |
288.60 | 99.99 | 100 | Leukocytosis, unspecified |
535.50 | 99.99 | 100 | Unspecified gastritis and gastroduodenitis, without mention of hemorrhage |
789.06 | 99.99 | 100 | Abdominal pain, epigastric |
577.0 | 86.6 | 100 | Acute pancreatitis |
593.2 | 86.6 | 100 | Cyst of kidney, acquired |
646.83 | 86.6 | 100 | Other specified complications of pregnancy, antepartum condition or complication |
648.93 | 86.6 | 100 | Other current conditions classifiable elsewhere of mother, antepartum condition or complication |
V72.6 | 86.6 | 100 | Laboratory examination |
796.9 | 84.77 | 100 | Other nonspecific abnormal findings |
789.09 | 84.28 | 100 | Abdominal pain, other specified site |
558.9 | 62.81 | 100 | Other and unspecified noninfectious gastroenteritis and colitis |
780.60 | 62.81 | 100 | Fever, unspecified |
535.00 | 62.81 | 100 | Acute gastritis, without mention of hemorrhage |
787.01 | 60.06 | 100 | Nausea with vomiting |
787.02 | 59.63 | 100 | Nausea alone |
276.51 | 59.63 | 100 | Dehydration |
008.8 | 59.57 | 100 | Intestinal infection due to other organism, not elsewhere classified |
789.01 | 56.88 | 100 | Abdominal pain, right upper quadrant |
564.00 | 47.72 | 100 | Constipation, unspecified |
780.6 | 47.72 | 100 | Fever and other physiologic disturbances of temperature regulation |
Conditions are ordered by the percent of focal clusters containing the antecedent condition, across 10,000 trials and values of k from 2 through 25 (for a total of 240,000 different clusters). See Supplemental Table 4 for the remaining 38 conditions that were deemed clinically plausible.

Figure 6: Examples of selected trends in antecedent conditions contained in the “abdominal pain” focal cluster prior to appendicitis. The black dots depict 7-day average counts of visits with the given diagnosis relative to visit frequency. The linear piecewise model used to fit the trend and derive parameter estimates for the cluster analysis is depicted by the red line (see Supplemental Figure 6 for the remaining top 25 conditions).
We computed the number of potential missed opportunities within 21 days prior to the index appendicitis diagnosis using different sets of antecedent conditions. Supplemental Figure 7 depicts the resulting number of potential missed opportunities and patients identified using the antecedent conditions for different values of k. The focal symptom of unspecified abdominal pain identified 49,371 potential missed opportunities from 41,596 patients. Expanding to the additional set of 50 antecedent conditions identified in the cluster k=4 resulted in 137,003 visits from 98,111 patients representing a potential missed opportunity. Using the entire set of plausible antecedent conditions recovered from our cluster-based approach resulted in 142,359 visits from 101,013 patients representing a potential missed opportunity. Thus, the potential prevalence of identified diagnostic delays among our study population ranged from 7.3% of patients, using abdominal pain alone, to 17.6%, using all clinically plausible conditions suggested by the clustering algorithm.
Discussion
In this study we proposed a machine learning approach to recommend symptoms or antecedent health conditions that may indicate potential missed diagnostic opportunities. Starting with a single well-described symptom (e.g., cough for tuberculosis) we used unsupervised k-means clustering to identify other conditions that exhibited similar visit patterns prior to the actual diagnosis. Our findings demonstrated that a large number of symptoms and conditions could be identified with a high degree of precision in terms of clinical plausibility. Moreover, inclusion of these additional conditions resulted in more than twice the number of potential diagnostic opportunities and patients being identified with a potential diagnostic delay compared to using a single common symptom.
Mining administrative data sources is a promising approach to study the diagnostic process, but methods for many of these approaches are still in the early stages of development [38]. The size of these datasets may offer the potential to make discoveries in the diagnostic process that inform the study of diagnostic errors. One particular issue that arises in the study of diagnostic errors is the need to use clinical expertise to define the criteria for healthcare visits that represent potential missed opportunities to diagnose a disease. We demonstrated a relatively simple machine-learning approach that can scan hundreds or thousands of diagnostic codes to identify those with similar patterns to known symptoms of disease. We show our approach can effectively recover a large set of clinically plausible conditions that could be used to detect diagnostic delays. Indeed, we found that the additional conditions recommended by our clustering-based approach significantly increased the potential number of diagnostic opportunities identified, compared to a single symptom. While such criteria must ultimately be refined based on clinical expertise, this exploratory process may significantly aid in the study of diagnostic delays. Our approach may also be applied to study designs involving non-administrative data sources; for example, by first identifying the set of conditions to search for during a retrospective chart review.
In addition to recommending antecedent conditions that might indicate diagnostic opportunities, similar techniques may also be integrated into other aspects of the diagnostic process, such as the development of diagnostic training exercises, clinical decision support systems, or trigger rules designed to flag potential diagnostic errors. Our approach could also be used to increase the current understanding of the natural history of diseases, by describing the frequency and temporal ordering of different symptoms prior to diagnosis. For example, this approach could be applied to discover which conditions appear weeks before a diagnosis vs. days before.
We demonstrated a simple k-means application using only limited clinical feedback (i.e., to select the focal cluster and evaluate clinical plausibility of recommended conditions). However, there are numerous ways to extend our methodology and more thoroughly integrate clinical expertise. For example, other types of event codes can be considered (e.g., procedure, medication), different curve fitting and change-point techniques can be applied, other information can be integrated into the feature space, and different clustering algorithms can be used (e.g., semi-supervised approaches). In addition, greater clinical feedback may be integrated into the analysis process. Recommendations can be made beyond a single focal cluster using multiple initial conditions suggested by experts. Sequential and iterative approaches may be used to expand, grow, or combine clusters with the feedback of expert reviewers, and unrelated data points or clusters may be excluded. Clusters might also be labeled by clinical reviewers as representing other aspects of disease, such as risk factors or triggering events (e.g., alcohol use or infection prior to stroke). Each of these represents a possible future extension of this approach.
There are a number of limitations to consider when using our approach and administrative data to study diagnostic delays. First, administrative data are generated for billing purposes; patterns that emerge may be a byproduct of the administrative data generating process and may omit information in the clinical record. Clinical expertise is critical for evaluation, and results should generally be regarded as exploratory or hypothesis generating. Second, different approaches may yield dramatically different results depending on the model fitting and clustering approaches used. Third, as with any machine-learning based recommendation system, computational resource costs may need to be considered and more advanced techniques may require additional computing resources. Fourth, these methods, especially the curve fitting approaches, require a sufficiently large number of observations. Smaller datasets may lack sufficient observations to obtain stable results across granular codes, and aggregation (e.g., using Clinical Classification Software codes) may need to be applied to identify related code sets. Finally, we used data from the United States and our results may not generalize to other locations.
Cluster-based approaches, coupled with large administrative data sources, may help discover patterns in the diagnostic process. The approach we presented provides an easy-to-implement recommender system that can allow future investigators to mine large databases for potential signals of disease and better study the diagnostic process. There remains a wide range of extensions to the proposed methodological framework. Future investigations should explore how this framework and other machine-learning-based approaches may aid the discovery process for studying diagnostic delays.
Funding source: Agency for Healthcare Research and Quality
Award Identifier / Grant number: 5R01HS027375
Research funding: This work was supported by the Agency for Healthcare Research and Quality grant number 5R01HS027375.
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Competing interests: Authors state no conflict of interest.
Informed consent: Not applicable.
Ethical approval: The University of Iowa Institutional Review Board deems this study to be human studies exempt.
References
1. Singh, H, Graber, ML. Improving diagnosis in health care-the next imperative for patient safety. N Engl J Med 2015;373:2493–5. https://doi.org/10.1056/nejmp1512241.Search in Google Scholar PubMed
2. Newman-Toker, DE, Schaffer, AC, Yu-Moe, CW, Nassery, N, Saber Tehrani, AS, Clemens, GD, et al.. Serious misdiagnosis-related harms in malpractice claims: the “Big Three” - vascular events, infections, and cancers. Diagnosis (Berl) 2019;6:227–40. https://doi.org/10.1515/dx-2019-0019.Search in Google Scholar PubMed
3. Zwaan, L, Singh, H. The challenges in defining and measuring diagnostic error. Diagnosis (Berl) 2015;2:97–103. https://doi.org/10.1515/dx-2014-0069.Search in Google Scholar PubMed PubMed Central
4. Graber, ML. The incidence of diagnostic error in medicine. BMJ Qual Saf 2013;22(2 Suppl):ii21–7. https://doi.org/10.1136/bmjqs-2012-001615.Search in Google Scholar PubMed PubMed Central
5. Singh, H, Sittig, DF. Advancing the science of measurement of diagnostic errors in healthcare: the Safer Dx framework. BMJ Qual Saf 2015;24:103–10. https://doi.org/10.1136/bmjqs-2014-003675.Search in Google Scholar PubMed PubMed Central
6. National Academies of Sciences E, Medicine. Improving diagnosis in health care. Washington, DC: National Academies Press; 2015.Search in Google Scholar
7. Moy, E, Barrett, M, Coffey, R, Hines, AL, Newman-Toker, DE. Missed diagnoses of acute myocardial infarction in the emergency department: variation by patient and facility characteristics. Diagnosis 2015;2:29–40. https://doi.org/10.1515/dx-2014-0053.Search in Google Scholar PubMed
8. Newman-Toker, DE, Moy, E, Valente, E, Coffey, R, Hines, AL. Missed diagnosis of stroke in the emergency department: a cross-sectional analysis of a large population-based sample. Diagnosis (Berl) 2014;1:155–66. https://doi.org/10.1515/dx-2013-0038.Search in Google Scholar PubMed PubMed Central
9. Miller, AC, Arakkal, AT, Koeneman, S, Cavanaugh, JE, Gerke, AK, Hornick, DB, et al.. Incidence, duration and risk factors associated with delayed and missed diagnostic opportunities related to tuberculosis: a population-based longitudinal study. BMJ Open 2021;11:e045605. https://doi.org/10.1136/bmjopen-2020-045605.Search in Google Scholar PubMed PubMed Central
10. Miller, AC, Polgreen, LA, Cavanaugh, JE, Hornick, DB, Polgreen, PM. Missed opportunities to diagnose tuberculosis are common among hospitalized patients and patients seen in emergency departments. Open Forum Infect Dis 2015;2:ofv171. https://doi.org/10.1093/ofid/ofv171.Search in Google Scholar PubMed PubMed Central
11. Hester, LL, Gifkins, DM, Bellow, KM, Vermeulen, J, Schecter, JM, Strony, J, et al.. Diagnostic delay and characterization of the clinical prodrome in AL amyloidosis among 1523 US adults diagnosed between 2001 and 2019. Eur J Haematol 2021;107:428–35. https://doi.org/10.1111/ejh.13679.Search in Google Scholar PubMed
12. Salazar, AS, Keller, MR, Olsen, MA, Nickel, KB, George, IA, Larson, L, et al.. Potential missed opportunities for diagnosis of cryptococcosis and the association with mortality: a cohort study. E Clin Med 2020;27:100563. https://doi.org/10.1016/j.eclinm.2020.100563.Search in Google Scholar PubMed PubMed Central
13. Surrey, E, Soliman, AM, Trenz, H, Blauer-Peterson, C, Sluis, A. Impact of endometriosis diagnostic delays on healthcare resource utilization and costs. Adv Ther 2020;37:1087–99. https://doi.org/10.1007/s12325-019-01215-x.Search in Google Scholar PubMed PubMed Central
14. Benedict, K, Lyman, M, Jackson, BR. Possible misdiagnosis, inappropriate empiric treatment, and opportunities for increased diagnostic testing for patients with vulvovaginal candidiasis-United States, 2018. Plos One 2022;17:e0267866. https://doi.org/10.1371/journal.pone.0267866.Search in Google Scholar PubMed PubMed Central
15. Chase, DM, Neighbors, J, Perhanidis, J, Monk, BJ. Gastrointestinal symptoms and diagnosis preceding ovarian cancer diagnosis: effects on treatment allocation and potential diagnostic delay. Gynecol Oncol 2021;161:832–7. https://doi.org/10.1016/j.ygyno.2021.03.021.Search in Google Scholar PubMed
16. Miller, AC, Arakkal, AT, Koeneman, SH, Cavanaugh, JE, Thompson, GR, Baddley, JW, et al.. Frequency and duration of, and risk factors for, diagnostic delays associated with histoplasmosis. Journal of Fungi 2022;8:438. https://doi.org/10.3390/jof8050438.Search in Google Scholar PubMed PubMed Central
17. Miller, AC, Koeneman, SH, Arakkal, AT, Cavanaugh, JE, Polgreen, PM. Incidence, duration, and risk factors associated with missed opportunities to diagnose herpes simplex encephalitis: a population-based longitudinal study. Open Forum Infect Dis 2021;8:ofab400. https://doi.org/10.1093/ofid/ofab400.Search in Google Scholar PubMed PubMed Central
18. Benedict, K, Kobayashi, M, Garg, S, Chiller, T, Jackson, BR. Symptoms in blastomycosis, coccidioidomycosis, and histoplasmosis versus other respiratory illnesses in commercially insured adult outpatients, United States, 2016-2017. Clin Infect Dis 2020;73:e4336–44. https://doi.org/10.1093/cid/ciaa1554.Search in Google Scholar PubMed PubMed Central
19. Benedict, K, Beer, KD, Jackson, BR. Histoplasmosis-related healthcare use, diagnosis, and treatment in a commercially insured population, United States. Clin Infect Dis 2020;70:1003–10. https://doi.org/10.1093/cid/ciz324.Search in Google Scholar PubMed PubMed Central
20. Nassery, N, Horberg, MA, Rubenstein, KB, Certa, JM, Watson, E, Somasundaram, B, et al.. Antecedent treat-and-release diagnoses prior to sepsis hospitalization among adult emergency department patients: a look-back analysis employing insurance claims data using Symptom-Disease Pair Analysis of Diagnostic Error (SPADE) methodology. Diagnosis (Berl) 2021;8:469–78. https://doi.org/10.1515/dx-2020-0140.Search in Google Scholar PubMed
21. Gold, JAW, Jackson, BR, Benedict, K. Possible diagnostic delays and missed prevention opportunities in pneumocystis pneumonia patients without HIV: analysis of commercial insurance claims data-United States, 2011-2015. Open Forum Infect Dis 2020;7:ofaa255. https://doi.org/10.1093/ofid/ofaa255.Search in Google Scholar PubMed PubMed Central
22. Sharp, AL, Baecker, A, Nassery, N, Park, S, Hassoon, A, Lee, MS, et al.. Missed acute myocardial infarction in the emergency department-standardizing measurement of misdiagnosis-related harms using the SPADE method. Diagnosis (Berl) 2021;8:177–86. https://doi.org/10.1515/dx-2020-0049.Search in Google Scholar PubMed
23. Mahajan, P, Basu, T, Pai, CW, Singh, H, Petersen, N, Bellolio, MF, et al.. Factors associated with potentially missed diagnosis of appendicitis in the emergency department. JAMA Netw Open 2020;3:e200612. https://doi.org/10.1001/jamanetworkopen.2020.0612.Search in Google Scholar PubMed PubMed Central
24. Liberman, AL, Newman-Toker, DE. Symptom-Disease Pair Analysis of Diagnostic Error (SPADE): a conceptual framework and methodological approach for unearthing misdiagnosis-related harms using big data. BMJ Qual Saf 2018;27:557–66. https://doi.org/10.1136/bmjqs-2017-007032.Search in Google Scholar PubMed PubMed Central
25. Bjerager, M, Palshof, T, Dahl, R, Vedsted, P, Olesen, F. Delay in diagnosis of lung cancer in general practice. Br J Gen Pract 2006;56:863–8.Search in Google Scholar
26. Park, DH, Kim, HK, Choi, IY, Kim, JK. A literature review and classification of recommender systems research. Expert Syst Appl 2012;39:10059–72. https://doi.org/10.1016/j.eswa.2012.02.038.Search in Google Scholar
27. Waxman, DA, Kanzaria, HK, Schriger, DL. Unrecognized cardiovascular emergencies among medicare patients. JAMA Intern Med 2018;178:477–84. https://doi.org/10.1001/jamainternmed.2017.8628.Search in Google Scholar PubMed PubMed Central
28. Hartigan, JA, Wong, MA. Algorithm as 136: a K-means clustering algorithm. J R Stat Soc Series C 1979;28:100–8. https://doi.org/10.2307/2346830.Search in Google Scholar
29. Chakaya, J, Khan, M, Ntoumi, F, Aklillu, E, Fatima, R, Mwaba, P, et al.. Global tuberculosis report 2020 - reflections on the global TB burden, treatment and prevention efforts. Int J Infect Dis 2021;113(1 Suppl):S7–12. https://doi.org/10.1016/j.ijid.2021.02.107.Search in Google Scholar PubMed PubMed Central
30. Wallace, RM, Kammerer, JS, Iademarco, MF, Althomsons, SP, Winston, CA, Navin, TR. Increasing proportions of advanced pulmonary tuberculosis reported in the United States: are delays in diagnosis on the rise? Am J Respir Crit Care Med 2009;180:1016–22. https://doi.org/10.1164/rccm.200901-0059oc.Search in Google Scholar PubMed
31. Loutet, MG, Sinclair, C, Whitehead, N, Cosgrove, C, Lalor, MK, Thomas, HL. Delay from symptom onset to treatment start among tuberculosis patients in England, 2012-2015. Epidemiol Infect 2018;146:1511–8. https://doi.org/10.1017/s095026881800167x.Search in Google Scholar PubMed PubMed Central
32. Mindra, G, Wortham, JM, Haddad, MB, Powell, KM. Tuberculosis outbreaks in the United States, 2009-2015. Public Health Rep 2017;132:157–63. https://doi.org/10.1177/0033354916688270.Search in Google Scholar PubMed PubMed Central
33. Buckius, MT, McGrath, B, Monk, J, Grim, R, Bell, T, Ahuja, V. Changing epidemiology of acute appendicitis in the United States: study period 1993-2008. J Surg Res 2012;175:185–90. https://doi.org/10.1016/j.jss.2011.07.017.Search in Google Scholar PubMed
34. Pittman-Waller, VA, Myers, JG, Stewart, RM, Dent, DL, Page, CP, Gray, GA, et al.. Appendicitis: why so complicated? Analysis of 5755 consecutive appendectomies. Am Surg 2000;66:548–54.10.1177/000313480006600605Search in Google Scholar
35. Choi, JY, Ryoo, E, Jo, JH, Hann, T, Kim, SM. Risk factors of delayed diagnosis of acute appendicitis in children: for early detection of acute appendicitis. Korean J Pediatr 2016;59:368–73. https://doi.org/10.3345/kjp.2016.59.9.368.Search in Google Scholar PubMed PubMed Central
36. Papandria, D, Goldstein, SD, Rhee, D, Salazar, JH, Arlikar, J, Gorgy, A, et al.. Risk of perforation increases with delay in recognition and surgery for acute appendicitis. J Surg Res 2013;184:723–9. https://doi.org/10.1016/j.jss.2012.12.008.Search in Google Scholar PubMed PubMed Central
37. Glerum, KM, Selbst, SM, Parikh, PD, Zonfrillo, MR. Pediatric malpractice claims in the emergency department and urgent care settings from 2001 to 2015. Pediatr Emerg Care 2021;37:e376–9. https://doi.org/10.1097/PEC.0000000000001602.Search in Google Scholar PubMed
38. Singh, H, Bradford, A, Goeschel, C. Operational measurement of diagnostic safety: state of the science. Diagnosis 2021;8:51–65. https://doi.org/10.1515/dx-2020-0045.Search in Google Scholar PubMed
Supplementary Material
The online version of this article offers supplementary material (https://doi.org/10.1515/dx-2022-0044).
© 2022 Walter de Gruyter GmbH, Berlin/Boston