In this study, we applied a machine learning approach to classify major headache disorders using questionnaires completed by patients in a real-world setting. We found that machine learning can be applied to the analysis of such questionnaires. Its performance in classifying migraine was excellent; however, its accuracy in classifying headache disorders other than migraine was lower. Nonetheless, our automated classification results could still be meaningful, because the gold standard for the diagnosis of headache is the skilled manual application of the current classification criteria (currently ICHD-3, published in 2018). In the era of ICHD-3, no studies have evaluated the reliability and accuracy of diagnoses of primary headache disorders made by primary care providers or general non-headache neurologists. Furthermore, there have been no classification methods other than ICHD-3.
Our study is one of the first to apply machine learning to the analysis of patient-reported questionnaires to classify primary headache disorders7. The diagnosis of headache disorders requires a skillful interview with patients and a comprehensive decision algorithm. We tested whether machine learning can substitute for the clinical interview. However, the samples of each headache disorder other than migraine and TTH were insufficient for training. Headache disorders or syndromes other than migraine and TTH were therefore merged into broader categories such as epicranial headaches or TCHs, which was not ideal for the detailed classification of second- or third-digit ICHD codes. In addition, secondary headaches other than those causing TCHs were excluded from the analysis since they cannot be incorporated into one entity. Secondary headaches should be diagnosed by clinical course and causative workup rather than by headache features. Taken together, our approach cannot replace physician-based diagnosis because its performance was insufficient. However, this study demonstrated the feasibility of developing a better algorithm-based automated classification for headache disorders. In addition, our results might be used to inform or assist physicians by pre-screening with the most important factors of the stacked classifier (i.e., Table 2) or by increasing the accuracy of less-specialized providers.
Our approach adopted a stacked XGBoost classifier that resulted in an overall accuracy of 81%, with sensitivity and specificity of over 87% in the diagnosis of migraine. Our results were superior to those of a previous study in which more selective data were used7. Existing studies on the classification of headache disorders with machine learning have focused on a few selected headache disorders, such as migraine and tension-type headache, due to challenges with sample size7,8. Previous studies used random forest for classification; however, our study adopted XGBoost. XGBoost belongs to the boosting classifiers, in which both the variance and bias of the classifier are reduced, while random forest belongs to the bagging classifiers, in which only the variance of the classifier is reduced13. XGBoost has shown improved performance in many recent machine learning challenges where high-dimensional features were involved. The performance of XGBoost in classifying migraine was superior in our study because migraine is characterized by diverse features that cannot be fully incorporated in conventional statistical models, due to the complexity and challenge of multiple testing. Manual analysis, even by human experts, may be time-consuming and prone to errors. However, with the automated classification algorithm suggested by this study, multiple features of headache disorders can be systematically identified. This automated classification algorithm is thus time-efficient and could minimize human error in the diagnosis of headache disorders.
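As a reminder of how figures such as accuracy, sensitivity, and specificity are conventionally computed for a one-vs-rest task (for example, migraine vs. all other diagnoses), the following is a minimal sketch. The function name `binary_metrics` and the toy labels are assumptions made for illustration; they are not the study's code or data.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for binary labels (1 = target class)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a class is absent.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # recall on the target class
        "specificity": ratio(tn, tn + fp),  # recall on the remaining classes
    }


# Toy labels invented for illustration only (not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

With these toy labels there are 3 true positives, 1 false negative, 5 true negatives, and 1 false positive, so sensitivity is 3/4 and specificity is 5/6.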
Our stacked classification model reflected the features of each headache disorder well. The top three features used in our classification model provide insights into each headache disorder when compared to the ICHD-3 criteria1. First, the mode of onset was important in the migraine, TTH, and epicranial vs. TCH classifiers. This important feature should always be considered in the differential diagnosis of secondary and primary headaches, but it has not been listed in the ICHD-3 criteria for migraine, TTH, and epicranial headaches1. While migraine and TTH are typical examples of gradual-onset headaches, thunderclap onset is the most important syndrome-defining feature of TCH, as its name implies. For TAC, the mode of onset was not included in the classifier, as most patients with TAC experience a relatively rapid evolution of the headache attack. Second, demographic features were also important, whereas the ICHD deals only with headache characteristics. For example, female sex was ranked as the second most important feature in classifying migraine. This may suggest that the female predominance is more robust in migraine than in other primary headache disorders, at least in clinic-based samples. Third, the nature of pain was important in the TTH and epicranial vs. TCH classifiers: a vague and/or cloudy nature of pain for TTH, and electric-shock-like and jabbing natures for epicranial headaches. These features reflect the nature of the corresponding headache disorders well, although they differ from the features listed in the ICHD-3 criteria1. The ICHD-3 denotes a pressing or tightening quality of pain as a feature of TTH, and a stabbing, shooting, or sharp quality of pain as a feature of epicranial headaches (primary stabbing headache or occipital neuralgia)1. However, these features may be less useful in the differential diagnosis, as they can co-exist in migraine attacks, TACs, and even TCHs in the real world.
Fourth, the presence or absence of autonomic symptoms was important in differentiating migraine from TACs. The ICHD-3 also denotes autonomic symptoms as characteristic features of TAC1. Although autonomic symptoms can accompany migraine attacks, they are less prominent than those of TACs17. Finally, sleep-awakening hypnic attacks were important in the TAC classifier. The time of headache attack has not been included in the ICHD-3. However, most primary headaches other than TAC tend to regress during sleep. In summary, our data showed that these features can carry greater relative weight in the differential diagnosis between primary headache disorders, even though they are not listed as, or differ from, the syndrome-defining features in the ICHD-31.
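The "top three features" discussed above amount to ranking each classifier's features by an importance score and keeping the highest-scoring ones. The sketch below shows that ranking step only. The helper `top_k_features`, the placeholder feature names, and the scores are all hypothetical; how the importance scores were obtained in the study is not restated here.

```python
from typing import Dict, List


def top_k_features(importances: Dict[str, float], k: int = 3) -> List[str]:
    """Return the k feature names with the highest importance scores.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(importances.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Placeholder names and scores invented for illustration (not study results).
toy_importances = {
    "feature_a": 0.05,
    "feature_b": 0.30,
    "feature_c": 0.20,
    "feature_d": 0.10,
    "feature_e": 0.35,
}
top3 = top_k_features(toy_importances, k=3)
```

In a setup with several binary classifiers, the same helper would be applied once per classifier to its own importance scores.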
When applying our results to clinical practice, it should be kept in mind that secondary headache disorders were excluded from this model. This has clinical implications: in addition to clinical history, biochemical, radiological, or sometimes histologic evaluations are needed to rule out secondary headache syndromes. Historically, the clinical course has been more important than headache characteristics in diagnosing secondary headaches, and it cannot be easily captured by a questionnaire. Still, we explored whether automated classification of secondary headaches was possible using the same approach. The classification performance was unsatisfactory, as shown in the Supplement.
Our study has some limitations. First, the results were derived from data from a single center. Thus, our results need to be validated in an independent cohort. Second, we applied conventional machine learning approaches in this study. Deep learning can be thought of as a high degree-of-freedom extension of conventional machine learning, and it has significantly improved classification performance in many domains11,18. Deep learning could certainly be applied in headache research, and we believe an autoencoder network could be effective. An autoencoder network can handle high-dimensional, correlated features and can also learn a low-dimensional feature embedding that is robust to noise. The features used in this study were high-dimensional (e.g., 128 dimensions) and could be substantially correlated because of how they were designed. We plan to pursue research in this direction in the future.
We presented a method to classify subtypes of primary headache by fusing four XGBoost classifiers in a stacked fashion. Each classifier captured important characteristics of its target subtype in a data-driven manner. Existing studies were limited in that they considered fewer subtypes and reported lower classification performance than ours. Thus, although our approach was effective for the migraine subtype only, we believe our study is a first step towards building a comprehensive computer-aided diagnosis model for headaches. The software code for this study is open and can be adopted by other researchers to foster novel machine learning research in the migraine field.
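One way to picture four binary classifiers fused in a stacked fashion is as a cascade: each stage either claims a sample for its label or passes it on to the next stage. The sketch below illustrates that control flow only, and it rests on several assumptions: the stage order (migraine, TTH, TAC, then epicranial vs. TCH), the 0.5 threshold, and the stand-in scorers that simply read a precomputed score are all hypothetical, and none of this is the study's released code. In practice each scorer would be a trained binary model's predicted probability.

```python
from typing import Callable, Dict, List, Tuple

Sample = Dict[str, float]
Scorer = Callable[[Sample], float]  # returns a probability-like score in [0, 1]


def classify_stacked(
    sample: Sample,
    stages: List[Tuple[str, Scorer]],
    fallback: str,
    threshold: float = 0.5,
) -> str:
    """Run binary stages in order; the first stage scoring >= threshold wins."""
    for label, scorer in stages:
        if scorer(sample) >= threshold:
            return label
    # Nothing claimed the sample: it takes the other side of the final binary stage.
    return fallback


def stub_scorer(key: str) -> Scorer:
    """Hypothetical stand-in for a trained model: reads a precomputed score."""
    return lambda sample: sample.get(key, 0.0)


# Four binary stages; the last one separates epicranial headaches from TCH.
STAGES: List[Tuple[str, Scorer]] = [
    ("migraine", stub_scorer("migraine_score")),
    ("TTH", stub_scorer("tth_score")),
    ("TAC", stub_scorer("tac_score")),
    ("epicranial", stub_scorer("epicranial_score")),
]

label_a = classify_stacked({"migraine_score": 0.9, "tth_score": 0.7}, STAGES, fallback="TCH")
label_b = classify_stacked({"tac_score": 0.8}, STAGES, fallback="TCH")
label_c = classify_stacked({"epicranial_score": 0.2}, STAGES, fallback="TCH")
```

A consequence of this design is that errors in early stages propagate: a sample wrongly claimed by the first stage never reaches the later, more specific classifiers.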