
The ongoing hunt for biomarkers: Can machine learning help?

November 10, 2025

Psychiatry has long been troubled by the fact that diagnoses such as depression and anxiety, although considered distinct disorders, tend to correlate with each other and co-occur in the same individuals, a phenomenon known as comorbidity (McGrath, J. J. et al., 2020). This overlap, and the resulting difficulty in distinguishing disorders from one another, becomes even more of a problem when trying to disentangle diagnoses that share some of the same symptoms, such as major depressive disorder (MDD) and bipolar disorder (BD).

MDD is characterised, amongst other things, by persistent episodes of depressed mood and anhedonia (lack of interest or pleasure) (Marx, W. et al., 2023). BD, formerly known as ‘manic depression’, is also characterised by prolonged episodes of depression, but sufferers additionally experience episodes of mania or hypomania, marked by intense elation, energy, and activity (NIMH, 2025).

Although these two disorders are quite distinct from each other, the shared experience of depressive episodes puts BD patients at risk of being misdiagnosed as having MDD. The misdiagnosis rate is high: an estimated 60% of BD patients first receive an incorrect MDD diagnosis (Calesella, F. et al., 2025). As well as being potentially distressing and confusing for the patient, misdiagnosis can prevent individuals from accessing the appropriate care and treatment for their illness.

This new brain imaging study used machine learning (ML) prediction models to explore whether looking at connectivity in the brain regions of people living with either MDD or BD can help us better differentiate between these disorders (Calesella, F. et al., 2025).

High misdiagnosis rates between bipolar and major depressive disorder highlight the need for better diagnostic tools. A new study explores whether brain connectivity and machine learning can help.

Methods

This study used various techniques to investigate whether brain activity can be used to differentiate MDD and BD. The researchers recruited 201 people at the IRCCS San Raffaele Hospital in Italy, consisting of a healthy control group (n=76), an MDD group (n=62), and a bipolar depression group (n=63). Various clinical instruments were used to measure the presence of current and previous depression symptoms.

Participants underwent resting-state functional magnetic resonance imaging (fMRI) scanning to measure brain activity at rest. The researchers then extracted features including (i) measures of functional connectivity between different parts of the brain and (ii) activity in specific brain regions that they believed may be implicated in depressive neuropathology.

The study then explored the use of a support vector machine (SVM), a type of supervised ML model that learns a boundary separating the sample into groups based on the neuroimaging features described above. The researchers built several SVM models, each trained on a different type of neuroimaging data. If a specific type of data manages to split the sample into distinct groups, and most participants within each group share the same diagnosis, then this arguably serves as evidence that those data contain information about the underlying aetiology of these diseases. The resulting stratification is evaluated using a range of accuracy measures, which assess the model’s ability to correctly identify people with the same diagnosis.
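
To make the idea of a separating boundary concrete, here is a minimal, dependency-free sketch of a linear SVM in Python. It is not the authors' pipeline: the two synthetic ‘features’, the group centres, and the function names (train_linear_svm, predict, make_toy_data) are all illustrative assumptions, and the data are randomly generated rather than taken from the study.

```python
import random

def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=50, seed=0):
    """Fit a linear SVM by stochastic sub-gradient descent on the hinge loss.

    X is a list of feature lists; y holds labels coded as +1 or -1.
    lr (step size) and lam (L2 penalty) are arbitrary illustrative values.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            if y[i] * score < 1:
                # Inside the margin or misclassified: move the boundary
                # towards classifying this participant correctly.
                w = [wj + lr * (y[i] * xj - lam * wj) for wj, xj in zip(w, X[i])]
                b += lr * y[i]
            else:
                # Correct with room to spare: only apply the L2 shrinkage.
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on which side of the boundary x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def make_toy_data(n_per_group=20, seed=1):
    """Generate two synthetic, well-separated groups (NOT real patient data)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((1, 2.0), (-1, -2.0)):
        for _ in range(n_per_group):
            X.append([rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)])
            y.append(label)
    return X, y

X, y = make_toy_data()
w, b = train_linear_svm(X, y)
accuracy = sum(predict(w, b, x) == t for x, t in zip(X, y)) / len(y)
print(f"Training accuracy on toy data: {accuracy:.2f}")
```

Accuracy measured on the same participants used for training is optimistic, so in practice performance would be estimated on held-out data, and an established library would normally be used instead of a hand-written optimiser.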

Results

There were some demographic differences between the groups. MDD patients were older and had a later onset of diagnosis than bipolar patients. The healthy controls were younger and had a higher level of academic attainment. The groups did not differ with regard to sex, illness duration, or medication load (defined as how many low dosage or high dosage medications were used).

Only one ML model managed to discriminate between MDD and BD when results were tested for statistical significance. This model was trained on seed-based connectivity (SBC) data, a technique where connectivity between a specific region (e.g., a section of the amygdala, the part of the brain which processes fear stimuli and is implicated in memory processes) and the rest of the brain is evaluated.

They found that connectivity maps in areas of the brain involved in reward, motivation, and memory were particularly important for prediction. Interestingly, these are areas which have been previously highlighted as having potential relevance for BD.

This model achieved a balanced accuracy of 66.2% and an area-under-the-curve score of 0.71 (see Fraser, H. 2024 and Hagenberg, J. 2024 for a description of what these metrics mean). The model was able to identify BD patients with a sensitivity of 69.36%. The features identified as important were then used on their own to train additional models, which performed similarly.
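
For readers who want to see how these metrics are computed, the sketch below works through them on a small made-up example. The labels, predictions, and scores are invented for illustration and the function names are assumptions. The last lines assume the standard two-class definition of balanced accuracy (the mean of sensitivity and specificity), which this post does not confirm was the exact definition used in the paper; under that assumption, the reported 66.2% and 69.36% would imply a specificity of roughly 63%.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count true positives, false negatives, true negatives, false positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp, fn, tn, fp

def auc(y_true, scores, positive):
    """Area under the ROC curve: the share of (positive, negative) pairs in
    which the positive case receives the higher score (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented example: 4 BD and 4 MDD participants (NOT study data).
y_true = ["BD", "BD", "BD", "BD", "MDD", "MDD", "MDD", "MDD"]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.75]  # model's score for "BD"
y_pred = ["BD" if s >= 0.5 else "MDD" for s in scores]

tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive="BD")
sensitivity = tp / (tp + fn)            # share of BD cases correctly flagged
specificity = tn / (tn + fp)            # share of MDD cases correctly flagged
balanced_accuracy = (sensitivity + specificity) / 2
toy_auc = auc(y_true, scores, positive="BD")

# Assumption: balanced accuracy = (sensitivity + specificity) / 2.
# Rearranged with the figures quoted in this post (in percent):
implied_specificity = 2 * 66.2 - 69.36

print(sensitivity, specificity, balanced_accuracy, toy_auc)
print(round(implied_specificity, 2))
```

Balanced accuracy is useful when group sizes differ, because a model that always predicts the larger group scores only 50% on it.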

None of the models trained on other types of data achieved an accuracy that was significantly better than chance.
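
This post does not describe how the comparison to chance was carried out. One common approach is a label-permutation test, sketched below as an assumption rather than as the authors' procedure: the diagnostic labels are shuffled many times, and the p-value is the share of shuffles that score at least as well as the real labels. For brevity this simplified version keeps the model's predictions fixed, whereas a full permutation test would retrain the model on each shuffled set of labels. All names and data here are illustrative.

```python
import random

def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recall values."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(1 for i in idx if y_pred[i] == c) / len(idx))
    return sum(recalls) / len(recalls)

def permutation_p_value(y_true, y_pred, n_permutations=1000, seed=0):
    """Estimate how often shuffled labels match or beat the observed score.

    Simplification: predictions are held fixed instead of refitting the
    model for every permutation.
    """
    rng = random.Random(seed)
    observed = balanced_accuracy(y_true, y_pred)
    shuffled = list(y_true)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if balanced_accuracy(shuffled, y_pred) >= observed:
            at_least_as_good += 1
    # Adding 1 to both counts treats the observed labelling as one permutation,
    # so the estimated p-value can never be exactly zero.
    return observed, (at_least_as_good + 1) / (n_permutations + 1)

# Invented labels (NOT study data): a perfect classifier and a useless one.
labels = ["BD"] * 10 + ["MDD"] * 10
perfect_score, perfect_p = permutation_p_value(labels, list(labels))
constant_score, constant_p = permutation_p_value(labels, ["BD"] * 20)
print(perfect_score, perfect_p)
print(constant_score, constant_p)
```

A model that always predicts the same diagnosis has a balanced accuracy of 0.5 under every shuffle, so its p-value is 1: it is indistinguishable from chance.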

Seed-based brain connectivity helped one machine learning model distinguish bipolar disorder from major depression, with predictive features linked to reward and memory regions. Other models showed no significant accuracy.

Conclusions

The authors concluded that their study successfully addressed some of the previous limitations of similar approaches in this area, which suffered from methodological issues such as small sample size and confounding factors. They successfully identified key regions of interest using a predictive model trained on SBC neuronal map data, but overall conclude that:

Although our results show that [alterations in the reward system] can significantly differentiate between MDD and BD, the performance remains modest at 66.2% accuracy.

They then continue to discuss how generalising findings from previous literature in this area is challenging due to the variability in sample size and analysis procedures used between different studies.

The authors conclude that while reward-related brain activity can significantly differentiate between bipolar disorder and major depression, the model’s modest accuracy and variability across studies limit its clinical utility.

Strengths and limitations

The researchers went to great efforts here to understand the limitations of the current evidence base in this area. They highlighted how other studies use models trained on data sets that are likely too small to yield generalisable insights. They also accounted for a large number of clinical and demographic confounding variables, such as medication history. This is a huge strength, as there is evidence to suggest that psychiatric medication such as antidepressants or antipsychotics can impact brain structure (Vernon, A. C. et al., 2012), which is relevant to any study aiming to characterise the relationship between neuronal areas and psychiatric disorders.

Significant effort was also made to remove confounding variables. One interesting element of this study is that two types of MRI scanner were used to obtain the neuroimaging data. The authors again went to great lengths to correct for the potential impact of this on the data set; the use of two different machines means that the sample could have been vulnerable to ‘batch effects’. In other words, subtle differences in image acquisition between the two scanners could have leaked into the data set, and the predictive models could then have picked up on these in addition to genuine neurological differences. The authors were able to statistically control for this difference, making sure that there were no ‘batch effects’ present and increasing the reliability of these results.
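
This post does not say which correction the authors applied, so the following is only a sketch of the simplest possible adjustment, per-scanner mean centring, offered as an assumption to illustrate what ‘controlling for’ a scanner effect can mean. The function name and the numbers are invented. Dedicated harmonisation methods do considerably more than this, for example also adjusting the spread of each feature and protecting effects of interest such as diagnosis.

```python
from collections import defaultdict

def centre_by_scanner(features, scanners):
    """Subtract each scanner's mean from every feature column.

    features: one list of feature values per participant.
    scanners: the scanner label for each participant, in the same order.
    """
    n_features = len(features[0])
    sums = defaultdict(lambda: [0.0] * n_features)
    counts = defaultdict(int)
    for row, scanner in zip(features, scanners):
        counts[scanner] += 1
        for j, value in enumerate(row):
            sums[scanner][j] += value
    means = {s: [total / counts[s] for total in sums[s]] for s in sums}
    return [
        [value - means[scanner][j] for j, value in enumerate(row)]
        for row, scanner in zip(features, scanners)
    ]

# Invented values (NOT study data): scanner B reads about 10 units higher.
features = [[1.0, 10.0], [3.0, 14.0], [11.0, 20.0], [13.0, 24.0]]
scanners = ["A", "A", "B", "B"]
centred = centre_by_scanner(features, scanners)
print(centred)
```

One caveat with this naive approach is that if diagnoses are unevenly distributed across scanners, centring also removes part of the genuine diagnostic signal.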

However, this highlights that heterogeneity in how neurological data are acquired may limit replicability of this finding, and arguably any future fMRI finding from any research group. Even though measurement differences were accounted for in this study, it does suggest that the use of different fMRI equipment, and potentially different data acquisition protocols or pre-processing software, in future research may limit the generalisability of findings between studies. If every fMRI measurement may give rise to slightly different sets of data unrelated to the disease, how can we reliably reproduce these studies in different populations?

Both MDD and BD are heterogeneous disorders, with patients from a range of different demographic backgrounds. The need to detect a disease-specific signal within such variability (age, sex, ethnicity, healthcare service provision, country of residence etc.), in addition to variability derived from scanner heterogeneity, limits the potential impact of this work.

The authors made significant efforts to understand and correct the limitations of this work, but variability in fMRI methods and patient demographics may still limit replicability, generalisability, and the overall impact of this work.

Implications for practice

My main consideration when reading papers like this is that whilst understanding the potential neurobiological correlates of psychiatric disorders is a valuable pursuit, they tend to end up at the same place – some of the results match previous literature, some results conflict, and there is so much heterogeneity in the methods of previous approaches that the results may not even be directly comparable anyway. fMRI investigation for clinical neuropsychiatry seems to be particularly vulnerable to this limitation, where we see significant variability in the way these data are collected, handled, and analysed. Establishing reproducibility frameworks in cognitive neuroscience could account for this; the challenges and considerations of this are nicely described in this paper (Botvinik-Nezer, R. & Wager, T. D., 2023).

I would argue that the implicit goal of studies that apply prediction inferentially (i.e., what can the things which predict X tell us about X), especially in the case of neurobiological data and psychiatric diagnoses, is to find something which can serve as a biomarker of that disease state. Despite decades of research into the neurochemistry and neurobiology of mental health disorders, there are no known neural correlates of psychiatric disease that can reliably be used to identify or diagnose any mental health condition in the absence of clinical data. In this study, resting-state fMRI (rs-fMRI) features differentiate MDD from BD with an accuracy of 66.2%. Whilst this performance is better than chance (the model has learned something from the data), it is still nowhere near accurate enough to suggest that the predictive features are reliable ‘signs’ of the disease that point accurately to the psychopathology.

As the authors mention, previous studies in this area provide inconsistent and quite varied results, and other ML applications in this area have suffered from small sample sizes and poor validation methodologies, with others vulnerable to confounding factors. In contrast to this, the authors also note that studies that have larger sample sizes (n≥100) may also be vulnerable to poor performance due to ‘larger and more heterogenous validation sets’, implying that previous models have lower generalisability.

Due to such stark variability in fMRI measurement, preprocessing, patient groups, eligibility criteria, ML training protocols, and sample size in these studies, it is hard to know at what point we will develop a robust evidence base. As stated previously, there are methodological ideas that can tackle variability in this space, but care must be taken with the assumption that applying ML or other artificial intelligence techniques to neuroimaging data can or will lead to a paradigm shift in how we understand psychiatric disease.

Machine learning offers promise, but without reproducibility frameworks and reliable biomarkers, we must be cautious in assuming that AI techniques applied to neuroimaging will lead to a paradigm shift in how we understand psychiatric disease.

Statement of interests

None to declare.

Links

Primary paper

Calesella, F. et al. Differences in resting-state functional connectivity between depressed bipolar and major depressive disorder patients: A machine learning study. Eur Neuropsychopharmacol 97, 28–37 (2025). DOI: 10.1016/j.euroneuro.2025.05.011

Other references

McGrath, J. J. et al. Comorbidity within mental disorders: a comprehensive analysis based on 145 990 survey respondents from 27 countries. Epidemiol Psychiatr Sci 29, e153 (2020).

Marx, W. et al. Major depressive disorder. Nat Rev Dis Primers 9, 44 (2023).

Bipolar Disorder – National Institute of Mental Health (NIMH). https://www.nimh.nih.gov/health/publications/bipolar-disorder

Vernon, A. C. et al. Contrasting Effects of Haloperidol and Lithium on Rodent Brain Structure: A Magnetic Resonance Imaging Study with Postmortem Confirmation. Biological Psychiatry 71, 855–863 (2012).

Botvinik-Nezer, R. & Wager, T. D. Reproducibility in Neuroimaging Analysis: Challenges and Solutions. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging 8, 780–788 (2023).