Mapping the prodrome of severe mental disorders

Severe mental disorders (SMDs) such as bipolar disorder, unipolar mood disorders, and psychosis are thought to affect around half a million people in the United Kingdom (Public Health England, 2018). People with severe mental illness are particularly vulnerable to a wide range of stressors, such as difficulties finding safe and secure housing (Homeless Link, 2024), problems with employment and joblessness (Chilman et al., 2024), and social and relationship problems (Public Health England, 2018). They also face challenges accessing healthcare despite being at higher risk of physical health problems than the general population (Grudniewicz et al., 2022) and, shockingly, have a life expectancy 20 years shorter than the general population.

The pathways that lead to the diagnosis of SMDs are complex and different for each individual. Despite these differences, it’s thought that there are potentially transdiagnostic similarities between patients in the time period before symptoms become most severe, known as the prodrome. The prodrome can last anywhere from a few weeks to a few years, and can be thought of as the time when early warning signs and signals appear.

It is important to learn more about the signs of SMD in this period before diagnosis, as acting on those warning signs may stop symptoms escalating. Intervening on specific symptoms or behaviours may reduce the risk of worsening mental health.

A study by Arribas and colleagues (2025) modelled key signs and symptoms from the SMD prodrome as a network of interconnecting factors. The method they used is called temporal network analysis, which we have blogged about before. The models used clinical notes that had been processed by a series of natural language processing (NLP) algorithms, a machine learning (ML) approach that allows computers to interpret and quantify unprocessed human speech and text.

This type of analysis can tell us a) how signs and symptoms are connected to each other, and b) which symptoms are the most important to target early. Researchers can estimate which symptoms can predict other symptoms by being strongly connected to them, and importantly, which strongly connected symptoms appear first in the pre-diagnosis time period.

You can think of this approach as being like a weather forecast app: when low pressure arrives, there is an X% chance of rain. You can therefore prepare accordingly by leaving the house with an umbrella, or changing your plans to avoid getting rained on.
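To make the idea of one symptom 'predicting' another more concrete, here is a minimal sketch in Python using simulated data. It is not the model used by Arribas and colleagues; it is a simple lagged correlation, and the two series, the 0.8 effect size, and the function names are all assumptions made up for illustration. The study itself pooled many patients over a handful of time points, whereas this toy uses one long series.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def lagged_correlation(source, target, lag=1):
    """Correlation between source at time t and target at time t + lag."""
    return pearson(source[:-lag], target[lag:])


random.seed(0)

# Hypothetical toy data: 'anxiety' is random noise, and 'cannabis_use' at each
# time step follows the previous step's anxiety plus some noise of its own.
anxiety = [random.gauss(0, 1) for _ in range(200)]
cannabis_use = [0.0] + [0.8 * a + random.gauss(0, 0.5) for a in anxiety[:-1]]

forward = lagged_correlation(anxiety, cannabis_use)   # anxiety -> later cannabis use
backward = lagged_correlation(cannabis_use, anxiety)  # cannabis use -> later anxiety

print(f"anxiety -> cannabis use: {forward:.2f}")
print(f"cannabis use -> anxiety: {backward:.2f}")
```

Because the simulated data only flow in one direction, the forward value comes out strong and the backward value close to zero. That asymmetry is what a one-directional edge in a temporal network is meant to capture.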

A complex web or network

Network analysis can help us understand how early signs and symptoms of mental illness are connected and which to target.

Methods

The data for these analyses was provided by a secondary mental health care service based in South London: the South London and Maudsley (SLaM) National Health Service Foundation Trust. Electronic Health Record (EHR) data was extracted from SLaM using the Clinical Record Interactive Search (CRIS) tool, a library of NLP algorithms that create data points from free text clinical information (Stewart et al., 2009).

The researchers gathered data on SLaM patients’ age, gender, ethnicity, and medication usage, and then used the CRIS tool to extract information on the sorts of symptoms a person might experience in the severe mental illness prodrome period. These included things like hallucinations, sleep disturbance, and paranoia, as well as the use of substances like tobacco, cocaine, and cannabis.

The study looked back at the two years before patients received a diagnosis of either unipolar mood disorder (UMD), bipolar disorder, or psychotic disorder, to collect information on these signs and symptoms in this time period. These data were then modelled as networks of interconnecting factors; first across the entire sample (known as a ‘transdiagnostic’ approach), and then within each specific diagnosis group.

Results

Given the complexity of the findings, this blog focuses on those related to the primary (‘SMD’) model. Readers who are interested in findings for the individual diagnostic groups, or in the detailed statistics, should refer to the main paper.

6,462 participants had enough retrospective data to be included in the final study population. There weren’t any significant clinical or sociodemographic differences between participants who had enough data to be included in the study and those who didn’t.

There were 61 prodromal features initially included in the analyses, but 38 of them had to be excluded due to the variables having very low variance. This means that the variables’ values are almost all the same and cannot be statistically analysed. A network model of the entire sample (the primary model) was based on the remaining 23 prodromal features, using data from 6 timepoints.
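As a rough illustration of what a low-variance exclusion looks like, here is a small Python sketch. The feature names, the 0/1 coding (whether a feature was recorded for each patient), and the 0.01 threshold are all hypothetical; the paper's actual exclusion rule may well differ.

```python
from statistics import pvariance


def drop_low_variance(features, threshold=0.01):
    """Split features into those kept and those dropped for low variance.

    `features` maps a feature name to a list of numeric values, one per
    patient. The threshold is an illustrative assumption, not the study's.
    """
    kept = {}
    dropped = []
    for name, values in features.items():
        if pvariance(values) < threshold:
            dropped.append(name)
        else:
            kept[name] = values
    return kept, dropped


# Hypothetical data: 1 = feature recorded for that patient, 0 = not recorded.
features = {
    "sleep_disturbance": [1] * 60 + [0] * 140,
    "tearfulness": [1] * 90 + [0] * 110,
    "rare_feature": [1] * 1 + [0] * 199,  # almost every value is identical
}

kept, dropped = drop_low_variance(features)
print("kept:", sorted(kept))
print("dropped:", dropped)
```

A feature recorded for only one patient in 200 carries almost no information for a model to work with, so it is dropped. In the study, this kind of exclusion removed 38 of the 61 candidate features (about 62%), leaving 23.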

They found that this model performed well on metrics like ‘recoverability’ and ‘robustness’, which measure how reliable and valid a model is. Within this network, ‘tearfulness’ was the most consistent feature over time. There was a positive relationship running in one direction from anxiety to cannabis use, meaning that anxiety predicted cannabis use. There were also bidirectional positive relationships between pairs of features, such as aggression and hostility, delusional thinking and hallucinations, and aggression and agitation.

The researchers were also interested in which nodes in the network were the most ‘central’, which is a way of describing how connected a feature is to other features in the model. A feature, or ‘node’, that is highly central is particularly important within the network. Features that are highly connected to other features may be good targets for intervention. This is because intervening on them not only alleviates that symptom, but may also interrupt the network of symptoms and reduce the risk of connected symptoms triggering each other. Features that had high centrality in the primary network included aggression, poor insight, and delusional thinking.
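One simple way to picture centrality is 'strength': add up the weights of all the connections leaving or entering a node. The Python sketch below does this for a toy directed network. The edge weights are invented for illustration and are not values from the paper, and the study may have used different or additional centrality measures.

```python
def strength_centrality(edges):
    """Return (out_strength, in_strength) for a weighted directed network.

    `edges` maps (source, target) pairs to an edge weight. Strength is the
    sum of absolute weights leaving (out) or entering (in) each node.
    """
    out_strength = {}
    in_strength = {}
    for (source, target), weight in edges.items():
        out_strength[source] = out_strength.get(source, 0.0) + abs(weight)
        in_strength[target] = in_strength.get(target, 0.0) + abs(weight)
        # Make sure every node appears in both dictionaries.
        out_strength.setdefault(target, 0.0)
        in_strength.setdefault(source, 0.0)
    return out_strength, in_strength


# Hypothetical edge weights, loosely echoing the relationships described above.
edges = {
    ("aggression", "hostility"): 0.40,
    ("hostility", "aggression"): 0.30,
    ("aggression", "agitation"): 0.25,
    ("agitation", "aggression"): 0.20,
    ("anxiety", "cannabis_use"): 0.15,
}

out_strength, in_strength = strength_centrality(edges)
most_central = max(out_strength, key=out_strength.get)
print("highest out-strength:", most_central)
```

In this toy network aggression has the largest out-strength, because it has the strongest outgoing links to other features. That is the sense in which a central node is a candidate target for intervention.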

Communities of prodromal symptoms

An important finding was that nodes within networks clustered together in what are referred to as ‘communities’. These communities are made up of features that are closely connected to each other and only weakly connected to other features. Within the network analysis of the whole sample, three distinct communities of symptoms emerged:

  1. Delusional thinking, hallucinations, paranoia
  2. Aggression, cannabis use, cocaine use, hostility
  3. Aggression, agitation, hostility

The finding that cannabis and cocaine use have transdiagnostic risk relevance is particularly important, as these factors are directly intervenable.
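For readers curious about how clusters like these can be found, here is a deliberately crude Python sketch. It keeps only the connections above a cut-off and then groups together features that remain linked. This is a stand-in for proper community detection, not the algorithm used in the study; the weights and the 0.1 cut-off are hypothetical, and unlike the communities listed above, groups found this way cannot overlap.

```python
def communities_by_threshold(weights, threshold):
    """Group nodes that stay linked once weak edges are ignored.

    `weights` maps undirected (node_a, node_b) pairs to an edge weight.
    Returns a list of groups, each a sorted list of node names.
    """
    neighbours = {}
    for (a, b), weight in weights.items():
        neighbours.setdefault(a, set())
        neighbours.setdefault(b, set())
        if abs(weight) >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen = set()
    groups = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first search to collect everything reachable from `start`.
        group = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical edge weights for illustration only.
weights = {
    ("delusional_thinking", "hallucinations"): 0.50,
    ("hallucinations", "paranoia"): 0.40,
    ("cannabis_use", "cocaine_use"): 0.45,
    ("aggression", "hostility"): 0.50,
    ("aggression", "agitation"): 0.40,
    ("cocaine_use", "aggression"): 0.05,  # weak link, below the cut-off
}

groups = communities_by_threshold(weights, threshold=0.1)
for group in groups:
    print(group)
```

With these made-up weights, the weak link between cocaine use and aggression is ignored, so the substance use features and the aggression features fall into separate groups.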

A hand holding a phone which displays a map

Understanding the direction of risk factor relationships helps us to understand the dynamic nature of illness vulnerability.

Conclusions

The authors concluded that their findings shed light on dynamic relationships between features of the severe mental illness prodrome, which are transdiagnostic across multiple diagnoses. They say that:

…understanding these dynamics can be used to identify risk states to prevent the progression to SMD onset.

The authors then explain how this research could influence the development of at-risk psychometric tools within specific groups. An enhanced understanding of how specific disorders present will help people to receive the right help faster, and may even halt the progression of their symptoms into more serious illness.

A danger of fall sign in a snowy landscape

This analysis helps us understand how risk presents itself in the pre-diagnosis stages of severe mental illness.

Strengths and limitations

This study is exciting and novel in its use of real-world secondary care data, and in its combination of two cutting-edge data-driven techniques.

Something to consider in this analysis is the scope of the NLP algorithms used to generate the analysable clinical data. The prodromal features analysed were limited to those whose NLP algorithms had the highest precision (essentially, the most reliable and accurate ones). This restricts which prodromal symptoms could be evaluated, since each one depends on the quality of the algorithm that derives it from free text, and it potentially limits the exploratory scope of the analysis.

NLP methods are super exciting, as they give us tools to take the unstructured clinical notes written by a healthcare provider in an appointment and turn them into clean, numerical data points. This makes the task of comparing people and understanding groups of patients much easier, especially when you’re interested in individual symptoms or behaviours. In my opinion, this is one of the most exciting uses of ML in healthcare, as it unlocks a huge amount of information that would otherwise have been bound up in inaccessible data formats. However, this does not mean that the data generated by these methods are always usable. This analysis is a good example of how such data are just as vulnerable to statistical issues as any other sort of data: over 50% of the candidate features (38 of 61) were unanalysable because of very low variance. We can see from this that preliminary explorations of data can have a huge impact on the direction of an analysis.

There is also likely to be considerable variation in the quality of clinical notes taken between hospitals and healthcare providers, which means that there will be variability in the way that raw data are recorded. Whilst the CRIS tool used in this analysis has been validated through its use in other studies (Stewart et al., 2009), scaling tools like this up to the wider population across different NHS silos or trusts may be challenging.

A computer screen displaying code

The use of NLP algorithms to extract data from unstructured clinical notes is promising, but their widespread application will depend on accuracy and tenability.

Implications for practice

This study shows that there are key transdiagnostic factors which, if intervened on in a timely fashion, may be able to halt the progression of severe mental disorders. It also shows that whilst some prodromal features are transdiagnostic, with minimal differences in their frequency across the disorders included in the analysis, the way that they interact with other factors differs between diagnostic populations. This information could tell us more about the mechanisms of disease risk in a diagnosis-specific way. Transdiagnostic approaches can therefore help us to see which diagnostic boundaries are poorly defined and which are clearly defined.

A question to consider for clinical practice is which of the prodromal features are actually intervenable. Substance use is clearly a highly intervenable construct: tobacco, alcohol, and substance use issues can be dealt with in both primary and secondary care, and there are support services available for people struggling with addiction or wishing to make healthier lifestyle choices in relation to substance use. Vulnerability to addiction is complex, however, and many intersecting factors influence an individual’s capacity to reduce their substance use, such as their support networks, their social networks and relationships, and where they live.

Approaches that seek to intervene on these constructs to destabilise prodromal networks will also need to consider the complex range of factors that lie outside the scope of this analysis. An interesting parallel analysis considering this would be to model the network of factors that influence substance use decrease or quitting behaviours in people who live with addiction vs. those recovering from it, to see which additional unmeasured features may affect how intervenable those constructs may be. As discussed earlier, there are knowns and unknowns about the limitations of extracting data using NLP approaches, so it may be the case that it’s not possible to analyse everything that may be clinically relevant.

For a clinical context, it’s important to consider what implementation of this might look like in practice: is the goal to intervene in the most highly central or interconnected factors, and then measure the minimisation of potential escalation to diagnosis in secondary care? Or to map an individual’s network and see how the network changes when the most influential features are intervened on?

Mapping complex networks is a promising avenue for understanding complex mental illness states, but many factors still influence how tractable this approach is in the real (and messy) clinical world.

Statement of interests

I have no personal involvement in this study, but have worked briefly with one co-author of the study on an unrelated project.

Links

Primary paper

Arribas M, Barnby JM, Patel R. et al (2025) Longitudinal evolution of the transdiagnostic prodrome to severe mental disorders: a dynamic temporal network analysis informed by natural language processing and electronic health records. Molecular Psychiatry, 30, 2931-2942.

Other references

Public Health England (2018) Health matters: reducing health inequalities in mental illness. GOV.UK

Homeless Link (2024) Mental health and homelessness: an inextricable link?

Chilman N, Laporte D, Dorrington S. et al (2024) Understanding social and clinical associations with unemployment for people with schizophrenia and bipolar disorders: large-scale health records study. Social Psychiatry and Psychiatric Epidemiology, 59, 1709–1719.

Grudniewicz A, Peckham A, Rudoler D. et al (2022) Primary care for individuals with serious mental illness (PriSMI): protocol for a convergent mixed methods study. BMJ Open, 12, e065084.

Stewart R, Soremekun M, Perera G. et al (2009) The South London and Maudsley NHS Foundation Trust Biomedical Research Centre (SLAM BRC) case register: development and descriptive data. BMC Psychiatry, 9, 51.

Roberts E, Wessely S, Chalder T. et al (2016) Mortality of people with chronic fatigue syndrome: a retrospective cohort study in England and Wales from the South London and Maudsley NHS Foundation Trust Biomedical Research Centre (SLaM BRC) Clinical Record Interactive Search (CRIS) Register. Lancet, 387, 1638–1643.
