# Associations between aversive learning processes and transdiagnostic psychiatric symptoms in a general population sample

Aug 21, 2020

### Ethics

This research was approved by the University College London research ethics committee (reference 9929/003) and complied with all relevant ethical regulations. All participants provided informed consent by completing a form including checkboxes that were clicked to indicate agreement with various aspects of the study and overall consent to participate, and were compensated financially for their time at a rate of at least £6 per hour.

### Participants

We recruited 400 participants through Prolific33. Subjects were selected based on being aged 18–65 and having at least a 90% approval rate across studies they had previously participated in. As described in our preregistration, we used a precision-based stopping rule to determine our sample size, stopping at the point at which either the 95% highest posterior density interval (HPDI) for all effects in our regression model reached 0.15 (checking with each 50 subjects recruited) or we had recruited 400 subjects. The precision target was not reached, and so we stopped at 400 subjects.
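The stopping rule described above can be sketched as a small function. This is an illustration only: the function name and arguments are hypothetical, and it assumes that "reached 0.15" refers to the width of each 95% HPDI falling to 0.15 or below.

```python
def should_stop(hpdi_widths, n_recruited, target_width=0.15, max_n=400, batch=50):
    """Return True when recruitment should stop under the precision-based rule.

    hpdi_widths: widths of the 95% HPDIs for every effect in the regression model
    (assumption: "reaching 0.15" means width <= 0.15).
    """
    # Hard cap on the sample size.
    if n_recruited >= max_n:
        return True
    # Precision is only checked after each complete batch of subjects.
    if n_recruited % batch != 0:
        return False
    # Stop only if every effect is estimated with the target precision.
    return all(width <= target_width for width in hpdi_widths)
```

In the study the precision target was never met, so a rule of this kind would have returned `True` only once the cap of 400 subjects was reached.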

Traditional lab-based threat learning tasks typically use aversive stimuli such as electric shocks as outcomes to be avoided. As it is not possible to use these stimuli online, we developed a game-based task in which subjects’ goal was to avoid negative outcomes. While no primary aversive stimuli were used, and subjects received no actual monetary reward, there is an extensive literature showing that video games without such outcomes evoke strong positive and negative emotional experiences47,48,49,50,51, making this a promising method for designing an aversive learning task. In this game, participants were tasked with flying a spaceship through asteroid belts. Subjects were able to move the spaceship along the Y-axis alone, which resulted in a one-dimensional behavioural output. Crashing into asteroids diminished the spaceship’s integrity by 10%. The spaceship’s integrity slowly increased over the course of the task; however, if enough asteroids were hit, the integrity reduced to zero and the game finished. In this eventuality, subjects were able to restart and continue where they left off. The overarching goal was to maximise the number of points scored; points accumulated continuously for as long as the game was ongoing and reset if the spaceship was destroyed. Subjects were shown the current integrity of the spaceship by a bar displayed in the corner of the screen, along with a display of their current score.

Crucially, the location of safe spaces in the asteroid belts could be learned, and learning facilitated performance as it allowed correct positioning of the spaceship prior to observing the safe location. The task was designed such that without such pre-emptive positioning it was near impossible to successfully avoid the asteroids, thus encouraging subjects to learn the safest positions. Holes in the asteroids could appear either at the top or bottom of the screen (Fig. 1a), and the probability of safety associated with either location varied independently over the course of the task. Thus, it was possible to learn the safety probability associated with each safety zone and adapt one’s behaviour accordingly. The probability of each zone being safe was largely independent from the other (so that observing safety in one zone did not necessarily indicate the other was dangerous), although at least one zone was always safe on each trial. This was important, because if outcomes were entirely symmetric (i.e. safety in one zone indicated danger in the other), we would be unable to determine the extent to which value updating was driven by safety versus danger. Thus, our task aimed to largely dissociate learning from threat and safety, as outcomes are not entirely symmetric.

Trials were designed such that one option had a 90% or 10% chance of being safe for the duration of between 20 and 80 trials, subject to the condition that on a particular trial one zone had to be safe (so that the subject had an opportunity to avoid the asteroids). This design feature meant that at any time either both, or just one, zone had a high safety likelihood. Safety probability was designed to fluctuate relatively rapidly to ensure that uncertainty fluctuated continuously over the course of the task. The probabilistic nature of the task ensured that behaviour was not straightforwardly dictated by the outcomes observed in the task. For example, encountering danger did not necessarily mean subjects should change their position on the following trial, as this outcome could be a chance event rather than signifying the chosen zone is no longer the safest.
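A minimal sketch of how a schedule with these properties could be generated is given below. The exact generation procedure is not described here, so several details are assumptions: the function name, drawing one joint state per block (which guarantees that at least one zone has a high safety probability, but makes the two zones change at the same time), and rescuing a randomly chosen zone on trials where both zones would otherwise be dangerous.

```python
import numpy as np

def make_schedule(n_trials=269, seed=0):
    """Generate per-trial safety probabilities and outcomes for two zones."""
    rng = np.random.default_rng(seed)
    # Allowed (zone 0, zone 1) safety probabilities: never both low (assumption).
    states = np.array([[0.9, 0.9], [0.9, 0.1], [0.1, 0.9]])
    p_safe = np.empty((n_trials, 2))
    t = 0
    while t < n_trials:
        length = int(rng.integers(20, 81))  # block length between 20 and 80 trials
        p_safe[t:t + length] = states[rng.integers(len(states))]
        t += length
    # Sample outcomes: True = safe (hole), False = danger (asteroids).
    safe = rng.random((n_trials, 2)) < p_safe
    # Enforce the constraint that at least one zone is safe on every trial.
    rows = np.flatnonzero(~safe.any(axis=1))
    safe[rows, rng.integers(0, 2, size=rows.size)] = True
    return p_safe, safe.astype(int)
```

Because every subject saw identical outcomes in the actual task, a fixed seed (or a stored schedule) would be needed to reproduce that property.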

Participants also completed a control task that required avoidance that was not dependent on learning, enabling us to control for general motor-related avoidance ability in further analyses (described in Supplementary Methods). After completing the task, subjects were asked to provide ratings indicating how anxious the task made them feel and how motivated they were to avoid the asteroids, using visual analogue scales ranging from 0 to 100.

### Inclusion and exclusion criteria

We included subjects based on their age (18–65 years) and having a 90% prior approval rate on Prolific. We elected a priori to exclude subjects with limited response variability (indicated by a standard deviation of their positions below 0.05) so as to remove subjects who did not move the spaceship, and subjects who had missing data. However, no subject met these exclusion criteria.

### Behavioural data extraction

For analysis, we treated each pass through an asteroid belt as a trial, giving 269 trials in total. As a measure of behaviour, we extracted the mean Y position across the 1 s prior to observing the asteroid belt, representing where subjects were positioning themselves in preparation for the upcoming asteroid belt. This Y position was used for subsequent model fitting. On each trial, the outcome for each zone was regarded as danger if asteroids were observed (regardless of whether they were hit by the subject) or safety if a hole in the asteroid belt was observed.
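The position measure can be illustrated with a short sketch. The data layout is an assumption: a vector of sample timestamps in seconds, a matching vector of Y positions, and the time at which the belt became visible. The function name is hypothetical.

```python
import numpy as np

def pre_belt_position(times, y, belt_onset, window=1.0):
    """Mean Y position in the `window` seconds before the belt is observed."""
    times = np.asarray(times, dtype=float)
    y = np.asarray(y, dtype=float)
    # Select samples in the half-open interval [belt_onset - window, belt_onset).
    mask = (times >= belt_onset - window) & (times < belt_onset)
    if not mask.any():
        return float("nan")  # no samples recorded in the window
    return float(y[mask].mean())
```

Applying this once per asteroid belt would yield one position value per trial.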

### Computational modelling of behaviour

Our modelling approach focused on models that allowed the quantification of subjective uncertainty. To this end, we modelled behaviour using approximate Bayesian models that assume subjects estimate safety probability using a beta distribution. This approach is naturally suited to probability estimation tasks, as the beta distribution is bounded between zero and one, and provides a measure of uncertainty through the variance of the distribution. While certain reinforcement learning formulations can achieve similar uncertainty-dependent learning and quantification of uncertainty, we chose beta models as they have an advantage of being computationally simple. Empirically, these models have been used successfully in previous studies to capture value-based learning61, where they explain behaviour in aversive learning tasks better than commonly used reinforcement learning models15,62, a pertinent characteristic in the current task.

The basic premise underlying these models is that evidence for a given outcome is dependent on the number of times this outcome has occurred previously. For example, evidence for safety in a given location should be highest when safety has been encountered many times in this location. This count can be represented by a parameter A, which is incremented by a given amount every time safety is encountered. Danger is represented by a complementary parameter B. The balance between these parameters provides an indication of which outcome is most likely. Meanwhile, the overall number of outcomes counted influences the variance of the distribution and hence the uncertainty about this estimate. Thus, uncertainty is highest when few outcomes have been observed. The exact amount by which A and B are updated after every observed outcome can be estimated as a free parameter (here termed τ), and we can build asymmetry in learning into the model, so that learning about safety and danger have different rates, by allowing updates for A and B to take on different values (here termed τ+ and τ−).

Such a model is appropriate in stationary environments, when the probability of a given outcome is assumed to be constant throughout the experiment. However, in our task the probability of safety varied, and so it was necessary to build a forgetting process into the model. This is achieved by incorporating a decay (represented by parameter λ) which diminishes the current values of A and B on every trial. The result of this process is akin to reducing the number of times each outcome has been observed, and maintains the model’s ability to update in response to incoming evidence. It would also be possible to build asymmetry into the model here, where subjects could forget about positive and negative outcomes at different rates. However, testing this model in pilot data revealed that separate decay rates for each valence were not recoverable. Estimates for A and B are therefore updated on each trial (t) according to the following equations for both safety zones, independently (termed X and Y here). Both zones are updated on every trial, as subjects saw the outcome associated with both simultaneously. This formed the basis of all the probabilistic models tested:

$$A_{t+1}^X = \left(1 - \lambda\right) \cdot A_t^X + \mathrm{outcome}_t^X \cdot \tau^{+} \cdot W.$$

(1)

$$B_{t+1}^X = \left(1 - \lambda\right) \cdot B_t^X + \left(1 - \mathrm{outcome}_t^X\right) \cdot \tau^{-} \cdot W.$$

(2)

We also observed in pilot data that subjects tended to be influenced more by outcomes occurring in the zone they had previously chosen, an effect likely due to attention. On this basis, we incorporated a weighting term (W in the equations above) that allowed the outcome of the unchosen option to be downweighted by an amount determined by an additional free parameter, ω.

$$W_{t+1}^X = \begin{cases} 1, & \text{if chosen} \\ \omega, & \text{if unchosen} \end{cases}.$$

(3)

We can calculate the estimated safety probability for each zone (P) by taking the mean of this distribution:

$$P_{t+1}^X = \frac{A_{t+1}^X}{A_{t+1}^X + B_{t+1}^X}.$$

(4)

Similarly, we can derive a measure of uncertainty on each trial by taking the variance of this distribution.

$$\sigma_{t+1}^X = \frac{A_{t+1}^X \cdot B_{t+1}^X}{\left(A_{t+1}^X + B_{t+1}^X\right)^2 \cdot \left(A_{t+1}^X + B_{t+1}^X + 1\right)}.$$

(5)

In order to fit our model to the observed behaviour, we require an output that represents the position of the spaceship on the screen. This position (pos) was calculated based on the safety probability of the two safety zones, such that the position was biased towards the safest location and was nearer the centre of the screen when it was unclear which position was safest.

$$\mathrm{pos}_{t+1} = \frac{\left(P_{t+1}^X - P_{t+1}^Y\right) + 1}{2}.$$

(6)
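Equations 1–6 can be sketched together as a single forward pass over trials. This is an illustrative NumPy implementation, not the code used for fitting: the function name, the starting values `a0 = b0 = 1`, and passing the chosen zone in as data (index 0 for X, 1 for Y) are all assumptions.

```python
import numpy as np

def simulate_beta_model(outcomes, chosen, tau_pos, tau_neg, decay, omega,
                        a0=1.0, b0=1.0):
    """Run the asymmetric leaky beta model over a sequence of trials.

    outcomes: (n_trials, 2) array, 1 = safety, 0 = danger; columns are zones X, Y.
    chosen:   (n_trials,) array of chosen zone indices (0 = X, 1 = Y).
    Returns safety probability, variance (uncertainty) and predicted position,
    each holding the estimate for trial t + 1 after observing trial t.
    """
    outcomes = np.asarray(outcomes, dtype=float)
    chosen = np.asarray(chosen, dtype=int)
    n_trials = outcomes.shape[0]
    A = np.full(2, a0)  # evidence for safety (assumed starting value)
    B = np.full(2, b0)  # evidence for danger (assumed starting value)
    prob = np.empty((n_trials, 2))
    var = np.empty((n_trials, 2))
    pos = np.empty(n_trials)
    for t in range(n_trials):
        # Eq. 3: weight of 1 for the chosen zone, omega for the unchosen zone.
        W = np.full(2, omega)
        W[chosen[t]] = 1.0
        # Eqs. 1 and 2: decay old evidence, then add weighted new evidence.
        A = (1 - decay) * A + outcomes[t] * tau_pos * W
        B = (1 - decay) * B + (1 - outcomes[t]) * tau_neg * W
        total = A + B
        prob[t] = A / total                            # Eq. 4: beta mean
        var[t] = A * B / (total ** 2 * (total + 1))    # Eq. 5: beta variance
        pos[t] = (prob[t, 0] - prob[t, 1] + 1) / 2     # Eq. 6: position
    return prob, var, pos
```

With equal safety estimates for both zones the predicted position is 0.5, the centre of the screen, and it moves towards 1 as zone X becomes relatively safer.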

Further models elaborated on this basic premise, and full details are provided in Supplementary Methods. For completeness, we also tested two reinforcement learning models, a Rescorla–Wagner model and a variant of this model with different learning rates for better and worse than expected outcomes63, both of which are described in Supplementary material. However, we focus on the probabilistic models due to their ability to represent uncertainty naturally; our primary aim was not to differentiate between probabilistic and reinforcement learning models, but to use previously validated models to provide insights into the relationship between aversive learning, uncertainty, and psychopathology.

Models were fit with a hierarchical Bayesian approach using variational inference implemented in PyMC3, through maximising the likelihood of the data given a reparametrised beta distribution with a mean provided by the model and a single free variance parameter. Model fit was assessed using the Watanabe–Akaike Information Criterion (WAIC)64, an index of model fit designed for Bayesian models that accounts for model complexity. Parameter distributions were visualised using raincloud plots65.
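To illustrate what a beta likelihood parameterised by a mean and a variance involves, the sketch below converts the two quantities into the standard shape parameters and evaluates the log density of an observed position. The specific mean-variance parameterisation and the function names are assumptions; the reparametrisation used in the fitted models may differ in detail.

```python
import math

def beta_params_from_mean_var(mu, var):
    """Convert a mean and variance into beta shape parameters (alpha, beta)."""
    if not 0.0 < mu < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if not 0.0 < var < mu * (1.0 - mu):
        raise ValueError("variance must be positive and below mu * (1 - mu)")
    kappa = mu * (1.0 - mu) / var - 1.0  # equals alpha + beta
    return mu * kappa, (1.0 - mu) * kappa

def beta_logpdf(x, alpha, beta):
    """Log density of a beta distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return log_norm + (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log(1.0 - x)
```

Summing `beta_logpdf` over trials, with the model-predicted position as the mean, gives the log-likelihood that such a fit would maximise. Observed positions of exactly 0 or 1 would need to be nudged inside the interval first.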

### Measures of psychiatric symptoms

Our first set of hypotheses focused on state/trait anxiety and intolerance of uncertainty. These were measured using the State Trait Inventory of Cognitive and Somatic Anxiety (STICSA)66 and the Intolerance of Uncertainty Scale (IUS)67 respectively. We also wished to examine how behaviour in our task related to the three transdiagnostic factors identified by Gillan et al.6, based on factor analysis of a range of psychiatric measures (Supplementary Table 1). To measure these factors more efficiently, we developed a reduced set of questions that provided an accurate approximation of the true factor scores (Supplementary Fig. 3), details of which are provided in Supplementary Methods. We also performed analyses using an approximation of clinical anxiety status, results of which are shown in Supplementary Fig. 8.

### Regression models

Bayesian regression models were used to investigate relationships between behaviour and psychiatric measures, predicting each behavioural measure of interest from the psychiatric measures. Our dependent variables were parameters and quantities derived from our model, which represented the way in which an individual learns about safety probability and how they estimate uncertainty. Specifically, we used the two update parameters from our model (τ+ and τ−, referring to the extent to which subjects update in response to safety and danger respectively) and the mean safety probability and uncertainty estimates across the task (generated by simulating data from the model with each subject’s estimated parameter values). Crucially, the fact that task outcomes were identical for every subject ensured these values were dependent only on the manner by which subjects learned about safety, not the task itself.

These models were constructed using Bambi68 and fit using Markov chain Monte Carlo sampling, each with 8000 samples, 2000 of which were used for burn-in. All models included age and sex as covariates, along with performance on our control task to account for non-learning-related avoidance ability. For analyses predicting state and trait anxiety and intolerance of uncertainty, we constructed a separate model for each variable due to the high collinearity between these measures. For analyses including the three transdiagnostic factors, these were entered into a single model. When reporting regression coefficients, we report the mean of the posterior distribution along with the 95% HPDI, representing the points between which 95% of the posterior distribution’s density lies. All analyses were specified in our preregistration. We did not correct for multiple comparisons in these analyses as our approach uses Bayesian parameter estimation, rather than frequentist null hypothesis significance testing, and as such multiple comparison correction is unnecessary and incompatible with this method69.
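As an illustration of the interval reported with each coefficient, the following sketch computes a highest posterior density interval from a vector of posterior samples by finding the narrowest window that contains the requested share of samples. The function name is hypothetical, and the approach assumes a unimodal posterior.

```python
import numpy as np

def hpdi(samples, prob=0.95):
    """Narrowest interval containing `prob` of the posterior samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(prob * n))  # number of samples the interval must contain
    if k < 1 or k > n:
        raise ValueError("prob must be in (0, 1] and samples must be non-empty")
    # Width of every window of k consecutive sorted samples.
    widths = x[k - 1:] - x[:n - k + 1]
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + k - 1])
```

Unlike an equal-tailed interval, this window shifts towards the mode when the posterior is skewed.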

### PLS regression

To provide a data-driven characterisation of the relationship between task behaviour and psychiatric symptoms, and identify transdiagnostic components that are grounded in both self-report and behaviour, we used PLS regression to identify dimensions of covariance between individual questions and the measures derived from our modelling. We excluded the STICSA state subscale from this analysis, so that only trait measures were included. To ensure robustness of these results, we split our data into training and testing sets, made up of 75% and 25% of the data, respectively. To identify the appropriate number of components within the training set, we used a tenfold cross-validation procedure, fitting the model on 90% of the training data and evaluating its performance on the left-out 10%. The mean squared error of the model’s predictions was then averaged across test folds to provide an index of the model’s predictive accuracy with different numbers of components, using cross-validation to reduce the risk of overfitting.

Once the number of components was determined, we validated the model’s predictions by testing its predictive accuracy on the held-out 25% of the data. To provide a measure of statistical significance we used permutation testing, fitting the model on the training data 1000 times with shuffled outcome variables and then testing each fitted model on the held-out data, to assess its predictive accuracy when fitted on data where no relationship exists between the predictors and outcomes. This procedure provides a null distribution, from which we can then determine the likelihood of observing predictive accuracy at least as high as that found in the true data under the null hypothesis.
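The permutation procedure can be sketched in a model-agnostic way. The code below is an illustration under stated assumptions: `fit_predict` is a hypothetical callable that fits any model on training data and returns predictions for the test predictors, ordinary least squares is used as a simple stand-in for PLS regression, predictive accuracy is measured as mean squared error, and the p-value uses the common "+1" correction, which is not specified in the text.

```python
import numpy as np

def permutation_p_value(fit_predict, X_train, Y_train, X_test, Y_test,
                        n_perm=1000, seed=0):
    """Permutation test of held-out predictive accuracy (mean squared error)."""
    rng = np.random.default_rng(seed)
    Y_train = np.asarray(Y_train)
    Y_test = np.asarray(Y_test)

    def mse(pred):
        return float(np.mean((pred - Y_test) ** 2))

    # Accuracy of the model fitted on the true training outcomes.
    observed = mse(fit_predict(X_train, Y_train, X_test))
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffle training outcomes to break any predictor-outcome relationship,
        # then evaluate on the untouched held-out data.
        shuffled = Y_train[rng.permutation(len(Y_train))]
        null[i] = mse(fit_predict(X_train, shuffled, X_test))
    # Share of null fits predicting at least as well as the true fit.
    p = (1 + np.sum(null <= observed)) / (1 + n_perm)
    return observed, null, float(p)

def least_squares_fit_predict(X_train, Y_train, X_test):
    """Stand-in model (NOT PLS): ordinary least squares without an intercept."""
    coef, *_ = np.linalg.lstsq(X_train, Y_train, rcond=None)
    return X_test @ coef
```

To apply this to PLS regression, `least_squares_fit_predict` would be replaced by a function that fits a PLS model with the cross-validated number of components.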

Recent work has highlighted the risks inherent in PLS-like methods when used in high dimensional datasets37, namely that they can easily be overfit, resulting in solutions that do not generalise beyond the data used to fit the model. Our approach avoids these problems by evaluating the performance of our model on the 25% of the data that was held out from the model fitting stage.

### Preregistration

The main hypotheses and methods of this study were preregistered on the Open Science Framework [https://osf.io/jp5qn]. The data-driven PLS regression analysis was exploratory.

### Statistics and reproducibility

The reported results derive from a single experiment that was not replicated.

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.