All participants provided written informed consent before the start of each experiment. The Regional Ethics Review Board of Stockholm approved the studies. All methods were performed in accordance with the approved guidelines. The inclusion criteria were as follows: (1) age between 18 and 65 years old; (2) no history of severe psychiatric illness or neurological disorder; (3) normal or corrected-to-normal vision and hearing; (4) not wearing glasses during the experiment; and (5) understanding English (see below). These criteria were assessed during an initial interview. Sample sizes were based on similar previous studies (see “Introduction”) and our counterbalancing schemes. Data collection was finalized when the planned number of participants was reached. At the end of each experiment, the participants were debriefed and received compensation. All measures that were used are reported in the manuscript. Because the participants were of different nationalities, all experiments were conducted in English; the participants followed instructions without problems. The stroking procedure in Experiment I was performed by P.T., and in Experiments II and III, it was performed by J.F.
Thirty-three naïve adults participated (age: 25 ± 4; 4 left-handed; 15 females). Data from one participant were excluded due to a procedural error (same condition repeated twice).
The participants first rated how masculine or feminine they felt before experiencing any body perception manipulation (baseline; Fig. 1d). The main experiment consisted of four conditions: “synchronous opposite sex” (syncO), “synchronous same sex” (syncS), “asynchronous opposite sex” (asyncO), and “asynchronous same sex” (asyncS). Each condition lasted 3.5 min. During each condition, the participants lay on a bed with their heads tilted forward (~ 45°) and wore a head-mounted display (HMD; Oculus Rift Development Kit 2, Oculus VR, Menlo Park, CA, USA) so that they could not see their actual body. In the HMD, the participants watched prerecorded 3D videos of a stranger’s body, male or female, that was shown from a first-person perspective. The stranger’s body was continuously stroked on the thighs and abdomen, and the experimenter delivered synchronous (syncO, syncS) or asynchronous (1 s delayed; asyncO, asyncS) touches on the corresponding parts of the participant’s body (Fig. 1a). During each condition, there were three “knife threats” that occurred 1, 2, and 3 min after the beginning of each video (Fig. 1c,e). After each condition, the participants took off the HMD, filled out the illusion questionnaire (Fig. 1b) and rated how masculine or feminine they felt during the preceding session (Fig. 1d). The order of conditions was counterbalanced across the participants, and the whole experiment lasted ~ 30 min (Fig. 1e).
During filming, a male and a female lay still on a bed. The experimenter used a 90-cm-long stick with a white plastic ball (diameter 10 cm) attached to its end to deliver strokes to each model’s abdomen, left thigh, or right thigh. The duration of each stroke was 1 s, and each stroke covered ~ 20 cm of the model’s body. The time between the end of one touch and the onset of the next touch ranged between 3 and 5 s. The frequency of strokes was 12 times per minute. The order of strokes was pseudorandom (i.e., no more than two successive strokes of the same body part). Altogether, 36 strokes (12 to each body part) were delivered during each video. The videos were recorded with two identical cameras (GoPro HERO4 Silver, GoPro, Inc., San Mateo, CA, USA) placed parallel to each other (8 cm apart) just above the models’ heads. The recordings from both cameras were combined into a single frame using Final Cut Pro software (version 7, Apple Inc., Cupertino, CA, USA). Two versions of high-quality 3D videos were created: one for the male and one for the female body. Audio cues were then added to each video. These cues were either congruent with touches applied in the videos (same body parts, same onset, same duration) or delayed by 1 s. The experimenter listened to these cues during the experiment and applied touches accordingly. All other aspects were identical in the synchronous and asynchronous videos.
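The pseudorandom ordering constraint described above (12 strokes per body part, no more than two successive strokes on the same part) can be illustrated with a minimal Python sketch. It is not the procedure used to prepare the videos: the function names and the shuffle-and-reject strategy are assumptions introduced here for illustration only.

```python
import random


def longest_run(seq):
    """Length of the longest stretch of identical consecutive items."""
    longest = run = 1 if seq else 0
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def stroke_sequence(parts, per_part, max_run, rng):
    """Reshuffle until no body part is stroked more than max_run times in a row.

    Hypothetical helper: the source only states the constraint, not how the
    sequence was generated.
    """
    pool = [part for part in parts for _ in range(per_part)]
    while True:
        rng.shuffle(pool)
        if longest_run(pool) <= max_run:
            return list(pool)


# Experiment I values from the text: 3 body parts x 12 strokes, max run of 2.
sequence = stroke_sequence(
    ("abdomen", "left thigh", "right thigh"), 12, 2, random.Random(0)
)
```

Rejection sampling keeps every admissible ordering equally likely, at the cost of a few discarded shuffles.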
For each of the two videos, we recorded knife-threat events. During these events, a hand holding a knife entered the field of view from above and performed a stabbing movement toward the model’s body (Fig. 1c). The knife stopped just before hitting the body, changed direction (− 180°), and exited the field of view in the same way that it had entered. The whole event lasted 2 s. Great care was taken to ensure that the knife threats in the male and female versions of the videos looked as similar as possible. Knife threats in the synchronous and asynchronous versions of the same video (male or female) were identical. Subsequent knife threats within a given condition were also identical. After each knife threat, there was a 10 s pause when no strokes were delivered. In line with good ethical practice, before the experiment, we informed the participants about the knife threats in the videos to prevent overly high emotional stress.
Visuotactile stimulation during the experiment
The experimenter listened to audio cues from the videos (see earlier) and accordingly applied touches to the participant’s body. These cues were played via headphones, so the participants could not hear them. The number, order, type, length, velocity, and frequency of strokes during the experiment precisely followed the prerecorded videos (see earlier). To deliver touches, the experimenter used the same white ball attached to a stick that had been used in the video recordings.
Subjective experience of the full-body ownership illusion was quantified with a questionnaire that began with an open-ended sentence (“During the last session, there were times when…”). This sentence was followed by three illusion statements that quantified the explicit feeling of body ownership (I1; Fig. 1b) and the sensation of touch directly on the stranger’s body (I2 and I3; Fig. 1b). Ownership and referral of touch are considered to be the two core elements of the multisensory full-body illusion25,26. Apart from the illusion statements, the questionnaire included four control statements (C1–C4; Fig. 1b) that were added to control for potential task compliance or suggestibility effects. The questionnaire administered to the participants had items listed in the following pseudorandom order: C1, I1, C2, I2, C3, C4, I3. The participants marked their responses on a scale from − 3 (“strongly disagree”) to + 3 (“strongly agree”).
Skin conductance responses
The skin conductance response reflects increased sweating attributable to the activation of the autonomic nervous system76. When one’s own body is physically threatened, the threat triggers emotional feelings of fear and anticipation of pain that are associated with autonomic arousal. This arousal can be registered as a brief increase in skin conductance a few seconds after the threat event. Increased threat-evoked skin conductance responses, compared to a well-matched control condition, are often used as an index of body ownership in body illusion paradigms24,30,38. In the current experiment, data were recorded continuously with the Biopac system MP150 (Biopac Systems Inc., Goleta, CA, USA) and AcqKnowledge software (version 3.9). The following parameters were used: sampling rate = 100 Hz, low-pass filter = 1 Hz, high-pass filter = DC, gain = 5 μS/V, and CAL2 scale value = 5. Two Ag–AgCl electrodes (model TSD203, Biopac Systems Inc., Goleta, CA, USA) were placed on the volar surfaces of the distal phalanges of the participants’ left index and middle fingers. Isotonic paste (GEL101; Biopac Systems Inc., Goleta, CA, USA) was used to improve the skin contact and recording quality. At the beginning of the experiment, we asked the participants to take the deepest breath possible and hold it for a couple of seconds. In this way, we tested our equipment and established a near maximum response for each participant. The timing of threat events was marked in the recording file by the experimenter by pressing a laptop key immediately after the threat occurred.
The participants marked their responses on a visual analog scale (Fig. 1d). Scale assignment was different for the male and female participants (Fig. 1d). Baseline ratings were generally greater than zero, as expected for a nontransgender group, but showed some degree of variability (mean = 2.22; SD = 0.97; min = − 1; max = 4).
Sixty-four naïve adults participated (age: 27 ± 5; all right-handed; 32 females).
The participants first completed a practice Implicit Association Test (IAT; 20 trials). The main study consisted of the same four conditions as those in Experiment I, that is, syncO, asyncO, syncS, and asyncS (Figs. 1a, 2a). After the initial phase of just watching the videos and feeling touches (30 s), the participants started the first IAT block (Fig. 2b,c). IAT stimuli were presented via headphones (Spectrum, Maxell Europe Ltd., Berkshire, UK). The participants used a wireless computer mouse held in the right hand to indicate responses. During each condition, the participants observed two “knife threats” (see further), one in the middle and one at the end of each condition (Fig. 2c). After each condition, the participants completed the same illusion questionnaire as in Experiment I (Fig. 1b). The order of the conditions was counterbalanced. The whole study lasted ~ 1 h (Fig. 2c).
The videos were prepared analogously to those in Experiment I, but a different male and female were filmed to ensure that our results were not driven by a certain body type or clothing style of the models (Fig. 2a). Strokes were applied to three body parts: abdomen, left thigh, and right thigh. The abdomen strokes were either single or double (1 s apart). The duration of each stroke was 1 s, and each stroke covered ~ 20 cm of the model’s body. The time between the offset of one touch and the onset of the next touch ranged from 3 to 6 s. The frequency of strokes was 12 times per minute. The touches were delivered in a pseudorandom sequence, with no more than three successive strokes on the same body part. Altogether, 88 touches (22 of each of the four stroke types) were applied in each video. The videos were recorded with Infinity cameras (1080p Full HD, CamOneTec, Delbrück, Germany) and prepared in the same way as in Experiment I. In the synchronous videos, audio cues were matched with the touches applied in the videos, whereas in the asynchronous videos, the cues were delayed by 1 s and pertained to different body parts. Altogether, we created four versions (syncO, syncS, asyncO, asyncS) of the high-quality 3D videos, each lasting 7 min 5 s.
We used the auditory version of the brief gender identity IAT47,77. The instruction for one block was as follows: “The test will start in a few seconds. Please listen to the instructions. Try to go as fast as possible while making as few mistakes as possible. If the word belongs to the categories female or self, press left. If the word does not belong to these categories, press right. The test will begin now.” The instruction for the other block differed only with regard to category assignment: “If the word belongs to the categories male or self, press left. If the word does not belong to these categories, press right.” The key assignment remained the same for a given participant across all conditions but was counterbalanced between the participants. The order of IAT blocks was counterbalanced in the same way. The stimulus set consisted of twenty words (Fig. 2b) that were read by a native English speaker. The volume of each word sound was adjusted using Audacity software (the “normalize” effect; version 2.1.2, https://www.audacityteam.org). Each word was edited to have a duration similar to that of other words. Please note that the physical differences between stimuli cannot explain the main IAT results because the congruent and incongruent blocks used exactly the same stimuli. The participants had a maximum of 3 s to provide a response (time from the stimulus onset to the end of each trial). If no key was pressed within this time or the wrong key was pressed, the participants heard a “wrong” feedback beep. Each IAT block consisted of 60 trials (three repetitions of all 20 words) presented in random order. The procedure was self-paced, that is, the next trial started as soon as the participant responded in the previous trial (maximum duration of one block ~ 3 min). Presentation software (Neurobehavioral Systems Inc., Albany, CA, USA) was used to present the stimuli and record responses.
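The block structure described above (60 trials, that is, three repetitions of 20 words in random order, with a 3 s response limit) can be sketched in Python as follows. The experiment itself was run in Presentation software, so this is only an illustration. The word labels are placeholders (the actual stimuli are listed in Fig. 2b), the function names are hypothetical, and shuffling across the whole block rather than within each repetition is an assumption.

```python
import random


def make_block(words, repetitions=3, rng=None):
    """Return one block: every word repeated `repetitions` times, shuffled.

    Assumption: the shuffle spans the whole block; the source only says
    the trials were "presented in random order".
    """
    rng = rng or random.Random()
    trials = list(words) * repetitions
    rng.shuffle(trials)
    return trials


def triggers_error_beep(pressed_key, correct_key, rt_s, timeout_s=3.0):
    """True if no key was pressed in time or the wrong key was pressed."""
    if pressed_key is None or rt_s > timeout_s:
        return True
    return pressed_key != correct_key


# Placeholder labels standing in for the 20 spoken words.
placeholder_words = [f"word{i:02d}" for i in range(1, 21)]
block = make_block(placeholder_words, rng=random.Random(1))
```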
These events were recorded in the same way as in Experiment I (i.e., stabbing movement toward the abdomen; 2 s duration). We used triggers from the Presentation software to automatically flag the onset of the knife threats in the skin conductance recording files.
Forty-five naïve adults participated (age: 26 ± 5; all right-handed; 22 females). One participant was excluded because he did not complete one of the questionnaires.
The study lasted ~ 35 min and comprised two conditions: syncO and asyncO (Fig. 3a,b). Each condition lasted 14 min 10 s. After each condition, the participants filled out the illusion questionnaire (the same as in Experiments I and II) and the Bem Sex-Role Inventory (BSRI49,50; see further). The order of conditions was counterbalanced across participants (Fig. 3b).
The videos were prepared analogously to those in Experiments I and II. Four types of strokes (single abdomen, double abdomen, left thigh, and right thigh) were applied. The duration of each stroke was 1 s, and each stroke covered ~ 20 cm of the model’s body. The time between the offset of one touch and the onset of the next touch ranged from 2 to 10 s. The frequency of strokes was 12 times per minute. Different touches were delivered in a pseudorandom sequence, with no more than three successive strokes on the same body part. Altogether, 160 touches (40 of each stroke type) were applied in each video. Infinity cameras (1080p Full HD, CamOneTec, Delbrück, Germany) were used to record the videos. Audio cues were matched to touches in the synchronous videos and delayed by 1 s in the asynchronous videos.
After each condition, the participants filled out a version of the BSRI49,50. Each version contained five stereotypically masculine and five stereotypically feminine personality traits (Fig. 3c). Using a 7-point Likert scale (1 = “not at all”; 7 = “very much”), the participants rated how well each trait described them. Ten traits were rated after the first condition and the other ten after the second condition. The order of BSRI versions was counterbalanced.
Analysis of illusion questionnaires
For each participant and condition, we calculated “illusion scores” as the differences between the average illusion (I1–I3) and the control (C1–C4) ratings. To confirm successful induction of the illusion, we compared these illusion scores between the synchronous and asynchronous conditions. The results for individual questionnaire items are shown in Figs. S5 and S6. The effect of “ownership” used in the correlation analyses (Figs. 1h, 2f, 3f) was the difference between I1 ownership ratings in syncO–asyncO (one value per participant). The participants who experienced a strong body-sex-change illusion were selected using the median-split method applied to ownership scores (see above). The median-split analyses (Figs. 1i, 2g, and 3g) were performed mainly for display purposes and to complement the main analyses using continuous scores.
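The scoring steps in this section can be written out as a short Python sketch. It assumes each questionnaire is stored as a dictionary keyed by item label (I1–I3, C1–C4). The function names are hypothetical, and the handling of values exactly at the median is an assumption, since the source does not specify it.

```python
from statistics import mean, median

ILLUSION_ITEMS = ("I1", "I2", "I3")
CONTROL_ITEMS = ("C1", "C2", "C3", "C4")


def illusion_score(ratings):
    """Mean illusion rating minus mean control rating for one questionnaire."""
    illusion = mean(ratings[item] for item in ILLUSION_ITEMS)
    control = mean(ratings[item] for item in CONTROL_ITEMS)
    return illusion - control


def ownership_effect(i1_sync_o, i1_async_o):
    """Difference in I1 ownership ratings, syncO minus asyncO."""
    return i1_sync_o - i1_async_o


def median_split(effects):
    """Label each participant 'high' or 'low' relative to the group median.

    Assumption: values equal to the median are assigned to 'low'.
    """
    cut = median(effects.values())
    return {pid: ("high" if value > cut else "low")
            for pid, value in effects.items()}


example = {"I1": 2, "I2": 3, "I3": 1, "C1": -2, "C2": -3, "C3": -1, "C4": -2}
score = illusion_score(example)  # mean illusion 2, mean control -2
```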
Analysis of skin conductance responses
Each response was measured as the difference between the maximum and minimum values during the 0–6 s period after each knife threat. Responses below 0.02 μS were treated as zeroes but were included in the analysis of the magnitude of skin conductance responses76. Statistical outliers were identified with the 1.5 × interquartile range criterion and removed from the dataset (16% and 6% of the values in Experiments I and II, respectively). Keeping the outliers did not change the main findings (main effect of synchrony in Experiment I: F(1,31) = 5.76; P = 0.023; N = 32; Experiment II: F(1,63) = 6.43; P = 0.014; N = 64; two-sided). We applied a square-root transformation to the skin conductance data76. Statistical models included the effect of “repetition”, which indicated how many knife threats had already occurred in the study (max. 12 in Experiment I and max. 8 in Experiment II). The magnitude of the skin conductance responses decreased exponentially with subsequent knife threats (Fig. S7). To “linearize” this relationship, we transformed the repetition number (1/repetition), which substantially improved the fit of the linear models to the data (Fig. S7; Experiment I: χ² = 4.36; P < 0.005; Experiment II: χ² = 37.26; P < 0.005; two-sided; N = 32 and N = 64, respectively). The effect of repetition (habituation) was highly significant (Tables S2 and S4), which was expected76. For the control analyses presented in Figs. S2 and S3, we (1) calculated residuals from the following model: SCR ~ repetition; (2) averaged them for a given participant and condition; and (3) calculated the difference: syncO–asyncO (Fig. S2) or syncS–asyncS (Fig. S3). Using the residuals accounted for the habituation effect (see earlier).
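The response quantification described above can be illustrated with a minimal Python sketch. It assumes the recording is a list of conductance samples in μS at the stated 100 Hz sampling rate and that the threat onset is known as a sample index. The function names are hypothetical. The quartile method and the order in which outlier removal and transformation are applied are also assumptions, as the source does not specify them.

```python
import math
from statistics import quantiles


def scr_amplitude(samples_us, onset_index, rate_hz=100, window_s=6.0,
                  floor_us=0.02):
    """Max minus min conductance in the 0-6 s window after a threat.

    Amplitudes below the 0.02 uS floor are returned as 0.0 (kept, not dropped).
    """
    n = int(round(window_s * rate_hz))
    window = samples_us[onset_index:onset_index + n + 1]
    amplitude = max(window) - min(window)
    return amplitude if amplitude >= floor_us else 0.0


def iqr_fences(values, k=1.5):
    """Lower and upper outlier fences: Q1 - k*IQR and Q3 + k*IQR.

    Assumption: 'inclusive' quartiles; the source does not name a method.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def model_terms(amplitude_us, repetition):
    """Square-root-transformed response and the 1/repetition predictor."""
    return math.sqrt(amplitude_us), 1.0 / repetition
```

Values outside the fences returned by `iqr_fences` would be the ones treated as outliers under this reading of the criterion.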
Analysis of masculinity/femininity ratings, IAT, and BSRI
Raw masculinity/femininity ratings were analyzed (n = 160; one value per condition). IAT data included only correct trials, in which reaction times were longer than 200 ms and shorter than 1,500 ms (95.5% of all trials; n = 29,147). Reaction times were log-transformed. The BSRI analysis was performed on raw ratings (n = 862; 18 ratings missing). Analyses of IAT and BSRI included random intercepts of “1|Item”, which accounted for possible variability between different words (Tables S4–S7).
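The IAT trial selection can be expressed as a short Python sketch, assuming each trial is a dictionary with hypothetical keys `correct` and `rt_ms`. The natural logarithm is an assumption, since the source states only that reaction times were log-transformed.

```python
import math


def usable_log_rts(trials, min_ms=200, max_ms=1500):
    """Keep correct trials with min_ms < RT < max_ms and log-transform the RTs.

    Assumption: natural log; the base is not given in the source.
    """
    return [
        math.log(trial["rt_ms"])
        for trial in trials
        if trial["correct"] and min_ms < trial["rt_ms"] < max_ms
    ]


example_trials = [
    {"correct": True, "rt_ms": 150},    # too fast, dropped
    {"correct": True, "rt_ms": 500},    # kept
    {"correct": False, "rt_ms": 800},   # error trial, dropped
    {"correct": True, "rt_ms": 1600},   # too slow, dropped
    {"correct": True, "rt_ms": 1000},   # kept
]
kept = usable_log_rts(example_trials)
```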
For each participant in each experiment, we calculated the degree of gender identity updating. In Experiment I, this updating score was calculated as the following difference between the masculinity/femininity ratings: $[(\mathrm{syncS} + \mathrm{asyncS} + \mathrm{asyncO})/3] - \mathrm{syncO}$. In Experiment II, this score was calculated as the difference between the average reaction times in each IAT block: $[(\mathrm{syncS}_{i-c} + \mathrm{asyncS}_{i-c} + \mathrm{asyncO}_{i-c})/3] - \mathrm{syncO}_{i-c}$, where “i” and “c” denote “incongruent” and “congruent”, respectively. Finally, in Experiment III, the updating was calculated as the difference between average personality ratings from each condition: $\mathrm{asyncO}_{c-i} - \mathrm{syncO}_{c-i}$, where “c” and “i” correspond to stereotype-congruent and stereotype-incongruent traits, respectively. Because these scores were on different scales, we standardized them (i.e., from each participant’s score, we subtracted the group mean from the respective experiment and divided the result by the group standard deviation).
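As a worked illustration of the Experiment I updating score and the within-experiment standardization, the following Python sketch may help. The function names are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, because the source does not state which estimator was used.

```python
from statistics import mean, stdev


def updating_score_exp1(sync_s, async_s, async_o, sync_o):
    """Mean rating of the three control conditions minus the syncO rating."""
    return (sync_s + async_s + async_o) / 3 - sync_o


def standardize(scores):
    """Z-score within one experiment: (score - group mean) / group SD.

    Assumption: sample SD (n - 1 denominator).
    """
    group_mean = mean(scores)
    group_sd = stdev(scores)
    return [(score - group_mean) / group_sd for score in scores]


# A participant rating 3 in each control condition and 1 in syncO
# gets an updating score of 2.0.
example_score = updating_score_exp1(3, 3, 3, 1)
example_z = standardize([1, 2, 3])  # [-1.0, 0.0, 1.0]
```

Standardizing within each experiment puts the rating-based, reaction-time-based, and trait-based scores on a common scale.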
General statistical information
All statistical analyses were performed in RStudio and R software (version 3.3.3, The R Foundation for Statistical Computing, https://www.r-project.org). Linear mixed models were estimated using the “lme4” package. Information regarding model selection is provided in Table S1. All results are reported in Tables S2–S7. The distributions of residuals from each main model are shown in Fig. S8. P-values for the F-tests were based on Satterthwaite’s approximation to degrees of freedom, as implemented by the “lmerTest” package (Tables S2, S4, and S6). P-values for effect size coefficients (Tables S3, S5, and S6) and their 95% confidence intervals were obtained with the bootstrapping method by comparing a given coefficient value to its null distribution derived from resampling the original dataset (“boot” package; 1,000 simulations).
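The general idea of resampling-based confidence intervals can be illustrated with a deliberately simplified Python sketch. It does not reproduce the analysis reported here, which bootstrapped mixed-model coefficients with the R “boot” package. Instead, it computes a percentile interval for the mean of a set of per-participant difference scores. The function name and the example values are hypothetical.

```python
import random
from statistics import mean


def bootstrap_mean_ci(values, n_boot=1000, alpha=0.05, rng=None):
    """Percentile bootstrap CI for the mean (resampling with replacement).

    Simplified stand-in, not the mixed-model bootstrap used in the paper.
    """
    rng = rng or random.Random()
    n = len(values)
    boot_means = sorted(
        mean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower = boot_means[round((alpha / 2) * n_boot)]
    upper = boot_means[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical difference scores, for illustration only.
differences = [0.5, 1.0, 1.5, 2.0, 0.8, 1.2, 0.9, 1.1]
ci_low, ci_high = bootstrap_mean_ci(differences, rng=random.Random(42))
```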