We carry out a suite of numerical experiments on square mesoscale model domains (*L* × *L*) of up to *L* = 960 km horizontal length and horizontal grid resolutions of 1 km and finer. Diurnally oscillating, spatially homogeneous surface temperature, *T*_{s}(*t*), is prescribed as

$${T}_{s}(t)=overline{{T}_{s}}-{T}_{a}cos (2pi t/{t}_{0}),$$

(1)

where (overline{{T}_{s}}=298 {mathrm{K}}), *t*_{0} ≡ 24 h is the duration of the simulated model day, and (overline{{T}_{s}}) and *T*_{a} represent the temporal average and amplitude of *T*_{s}(*t*), respectively. The diurnal insolation cycle is chosen to be typical for the equator and peaks at noon (just like the surface temperature *T*_{s}). We provide sensitivity experiments exploring resolution, domain size, rain evaporation, surface conditions and insolation (see details in “Methods”). We contrast simulations with different diurnal surface temperature amplitudes, *T*_{a}, but equal average surface temperature, and refer to experiments with *T*_{a} = 2 K, 3.5 K and 5 K as *A*2, *A*3.5 and *A*5, respectively. Modifiers, such as *A*5*b* or *A*5*s**e**a* (see Table 1), label sensitivity studies. Unless explicitly stated, our key results regarding clustering occurring under varying values of *T*_{a}, are qualitatively robust under the sensitivity experiments. Each numerical simulation is run for several days, allowing for a spin-up and quasi-steady-state period (Table 1; Supplementary Fig. 1).

### Domain-mean time series

Unsurprisingly, the differences in diurnal surface temperature amplitudes are reflected in larger diurnal amplitudes of atmospheric near-surface temperatures (Fig. 1a; Supplementary Fig. 1a, b). Changes in the time series of domain-mean rain rate are more profound (Fig. 1b): whereas *A*5 yields a relatively sharp mid-afternoon single-peak structure, the curve transitions to a broader and double-peaked structure for *A*2, where it approximates the diurnal cycle typical of oceanic convection^{45}. For *T*_{a} = 0, only a weak nocturnal maximum remains (Supplementary Fig. 2f, compare: Janowiak et al.)^{46}. Again, the differences in the temporal mean (horizontal lines) are minimal, reflecting radiation constraints on rainfall^{47}. Real surface temperatures normally peak at a delay relative to local solar noon. Such delays accordingly shift the onset of organisation; hence, radiative effects are not key in this timing (Supplementary Fig. 2e). Curves that vary in a similar manner to the rain rate are found for rain area fraction, which, by contrast, differs from those of rain rate immediately before the central peak (Fig. 1c). These differences are made more transparent when inspecting rain rates conditional on a threshold (Fig. 1d). Both mean and heavy precipitation show a pronounced evening peak for *A*5 and an early-morning or nocturnal peak for *A*2. In summary, whereas time averages of rain rate are nearly identical for numerical experiments with varying diurnal surface temperature amplitudes, the time series differ markedly.

### Quantifying clustering

Do the differences in domain-mean time series hint at differences in spatial organisation? Consider the horizontal patterns formed by surface rainfall, visualised by temporally averaging the rainfall pattern during each model day (Fig. 1e, f). During the spin-up from the initial condition (day 1), both *A*2 and *A*5 show modest and relatively homogeneous convective activity throughout the domain. During the subsequent model days, convection intensifies for both simulations because near-surface temperatures gradually increase. In *A*2, the spatial pattern of events remains rather homogeneous (Fig. 1e). In contrast, for *A*5, an inhomogeneous pattern self-organises, with several subregions receiving pronounced average rainfall, whereas others are all but dry (Fig. 1f, days 4 and 5). Furthermore, for *A*5, temporal alternations in surface rainfall rate are apparent when comparing one day to the next (compare: Fig. 1f, day 4 vs 5): a cluster on 1 day leaves an almost rain-free area the next day.

To quantify such spatiotemporal inhomogeneities, we determine all surface rain event tracks and compute their centre-of-mass positions. Tracks are thereby defined as spatially and temporally contiguous rainy grid boxes (see “Methods”). We then break the horizontal domain area down into square boxes of side length *l*, yielding *n*(*l*) ≡ (*L*/*l*)^{2} such boxes, and determine the number of tracks located in each of the boxes. The probability *p*_{l} of a track occurring within one of the boxes at random would be *p*_{l} = *n*(*l*)^{−1} and the binomial

$${P}_{l}(m)equiv frac{N!}{(N-m)! m!}{p}_{l}^{m}{left(1-{p}_{l}right)}^{N-m}$$

(2)

hence describes the probability of *m* of *N* randomly distributed tracks lying in one of these boxes during the model day. The variance of counts *m*, given a side length *l*, denoted as Var_{ran}(*l*; *N*), is^{48}

$${rm{V}}{{rm{ar}}}_{ran}(l;N)=N {p}_{l}(1-{p}_{l}),$$

(3)

which we compare with the variance of the empirical data

$${rm{V}}{{rm{ar}}}_{emp}(l;N)=mathop{sum }limits_{i=1}^{n(l)}{({m}_{i}-langle mrangle )}^{2},$$

(4)

where 〈*m*〉 ≡ *N*/*n*(*l*) is the average number of tracks per box, and the sum runs over all boxes *i*. The comparison of the two quantities in Eqs. (3) and (4) hence measures how much variation is obtained empirically when contrasted against completely uncorrelated data. This comparison can be expressed in a clustering coefficient, which we define as ({mathcal{C}}(l)equiv {rm{V}}{{rm{ar}}}_{emp}(l;N)/{rm{V}}{{rm{ar}}}_{ran}(l;N)). ({mathcal{C}}(l)) is below unity when tracks are regularly spaced and above unity when tracks are clustered. The results show that, for small diurnal surface temperature amplitudes (experiment *A*2*a*, Fig. 2a) ({mathcal{C}}(l),<,1) for all box sizes *l*; hence, the spacing of tracks is generally more regular than expected at random. For larger diurnal surface temperature amplitudes (*A*3.5 and *A*5, Fig. 2b, c) regular spacing with ({mathcal{C}}(l),<,1) is found only for relatively small box sizes of *l* ≈ 20 km, whereas spacing at larger box sizes is strongly clustered, that is, ({mathcal{C}}(l)gg 1). Moreover, this clustering increases from day to day (Fig. 2d).

We also define a box size *l*_{max}, at which ({mathcal{C}}(l)) is maximal. Despite some variation, *l*_{max} ≈ 180 km can be identified from A5 (Fig. 2c, e). In addition, we measure the autocorrelation *c*(*τ*) between day *d* and day *d* + *τ* by computing the Pearson correlation coefficient of daily mean precipitation rates from all grid boxes (Fig. 2f). We find rainfall to be anticorrelated from one day to the next for all box sizes (*c*(1) < 0), suggesting a local inhibitory effect of rain and positively correlated two days into the future (*c*(2) > 0). The magnitudes of *c*(1) and *c*(2) both increase with box size *l*, but appear to level off near *l* = 200 km (Fig. 2f).

### Clustering as a result of cell-density difference

We now explore the mechanistic origin of the clustering. Differences in the areal number density, that is, the count of rain cells per unit area, between *A*2 and *A*5, are evident from the rainfall diurnal cycle (Fig. 1b, c). The peak in rain intensity and fractional area covered by rain cells is nearly twice as high for *A*5 compared with *A*2. Given that the area, intensity and duration of each individual rain cell are all similar for *A*2 and *A*5 (Supplementary Table 1), roughly twice the number of rain events will coincide during the peak of rainfall for *A*5 compared with the peak of *A*2. As mentioned, CPs are known to mediate interactions between convective rain cells^{32,49}. We expect interactions to become more relevant at increased rain cell density because CPs will more often collide as they spread along the surface. To quantify density effects on the clustering dynamics in *A*2 versus *A*5, we first perform a simple CP tracking. The method identifies spatially and temporally contiguous patches of low buoyancy, measured by a 1 K temperature depression (for detail, see “Methods”). In *A*2, CP areas typically do not exceed 500 km^{2}, have modest temperature depressions and lifetimes of generally less than 2 h (Fig. 3a–c). In *A*5, where CPs only occur during parts of the day, CP areas often exceed 10^{3} km^{2}, and such large CPs have much stronger temperature depressions and substantially longer lifetimes (Fig. 3d–f). The formation of very large and strongly negatively buoyant CPs suggests that CPs from distinct rain cells often bunch together before each CP has fully expanded. We employ the CP tracking to detect merging events, that is, instances where two previously separate CP areas combine. For *A*5*b*, CP merging is indeed ubiquitous, whereas it is all but absent in *A*2*b* (Supplementary Fig. 4c).

In addition, we compute the CP height, measured by detecting temperature anomalies in the vertical dimension. In *A*5*b*, when CPs first appear, they show heights comparable to those of *A*2*b* (Supplementary Figs. 4a, b and 5). Subsequently, a double peak forms, where groups of merged CPs obtain much larger heights, reaching close to the level of free convection (≈1100 m). Larger CP heights *h* and deeper temperature depressions (T^{prime}) are consistent with larger mean surface wind speed ({v}_{cp} sim {(hT^{prime} )}^{-1/2}) under CPs^{50} (compare: panels in Supplementary Fig. 5) and surface fluxes (Supplementary Fig. 1e–h), resonating with markedly enhanced near-surface moisture and convective available potential energy (CAPE) along CP boundaries in *A*5 (compare: Supplementary Fig. 6e, j).

We aim to formulate a simplified model that incorporates the observed number-density differences. To this end, consider first a detailed comparison of CPs formed in *A*2 and *A*5: in *A*2, CPs are spatially isolated from one another, and the area covered by each CP remains small (Fig. 4a). In *A*5, many CPs occur so close to each other, that their temperature anomalies inevitably merge (Fig. 4a; Supplementary Fig. 4c), forming a large patch of dense air. This combined CP, which we associate with the emergent MCS, reaches a greater height and develops a relatively cold and dry region in its interiour where convection is suppressed. This inhibitory effect in the MCS interiour can be ascribed to the divergence of the level of free convection (LFC) (Fig. 4b) and a depletion of CAPE (Supplementary Fig. 6e, j).

In contrast, as the combined CP spreads outward from the dense region of rain cells, new cells are often triggered at its front—further feeding the MCS. At this combined CP’s gust front, hence at the periphery of the emergent MCS, the greater CP height allows boundary-layer environmental air to be forced higher up and set off new rain cells. In addition, the CP’s surroundings benefit from the moisture transported by the combined CP’s front and the additional latent heat flux and increased CAPE along the combined CP’s edges (compare Supplementary Fig. 6e, j). Indeed, many subsequent rain cells do form near the perimeter of the combined CP, whereas this is rare for *A*2 (thin black contours in Fig. 4a). To quantify this moisture redistribution from within the MCS to its surroundings, we contrast domain subregions of *A*5, which receive intense versus weak precipitation during a given model day (Supplementary Fig. 6f–h). Regions of intense rainfall are characterised by enhanced moisture near the cloud base (*z* ~1 km) before precipitation onset, but marked depletion after rain has occurred. Conversely, areas of weak rainfall show nearly a “mirror image”, with depressed moisture before but enhanced values after precipitation. The bi–diurnal dynamics for *A*5 can hence be characterised as an alternation of cloud-base moisture, driven by the lateral expansion of MCSs, which leaves behind an inhibitory dry patch in their interior that is marked by low CAPE (Supplementary Fig. 6i, j). On the subsequent day, this drying in locations of previous MCSs leads to relatively enhanced near-surface and cloud-base moisture in those regions not affected by these previous MCSs—in turn providing favourable conditions for convective activity there (for detail, see Supplementary Information Text and Supplementary Fig. 6). We further find a positive correlation between cloud-base moisture and precipitation, which could help promote the transition from shallow to deep convection^{29,51} and potentially reinforce the updraughts. In *A*2, day-to-day moisture alternations are lacking (see Supplementary Fig. 6g, h for *A*5*b* to Supplementary Fig. 6b, c for *A*2*b*), a finding that falls in line with the absence of organised convection. As with rainfall, the spatial pattern of cloud-base moisture again reflects a coarsening of scales for *A*5 relative to *A*2 (Supplementary Fig. 6d, i).

In total, the analysis suggests that an MCS emerges, when several CPs combine, become deeper and excite new rain cells along their common gust front. The emergence of an MCS on one day causes a suppressed region the subsequent day. As a further check, we weaken CPs by decreasing the ventilation coefficients (*a*_{v} and *b*_{v} in Eq. (24) of ref. ^{52}) to a fraction of their default values. These coefficients control rain evaporation, hence temperature depression (T^{prime}) and CP propagation ((sim {T}^{^{prime} 1/2})) (for details, see “Methods”). Indeed, decreasing these coefficients systematically reduces the spatial extent of the rain-free regions (Supplementary Fig. 3).

### Clustering from a two-level atmosphere model

A key characteristic of *A*5 is that parts of the day see no rainfall at all. In contrast, immediately after the onset of rain (near mid-day, Fig. 1b), the area covered by precipitation is relatively large—corresponding to a high number density of rain events and CPs. Our simplified model describes the system domain as a square lattice of sites, where each site can reside in two different states. For simplicity, we consider each pair of a rain cell and its CP as one entity. We let this entity occupy one lattice site, termed an “active site”, and say that it fills the area of a rain cell *a*_{0} ≈25 km^{2} (Supplementary Table 1). Lattice sites not active are considered “vacant”. In practice, our model mimics the qualitative clustering dynamics by allowing MCSs to emerge, whenever a sufficient number of active sites occur close to each other.

Assume that, at a given time, the fraction of active sites is *p*_{0}, and sites are independently populated. That is, each site of a square lattice contains a rain event at probability *p*_{0}. To exemplify, in Fig. 4c, i, *p*_{0} is relatively small, and few sites are active. In Fig. 4c, ii, *p*_{0} is larger, more sites are therefore active and many of them are now neighbours. To incorporate the effect of an MCS, we simply let the remaining vacant sites be more likely to become active when they are immediate neighbours to a sufficiently large area *A* > *A*_{crit} of spatially contiguous active sites (compare: Fig. 4a). This is accomplished by assigning increased probabilities to the neighbourhood sites (Fig. 4c, ii, orange sites). When *p*_{0} is small (*p*_{0} ≪ 1), the system will, however, be unlikely to contain contiguous rain areas exceeding *A*_{crit} (compare: shaded box in Fig. 4a). To be explicit: the probability of finding two active sites on two neighbouring sites is proportional to ({p}_{0}^{2}), and this probability will decay exponentially with the number of sites contained in the contiguous area^{53}.

But how does *p*_{0} emerge from the diurnal cycle dynamics, and how can bi–diurnal temporal correlations be captured (Fig. 2)? To self-consistently incorporate these features, we describe the population of rain cells within a two-layer atmosphere model consisting of a prescribed boundary- layer temperature *T*_{bl}, varying sinusoidally with a small diurnal surface temperature amplitude *t*_{a} (see Fig. 1a), and an interactive free-tropospheric temperature *T*_{ft}. We compute the probability for a rain cell to become active by coupling it to the temperature difference between the boundary layer and the free troposphere Δ*T* = *T*_{bl} − *T*_{ft}, as an approximation for atmospheric stability. When *T*_{bl} increases during the model day, rain events will eventually be set off. Two key processes impact on the free-tropospheric temperature: thermal radiation to space reduces *T*_{ft}, whereas latent heat transfer by rain events increases it. The boundary-layer temperature (*T*_{bl}) is not directly affected by rain cells. However, buoyancy depressions, arising mainly from sustained drying after rain events, are implemented by an inhibitory potential. In practice, once an active site transitions back to vacant, it needs to “wait” until it can become active again (for details, see “Methods”).

We implement reasonable coefficients for these processes and find the simulations to reach a repetitive diurnal cycle (Fig. 4e, inset). Indeed, for small *t*_{a}, modest rainfall is present during the entire day, whereas for larger *t*_{a}, rainfall is either intense or absent. Time-averaged rain areas for large and small *t*_{a} match (compare: Fig. 1b), a result of the radiative constraint and in agreement with the numerical experiments. The earlier phase in Fig. 4e, inset versus Fig. 1b, can be attributed to the fact that we directly prescribed atmospheric boundary-layer temperature *T*_{bl}, rather than surface temperature itself, causing an immediate precipitation response.

Considering the variance of the spatial pattern for large *t*_{a}, the simplified model indeed produces increased clustering over time and larger spatial scales (Fig. 4d, colours from red to green, and Supplementary Fig. 7). Conversely, clustering is absent for small *t*_{a} (Fig. 4e). This can be explained intuitively: when the thermal forcing caused by the boundary-layer temperature *T*_{bl} increases rapidly during the day, many rain events will be set off during a short time—leading to large *p*_{0} during those times. The negative feedback on the free-tropospheric temperature *T*_{ft} will then rapidly cause the “budget” of rainfall to be used up. MCS will form as long as the increased probability at the edges of the contiguous CP patches counteracts the ongoing increase of *T*_{ft}. Hence, MCSs will be able to spread, as long as this is the case, thus setting the time (≈6 h) and space (≈100 km) scale for MCS, which is significantly larger than the scale of a single rain event (≈1 h and ≈5 km).

### Conceptualising further

The two scales of individual rain cells (~5 km) and MCS (~100 km) allow for a further simplified conceptual view, which builds on domain total moisture being conserved from day-to-day. In this view, the occurrence of rain cells is directly proportional to the available moisture and rain cells act to redistribute moisture, once they form an MCS. Take the model domain to be broken down into a square lattice consisting of blocks. Each “block” consists of 20 × 20 sites for single rain cells of area *a*_{0}, together yielding a block size of 100 km × 100 km, which provides space for an entire MCS. In each block, an MCS is only set off if a sufficient number of rain cells are present. We find that a simple set of rules can model the MCS dynamics: (1) first, initialise each block in this square lattice, by assigning an integer, encoding the number of rain cells, drawn from a binomial distribution; (2) update the system once per day, by letting all blocks above a particular threshold (an MCS forms) hand over their content (the moisture transported by the MCS) at equal parts to their four neighbouring blocks; after this distribution step, all sites collect the spilt content, hence conserving total moisture.

When the mean of the binomial distribution is low compared with the threshold, which is the case for small *t*_{a}, no MCS will form and no redistribution will take place (Fig. 4f, blue points, and lower row of squares). In contrast, when *t*_{a} is sufficiently large, a long sequence of reallocations will occur, leading to a checkerboard-like clustering, which strengthens in time (red curve and upper row of squares). This example is sufficient to capture the increase of normalised variance for A5 and the lack of it for A2 (Fig. 4f).