Relevance of temporal cores for epidemic spread in temporal networks

Jul 27, 2020

Span-core decomposition and maximal span-cores

In a recent work, Galimberti et al.26 proposed an extension of core decomposition to temporal networks, whereby cores are associated with their temporal spans, i.e., intervals of contiguous timestamps for which the coreness property (i.e., of minimal connectivity) holds. Such cohesive temporal structures are named span-cores.

Let us consider a temporal graph (G = (V,T,tau )), where V is a set of nodes, (T = [0, 1, ldots , t_{max}]) is a discrete time domain, and (tau : V times V times Trightarrow {0,1}) is a function defining for each pair of vertices (u,v in V) and each timestamp (t in T) whether the edge (uv) exists in t. We denote (E = {(u,v,t) mid tau (u,v,t) = 1 }) the set of all temporal edges. Given a timestamp (t in T), (E_t = {(u,v) mid tau (u,v,t) = 1 }) is the set of edges existing at time t. Given a subset of nodes (S subseteq V), let (E_{Delta }[S]) be the set of edges connecting the nodes of S that exist in all timestamps (t in Delta). We then define the temporal degree of a node u within the subgraph (G_{Delta }[S] = (S , E_{Delta }[S])) as (d_Delta (S,u) = |{v in S mid (u,v) in E_Delta [S] }|). In other words, the temporal degree of u is the number of other nodes to which u is linked in all the timestamps of (Delta), without interruption.

Definition 1

(((k,Delta ))-core) The ((k,Delta ))-core of a temporal graph (G = (V,T,tau )) is (when it exists) a maximal and non-empty set of nodes (emptyset ne C_{k,Delta } subseteq V), such that (forall u in C_{k,Delta } : d_Delta (C_{k,Delta },u) ge k), where (Delta sqsubseteq T) is a temporal interval and (k in {mathbb {N}}^+).

The interval (Delta) and the integer k are referred to as span and order of the span-core, respectively. As it is well known, the cores of a static graph define a hierarchy. On the other hand, span-cores are not all nested into each other. Nonetheless, they exhibit containment properties. In particular, we say that a ((k,Delta ))-core is contained into another ((k^{‘},Delta ^{‘}))-core if (k^{‘} < k) and (Delta sqsubseteq Delta ^{‘}).

The number of span-cores is quadratic in |T|. This is definitely not desirable when human inspection is of interest. Therefore, it is useful to focus only on the most relevant ones. Thus, Galimberti et al.26 introduced the concept of maximal span-core.

Definition 2

(Maximal span-core) A span-core (C_{k,Delta }) of a temporal graph G is said maximal if there does not exist any other span-core (C_{k’,Delta ‘}) of G such that (k le k’) and (Delta sqsubseteq Delta ‘).

A span-core is thus identified as maximal if it is not dominated by any other span-core in terms of both order and span. Clearly, maximal span-cores resemble the idea of innermost core, i.e., the core of highest order, in the core decomposition of a static graph. However, maximal span-cores are not unique. Instead, there is at most one maximal span-core for every temporal interval.

We also recall here some basic ideas of the efficient algorithm to compute all span-cores in a temporal network26. A naive approach would involve executing a static core decomposition routine independently for each temporal interval (Delta). A more efficient procedure exploits the containment property in both its dimensions, coreness and temporal intervals. Indeed, given a temporal graph G, and a temporal interval (Delta = [t_s,t_e] sqsubseteq T), let (Delta _+ = [min {t_s+1,t_e}, t_e]) and (Delta _- = [t_s, max {t_e-1,t_s}]). It holds that

begin{aligned} C_{1,Delta } subseteq (C_{1,Delta _+} cap C_{1,Delta _-}) = bigcap _{Delta ‘ sqsubseteq Delta } C_{1, Delta ‘} . end{aligned}

The algorithm26 takes advantage of this simple property by processing temporal intervals of increasing size (starting from size one) and, for each interval (Delta) of width larger than one, the core decomposition is initiated from ((C_{1,Delta _+} cap C_{1,Delta _-})), the smallest intersection of cores containing (C_{1,Delta }). This expedient produces a speed-up of orders of magnitude in the obtention of all span-cores.

The problem of computing the maximal span-cores of a temporal graph can also be addressed by the simple approach of extracting all span-cores and then filtering out those which are not recognized as maximal. However, theoretical properties that relate the maximal span-cores to each other prove that it is not required to compute the overall temporal core decomposition but it is possible to extract only the maximal span-cores. Note that this is a challenging design principle, as it contrasts the idea that a core of order k is typically computed from the core of order (k-1). Theoretical findings provide bounds on the order of a maximal span-core and suggest a top-down algorithm that processes temporal intervals starting from the larger ones, in opposition to the method used to extract the entire span-core decomposition. The procedure does not search the whole span-core space and it has been empirically shown to be markedly more efficient compared to the approach based on filtering out non-maximal span-cores26.

The algorithms are detailed in Ref.26 and the code is publicly available on https://github.com/egalimberti/span_cores

Datasets

We consider 8 data sets describing human interactions with high spatial and temporal resolutions in a variety of contexts: schools (at different levels and in different countries), conferences, workplace (office building) and hospital. These data are publicly available thanks to two independent collaborations. SocioPatterns30 gathers longitudinal data on physical proximity and face-to-face contacts of individuals in different contexts using a sensing platform based on wearable badges equipped with radiofrequency identification devices (RFIDs). Contact data are collected with a temporal resolution of 20 s. Further data describing human proximity are provided by Toth et al.31, who deployed a platform composed of wireless ranging enabled nodes (WRENs) in several schools in the USA. Each WREN collects signals from other WRENs in proximity at intervals of approximately 20 s. Signal strength criteria are used to select pairs of individuals wearing these WRENs located at distance lower than or equal to 1 meter: this is the practical definition used to define a contact between individuals at each time. For each data set, we aggregate all interactions in successive time-windows (timestamps) of 300 s. We thus obtain from each data set a temporal network in which nodes represent individuals, and a temporal edge is drawn in a timestamp t between two nodes if the two corresponding individuals have been in contact in this time-window.

The specific data sets we use are the following. The first 6 are provided by the SocioPatterns30 collaboration, and the last 2 by Toth et al.31.

• The Primary School data set contains the contact events between 242 individuals (232 children and 10 teachers) in a primary school in Lyon, during 2 days in October 200939.

• The High School data set gives the interactions between 327 students of nine classes within a high school in Marseille, during 5 days in December 201335.

• The Hospital data set describes the face-to-face interactions of patients and health-care workers (HCWs) in a hospital ward in Lyon, France during 1 week in December 2010. The study included 46 HCWs and 29 patients40.

• The Conference data set was collected during the ACM Hypertext 2009 conference, which took place between June 6, 2009 and July 1, 2009 in Turin, Italy. The data cover a period of 2 days and a half41.

• The SFHH conference data set describes the face-to-face interactions of 405 participants to the 2009 SFHH conference in Nice, France (June 4–5, 2009) citeGenois:2018.

• The Workplace data set contains the temporal network of contacts between individuals recorded in an office building in France in 201524.

• The Elementary School data set contains the contact data associated with the 476 students in the 21 classes of a suburban elementary school in Utah (USA) on January 31, 2013 and February 1, 201331.

• The Middle School data set describes the proximity interactions occurred on November 28 and 29, 2012 in an urban public middle school in Utah (USA)31.

Some more details are given in Table 2.

We consider the two paradigmatic spreading processes Susceptible-Infectious-Susceptible (SIS) and Susceptible-Infectious-Recovered SIR, with parameters (beta) and (nu), as described in the main text: (beta) is the probability per unit time that a susceptible node in contact with an infectious one becomes infectious, and (nu) is the probability per unit time that an infectious node recovers spontaneously (becoming again susceptible in the SIS case, and recovered in the SIR case). These spreading processes are considered on the temporal network data: contagion can occur only along the temporal edges.

In the spreading mitigation scenario, our aim is to compare the unfolding and impact of these processes, for each data set, on the original temporal network and on temporal networks modified according to several intervention strategies. To quantify the differences between processes in original and altered networks, we resort to two common measures of the epidemic risk and study how its value is modified by the intervention. We first consider the value of the epidemic threshold, i.e., the critical value of disease transmissibility (beta) above which the spread is able to reach a large fraction of the population. The analytical method developed by Valdano et al.32, based on the approximation of the process by a Markov chain, allows to express the epidemic threshold for both processes in terms of the spectral radius of a matrix that encodes both network structure and disease dynamics. We use the Python package publicly available at http://github.com/eugenio-valdano/threshold to compute the threshold as a function of the recovery parameter (nu) for the various data sets and the various intervention strategies, and measure the impact of each strategy through the relative change in the threshold value.

We moreover consider the final size of the epidemic, i.e., the fraction of nodes that have been reached by the process when it ends. At the end of the SIR process, no infectious nodes are left and the epidemic size is given by the fraction of nodes in the R state. Note that, as the SIR epidemic might not end within the finite span of the data, the temporal network is repeated if needed until the process ends. For the original data, for each strategy and for each set of parameters ((beta ,nu )) we simulate (N_{sim} = 1000) SIR processes. We then compute the epidemic size ratio for each strategy and parameter values, defined as the ratio between the average epidemic size for processes on the reduced temporal network obtained according to a strategy and the average epidemic size for processes on the original temporal network:

begin{aligned} rho _{strategy}(beta ,nu ) = frac{langle R_{final} rangle _{strategy}}{langle R_{final}rangle _{original}} . end{aligned}

In the spreading maximization scenario, the goal is to assess the performance of the maximal span-cores as a tool to identify the nodes that,when infected, lead to wide propagation. We consider numerical simulations of the SIR process and we adopt the final size of the epidemics as a performance metric for a seeding strategy. For each seed, we simulate (N_{sim} = 100) SIR processes starting at different timestamps. Again, if required, the temporal domain of a temporal network is repeated until the process end. We then compute the epidemic size ratio, in which the numerator and denominator are given by the average epidemic sizes for processes seeded according to a strategy and for randomly seeded processes, respectively:

begin{aligned} rho _{strategy}(beta ,nu ) = frac{langle R_{final} rangle _{strategy}}{langle R_{final}rangle _{random}} . end{aligned}

Clearly, although in the spreading mitigation and maximization frameworks the epidemic size ratios are defined similarly, in the former smaller values are desiderable while in the latter larger values are to be preferred.

Mitigation strategies

We describe here in detail the targeted intervention strategies aimed at mitigating the spread of an epidemic process unfolding in a host population described by a temporal network. These interventions consist in removing temporal edges in the network, and different strategies consider different ways of choosing the edges to be removed. The strategies we put forward target the maximal span-cores of the given temporal network (G = (V, T, tau )). One possibility is to choose an a priori number (n_{msc}) of maximal span-cores and remove all the corresponding edges. This can be interpreted as temporary isolation: a node u belonging to one of the chosen maximal span-cores (C_{k,Delta }) is isolated over the time interval (Delta). As different cores have different sizes, fixing (n_{msc}) would however lead to different fractions of removed temporal edges for the different strategies. We thus fix the fraction f of temporal edges to be removed and remove them starting with the maximal span-cores taken in a chosen order. We consider here (f=20%) and show in the supplementary material the results of using (f=10%).

As the maximal cores can be classified along two properties, their order and their durations, we consider in fact two separate strategies:

• The top-k maximal span-cores strategy (kM for short) removes temporal edges starting from the maximal span-cores with highest order, independently of their duration. We denote by (G_{kM}) and (E_{kM}) the temporal network and set of temporal edges remaining after the intervention, respectively.

• The top-(Delta) maximal span-cores strategy ((Delta)M for short) removes temporal edges starting from the maximal span-cores with longest duration, independently of their order. We denote by (G_{Delta M}) and (E_{Delta M}) the temporal network and set of temporal edges remaining after the intervention, respectively.

We denote by (n_T) the total number of temporal edges removed: (n_T = f |E|) where |E| is the set of temporal edges in the temporal network considered. For each strategy, we precise that: (1) at given duration or order, the cores are ordered randomly; (2) if removing all the edges of a span-core would lead to removing more than (n_T) edges, edges are removed at random from that core until reaching exactly (n_T) removed edges.

Table 1 gives for each strategy the properties of the span-cores removed for the two data sets considered in the main text. These properties are given in the Supplementary material for the other data sets.

The (n_T) temporal edges removed are not removed uniformly along the timeline of the temporal network, and we denote by (n_t^{s}) the number of temporal edges removed at timestep t for strategy s ((s=kM) or (s=Delta M)). We evaluate the effectiveness of these intervention strategies by comparing their impact to the one of several baselines. Each baseline consists in removing the same number (n_T) of temporal edges from the temporal network. The two simplest baselines consist in removing these edges at random:

• Randomly trimmed network (for short RT): the simplest benchmark is the intervention in which (n_T) edges are randomly removed over the temporal domain.

• Randomly trimmed by timestamp network (for short RTT): this strategy uses the knowledge of the timestamps in which the targeted span-cores are active, by removing exactly (n_{t}^s) edges randomly at each timestamp t. There are therefore two such strategies, kRTT that removes (n_{t}^{kM}) random edges at time t and (Delta)RTT that removes (n_{t}^{Delta M}) random edges at time t.

Note that the two RTT strategies exploit the temporal information provided by the temporal core decomposition while the RT one does not.

We consider as well more sophisticated baselines based on node centrality measures computed on the time-aggregated network. Indeed, it is known for static networks that nodes with large degree or static coreness play important roles in spreading processes. We also consider the weighted counterparts of degree and coreness, strength and weighted coreness34. We recall that in the time-aggregated network, the degree of a node u is equal to the number of distinct nodes with whom u has been in contact, and the weight of an edge (uv) gives the number of temporal edges between u and v in the temporal network.

For each centrality measure, the strategy works as follows:

• we first sort the nodes in decreasing order of their centrality;

• we then consider the nodes one by one, starting by the most central ones, and removing all the temporal edges to which it belongs over the entire temporal domain, until the stopping criterion is met (i.e., until (n_{T}) edges have been discarded). If discarding all the interactions of a node would mean exceeding (n_T), the temporal interactions to be removed for this node are chosen at random.

We thus obtain the four following strategies:

• The highest static degree strategy (SD) strategy;

• The highest static coreness (SC) strategy;

• The highest static strength (weighted degree, SWD) strategy;

• The highest static s-coreness (weighted coreness, SWC) strategy.

Finally, we consider the strategy tPR, which is based on a generalization of PageRank to temporal network29. Here, a temporal PageRank value is assigned to each node u at each timestamp t: tPR(ut). The pairs (ut) are ranked according to these values and we remove their temporal edges using this ranking until (n_T) temporal edges have been removed.

Seeding strategies

In the spreading maximization scenario considered for the SIR model, we consider several seeding strategies, i.e., choice of the initial seed of the spread (the first node in state I) aimed at favouring the spread. The idea underlying the proposed procedure is that nodes that represent influential spreaders are likely to belong to the span-cores of highest order at the timestamp in which the spread begins. We consider a fraction (f = 5 %) of the nodes of a temporal network: The top-k maximal span-core seeding strategy (for short kM) requires to carry out the following steps:

• we rank the nodes according to the value of the highest order of the span-cores they belong to;

• in decreasing order, we take each node as seed of (N_{sim}=100) SIR processes until the fraction f of the total amount of nodes has been considered. For a given node, the spread starts at a timestamp randomly sampled from the union of the spans of the highest order span-cores it belongs to. We then compute the average size of the epidemic, averaged over the (N_{sim}=100) processes.

Strategies based on randomness are not properly defined in this case since random seeds are exploited in order to construct the performance metric. Instead, baseline strategies are based on the same static node centrality measures used in the spread mitigation scenario, namely degree (strategy SD), strength (strategy SWD), coreness (strategy SC) and weighted coreness (strategy SWC). For each centrality measure, the associated strategy is implemented as follows:

• we rank the nodes according to their centrality;

• in decreasing order of centrality, we take each node as a seed of (N_{sim}=100) SIR processes until the fraction f of the total amount of nodes has been considered. In order for the comparison to be fair, the same sequence of initial timestamps considered in the seeding strategy based on the highest order maximal span-cores is considered.