# The education-chasing labor rush in China identified by a heterogeneous migration-network game

Jul 31, 2020

### Data description and the network structure of aggregated migration flows

We study the labor force migration of Chinese online job-seekers by a large resume dataset. The resume data is collected from Zhaopin.com that is one leading online platform for job seeking in China and has a resume database consisting of tens of million resumes filled by real job-seekers when they registered on this platform. Since the filled resumes have high probability to be viewed by HRs from the intended recruiters, the information filled by job-seekers is believed to reflect their truth and is updated in time. The questions mandatory to be answered include the gender, age, marriage status, education experience, past working experience (most job seekers on zhaopin.com has at least one job before), the previous working industries and so on, from which 77 migrant-level feature variables can be extracted (a complete statistic description of these variables in our sample can be found in the supplementary to the paper). The resume also contains the information of migrant’s hukou place, the current working place, the expected working places. Since the hukou place of job seekers can be identified as the place where they come from, the resume data set provides a two-stage OD trajectory data set for the labor force migration in China that are the stage (1): hukou place (rightarrow ) current working place and stage (2): current (rightarrow ) expected working place.

The data that we have the access is a purely random subsample of the full resume database which consists of resumes from 80,000 job-seekers (after dropping the records with missing values, 75,616 records are remained), the most recent update time of the resumes in this sample is by Jun. 2017. The included hukou places, working places and expected working places cover 400+ cities in China which include all the 286 prefecture-level cities and a set of lower-level cities of China, therefore we believe the subsample presents a good representative for the city-level labor force migration trend at least for the sub-population who seek job online. To avoid over-fitting, we group all the destination cities into 22 city clusters among which the first 21 city clusters are officially declared in the Chinese Statistical Yearbook (2015), the last one consists the cities that are not contained in any of the officially declared city clusters.

From the data, the aggregated migration network for the two stages can be calculated simply by counting the number of sampled migrants who migrate between every city pair. We can discuss the structural features of the migration network and its structural changes for the two migration stages. The two-stage networks are presented in Fig. 1a,b, where to generate clean plots, we only display the migration arrows across the 22 city clusters and the migration arrows within every city cluster. Comparing the two plots, the decentralization or multi-centralization trend is observed, and this trend can also be verified through comparing Fig. 1c,d where the blue line represents the total to-degree of every city cluster. From Fig. 1a,c, during the first stage, Beijing is the unique global flow-in center of all migrants, attracting migrants from both the other major cities (city clusters) in China, such as Shanghai, Guangzhou, and the lower-level periphery cities (grouped within “Other cities”). Although there do exist a couple of local flow-in centers such as Changchun in the north China and Shanghai in the East China, while their attractiveness is not comparable to Beijing at all. In contrast, during stage (2), multiple global flow-in centers emerge and they are detectible from comparing Fig. 1b,d. The local centers, such as Guangzhou, Zhengzhou and Xi’an, are upgrade to global centers and their attractiveness becomes in line with Beijing, which impedes the leading position of Beijing. The attractiveness of Beijing is even reduced in the absolute sense that the strong migration flows from the local centers, such as Shanghai, Fuzhou, Guangzhou and Shenyang, in stage (2), are significantly weakened or even disappears during stage (2). Finally, during stage (2), many tier-2 cities, such as Zhengzhou, Changsha and Xi’an, starts to be attractive for migrants who come from periphery cities. If we focus on the within-cluster migration trend, the difference between the stage (1) and (2) migration is also remarkable and coincides with the trend of decentralization. In fact, by the comparison between Fig. 1c,d, the internal migration intensity is strengthened significantly for two city clusters in the stage (2) that are the set of other cities and city cluster centered at Beijing. The former implies an increasing trend of the out-of-center migration46, while the later implies the decline of Beijing in terms of absorbing local migrants which reinforces the finding that the attractiveness of Beijing is reduced in the absolute sense.

The observed decentralization trend is highly consistent with the divergence in city development planning and migrant-related policies since 2016 between the capital city Beijing and the other major cities in the middle and western area of China. Since 2016, Beijing started to control its population scale, issue unfriendly residential policies that expel the so-called “low-end” migrants who are the low-educated, low-skilled, low-income and elderly contractor workers without registration in local hukou system. In the mean time, many tier-2 cities in the middle and western area of China, such as Xi’an, Chongqing, Chengdu, Wuhan, and cities in the Yangtze river delta area, such as Hangzhou and Nanjing, and cities in the Pearl river delta area, such as Shenzhen and Guangzhou, became migrant-friendly and relaxed the hukou restriction to attract young migrants with relatively high education level. Since the second-stage migration in our dataset only refers to the expected migration which has not yet happened in reality, the expectation of migrant workers reacts much more quickly to the policy change than their real migration decision, the structural change of the migration network between the two stages is by and large attributable to the non-homogeneous change of migration policy in different cities.

Although the overall migration network is informative, it is not complete. In reality, the migration network often varies along with migrant types, such as the education, income and age3,34,38,47, while the aggregated network fails to detect the difference. In addition, there are complicated interactions among migrants which also contribute to the formation of migration network28 but are not reflected in the aggregated network. To this end, we shall develop an empirically-fittible population game model for further investigation.

### A family of large migration game

For simplicity we focus on the origin-destination (OD) type of migration trajectories, but the framework can be easily extended to more general situations. To formulate the interactions among migrants, a large population game is established where the player set is viewed as a large random sample drawn from a continuum feature space and all the strategies that players can select are identified as the set of destination locations. The continuum feature space assumption is designed to capture the heterogeneity of players and the fact that there are always enormous players involved in the migration analysis. For short, we shall call this game model as migration game throughout the paper.

Formally, consider a family of normal-form games defined through the following four components and represented as the tuple (G(X_Nsubset {mathfrak {P}}:={mathbf {C}}times {mathbb {R}}^p,{mathbf {C}},mu ,U)):

A1.:

Pure strategies: denote ({mathbf {C}}={c_1,dots ,c_n}) as the pure strategy set which consists of all possible destination locations for players.

A2.:

Players: denote ({mathbb {R}}^p) as the p-dimensional feature space, every (xin {mathbb {R}}^p) represents a personal feature profile that associates with a given player (type). Augmenting ({mathbb {R}}^p) with the origin place ({mathbf {C}}) forms the full set of potential players characterized by both their personal and location features, denoted as ({mathfrak {P}}={mathbf {C}}times {mathbb {R}}^p). The set of N players (X_N) is randomly drawn from ({mathfrak {P}}), according to a known distribution (mu ), which can be interpreted as the population distribution.

A3.:

Mixed strategy: players are allowed to take mixed strategy, the mixed strategy set is represented as the set of vector-valued function ({{mathscr {P}}}={P:X_Nrightarrow S_{mathbf {C}}}) where (S_{mathbf {C}}={(p_0,dots ,p_{|{mathbf {C}}|-1})in {mathbb {R}}^c:,p_ige 0,sum _{i=0}^{|{mathbf {C}}|-1}p_i=1}) is the (|{mathbf {C}}|-1) dimensional simplex, (|{mathbf {C}}|) is the cardinality of ({mathbf {C}}). Then, for every destination (jin {mathbf {C}}), the jth coordinate projection (P_j(i,x)) will be the probability that a player ((i,x)in X_N) selects to migrate to j under the mixed strategy P. Without loss of generality, we assume (Pin {{mathscr {P}}}) is smooth up to a certain order with respect to (xin {mathfrak {P}}), i.e. P is the restriction of a smooth function onto (X_N), which implies that two players who are similar to each other in both of origin and features should make similar choice of strategies to some extent.

A4.:

Utility: denote (u^p) as the pure strategy utility function of players, it takes the following form for a given player (x_+=(i,x)in X_N) and a pure strategy profile ({mathbf {s}}={s_{x’_+}in {mathbf {C}}:x_+’in X_N}):

begin{aligned} u^p(x_+,j,{mathbf {s}}_{-x_+})= Fleft( frac{1}{N-1}sum _{x_+’not = x_+}I(s_{x’_+}=j)T(x_+,x_+’)-gleft( x_+,jright) right) end{aligned}

(1)

where ({mathbf {s}}_{-x_+}) is a pure strategy combination executed by players other than the decision player (x_+), I is the indicator function. F is a continuous function; T is a pairing weight function valued in the unit interval [0, 1] which, for a given player (x_+), describes which group of competitors, namely ({x_+’in X_N:T(x_+,x_+’)>0}), will be taken into account and how influential, measured by the value of function (T(x_+,cdot )), the competitors are for (x_+); g is interpreted as the ideal population ratio that the destination location should have, which is a kind of private information for every player and the features of every player can affect this quantity in a certain way. The utility for mixed strategy is calculated from Eq. (1) by the standard von-Neumann–Morgenstern expected utility theory, denoted as (U_{vNM}).

For every pure strategy j, Eq. (1) postulates that the utility it brings for every decision player depends on the difference between the actual population ratio [represented as the partial sum term in Eq. (1)] and the ideal population ratio [represented as the value of function g in Eq. (1)]. This difference can be interpreted as a measure for the degree of congestion in the destination city. As both the ways to count the actual population ratio and the ideal population ratio in Eq. (1) are allowed to vary from player to player, the congestion effect studied in this paper is essentially the subjective congestion effect, which gives full respect to the migrant-level heterogeneity, and is more flexible, including the widely studied congestion effect and the route congestion effect11,28,29,30 as special cases. For the actual population ratio of a given decision player, only those players within the target group are counted, while which player is in the target group is completely determined by the pairing function (T(x_{+},cdot )) evaluated at the feature type of the decision player. Hence, the pairing function T encodes the relationship among players, it can be interpreted as the continuous-version adjacency matrix of a player-to-player network. This player-to-player network is closely related to the concept of “social capital” in the studies of social network, including it into the utility function helps establish a quantitative connection between the adjacency matrix of the social network among migrants and that of the migration network among regions22. This connection is important to understand the formation of migration trajectories. To our best knowledge, our work is the first attempt to connect the two networks together at the adjacency-matrix level.

The ideal population ratio is also conditional on the feature of every player. This is because different players often disagree with each other, the disagreement can come from both the difference in personal characteristics (represented through their feature vector x) and the different original cities1,28. For instance, local residents with higher education level are more likely to survive in expensive meta-cities than those migrants with lower education level, consequently, the former group of people tend to evaluate a greater g for big cities, while the later group would assign a greater g to small and cheap cities when all others equal.

Throughout the application of the current paper, we simply assume (F(x)=-x^2), and g is parametrized through the logistic form

begin{aligned} g(i,x,j|theta _g)=frac{1}{1+exp (-theta _{g,1}^top cdot x_i-theta _{g,2}^top cdot x-theta _{g,3}^top cdot x_j)}, end{aligned}

(2)

where x is the person-level feature vector for a given player, (x_i) and (x_j) are the city-level feature vectors associated with the origin city i and destination city j, respectively. (theta _g , (:=(theta _{g,1},theta _{g,2},theta _{g,3}))) consists of the three coefficient vectors associated with the feature (x_i), x and (x_j). Under this specification, we assume that every job-seeker obtain utility from migration to city j through comparing the difference between his/her subjective optimal population ratio of city j measured by g(ixj) and the actual ratio of job seekers that select to migrate into j. Using quadratic functional form for F(x) captures that the utility is maximized if and only if the actual population ratio is just the ideal ratio, both overwhelming and unsaturatedness would lower down the attractiveness of a destination (notice that the quadratic form of F can be generalized to any functional form preserving the preference order that would have no impact on the current analytic and numerical results, for instance the quadratic form can be replaced by another function that has unique peak value and is symmetric with respect to the peak). This specification captures the dynamics of migration in many real world settings (e.g. the labor force migration game, the vehicle route selection game), similar utility functions are also used in the literature39.

For the pairing function and T, we consider two alternatives that are the constant function (Tequiv 1) and the binary(({0,1}))-valued function with (Tleft( (i,x),(i’,x’)right) =1) if and only if (i=i’). The later option describes the peer effect behind migration by which migrants only take their “peers”, the migrants who come from the same origin, into account. A preliminary analysis shows that the constant function dominates the “peer-effect” pairing function in forecast accuracy for migration decision at both migration stages, therefore, we would only focus on the simpler setting (Tequiv 1) in the following sections. One reason for the relative disadvantage of the peer-effect T might be that the destination locations in our data are the city-level locations which are too-large to allow the peer effect to work.

Finally, note that under the adjacency matrix interpretation of T, the constant pairing (Tequiv 1) is equivalent to that the social network that drive the formation of migration network is essentially a fully-connected network. In the other words, every job-seeker in the game will put equal weight to the decision of all the other players. This assumption is a bit weird as in reality job seekers is not possible to know all their competitors. But on the other hand, this assumption is equivalent to ask the job-seekers read the officially published population data of every city, which is not that unrealistic any more. It is definitely possible to set a more subtle form of the pairing function T so as to better reflect the impact of social networks among job seekers, but that is beyond the scope of both the data and the current study, we leave it for future studies.

#### Equilibrium, migration network and efficiency

The actual migration should happen in the way that every migrant in the system can only move to the destination that maximizes their utility given the choice of the others, which can be perfectly captured by the Nash equilibrium of the migration game. In fact, we can assume the following without loss of generality.

### Assumption 1

Given n randomly sampled migrants ({x_{+,i}:,i=1,dots ,n}), there exists an Nash equilibrium mixed strategy ({P^*_E(x_{+,i},cdot ):,i=1,dots ,n}) such that the observed n OD trajectories ({OD_i:,i=1,dots ,n}) are independent random samples with each (OD_i) drawn from the law of (P^*_E(x_{+,i},cdot )).

This assumption sets the actual migration trajectory as a random consequence with the randomness governed by a set of probability laws that are derived as a latent mixed-strategy Nash equilibrium of a proper migration game. The randomness set-up is to capture the impact of individual heterogeneities that are unobservable and beyond the scope of the observed feature x. We show in the method section that under Assumption 1 the proposed migration game can be fitted by real OD-trajectory data through a constrained maximum likelihood procedure, and a fast algorithm is provided to generate consistent inference for both the equilibria migration probability and the underlying game that migrants actually play.

For an Nash equilibrium, (P_E), of the migration game (G(X_N,{mathbf {C}},mu ,U)) that generates the observed trajectory data, we can always construct the equilibria migration network which is representable as the following directed weighted adjacency matrix:

begin{aligned} M_{x}={P_{E}(i,x,j)}_{i,jin {mathbf {C}}},,,xin {mathfrak {P}}. end{aligned}

(3)

The ijth entry of (M_{x}) is the probability that a player would like to migrate from his/her origin to the given destination, which can be interpreted as the migration intensity for a certain type of migrants x. The dependence of (M_{x}) on player’s feature x reflects the heterogeneity of migration networks across different types of player, which is of the great interest in this study.

Apart from the adjacency matrix, a set of quantitative indices can be constructed from combining both of the game structural information and the equilibria migration probabilities. For instance, the following index is constructed as an analogue to the measure of energy conservation in physics, which is the product of two deviation quantities and presents a quantitative measure for the inefficiency degree of the migration dynamics governed by the equilibria migration networks (3):

begin{aligned} DE_x=left{ DE_{x,ij}^1cdot DE_{x,ij}^2right} _{i,jin {mathbf {C}}},,,xin {mathfrak {P}} end{aligned}

(4)

where

begin{aligned} DE_{x,ij}^1= & {} left( P_{E}(i,x,j)-int _{{mathbf {C}}times {mathfrak {P}}/{(i,x)}} P_{E}(i’,x’,j)Tleft( (i,x),(i’,x’)right) dmu (i’,x’)right) end{aligned}

(5)

begin{aligned} DE_{x,ij}^2= & {} left( int _{{mathbf {C}}times {mathfrak {P}}/{(i,x)}} P_{E}(i’,x’,j)Tleft( (i,x),(i’,x’)right) dmu (i’,x’)-g(i,x,j)right) end{aligned}

(6)

In fact, under the set-up (F(x)=-x^2), a great positive (DE_{x,ij}) implies two situations: (1) the location j has been overwhelming (positive (DE_{x,ij}^1)) in the view of the player with type x coming from i, but the player is still highly likely to move in (positive (DE_{x,ij}^2)); and (2) the location j has the potential to grow up (negative (DE_{x,ij}^1)) in the view of the player while he/she is less likely to move in (negative (DE_{x,ij}^2)). Therefore, the positive (DE_{x,ij})s would polarize the population distribution across cities, which would induce the coexistence of resource overuse and over in-flow, and cause the migration inefficiency. In contrast, negative (DE_{x,ij}) implies an equalization potential for the population dynamics which we think as the representative for migration efficiency.

### Heterogeneity of migration networks cross migrant types

The proposed game model is applied to infer the type-specific migration networks for a variety of migrant types. In practice, it is needed a set of migrant-level features and city-level features to fit the migration game model and derive the equilibria migration networks. In our case, there are 77 migrant-level features and 21 dummy variables accounting for the heterogeneity induced by the 22 city clusters. To avoid the multicolinarity for migrant-level features, we adopt the principal component analysis (PCA) method to generate a few principal features that covers 99% of total variance of the migrant-level variables. The number of the remained PCA features is 15. We will run the estimation based on the 36 (= 15 + 21) feature variables for the two-stage migration data. The result is presented in Fig. 2. by which we discuss the structural change of the migration network against the two migration stages and three classes of migrant-level features, including the education level, age and income. The three classes of features are selected according to the existing literature in which these features are most critical to migration decision3,34,38,47.

In Fig. 2, we first test the significance of the heterogeneity of migration networks against a variety of migrant types. Within each migrant type, we calculate the type-specific migration network through marginal integration of the estimated equilibria networks over the domain of migrant features provided that the given dimension of x (education, income and age respectively) is fixed at the desired type value. To show the significance of the heterogeneity of type-specific migration network we conduct the Z test against the null hypothesis that the given type-specific migration network is indifferent from the overall network. Under the null hypothesis, the entry-wise difference between the estimation of the type-specific and the overall network should be a random variable drawn from a zero-mean distribution. In Fig. 2, we report the Z test result for all different education, income and age types and for both stages of migration. From Fig. 2, a majority portion of migrant types are heterogeneous at the 0.1 confidence level and around a half of types have distinct migration network at the 0.05 confidence level. This result verifies the necessity to include migrant-level features into the analysis of migration networks. Among the three classes of migrant types, it is found that for education types, the migrants with undergraduate degree at stage (1) and migrants with professional school degree at stage (2) tend to deviate most significantly from the aggregated population; for income types, the extreme high-income group (with monthly salary above 84,000 Chines yuan) at stage (1) and the middle-class (with monthly salary around 6,000 Chinese yuan) at stage (2) deviate most; for age types, the youngest migrants (older than 10 but younger than 20) at stage (1) while the middle-aged adults (between 30- and 40-year-old) at stage (2) deviate most. Based on these results, we can roughly summarize such a rule that at the initial stage of migration, the migrants with some extraordinary features (such as high education, high income and extreme low age) are more likely to behave differently, while in the expected stage of migration, the migrants that take the most proportion within the whole population (such as middle education, income and age class) tend to deviate significantly. This contraction is quite interesting and deserves further explanation in the future studies.

In sum, Fig. 2 validates such a viewpoint that the migration network is not universally homogeneous for different migrant types. The heterogeneity is rooted deeply in the social-economic background for particular types of migrant, therefore, to understand the real mechanisms that govern the labor force migration behavior, including the migrant-level heterogeneity is indispensable. This fact proves, once again, the usefulness of the model proposed in this paper. Finally, note that Fig. 2 only characterizes the significance of the deviation between type-specific and overall migration networks, it does not reveal how the deviation happens. In the supplementary, we present more details of the deviation through plotting the difference migration network, which is formed by entry-wisely subtracting the type-specific migration probability from the overall migration probability (see Supplementary Figs. S1, S2 and the attached discussion).

### Forecast accuracy and the stationary nature of migration dynamics

In this section, we validate the inference made by our population game model in terms of its forecast accuracy. The accuracy is evaluated at both migration stages by which a comparison of the decision pattern between the two stages is conducted. We divide the full sample into a training sample which consists of 68,000 randomly picked resumes from the full sample and a testing sample that include the remained 7,616 resumes. Due to the large size of the training sample, we apply a bootstrap method to fulfil the estimation. We randomly partition the training sample into 10 subsamples with equal sample size and run the fast estimation algorithm (see the “Method” section) on each of these subsamples. Based on the estimation result of each subsample, we generate a set of forecast on the test sample and calculate the accuracy, the final accuracy is aggregated through averaging over all the subsample accuracies. A comparison is made on the forecast accuracy for our game-based method (GBMLE), the kernel-density forecast (see “Method” section) and the forecast by the aggregated adjacency matrix (Fig. 1a,b) that is calculated through counting the proportion of migrants starting from every given city to all the other destination cities9,10.

Since there are about 400 destination locations and every migrant only selects one as its destination, generating forecast in this case is a typical multi-class classification problem. Following the literature48,49, we adopt the set-valued forecast method to generate destination forecast from the computed migration probability and evaluating its accuracy. Given the migration probability for every destination location, we first sort all destinations in the descending way by the probability values. For a given positive integer k, we consider the top k forecast as the set of first k destinations in the sorted sequence. The top k forecast is thought to be accurate if and only if the true destination falls into the top k set. In Fig. 3, we plot variation trends of the forecast accuracy along with the parameter k for the migration probabilities calculated by three different methods.

From Fig. 3a,b, the proposed GBMLE method generates the best forecast accuracy for almost all k in both the two stages of migration, while the kernel forecast outperforms the aggregated forecast in general. This observation verifies our theoretical prediction that the GBMLE method outperforms the kernel method by incorporating the game structural information, while they both dominates the aggregate method because they both distinguish the migrant-level heterogeneity while the aggregated method does not.

In addition to the overall accuracy, it is also impressive that the GBMLE forecast accuracy can reach its maximum even when k is fairly small, while the average accuracy exceeds 95% at (k=1). In the other words, there is at most 5% chance by which the maximum-a-posteri forecaster would mistakenly pick one destination out of 400 alternative destinations. In contrast, the maximum-a-posteri forecast by the kernel method and aggregate method would have 50% chance of making mistakes. This difference is striking and it demonstrates the power of adding the game structural information in increasing the forecastibility.

By the green lines in Fig. 3a,b, we make a counter-fact experiment. The purpose is to test the forecastability of the 1st-(2nd-)stage migration trajectories by the estimated 2nd-(1st-)stage migration game model. In details, when generating the green line in Fig. 3a, the model parameters are the same as those used in generating the blue line in Fig. 3b which are estimated from the migration data at stage (2), rather than from the “correct” stage (1). The relation between the green line in Fig. 3b and the blue line in Fig.  3a is the same. Compare the green with the blue line in Fig. 3a(b), we find that although the forecast accuracy by GBMLE is a bit lower after replacing the model parameters, it is still significantly higher than the alternative methods (they are not affected by the parameter change). Moreover, the accuracy for the maximum-a-posterior forecast by GBMLE in the parameter-change case can still be around 91%, which has been good enough and is not much different from the case that “correct” parameters are used. Therefore, it can be expected that the underlying mechanisms driving the observed migration are not significantly distinct for the two stages, and migrants’ behavior is highly consistent over time. This finding is very striking, because according to the model set-up, this finding implies a kind of “where-they-stand-depends-on-where-they-sit” decision pattern: given that the migrant-level features are identical for two migrants A and B while their original cities, CA and CB, are different, if A migrated to CB during the first stage, then the decision pattern of A in choosing the second-stage destination would be analogous to the decision pattern of B in choosing the first-stage destination while differs from A itself during the first-stage decision. In the other words, the labor migration dynamics revealed by our data follows a Markovian decision process, i.e. when all others equal, only the current destination matters the migration in the future, the home town does not. This finding presents a positive evidence for the use of markovian models in describing human migration dynamics33,50. Finally, note that the observed “where-they-stand-depends-on-where-they-sit” decision pattern and the markovian property relies heavily on controlling the migrant-level heterogeneity. Without including the social-economic features of migrants, the feature-dependent migration probabilities reduce to the entries in the aggregated migration adjacency matrix which are significantly distinct for the two migration stages by Fig. 1, therefore, the markovian properties is not detectable unless the migrant-level heterogeneity is involved. This fact validates the usefulness of our model.

### Migration inefficiencies and polarization of migrants distribution

We ask whether the observed migration is efficient and how the degree of efficiency (measured by Eq. 4) varies in reaction to the migrant types and migration stages. To this end, we plot the overall inefficiency network [calculated by entry-wisely integrating equation (4) over all migrant-level features i.e. (DE_{{{mathscr {X}}}}:={frac{1}{|{{mathscr {X}}}|}sum _{xin {{mathscr {X}}}}DE_{x,ij}}) with (|cdot |) representing the cardinality, ({{mathscr {X}}}) the set of all migrants having the given feature] in Fig. 4a,b for the two migration stages from which it is visualized that the spatial distribution of the positive/negative signs of the inefficiency arrows and the arrow strengths (corresponding to the sign and absolute value of entries of the inefficiency matrix (4)) are almost identical for both of the two migration stages. The stability of the migration inefficiency is also verified by Fig. 4c–e where we plot the aggregated two-stage migration inefficiency against different education, income and age types. It is shown that for the major portion of migrants in our sample, which are the migrants with education level no more than undergraduate, monthly income no more than 42,000 and age older than 20 and younger than 50, the variation trend of the overall inefficiency against the increasing of education, age and income are parallel for the two stages, while the 2nd-stage overall efficiency are systematically higher than the 1st-stage by a constant which reflects as the aggregated inefficiency measure (Eq. 4) in Fig. 4c–e is lower in stage (2). In order to measure the significance of the difference, we conduct the left-tail Z test and the result in Fig. 4f–h shows that at the 5% confidential level, the inefficiency measure in stage (2) is significantly lower than that in stage (1) for the majority portion of migrants. In addition, we apply ANOVA to test the significance of the parallel difference trend, it shows that within the same range of education levels as above, we cannot reject that the 2nd-stage inefficiency curve in Fig. 4c differs from the 1st-stage inefficiency curve by a constant even at the 20% confidential level, the same holds for age and income as well which provide positive evidence for the parallel trend.

The increasing migration efficiency is quite intuitive and consistent with the job searching theory51 since the first-stage migration helps accumulate more information of the potential destinations that facilitates migrants to make rational decision. The relative invariance of migration efficiency against the variation of multiple migrant-level features is quite surprising. One possible explanation for this observation is the invariance of the migration game which is supported by Fig. 3, because the identical game set-up implies the same functional form of g which defines the ideal population ratio of each destination and also consists of a key component of the inefficiency measure (4). Hence, the invariance of g provides a stablizer for the two-stage migration inefficiency.

Finally, it is observed from Fig. 4c,d a decreasing trend of migration inefficiency along with the increasing of education and income at least for the majority population (education level less than the master’s degree and monthly salary less than 42,000). This observation implies that high-educated and well-paid migrants tend to make more rational decision in selecting destination which agrees with the literature52 that high education can increase the decision rationality and the fact that the high income are often the consequence of high education.

### Classification of destinations and education-chasing behind migration dynamics

From Fig. 4, it is clear that in both of the two migration stages, the overall efficiency is poor (reflected as the inefficiency measure is significantly positive for both stages). Then it is natural to ask what causes the inefficient migration. To answer that question, we notice that there are two sources leading to the positive inefficiency measure by definition equation (4): (1) a migrant is more likely to migrate to a destination than the average even if the given destination is already over-sized in his/her view; (2) a migrant is less likely to migrate to a destination than the average while the destination can better off if more migrants can come in. Similarly, the negativity of the inefficiency measure also arises from two sources that are exactly opposite to the two above. Given that, all destination cities can be partitioned into four classes: (I) the np cities that consist of the unsaturated cities (with (sum _{i}DE_{x,ij}^2<0)) with positive aggregated flow-ins ((sum _{i}DE_{x,ij}^1>0)); (II) the nn cities that consist of the unsaturated cities (with (sum _{i}DE_{x,ij}^2<0)) with negative aggregated flow-ins ((sum _{i}DE_{x,ij}^1<0)); (III) the pn cities that consist of the oversized cities (with (sum _{i}DE_{x,ij}^2>0)) with negative aggregated flow-ins ((sum _{i}DE_{x,ij}^1<0)); and (IV) the pp cities that consist of the oversized cities (with (sum _{i}DE_{x,ij}^2>0)) with positive aggregated flow-ins ((sum _{i}DE_{x,ij}^1>0)).

We ask what is the major feature of the four classes of city, how do they differ from each other and how is the difference related to the inefficient migration pattern? To this end, we consider the seven city-level features: population, GDP, foreign direct investment (FDI), total road length, city building area, the total number of teachers in all primary and middle schools and the total number of beds in all hospitals. The seven features are devoted to capture the social-economic development level of a city, among which the number of teachers and beds are proxies to the education and healthcare resources, which are two most important public goods and very influential to migration decision35,36,37,53. The city-level feature data is collected from Chinese Statistical Yearbook (2015) which are only available for the 178 cities in the 21 city clusters, so the aggregation is also taken on that basis. Fig.  5a–c present the mean of the seven features within each city class where the city classification and mean features are calculated for the overall inefficiency measure (integrating (DE^1_{x,ij}), (DE^1_{x,ij}) with respect to all i and x) and for both migration stages.

Since the inefficiency measure (Eq. 4) is defined on the level of migration flow between city pair, the four classes of cities can even be extended to the set of all migration flows, which becomes four classes of ordered city pairs. For each of the city-pair classes, we compute the OD ratio of the seven features which are ratios formed by dividing the value of the origin city at a certain feature dimension by that of the destination city. The OD ratios convey more information than the absolute value of city-level features because they encode the comparison between the origin and the destination cities. The mean OD ratio for the four city-pair classes and its variation along with education, age and income are plotted in Fig.  5d–f (in Supplementary Fig. S10, we plot the radial graph for the mean features and OD ratios for the per capita value of the six features other than population, which is the quotient by letting the feature value divide the population, which shows a qualitatively identical pattern to the Fig. 5).

The geometric shape of the feature area within each of the radial plots in Fig.  5 are quite identical for both migration stage (1) and (2), this observation supports, once again, the invariance of the migration game. For the mean features and OD ratios of the city classes, some remarkable properties can be concluded. First, compared to the unsaturated cities (the nn and np class in Fig. 5a,c), the oversized cities (the pn class in Fig. 5b) are significantly greater in both migration stages and in all of the seven city-level features. This finding is not trivial because except for the 22 city-cluster dummy variables, no city-level social-economic feature is included when fitting the migration game. In the other words, the classification by the migration data can have a perfect match to the real social-economic gap between the “large” and “small” cities even though the social-economic data of cities are missing from the classification process. This result establishes a natural link between the migration behavior and the city “size” which suggests a novel way to describe the “largeness” or “smallness” of a city.

Second, the np class differs from the nn class mainly in the stock of education resources measured by the number of teachers in primary and middle school and the np class of cities tend to own better education resources. This fact is robust for both migration stages, and can even be reinforced through comparing Fig.  5d,f where the relative advantage on the teacher’s number is much more significant for those np city pairs. The right-tail Z test is conducted to measure the significance of the finding for a variety of migrant types, the result shows that the teacher’s number is always significantly higher in the np class at the 0.05 confidential level for almost all situations (see Suplementary Fig. S3). In contrast, the other features, such as the healthcare resources measured by the bed number, do not differ much between the nn and np classes (see Supplementary Figs. S4S9). The extraordinary role played by education resources in attracting migrants results deeply from the fact that the education resources are immobile which makes migration the only way to chase better education54. Although the healthcare resources have the similarly immobile nature, it cannot impact the migration as much as the education resources because living temporally is fully feasible to take use of the healthcare services in a different city, there is no need to induce observable migration flow. In contrast, schools will not provide service unless migration really happens. In that sense, the education resources are more immobile than the healthcare resources, therefore offering better education services could form a long-lasting comparative advantage of a city in attracting migrants, this advantage will not leak out and be “freely” used by non-migrants.

Finally, the comparative disadvantage of nn class in education resources also provides an explanation to the overall inefficient migration pattern. In fact, around 4/5 of all ordered city pairs are contained in the nn class, which dominates all the other three classes and constitutes the main portion of the inefficient migration arrows in Fig. 4a,b As the nn class of city (pairs) are caused by the relative shortage of education resources in the destination cities, it is reasonable to claim that the imbalanced spatial distribution of education resources is a major driving force of the overall inefficient migration. This finding is consistent with the existing studies on the relationship between Chinese labor migration and the education system40, and provides positive evidence for the argument that the distortion on education induces the distortion of population migration.