### QSPR models for the adsorption of PE

Three QSPR models of log *K*_{d} were developed for the adsorption of PE in seawater, freshwater and pure water, respectively:

$$\begin{aligned} \text{Seawater:}\quad \log K_{\text{d}} &= \left( 0.725 \pm 0.058 \right) \times \log D + \left( -36.236 \pm 9.034 \right) \times \varepsilon_{\alpha} \\ &\quad + \left( -23.169 \pm 4.501 \right) \times \varepsilon_{\beta} + \left( 17.856 \pm 2.572 \right) \end{aligned} \tag{1}$$

$$\text{Freshwater:}\quad \log K_{\text{d}} = \left( 0.667 \pm 0.047 \right) \times \log D + \left( 1.714 \pm 0.302 \right) \tag{2}$$

$$\text{Pure water:}\quad \log K_{\text{d}} = \left( 0.449 \pm 0.041 \right) \times \log D + \left( 0.265 \pm 0.115 \right) \times M_{\text{w}}^{\prime} + \left( 1.855 \pm 0.302 \right) \tag{3}$$

where log *D* is the n-octanol/water distribution coefficient at a specific pH value, *ε*_{α} is the covalent acidity, *ε*_{β} is the covalent basicity and *M*′_{w} is the relative molecular mass. In the Williams plot for model (3) (Fig. S1 of the Supplementary Information), 17α-ethinyl estradiol has a standardized residual (*SR* = −3.392) whose absolute value exceeds 3, so it was diagnosed as an outlier. Structural analysis showed that 17α-ethinyl estradiol differs markedly from the other compounds because of its acetylene group and steroidal ring system (an unsaturated benzene ring connected to a saturated six-membered ring). This structural discrepancy may be the main cause of the predictive inaccuracy. After removing this compound, the following model was obtained:

$$\text{Pure water:}\quad \log K_{\text{d}} = \left( 0.486 \pm 0.035 \right) \times \log D + \left( 2.420 \pm 0.199 \right) \tag{4}$$
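Because models (1), (2) and (4) are linear in their descriptors, a prediction reduces to a weighted sum. The following minimal Python sketch evaluates the point estimates only and ignores the reported standard errors. The names `PE_MODELS` and `predict_log_kd` are hypothetical, and the descriptor value in the example is illustrative rather than taken from the dataset.

```python
# Point-estimate coefficients copied from models (1), (2) and (4).
# The "± standard error" terms are deliberately ignored in this sketch.
PE_MODELS = {
    "seawater": {"log_d": 0.725, "eps_alpha": -36.236,
                 "eps_beta": -23.169, "intercept": 17.856},
    "freshwater": {"log_d": 0.667, "intercept": 1.714},
    "pure_water": {"log_d": 0.486, "intercept": 2.420},
}


def predict_log_kd(medium, **descriptors):
    """Return the predicted log Kd for PE in the given medium."""
    coefs = PE_MODELS[medium]
    expected = set(coefs) - {"intercept"}
    if set(descriptors) != expected:
        raise ValueError(f"{medium} model needs exactly: {sorted(expected)}")
    return coefs["intercept"] + sum(
        coefs[name] * value for name, value in descriptors.items()
    )


# Hypothetical compound with log D = 5.0 (illustrative value only).
print(round(predict_log_kd("freshwater", log_d=5.0), 3))  # 5.049
print(round(predict_log_kd("pure_water", log_d=5.0), 3))  # 4.85
```

Such predictions are only meaningful for compounds inside the applicability domain of the corresponding model.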

The statistical parameters of the developed QSPR models are presented in Table 1. For models (1), (2) and (4), *R*^{2} = 0.868, 0.903 and 0.811, *Q*^{2} = 0.868, 0.903 and 0.811, and *RMSE* = 0.826, 0.686 and 0.612, respectively. These results indicate that the models fit the data well. As shown in Table S1, all the *VIF* values (1.000–1.204) are well below 10, indicating that none of the three models suffers from multicollinearity. The fitting plots (Fig. 1) show good agreement between the experimental and predicted log *K*_{d} values. As shown in Fig. 2, the prediction errors show no dependence on the experimental log *K*_{d} values. Thus, the developed models have no systematic error, which is also supported by *BIAS* = 0.000–0.001 (Table 1).
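The goodness-of-fit statistics quoted above can be computed from paired experimental and predicted values. The sketch below assumes common textbook definitions (*R*^{2} as one minus the ratio of residual to total sum of squares, *BIAS* as the mean signed error); the exact formulas used for Table 1 are not stated in this section and may differ, for example in the degrees-of-freedom correction of *RMSE*. The function name `fit_statistics` and the sample values are hypothetical.

```python
import numpy as np


def fit_statistics(observed, predicted):
    """Return R2, RMSE, MAE and BIAS under the assumed definitions."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    residuals = pred - obs                      # signed prediction errors
    ss_res = np.sum(residuals ** 2)             # residual sum of squares
    ss_tot = np.sum((obs - obs.mean()) ** 2)    # total sum of squares
    return {
        "R2": float(1.0 - ss_res / ss_tot),
        "RMSE": float(np.sqrt(np.mean(residuals ** 2))),
        "MAE": float(np.mean(np.abs(residuals))),
        "BIAS": float(np.mean(residuals)),
    }


# Illustrative values only, not data from the study.
stats = fit_statistics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print({name: round(value, 3) for name, value in stats.items()})
```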

For the simulated external validation, the QSPR models (S1–S3) redeveloped from 70% of the experimental data with the same descriptors as models (1), (2) and (4) show fitting performance (*R*^{2}, *Q*^{2}, *RMSE* and *MAE*) and regression coefficients similar to those of the models developed from the whole dataset (Table 1). Thus, the models are statistically stable. Because the training subsets were assigned randomly, there is no chance correlation. The predictive performance of each rebuilt model on the test set (the 30% subset, marked with superscript b in Table 2) is listed in Table 1. The values of *Q*^{2}, *RMSE* and *MAE* indicate excellent predictive quality of the developed QSPR models. The results of leave-one-out cross-validation (*Q*^{2}_{CV} = 0.882–0.940) also indicate good robustness and internal predictivity.
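The validation scheme described above (a random 70/30 split plus leave-one-out cross-validation) can be sketched as follows. This is an illustration on synthetic data, not the authors' procedure: the helper names are hypothetical, ordinary least squares is assumed as the fitting method, and *Q*^{2}_{CV} is assumed to be one minus the ratio of the predictive residual sum of squares (PRESS) to the total sum of squares.

```python
import numpy as np


def fit_ols(x, y):
    """Least-squares coefficients [intercept, slope_1, ...] for descriptors x."""
    design = np.column_stack([np.ones(len(x)), x])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


def predict(coefs, x):
    """Apply fitted coefficients to a 2-D descriptor matrix."""
    return np.column_stack([np.ones(len(x)), x]) @ coefs


def q2_loo(x, y):
    """Leave-one-out Q2, assumed here to be 1 - PRESS / total sum of squares."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i                 # drop compound i
        coefs = fit_ols(x[keep], y[keep])        # refit without it
        press += (y[i] - predict(coefs, x[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


# Synthetic single-descriptor data standing in for (log D, log Kd) pairs.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 8.0, size=(40, 1))
y = 0.7 * x[:, 0] + 1.7 + rng.normal(0.0, 0.3, size=40)

# Random 70/30 split: 28 training and 12 test compounds.
order = rng.permutation(len(y))
train, test = order[:28], order[28:]
coefs = fit_ols(x[train], y[train])
test_rmse = np.sqrt(np.mean((predict(coefs, x[test]) - y[test]) ** 2))

print("training coefficients:", np.round(coefs, 3))
print("test RMSE:", round(float(test_rmse), 3))
print("Q2_CV (leave-one-out):", round(float(q2_loo(x, y)), 3))
```

Comparing the training-subset coefficients with those fitted on all data is what the text refers to as statistical stability.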

Williams plots were used to examine the applicability domain of QSPR models (1), (2) and (4). The calculated warning leverage values *h*^{*} are 0.324, 0.250 and 0.128, respectively. As shown in Fig. 3, three compounds (oxytetracycline, sulfadiazine and δ-hexachlorocyclohexane) for model (1) and one compound (2,2′,3,3′,4,4′,5-heptachlorobiphenyl) for model (4) lie to the right of *h*^{*}. Because their absolute *SR* values are < 3, these chemicals are not diagnosed as outliers. In summary, these results indicate that the developed QSPR models generalize well within their descriptor space. Given the molecular structures used to develop the models, QSPR model (1) can be used to predict the log *K*_{d} values of organics including polychlorinated biphenyls, antibiotics, polycyclic aromatic hydrocarbons, chlorobenzenes, perfluorinated compounds and hexachlorocyclohexanes between PE and seawater; model (2) can be used to predict the log *K*_{d} values of polychlorinated biphenyls and antibiotics between PE and freshwater; and model (4) can be used to predict the adsorption on PE in pure water of organic pollutants such as polychlorinated biphenyls, antibiotics, polycyclic aromatic hydrocarbons, chlorobenzenes, aromatic hydrocarbons and aliphatic hydrocarbons.
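The quantities behind a Williams plot can be sketched in a few lines. The code assumes the common definition of the warning leverage, *h*^{*} = 3(*k* + 1)/*n* for *k* descriptors and *n* compounds, which is not stated explicitly in this section; it does reproduce the value reported for model (6), where 28 compounds and two descriptors give 9/28 ≈ 0.321. The standardization of residuals shown here (division by the residual standard deviation) is likewise an assumption, and all function names are hypothetical.

```python
import numpy as np


def warning_leverage(n_compounds, n_descriptors):
    """Assumed definition of the warning leverage: h* = 3 (k + 1) / n."""
    return 3.0 * (n_descriptors + 1) / n_compounds


def leverages(x):
    """Diagonal of the hat matrix for a 2-D descriptor matrix plus intercept."""
    design = np.column_stack([np.ones(len(x)), x])
    hat = design @ np.linalg.pinv(design.T @ design) @ design.T
    return np.diag(hat)


def standardized_residuals(observed, predicted, n_descriptors):
    """Residuals divided by the residual standard deviation (assumed convention)."""
    res = np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float)
    dof = len(res) - n_descriptors - 1
    return res / np.sqrt(np.sum(res ** 2) / dof)


def williams_flags(x, observed, predicted):
    """Return boolean arrays: (high leverage, response outlier with |SR| > 3)."""
    k = x.shape[1]
    h = leverages(x)
    sr = standardized_residuals(observed, predicted, k)
    return h > warning_leverage(len(x), k), np.abs(sr) > 3.0


print(round(warning_leverage(28, 2), 3))  # 0.321, as reported for model (6)
```

A compound to the right of *h*^{*} but with |*SR*| < 3 is structurally influential yet still well predicted, which is why the text does not treat such compounds as outliers.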

The n-octanol/water distribution coefficient at a specific pH value (log *D*) was selected in all three log *K*_{d} predictive models for PE in seawater, freshwater and pure water. The experimental log *K*_{d} values correlate significantly with log *D*, which has positive regression coefficients (0.725, 0.667 and 0.486) in models (1), (2) and (4). Thus, organic pollutants with high hydrophobicity are preferentially adsorbed onto PE. For example, hydrophobic polychlorinated biphenyls (PCBs) with large log *D* values exhibit higher log *K*_{d} values than ionizable organic pollutants (e.g., antibiotics). This is because the hydrophobicity of PE itself makes hydrophobic interaction the main mechanism in the adsorption of organic pollutants onto PE. The same adsorption mechanism was also reported by Hüffer et al., who established a prediction model based on the log *K*_{ow} values of seven organic compounds^{30}.

For the adsorption of PE in seawater, *ε*_{α} and *ε*_{β}, which represent covalent acidity and covalent basicity, respectively, were also selected. The quantum chemical descriptor *ε*_{α} makes a negative contribution to the log *K*_{d} values, suggesting that an organic pollutant with a large *ε*_{α} value prefers to remain dissolved in water, which lowers log *K*_{d}. This means that, at the adsorption interface, the PE surface has a weaker H-accepting ability towards organic pollutants than water^{31}. Similarly, the log *K*_{d} values increase with decreasing *ε*_{β}, indicating that the H-donating ability of the PE surface is also weaker than that of water. It follows that hydrogen bond interaction is also an important mechanism for the interactions between PE and organic pollutants in seawater.

Compared with freshwater and pure water, the high salinity of seawater can enhance the dipole–dipole and dipole–induced dipole interactions in the system, which can facilitate hydrogen bond formation. As a result, *ε*_{α} and *ε*_{β} play a more important role in the log *K*_{d} values of PE in seawater. In brief, the distribution behavior of the studied organics between PE and water is mainly governed by hydrophobic interaction. For adsorption in seawater, hydrogen bond interaction is another important driving force.

### QSPR model for the adsorption of PP

A QSPR model of log *K*_{d} was obtained for the adsorption of PP in seawater:

$$\text{Seawater:}\quad \log K_{\text{d}} = \left( 0.751 \pm 0.035 \right) \times \log D + \left( -19.323 \pm 2.072 \right) \times \varepsilon_{\beta} + \left( 6.735 \pm 0.663 \right) \tag{5}$$

The values of *R*^{2}, *Q*^{2} and *RMSE* are 0.939, 0.939 and 0.381, respectively. Thus, model (5) shows a high goodness-of-fit and explains 94% of the variability of the whole dataset. The absence of multicollinearity in model (5) is confirmed by the *VIF* values (1.034 for both descriptors, Table S1). As shown in Fig. S2, the predicted log *K*_{d} values agree well with the experimental values. Fig. S3 and the *BIAS* value (−0.003) show that the prediction errors do not depend on the experimental log *K*_{d} values.
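The variance inflation factor of a descriptor is conventionally defined as 1/(1 − *R*_{j}^{2}), where *R*_{j}^{2} comes from regressing that descriptor on the remaining ones. For a two-descriptor model this gives the same value for both descriptors, which is consistent with the single *VIF* of 1.034 quoted for model (5) and would correspond to a correlation of about 0.18 between log *D* and *ε*_{β}. The sketch below assumes this standard definition; the function names and the synthetic data are hypothetical.

```python
import numpy as np


def vif(x):
    """VIF of each column of x: 1 / (1 - R_j^2), with R_j^2 obtained by
    regressing column j on all other columns plus an intercept."""
    x = np.asarray(x, dtype=float)
    values = []
    for j in range(x.shape[1]):
        others = np.delete(x, j, axis=1)
        design = np.column_stack([np.ones(len(x)), others])
        coefs, *_ = np.linalg.lstsq(design, x[:, j], rcond=None)
        resid = x[:, j] - design @ coefs
        ss_tot = np.sum((x[:, j] - x[:, j].mean()) ** 2)
        r2 = 1.0 - (resid @ resid) / ss_tot
        values.append(1.0 / (1.0 - r2))
    return np.array(values)


def implied_correlation(vif_value):
    """|r| between two descriptors implied by their shared VIF."""
    return float(np.sqrt(1.0 - 1.0 / vif_value))


# Synthetic, weakly correlated descriptors (not data from the study).
rng = np.random.default_rng(2)
a = rng.normal(size=50)
b = 0.2 * a + rng.normal(size=50)
print(np.round(vif(np.column_stack([a, b])), 3))
print(round(implied_correlation(1.034), 2))  # 0.18
```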

For the simulated external validation, the regression coefficients and statistical parameters (*R*^{2} = 0.945, *RMSE* = 0.396 and *MAE* = 0.307) of the training subset are similar to those of the whole dataset (Table 1 and model S4). Thus, model (5) is statistically stable and there is no chance correlation. As shown in Table 1, the high prediction quality of the developed QSPR model is supported by the predictive performance of the new model on the test subset (*Q*^{2} = 0.874, *RMSE* = 0.369 and *MAE* = 0.228). Furthermore, model (5) has good robustness and internal predictive ability (*Q*^{2}_{CV} = 0.957). The Williams plot for the applicability domain of model (5) (Fig. S4) shows that two compounds (sulfadiazine and *γ*-hexachlorocyclohexane) lie to the right of *h*^{*} (0.257). However, both compounds have absolute *SR* values < 3, indicating that they are not outliers. Thus, model (5) can be used to predict the log *K*_{d} values of PP in seawater for organics including polychlorinated biphenyls, chlorobenzenes, hexachlorocyclohexanes, polycyclic aromatic hydrocarbons and antibiotics.

For the adsorption of PP in seawater, log *D* and *ε*_{β} were selected in model (5). Thus, hydrophobic interaction and hydrogen bond interaction also play determining roles in this adsorption. However, unlike in the log *K*_{d} predictive model of PE in seawater, *ε*_{α}, which represents the covalent acidity, was not selected in model (5). This difference may arise from the additional methyl groups in the PP structure, which reduce the difference in H-accepting ability between the microplastic and water and consequently make the contribution of *ε*_{α} to the adsorption of PP negligible.

### QSPR model for the adsorption of PS

For the adsorption of PS in seawater, the experimental log *K*_{d} values of 28 organic pollutants (of which 14 are ionizable compounds) were used to establish a predictive model:

$$\text{Seawater:}\quad \log K_{\text{d}} = \left( 0.357 \pm 0.062 \right) \times \log D + \left( 3.766 \pm 0.384 \right) \times \pi + \left( -2.080 \pm 0.540 \right) \tag{6}$$

As shown in Tables 1 and S1, the obtained statistical parameters (*R*^{2} = *Q*^{2} = 0.837) indicate good regression performance, and the calculated *VIF* values (1.000 for both descriptors) indicate no multicollinearity in model (6). Good agreement between the experimental and predicted log *K*_{d} values is also observed in Fig. S5. The pattern of prediction errors shown in Fig. S6 reveals no systematic error for model (6), which is also supported by *BIAS* = 0.000 (Table 1).

Based on the training subset (70%), a new model (S5) with similar regression coefficients and statistical parameters was obtained (Table 1). Comparable statistics were also obtained for the test set. Moreover, the *Q*^{2}_{CV} value of the leave-one-out cross-validation (0.898) exceeds the acceptance criterion. Thus, model (6) has satisfactory robustness and internal predictive ability. As shown in the Williams plot (Fig. S7), three compounds (fluoranthene, chrysene and pentacosafluorotridecanoic acid) lie to the right of *h*^{*} (0.321) but have |*SR*| < 3, indicating that they are not outliers. In conclusion, model (6) can be used to predict the adsorption carrying capacity (log *K*_{d}) of PS for organic pollutants (especially ionizable organic pollutants) within the applicability domain in seawater. In a previous study^{20}, the influence of dissociation on log *K*_{d} for ionizable organic pollutants was not considered in the construction of predictive models. In fact, the physicochemical properties (e.g., hydrophobicity) of the various dissociation species are quite different, which may significantly affect the partitioning of ionizable organic pollutants between PS and seawater. Therefore, predictive models established without considering the effect of pH on the distribution of dissociation species are only applicable to log *K*_{d} values at the experimental water pH. In contrast, the QSPR model (6) constructed in this study extends the predictive application to various pH values. Limited by the number of ionizable compounds and the pH range used for model construction, the developed models are most suitable for the pH range of natural waters (6–9).
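The pH dependence that log *D* brings into model (6) can be illustrated with the textbook relation for a monoprotic acid, log *D* = log *K*_{ow} − log(1 + 10^{pH − p*K*_{a}}), which assumes that only the neutral species partitions into n-octanol. How log *D* was actually calculated for the dataset is not described in this section, so the sketch below is an assumption-laden illustration; the function names and the compound properties are hypothetical. Because model (6) is linear, a change in log *D* at fixed *π* shifts the predicted log *K*_{d} by 0.357 times that change.

```python
import math

LOG_D_COEF_PS = 0.357  # log D coefficient of model (6), point estimate only


def log_d_monoprotic_acid(log_kow, pka, ph):
    """log D of a monoprotic acid, assuming only the neutral form
    partitions into n-octanol (textbook approximation)."""
    return log_kow - math.log10(1.0 + 10.0 ** (ph - pka))


def log_d_monoprotic_base(log_kow, pka, ph):
    """Same approximation for a monoprotic base (pka of the conjugate acid)."""
    return log_kow - math.log10(1.0 + 10.0 ** (pka - ph))


def delta_log_kd_ps(log_kow, pka, ph_from, ph_to):
    """Predicted change in log Kd on PS (model 6) for an acid when the pH
    changes, with the descriptor pi held fixed."""
    shift = (log_d_monoprotic_acid(log_kow, pka, ph_to)
             - log_d_monoprotic_acid(log_kow, pka, ph_from))
    return LOG_D_COEF_PS * shift


# Hypothetical acid with log Kow = 3.0 and pKa = 7.0 across natural-water pH.
for ph in (6.0, 7.0, 8.0, 9.0):
    print(ph, round(log_d_monoprotic_acid(3.0, 7.0, ph), 3))
print(round(delta_log_kd_ps(3.0, 7.0, 6.0, 9.0), 3))
```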

The presence of log *D* in model (6) indicates that hydrophobic interaction also enhances the adsorption of organics on PS in seawater. In addition to log *D*, *π* was selected. The experimental log *K*_{d} values correlate positively with *π* (coefficient 3.766) in the QSPR model, indicating that chemicals with larger *π* values are preferentially adsorbed onto PS in seawater. As shown in Tables 2 and S2, organic compounds with strong π-electron conjugation in their structure generally have large *π* values. Thus, it can be inferred that π–π interaction also contributes to the adsorption on PS. The phenyl groups in the PS structure produce stronger π–π interactions with organic chemicals than PE and PP do, thus yielding higher log *K*_{d} values (Table 2). For example, the log *K*_{d} value of phenanthrene on PS (5.50) is much higher than that on PE (4.440) and PP (4.000) in seawater. In brief, hydrophobic interaction and π–π interaction play important roles in the adsorption of PS in seawater.