Introduction

Hurricanes are becoming increasingly hazardous events, resulting in more severe impacts on communities (Walsh et al. 2016). Political and economic trends have also led to underregulated and/or unregulated housing development in at-risk regions, potentially exacerbating hurricane-related damage (Dahl et al. 2018). The effect of unchecked development on disaster impacts can be quantified by assessing the increasing number of people or assets exposed to hazards. However, the traditional approach of quantifying disasters in terms of the physical world has gradually been expanded to address the inherent social nature of disasters, leading to more complete assessments of risk and impacts. For example, Hurricane Katrina exposed the particular importance of race, social class, gender, and age, among a host of other indelible factors, to both the severity of household impacts as well as recovery rates (Hartman and Squires 2006). However, the influence of societal inequalities on the intensity of disaster impacts is difficult to evaluate because of the inherent complexity and qualitative nature of the challenges introduced. Similar to Katrina, Hurricane María also revealed preexisting inequalities that may have intensified storm impacts for certain populations; however, the extent of this influence has been difficult to assess because of the challenge in quantifying such topics.

As the third costliest storm in US history (Pasch et al. 2018), Hurricane María offers an opportunity to gain an understanding of the underlying social factors and structural inequalities that contributed to its damage. As the first comprehensive, quantitative analysis of the social and physical drivers of María’s damage, this research complements surrounding qualitative studies and discussions.
This case study also exemplifies a unique application of machine learning algorithms to illustrate the importance of holistic data analyses that incorporate human variables in traditionally physical analyses. A holistic conceptual framework (Fig. 1) guided this quantitative assessment, and the data collected in this study represent each facet of the conceptual framework. Interpretable machine learning algorithms were used to predict damage, analyze relationships among variables, and unveil important predictors of damage.

Background

General Context

Disasters affect different demographic groups with immense disparity. While physical hazards often remain the leading contributors to damage intensity, studies have demonstrated that social factors also significantly contribute. Namely, vulnerability, or the collective factors influencing a community’s susceptibility to damage, can intensify disaster impacts and inhibit postdisaster recovery (Fothergill and Peek 2004; Chakraborty et al. 2005; Flanagan et al. 2011; Chakraborty et al. 2014; Rumbach et al. 2020). Vulnerability can be quantified using either inductive or deductive statistical methods to create indices focused on socioeconomic, structural (i.e., built environment), and/or comprehensive (e.g., Cutter’s Social Vulnerability Index) measures (Cutter et al. 2003; Flanagan et al. 2011; Holand et al. 2011). Some studies elect to analyze vulnerability via individual raw indicators as opposed to indices; however, other studies have concluded that vulnerability is a multidimensional phenomenon (Morrow 1999) and that individual indicators may misrepresent this broader concept (Cutter et al. 2003; Flanagan et al. 2011). One widely accepted, comprehensive social vulnerability index is the pioneering work of Cutter et al. (2003), which, through factor analysis, incorporates numerous variables affecting vulnerability, ranging from income to percentage of mobile home housing units.
In an effort to distinguish population and infrastructure effects, Holand et al. (2011) adapted the work by Cutter et al. (2003) to fit their Norwegian case study. They separately created a socioeconomic vulnerability index that focused on living conditions and population characteristics as well as a structural vulnerability index that represented housing characteristics and structural quality.

Vulnerability and disaster resilience have been shown to be correlated across many scales, from the national (Ward and Shively 2017) to the household level (Highfield et al. 2014). A review by Fothergill and Peek (2004) concluded that socioeconomic status consistently emerges as a contributing factor in structural damage for a variety of disasters. Furthermore, they found that vulnerable populations often reside in homes with structural qualities that cause them to be more susceptible to wind and flood forcings, resulting in higher levels of damage incurred during a hurricane. Van Zandt and Rohe (2011) hypothesized that this may be because the houses that are affordable to lower income groups are often older and poorly maintained due to the lack of spare financial resources to fund needed repairs and updates, thus increasing their risk of damage. A study of Hurricane Ike by Highfield et al. (2014) confirmed that, while controlling for other factors, cheaper and/or older homes received higher levels of damage than newer, more expensive homes. Another study of Hurricane Ike and Hurricane Andrew by Peacock et al.
(2014) concluded that social factors, such as income, were important determinants of both residential building damage and recovery since houses in wealthier neighborhoods retained a higher relative percentage of their home value postdisaster and recuperated their value more quickly.

As a result of the increased availability of disaster data and recognized importance of community vulnerability, natural hazards studies have begun incorporating machine learning algorithms into their analyses. In addition to being highly interpretable, ensemble decision tree strategies, including random forest (RF) and stochastic gradient boosting tree (SGBT) algorithms, can address all facets of disaster risk or impact conceptual frameworks because they are highly flexible, nonparametric, and can accommodate nonlinear, multidimensional data sets from a variety of sources and formats, allowing both continuous numeric and categorical data within the same model (Breiman 1996, 2001; Friedman 2002).

Owing to the complex nature of the phenomena, many flood studies in particular have embraced machine learning (Wang et al. 2015; Chapi et al. 2017; Shafizadeh-Moghadam et al. 2018; Sadler et al. 2018), as have landslide (Trigila et al. 2015; Hong et al. 2016) and earthquake (Tesfamariam and Liu 2010) studies. However, these studies did not account for vulnerability. Several studies have recently emerged that apply machine learning algorithms to explore the societal role in disasters. An RF assessment of wildfire damage in Portugal by Oliveira et al. (2017) concluded that purchasing power and housing quality were significantly correlated with the extent of wildfire damage and that certain demographic groups, such as the elderly and households with lower education levels, were relatively more vulnerable to wildfire impacts. Merz et al.
(2013) studied the correlation between flood damage and voluntary precautionary measures across German households, finding that households with resources to implement mitigation actions sustained lower structural losses. They also concluded that ensemble decision trees more accurately predicted damage than traditional impact models. By leveraging household-level damage assessments in Bangladesh, another flood study comparing linear regression, RF, and artificial neural networks concluded that larger households and higher education levels were associated with lower flood damage (Ganguly et al. 2019). This study uses ensemble decision tree algorithms, specifically RF and SGBT, to quantitatively explore the relative role that societal factors played in the structural damage caused by Hurricane María.

Case Study

Hurricane María was a Category 5 storm that made landfall in Puerto Rico on September 20, 2017, and devastated the island for months to follow. María is the third costliest storm in US history, after Hurricanes Katrina (2005) and Harvey (2017), with total approximate damages in the US Virgin Islands and Puerto Rico of $90 billion (Pasch et al. 2018). Two weeks prior to María’s landfall, Hurricane Irma skirted the island 50 km to the north, weakening the island’s infrastructure.

In addition to the storm’s intensity, the severity of María’s impact could have been influenced by the preexisting social disparities that have been exacerbated in the previous two decades by the economic downturn in Puerto Rico (Santiago-Bartolomei 2018). With a poverty rate of 44%, Puerto Rico has a far larger low-income population than the nation as a whole (poverty rate of 13%), and a higher percentage of its population is aged 65 and over (18%, compared with 15% nationally) (US Census Bureau 2017). The influence of these disparities is reflected in mortality statistics; Santos-Burgoa et al.
(2018) determined that the 2,975 fatalities caused by María were concentrated in areas of low socioeconomic status as well as areas with the highest ratios of men over age 65.

While María’s effect on infrastructure in Puerto Rico was widespread, one of the most critical sectors hit was housing, which has suffered from a shortage of resilient low-income housing (Santiago-Bartolomei 2018). Affluent residents of Puerto Rico often reside in concrete structures built to code by licensed contractors; however, many lower income populations only have access to substandard houses that have not been updated or are informally self-built, resulting in an uneven distribution of structural resiliency across demographic groups (RSF 2018). Informal housing refers to structures that are self-built without proper titles or permitting and do not comply with zoning and building regulations. A 2018 report by the Puerto Rico Home Builders Association estimated that 45% of structures on the island are informal (Asociación de Constructores de Puerto Rico 2018). However, these areas of vulnerable housing were difficult to prioritize in postdisaster response and recovery efforts because these structures lacked legal documentation and were not included in government housing databases (FEMA 2018a). Given the widespread prevalence of these structures, existing government structural databases are incomplete and could not be solely utilized to accurately model traditional structure-damage functions across the island.

Damage patterns produced by wind events specifically are indicators of structural and socioeconomic vulnerability (Eaton 1980). Since Hurricane María passed diagonally across the center of Puerto Rico, all structures were exposed to some degree of wind forcing, revealing areas of poor infrastructure investment and high structural vulnerability.
Ma and Smith (2020) analyzed Individual Assistance data from the Federal Emergency Management Agency (FEMA) and determined that María’s wind was the cause of 99% of the destroyed homes in Puerto Rico. They also concluded that renters and lower income populations sustained higher levels of damage than homeowners or higher income households.

Although Cutter et al. (2003) established that lack of wealth and housing quality are primary contributors to hazard vulnerability, societal factors have not yet been extensively used as a proxy to analyze correlations between areas with high concentrations of community vulnerability and damage due to hurricanes. This is especially vital for the case of Hurricane María, given the high poverty rate and housing challenges experienced by Puerto Rico. While it has been the focus of many qualitative discussions, the role of vulnerability as it pertains to Hurricane María’s impact on Puerto Rico remains to be quantitatively constrained in a robust manner. In this study, resultant damage patterns are hypothesized to be a function of both hazardous forcings and preexisting vulnerabilities.

Methods

Data

The widely recognized disaster risk assessment model defines risk as a function of hazards, exposure, and vulnerability. This framework was adapted for a postevent application by proposing that impact is a function of hazards, exposure, and vulnerability (Fig. 1). Best available data represented the components of this conceptual framework, including impact (number of buildings damaged), exposure (total number of buildings), natural hazards (wind, flooding, landslides), and vulnerabilities (socioeconomic and/or structural). Table 1 provides a summary of all variables produced by data gathering and processing, indicating their relation to the conceptual framework.

Table 1. Summary of all data included in this analysis, including all predictive (P) and target (T) variables

| Category | Measure | Type | Source | Abbreviation |
|---|---|---|---|---|
| Wind | Distance from hurricane center (deg) | P | NHC (2017) | HurTrack |
| Wind | Peak gust (m/s) | P | ARA (2017) | PeakGust |
| Wind | Max sustained winds (m/s) | P | ARA (2017) | MaxSusWinds |
| Flood | Proportion of flooded area | P | FEMA (2017a) | PropFA |
| Flood | Average depth of flooding (m) | P | FEMA (2017a) | AveDepth |
| Flood | Max depth of flooding (m) | P | FEMA (2017a) | MaxDepth |
| Flood | Proportion of SFHA | P | FEMA (2018c) | PropSFHA |
| Landslide | Average landslide density code | P | USGS (2019) | AveLS |
| Vulnerability | Proportion of special communities | P | IFA (2008) | PropSC |
| Vulnerability | Social vulnerability | P | CDC (2017) | CDCVuln |
| Vulnerability | Structural vulnerability | P | Eroglu et al. (2020) | StrVI |
| Vulnerability | Socioeconomic vulnerability | P | Eroglu et al. (2020) | SeVI |
| Damage/Exposure | Baseline damage index | T | FEMA (2018b) and OSM (2019) | DI1 |
| Damage/Exposure | DI1, excluding highest outlier | T | FEMA (2018b) and OSM (2019) | DI2 |
| Damage/Exposure | DI1, excluding 0 damage tracts | T | FEMA (2018b) and OSM (2019) | DI3 |

An aerial damage assessment database from FEMA documented María’s impact on structures across the island (FEMA 2018b). A total of 53,664 structures were visually designated as “Affected” (49,972) or “Destroyed” (3,692); however, the data did not capture damage to the sides of structures, and residential versus nonresidential structures are often indistinguishable because nadir imagery was used to generate the data set. A total of 1,500,308 building footprint polygons, created by OpenStreetMap (OSM), delineated all structures exposed across Puerto Rico (OpenStreetMap 2019).

Hurricane María’s hazardous forces included wind, flooding, and landslides. For wind hazards, Applied Research Associates provided modeled measures of peak gusts (m/s) and maximum sustained winds (m/s) at 10 m elevation over flat terrain (Applied Research Associates 2017; Vickery et al. 2000). The National Hurricane Center (NHC) best track data for María charted the center path of the storm (NHC 2017). Using gauge and topographic data, FEMA created flood event depth grids.
These data represented the inundation extent and intensity produced by Hurricane María (FEMA 2017a, b). Puerto Rico’s National Flood Insurance Program (NFIP) Special Flood Hazards Area (SFHA) database (FEMA 2018c) provided an additional measure of general flood risk, depicted by the 1-percent-annual-chance flood polygons, which is widely used for floodplain management and establishes the program’s flood insurance rates (FEMA 2019). A United States Geological Survey (USGS) data set documented the spatial density of landslides triggered by Hurricane María using posthurricane imagery and a grid-based landslide intensity classification system, visually validated with aerial helicopter surveys (USGS 2019; Bessette-Kirton et al. 2019).

Since vulnerability is an inherently multidimensional concept, the present study incorporated vulnerability indices. Four data sets represented vulnerability in this study: two represented comprehensive measures (with both socioeconomic and structural factors), one focused on socioeconomic factors, and one focused on structural factors. The Centers for Disease Control and Prevention (CDC) created a comprehensive social vulnerability index (SVI) for Puerto Rico in 2017 (CDC 2017) that incorporates multiple themes including socioeconomic status, language, housing, and transportation, using deductive methods from Flanagan et al. (2011). Puerto Rico’s Infrastructure Financing Authority (IFA) created a shapefile of special communities that delineates the spatial extent of 713 identified disadvantaged communities throughout the island (Oficina del Coordinador General para el Financiamiento Social y la Autogestión 2008). Eroglu et al. (2020) developed a socioeconomic vulnerability index and a structural vulnerability index at the census tract scale for Puerto Rico by adapting the inductive statistical methods used by Cutter et al. (2003).
The socioeconomic vulnerability index includes variables that represent the characteristics of the census tract’s population (e.g., average income, age, ethnicity), while the structural vulnerability index focuses on the general resilience of physical infrastructure in the tract (e.g., average age of construction, size of homes).

Census tracts are widely used for public policy and urban planning that specifically promote socioeconomic well-being and equality (Krieger 2006) and, therefore, provide an appropriate level of analysis for stakeholders interested in this study. The hazards and vulnerability data were spatially processed to the census tract scale following methods detailed in Szczyrba et al. (2020). The baseline damage index (DI1) created for this study measured the total number of “Affected” ($N_{\mathrm{Affected}}$) or “Destroyed” ($N_{\mathrm{Destroyed}}$) structures (impact) normalized by the total number of structures ($N_{\mathrm{Total}}$) in each census tract (exposure):

$$\mathrm{DI1} = \frac{N_{\mathrm{Affected}} + N_{\mathrm{Destroyed}}}{N_{\mathrm{Total}}} \tag{1}$$

following similar methods as Burton (2010) and Ganguly et al. (2019). Two additional variations of DI1 were calculated to reduce noise: the second index (DI2) removed the highest outlying data point (a value of 0.36, seen in Fig. 3(b)), and the third index (DI3) excluded the 117 census tracts that contained no damage.

After all predictive variables were processed and aggregated, a Spearman correlation analysis identified and eliminated collinear variables with very strong correlations, that is, variables containing redundant statistical relationships with correlations greater than 0.9 (Schober et al. 2018). This feature selection effort promoted the interpretability of model results (Karagiannopoulos et al. 2007).

Ensemble Decision Tree Algorithms

After gathering the data set, RF and SGBT regression algorithms, sourced from Python’s scikit-learn machine learning package (Pedregosa et al.
2011), were trained to assess the relative influences of hazards and vulnerabilities on structural damage due to Hurricane María in Puerto Rico. These algorithms were selected for their high interpretability compared to other machine learning algorithms and their ability to quantify relative importances of predictive features.

RF is a common ensemble decision tree algorithm that constructs a group of independent classification or regression trees and leverages the majority vote of trees to determine the resultant prediction or the average prediction per data sample, respectively (Breiman 1996, 2001). On the other hand, SGBT iteratively generates a series of classification or regression trees, each improving upon the performance of the previous (Friedman 2002).

Before the algorithms were applied to the data, the scikit-learn train_test_split function created a randomized division, with a training set containing 80% of the data and an evaluation set containing the remaining 20%, following commonly accepted procedures (e.g., Suthaharan 2016). Data stratification upon division ensured that the distributions of the two data sets were similar. The resultant training set included 705 census tracts while the evaluation set contained 177. Automated optimization techniques, including randomized search cross validation and grid search cross validation, tuned the model to the ideal hyperparameters. Randomized search cross validation subdivided the training data set into five folds and, using internal cross validation, identified the range of ideal hyperparameter values. Then, grid search cross validation exhaustively tuned the model within the identified ranges. The ratio of variation explained by the model to total variation (R²), mean absolute error (MAE), and mean error (ME) were calculated on the evaluation set to assess model performance.

Three measures were used to assess the importance of each predictive feature.
The default importance calculation provided by scikit-learn, the mean decrease in mean square error (MSE), represented one measure of the importance of each predictive feature (Pedregosa et al. 2011). However, this measure can potentially be biased toward favoring features with high cardinality (Strobl et al. 2007); therefore, the importance of each predictive feature was also calculated by permuting, or randomly shuffling, each feature’s values while measuring changes in R² before and after permutation (Breiman 2001). If shuffling one feature resulted in a sharp increase in unexplained variance (i.e., a sharp decrease in model performance), the feature was considered important. However, if the permuted feature correlated with another, the relationship would be retained, thus reducing the perceived measure of importance. To mitigate this effect, related groups of features were also permuted in tandem to determine how correlated variable categories affected the model (Koch et al. 2019). Variable groupings are indicated in Table 1.

While feature importance measures indicate which features are most valuable to model performance, they provide little information in terms of how or why features are important. Learned partial dependencies demonstrate the marginal effect each predictive feature exhibits on the damage index (Friedman 2001), represented by the following equation from Liaw and Wiener (2012):

$$\tilde{f}(x) = \frac{1}{n} \sum_{i=1}^{n} f\left(x, X_{iC}\right) \tag{2}$$

where $x$ is the predictive feature of interest, $X_{iC}$ denotes the values of the other predictive features $\{X_{1C}, X_{2C}, \ldots, X_{nC}\}$ for sample $i$, $f$ is the machine learning model, and $n$ is the number of samples. The function $\tilde{f}$ explains, for a given value $x$, the marginal effect it has on the prediction by creating an average prediction for each value of $x$ over the distribution of $X_{iC}$. The partial dependence for each predictive feature along with the feature distribution were plotted to determine the marginal relationship that each predictive feature exhibited with damage.
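
The permutation and partial dependence calculations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the data are synthetic, and the feature names (`vuln_a`, `vuln_b`, `wind`, `flood`), the feature grouping, the quartile-based stratification, the hyperparameter values, and the helper functions `permutation_drop` and `partial_dependence_curve` are all hypothetical. Only `train_test_split`, `GradientBoostingRegressor`, and `r2_score` come from scikit-learn; setting `subsample` below 1 gives the stochastic variant of gradient boosting.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the census-tract table (NOT the study's data).
n = 882
vuln_a = rng.normal(size=n)
vuln_b = vuln_a + 0.3 * rng.normal(size=n)  # deliberately correlated with vuln_a
wind = rng.normal(size=n)
flood = rng.normal(size=n)                  # carries no signal by construction
X = np.column_stack([vuln_a, vuln_b, wind, flood])
y = 0.5 * vuln_a + 0.5 * vuln_b + 0.3 * wind + 0.1 * rng.normal(size=n)
names = ["vuln_a", "vuln_b", "wind", "flood"]
groups = {"vulnerability": [0, 1], "wind": [2], "flood": [3]}

# 80/20 split, stratified on quartile bins of the target (binning is an assumption).
bins = np.digitize(y, np.quantile(y, [0.25, 0.5, 0.75]))
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=bins, random_state=0
)

# Illustrative, untuned hyperparameters; subsample < 1 makes the boosting stochastic.
model = GradientBoostingRegressor(
    n_estimators=200, max_depth=3, learning_rate=0.05, subsample=0.8, random_state=0
)
model.fit(X_train, y_train)
baseline_r2 = r2_score(y_test, model.predict(X_test))


def permutation_drop(columns, n_repeats=10):
    """Mean drop in evaluation R^2 when `columns` are shuffled in tandem."""
    drops = []
    for _ in range(n_repeats):
        X_perm = X_test.copy()
        order = rng.permutation(len(X_perm))
        # One shared row order keeps relationships inside the group intact
        # while breaking the group's link to the target.
        X_perm[:, columns] = X_test[order][:, columns]
        drops.append(baseline_r2 - r2_score(y_test, model.predict(X_perm)))
    return float(np.mean(drops))


def partial_dependence_curve(column, grid):
    """Eq. (2): average prediction with `column` fixed at each grid value."""
    curve = []
    for value in grid:
        X_fixed = X_train.copy()
        X_fixed[:, column] = value          # fix x, keep the other features as observed
        curve.append(model.predict(X_fixed).mean())
    return np.array(curve)


single = {name: permutation_drop([j]) for j, name in enumerate(names)}
grouped = {group: permutation_drop(cols) for group, cols in groups.items()}
wind_grid = np.quantile(X_train[:, 2], [0.05, 0.25, 0.5, 0.75, 0.95])
pd_wind = partial_dependence_curve(2, wind_grid)

print("baseline R^2:", round(baseline_r2, 3))
print("single-feature drops:", single)
print("grouped drops:", grouped)
print("partial dependence on wind:", pd_wind)
```

Because the two synthetic vulnerability features are correlated, shuffling either one alone should understate their joint contribution, whereas shuffling them in tandem should produce a much larger drop in R². scikit-learn also ships a ready-made single-feature version of this calculation in `sklearn.inspection.permutation_importance`.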
In summary, the workflow involved selecting and processing appropriate data, training the algorithms with a subset of the database, tuning the model hyperparameters, evaluating the accuracy of the model with an evaluation data set, and, lastly, finalizing and applying the model to analyze variable importance (Fig. 2).

Results

Damage Index

Damage was distributed across the island, and areas of highest damage appear to loosely follow the center path of Hurricane María [Fig. 3(a)]. DI1 ranged from 0 to 0.36 and is skewed toward lower values, with a mean of 0.032 [Fig. 3(b)].

Table 2 displays all Spearman correlation values (variable name abbreviations can be found in Table 1). Correlations with DI1 were strongest with StrVI (0.23), HurTrack (−0.22), and PropSC (0.21). DI3 exhibited the strongest correlations with all predictive features. The correlation analysis also found that two pairs of features, MaxDepth and AveDepth as well as MaxSusWinds and PeakGust, had Spearman correlation values higher than 0.9. Therefore, one feature from each pair was selected, resulting in a total of 10 predictive features incorporated into the machine learning analysis.

Table 2. Spearman correlation matrix of all predictive and target variables used in the analysis

| Variable | HurTrack | PeakGust | MaxSusWinds | PropFA | AveDepth | MaxDepth | PropSFHA | AveLS | PropSC | CDCVuln | StrVI | SeVI | DI1 | DI2 | DI3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HurTrack | 1 | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| PeakGust | −0.18 | 1 | — | — | — | — | — | — | — | — | — | — | — | — | — |
| MaxSusWinds | −0.17 | 0.99 | 1 | — | — | — | — | — | — | — | — | — | — | — | — |
| PropFA | −0.11 | −0.13 | −0.11 | 1 | — | — | — | — | — | — | — | — | — | — | — |
| AveDepth | −0.17 | −0.19 | −0.18 | 0.82 | 1 | — | — | — | — | — | — | — | — | — | — |
| MaxDepth | −0.18 | −0.20 | −0.19 | 0.84 | 0.98 | 1 | — | — | — | — | — | — | — | — | — |
| PropSFHA | 0.25 | −0.04 | −0.02 | 0.41 | 0.28 | 0.28 | 1 | — | — | — | — | — | — | — | — |
| AveLS | −0.22 | −0.24 | −0.26 |  |  |  | −0.26 | 1 | — | — | — | — | — | — | — |
| PropSC | 0.04 | −0.09 |  |  |  |  |  |  | 1 | — | — | — | — | — | — |
| CDCVuln | 0.01 | −0.24 |  |  |  |  |  |  |  | 1 | — | — | — | — | — |
| StrVI | −0.23 | −0.39 | −0.40 | 0.14 | 0.24 | 0.25 |  |  |  |  | 1 | — | — | — | — |
| SeVI |  |  |  |  |  |  | −0.06 | 0.02 | −0.21 | −0.49 | −0.13 | 1 | — | — | — |
| DI1 | −0.22 |  |  |  |  |  |  |  | 0.21 |  | 0.23 | −0.17 | 1 | — | — |
| DI2 | −0.22 |  |  |  |  |  |  |  |  |  |  | −0.17 | 1.00 | 1 | — |
| DI3 | −0.24 | −0.08 |  |  |  |  |  |  |  |  |  | −0.24 |  |  | 1 |

Performance

Model performance results after tuning and optimization are summarized in Table 3.
On average, the RF algorithm obtained an R² of 89% on the training data and 29% on the evaluation data, with an average ME of −0.0022 and MAE of 0.019 on the evaluation data across all damage indices. The SGBT algorithm, on average, obtained an R² of 76% on the training data and 35% on the evaluation data, with an average ME of 0.0021 and MAE of 0.018 on the evaluation data across all indices. The best performing model applied the SGBT algorithm to target DI2.

Table 3. Performance comparison of all models run in this study using consistent training and evaluation data sets

| Algorithm | Measure | DI1 | DI2 | DI3 | Average |
|---|---|---|---|---|---|
| RF | R² evaluation | 0.29 | 0.32 | 0.27 | 0.29 |
| RF | R² training | 0.89 | 0.89 | 0.89 | 0.89 |
| RF | ME | −0.0029 | −0.0023 | −0.0015 | −0.0022 |
| RF | MAE | 0.019 | 0.019 | 0.020 | 0.019 |
| SGBT | R² evaluation | 0.33 | 0.37 | 0.36 | 0.35 |
| SGBT | R² training | 0.71 | 0.85 | 0.70 | 0.76 |
| SGBT | ME | 0.0026 | 0.0018 | 0.0020 | 0.0021 |
| SGBT | MAE | 0.018 | 0.018 | 0.018 | 0.018 |

Both algorithms performed best when trained with DI2; performance plots of these model predictions against the true data values are included in Fig. 4. The plot shows that the models tend to overpredict at low DI2 values and underpredict at high DI2 values on both training and evaluation data. Predictive maps generated by the top performing model (SGBT-DI2) reveal the spatial patterns in model performance, seen in Figs. 5(a–d). The model’s overall spatial distribution of damage is representative [Figs. 5(a and b)], areas with high error follow diagonally across the island [Fig. 5(c)], and errors center around 0 [Fig. 5(d)].

Role of Vulnerability

Results from the feature importance analysis are summarized in Figs. 6(a–i) and 7(a–i). All models indicated that a vulnerability measure contributed the most predictive information. In the best-performing model (SGBT-DI2), StrVI was the leading predictive feature [Figs. 7(d–f)]. Randomly permuting the individual predictive variables revealed an interesting nuance.
The CDCVuln variable became less important compared to the default measure of feature importance in all models, while PropSC became more important in five of the six models. The four least informative features were usually the three flood measures and AveLS, in varying order. PropFA was the least predictive feature in all models.

When permuting predictive features categorically and measuring the subsequent drop in explained variance, all six models indicated that vulnerability was the leading predictive category, followed by wind, flood, and landslide [Figs. 6(c, f, i) and 7(c, f, i)]. Permuting the vulnerability variables in tandem resulted in approximately an 80% drop in model R², with wind reducing R² by approximately 50%, flood 25%, and landslide 10%.

The learned marginal relationships between each predictive feature and the damage indices vary [Figs. 8(a–j)]. PropSFHA [Fig. 8(a)], PropFA [Fig. 8(c)], and AveLS [Fig. 8(d)] exhibit relatively flat, horizontal relationships, indicating no relational pattern between these variables and damage. For the majority of values, AveDepth [Fig. 8(b)] also appears to be horizontal, but it sharply increases at the variable’s highest values. However, this could be an over-interpolation due to sparse data availability at these high values. The predicted damage index decreases as HurTrack increases, although this relationship is more complex at very short distances away from the center of the storm [Fig. 8(e)]. Expected damage increases as PeakGust [Fig. 8(f)], PropSC [Fig. 8(g)], StrVI [Fig. 8(h)], and CDCVuln [Fig. 8(j)] increase. Damage appears to increase exponentially as StrVI increases [Fig. 8(h)]. Interestingly, SeVI exhibits a negative relationship with damage [Fig.
8(i)], also seen in the Spearman correlation values (Table 2).

Discussion

Vulnerability features correlated with Hurricane María’s damage patterns and provided leading information to the machine learning models above all other wind, flood, and landslide variables. Because importance was calculated with multiple methods and algorithms, these results are not subject to the biases inherent in any single method. Strobl et al. (2007) pointed out that the mean decrease in MSE method inflates the importance of variables with high cardinality. Upon permutation, the CDCVuln feature drops in importance and the PropSC feature increases in importance [Figs. 6(a–i) and 7(a–i)], possibly due to their distributions [Figs. 9(a and b)]. CDCVuln exhibits higher cardinality than PropSC, potentially resulting in an exaggerated importance measure of CDCVuln when using the default method of mean decrease in MSE. Therefore, the findings of this study support previous assertions that the mean decrease in MSE measure may be biased and that using multiple strategies to calculate feature importance is essential.

Results from the partial dependency analysis are open to some interpretation. One possible cause of the sharp drop at the end of the CDCVuln curve seen in Fig. 8(j) is the presence of highly vulnerable areas in geographically sheltered locations that did not receive damage. Because they were exposed to hazards of lower intensity, these areas do not conform to the general pattern of increased damage with increased vulnerability. Similarly, the sharp drop at the high values of PeakGust [Fig. 8(f)] could be due to areas that were exposed to intense hazards but were structurally very resilient.

The partial dependency analysis also revealed that SeVI exhibited a negative relationship with DI1 [Fig.
8(i)] and, furthermore, the Spearman analysis (Table 2) showed negative correlations with all damage indices (−0.17, −0.17, and −0.24 for DI1, DI2, and DI3, respectively), PropSC (−0.21), StrVI (−0.13), and a strong negative correlation with CDCVuln (−0.49). Puerto Rico specifically may not be suitable for socioeconomic vulnerability quantification with the traditional methods used by Eroglu et al. (2020) due to the high rates of emigration and prevalence of informal housing (US Census Bureau 2017; Asociación de Constructores de Puerto Rico 2018). Therefore, census measurements and traditional methods of vulnerability calculation may not capture accurate information on the most socioeconomically vulnerable areas, and care needs to be taken before applying and interpreting these measures against damage data (Holand 2015; Bakkensen et al. 2017). In order to improve the existing measures of Puerto Rican vulnerability, future work is needed to develop a place-based methodology for creating representative vulnerability indices (Ahmed and Kelman 2018).

The reported model performance metrics indicate that SGBT outperforms RF; however, both algorithms struggle to generalize from the data (Table 3). This may be due to the imbalanced distribution in target variable values and the small number of data samples. Generalizability is challenging on disproportionately distributed data with many outliers since machine learning algorithms optimize to reflect average conditions (He and Garcia 2009). This may be why DI2 performed best. Previous studies with similarly distributed targets have indicated that this is a common challenge (Sadler et al. 2018).

The high amount of variance in this data set may be influenced by other possible factors, such as the large scale of the analysis, coarse data resolution, passage of Hurricane Irma to the north of the island just weeks before Hurricane María, as well as the dearth of basic structural and building material data.
Furthermore, the spatially diagonal pattern in the errors [Fig. 5(c)], which appears to coincide with the center track of the storm [Fig. 3(a)], could be indicative of the lack of topographic effects in the wind data (such as wind speed-up through mountainous terrain). Given these limitations, a predictive performance of 37% on evaluation data is adequate (Koch et al. 2019).

Conclusions

This study provides evidence that, at the census tract level, vulnerable communities in Puerto Rico suffered higher levels of damage due to Hurricane María and that vulnerability measures were more predictive of damage than wind, flood, and landslide hazards. Hazardous forces alone do not sufficiently explain damage patterns, and impact assessment models must include social factors as input variables to accurately depict areas of priority for decision makers, improve resource allocation, and, ultimately, ensure a more efficient and equitable response effort. Furthermore, it is advantageous for policymakers to prioritize underprivileged areas for predisaster mitigation investment to avoid heightened postdisaster losses.

Various disaster impacts may be reduced if prevailing vulnerabilities are addressed proactively before an event brings them to the fore. However, this can be challenging in data-scarce regions, which often coincide with areas of fewer resources. This study showed that different methods of quantifying vulnerability indicated different correlations with damage and that traditional methods of quantifying socioeconomic vulnerability in particular can be misleading.
It is especially pertinent in regions with fewer resources that existing methods of demographic data collection and vulnerability quantification are continuously refined to accurately represent the most vulnerable communities.

Data availability provides an opportunity for researchers to implement statistical learning approaches, and studies that seek to provide situational awareness throughout the life cycle of a disaster may find these approaches helpful. This study demonstrated that these emerging methods can analyze diverse data sets representing multiple drivers of disaster impacts, including social factors, provide valuable, holistic estimates of damage patterns and intensities, and quantify the influence of vulnerability variables.

References

Asociación de Constructores de Puerto Rico. 2018. “Situación de la industria de la vivienda en Puerto Rico: Recomendaciones de política pública.” Accessed March 3, 2020.
Bakkensen, L. A., C. Fox-Lent, L. K. Read, and I. Linkov. 2017. “Validating resilience and vulnerability indices in the context of natural disasters.” Risk Anal. 37 (5): 982–1004.
Bessette-Kirton, E. K., C. Cerovski-Darriau, W. H. Schulz, J. A. Coe, J. W. Kean, J. W. Godt, M. A. Thomas, and K. S. Hughes. 2019. “Landslides triggered by Hurricane María: Assessment of an extreme event in Puerto Rico.” GSA Today 29 (6): 4–10.
Chakraborty, J., G. A. Tobin, and B. E. Montz. 2005. “Population evacuation: Assessing spatial variability in geophysical risk and social vulnerability to natural hazards.” Nat. Hazards Rev. 6 (1): 23–33.
Chapi, K., V. P. Singh, A. Shirzadi, H. Shahabi, D. T. Bui, B. T. Pham, and K. Khosravi. 2017. “A novel hybrid artificial intelligence approach for flood susceptibility assessment.” Environ. Modell. Software 95 (Sep): 229–245.
Eroglu, D. I., D. Pamukcu, L. Szczyrba, and Y. Zhang. 2020. “Analyzing and contextualizing social vulnerability to natural disasters in Puerto Rico.” In ISCRAM 2020 Conf. Proc., 17th Int. Conf. on Information Systems for Crisis Response and Management, edited by A. Hughes, F. McNeill, and C. W. Zobel, 389–395. Blacksburg, VA: Virginia Tech.
Flanagan, B. E., E. W. Gregory, E. J. Hallisey, J. L. Heitgerd, and B. Lewis. 2011. “A social vulnerability index for disaster management.” J. Homeland Secur. Emergency Manage. 8 (1). https://doi.org/10.2202/1547-7355.1792.
Ganguly, K. K., N. Nahar, and B. M. Hossain. 2019. “A machine learning-based prediction and analysis of flood affected households: A case study of floods in Bangladesh.” Int. J. Disaster Risk Reduct. 34 (Mar): 283–294.
Hartman, C. W., and G. D. Squires. 2006. There is no such thing as a natural disaster: Race, class, and Hurricane Katrina. London: Taylor & Francis.
Highfield, W. E., W. G. Peacock, and S. Van Zandt. 2014. “Mitigation planning: Why hazard exposure, structural vulnerability, and social vulnerability matter.” J. Plann. Educ. Res. 34 (3): 287–300.
Holand, I. S., P. Lujala, and J. K. Rød. 2011. “Social vulnerability assessment for Norway: A quantitative approach.” Norsk. Geogr. Tidsskr.-Norw. J. Geogr. 65 (1): 1–17.
Hong, H., H. R. Pourghasemi, and Z. S. Pourtaghi. 2016. “Landslide susceptibility assessment in Lianhua County (China): A comparison between a random forest data mining technique and bivariate and multivariate statistical models.” Geomorphology 259 (Apr): 105–118.
Karagiannopoulos, M., D. Anyfantis, S. Kotsiantis, and P. Pintelas. 2007. “Feature selection for regression problems.” In Proc., 8th Hellenic European Research on Computer Mathematics and Its Applications. Athens, Greece.
Koch, J., S. Stisen, J. C. Refsgaard, V. Ernstsen, P. R. Jakobsen, and A. L. Højberg. 2019. “Modeling depth of the redox interface at high resolution at national scale using random forest and residual Gaussian simulation.” Water Resour. Res. 55 (2): 1451–1469.
Ma, C., and T. Smith. 2020. “Vulnerability of renters and low-income households to storm damage: Evidence from Hurricane Maria in Puerto Rico.” Am. J. Public Health 110 (2): 196–202.
Merz, B., H. Kreibich, and U. Lall. 2013. “Multi-variate flood damage assessment: A tree-based data-mining approach.” Nat. Hazards Earth Syst. Sci. 13 (1): 53–64.
Oliveira, S., J. L. Zêzere, M. Queirós, and J. M. Pereira. 2017. “Assessing the social context of wildfire-affected areas. The case of mainland Portugal.” Appl. Geogr. 88 (Nov): 104–117.
Pedregosa, F., et al. 2011. “Scikit-learn: Machine learning in Python.” J. Mach. Learn. Res. 12 (Oct): 2825–2830.
Sadler, J., J. Goodall, M. Morsy, and K. Spencer. 2018. “Modeling urban coastal flood severity from crowd-sourced flood reports using Poisson regression and random forest.” J. Hydrol. 559 (Apr): 43–55.
Santos-Burgoa, C., et al. 2018. “Differential and persistent risk of excess mortality from Hurricane Maria in Puerto Rico: A time-series analysis.” Lancet Planet. Health 2 (11): e478–e488.
Shafizadeh-Moghadam, H., R. Valavi, H. Shahabi, K. Chapi, and A. Shirzadi. 2018. “Novel forecasting approaches using combination of machine learning and statistical models for flood susceptibility mapping.” J. Environ. Manage. 217 (Jul): 1–11.
Strobl, C., A.-L. Boulesteix, A. Zeileis, and T. Hothorn. 2007. “Bias in random forest variable importance measures: Illustrations, sources and a solution.” BMC Bioinf. 8 (1): 1–21.
Suthaharan, S. 2016. “Supervised learning algorithms.” In Machine learning models and algorithms for big data classification, 183–206. Berlin: Springer.
Szczyrba, L., Y. Zhang, D. Pamukcu, and D. I. Eroglu. 2020. “A machine learning method to quantify the role of vulnerability in hurricane damage.” In ISCRAM 2020 Conf. Proc., 17th Int. Conf. on Information Systems for Crisis Response and Management, edited by A. Hughes, F. McNeill, and C. W. Zobel, 179–187. Blacksburg, VA: Virginia Tech.
Trigila, A., C. Iadanza, C. Esposito, and G. Scarascia-Mugnozza. 2015. “Comparison of logistic regression and random forests techniques for shallow landslide susceptibility assessment in Giampilieri (NE Sicily, Italy).” Geomorphology 249 (Nov): 119–136.
Van Zandt, S., and W. M. Rohe. 2011. “The sustainability of low-income homeownership: The incidence of unexpected costs and needed repairs among low-income home buyers.” Hous. Policy Debate 21 (2): 317–341.
Walsh, K. J., J. L. McBride, P. J. Klotzbach, S. Balachandran, S. J. Camargo, G. Holland, T. R. Knutson, J. P. Kossin, T.-C. Lee, and A. Sobel. 2016. “Tropical cyclones and climate change.” Wiley Interdiscip. Rev. Clim. Change 7 (1): 65–89.
Ward, P. S., and G. E. Shively. 2017. “Disaster risk, social vulnerability, and economic development.” Disasters 41 (2): 324–351.
