Reconstructed optical map of the seafloor
The 3-D structure model was generated from 30,957 images obtained across seven survey lines (Fig. 2). The total length of each survey line was around 1,838 m. The resolution along each of the x, y and z axes was 0.01 m, which is sufficient to identify corals in the constructed model. The survey site is a well-known diving spot, and several drop-offs with depth differences of around 5–7 m can be identified.
The large-scale 2-D image was produced from the 3-D structure model and is illustrated in Fig. 3. A survey area of 11,434 m² was covered, yielding a calculated survey efficiency of 12,146 m²/h. The pixel resolution in the horizontal plane (x–y plane) is about 3.5 mm/pixel (± 0.4%); the viewing scale can be adjusted in any commercial or free geographical information system (GIS) software. As shown in Fig. 3, the resolution is sufficient to identify coral. A large quantity of coral can be identified in the high-resolution image, and the presence of at least 10 coral species, such as Pocillopora eydouxi and P. verrucosa, was confirmed in these data by a coral expert.
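The reported area, efficiency and pixel resolution can be cross-checked with simple arithmetic. The following is a minimal sketch for illustration only and is not part of the original analysis; the function names are hypothetical. It back-computes the survey time implied by the stated area and efficiency, and the approximate number of pixels needed to cover that area at the stated resolution.

```python
def implied_survey_hours(area_m2, efficiency_m2_per_h):
    """Survey time (hours) implied by a covered area and a survey efficiency."""
    return area_m2 / efficiency_m2_per_h


def pixels_to_cover(area_m2, pixel_size_mm):
    """Approximate number of square pixels needed to cover an area."""
    pixel_area_m2 = (pixel_size_mm / 1000.0) ** 2
    return area_m2 / pixel_area_m2


if __name__ == "__main__":
    hours = implied_survey_hours(11_434, 12_146)   # values from the text
    print(f"implied survey time: {hours:.2f} h ({hours * 60:.0f} min)")
    print(f"pixels at 3.5 mm/pixel: {pixels_to_cover(11_434, 3.5):.2e}")
```

With the values given above, the implied survey time is a little under one hour, and the covered area corresponds to roughly 9 × 10^8 pixels.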
In addition, the DEMSSS inside the black border line was produced from the 3-D structure model and overlaid on the DEMMBES (background), as shown in Fig. 4. The connection between the DEMSSS and DEMMBES appears seamless. To compare the DEM resolutions, enlarged views are shown in Fig. 4a,b. The horizontal resolution is 0.5 m/pixel in Fig. 4a and 0.01 m/pixel in Fig. 4b; thus, the seafloor structure can be inferred precisely from the DEMSSS. The accuracy of the photogrammetry method has been well discussed in the literature (approximately 1–2 mm at 3 m distance)38. The distribution of depth differences in the vertical direction (elevation) was calculated and is shown as a color gradation in Fig. 5a. This figure shows that the difference is large around the slope area. In addition, Fig. 5b shows a histogram of this difference [− 0.68 ± 1.16 m (mean ± S.D., n = 38,602)], which is slightly shifted to the left (negative direction). This means that the DEMSSS tends to be lower than the DEMMBES. Supplementary Fig. 2 shows the locations of the ground control points (GCPs) in the DEMMBES and DEMSSS used to validate the depth differences [1.61 ± 0.14 m in the horizontal plane, 0.74 ± 0.11 m in the vertical direction (mean ± S.E., n = 21)]. The GCPs were picked arbitrarily from the point data at characteristic land features. These results show that the error was larger in the horizontal plane than in the vertical direction. We assume that the depth differences were caused mainly by misalignment in the horizontal plane due to the GPS positioning error (± 1 m).
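The summary statistics of the elevation difference between the two DEMs can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' processing code: both DEMs are assumed to have already been resampled onto the same grid, cells without data are assumed to be NaN, the difference is taken as DEMSSS minus DEMMBES (so that a negative mean corresponds to the DEMSSS being lower), and the sample standard deviation (ddof = 1) is used. The function name is hypothetical.

```python
import numpy as np


def dem_difference_stats(dem_sss, dem_mbes):
    """Mean, sample S.D. and count of (DEM_SSS - DEM_MBES) over valid cells.

    Assumes both arrays share the same grid and mark missing cells with NaN.
    """
    diff = np.asarray(dem_sss, dtype=float) - np.asarray(dem_mbes, dtype=float)
    valid = diff[~np.isnan(diff)]          # drop cells missing in either DEM
    return float(valid.mean()), float(valid.std(ddof=1)), int(valid.size)


if __name__ == "__main__":
    # Toy 2 x 2 grids (elevations in metres); one cell has no SSS data.
    sss = np.array([[1.0, 2.0], [np.nan, 4.0]])
    mbes = np.array([[2.0, 2.0], [1.0, 3.0]])
    mean, sd, n = dem_difference_stats(sss, mbes)
    print(f"{mean:.2f} +/- {sd:.2f} m (n = {n})")
```

A histogram such as the one in Fig. 5b could then be drawn from the same array of valid differences.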
Although a slight difference in the vertical direction is observed, this high-resolution DEMSSS offers useful information for the advanced surveying of seabed topography, especially in shallow coastal areas. This precise seabed topography will contribute not only to coral surveys but also to other ecological, engineering and geographical studies, e.g., high-resolution advection modeling and structural calculations of natural reefs39,40,41. The survey efficiency of 12,146 m²/h achieved in this study is higher than the 7,000 m²/h of the previous study26, because six cameras were used in this case, whereas only five were used in the previous study owing to battery problems. In addition, the water transparency was better than before (see supplementary Fig. S3); therefore, we could maintain the SSS at a high altitude of around 3–5 m. Thus, the efficiency of the SSS is at least five times greater than that of an AUV and some 80 times higher than that of diving, making it suitable for the rapid assessment of coral reefs.
Of course, conditions differ between survey sites; therefore, the survey strategy should be optimized for each site. Using an acoustic positioning system or known benchmark positions on the seafloor would be one way to maintain the accuracy of the DEMSSS. Also, for surveys in deeper water or under more turbid conditions, LED lights should be used, and care must be taken to operate the towed camera array system safely with a long towing rope so that it does not hit the corals.
Evaluation of U-Net-based segmentation
In this study, we propose and evaluate a U-Net-based coral segmentation approach for the efficient surveying of large areas, such as that depicted in Fig. 3 (see the Methods section for details of the U-Net model and data processing). For training and evaluation, we divided the entire dataset (Fig. 3a) into 14,016 images of 512 × 512 pixels. Each divided image covers about 3.2 m². We randomly selected 200 of the divided images and manually labeled the coral in them under the supervision of coral experts. The images in the leftmost and rightmost columns in Fig. 6 are examples of the divided images and labeled coral images, respectively.
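The division of a large orthophoto into 512 × 512 pixel images can be sketched as below. This is a minimal illustration, not the authors' implementation: how image edges were handled is not stated in this section, so zero-padding the right and bottom edges is an assumption, and the function names are hypothetical. The second function reproduces the stated footprint of one divided image from the pixel resolution of about 3.5 mm/pixel.

```python
import numpy as np


def split_into_tiles(image, tile=512):
    """Split an H x W (x C) array into tile x tile pieces, row by row.

    Assumption: edges that do not fill a whole tile are zero-padded.
    """
    h, w = image.shape[:2]
    pad_h, pad_w = (-h) % tile, (-w) % tile
    pad = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad, mode="constant")
    tiles = []
    for y in range(0, padded.shape[0], tile):
        for x in range(0, padded.shape[1], tile):
            tiles.append(padded[y:y + tile, x:x + tile])
    return tiles


def tile_footprint_m2(tile_px, pixel_size_mm):
    """Ground area covered by one square tile."""
    side_m = tile_px * pixel_size_mm / 1000.0
    return side_m * side_m


if __name__ == "__main__":
    demo = np.zeros((1000, 1500, 3), dtype=np.uint8)   # stand-in orthophoto
    print(len(split_into_tiles(demo)), "tiles")
    print(f"{tile_footprint_m2(512, 3.5):.2f} m^2 per tile")
```

At 3.5 mm/pixel, a 512-pixel side spans about 1.79 m, which gives the footprint of about 3.2 m² quoted above.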
We then performed training and performance evaluations using the dataset of the 200 image pairs described above. Color correction (CC)26 and rotation-based data augmentation (DA)21,34 of the obtained images may affect prediction performance. Therefore, we trained and evaluated four types of U-Net models, with and without CC and with and without DA. Furthermore, to compare prediction performance with the U-Net model, we employed the pixelwise CNN model, which had exhibited good performance in our previous work26. Because the size of the local images used for the input window of the pixelwise CNN model greatly influences the prediction performance, we evaluated the pixelwise CNN models with input window sizes of 32 × 32, 48 × 48, 64 × 64, 96 × 96, 128 × 128 and 160 × 160. (See the Methods section for details of the training procedure and evaluation metrics).
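Rotation-based data augmentation for segmentation must apply the same rotation to an image and to its label mask. The sketch below illustrates this with rotations in multiples of 90°; the rotation angles actually used are not given in this section, so this choice is an assumption, and the function name is hypothetical.

```python
import numpy as np


def rotation_augment(image, mask):
    """Return four (image, mask) pairs rotated by 0, 90, 180 and 270 degrees.

    Assumption: only multiples of 90 degrees are used, so no interpolation
    is needed and the tile size is preserved for square inputs.
    """
    pairs = []
    for k in range(4):
        rotated_image = np.rot90(image, k, axes=(0, 1))
        rotated_mask = np.rot90(mask, k, axes=(0, 1))   # same rotation as image
        pairs.append((rotated_image, rotated_mask))
    return pairs


if __name__ == "__main__":
    image = np.random.rand(512, 512, 3)
    mask = np.zeros((512, 512), dtype=np.uint8)
    augmented = rotation_augment(image, mask)
    print(len(augmented), augmented[1][0].shape, augmented[1][1].shape)
```

Because image and mask are rotated together, each labeled pixel stays aligned with the image pixel it describes.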
Figure 6 shows prediction examples for two test images, A and B. The images in the leftmost column are the original ones, while the images in the second column were processed by color correcting the originals. The images in the third and fourth columns are the predicted results using U-Net with CC and DA and the pixelwise CNN (window: 64 × 64 pixels) models, respectively. The results for the different processing conditions (CC and DA) of the U-Net model are shown in Fig. S1. The black and white areas indicate pixels that were successfully predicted as coral (TP: True Positive) and non-coral (TN: True Negative) areas, respectively. On the other hand, the red and blue areas were those that were wrongly predicted as coral (FP: False Positive) and non-coral (FN: False Negative), respectively. The white area in the rightmost column shows the manually-labeled coral area. The prediction accuracies for images A and B were, respectively, 0.913 and 0.924 with U-Net and 0.903 and 0.870 with pixelwise CNN. Both methods achieved a high degree of accuracy of about 0.9, but U-Net showed slightly better performance. In addition, the F-measures for images A and B were 0.805 and 0.857 with U-Net and 0.759 and 0.763 with pixelwise CNN. These results suggest that U-Net has the potential to identify corals with greater accuracy than pixelwise CNN.
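The accuracy and F-measure quoted above can be computed from the four pixel counts (TP, TN, FP, FN). The following minimal sketch uses the standard definitions, with the F-measure taken as the harmonic mean of precision and recall; the exact definitions used in this study are given in the Methods section, so this form is an assumption, and the function name is hypothetical.

```python
import numpy as np


def segmentation_metrics(pred, truth):
    """Pixelwise accuracy, precision, recall and F-measure for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))      # coral predicted as coral
    tn = int(np.sum(~pred & ~truth))    # non-coral predicted as non-coral
    fp = int(np.sum(pred & ~truth))     # non-coral predicted as coral
    fn = int(np.sum(~pred & truth))     # coral predicted as non-coral
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


if __name__ == "__main__":
    pred = [[1, 1, 0], [0, 1, 0]]
    truth = [[1, 0, 0], [0, 1, 1]]
    print(segmentation_metrics(pred, truth))
```

Accuracy counts the large non-coral background as correct, whereas the F-measure depends only on TP, FP and FN, which is why the two metrics can differ noticeably for the same image.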
To evaluate the performances of U-Net and pixelwise CNN in more detail, we conducted evaluations using the dataset of 200 labeled images based on five-fold cross-validation. (See the Methods section for methodological detail on this validation). Table 1 and Fig. 7a show the evaluated performances of U-Net with and without CC and DA, as well as those of pixelwise CNN with different window sizes using the images with CC and DA. The predictions by all variants of U-Net achieved high levels of accuracy (> 0.9). The results listed in Table 1 confirm that performance tends to increase with the application of CC and DA. The U-Net model with both CC and DA showed the highest accuracy (0.910) and F-measure (0.772). The pixelwise CNN results show that performance tends to increase with increasing window size. However, Fig. 7a clearly shows that the accuracy (blue-dashed line) and F-measure (orange-dashed line) of the U-Net are higher than those of the pixelwise CNN. These results indicate that the U-Net has high predictive performance, and that both CC and DA are effective for improving it. While pixelwise CNN uses the local information within the input window as its main input for prediction, U-Net utilizes the global information of the entire input image (see supplementary Fig. S1). This is considered to be the reason why U-Net achieved higher performance than pixelwise CNN.
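A five-fold split of the 200 labeled images can be sketched as follows. This is an illustration of the general technique only: the actual fold assignment, shuffling and random seed used in this study are described in the Methods section, and the function name and seed here are hypothetical.

```python
import random


def k_fold_splits(n_samples, k=5, seed=0):
    """Return k (train_indices, test_indices) pairs covering all samples once.

    Indices are shuffled with a fixed seed, then dealt into k folds; each
    fold serves as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    for fold, (train, test) in enumerate(k_fold_splits(200, k=5)):
        print(f"fold {fold}: {len(train)} training images, {len(test)} test images")
```

With 200 images and five folds, each model is trained on 160 images and evaluated on the remaining 40, and the reported metrics are aggregated over the five folds.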
We assessed the relationship between prediction performance and prediction time in detail. Figure 7b displays prediction times per image using U-Net and pixelwise CNN with different window sizes. We used an Nvidia GeForce GTX 1080 Ti GPU with an Intel Xeon CPU E5-2630 v4. These results indicate that the prediction time of pixelwise CNN increases rapidly as the input window size expands, while the prediction time of U-Net is very short (0.057 s). Note that the prediction time of U-Net does not change because its input size is constant (512 × 512 pixels). The prediction time of U-Net is about 1/1,000 of that of pixelwise CNN with a window size of 64 × 64. The results shown in Fig. 7a,b indicate that U-Net-based prediction is more accurate and substantially faster than pixelwise CNN.
Estimation of coral cover in the surveyed area
We built a prediction model for the entire surveyed area using all 200 labeled images and the U-Net with CC and DA, which had exhibited the best performance in the above evaluations. The 2-D image (orthophoto) of the entire surveyed area was divided into 14,016 local images (512 × 512 pixels). We estimated the quantity of coral in the surveyed area (11,434 m²) using the built model and the divided images. The calculation time for this estimation was 1,120 s (18.7 min) using the same GPU and CPU as described above. Figure 8 shows the overall coral coverage predicted by the model. The predicted percent coral cover ranged from 0 to 35%. According to the previous survey, conducted in 2011 by scuba divers using the manta method, the coral cover in the area was estimated to be around 25 to 50%42. The present estimates are about half of those from the 2011 survey, so our results indicate a decline in coral cover, which may be due to the 2016 bleaching event43. As previously described, the changes to coral reefs have been dramatic, and determining the underlying mechanisms requires the capacity to assess reefs rapidly. In addition, the U-Net-based segmentation method could potentially be applied to studies of species cover or disease prevalence. Although the fields are different, Saito et al. have classified the layers of two-dimensional materials into three classes44. Also, Kohl et al. have classified images of street scenes taken from a camera into 19 classes, including person, car, and road45. As remarked above, the efficient survey method presented here has the potential to become a useful tool for quantitatively investigating biological systems such as coral.
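Percent coral cover follows directly from the predicted segmentation masks as the fraction of pixels classified as coral. The sketch below is a minimal illustration, assuming binary masks in which coral pixels are 1; how pixels outside the surveyed area were treated and over what spatial unit cover was aggregated for Fig. 8 are not specified in this section, and the function names are hypothetical.

```python
import numpy as np


def percent_coral_cover(mask):
    """Percent of pixels in one predicted mask that are labeled coral."""
    return 100.0 * float(np.asarray(mask, dtype=bool).mean())


def overall_coral_cover(masks):
    """Pixel-weighted percent coral cover over a collection of masks."""
    coral = sum(int(np.asarray(m, dtype=bool).sum()) for m in masks)
    total = sum(int(np.asarray(m).size) for m in masks)
    return 100.0 * coral / total


if __name__ == "__main__":
    tile_a = np.zeros((4, 4), dtype=np.uint8)
    tile_a[:2, :2] = 1                      # 4 of 16 pixels are coral
    tile_b = np.zeros((4, 4), dtype=np.uint8)
    print(percent_coral_cover(tile_a), overall_coral_cover([tile_a, tile_b]))
```

In practice, pixels that fall outside the surveyed area would need to be excluded from the denominator so that empty regions do not dilute the estimated cover.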