Data access was approved by the institutional ethics committee at Klinikum Rechts der Isar (Ethikvotum 87/18 S) and the data was anonymized. The ethics committee has waived the need for informed consent. All research was performed in accordance with relevant guidelines and regulations. A dataset of 391 CXRs (Chest PA) was collected from our institution’s picture archiving and communication system (PACS). Patient demographics for training and test sets are shown in Table 3. Additional clinical information from the medical report, such as follow-up CT scans were available to verify the diagnosis. Thereby, case-level ground-truth labels (unsuspicious or nodulous) were assigned based on the diagnosis of two radiologists: the first radiologist made the diagnosis in clinical routine and a second radiologist (JB, 3 years of experience in chest imaging) verified and segmented the nodules retrospectively using our in-house built web-based platform. For the reader study test-set, one more radiologist verified the segmented nodules (DP, 12 years of experience).

From the segmentations, bounding boxes were extracted based on the segmentation boundaries. From the radiographs with nodules, 257 were used for training. Training data was supplemented by the Japanese Society of Radiological Technology (JSRT) dataset31, from which 154 additional radiographs with annotated nodules were obtained. Therefore, the total number of radiographs used for training was 411.

Additionally, lung segmentations for all 247 JSRT files were obtained from the segmention in chest radiography (SCR) database32 in order to train a lung segmentation network. Please note that data for lung segmentation also includes 93 additional non-nodulous images from the JSRT database. For lung segmentation train, validation and test set size was set to 157, 40 and 50.

Table 3 Patient demographics for training and test subsets. Mass size is given as a fraction of the radiograph size (1.0 would indicate every pixel of the radiograph is a nodule). As for screening and foreign body (FB) test sets segmentations were unavailable, nodule mass and location were not provided. Within a radiograph multiple foreign bodies may occur. Secondary pathologies (SP) were excluded from the reader study. AM indicates acromastinum induced artifacts, which often show nodule-like morphological characteristics.
Figure 5

General workflow for training and test phase.

Network training

General workflow for network training is illustrated in Fig. 5. For nodule detection, we employed a RetinaNet architecture18. This architecture was successfully utilitzed in prior literature6, 22 for nodule detection in radiographs. It inputs a preprocessed radiograph and outputs multiple box-coordinates of nodule locations with additional scores (assessing confidence).

For preprocessing, images were resampled to (512 times 512) pixels. Afterwards, histogram equalization was performed. Resulting intensities were normalized to values between 0 and 1.

Training was performed using a batch size of 1 with 1000 steps per epoch. For the utilized loss function (focal loss), hyperparameters were set to (alpha = 0.25, gamma = 2.0). The initial learning rate was set to (10^{-5}) and reduced by factor 0.1 after 3 epochs of stagnating loss ((delta = 0.0001)). The network was trained for 50 epochs in total. Data augmentation transformations included contrast, brightness, shear, scale, flip, and translation. From the training set, 80 percent of the radiographs were used for training and 20 percent for validation. None of the training or validation data was part of the reader study or foreign body test sets. Models were implemented based on keras-retinanet33 using Tensorflow34 and Keras35. As a RetinaNet backbone, ResNet-10136 was used.

To invalidate extrathoracic nodule detections made by RetinaNet, an additional lung segmentation network was developed. For lung segmentation, an U-Net15 like architecture is applied in illustrated in Fig. 6. U-Net like architectures were successfully applied for lung segmentation in previous literature16. Training masks were generated by combining the left lung lob, right lung lobe and heart mask from the SCR dataset. An Adam optimizer with a learning rate of (10^{-4}) was used. Total number of epochs was set to 30. Augmentation operations included zoom, height, shift and rotation. As a loss function, the Dice loss according to37 was used.

Figure 6

Utilized U-Net architecture for lung segmentation.


In each radiograph, we evaluated the number of true-positives (TP), false-positives (FP) and false-negatives (FN). True-negatives were not counted, as these include all possible remaining boxes within the radiograph. To determine the aforementioned numbers, a distance measurement is required, whereat we utilized a method similar to Shapira et al.25: within a single radiograph, first the center of masses of all ground-truth and prediction lesions are determined. Next, between each pair ((G_i,P_i)) of a ground-truth lesion (G_i) and a prediction lesion (P_i), the euclidean distance is calculated. If the euclidean distance is below a certain threshold D, the nodule accounts as a true-positive. If there is no neighbour within the distance D for a (G_i) or (P_i), the nodule counts as false-negative and false-positive respectively. For a 512 × 512 pixel image we set the value of D to 23, which means a distance below 23 pixels for lesion centers between ground-truth and prediction yields a TP. To determine this value, we let the radiologist who annotated the groundtruth mark the lesion center in an additional experiment. Afterwards we calculated the distances between groundtruth segmentation center-of-mass and marked lesion center. The maximum distance between the groundtruth center-of-mass and the radiologist mark yielded 23 pixels. Furthermore, the sensitivity of the lesion detection can be controlled by ignoring predictions below a certain lesion score. Setting a lower threshold usually increases sensitivity, but also false-positives. This trend was visualized for different thresholds using a FROC curve, similar to Kim et al.22. For the absolute true-positive and false-positive numbers for RetinaNet results, we set a threshold of 0.6, which yields a lower false-positive rate than radiologists and therefore makes the absolute number of true-positives comparable. For evaluation of the nodule location detection task, the returned boxes were analyzed with respect to ground-truth annotations using the described metric.

Additionally it is required to retrieve a case-level score, for the screening task. This case-level score indicates whether there is one or more nodule in the radiograph. As we retrieved individual nodule scores from the RetinaNet predictions, we chose the maximum of all nodule scores within the radiograph as a case-level score.

Reader study setup

For the reader study, two radiologists (CML and JA) interpreted 75 chest PA (posterior anterior) radiographs. The radiologists had 4 and 6 years experience. In order to simulate a clinical setting, each radiologist was given a time constraint of 10 seconds per radiograph. The assignment was to mark all nodules with a mouse click using our in-house built web-based platform. At least one nodule occured in 36 radiographs. Total nodule count was 65 and the average nodule count in radiographs with nodules was 1.8 ± 1.6.

Statistical analysis

The bootstrap approach38 was used to calculate CIs of the ROC retrieved in the screening task for the RetinaNet architecture. We conducted the following experiment with 1,000 replications: In each experiment, we selected 75 random samples from the test set and calculated the ROC AUC values from these samples. In order to retrieve the 95% confidence interval, we sorted the resulting AUCs from all experiments incrementally and took the AUC value at 2.5% and 97.5% as minimum and maximum of the CI, respectively.

Source link

Leave a Reply

Your email address will not be published. Required fields are marked *