This paper utilizes linear and non-linear models for box office prediction. To predict box office revenue at week t, the study uses eWOM variables measured at week t−1. Tables 3–8 present the results of the multiple regression analysis using six eWOM variables: the average number of reviews, the average review rating, the average review extremity level, the average review length in words, the average number of emotional reviews, and the average number of positive reviews. The control variables are star power, awards, sequels, release timing, genre (one binary variable representing drama), and nationality (two binary variables representing Korea and the US). In total, 1798 movies are used in the multiple regression analysis.
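The lagged regression setup described above can be sketched as follows. This is a minimal illustration on synthetic data, not the study's actual dataset; all variable values, coefficients, and the number of movies are hypothetical, and only six eWOM predictors plus three binary controls are included for brevity.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200  # hypothetical number of movies

# Hypothetical eWOM predictors, all measured at week t-1
X = np.column_stack([
    rng.poisson(50, n),        # avg. number of reviews
    rng.uniform(1, 10, n),     # avg. review rating
    rng.uniform(0, 1, n),      # avg. review extremity level
    rng.uniform(20, 200, n),   # avg. review length in words
    rng.uniform(0, 1, n),      # avg. number of emotional reviews
    rng.uniform(0, 1, n),      # avg. number of positive reviews
])
# Binary control dummies (e.g. drama genre, Korea, US)
controls = rng.integers(0, 2, (n, 3))
X_full = np.hstack([X, controls])

# Synthetic week-t revenue, driven mainly by review volume
y = 2000 * X[:, 0] + 500 * X[:, 1] + rng.normal(0, 5000, n)

model = LinearRegression().fit(X_full, y)
r2 = model.score(X_full, y)
# Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - k - 1)
k = X_full.shape[1]
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
```

Comparing `adj_r2` across subsamples fitted this way mirrors the comparison of explanatory power reported in Tables 3–8.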
The results of the multiple regression analysis (Tables 3–8) show that the adjusted R-squared value is higher in the high review or reviewer helpfulness subsample than in the low review or reviewer helpfulness subsample. The adjusted R-squared also decreases as the release date moves further into the past. Thus, while the explanatory power of eWOM weakens over time after release, it remains greater in the high review or reviewer helpfulness subsample than in the low one; in other words, the eWOM variables explain more of the variation in revenue when the helpfulness values of reviews and reviewers are higher. Accordingly, the study demonstrates the explanatory power of the widely investigated eWOM variables of volume and valence when movies are divided into high and low review or reviewer helpfulness subsamples. This is insightful because studies comparing the effect of eWOM on box office revenue between high and low review or reviewer helpfulness are nearly nonexistent, and it indicates that review and reviewer helpfulness are crucial moderating factors that increase the explanatory power of eWOM for box office revenue.
Review volume (the average number of reviews) has a consistent effect on revenue across all subsamples: it is significant in both the high and low review or reviewer helpfulness subsamples for the first, second, and third weeks after release. This supports earlier work noting a positive influence of eWOM volume on movie performance (Duan et al., 2008; Chintagunta et al., 2010). The effects of the average review rating and the average review extremity level are only partially significant. For the review helpfulness subsamples, the effect of the average review rating is negative in the first week after release and positive in the second and third weeks. For the reviewer helpfulness subsamples, it is negative in the first week and positive in the third week. This shows that review valence has less of an effect on box office outcomes than review volume, a finding consistent with earlier work (Duan et al., 2008). In addition, the effect of review ratings turns positive as time passes after release, which validates the effect of valence found in previous studies positing that valence affects box office outcomes (Chintagunta et al., 2010).
These results suggest that for experience goods such as movies, less extreme reviews may have a greater effect on movie performance than extreme ones. In both the review and reviewer helpfulness subsamples, the average review extremity level exerts a negative effect on revenue in the first week after release. This negative influence during the first week can be explained by two-sided evaluations that contain both favorable and unfavorable opinions about the movie. This supports Mudambi and Schuff (2010), who found that for experience goods, reviews with moderate ratings are more helpful than reviews with extreme ratings. It is also consistent with the mixed effect of review valence, which is negative in the first week and positive in the third week after release.
To test hypotheses 1 and 2 further, this study applies four BI methods: random forest, decision trees with boosting, the k-nearest neighbor (kNN) method, and discriminant analysis.
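The four BI methods can be set up as classifiers of the box office revenue level, for example with scikit-learn. This is a sketch under assumptions: the study does not specify its implementation or hyperparameters, the data here are synthetic, and `AdaBoostClassifier` (boosted decision stumps) and `LinearDiscriminantAnalysis` are assumed stand-ins for "decision trees with boosting" and "discriminant analysis".

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))            # hypothetical eWOM features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # hypothetical revenue-level class

models = {
    "random forest": RandomForestClassifier(random_state=0),
    "boosted trees": AdaBoostClassifier(random_state=0),
    "kNN": KNeighborsClassifier(),
    "discriminant analysis": LinearDiscriminantAnalysis(),
}
# Classification error = 1 - accuracy on a held-out validation split
errors = {name: 1 - m.fit(X[:200], y[:200]).score(X[200:], y[200:])
          for name, m in models.items()}
```

Each entry in `errors` corresponds to the per-method prediction error compared across subsamples in Tables 9 and 10.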
To compare prediction performance fairly between the high and low review or reviewer helpfulness subsamples, the effect of sample size on prediction performance must be removed. The number of movies in each pair of subsamples is therefore set to the minimum of the two subsample sizes. Thus, pairs of 881, 505, and 368 movies are ultimately sorted into the high and low review helpfulness subsamples for predicting box office revenue levels at weeks 1, 2, and 3, respectively (see the numbers in parentheses in Table 9). For the high and low reviewer helpfulness subsamples (which are not time-varying), a pair of 578 movies is used to predict box office revenue levels during weeks 1, 2, and 3 (see the numbers in parentheses in Table 10). Each subsample in the three pairs is divided into two subsets: a training set and a validation set.
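The size-equalization step can be sketched as follows. The study does not state how movies are selected when the larger subsample is reduced, so random downsampling without replacement is an assumption here, and the subsample sizes used in the usage example are illustrative.

```python
import numpy as np

def balance(high_idx, low_idx, rng):
    """Downsample both subsamples to the size of the smaller one,
    removing the effect of sample size on prediction performance."""
    n = min(len(high_idx), len(low_idx))
    return (rng.choice(high_idx, size=n, replace=False),
            rng.choice(low_idx, size=n, replace=False))

rng = np.random.default_rng(0)
# Hypothetical week-1 split: 1200 high- vs. 881 low-helpfulness movies
high, low = balance(np.arange(1200), np.arange(881), rng)
```

After balancing, both index arrays contain 881 distinct movies, matching the week-1 pair size in Table 9.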
N-fold cross-validation of the sample and the three dependent box office revenue variables is used to investigate the stability of the comparison results. The subsamples of 881, 505, 368, and 578 movies are divided into 30, 34, 37, and 29 subsets of 30, 15, 10, and 30 movies, respectively. Each of these subsets is entered as the validation sample in turn, with the remaining movies forming the training sample. Using each training sample, the machine learning methods predict the target variable for the corresponding validation sample. Thirty training and validation sample pairs, with 852 and 30 movies respectively, are devised to compare the prediction performance at week 1 between high and low review helpfulness; similarly, 29 training and validation sample pairs, with 549 and 29 movies respectively, are created to compare the prediction performance at week 1 between high and low reviewer helpfulness. The prediction error is computed as the classification error: the predictions from the trained BI models are compared with the true values in the validation sample, and the resulting errors are averaged across the validation samples.
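The cross-validation loop above can be sketched with scikit-learn's `KFold`, here for the week-1 review helpfulness case with 30 folds. This is an illustrative reimplementation on synthetic data; the study's exact fold assignment, classifier settings, and features are not specified.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(881, 6))    # hypothetical eWOM features, 881 movies
y = (X[:, 0] > 0).astype(int)    # hypothetical revenue-level class

# 30 folds: each fold serves once as the ~30-movie validation sample,
# with the remaining movies as the training sample
kf = KFold(n_splits=30, shuffle=True, random_state=0)
fold_errors = []
for train_idx, val_idx in kf.split(X):
    clf = RandomForestClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    fold_errors.append(1 - clf.score(X[val_idx], y[val_idx]))  # classification error

avg_error = float(np.mean(fold_errors))  # averaged across validation samples
```

The per-subsample `avg_error` values computed this way are what Tables 9 and 10 compare between the high and low helpfulness subsamples.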
The results in Tables 9 and 10 show that the average prediction errors are significantly lower in the high review helpfulness subsample than in the low review helpfulness subsample, except when the kNN and discriminant analysis methods are used at week 1. For the reviewer helpfulness subsamples, the prediction error of the high reviewer helpfulness subsample is lower than that of the low reviewer helpfulness subsample, except when the random forest, boosted decision tree, and kNN methods are used at week 3. Even where the difference in revenue prediction is insignificant at week 1 or 3, at least one of the review or reviewer helpfulness comparisons shows a difference in prediction error for that week. The machine learning results therefore confirm the multiple regression results: eWOM has greater explanatory power for box office revenue when the helpfulness of reviews and reviewers is higher.