Abstract

The current state of practice in traffic data quality control relies on rule-based data checking and validation, where the rules are subjective and insensitive to the variation inherent in traffic data. In this paper, self-supervised deep learning approaches were explored to leverage the existence of multiple sources of traffic volume data, which permits cross-checking one data source against another for improved robustness. Two types of models were developed to detect data anomalies at two distinct timescales. Specifically, a novel variational autoencoder (VAE)-based model was formulated to detect data anomalies at the daily level, and four recurrent model structures, namely recurrent neural networks (RNN), gated recurrent units (GRU), long short-term memory (LSTM) units, and liquid time constant (LTC) networks, were evaluated for detecting anomalies at a finer timescale (i.e., 5-min intervals). The effectiveness of the proposed methods was demonstrated using two independent sources of traffic data from the Georgia Department of Transportation: (1) traffic counts collected by inductive loops as part of the statewide traffic count program, and (2) traffic volumes acquired by a video detection system as part of Georgia 511, an advanced traveler information system in Georgia. In our experiments, the VAE-based model achieved a precision of 0.95, recall of 0.92, and F1 score of 0.94. Among the recurrent models, the fully connected LTC produced the lowest prediction error and achieved a precision of 0.82, recall of 0.88, and F1 score of 0.85.