Abstract
Crane operators and signalmen play an integral role in the safe and efficient operation of cranes on a construction site. Operating a crane is a complex and demanding task that requires careful coordination between the operator and the signalman to avoid errors that could have dire consequences, including serious injury or loss of life. Special consideration should therefore be given to mitigating communication errors between the two parties. Technology can play an important role in enhancing communication, and with recent advances, human–computer interaction has emerged as an active area of research within computer vision. This paper presents a framework that integrates the YOLOv4 object detection model with a long short-term memory (LSTM) recurrent neural network for real-time classification of dynamic hand signals. First, a data set of crane signalman dynamic hand signals comprising 18 classes is created. The YOLOv4 model is then customized for this application by modifying its activation function, and three modified YOLOv4 variants are integrated with the LSTM model. The best modified YOLOv4–LSTM combination achieves a maximum overall accuracy of 94.8% at an inference speed of 55.1 frames per second. The model is further validated on real-time dynamic hand signal classification, achieving an accuracy of 93.5% at 44 frames per second. The proposed models improve on some of the most widely used existing models in both classification accuracy and processing speed. The proposed framework can serve as an additional layer of communication to supplement current practice and reduce errors between crane signalmen and crane operators.
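The abstract does not specify implementation details, but the overall detection-to-classification pipeline it describes can be sketched as follows: a per-frame feature vector (for example, normalized bounding-box coordinates produced by a YOLOv4 hand detector) is fed frame by frame into an LSTM, whose final hidden state is mapped to one of the 18 signal classes. The minimal NumPy LSTM cell below is an illustrative sketch only; the feature encoding, dimensions, and all names are assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Single-layer LSTM cell (standard gate equations), untrained, random weights."""
    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked weight matrix for the input, forget, cell, and output gates.
        self.W = rng.normal(0.0, 0.1, (4 * hidden_size, input_size + hidden_size))
        self.b = np.zeros(4 * hidden_size)
        self.hidden_size = hidden_size

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        H = self.hidden_size
        i = sigmoid(z[0:H])          # input gate
        f = sigmoid(z[H:2 * H])      # forget gate
        g = np.tanh(z[2 * H:3 * H])  # candidate cell state
        o = sigmoid(z[3 * H:4 * H])  # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        return h, c

def classify_sequence(frames, cell, W_out):
    """Run per-frame detector features through the LSTM; classify from the last state.

    frames: list of feature vectors, e.g. [x, y, w, h] of the detected hand per frame.
    W_out: (num_classes, hidden_size) output projection (18 classes in the paper).
    """
    h = np.zeros(cell.hidden_size)
    c = np.zeros(cell.hidden_size)
    for x in frames:
        h, c = cell.step(x, h, c)
    logits = W_out @ h
    return int(np.argmax(logits))
```

With random weights the predicted class is meaningless; in practice the cell and output projection would be trained jointly on labeled signal sequences, which is the role the LSTM stage plays in the proposed framework.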