Abstract
Deep reinforcement learning has strong perception and decision-making capabilities, can effectively handle continuous, high-dimensional state-action spaces, and has become the mainstream method for traffic light timing. However, owing to structural defects in the models or differences in their policy mechanisms, most deep reinforcement learning models suffer from problems such as convergence difficulties, divergence, or poor exploration. This paper therefore proposes a multi-agent Soft Actor–Critic (SAC) method for traffic light timing. Multi-agent SAC adds an entropy term, which measures the randomness of the policy, to the objective function of traditional reinforcement learning and maximizes the sum of the expected reward and the entropy term to improve the model’s exploration ability. The model can thus learn multiple optimal timing schemes, which avoids repeatedly selecting the same optimal timing scheme and consequently falling into a local optimum or failing to converge. Meanwhile, it discards low-reward strategies to reduce data storage and sampling complexity, accelerate training, and improve the stability of the system. Comparative experiments show that the multi-agent SAC traffic light timing method can solve these problems of existing deep reinforcement learning models and improve the efficiency of vehicle passage in different traffic scenarios.
Practical Applications
This paper studies traffic light timing at multiple intersections. The experimental results show that the proposed method effectively increases the throughput of each intersection and reduces both the waiting time of vehicles and the number of queued vehicles. Comparison with related algorithms shows that the proposed method can effectively solve the problems pervasive in existing algorithms.
In practical application, traffic status information is obtained through interaction with the real traffic environment, and the timing scheme of the traffic lights is dynamically adjusted according to this information to alleviate traffic congestion at the intersections.
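To make the entropy-augmented objective described in the abstract concrete, the following minimal Python sketch compares a deterministic policy with a uniform policy over four signal phases. The function names, the four-phase setup, the reward values, and the temperature weight `alpha` are illustrative assumptions and are not taken from the paper. A full SAC implementation would learn the policy and value functions rather than use fixed probability tables.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def soft_objective(probs, rewards, alpha=0.2):
    """Expected reward plus an entropy bonus weighted by alpha.

    probs   : probability of choosing each signal phase (hypothetical policy)
    rewards : reward obtained for each phase (hypothetical values)
    alpha   : assumed temperature weighting the entropy term
    """
    expected_reward = sum(p * r for p, r in zip(probs, rewards))
    return expected_reward + alpha * entropy(probs)


# Four phases that all yield the same reward, i.e. several equally good schemes.
rewards = [1.0, 1.0, 1.0, 1.0]
greedy = soft_objective([1.0, 0.0, 0.0, 0.0], rewards)     # always picks phase 0
uniform = soft_objective([0.25, 0.25, 0.25, 0.25], rewards)  # spreads over all phases

print(f"greedy policy objective:  {greedy:.4f}")
print(f"uniform policy objective: {uniform:.4f}")
```

When several timing schemes are equally rewarding, the entropy term makes the policy that keeps all of them in play score higher than the one that repeatedly selects a single scheme, which is the exploration effect the abstract attributes to the entropy term.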