We study traffic accident patterns using both the road network and satellite images aligned to the nodes of the road graph.
Previous work on predicting accident occurrences has relied on graph-structural features extracted from road networks, which do not capture the physical and environmental characteristics of roads. This work constructs MMTraCE, a large-scale dataset spanning six U.S. states that comprises nine million traffic accident records from official sources and one million high-resolution satellite images, one for each node of the road network. Additionally, every node is annotated with features such as the region's weather statistics, traffic volume, and road type (e.g., residential vs. motorway).
Using this dataset, we conduct a comprehensive evaluation of multimodal learning methods that integrate visual and network embeddings. Our findings show that combining network and visual features allows multimodal learning to predict accident occurrences accurately, with an average AUROC of 90.1%, outperforming graph-based methods by 3.7% on average. Building on the improved accuracy of the multimodal embeddings, we conduct a causal analysis based on a matching estimator to examine the contributing factors of traffic accidents. The findings suggest that, after adjusting for other confounding factors through the embeddings, accident frequency increases by 24% under higher precipitation and by 22% on roads with higher speed limits such as motorways; seasonal factors increase accident rates by 29%. Ablation studies confirm the importance of satellite imagery features for accurate prediction.
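To make the fusion step concrete, the following is a minimal sketch, assuming per-node embeddings are precomputed by an image encoder and a graph encoder; the layer sizes, embedding dimensions, and concatenation-based fusion are illustrative assumptions, not the paper's exact architecture.

```python
# Illustrative sketch (assumed design, not the authors' exact model): fuse a
# per-node satellite-image embedding with a graph-based node embedding and
# predict a binary accident-occurrence label for each road-network node.
import torch
import torch.nn as nn

class MultimodalNodeClassifier(nn.Module):
    def __init__(self, img_dim=512, graph_dim=128, hidden_dim=256):
        super().__init__()
        # Project each modality into a shared hidden space of equal size.
        self.img_proj = nn.Linear(img_dim, hidden_dim)
        self.graph_proj = nn.Linear(graph_dim, hidden_dim)
        # Classifier over the concatenated (fused) representation.
        self.head = nn.Sequential(
            nn.ReLU(),
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # logit for accident occurrence
        )

    def forward(self, img_emb, graph_emb):
        # Concatenate the projected modalities and score each node.
        fused = torch.cat([self.img_proj(img_emb), self.graph_proj(graph_emb)], dim=-1)
        return self.head(fused).squeeze(-1)

# Usage example with a batch of 4 nodes and hypothetical embedding sources.
model = MultimodalNodeClassifier()
img_emb = torch.randn(4, 512)    # e.g., from a pretrained satellite-image encoder
graph_emb = torch.randn(4, 128)  # e.g., from a node2vec or GNN encoder
probs = torch.sigmoid(model(img_emb, graph_emb))  # per-node accident probability
```

The fused per-node representations could also serve as the covariates on which a matching estimator conditions when comparing accident rates across, e.g., high- and low-precipitation conditions.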