Abstract—Road extraction from remote sensing images has significant application value in many everyday scenarios. However, extracting high-quality road results from remote sensing images remains challenging because background objects with road-like structures cause interference and surrounding objects occlude the roads. To alleviate these problems, a road extraction network based on global-local Context perception and Cross spatial-scale feature interaction, termed C2Net, is proposed. First, a global-local context perception module is incorporated to capture the overall topological features of roads, improving the model's ability to discriminate between roads and similar objects. Then, a cross spatial-scale feature interaction module is designed for the skip connections to aggregate full-scale features without loss of feature information, providing rich and accurate road structural features to the decoder. Experiments on public road datasets demonstrate that C2Net outperforms existing methods on comprehensive metrics such as Intersection over Union (IoU) and F1-score. The results indicate that C2Net produces road extraction results with superior connectivity and quality.
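For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how IoU and F1-score are commonly computed for a binary road mask. It is not taken from the C2Net evaluation code. The function name `road_metrics`, the nested-list mask format, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def road_metrics(pred, target):
    """Return (IoU, F1) for two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration only; road pixels are 1,
    background pixels are 0.
    """
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1      # road predicted, road in ground truth
            elif p and not t:
                fp += 1      # road predicted, background in ground truth
            elif t and not p:
                fn += 1      # road missed by the prediction

    union = tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    iou = tp / union if union else 1.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 1.0
    return iou, f1


if __name__ == "__main__":
    pred = [[1, 1, 0],
            [0, 1, 0]]
    target = [[1, 0, 0],
              [0, 1, 1]]
    iou, f1 = road_metrics(pred, target)
    print(f"IoU={iou:.3f}, F1={f1:.3f}")
```

Both metrics are built from the same pixel counts (true positives, false positives, false negatives), so they rank methods similarly. IoU penalizes errors more heavily than F1 for the same prediction.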