Abstract: Generating detailed textual descriptions of remote sensing images is challenging because it requires capturing both global and local visual information. The complexity of backgrounds and the ...