Satellite imagery from earth observation missions enable processing big data to gather information about the world. Automatizing the creation of maps that reflect ground truth is a desirable outcome that would aid decision makers to take adequate actions in alignment with the United Nations Sustainable Development Goals. In order to harness the power that the availability of the new generation of satellites enable, it is necessary to implement techniques capable of handling annotations for the massive volume and variability of high spatial resolution imagery for further processing. However, the availability of public datasets for training machine learning models for image segmentation plays an important role for scalability.This work focuses on bridging remote sensing and computer vision by providing an open source based pipeline for generating machine learning training datasets for road detection in an area of interest. The proposed pipeline addresses road detection as a binary classification problem using road annotations existing in OpenStreetMap for creating masks. For this case study, Planet images of 3m resolution are used for creating a training dataset for road detection in Kenya.