Text this: Multimodal Location Estimation of Videos and Images