Multimodal Analysis of User-Generated Multimedia Content