Multimodal Video Characterization and Summarization