Perception in Multimodal Dialogue Systems