Visual Perception for Manipulation and Imitation in Humanoid Robots