Real-time Speech and Music Classification by Large Audio Feature Space Extraction