Talk to OpenAI using their multimodal API
This looks great! But can we add VAD (voice activity detection), or interrupt when the human speaks?