Cool! I posted it on GitHub, but it looks like the model is buggy if you don't provide all the necessary information at the beginning.
For example, if you just say "hi", the model will reason and try to ask you for the location, but you won't hear anything, because those questions are printed rather than returned to FastRTC's TTS.
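To show what I mean, here is a minimal sketch of the problem in plain Python. It does not use the real smolagents or FastRTC APIs: `reply_printed`, `reply_returned`, and `text_for_tts` are made-up names, and I'm assuming the TTS only speaks whatever the handler returns.

```python
def reply_printed(user_text: str):
    """Hypothetical handler that mimics the bug: the question only goes to stdout."""
    print("Which location are you interested in?")
    return None  # nothing is handed back to the caller


def reply_returned(user_text: str):
    """Hypothetical handler that returns the question instead of printing it."""
    return "Which location are you interested in?"


def text_for_tts(reply_fn, user_text: str) -> str:
    """Hypothetical glue code: only the handler's return value reaches the TTS."""
    reply = reply_fn(user_text)
    return reply or ""  # a None reply means there is nothing to speak


if __name__ == "__main__":
    # The printed question shows up in the console, but the TTS gets an empty string.
    print(repr(text_for_tts(reply_printed, "hi")))
    # The returned question is what the TTS would actually speak.
    print(repr(text_for_tts(reply_returned, "hi")))
```

In the first case the question shows up in the console but the string passed to the TTS is empty, which matches what I'm hearing (nothing).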
Do you know how I could fix this? I don't know much about the smolagents library.