streamlit
PyMuPDF
nltk
scikit-learn
openai