metadata
title: Paper-based RAG
emoji: ๐
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.8.0
app_file: app.py
python_version: 3.11.0
models:
- sentence-transformers/all-mpnet-base-v2
tags:
- question-answering
- gradio
- LLM
- document-processing
Document QA System
A document question-answering system that uses LlamaIndex for document indexing, retrieval, and generation, and Gradio for the user interface.
Technologies
- Data source
  - A paper about BERT, located in the data directory, is used as the data source for indexing.
- Chunking
  - Document chunking is handled by all-mpnet-base-v2.
- LLM
  - The system uses gpt-4o-mini to generate responses.
- Retriever, Reranker
  - gpt-4o-mini is used.
- UI
  - The user interface is built with Gradio.
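The retrieve-then-rerank flow listed above can be illustrated with a toy sketch that has no dependencies. It does not use LlamaIndex, all-mpnet-base-v2, or gpt-4o-mini: bag-of-words cosine similarity stands in for the embedding-based retriever, and a query-term coverage score stands in for the LLM reranker. All function names and sample chunks here are hypothetical and are not taken from the project.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, top_k=3):
    """Stage 1 (stand-in for embedding search): keep the top_k most similar chunks."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in scored[:top_k] if score > 0]


def rerank(query, candidates, top_n=1):
    """Stage 2 (stand-in for an LLM reranker): order by share of query terms covered."""
    q_terms = set(tokenize(query))

    def coverage(chunk):
        return len(q_terms & set(tokenize(chunk))) / len(q_terms) if q_terms else 0.0

    return sorted(candidates, key=coverage, reverse=True)[:top_n]


chunks = [
    "BERT is pre-trained with masked language modeling and next sentence prediction.",
    "Fine-tuning BERT for question answering adds a span prediction layer.",
    "Gradio builds web interfaces for machine learning demos.",
]
query = "How is BERT pre-trained?"
candidates = retrieve(query, chunks, top_k=2)  # broad, cheap first pass
best = rerank(query, candidates, top_n=1)      # narrower second pass
```

The two stages are kept separate on purpose: the first pass is cheap and runs over every chunk, while the second pass is assumed to be more expensive and only sees the short candidate list.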
Installation
Prerequisites
Docker:
API keys
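Before starting the app, it can help to confirm that the required API keys are present in the environment. The sketch below is only an assumption about what the project needs: because gpt-4o-mini is an OpenAI model, it checks for `OPENAI_API_KEY`, but the project may expect different or additional variables. The helper name `missing_keys` is hypothetical.

```python
import os

# Assumption: gpt-4o-mini is served by OpenAI, so OPENAI_API_KEY is a likely
# requirement. Adjust this tuple to whatever keys the project actually reads.
REQUIRED_KEYS = ("OPENAI_API_KEY",)


def missing_keys(required=REQUIRED_KEYS, env=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


# Example with explicit dictionaries instead of the real environment.
print(missing_keys(env={}))                             # key is absent
print(missing_keys(env={"OPENAI_API_KEY": "sk-test"}))  # key is present
```

Calling `missing_keys()` with no arguments checks the real process environment.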
Using HuggingFace Spaces
- Follow the link to the paper-based-rag on Spaces.
- Upload your paper for indexing or use the default paper about BERT.
Using Docker
Build the Docker Image:
docker build -t doc-qa-system .
Run the Docker Container:
docker run -p 7860:7860 doc-qa-system
Access the Interface:
- Open your browser and go to http://localhost:7860.
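If the page does not load, a quick way to tell whether the container is listening at all is to probe the published port. The following is a minimal sketch using only the standard library; the helper name `is_port_open` is hypothetical, and the default port 7860 is taken from the `docker run` command above.

```python
import socket


def is_port_open(host="localhost", port=7860, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Self-contained demo: listen on a free local port, probe it, then close it.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
server.listen(1)
demo_port = server.getsockname()[1]
open_while_listening = is_port_open("127.0.0.1", demo_port)
server.close()
open_after_close = is_port_open("127.0.0.1", demo_port)
```

With the container running, `is_port_open()` with its defaults checks `localhost:7860`. A successful connection only shows that something is listening on the port, not that the Gradio interface has finished loading.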
Using Python
Install Dependencies:
pip install -r requirements.txt
Add a paper to the data directory:
- Add the paper you want to index to the data directory, or use the default paper about BERT.
Index the data:
python index.py
Run the Application:
python app.py
Project structure
├── app.py                  # Gradio application
├── main.py                 # Main script for answering queries
├── utils/                  # Utility functions and helpers
│   ├── constant.py         # Constant values used in the project
│   ├── index.py            # Handles document indexing
│   ├── retriever.py        # Retrieves and ranks documents
│   └── settings.py         # Configuration settings
├── data/                   # Directory containing documents to be indexed
├── index/                  # Stores the generated index files
│   ├── default__vector_store.json
│   ├── docstore.json
│   ├── graph_store.json
│   ├── image__vector_store.json
│   └── index_store.json
├── requirements.txt        # Python dependencies
├── Dockerfile              # Docker configuration
└── README.md               # Project documentation
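After indexing, the `index/` directory should contain the files listed in the structure above. A small check like the following can confirm that before the app is started. It is a sketch, not part of the project: the helper name `missing_index_files` is hypothetical, and it assumes a complete index always contains exactly these five files.

```python
import tempfile
from pathlib import Path

# File names taken from the index/ listing in the project structure.
EXPECTED_INDEX_FILES = (
    "default__vector_store.json",
    "docstore.json",
    "graph_store.json",
    "image__vector_store.json",
    "index_store.json",
)


def missing_index_files(index_dir="index"):
    """Return the expected index files that are not present in index_dir."""
    root = Path(index_dir)
    return [name for name in EXPECTED_INDEX_FILES if not (root / name).is_file()]


# Demo in a temporary directory so the real index/ is left untouched.
with tempfile.TemporaryDirectory() as tmp:
    before = missing_index_files(tmp)          # nothing written yet
    for name in EXPECTED_INDEX_FILES[:-1]:     # write all but the last file
        (Path(tmp) / name).write_text("{}")
    after = missing_index_files(tmp)
```

Run from the project root, `missing_index_files()` checks the default `index` directory; an empty list suggests the indexing step completed.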
Example questions
- What is the pre-training procedure for BERT, and how does it differ from traditional supervised learning?
- Can you describe how BERT can be fine-tuned for tasks like question answering or sentiment analysis?