metadata
title: Document QA System
emoji: ๐
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.8.0
app_file: app.py
python_version: 3.11.0
models:
- sentence-transformers/all-mpnet-base-v2
tags:
- question-answering
- gradio
- LLM
- document-processing
Document QA System
A document question-answering system that uses Gradio for the interface and Docker for deployment.
Features
- Document Indexing: Efficiently processes and indexes documents for quick retrieval.
- Interactive Interface: Provides a user-friendly interface for querying documents.
- Dockerization: Easy to build and deploy using Docker.
Technologies
- Data source
  - The paper on the Few-NERD dataset, located in the data directory, is used as the data source for indexing.
- Chunking
  - Document chunking is handled by all-mpnet-base-v2.
- LLM
  - Cohere Command R generates the responses.
- Retriever, Reranker
  - Cohere Command R is used.
- UI
  - The user interface is built with Gradio.
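The retrieval step in the stack above can be illustrated with a small, dependency-free sketch. It is not the project's implementation: the bag-of-words vectors are a toy stand-in for all-mpnet-base-v2 embeddings, no Cohere calls are made, and the names `chunk_text`, `embed`, `cosine`, and `retrieve` are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split text into fixed-size word windows (stand-in for the real chunker)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; the real system would use a sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]
```

In a retrieve-then-rerank setup, the chunks returned by a function like `retrieve` would then be passed to a reranker and finally to the LLM as context for the answer.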
Installation
Prerequisites
Docker:
Set the paths to the data directory and the index directory:
- Update the variables in utils/constant.py.
Set the API keys for Cohere Command R and LlamaParse:
- Update CO_API_KEY and LLAMA_CLOUD_API_KEY in the configure_settings function in utils/settings.py.
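As an alternative to writing the keys directly into the source file, they could be read from environment variables. The sketch below is an assumption about how that might look, not the project's actual code: `load_api_keys` is a hypothetical helper, and only the variable names CO_API_KEY and LLAMA_CLOUD_API_KEY come from this README.

```python
import os


def load_api_keys(env=None) -> dict[str, str]:
    """Read the two API keys from a mapping (defaults to the process environment).

    Hypothetical helper: the project's configure_settings function could call
    something like this instead of using hard-coded values.
    """
    env = os.environ if env is None else env
    required = ("CO_API_KEY", "LLAMA_CLOUD_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail early with a clear message rather than at the first API call.
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
    return {name: env[name] for name in required}
```

With this approach the keys stay out of version control, and a missing key is reported at startup.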
Using Docker
Clone the Repository:
git clone <repository-url>
cd <repository-folder>
Build the Docker Image:
docker build -t doc-qa-system .
Run the Docker Container:
docker run -p 7860:7860 doc-qa-system
Access the Interface:
Open your browser and go to
http://localhost:7860
.
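For the port mapping above to work, the Gradio server inside the container has to listen on all interfaces, not only on localhost. The following is a minimal sketch of such an entry point, not the contents of the project's app.py: `answer_question` is a hypothetical stub standing in for the real query logic, and `build_demo` and `main` are illustrative names.

```python
def answer_question(question: str) -> str:
    """Hypothetical stub standing in for the project's query logic."""
    question = question.strip()
    if not question:
        return "Please enter a question."
    return f"(stub) No index loaded; received: {question}"


def build_demo():
    """Build a simple text-in, text-out Gradio interface."""
    import gradio as gr  # imported lazily so the stub can be tested without Gradio

    return gr.Interface(
        fn=answer_question,
        inputs="text",
        outputs="text",
        title="Document QA System",
    )


def main() -> None:
    # 0.0.0.0 makes the server reachable through Docker's published port 7860.
    build_demo().launch(server_name="0.0.0.0", server_port=7860)
```

Calling `main()` starts the server; with the `docker run -p 7860:7860` command, the interface is then reachable at http://localhost:7860 on the host.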
Using Python
Clone the Repository:
git clone <repository-url>
cd <repository-folder>
Install Dependencies:
pip install -r requirements.txt
Index the Data:
python index.py
Run the Application:
python app.py
Project structure
├── app.py                  # Gradio application
├── main.py                 # Main script for answering queries
├── utils/                  # Utility functions and helpers
│   ├── constant.py         # Constant values used in the project
│   ├── index.py            # Handles document indexing
│   ├── retriever.py        # Retrieves and ranks documents
│   └── settings.py         # Configuration settings
├── data/                   # Directory containing documents to be indexed
├── index/                  # Stores the generated index files
│   ├── default__vector_store.json
│   ├── docstore.json
│   ├── graph_store.json
│   ├── image__vector_store.json
│   └── index_store.json
├── requirements.txt        # Python dependencies
├── Dockerfile              # Docker configuration
└── README.md               # Project documentation
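Before starting the application, it can help to confirm that the indexing step has produced the files listed under index/. The check below is a sketch under the assumption that all five listed files are expected to exist; `missing_index_files` is a hypothetical helper and is not part of the project.

```python
from pathlib import Path

# File names taken from the project structure listing.
EXPECTED_INDEX_FILES = (
    "default__vector_store.json",
    "docstore.json",
    "graph_store.json",
    "image__vector_store.json",
    "index_store.json",
)


def missing_index_files(index_dir) -> list[str]:
    """Return the expected index files that are absent from index_dir."""
    root = Path(index_dir)
    return [name for name in EXPECTED_INDEX_FILES if not (root / name).is_file()]
```

An empty list means the index looks complete; otherwise the indexing step probably needs to be run again.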
Example questions
- What is Few-NERD?
- What is the Few-NERD dataset used for?
- What are the NER types in the dataset?
- What role does "transfer learning" play in the proposed few-shot learning system?
- What metric does the paper use to evaluate the effectiveness of the few-shot model?