---
title: "Document QA System"
emoji: "π"
colorFrom: "blue"
colorTo: "indigo"
sdk: gradio
sdk_version: 5.8.0
app_file: app.py
python_version: 3.11.0
models:
- sentence-transformers/all-mpnet-base-v2
tags:
- question-answering
- gradio
- LLM
- document-processing
---
# Document QA System
A document question-answering system that uses Gradio for the interface and Docker for deployment.
## Features
- **Document Indexing**: Efficiently processes and indexes documents for quick retrieval.
- **Interactive Interface**: Provides a user-friendly interface for querying documents.
- **Dockerization**: Easy to build and deploy using Docker.
## Technologies
- Data source
  - The [Few-NERD dataset paper](https://arxiv.org/pdf/2105.07464), located in the data directory, is used as the data source for indexing.
- Chunking
  - Document chunking is handled with the [all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) sentence-embedding model.
- LLM
  - The system uses [Cohere Command R](https://cohere.com/command) to generate responses.
- Retriever, Reranker
  - [Cohere Command R](https://cohere.com/command) is also used for retrieval and reranking.
- UI
  - The user interface is built with Gradio.
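To make the chunk-then-retrieve flow listed above concrete, here is a minimal, dependency-free sketch. It is not the project's implementation: `chunk_text`, `embed`, `cosine`, and `retrieve` are hypothetical names, and a toy bag-of-words vector stands in for the all-mpnet-base-v2 embeddings. In the real system, the reranker and the LLM would run after this step.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of words (stand-in for the real chunker)."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; an assumption replacing the neural embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]
```

For example, `retrieve("What is Few-NERD?", chunk_text(document_text))` would return the chunks that share the most vocabulary with the question, where `document_text` is whatever text was extracted from the paper.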
## Installation
### Prerequisites
1. **Docker**:
   - [Install Docker](https://docs.docker.com/get-docker/)
2. **Set the paths to the data and index directories**:
   - Update the variables in `utils/constant.py`.
3. **Set the API keys for [Cohere Command R](https://dashboard.cohere.com/api-keys) and [LlamaParse](https://docs.cloud.llamaindex.ai/llamaparse/getting_started/get_an_api_key)**:
   - Update `CO_API_KEY` and `LLAMA_CLOUD_API_KEY` in the `configure_settings` function in `utils/settings.py`.
### Using Docker
1. **Clone the Repository**:
```bash
git clone <repository-url>
cd <repository-folder>
```
2. **Build the Docker Image**:
```bash
docker build -t doc-qa-system .
```
3. **Run the Docker Container**:
```bash
docker run -p 7860:7860 doc-qa-system
```
4. **Access the Interface**:
Open your browser and go to `http://localhost:7860`.
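If the page does not load, a quick scripted check can tell you whether the container is answering on the published port. This is a minimal sketch using only the standard library; `is_up` is a hypothetical helper, and the URL is the one given in the steps above.

```python
import urllib.error
import urllib.request


def is_up(url: str = "http://localhost:7860", timeout: float = 3.0) -> bool:
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Calling `is_up()` a few seconds after `docker run` should return `True` once the interface has started.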
### Using Python
1. **Clone the Repository**:
```bash
git clone <repository-url>
cd <repository-folder>
```
2. **Install Dependencies**:
```bash
pip install -r requirements.txt
```
3. **Index the Data**:
```bash
python index.py
```
4. **Run the Application**:
```bash
python app.py
```
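For orientation, the sketch below shows roughly what a Gradio entry point that serves on port 7860 can look like. It is not the contents of this project's `app.py`: `answer` is a hypothetical placeholder for the query logic in `main.py`, and `build_demo` is an illustrative name.

```python
def answer(question: str) -> str:
    """Hypothetical stand-in for the real query logic in main.py."""
    question = question.strip()
    if not question:
        return "Please enter a question."
    return f"(placeholder) You asked: {question}"


def build_demo():
    """Wrap answer() in a simple text-in, text-out Gradio interface."""
    import gradio as gr  # imported lazily so answer() works without Gradio installed

    return gr.Interface(
        fn=answer,
        inputs="text",
        outputs="text",
        title="Document QA System",
    )


# To serve it on the port used in this README:
# build_demo().launch(server_name="0.0.0.0", server_port=7860)
```

Binding to `0.0.0.0` is what makes the interface reachable from outside a container; for a purely local run, the default host is enough.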
## Project structure
```bash
├── app.py                  # Gradio application
├── main.py                 # Main script for answering queries
├── utils/                  # Utility functions and helpers
│   ├── constant.py         # Constant values used in the project
│   ├── index.py            # Handles document indexing
│   ├── retriever.py        # Retrieves and ranks documents
│   └── settings.py         # Configuration settings
├── data/                   # Directory containing documents to be indexed
├── index/                  # Stores the generated index files
│   ├── default__vector_store.json
│   ├── docstore.json
│   ├── graph_store.json
│   ├── image__vector_store.json
│   └── index_store.json
├── requirements.txt        # Python dependencies
├── Dockerfile              # Docker configuration
└── README.md               # Project documentation
```
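Before starting the application, it can help to confirm that indexing actually produced the files listed under `index/`. The following is a small sketch under that assumption: `missing_index_files` is a hypothetical helper, and the default `index` path should be adjusted to whatever is set in `utils/constant.py`.

```python
from pathlib import Path

# File names as listed in the project structure above.
EXPECTED_INDEX_FILES = (
    "default__vector_store.json",
    "docstore.json",
    "graph_store.json",
    "image__vector_store.json",
    "index_store.json",
)


def missing_index_files(index_dir: str = "index") -> list[str]:
    """Return the expected index files that are not present in index_dir."""
    root = Path(index_dir)
    return [name for name in EXPECTED_INDEX_FILES if not (root / name).is_file()]
```

An empty list means every expected file is present; any names returned suggest the indexing step needs to be run again.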
## Example questions
- What is Few-NERD?
- What is the Few-NERD dataset used for?
- What are the NER types in the dataset?
- What role does "transfer learning" play in the proposed few-shot learning system?
- What metric does the paper use to evaluate the effectiveness of the few-shot model?