Юра Цепліцький
metadata
title: Document QA System
emoji: 📄
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.8.0
app_file: app.py
python_version: 3.11.0
models:
  - sentence-transformers/all-mpnet-base-v2
tags:
  - question-answering
  - gradio
  - LLM
  - document-processing

Document QA System

A document question-answering system that uses Gradio for the interface and Docker for deployment.

Features

  • Document Indexing: Processes documents and builds an index so that relevant passages can be retrieved at query time.
  • Interactive Interface: Provides a Gradio web interface for querying the indexed documents.
  • Dockerization: Builds and deploys as a Docker image.
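The indexing and retrieval idea behind the first feature can be illustrated with a toy sketch. This is not the project's code: it swaps in plain word counts for the sentence-transformers/all-mpnet-base-v2 embedding model, and the names `embed`, `build_index`, and `retrieve` as well as the sample chunks are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """Indexing step: store each chunk next to its vector."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, question, top_k=2):
    """Query step: rank stored chunks by similarity to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


# Made-up sample chunks, standing in for parsed document text.
chunks = [
    "Few-NERD is a dataset for few-shot named entity recognition.",
    "Gradio builds web interfaces for Python functions.",
    "Docker packages an application into a container image.",
]
index = build_index(chunks)
top = retrieve(index, "What is the Few-NERD dataset?", top_k=1)
print(top)
```

In the real system the retrieved passages would presumably be handed to the LLM to compose an answer; the sketch stops at retrieval.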

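Technologies

  • Gradio: web interface (sdk_version 5.8.0)
  • Docker: build and deployment
  • Cohere Command R: LLM
  • LlamaParse: document parsing
  • sentence-transformers/all-mpnet-base-v2: embedding model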

Installation

Prerequisites

  1. Docker: install Docker if you plan to use the Docker setup.

  2. Set the paths to the data directory and the index directory:

    • Update the variables in utils/constant.py.
  3. Set the API keys for Cohere Command R and LlamaParse:

    • Update CO_API_KEY and LLAMA_CLOUD_API_KEY in the configure_settings function in utils/settings.py.
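If you would rather not write the keys into the source file, one option is to read them from environment variables and fail early when one is missing. The sketch below is an assumption about how that could look, not the contents of utils/settings.py: `load_api_keys` is a hypothetical helper, and only the two variable names come from this README.

```python
import os

# Variable names taken from the prerequisites above.
REQUIRED_KEYS = ("CO_API_KEY", "LLAMA_CLOUD_API_KEY")


def load_api_keys(env=None):
    """Return both API keys, raising if either is missing or empty.

    Hypothetical helper: `env` defaults to the process environment and
    can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}
```

A function such as configure_settings could call this helper once at startup and pass the values on to the Cohere and LlamaParse clients.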

Using Docker

  1. Clone the Repository:

    git clone <repository-url>
    cd <repository-folder>
    
  2. Build the Docker Image:

    docker build -t doc-qa-system .
    
  3. Run the Docker Container:

    docker run -p 7860:7860 doc-qa-system
    
  4. Access the Interface:

    Open your browser and go to http://localhost:7860.
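To confirm from a script that the container is serving on the published port, a small reachability check can help. This is a minimal sketch using only the standard library; `interface_is_up` is a hypothetical name, and the default URL is the one given in step 4.

```python
import urllib.error
import urllib.request


def interface_is_up(url="http://localhost:7860", timeout=2.0):
    """Return True if an HTTP GET to the URL succeeds, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

The interface may need some time after `docker run` before it responds, so a False result right after startup does not necessarily mean the container failed.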

Using Python

  1. Clone the Repository:

    git clone <repository-url>
    cd <repository-folder>
    
  2. Install Dependencies:

    pip install -r requirements.txt
    
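  3. Index the Data (the project structure lists the indexing script under utils/, so the path may need to be utils/index.py):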

    python index.py
    
  4. Run the Application:

    python app.py
    

Project structure

├── app.py                   # Gradio application
├── main.py                  # Main script for answering queries
├── utils/                   # Utility functions and helpers
│   ├── constant.py          # Constant values used in the project
│   ├── index.py             # Handles document indexing
│   ├── retriever.py         # Retrieves and ranks documents
│   ├── settings.py          # Configuration settings
├── data/                    # Directory containing documents to be indexed
├── index/                   # Stores the generated index files
│   ├── default__vector_store.json
│   ├── docstore.json
│   ├── graph_store.json
│   ├── image__vector_store.json
│   ├── index_store.json
├── requirements.txt         # Python dependencies
├── Dockerfile               # Docker configuration
├── README.md                # Project documentation
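
Before starting the application, it can be useful to check that indexing actually produced the files listed under index/ above. The sketch below assumes the index directory is named index and sits in the working directory (the real path is whatever utils/constant.py sets); `missing_index_files` is a hypothetical helper, not part of the project.

```python
from pathlib import Path

# File names copied from the index/ listing in the project structure.
EXPECTED_INDEX_FILES = (
    "default__vector_store.json",
    "docstore.json",
    "graph_store.json",
    "image__vector_store.json",
    "index_store.json",
)


def missing_index_files(index_dir="index"):
    """Return the expected index files that are absent from index_dir."""
    root = Path(index_dir)
    return [name for name in EXPECTED_INDEX_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_index_files()
    if missing:
        print("Index incomplete, run the indexing step first:", ", ".join(missing))
    else:
        print("Index files found.")
```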

Example questions

  • What is Few-NERD?
  • What is the Few-NERD dataset used for?
  • What are the NER types in the dataset?
  • What role does "transfer learning" play in the proposed few-shot learning system?
  • What metric does the paper use to evaluate the effectiveness of the few-shot model?