Юра Цепліцький committed on
Commit
af8b652
·
1 Parent(s): 0582ac1

Change default paper

README.md CHANGED
@@ -1,5 +1,5 @@
1
  ---
2
- title: "Document QA System"
3
  emoji: "📄"
4
  colorFrom: "blue"
5
  colorTo: "indigo"
@@ -18,24 +18,18 @@ tags:
18
 
19
  # Document QA System
20
 
21
- Document Question-Answering system that utilizes Gradio for the interface and Docker for deployment.
22
-
23
- ## Features
24
-
25
- - **Document Indexing**: Efficiently processes and indexes documents for quick retrieval.
26
- - **Interactive Interface**: Provides a user-friendly interface for querying documents.
27
- - **Dockerization**: Easy to build and deploy using Docker.
28
 
29
  ## Technologies
30
 
31
  - Data source
32
- - [Paper about Few-NERD dataset](https://arxiv.org/pdf/2105.07464) located in the data directory are used as the data source for indexing.
33
  - Chunking
34
  - Document chunking is handled by [all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
35
  - LLM
36
- - The system utilizes the [Cohere Command R](https://cohere.com/command) for generating responses
37
  - Retriever, Reranker
38
- - [Cohere Command R](https://cohere.com/command) is used
39
  - UI
40
  - The user interface is built with Gradio
41
 
@@ -47,30 +41,24 @@ Document Question-Answering system that utilizes Gradio for the interface and Do
47
 
48
  - [Install Docker](https://docs.docker.com/get-docker/)
49
 
50
- 2. **Set path to the data directory, index directory**:
 
 
51
 
52
- - Update the variables in `utils/constant.py`.
53
 
54
- 3. **Set the API key for [Cohere Command](https://dashboard.cohere.com/api-keys) R and [LLamaParse](https://docs.cloud.llamaindex.ai/llamaparse/getting_started/get_an_api_key)**:
55
-
56
- - Update the `CO_API_KEY` and `LLAMA_CLOUD_API_KEY` in `utils/settings.py` in function `configure_settings`.
57
 
58
  ### Using Docker
59
 
60
- 1. **Clone the Repository**:
61
-
62
- ```bash
63
- git clone <repository-url>
64
- cd <repository-folder>
65
- ```
66
-
67
- 2. **Build the Docker Image**:
68
 
69
  ```bash
70
  docker build -t doc-qa-system .
71
  ```
72
 
73
- 3. **Run the Docker Container**:
74
 
75
  ```bash
76
  docker run -p 7860:7860 doc-qa-system
@@ -78,30 +66,27 @@ Document Question-Answering system that utilizes Gradio for the interface and Do
78
 
79
  4. **Access the Interface**:
80
 
81
- Open your browser and go to `http://localhost:7860`.
82
 
83
  ### Using Python
84
 
85
- 1. **Clone the Repository**:
86
-
87
- ```bash
88
- git clone <repository-url>
89
- cd <repository-folder>
90
- ```
91
-
92
- 2. **Install Dependencies**:
93
 
94
  ```bash
95
  pip install -r requirements.txt
96
  ```
97
 
98
- 3. **Run indexing data**:
 
 
 
 
99
 
100
  ```bash
101
  python index.py
102
  ```
103
 
104
- 4. **Run the Application**:
105
 
106
  ```bash
107
  python app.py
@@ -131,8 +116,5 @@ Document Question-Answering system that utilizes Gradio for the interface and Do
131
 
132
  ## Example questions
133
 
134
- - What is Few-NERD?
135
- - What is the Few-NERD dataset used for?
136
- - What are NER types in dataset?
137
- - What role does "transfer learning" play in the proposed few-shot learning system?
138
- - What metric does the paper use to evaluate the effectiveness of the few-shot model?
 
1
  ---
2
+ title: "Paper-based RAG"
3
  emoji: "📄"
4
  colorFrom: "blue"
5
  colorTo: "indigo"
 
18
 
19
  # Document QA System
20
 
21
+ A document question-answering system that uses LlamaIndex for document indexing, retrieval, and generation, and Gradio for the user interface.
 
 
 
 
 
 
22
 
23
  ## Technologies
24
 
25
  - Data source
26
+ - The [paper about BERT](https://arxiv.org/pdf/1810.04805), located in the data directory, is used as the data source for indexing.
27
  - Chunking
28
  - Document chunking is handled by [all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
29
  - LLM
30
+ - The system uses `gpt-4o-mini` to generate responses
31
  - Retriever, Reranker
32
+ - `gpt-4o-mini` is used
33
  - UI
34
  - The user interface is built with Gradio
35
 
 
41
 
42
  - [Install Docker](https://docs.docker.com/get-docker/)
43
 
44
+ 2. **API keys**
45
+ - [OpenAI](https://platform.openai.com/api-keys)
46
+ - [LlamaParse](https://docs.cloud.llamaindex.ai/llamaparse/getting_started/get_an_api_key)
47
 
48
+ ### Using HuggingFace Spaces
49
 
50
+ 1. Open the [paper-based-rag](https://huggingface.co/spaces/Gepe55o/paper_based_rag) Space on Hugging Face.
51
+ 2. Upload your paper for indexing or use the default [paper](https://arxiv.org/pdf/1810.04805) about BERT.
 
52
 
53
  ### Using Docker
54
 
55
+ 1. **Build the Docker Image**:
 
 
 
 
 
 
 
56
 
57
  ```bash
58
  docker build -t doc-qa-system .
59
  ```
60
 
61
+ 2. **Run the Docker Container**:
62
 
63
  ```bash
64
  docker run -p 7860:7860 doc-qa-system
 
66
 
67
  4. **Access the Interface**:
68
 
69
+ - Open your browser and go to `http://localhost:7860`.
70
 
71
  ### Using Python
72
 
73
+ 1. **Install Dependencies**:
 
 
 
 
 
 
 
74
 
75
  ```bash
76
  pip install -r requirements.txt
77
  ```
78
 
79
+ 2. **Add paper to the data directory**:
80
+
81
+ - Add the paper you want to index to the `data` directory, or use the default [paper](https://arxiv.org/pdf/1810.04805) about BERT.
82
+
83
+ 3. **Index the data**:
84
 
85
  ```bash
86
  python index.py
87
  ```
88
 
89
+ 4. **Run the Application**:
90
 
91
  ```bash
92
  python app.py
 
116
 
117
  ## Example questions
118
 
119
+ - What is the pre-training procedure for BERT, and how does it differ from traditional supervised learning?
120
+ - Can you describe how BERT can be fine-tuned for tasks like question answering or sentiment analysis?
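
The updated README lists the OpenAI and LlamaParse API keys as prerequisites but no longer says where to set them. As a minimal sketch, assuming the keys are supplied through environment variables (`OPENAI_API_KEY` is the name the OpenAI SDK reads, and `LLAMA_CLOUD_API_KEY` is the name the previous README used), a start-up check could look like the following. The helper `missing_api_keys` is hypothetical and not part of this repository.

```python
import os

# Assumed variable names; adjust if the project reads its keys differently.
REQUIRED_KEYS = ("OPENAI_API_KEY", "LLAMA_CLOUD_API_KEY")


def missing_api_keys(env=None):
    """Return the names of required keys that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]
```

Calling `missing_api_keys()` before `python index.py` or `python app.py` and stopping when the returned list is non-empty would surface a missing key early instead of failing on the first API request.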
 
 
 
__pycache__/main.cpython-312.pyc CHANGED
Binary files a/__pycache__/main.cpython-312.pyc and b/__pycache__/main.cpython-312.pyc differ
 
data/paper.pdf CHANGED
Binary files a/data/paper.pdf and b/data/paper.pdf differ
 
index/default__vector_store.json CHANGED
The diff for this file is too large to render. See raw diff
 
index/docstore.json CHANGED
The diff for this file is too large to render. See raw diff
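
The `index/index_store.json` diff that follows is hard to read by eye, because each index entry keeps its payload in `__data__` as a JSON-encoded string nested inside the outer JSON. A small sketch for comparing the two versions by node count is given here. The function name `count_index_nodes` is hypothetical, and the layout is assumed to match what this diff shows.

```python
import json


def count_index_nodes(raw: str) -> dict:
    """Map each index id to the number of entries in its nodes_dict."""
    store = json.loads(raw)["index_store/data"]
    counts = {}
    for index_id, entry in store.items():
        # "__data__" is itself a JSON string, so it is decoded a second time.
        data = json.loads(entry["__data__"])
        counts[index_id] = len(data["nodes_dict"])
    return counts
```

Running it on the old and new file contents (for example, `count_index_nodes(open("index/index_store.json").read())` on each revision) shows how many chunks each paper was split into without reading the UUID list.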
 
index/index_store.json CHANGED
@@ -1 +1 @@
1
- {"index_store/data": {"8a968a2d-ad62-41b4-8e52-02e2e510beb6": {"__type__": "vector_store", "__data__": "{\"index_id\": \"8a968a2d-ad62-41b4-8e52-02e2e510beb6\", \"summary\": null, \"nodes_dict\": {\"3e9bf844-0a4e-4de1-8be3-8a00f47f9be1\": \"3e9bf844-0a4e-4de1-8be3-8a00f47f9be1\", \"d950eb15-82e3-4c1c-b8bb-d5a7249aadae\": \"d950eb15-82e3-4c1c-b8bb-d5a7249aadae\", \"89d6be11-da41-4dd7-899f-1340a92c4cd2\": \"89d6be11-da41-4dd7-899f-1340a92c4cd2\", \"3a410009-58ad-4f35-9627-bfaa50dd56d8\": \"3a410009-58ad-4f35-9627-bfaa50dd56d8\", \"08a4de1b-58e0-4975-a68c-99b215ddca75\": \"08a4de1b-58e0-4975-a68c-99b215ddca75\", \"703bb83a-4aea-4eb3-85a6-086d25555ccb\": \"703bb83a-4aea-4eb3-85a6-086d25555ccb\", \"f6c2c3db-ba3c-489e-9459-e6b4f579286b\": \"f6c2c3db-ba3c-489e-9459-e6b4f579286b\", \"547f541a-ed82-4d22-af00-51a95dc3f0e1\": \"547f541a-ed82-4d22-af00-51a95dc3f0e1\", \"5bd4a82d-022c-47e6-9bbb-8bdeef20f515\": \"5bd4a82d-022c-47e6-9bbb-8bdeef20f515\", \"de0a20d6-b6dc-4ff3-8b6e-f6ad19472b08\": \"de0a20d6-b6dc-4ff3-8b6e-f6ad19472b08\", \"39abd0c8-e1f5-4ee3-8da1-537353646ec6\": \"39abd0c8-e1f5-4ee3-8da1-537353646ec6\", \"ec59971c-cf54-40e2-9a55-c5de0cdbea76\": \"ec59971c-cf54-40e2-9a55-c5de0cdbea76\", \"ce1695d1-7872-48ae-8589-5b5ed5355234\": \"ce1695d1-7872-48ae-8589-5b5ed5355234\", \"5a2138f4-d397-4d63-9cac-d45d9fe4de7e\": \"5a2138f4-d397-4d63-9cac-d45d9fe4de7e\", \"a2435907-a143-49c8-b483-ee3e8a02ba74\": \"a2435907-a143-49c8-b483-ee3e8a02ba74\", \"b3793ecc-96fc-4f50-bc61-21be9868e23b\": \"b3793ecc-96fc-4f50-bc61-21be9868e23b\", \"c33b63d5-7341-40f1-9016-43201810afd5\": \"c33b63d5-7341-40f1-9016-43201810afd5\", \"ecacd21e-1829-48fa-95ab-5c90846e8dd3\": \"ecacd21e-1829-48fa-95ab-5c90846e8dd3\", \"95509f41-b5f0-4bc4-ba2c-886ad18a6046\": \"95509f41-b5f0-4bc4-ba2c-886ad18a6046\", \"b778bdc3-b7ac-4222-b5f9-8e068507f3a6\": \"b778bdc3-b7ac-4222-b5f9-8e068507f3a6\", \"810ba2d6-65c6-4378-91c4-4ba38f087746\": \"810ba2d6-65c6-4378-91c4-4ba38f087746\", 
\"27c32a2f-d0a1-4540-90a2-aed3847dc7e4\": \"27c32a2f-d0a1-4540-90a2-aed3847dc7e4\", \"c2ae573a-cfd8-4747-a7c2-ce1d55a0484b\": \"c2ae573a-cfd8-4747-a7c2-ce1d55a0484b\", \"9cc52dba-eaee-481f-b340-5c0a400c28e7\": \"9cc52dba-eaee-481f-b340-5c0a400c28e7\", \"f1116f47-ab33-4225-bb26-ddc62fe95589\": \"f1116f47-ab33-4225-bb26-ddc62fe95589\", \"e72dac24-34a6-4159-818b-d6f023d89f0c\": \"e72dac24-34a6-4159-818b-d6f023d89f0c\", \"00f9b9f2-a717-4ccb-a263-c9c92e3a0604\": \"00f9b9f2-a717-4ccb-a263-c9c92e3a0604\", \"0b352382-f3d6-4693-8571-1762bd92e288\": \"0b352382-f3d6-4693-8571-1762bd92e288\", \"812846d5-bd57-4218-8039-072d4826c457\": \"812846d5-bd57-4218-8039-072d4826c457\", \"c52e3f4a-332f-4829-9c57-c42ad62c4c61\": \"c52e3f4a-332f-4829-9c57-c42ad62c4c61\", \"e32886ff-2b1a-422c-b95b-e421bd43419f\": \"e32886ff-2b1a-422c-b95b-e421bd43419f\", \"fbb1da9d-8adb-456b-a269-3544ffe0f8c3\": \"fbb1da9d-8adb-456b-a269-3544ffe0f8c3\", \"5b74caa6-0e1a-4998-8fce-bc485614f693\": \"5b74caa6-0e1a-4998-8fce-bc485614f693\", \"ae5d7634-5d34-44d1-a4e7-8d200469f0db\": \"ae5d7634-5d34-44d1-a4e7-8d200469f0db\", \"51714cff-a266-4cf3-96f1-bbb555068ce9\": \"51714cff-a266-4cf3-96f1-bbb555068ce9\", \"22d3e563-46d5-4e6a-a7d5-84b175421878\": \"22d3e563-46d5-4e6a-a7d5-84b175421878\", \"04883a01-7aeb-46c7-ab74-6fa9337c61ee\": \"04883a01-7aeb-46c7-ab74-6fa9337c61ee\", \"b7178a9a-baa5-4df6-bf34-fe7e2076eb3f\": \"b7178a9a-baa5-4df6-bf34-fe7e2076eb3f\", \"5a13abdf-cef2-4d15-a4c6-2678fd859672\": \"5a13abdf-cef2-4d15-a4c6-2678fd859672\", \"310337fe-3f15-42a8-a1fd-8a9bfc87f6a4\": \"310337fe-3f15-42a8-a1fd-8a9bfc87f6a4\", \"91ed48ed-da65-4f77-98c0-99f800d0db39\": \"91ed48ed-da65-4f77-98c0-99f800d0db39\", \"5998e668-1c0b-4446-ba84-6386fe51b607\": \"5998e668-1c0b-4446-ba84-6386fe51b607\", \"06456051-5542-40dd-9ddd-87258d76aa23\": \"06456051-5542-40dd-9ddd-87258d76aa23\", \"492b4f97-d056-4cde-bbf6-d2fa2a5b21b0\": \"492b4f97-d056-4cde-bbf6-d2fa2a5b21b0\", \"fa1b2e06-8569-4c40-b557-50ab94a0728d\": 
\"fa1b2e06-8569-4c40-b557-50ab94a0728d\", \"f70535c7-2605-4c2f-b0fc-4e390501a1e4\": \"f70535c7-2605-4c2f-b0fc-4e390501a1e4\", \"76c294c4-2bf8-4452-9ad4-beb68c0848c3\": \"76c294c4-2bf8-4452-9ad4-beb68c0848c3\", \"c82a6593-cdd2-458f-915a-b0cbba22ba2a\": \"c82a6593-cdd2-458f-915a-b0cbba22ba2a\"}, \"doc_id_dict\": {}, \"embeddings_dict\": {}}"}}}
 
1
+ {"index_store/data": {"b34bf71b-85e4-4e6b-94af-378af061ac96": {"__type__": "vector_store", "__data__": "{\"index_id\": \"b34bf71b-85e4-4e6b-94af-378af061ac96\", \"summary\": null, \"nodes_dict\": {\"8d55e99e-029a-47c6-8fae-5bff1ee7672a\": \"8d55e99e-029a-47c6-8fae-5bff1ee7672a\", \"1d81b4fd-e928-4180-8ef6-a41371ac1fed\": \"1d81b4fd-e928-4180-8ef6-a41371ac1fed\", \"a58908ab-8178-40cd-b2a3-efcaf65228f2\": \"a58908ab-8178-40cd-b2a3-efcaf65228f2\", \"ab890d58-e9ed-49ae-9c78-282a91366161\": \"ab890d58-e9ed-49ae-9c78-282a91366161\", \"9bbcaaca-e282-4f65-a5c2-9a5708a228eb\": \"9bbcaaca-e282-4f65-a5c2-9a5708a228eb\", \"926f43b0-e58c-4e8f-b766-c90f2447361e\": \"926f43b0-e58c-4e8f-b766-c90f2447361e\", \"1a82bbed-7218-4c23-8cb5-402971703a30\": \"1a82bbed-7218-4c23-8cb5-402971703a30\", \"a93a9152-5892-4b84-87c4-d7d7e57b5d64\": \"a93a9152-5892-4b84-87c4-d7d7e57b5d64\", \"abbf6d32-14ba-4c60-850d-cf4429b349e4\": \"abbf6d32-14ba-4c60-850d-cf4429b349e4\", \"f1f06b3a-489c-4861-b5be-72cd1c1d8e80\": \"f1f06b3a-489c-4861-b5be-72cd1c1d8e80\", \"544c5e06-611b-4fe1-93ae-aa3441eb385e\": \"544c5e06-611b-4fe1-93ae-aa3441eb385e\", \"b635c2da-b277-4b49-936d-8ac5939da468\": \"b635c2da-b277-4b49-936d-8ac5939da468\", \"2f892821-777d-4fb6-ac8f-611ef0566d7d\": \"2f892821-777d-4fb6-ac8f-611ef0566d7d\", \"6c2b1028-b0b0-4f92-af51-9660eb0f49f5\": \"6c2b1028-b0b0-4f92-af51-9660eb0f49f5\", \"349ffcb2-b1dd-49ef-b7f7-37406a635c71\": \"349ffcb2-b1dd-49ef-b7f7-37406a635c71\", \"43c7eae6-7d68-47bc-8fd4-8f0584405231\": \"43c7eae6-7d68-47bc-8fd4-8f0584405231\", \"15b81cf1-d657-41b4-a9ff-703f8e5e6fac\": \"15b81cf1-d657-41b4-a9ff-703f8e5e6fac\", \"0af4d994-64d1-4bf4-8624-55ed2e4f7dbd\": \"0af4d994-64d1-4bf4-8624-55ed2e4f7dbd\", \"b6154034-ddde-4fd8-a018-126e613aa014\": \"b6154034-ddde-4fd8-a018-126e613aa014\", \"61d2f504-bc66-41b5-b6b3-662d999e9f60\": \"61d2f504-bc66-41b5-b6b3-662d999e9f60\", \"049233ae-7f17-4b97-b212-478744e93165\": \"049233ae-7f17-4b97-b212-478744e93165\", 
\"21c9c67d-01b5-49c7-b0c5-7c5856abacbe\": \"21c9c67d-01b5-49c7-b0c5-7c5856abacbe\", \"2b41dc46-a706-4035-8707-8bdac62c2cfc\": \"2b41dc46-a706-4035-8707-8bdac62c2cfc\", \"641ed924-8e3f-4b7c-ae65-66e3dc4da5d5\": \"641ed924-8e3f-4b7c-ae65-66e3dc4da5d5\", \"60786148-3cfd-4e95-ab5d-256991f19a68\": \"60786148-3cfd-4e95-ab5d-256991f19a68\", \"9f8093af-21a6-443b-a6ac-864fb66387f8\": \"9f8093af-21a6-443b-a6ac-864fb66387f8\", \"e1c258d9-0291-4310-9fbd-a17f908a5826\": \"e1c258d9-0291-4310-9fbd-a17f908a5826\", \"6808912a-ceb2-47ba-9281-2f1c06afe3d9\": \"6808912a-ceb2-47ba-9281-2f1c06afe3d9\", \"d1afa468-be5c-4597-8f48-c90574dff711\": \"d1afa468-be5c-4597-8f48-c90574dff711\", \"d854ece0-8e05-4e06-ba7d-442eb5a771eb\": \"d854ece0-8e05-4e06-ba7d-442eb5a771eb\", \"f48e1778-26db-49b0-89e7-e04c961609cc\": \"f48e1778-26db-49b0-89e7-e04c961609cc\", \"f1a6746d-1e02-49e9-be62-360454d78ce3\": \"f1a6746d-1e02-49e9-be62-360454d78ce3\", \"388d7ccb-8037-4295-9e61-5e7bc66581e2\": \"388d7ccb-8037-4295-9e61-5e7bc66581e2\", \"8e29386d-e646-4e6b-8096-3545b568bd44\": \"8e29386d-e646-4e6b-8096-3545b568bd44\", \"48a545c7-3964-4115-88cc-e2df29b360a0\": \"48a545c7-3964-4115-88cc-e2df29b360a0\", \"4ae1f21b-eae6-41ac-a45b-c08dcc7346e5\": \"4ae1f21b-eae6-41ac-a45b-c08dcc7346e5\", \"f7ae2b3e-8e5f-4317-9483-6029a02f4a66\": \"f7ae2b3e-8e5f-4317-9483-6029a02f4a66\", \"cd5b0464-f7f4-465c-894e-93dd7a8f1e77\": \"cd5b0464-f7f4-465c-894e-93dd7a8f1e77\", \"5042ecfe-b092-4370-a34d-f747863465a0\": \"5042ecfe-b092-4370-a34d-f747863465a0\", \"3efbee33-f0fd-4f16-bf5c-3c7a897c1562\": \"3efbee33-f0fd-4f16-bf5c-3c7a897c1562\", \"87a4fe6d-ce60-4893-a3ab-faea7aa65407\": \"87a4fe6d-ce60-4893-a3ab-faea7aa65407\", \"4ee6eb46-8fd4-43f4-b321-1711067a516f\": \"4ee6eb46-8fd4-43f4-b321-1711067a516f\", \"305ade7f-2710-4529-8972-2d49133a67ed\": \"305ade7f-2710-4529-8972-2d49133a67ed\", \"8b857906-4aa7-4a72-9b9b-fe4e7472a16b\": \"8b857906-4aa7-4a72-9b9b-fe4e7472a16b\", \"e4240080-f6c3-485f-b3b8-bc17acedd026\": 
\"e4240080-f6c3-485f-b3b8-bc17acedd026\", \"7432a915-d9a3-49ac-bda3-f72850d06063\": \"7432a915-d9a3-49ac-bda3-f72850d06063\", \"f6bb6a46-a49b-41ef-b78b-84c7502bebb9\": \"f6bb6a46-a49b-41ef-b78b-84c7502bebb9\", \"7200837e-7b73-4376-b06f-d9b4c910cd3e\": \"7200837e-7b73-4376-b06f-d9b4c910cd3e\", \"bc03e397-6cf3-4852-a898-40b4c4063325\": \"bc03e397-6cf3-4852-a898-40b4c4063325\", \"69f5ade7-59a5-4766-95c0-52706c102651\": \"69f5ade7-59a5-4766-95c0-52706c102651\", \"3ab56ff6-1afe-4901-b46a-0160b56c7a48\": \"3ab56ff6-1afe-4901-b46a-0160b56c7a48\", \"c6a86942-07f0-4b10-a27c-f02d201c542f\": \"c6a86942-07f0-4b10-a27c-f02d201c542f\", \"d4f64e83-6625-43c1-8da5-3551fee253a5\": \"d4f64e83-6625-43c1-8da5-3551fee253a5\"}, \"doc_id_dict\": {}, \"embeddings_dict\": {}}"}}}
main.py CHANGED
@@ -40,7 +40,7 @@ def answer_query(query: str) -> str:
  score = node.get_score()
  text = node.text
 
- response += f"\nNode: {node.node_id}\nScore: {score:0.3f}\nText: {text}\n"
+ response += f"\nNode: {node.node_id}\nScore: {score:0.3f}\nText: {text[:1000]}\n"
 
  return response
 
utils/__pycache__/constant.cpython-312.pyc CHANGED
Binary files a/utils/__pycache__/constant.cpython-312.pyc and b/utils/__pycache__/constant.cpython-312.pyc differ
 
utils/__pycache__/index.cpython-312.pyc CHANGED
Binary files a/utils/__pycache__/index.cpython-312.pyc and b/utils/__pycache__/index.cpython-312.pyc differ
 
utils/__pycache__/retriever.cpython-312.pyc CHANGED
Binary files a/utils/__pycache__/retriever.cpython-312.pyc and b/utils/__pycache__/retriever.cpython-312.pyc differ
 
utils/__pycache__/settings.cpython-312.pyc CHANGED
Binary files a/utils/__pycache__/settings.cpython-312.pyc and b/utils/__pycache__/settings.cpython-312.pyc differ