Юра Цепліцький committed · Commit af8b652 · 1 Parent(s): 0582ac1
Change default paper
- README.md +23 -41
- __pycache__/main.cpython-312.pyc +0 -0
- data/paper.pdf +0 -0
- index/default__vector_store.json +0 -0
- index/docstore.json +0 -0
- index/index_store.json +1 -1
- main.py +1 -1
- utils/__pycache__/constant.cpython-312.pyc +0 -0
- utils/__pycache__/index.cpython-312.pyc +0 -0
- utils/__pycache__/retriever.cpython-312.pyc +0 -0
- utils/__pycache__/settings.cpython-312.pyc +0 -0
README.md
CHANGED
@@ -1,5 +1,5 @@
 ---
-title: "
+title: "Paper-based RAG"
 emoji: "📄"
 colorFrom: "blue"
 colorTo: "indigo"
@@ -18,24 +18,18 @@ tags:
 
 # Document QA System
 
-Document Question-Answering system that utilizes
-
-## Features
-
-- **Document Indexing**: Efficiently processes and indexes documents for quick retrieval.
-- **Interactive Interface**: Provides a user-friendly interface for querying documents.
-- **Dockerization**: Easy to build and deploy using Docker.
+Document Question-Answering system that utilizes LlamaIndex for document indexing, generation, and retrieval and Gradio for the user interface.
 
 ## Technologies
 
 - Data source
-- [Paper about
+- [Paper about BERT](https://arxiv.org/pdf/1810.04805) located in the data directory are used as the data source for indexing.
 - Chunking
 - Document chunking is handled by [all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
 - LLM
-- The system utilizes the
+- The system utilizes the `gpt-4o-mini` for generating responses
 - Retriever, Reranker
--
+- gpt-4o-mini is used
 - UI
 - The user interface is built with Gradio
 
@@ -47,30 +41,24 @@ Document Question-Answering system that utilizes Gradio for the interface and Do
 
 - [Install Docker](https://docs.docker.com/get-docker/)
 
-2. **
+2. **API keys**
+- [OpenAI](https://platform.openai.com/api-keys)
+- [LLamaParse](https://docs.cloud.llamaindex.ai/llamaparse/getting_started/get_an_api_key):
 
-
+### Using HuggingFace Spaces
 
-
-
-- Update the `CO_API_KEY` and `LLAMA_CLOUD_API_KEY` in `utils/settings.py` in function `configure_settings`.
+1. Follow the link to the [paper-based-rag](https://huggingface.co/spaces/Gepe55o/paper_based_rag) on Spaces.
+2. Upload your paper for indexing or use the default [paper](https://arxiv.org/pdf/1810.04805) about BERT.
 
 ### Using Docker
 
-1. **
-
-```bash
-git clone <repository-url>
-cd <repository-folder>
-```
-
-2. **Build the Docker Image**:
+1. **Build the Docker Image**:
 
 ```bash
 docker build -t doc-qa-system .
 ```
 
-
+2. **Run the Docker Container**:
 
 ```bash
 docker run -p 7860:7860 doc-qa-system
@@ -78,30 +66,27 @@ Document Question-Answering system that utilizes Gradio for the interface and Do
 
 4. **Access the Interface**:
 
-Open your browser and go to `http://localhost:7860`.
+- Open your browser and go to `http://localhost:7860`.
 
 ### Using Python
 
-1. **
-
-```bash
-git clone <repository-url>
-cd <repository-folder>
-```
-
-2. **Install Dependencies**:
+1. **Install Dependencies**:
 
 ```bash
 pip install -r requirements.txt
 ```
 
-
+2. **Add paper to the data directory**:
+
+- Add the paper you want to index to the `data` directory or use default [paper](https://arxiv.org/pdf/1810.04805) about BERT.
+
+2. **Run indexing data**:
 
 ```bash
 python index.py
 ```
 
-
+3. **Run the Application**:
 
 ```bash
 python app.py
@@ -131,8 +116,5 @@ Document Question-Answering system that utilizes Gradio for the interface and Do
 
 ## Example questions
 
-- What is
--
-- What are NER types in dataset?
-- What role does "transfer learning" play in the proposed few-shot learning system?
-- What metric does the paper use to evaluate the effectiveness of the few-shot model?
+- What is the pre-training procedure for BERT, and how does it differ from traditional supervised learning?
+- Can you describe how BERT can be fine-tuned for tasks like question answering or sentiment analysis?

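The API-key prerequisite listed in the README diff above could be verified before the app starts, along the following lines. This is a minimal sketch and not code from the repository: `REQUIRED_KEYS` and `missing_keys` are hypothetical names, and it assumes the keys are supplied through the `OPENAI_API_KEY` and `LLAMA_CLOUD_API_KEY` environment variables, which this commit does not confirm.

```python
import os
from typing import Iterable, List, Mapping

# Assumed variable names; the repository may configure its keys differently.
REQUIRED_KEYS = ("OPENAI_API_KEY", "LLAMA_CLOUD_API_KEY")


def missing_keys(
    env: Mapping[str, str] = os.environ,
    required: Iterable[str] = REQUIRED_KEYS,
) -> List[str]:
    """Return the names of required keys that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    absent = missing_keys()
    if absent:
        print("Missing API keys:", ", ".join(absent))
    else:
        print("All required API keys are set.")
```

Passing the environment as a parameter keeps the check easy to test without touching the real process environment.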
__pycache__/main.cpython-312.pyc
CHANGED
Binary files a/__pycache__/main.cpython-312.pyc and b/__pycache__/main.cpython-312.pyc differ

data/paper.pdf
CHANGED
Binary files a/data/paper.pdf and b/data/paper.pdf differ

index/default__vector_store.json
CHANGED
The diff for this file is too large to render.

index/docstore.json
CHANGED
The diff for this file is too large to render.

index/index_store.json
CHANGED
@@ -1 +1 @@
-{"index_store/data": {"
+
{"index_store/data": {"b34bf71b-85e4-4e6b-94af-378af061ac96": {"__type__": "vector_store", "__data__": "{\"index_id\": \"b34bf71b-85e4-4e6b-94af-378af061ac96\", \"summary\": null, \"nodes_dict\": {\"8d55e99e-029a-47c6-8fae-5bff1ee7672a\": \"8d55e99e-029a-47c6-8fae-5bff1ee7672a\", \"1d81b4fd-e928-4180-8ef6-a41371ac1fed\": \"1d81b4fd-e928-4180-8ef6-a41371ac1fed\", \"a58908ab-8178-40cd-b2a3-efcaf65228f2\": \"a58908ab-8178-40cd-b2a3-efcaf65228f2\", \"ab890d58-e9ed-49ae-9c78-282a91366161\": \"ab890d58-e9ed-49ae-9c78-282a91366161\", \"9bbcaaca-e282-4f65-a5c2-9a5708a228eb\": \"9bbcaaca-e282-4f65-a5c2-9a5708a228eb\", \"926f43b0-e58c-4e8f-b766-c90f2447361e\": \"926f43b0-e58c-4e8f-b766-c90f2447361e\", \"1a82bbed-7218-4c23-8cb5-402971703a30\": \"1a82bbed-7218-4c23-8cb5-402971703a30\", \"a93a9152-5892-4b84-87c4-d7d7e57b5d64\": \"a93a9152-5892-4b84-87c4-d7d7e57b5d64\", \"abbf6d32-14ba-4c60-850d-cf4429b349e4\": \"abbf6d32-14ba-4c60-850d-cf4429b349e4\", \"f1f06b3a-489c-4861-b5be-72cd1c1d8e80\": \"f1f06b3a-489c-4861-b5be-72cd1c1d8e80\", \"544c5e06-611b-4fe1-93ae-aa3441eb385e\": \"544c5e06-611b-4fe1-93ae-aa3441eb385e\", \"b635c2da-b277-4b49-936d-8ac5939da468\": \"b635c2da-b277-4b49-936d-8ac5939da468\", \"2f892821-777d-4fb6-ac8f-611ef0566d7d\": \"2f892821-777d-4fb6-ac8f-611ef0566d7d\", \"6c2b1028-b0b0-4f92-af51-9660eb0f49f5\": \"6c2b1028-b0b0-4f92-af51-9660eb0f49f5\", \"349ffcb2-b1dd-49ef-b7f7-37406a635c71\": \"349ffcb2-b1dd-49ef-b7f7-37406a635c71\", \"43c7eae6-7d68-47bc-8fd4-8f0584405231\": \"43c7eae6-7d68-47bc-8fd4-8f0584405231\", \"15b81cf1-d657-41b4-a9ff-703f8e5e6fac\": \"15b81cf1-d657-41b4-a9ff-703f8e5e6fac\", \"0af4d994-64d1-4bf4-8624-55ed2e4f7dbd\": \"0af4d994-64d1-4bf4-8624-55ed2e4f7dbd\", \"b6154034-ddde-4fd8-a018-126e613aa014\": \"b6154034-ddde-4fd8-a018-126e613aa014\", \"61d2f504-bc66-41b5-b6b3-662d999e9f60\": \"61d2f504-bc66-41b5-b6b3-662d999e9f60\", \"049233ae-7f17-4b97-b212-478744e93165\": \"049233ae-7f17-4b97-b212-478744e93165\", 
\"21c9c67d-01b5-49c7-b0c5-7c5856abacbe\": \"21c9c67d-01b5-49c7-b0c5-7c5856abacbe\", \"2b41dc46-a706-4035-8707-8bdac62c2cfc\": \"2b41dc46-a706-4035-8707-8bdac62c2cfc\", \"641ed924-8e3f-4b7c-ae65-66e3dc4da5d5\": \"641ed924-8e3f-4b7c-ae65-66e3dc4da5d5\", \"60786148-3cfd-4e95-ab5d-256991f19a68\": \"60786148-3cfd-4e95-ab5d-256991f19a68\", \"9f8093af-21a6-443b-a6ac-864fb66387f8\": \"9f8093af-21a6-443b-a6ac-864fb66387f8\", \"e1c258d9-0291-4310-9fbd-a17f908a5826\": \"e1c258d9-0291-4310-9fbd-a17f908a5826\", \"6808912a-ceb2-47ba-9281-2f1c06afe3d9\": \"6808912a-ceb2-47ba-9281-2f1c06afe3d9\", \"d1afa468-be5c-4597-8f48-c90574dff711\": \"d1afa468-be5c-4597-8f48-c90574dff711\", \"d854ece0-8e05-4e06-ba7d-442eb5a771eb\": \"d854ece0-8e05-4e06-ba7d-442eb5a771eb\", \"f48e1778-26db-49b0-89e7-e04c961609cc\": \"f48e1778-26db-49b0-89e7-e04c961609cc\", \"f1a6746d-1e02-49e9-be62-360454d78ce3\": \"f1a6746d-1e02-49e9-be62-360454d78ce3\", \"388d7ccb-8037-4295-9e61-5e7bc66581e2\": \"388d7ccb-8037-4295-9e61-5e7bc66581e2\", \"8e29386d-e646-4e6b-8096-3545b568bd44\": \"8e29386d-e646-4e6b-8096-3545b568bd44\", \"48a545c7-3964-4115-88cc-e2df29b360a0\": \"48a545c7-3964-4115-88cc-e2df29b360a0\", \"4ae1f21b-eae6-41ac-a45b-c08dcc7346e5\": \"4ae1f21b-eae6-41ac-a45b-c08dcc7346e5\", \"f7ae2b3e-8e5f-4317-9483-6029a02f4a66\": \"f7ae2b3e-8e5f-4317-9483-6029a02f4a66\", \"cd5b0464-f7f4-465c-894e-93dd7a8f1e77\": \"cd5b0464-f7f4-465c-894e-93dd7a8f1e77\", \"5042ecfe-b092-4370-a34d-f747863465a0\": \"5042ecfe-b092-4370-a34d-f747863465a0\", \"3efbee33-f0fd-4f16-bf5c-3c7a897c1562\": \"3efbee33-f0fd-4f16-bf5c-3c7a897c1562\", \"87a4fe6d-ce60-4893-a3ab-faea7aa65407\": \"87a4fe6d-ce60-4893-a3ab-faea7aa65407\", \"4ee6eb46-8fd4-43f4-b321-1711067a516f\": \"4ee6eb46-8fd4-43f4-b321-1711067a516f\", \"305ade7f-2710-4529-8972-2d49133a67ed\": \"305ade7f-2710-4529-8972-2d49133a67ed\", \"8b857906-4aa7-4a72-9b9b-fe4e7472a16b\": \"8b857906-4aa7-4a72-9b9b-fe4e7472a16b\", \"e4240080-f6c3-485f-b3b8-bc17acedd026\": 
\"e4240080-f6c3-485f-b3b8-bc17acedd026\", \"7432a915-d9a3-49ac-bda3-f72850d06063\": \"7432a915-d9a3-49ac-bda3-f72850d06063\", \"f6bb6a46-a49b-41ef-b78b-84c7502bebb9\": \"f6bb6a46-a49b-41ef-b78b-84c7502bebb9\", \"7200837e-7b73-4376-b06f-d9b4c910cd3e\": \"7200837e-7b73-4376-b06f-d9b4c910cd3e\", \"bc03e397-6cf3-4852-a898-40b4c4063325\": \"bc03e397-6cf3-4852-a898-40b4c4063325\", \"69f5ade7-59a5-4766-95c0-52706c102651\": \"69f5ade7-59a5-4766-95c0-52706c102651\", \"3ab56ff6-1afe-4901-b46a-0160b56c7a48\": \"3ab56ff6-1afe-4901-b46a-0160b56c7a48\", \"c6a86942-07f0-4b10-a27c-f02d201c542f\": \"c6a86942-07f0-4b10-a27c-f02d201c542f\", \"d4f64e83-6625-43c1-8da5-3551fee253a5\": \"d4f64e83-6625-43c1-8da5-3551fee253a5\"}, \"doc_id_dict\": {}, \"embeddings_dict\": {}}"}}}
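In the added line above, `__data__` holds a second JSON document serialized as a string inside the outer JSON, so reading the node list takes two decoding passes. The sketch below illustrates that on a hypothetical miniature of the structure; `count_nodes`, the `idx-1` identifier, and the two sample node IDs are made up for illustration and are not part of the commit.

```python
import json
from typing import Dict

# Hypothetical miniature of index_store.json; the real file lists many more node IDs.
sample_text = json.dumps({
    "index_store/data": {
        "idx-1": {
            "__type__": "vector_store",
            # The inner payload is itself a JSON string, as in the diff above.
            "__data__": json.dumps({
                "index_id": "idx-1",
                "summary": None,
                "nodes_dict": {"n1": "n1", "n2": "n2"},
                "doc_id_dict": {},
                "embeddings_dict": {},
            }),
        }
    }
})


def count_nodes(index_store_text: str) -> Dict[str, int]:
    """Map each index ID to the number of nodes recorded for it."""
    store = json.loads(index_store_text)  # first pass: outer document
    counts = {}
    for index_id, entry in store["index_store/data"].items():
        inner = json.loads(entry["__data__"])  # second pass: embedded string
        counts[index_id] = len(inner["nodes_dict"])
    return counts


if __name__ == "__main__":
    print(count_nodes(sample_text))
```

To inspect the real file, the same function could be given the contents of `index/index_store.json` read as text.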
main.py
CHANGED
@@ -40,7 +40,7 @@ def answer_query(query: str) -> str:
 score = node.get_score()
 text = node.text
 
-response += f"\nNode: {node.node_id}\nScore: {score:0.3f}\nText: {text}\n"
+response += f"\nNode: {node.node_id}\nScore: {score:0.3f}\nText: {text[:1000]}\n"
 
 return response
 

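The change to `main.py` caps each node's text at 1000 characters when building the response string. The effect can be sketched in isolation as follows; `FakeNode`, `format_nodes`, and `MAX_CHARS` are hypothetical stand-ins, since the rest of `answer_query` and the real node class are not shown in this diff.

```python
from dataclasses import dataclass
from typing import Iterable

MAX_CHARS = 1000  # same limit as the slice introduced in the commit


@dataclass
class FakeNode:
    """Hypothetical stand-in exposing only the attributes the hunk uses."""
    node_id: str
    text: str
    score: float

    def get_score(self) -> float:
        return self.score


def format_nodes(nodes: Iterable[FakeNode], max_chars: int = MAX_CHARS) -> str:
    """Build the per-node summary, truncating each text to `max_chars`."""
    response = ""
    for node in nodes:
        score = node.get_score()
        text = node.text
        # Slicing past the end of a short string is safe and returns it whole.
        response += f"\nNode: {node.node_id}\nScore: {score:0.3f}\nText: {text[:max_chars]}\n"
    return response


if __name__ == "__main__":
    print(format_nodes([FakeNode("n1", "a" * 1500, 0.91234)]))
```

Slicing truncates silently, so a reader of the output cannot tell whether a node's text was cut; appending a marker such as `...` when `len(text) > max_chars` would be one way to signal it.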
utils/__pycache__/constant.cpython-312.pyc
CHANGED
Binary files a/utils/__pycache__/constant.cpython-312.pyc and b/utils/__pycache__/constant.cpython-312.pyc differ

utils/__pycache__/index.cpython-312.pyc
CHANGED
Binary files a/utils/__pycache__/index.cpython-312.pyc and b/utils/__pycache__/index.cpython-312.pyc differ

utils/__pycache__/retriever.cpython-312.pyc
CHANGED
Binary files a/utils/__pycache__/retriever.cpython-312.pyc and b/utils/__pycache__/retriever.cpython-312.pyc differ

utils/__pycache__/settings.cpython-312.pyc
CHANGED
Binary files a/utils/__pycache__/settings.cpython-312.pyc and b/utils/__pycache__/settings.cpython-312.pyc differ