paper_based_rag / index / docstore.json
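This file is the document store that LlamaIndex persists (alongside the vector and index stores) when the index built over data/paper.pdf is saved; each entry below is a serialized TextNode with its metadata, relationships, and chunk text. For orientation, here is a minimal sketch of how such a persisted index is typically reloaded and queried. The persist directory name, the import path (recent llama-index releases with the llama_index.core namespace), the assumption that an LLM/embedding model is already configured, and the example query string are all assumptions for illustration, not taken from this repository.

# Minimal sketch: reload a persisted LlamaIndex index from ./index and query it.
# Assumes llama-index >= 0.10 (older versions import from `llama_index` directly)
# and that Settings already point at a usable LLM and embedding model.
from llama_index.core import StorageContext, load_index_from_storage

# Rebuild storage from the persisted files (docstore.json, vector store, index store).
storage_context = StorageContext.from_defaults(persist_dir="index")
index = load_index_from_storage(storage_context)

# Hypothetical query against the indexed paper (here, BERT).
query_engine = index.as_query_engine()
print(query_engine.query("What two pre-training tasks does BERT use?"))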
{"docstore/data": {"8d55e99e-029a-47c6-8fae-5bff1ee7672a": {"__data__": {"id_": "8d55e99e-029a-47c6-8fae-5bff1ee7672a", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "f94e48ce-e161-46d1-800f-430e38b4962c", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "b48eb0eef4d0df2eae6ff59be5cf4fc6a0f132aec4a695ee580d17064cf41786", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "1d81b4fd-e928-4180-8ef6-a41371ac1fed", "node_type": "1", "metadata": {}, "hash": "44018fcff7312a574af64fd046844327e2023ddeda661b59c96732fb610d3359", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "arXiv:1810.04805v2 [cs.CL] 24 May 2019\n\n BERT: Pre-training of Deep Bidirectional Transformers for\n Language Understanding\n\n Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova\n Google AI Language\n {jacobdevlin,mingweichang,kentonl,kristout}@google.com\n\n Abstract\n We introduce a new language representa-\n tion model called BERT, which stands for\n Bidirectional Encoder Representations from\n Transformers. Unlike recent language repre-\n sentation models (Peters et al., 2018a; Rad-\n ford et al., 2018), BERT is designed to pre-\n train deep bidirectional representations from\n unlabeled text by jointly conditioning on both\n left and right context in all layers. As a re-\n sult, the pre-trained BERT model can be fine-\n tuned with just one additional output layer\n to create state-of-the-art models for a wide\n range of tasks, such as question answering and\n language inference, without substantial task-\n specific architecture modifications.\n BERT is conceptually simple and empirically\n powerful. It obtains new state-of-the-art re-\n sults on eleven natural language processing\n tasks, including pushing the GLUE score to\n 80.5% (7.7% point absolute improvement),\n MultiNLI accuracy to 86.7% (4.6% absolute\n improvement), SQuAD v1.1 question answer-\n ing Test F1 to 93.2 (1.5 point absolute im-\n provement) and SQuAD v2.0 Test F1 to 83.1\n (5.1 point absolute improvement).\n1 Introduction\n\nLanguage model pre-training has been shown to\nbe effective for improving many natural language\nprocessing tasks (Dai and Le, 2015; Peters et al.,\n2018a; Radford et al., 2018; Howard and Ruder,\n2018). 
These include sentence-level tasks such as\nnatural language inference (Bowman et al., 2015;\nWilliams et al., 2018) and paraphrasing (Dolan\nand Brockett, 2005), which aim to predict the re-\nlationships between sentences by analyzing them\nholistically, as well as token-level tasks such as\nnamed entity recognition and question answering,\nwhere models are required to produce fine-grained\noutput at the token level (Tjong Kim Sang and\nDe Meulder, 2003; Rajpurkar et al., 2016).\n There are two existing strategies for apply-\ning pre-trained language representations to down-\nstream tasks: feature-based and fine-tuning. The\nfeature-based approach, such as ELMo (Peters\net al., 2018a), uses task-specific architectures that\ninclude the pre-trained representations as addi-\ntional features. The fine-tuning approach, such as\nthe Generative Pre-trained Transformer (OpenAI\nGPT) (Radford et al., 2018), introduces minimal\ntask-specific parameters, and is trained on the\n\ndownstream tasks by simply fine-tuning all pre-\ntrained parameters. The two approaches share the\nsame objective function during pre-training, where\nthey use unidirectional language models to learn\ngeneral language representations.\n ", "mimetype": "text/plain", "start_char_idx": 0, "end_char_idx": 3179, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "1d81b4fd-e928-4180-8ef6-a41371ac1fed": {"__data__": {"id_": "1d81b4fd-e928-4180-8ef6-a41371ac1fed", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "f94e48ce-e161-46d1-800f-430e38b4962c", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "b48eb0eef4d0df2eae6ff59be5cf4fc6a0f132aec4a695ee580d17064cf41786", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "8d55e99e-029a-47c6-8fae-5bff1ee7672a", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "2aca2bfa89e3ba6ce60d62fd9a99c06e1bf5d9cd3ff7a62bdec2bf59ae34d9f9", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "We argue that current techniques restrict the\npower of the pre-trained representations, espe-\ncially for the fine-tuning approaches. The ma-\njor limitation is that standard language models are\nunidirectional, and this limits the choice of archi-\n\ntectures that can be used during pre-training. For\nexample, in OpenAI GPT, the authors use a left-to-\nright architecture, where every token can only at-\ntend to previous tokens in the self-attention layers\nof the Transformer (Vaswani et al., 2017). 
Such re-\nstrictions are sub-optimal for sentence-level tasks,\nand could be very harmful when applying fine-\ntuning based approaches to token-level tasks such\nas question answering, where it is crucial to incor-\nporate context from both directions.\n In this paper, we improve the fine-tuning based\napproaches by proposing BERT: Bidirectional\nEncoder Representations from Transformers.\nBERT alleviates the previously mentioned unidi-\nrectionality constraint by using a \u201cmasked lan-\nguage model\u201d (MLM) pre-training objective, in-\nspired by the Cloze task (Taylor, 1953). The\nmasked language model randomly masks some of\nthe tokens from the input, and the objective is to\npredict the original vocabulary id of the masked", "mimetype": "text/plain", "start_char_idx": 3179, "end_char_idx": 4412, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "a58908ab-8178-40cd-b2a3-efcaf65228f2": {"__data__": {"id_": "a58908ab-8178-40cd-b2a3-efcaf65228f2", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "dcba1f47-98fb-4b83-8b82-bb0a4e4c9db8", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "0b8e1a0c339f566ebd752c36a4dc2922651392267af70af03aa115f95907c2a7", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "ab890d58-e9ed-49ae-9c78-282a91366161", "node_type": "1", "metadata": {}, "hash": "2fd2d87d3c78a3ac270b0311b7a48b60606c775990a2fa99270b055b6b13889e", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "word based only on its context. Unlike left-to-\nright language model pre-training, the MLM ob-\njective enables the representation to fuse the left\nand the right context, which allows us to pre-\ntrain a deep bidirectional Transformer. In addi-\ntion to the masked language model, we also use\na \u201cnext sentence prediction\u201d task that jointly pre-\ntrains text-pair representations. The contributions\nof our paper are as follows:\n \u2022 We demonstrate the importance of bidirectional\n pre-training for language representations. Un-\n like Radford et al. (2018), which uses unidirec-\n tional language models for pre-training, BERT\n uses masked language models to enable pre-\n trained deep bidirectional representations. This\n is also in contrast to Peters et al. 
", "mimetype": "text/plain", "start_char_idx": 0, "end_char_idx": 762, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "ab890d58-e9ed-49ae-9c78-282a91366161": {"__data__": {"id_": "ab890d58-e9ed-49ae-9c78-282a91366161", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "dcba1f47-98fb-4b83-8b82-bb0a4e4c9db8", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "0b8e1a0c339f566ebd752c36a4dc2922651392267af70af03aa115f95907c2a7", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "a58908ab-8178-40cd-b2a3-efcaf65228f2", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "edd6518c76da299b4e2642067d6fafe03a3f884aa22306e6a84a97607f1a363c", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "9bbcaaca-e282-4f65-a5c2-9a5708a228eb", "node_type": "1", "metadata": {}, "hash": "db50d03f28b4aa8423179f882d2286b8c1ec0e6320acd93a52d396276d66c2ea", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "(2018a), which\n uses a shallow concatenation of independently\n trained left-to-right and right-to-left LMs.\n\n \u2022 We show that pre-trained representations reduce\n the need for many heavily-engineered task-\n specific architectures. BERT is the first fine-\n tuning based representation model that achieves\n state-of-the-art performance on a large suite\n of sentence-level and token-level tasks, outper-\n forming many task-specific architectures.\n \u2022 BERT advances the state of the art for eleven\n NLP tasks. The code and pre-trained mod-\n els are available at https://github.com/\n google-research/bert.\n\n2 Related Work\n\nThere is a long history of pre-training general lan-\nguage representations, and we briefly review the\nmost widely-used approaches in this section.\n2.1 Unsupervised Feature-based Approaches\nLearning widely applicable representations of\nwords has been an active area of research for\ndecades, including non-neural (Brown et al., 1992;\nAndo and Zhang, 2005; Blitzer et al., 2006) and\nneural (Mikolov et al., 2013; Pennington et al.,\n2014) methods. Pre-trained word embeddings\nare an integral part of modern NLP systems, of-\nfering significant improvements over embeddings\nlearned from scratch (Turian et al., 2010). 
To pre-\ntrain word embedding vectors, left-to-right lan-\nguage modeling objectives have been used (Mnih\nand Hinton, 2009), as well as objectives to dis-\ncriminate correct from incorrect words in left and\nright context (Mikolov et al., 2013).\n These approaches have been generalized to\ncoarser granularities, such as sentence embed-\ndings (Kiros et al., 2015; Logeswaran and Lee,\n2018) or paragraph embeddings (Le and Mikolov,\n2014). To train sentence representations, prior\nwork has used objectives to rank candidate next\nsentences (Jernite et al., 2017; Logeswaran and\nLee, 2018), left-to-right generation of next sen-\ntence words given a representation of the previous\nsentence (Kiros et al., 2015), or denoising auto-\nencoder derived objectives (Hill et al., 2016).\n ELMo and its predecessor (Peters et al., 2017,\n2018a) generalize traditional word embedding re-\nsearch along a different dimension. They extract\ncontext-sensitive features from a left-to-right and a\nright-to-left language model. The contextual rep-\nresentation of each token is the concatenation of\nthe left-to-right and right-to-left representations.\nWhen integrating contextual word embeddings\nwith existing task-specific architectures, ELMo\nadvances the state of the art for several major NLP\nbenchmarks (Peters et al., 2018a) including ques-\ntion answering (Rajpurkar et al., 2016), sentiment\nanalysis (Socher et al., 2013), and named entity\nrecognition (Tjong Kim Sang and De Meulder,\n2003). Melamud et al. (2016) proposed learning\ncontextual representations through a task to pre-\ndict a single word from both left and right context\nusing LSTMs. Similar to ELMo, their model is\nfeature-based and not deeply bidirectional. ", "mimetype": "text/plain", "start_char_idx": 762, "end_char_idx": 3740, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "9bbcaaca-e282-4f65-a5c2-9a5708a228eb": {"__data__": {"id_": "9bbcaaca-e282-4f65-a5c2-9a5708a228eb", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "dcba1f47-98fb-4b83-8b82-bb0a4e4c9db8", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "0b8e1a0c339f566ebd752c36a4dc2922651392267af70af03aa115f95907c2a7", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "ab890d58-e9ed-49ae-9c78-282a91366161", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "c48e35db928469b51656e9cd7516a4f31009d3c2ebcf1de606c315125da340d6", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "Fedus\net al. 
(2018) shows that the cloze task can be used\nto improve the robustness of text generation mod-\nels.\n\n2.2 Unsupervised Fine-tuning Approaches\nAs with the feature-based approaches, the first\nworks in this direction only pre-trained word em-\nbedding parameters from unlabeled text (Col-\nlobert and Weston, 2008).\n More recently, sentence or document encoders\nwhich produce contextual token representations\nhave been pre-trained from unlabeled text and\nfine-tuned for a supervised downstream task (Dai\nand Le, 2015; Howard and Ruder, 2018; Radford\net al., 2018). The advantage of these approaches\nis that few parameters need to be learned from\nscratch. At least partly due to this advantage,\nOpenAI GPT (Radford et al., 2018) achieved pre-\nviously state-of-the-art results on many sentence-\nlevel tasks from the GLUE benchmark (Wang\net al., 2018a). Left-to-right language model-", "mimetype": "text/plain", "start_char_idx": 3740, "end_char_idx": 4661, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "926f43b0-e58c-4e8f-b766-c90f2447361e": {"__data__": {"id_": "926f43b0-e58c-4e8f-b766-c90f2447361e", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "f9edbae8-ed8d-4147-803e-936500d2d60f", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "9df513b63b8b262211a3d8eb8604c1aaa8b1b593862148aa0963a289adb8e421", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "1a82bbed-7218-4c23-8cb5-402971703a30", "node_type": "1", "metadata": {}, "hash": "6ac57136a445edf51d9d665746d3d0bd3c173d180cb8bc84e4a0fc41654c755b", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": " NSP Mask LM Mask LM MNLI NER SQuAD Start/End Span\n\n C T 1 ... T N T [SEP]T \u20191 ... T \u2019M C T 1 ... T N T [SEP]T \u20191 ... T \u2019M\n\n BERT BERT BERT\n E [CLS]E 1 ... E N E [SEP]E \u20191 ... E \u2019M E [CLS]E 1 ... E N E [SEP] E \u20191 ... E \u2019M\n\n [CLS] Tok 1... Tok N [SEP] Tok 1... TokM [CLS] Tok 1... Tok N [SEP] Tok 1... TokM\n\n Masked Sentence A Masked Sentence B Question Paragraph\n\n Unlabeled Sentence A and B Pair Question Answer Pair\n\n Pre-training Fine-Tuning\n\n Figure 1: Overall pre-training and fine-tuning procedures for BERT. Apart from output layers, the same architec-\n tures are used in both pre-training and fine-tuning. 
", "mimetype": "text/plain", "start_char_idx": 0, "end_char_idx": 1412, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "1a82bbed-7218-4c23-8cb5-402971703a30": {"__data__": {"id_": "1a82bbed-7218-4c23-8cb5-402971703a30", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "f9edbae8-ed8d-4147-803e-936500d2d60f", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "9df513b63b8b262211a3d8eb8604c1aaa8b1b593862148aa0963a289adb8e421", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "926f43b0-e58c-4e8f-b766-c90f2447361e", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "42369be9595563bb5dd96a4dde08a6e205223c2c6f3edb12c542a50a943a6e42", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "a93a9152-5892-4b84-87c4-d7d7e57b5d64", "node_type": "1", "metadata": {}, "hash": "d2427e562848251c49292d250a0d4e843d7b44186198c1c8f968486e8bfd40a3", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "The same pre-trained model parameters are used to initialize\n models for different down-stream tasks. During fine-tuning, all parameters are fine-tuned. [CLS] is a special\n symbol added in front of every input example, and [SEP] is a special separator token (e.g. 
separating ques-\n tions/answers).\n\n", "mimetype": "text/plain", "start_char_idx": 1412, "end_char_idx": 1774, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "a93a9152-5892-4b84-87c4-d7d7e57b5d64": {"__data__": {"id_": "a93a9152-5892-4b84-87c4-d7d7e57b5d64", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "f9edbae8-ed8d-4147-803e-936500d2d60f", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "9df513b63b8b262211a3d8eb8604c1aaa8b1b593862148aa0963a289adb8e421", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "1a82bbed-7218-4c23-8cb5-402971703a30", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "fc2e237c06d17ca2f6b8044e119b7a0ef6543d27c24ae314a1c3616d6ea6467e", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "ing and auto-encoder objectives have been used\nfor pre-training such models (Howard and Ruder,\n2018; Radford et al., 2018; Dai and Le, 2015).\n2.3 Transfer Learning from Supervised Data\nThere has also been work showing effective trans-\nfer from supervised tasks with large datasets, such\nas natural language inference (Conneau et al.,\n2017) and machine translation (McCann et al.,\n2017). Computer vision research has also demon-\nstrated the importance of transfer learning from\nlarge pre-trained models, where an effective recipe\nis to fine-tune models pre-trained with Ima-\ngeNet (Deng et al., 2009; Yosinski et al., 2014).\n3 BERT\n\nWe introduce BERT and its detailed implementa-\ntion in this section. There are two steps in our\nframework: pre-training and fine-tuning.\ning pre-training, the model is trained on unlabeled\ndata over different pre-training tasks. For fine-\ntuning, the BERT model is first initialized with\nthe pre-trained parameters, and all of the param-\neters are fine-tuned using labeled data from the\ndownstream tasks. Each downstream task has sep-\narate fine-tuned models, even though they are ini-\ntialized with the same pre-trained parameters. The\nquestion-answering example in Figure 1 will serve\nas a running example for this section.\n A distinctive feature of BERT is its unified ar-\nchitecture across different tasks. There is mini-\n mal difference between the pre-trained architec-\n ture and the final downstream architecture.\n Model Architecture BERT\u2019s model architec-\n ture is a multi-layer bidirectional Transformer en-\n coder based on the original implementation de-\n scribed in Vaswani et al. 
(2017) and released in\n the tensor2tensor library.1 Because the use\n of Transformers has become common and our im-\n plementation is almost identical to the original,\n we will omit an exhaustive background descrip-\n tion of the model architecture and refer readers to\n Vaswani et al. (2017) as well as excellent guides\n such as \u201cThe Annotated Transformer.\u201d2In this work, we denote the number of layers\n (i.e., Transformer blocks) as L, the hidden size as\n H, and the number of self-attention heads as A.3\n We primarily report results on two model sizes:\n BERTBASE (L=12, H=768, A=12, Total Param-\nDur- eters=110M) and BERTLARGE (L=24, H=1024,\n A=16, Total Parameters=340M).\n BERTBASE was chosen to have the same model\n size as OpenAI GPT for comparison purposes.\n Critically, however, the BERT Transformer uses\n bidirectional self-attention, while the GPT Trans-\n former uses constrained self-attention where every\n token can only attend to context to its left.4\n 1https://github.com/tensorflow/tensor2tensor\n 2http://nlp.seas.harvard.edu/2018/04/03/attention.html\n 3In all cases we set the feed-forward/filter size to be 4H,\n i.e., 3072 for the H = 768 and 4096 for the H = 1024.\n 4We note that in the literature the bidirectional Trans-", "mimetype": "text/plain", "start_char_idx": 1774, "end_char_idx": 5002, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "abbf6d32-14ba-4c60-850d-cf4429b349e4": {"__data__": {"id_": "abbf6d32-14ba-4c60-850d-cf4429b349e4", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "cc193b63-6a56-496d-92cc-812a0a7cf204", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "c504ea5b9924961a3c9ca9f79fe234f1e39fed524dcae1c6501d8e27d6c1815d", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "f1f06b3a-489c-4861-b5be-72cd1c1d8e80", "node_type": "1", "metadata": {}, "hash": "e5f87bf6b7e315ec301f7fa560aefe8486181803b4dd4bbfbbba0af61ff807a6", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "Input/Output Representations To make BERT\nhandle a variety of down-stream tasks, our input\nrepresentation is able to unambiguously represent\nboth a single sentence and a pair of sentences\n(e.g., \u3008 Question, Answer \u3009) in one token sequence.\nThroughout this work, a \u201csentence\u201d can be an arbi-\ntrary span of contiguous text, rather than an actual\nlinguistic sentence. A \u201csequence\u201d refers to the in-\nput token sequence to BERT, which may be a sin-\ngle sentence or two sentences packed together.\n We use WordPiece embeddings (Wu et al.,\n2016) with a 30,000 token vocabulary. The first\ntoken of every sequence is always a special clas-\nsification token ([CLS]). 
The final hidden state\ncorresponding to this token is used as the ag-\ngregate sequence representation for classification\ntasks. Sentence pairs are packed together into a\nsingle sequence. We differentiate the sentences in\ntwo ways. First, we separate them with a special\ntoken ([SEP]). Second, we add a learned embed-\nding to every token indicating whether it belongs\nto sentence A or sentence B. As shown in Figure 1,\nwe denote input embedding as E, the final hidden\nvector of the special [CLS] token as C \u2208 RH ,\nand the final hidden vector for the ith input token\nas Ti \u2208 RH .\n For a given token, its input representation is\nconstructed by summing the corresponding token,\nsegment, and position embeddings. A visualiza-\ntion of this construction can be seen in Figure 2.\n\n3.1 Pre-training BERT\nUnlike Peters et al. ", "mimetype": "text/plain", "start_char_idx": 0, "end_char_idx": 1495, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "f1f06b3a-489c-4861-b5be-72cd1c1d8e80": {"__data__": {"id_": "f1f06b3a-489c-4861-b5be-72cd1c1d8e80", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "cc193b63-6a56-496d-92cc-812a0a7cf204", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "c504ea5b9924961a3c9ca9f79fe234f1e39fed524dcae1c6501d8e27d6c1815d", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "abbf6d32-14ba-4c60-850d-cf4429b349e4", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "41b6d9a30bcbf21c8b5930fbeae0cc97e5eaf1a435c890cf50c611130ed5513a", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "544c5e06-611b-4fe1-93ae-aa3441eb385e", "node_type": "1", "metadata": {}, "hash": "94a6a070021b29b916813b97eea6cc8ae87ff0c254afc72d7cdff7b398df6015", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "(2018a) and Radford et al.\n(2018), we do not use traditional left-to-right or\nright-to-left language models to pre-train BERT.\nInstead, we pre-train BERT using two unsuper-\nvised tasks, described in this section. This step\nis presented in the left part of Figure 1.\n\nTask #1: Masked LM Intuitively, it is reason-\nable to believe that a deep bidirectional model is\nstrictly more powerful than either a left-to-right\nmodel or the shallow concatenation of a left-to-\nright and a right-to-left model. 
Unfortunately,\nstandard conditional language models can only be\ntrained left-to-right or right-to-left, since bidirec-\ntional conditioning would allow each word to in-\ndirectly \u201csee itself\u201d, and the model could trivially\npredict the target word in a multi-layered context.\nformer is often referred to as a \u201cTransformer encoder\u201d while\nthe left-context-only version is referred to as a \u201cTransformer\ndecoder\u201d since it can be used for text generation.\n In order to train a deep bidirectional representa-\ntion, we simply mask some percentage of the input\ntokens at random, and then predict those masked\ntokens. We refer to this procedure as a \u201cmasked\nLM\u201d (MLM), although it is often referred to as a\nCloze task in the literature (Taylor, 1953). In this\ncase, the final hidden vectors corresponding to the\nmask tokens are fed into an output softmax over\nthe vocabulary, as in a standard LM. In all of our\nexperiments, we mask 15% of all WordPiece to-\nkens in each sequence at random. In contrast to\ndenoising auto-encoders (Vincent et al., 2008), we\nonly predict the masked words rather than recon-\nstructing the entire input.\n Although this allows us to obtain a bidirec-\ntional pre-trained model, a downside is that we\nare creating a mismatch between pre-training and\nfine-tuning, since the [MASK] token does not ap-\npear during fine-tuning. To mitigate this, we do\nnot always replace \u201cmasked\u201d words with the ac-\ntual [MASK] token. The training data generator\nchooses 15% of the token positions at random for\nprediction. If the i-th token is chosen, we replace\nthe i-th token with (1) the [MASK] token 80% of\nthe time (2) a random token 10% of the time (3)\nthe unchanged i-th token 10% of the time. Then,\nTi will be used to predict the original token with\ncross entropy loss. 
", "mimetype": "text/plain", "start_char_idx": 1495, "end_char_idx": 3796, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "544c5e06-611b-4fe1-93ae-aa3441eb385e": {"__data__": {"id_": "544c5e06-611b-4fe1-93ae-aa3441eb385e", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "cc193b63-6a56-496d-92cc-812a0a7cf204", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "c504ea5b9924961a3c9ca9f79fe234f1e39fed524dcae1c6501d8e27d6c1815d", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "f1f06b3a-489c-4861-b5be-72cd1c1d8e80", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "7171358e6db4a7d03cbed4a1dde37cc4c3afdd7ed30c429f837a996a188fa018", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "We compare variations of this\nprocedure in Appendix C.2.\n\nTask #2: Next Sentence Prediction (NSP)\nMany important downstream tasks such as Ques-\ntion Answering (QA) and Natural Language Infer-\nence (NLI) are based on understanding the rela-\ntionship between two sentences, which is not di-\nrectly captured by language modeling. In order\nto train a model that understands sentence rela-\ntionships, we pre-train for a binarized next sen-\ntence prediction task that can be trivially gener-\nated from any monolingual corpus. Specifically,\nwhen choosing the sentences A and B for each pre-\ntraining example, 50% of the time B is the actual\nnext sentence that follows A (labeled as IsNext),\nand 50% of the time it is a random sentence from\nthe corpus (labeled as NotNext). As we show\nin Figure 1, C is used for next sentence predic-\ntion (NSP).5 Despite its simplicity, we demon-\nstrate in Section 5.1 that pre-training towards this\ntask is very beneficial to both QA and NLI. 
6\n 5The final model achieves 97%-98% accuracy on NSP.\n 6The vector C is not a meaningful sentence representation\nwithout fine-tuning, since it was trained with NSP.", "mimetype": "text/plain", "start_char_idx": 3796, "end_char_idx": 4959, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "b635c2da-b277-4b49-936d-8ac5939da468": {"__data__": {"id_": "b635c2da-b277-4b49-936d-8ac5939da468", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "5a8026ec-62da-4652-ab01-3c38b64fda7c", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "2528f15cb7e33e1a6812665686be884401bdcef4d6c11d2f8bdc665e233bd6d4", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "2f892821-777d-4fb6-ac8f-611ef0566d7d", "node_type": "1", "metadata": {}, "hash": "977dccbf7c4edab6ebb97aa97345b2293f6e71f1dec073e7975da20a780e2356", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": " Input [CLS] my dog is cute [SEP] he likes play ##ing [SEP]\n\n Token E [CLS] Emy Edog Eis Ecute E[SEP] Ehe Elikes Eplay E##ing E[SEP]\n Embeddings\n\n Segment E A E A E A E A E A E A E B E B E B E B E B\n Embeddings\n\n Position E 0 E 1 E 2 E 3 E 4 E 5 E 6 E 7 E 8 E 9 E 10\n Embeddings\n\n Figure 2: BERT input representation. The input embeddings are the sum of the token embeddings, the segmenta-\n tion embeddings and the position embeddings.\n\nThe NSP task is closely related to representation-\nlearning objectives used in Jernite et al. 
", "mimetype": "text/plain", "start_char_idx": 0, "end_char_idx": 989, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "2f892821-777d-4fb6-ac8f-611ef0566d7d": {"__data__": {"id_": "2f892821-777d-4fb6-ac8f-611ef0566d7d", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "5a8026ec-62da-4652-ab01-3c38b64fda7c", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "2528f15cb7e33e1a6812665686be884401bdcef4d6c11d2f8bdc665e233bd6d4", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "b635c2da-b277-4b49-936d-8ac5939da468", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "984d97b14cd67774554e50dda8a526787755c1ed32bce97782170293021e249b", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "6c2b1028-b0b0-4f92-af51-9660eb0f49f5", "node_type": "1", "metadata": {}, "hash": "a393d71d25b2c080f8870137cad5e1782ac540d1a08dcad32606faaa9d4f469a", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "(2017) and\nLogeswaran and Lee (2018). However, in prior\nwork, only sentence embeddings are transferred to\ndown-stream tasks, where BERT transfers all pa-\nrameters to initialize end-task model parameters.\n\nPre-training data The pre-training procedure\nlargely follows the existing literature on language\nmodel pre-training. For the pre-training corpus we\nuse the BooksCorpus (800M words) (Zhu et al.,\n2015) and English Wikipedia (2,500M words).\nFor Wikipedia we extract only the text passages\nand ignore lists, tables, and headers. It is criti-\ncal to use a document-level corpus rather than a\nshuffled sentence-level corpus such as the Billion\nWord Benchmark (Chelba et al., 2013) in order to\nextract long contiguous sequences.\n3.2 Fine-tuning BERT\nFine-tuning is straightforward since the self-\nattention mechanism in the Transformer al-\nlows BERT to model many downstream tasks\u2014\nwhether they involve single text or text pairs\u2014by\nswapping out the appropriate inputs and outputs.\nFor applications involving text pairs, a common\npattern is to independently encode text pairs be-\nfore applying bidirectional cross attention, such\nas Parikh et al. (2016); Seo et al. (2017). BERT\ninstead uses the self-attention mechanism to unify\nthese two stages, as encoding a concatenated text\npair with self-attention effectively includes bidi-\nrectional cross attention between two sentences.\n For each task, we simply plug in the task-\nspecific inputs and outputs into BERT and fine-\ntune all the parameters end-to-end. 
At the in-\nput, sentence A and sentence B from pre-training\nare analogous to (1) sentence pairs in paraphras-\ning, (2) hypothesis-premise pairs in entailment, (3)\nquestion-passage pairs in question answering, and\n(4) a degenerate text-\u2205 pair in text classification\nor sequence tagging. At the output, the token rep-\nresentations are fed into an output layer for token-\nlevel tasks, such as sequence tagging or question\nanswering, and the [CLS] representation is fed\ninto an output layer for classification, such as en-\ntailment or sentiment analysis.\n Compared to pre-training, fine-tuning is rela-\ntively inexpensive. All of the results in the pa-\nper can be replicated in at most 1 hour on a sin-\ngle Cloud TPU, or a few hours on a GPU, starting\nfrom the exact same pre-trained model.7 We de-\nscribe the task-specific details in the correspond-\ning subsections of Section 4. More details can be\nfound in Appendix A.5.\n\n4 Experiments\nIn this section, we present BERT fine-tuning re-\nsults on 11 NLP tasks.\n4.1 GLUE\nThe General Language Understanding Evaluation\n(GLUE) benchmark (Wang et al., 2018a) is a col-\nlection of diverse natural language understanding\ntasks. Detailed descriptions of GLUE datasets are\nincluded in Appendix B.1.\n To fine-tune on GLUE, we represent the input\nsequence (for single sentence or sentence pairs)\nas described in Section 3, and use the final hid-\nden vector C \u2208 RH corresponding to the first\ninput token ([CLS]) as the aggregate representa-\ntion. The only new parameters introduced during\nfine-tuning are classification layer weights W \u2208\nRK\u00d7H , where K is the number of labels. ", "mimetype": "text/plain", "start_char_idx": 989, "end_char_idx": 4141, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "6c2b1028-b0b0-4f92-af51-9660eb0f49f5": {"__data__": {"id_": "6c2b1028-b0b0-4f92-af51-9660eb0f49f5", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "5a8026ec-62da-4652-ab01-3c38b64fda7c", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "2528f15cb7e33e1a6812665686be884401bdcef4d6c11d2f8bdc665e233bd6d4", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "2f892821-777d-4fb6-ac8f-611ef0566d7d", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "b1b75cf9b57f383786b17095af33f3371cec5a2680d9146d9a5a6747db75b6c9", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "We com-\npute a standard classification loss with C and W ,\ni.e., log(softmax(CW T )).\n 7For example, the BERT SQuAD model can be trained in\naround 30 
minutes on a single Cloud TPU to achieve a Dev\nF1 score of 91.0%.\n 8See (10) in https://gluebenchmark.com/faq.", "mimetype": "text/plain", "start_char_idx": 4141, "end_char_idx": 4407, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "349ffcb2-b1dd-49ef-b7f7-37406a635c71": {"__data__": {"id_": "349ffcb2-b1dd-49ef-b7f7-37406a635c71", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "bc534cc6-e535-4b99-b43b-605d5b37174d", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "94088816dc336558cdd166bf902f577f63147d180695eeb1688c3f56347193c3", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "43c7eae6-7d68-47bc-8fd4-8f0584405231", "node_type": "1", "metadata": {}, "hash": "c22399c625eba9e8f4f7b55bb2b397fc0836581dc8867be47199a10af00c43ac", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": " System MNLI-(m/mm) QQP QNLI SST-2 CoLA STS-B MRPC RTE Average\n 392k 363k 108k 67k 8.5k 5.7k 3.5k 2.5k -\n Pre-OpenAI SOTA 80.6/80.1 66.1 82.3 93.2 35.0 81.0 86.0 61.7 74.0\n BiLSTM+ELMo+Attn 76.4/76.1 64.8 79.8 90.4 36.0 73.3 84.9 56.8 71.0\n OpenAI GPT 82.1/81.4 70.3 87.4 91.3 45.4 80.0 82.3 56.0 75.1\n BERTBASE 84.6/83.4 71.2 90.5 93.5 52.1 85.8 88.9 66.4 79.6\n BERTLARGE 86.7/85.9 72.1 92.7 94.9 60.5 86.5 89.3 70.1 82.1\n\n Table 1: GLUE Test results, scored by the evaluation server (https://gluebenchmark.com/leaderboard).\n The number below each task denotes the number of training examples. The \u201cAverage\u201d column is slightly different\n than the official GLUE score, since we exclude the problematic WNLI set.8 BERT and OpenAI GPT are single-\n model, single task. F1 scores are reported for QQP and MRPC, Spearman correlations are reported for STS-B, and\n accuracy scores are reported for the other tasks. We exclude entries that use BERT as one of their components.\n\n We use a batch size of 32 and fine-tune for 3\nepochs over the data for all GLUE tasks. For each\ntask, we selected the best fine-tuning learning rate\n(among 5e-5, 4e-5, 3e-5, and 2e-5) on the Dev set.\nAdditionally, for BERTLARGE we found that fine-\ntuning was sometimes unstable on small datasets,\nso we ran several random restarts and selected the\nbest model on the Dev set. With random restarts,\nwe use the same pre-trained checkpoint but per-\nform different fine-tuning data shuffling and clas-\nsifier layer initialization.9\n Results are presented in Table 1. Both\n\nBERTBASE and BERTLARGE outperform all sys-\ntems on all tasks by a substantial margin, obtaining\n4.5% and 7.0% respective average accuracy im-\nprovement over the prior state of the art. Note that\nBERTBASE and OpenAI GPT are nearly identical\nin terms of model architecture apart from the at-\ntention masking. 
For the largest and most widely\nreported GLUE task, MNLI, BERT obtains a 4.6%\nabsolute accuracy improvement. On the official\nGLUE leaderboard10, BERTLARGE obtains a score\nof 80.5, compared to OpenAI GPT, which obtains\n72.8 as of the date of writing.\n We find that BERTLARGE significantly outper-\nforms BERTBASE across all tasks, especially those\nwith very little training data. The effect of model\nsize is explored more thoroughly in Section 5.2.\n\n4.2 SQuAD v1.1\n\nThe Stanford Question Answering Dataset\n(SQuAD v1.1) is a collection of 100k crowd-\nsourced question/answer pairs (Rajpurkar et al.,\n2016). Given a question and a passage from\n 9The GLUE data set distribution does not include the Test\nlabels, and we only made a single GLUE evaluation server\nsubmission for each of BERTBASE and BERTLARGE .\n 10https://gluebenchmark.com/leaderboard\nWikipedia containing the answer, the task is to\npredict the answer text span in the passage.\n As shown in Figure 1, in the question answer-\ning task, we represent the input question and pas-\nsage as a single packed sequence, with the ques-\ntion using the A embedding and the passage using\nthe B embedding. ", "mimetype": "text/plain", "start_char_idx": 0, "end_char_idx": 3770, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "43c7eae6-7d68-47bc-8fd4-8f0584405231": {"__data__": {"id_": "43c7eae6-7d68-47bc-8fd4-8f0584405231", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "bc534cc6-e535-4b99-b43b-605d5b37174d", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "94088816dc336558cdd166bf902f577f63147d180695eeb1688c3f56347193c3", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "349ffcb2-b1dd-49ef-b7f7-37406a635c71", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "17269988ca15a411246f71360ae77c4492f64e2d353d65412d6591e307bda7ee", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "15b81cf1-d657-41b4-a9ff-703f8e5e6fac", "node_type": "1", "metadata": {}, "hash": "6ca3a08ecedab58920c940ad1a8daec2a5281f99c992e4c1d6f1b5fd2fcbc2f7", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "We only introduce a start vec-\ntor S \u2208 RH and an end vector E \u2208 RH during\nfine-tuning. The probability of word i being the\nstart of the answer span is computed as a dot prod-\nuct between Ti and S followed by a softmax over\nall of the words in the paragraph: Pi = \u2211eS\u00b7Ti\u00b7Tj .\n j eS\nThe analogous formula is used for the end of the\nanswer span. 
The score of a candidate span from\nposition i to position j is defined as S\u00b7Ti + E\u00b7Tj ,\nand the maximum scoring span where j \u2265 i is\nused as a prediction. The training objective is the\nsum of the log-likelihoods of the correct start and\nend positions. We fine-tune for 3 epochs with a\nlearning rate of 5e-5 and a batch size of 32.\n Table 2 shows top leaderboard entries as well\n\nas results from top published systems (Seo et al.,\n2017; Clark and Gardner, 2018; Peters et al.,\n2018a; Hu et al., 2018). The top results from the\nSQuAD leaderboard do not have up-to-date public\nsystem descriptions available,11 and are allowed to\nuse any public data when training their systems.\nWe therefore use modest data augmentation in\nour system by first fine-tuning on TriviaQA (Joshi\net al., 2017) befor fine-tuning on SQuAD.\n Our best performing system outperforms the top\nleaderboard system by +1.5 F1 in ensembling and\n+1.3 F1 as a single system. In fact, our single\nBERT model outperforms the top ensemble sys-\ntem in terms of F1 score. Without TriviaQA fine-\n 11QANet is described in Yu et al. ", "mimetype": "text/plain", "start_char_idx": 3770, "end_char_idx": 5273, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "15b81cf1-d657-41b4-a9ff-703f8e5e6fac": {"__data__": {"id_": "15b81cf1-d657-41b4-a9ff-703f8e5e6fac", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "bc534cc6-e535-4b99-b43b-605d5b37174d", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "94088816dc336558cdd166bf902f577f63147d180695eeb1688c3f56347193c3", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "43c7eae6-7d68-47bc-8fd4-8f0584405231", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "9e3b7dd2238d9acd968b8a769d0581f893d0699ebdb7bc690a8cf8e45cd3d9c4", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "(2018), but the system\n\nhas improved substantially after publication.", "mimetype": "text/plain", "start_char_idx": 5273, "end_char_idx": 5342, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "0af4d994-64d1-4bf4-8624-55ed2e4f7dbd": {"__data__": {"id_": "0af4d994-64d1-4bf4-8624-55ed2e4f7dbd", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", 
"creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "7ab3456d-9fa7-49a8-856f-b2a713991fd8", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "0464c89c7c2bd6b62dc88231646ec3440477502e2d385be0ea2d4a00f6c68047", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "b6154034-ddde-4fd8-a018-126e613aa014", "node_type": "1", "metadata": {}, "hash": "204313461a2324cf4bb2994cf29641409922b6a49790a66acf6c5a6bf04b2950", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": " System Dev Test System Dev Test\n EM F1 EM F1 ESIM+GloVe 51.9 52.7\n Top Leaderboard Systems (Dec 10th, 2018) ESIM+ELMo 59.1 59.2\n Human - - 82.3 91.2 OpenAI GPT - 78.0\n #1 Ensemble - nlnet - - 86.0 91.7 BERTBASE 81.6 -\n #2 Ensemble - QANet - - 84.5 90.5 BERTLARGE 86.6 86.3\n Published Human (expert)\u2020 - 85.0\n BiDAF+ELMo (Single) - 85.6 - 85.8 Human (5 annotations)\u2020 - 88.0\n R.M. Reader (Ensemble) 81.2 87.9 82.3 88.5\n Ours Table 4: SWAG Dev and Test accuracies. \u2020Human per-\n BERTBASE (Single) 80.8 88.5 - -\n BERTLARGE (Single) 84.1 90.9 - - formance is measured with 100 samples, as reported in\n BERTLARGE (Ensemble) 85.8 91.8 - - the SWAG paper.\n BERTLARGE (Sgl.+TriviaQA) 84.2 91.1 85.1 91.8\n BERTLARGE (Ens.+TriviaQA) 86.2 92.2 87.4 93.2\n\nTable 2: SQuAD 1.1 results. The BERT ensemble\nis 7x systems which use different pre-training check-\npoints and fine-tuning seeds.\n\n System Dev Test\n EM F1 EM F1\n Top Leaderboard Systems (Dec 10th, 2018)\n Human 86.3 89.0 86.9 89.5\n #1 Single - MIR-MRC (F-Net) - - 74.8 78.0\n #2 Single - nlnet - - 74.2 77.1\n Published\n unet (Ensemble) - - 71.4 74.9\n SLQA+ (Single) - 71.4 74.4\n Ours\n BERTLARGE (Single) 78.7 81.9 80.0 83.1\n\nTable 3: SQuAD 2.0 results. We exclude entries that\nuse BERT as one of their components.\n\ntuning data, we only lose 0.1-0.4 F1, still outper-\nforming all existing systems by a wide margin.12\n4.3 SQuAD v2.0\nThe SQuAD 2.0 task extends the SQuAD 1.1\nproblem definition by allowing for the possibility\nthat no short answer exists in the provided para-\ngraph, making the problem more realistic.\n We use a simple approach to extend the SQuAD\nv1.1 BERT model for this task. We treat ques-\ntions that do not have an answer as having an an-\nswer span with start and end at the [CLS] to-\nken. The probability space for the start and end\nanswer span positions is extended to include the\nposition of the [CLS] token. For prediction, we\ncompare the score of the no-answer span: snull =\nS\u00b7C + E\u00b7C to the score of the best non-null span\n 12The TriviaQA data we used consists of paragraphs from\nTriviaQA-Wiki formed of the first 400 tokens in documents,\nthat contain at least one of the provided possible answers.\ns\u02c6i,j = maxj\u2265iS\u00b7Ti + E\u00b7Tj . We predict a non-null\nanswer when \u02c6si,j > snull + \u03c4 , where the thresh-\nold \u03c4 is selected on the dev set to maximize F1.\nWe did not use TriviaQA data for this model. 
", "mimetype": "text/plain", "start_char_idx": 0, "end_char_idx": 3711, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "b6154034-ddde-4fd8-a018-126e613aa014": {"__data__": {"id_": "b6154034-ddde-4fd8-a018-126e613aa014", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "7ab3456d-9fa7-49a8-856f-b2a713991fd8", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "0464c89c7c2bd6b62dc88231646ec3440477502e2d385be0ea2d4a00f6c68047", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "0af4d994-64d1-4bf4-8624-55ed2e4f7dbd", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "5b203c46fac99619997fbe420c7403431c3da2285853ad72f69b2509fc694e47", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "61d2f504-bc66-41b5-b6b3-662d999e9f60", "node_type": "1", "metadata": {}, "hash": "791d9f609c53e58b17f287c8c24afec8ac14c1c6b0fe0ac2d12b5bf274ff46d1", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "We\nfine-tuned for 2 epochs with a learning rate of 5e-5\nand a batch size of 48.\n The results compared to prior leaderboard en-\ntries and top published work (Sun et al., 2018;\nWang et al., 2018b) are shown in Table 3, exclud-\n\ning systems that use BERT as one of their com-\nponents. We observe a +5.1 F1 improvement over\nthe previous best system.\n\n4.4 SWAG\nThe Situations With Adversarial Generations\n(SWAG) dataset contains 113k sentence-pair com-\npletion examples that evaluate grounded common-\nsense inference (Zellers et al., 2018). Given a sen-\ntence, the task is to choose the most plausible con-\ntinuation among four choices.When fine-tuning on the SWAG dataset, we\nconstruct four input sequences, each containing\nthe concatenation of the given sentence (sentence\nA) and a possible continuation (sentence B). The\nonly task-specific parameters introduced is a vec-\ntor whose dot product with the [CLS] token rep-\nresentation C denotes a score for each choice\nwhich is normalized with a softmax layer.\n We fine-tune the model for 3 epochs with a\nlearning rate of 2e-5 and a batch size of 16. 
", "mimetype": "text/plain", "start_char_idx": 3711, "end_char_idx": 4814, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "61d2f504-bc66-41b5-b6b3-662d999e9f60": {"__data__": {"id_": "61d2f504-bc66-41b5-b6b3-662d999e9f60", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "7ab3456d-9fa7-49a8-856f-b2a713991fd8", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "0464c89c7c2bd6b62dc88231646ec3440477502e2d385be0ea2d4a00f6c68047", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "b6154034-ddde-4fd8-a018-126e613aa014", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "605d062d81f57157a49ee24b715fbe954d1ac4da24bf69e5746ce728e1edcf3b", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "Re-\nsults are presented in Table 4. BERTLARGE out-\nperforms the authors\u2019 baseline ESIM+ELMo sys-\ntem by +27.1% and OpenAI GPT by 8.3%.\n5 Ablation Studies\n\nIn this section, we perform ablation experiments\nover a number of facets of BERT in order to better\nunderstand their relative importance. 
Additional", "mimetype": "text/plain", "start_char_idx": 4814, "end_char_idx": 5120, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "049233ae-7f17-4b97-b212-478744e93165": {"__data__": {"id_": "049233ae-7f17-4b97-b212-478744e93165", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "13d1b84a-4210-4825-933b-a6d498a30c00", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "131b956706902a0251fc74cfea9f60f7dea5bf0ed9d618204398e5e06339cc19", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "21c9c67d-01b5-49c7-b0c5-7c5856abacbe", "node_type": "1", "metadata": {}, "hash": "d8ef9c2e31fa3f500db572aff04fb54e46f16b62194b4a23b17b6ab8a41a5c44", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": " Dev Set results are still far worse than those of the pre-\n Tasks MNLI-m QNLI MRPC SST-2 SQuAD trained bidirectional models. The BiLSTM hurts\n (Acc) (Acc) (Acc) (Acc) (F1) performance on the GLUE tasks.\n BERTBASE 84.4 88.4 86.7 92.7 88.5 We recognize that it would also be possible to\n No NSP 83.9 84.9 86.5 92.6 87.9\n LTR & No NSP 82.1 84.3 77.5 92.1 77.8 train separate LTR and RTL models and represent\n + BiLSTM 82.1 84.1 75.7 91.6 84.9 each token as the concatenation of the two mod-\n Table 5: Ablation over the pre-training tasks using the els, as ELMo does. However: (a) this is twice as\n BERTBASE architecture. \u201cNo NSP\u201d is trained without expensive as a single bidirectional model; (b) this\n the next sentence prediction task. \u201cLTR & No NSP\u201d is is non-intuitive for tasks like QA, since the RTL\n trained as a left-to-right LM without the next sentence model would not be able to condition the answer\n prediction, like OpenAI GPT. \u201c+ BiLSTM\u201d adds a ran- on the question; (c) this it is strictly less powerful\n domly initialized BiLSTM on top of the \u201cLTR + No than a deep bidirectional model, since it can use\n NSP\u201d model during fine-tuning. 
", "mimetype": "text/plain", "start_char_idx": 0, "end_char_idx": 1754, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "21c9c67d-01b5-49c7-b0c5-7c5856abacbe": {"__data__": {"id_": "21c9c67d-01b5-49c7-b0c5-7c5856abacbe", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "13d1b84a-4210-4825-933b-a6d498a30c00", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "131b956706902a0251fc74cfea9f60f7dea5bf0ed9d618204398e5e06339cc19", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "049233ae-7f17-4b97-b212-478744e93165", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "1816ded9e583043856ebde339cf39eff11a9ecbdc0f6b580ae47bd0efb3766af", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "2b41dc46-a706-4035-8707-8bdac62c2cfc", "node_type": "1", "metadata": {}, "hash": "65b9795fc0e74b775ed3a317666ae0a0a0f3a5cdeee861e08552d8821d5d91d7", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "both left and right context at every layer.\n\nablation studies can be found in Appendix C.\n\n5.1 Effect of Pre-training Tasks\nWe demonstrate the importance of the deep bidi-\nrectionality of BERT by evaluating two pre-\ntraining objectives using exactly the same pre-\ntraining data, fine-tuning scheme, and hyperpa-\nrameters as BERTBASE:\n\nNo NSP: A bidirectional model which is trained\nusing the \u201cmasked LM\u201d (MLM) but without the\n\u201cnext sentence prediction\u201d (NSP) task.\nLTR & No NSP: A left-context-only model which\nis trained using a standard Left-to-Right (LTR)\nLM, rather than an MLM. The left-only constraint\nwas also applied at fine-tuning, because removing\nit introduced a pre-train/fine-tune mismatch that\ndegraded downstream performance. Additionally,\nthis model was pre-trained without the NSP task.\nThis is directly comparable to OpenAI GPT, but\nusing our larger training dataset, our input repre-\nsentation, and our fine-tuning scheme.\n We first examine the impact brought by the NSP\ntask. In Table 5, we show that removing NSP\nhurts performance significantly on QNLI, MNLI,\nand SQuAD 1.1. Next, we evaluate the impact\nof training bidirectional representations by com-\nparing \u201cNo NSP\u201d to \u201cLTR & No NSP\u201d. The LTR\nmodel performs worse than the MLM model on all\ntasks, with large drops on MRPC and SQuAD.\n For SQuAD it is intuitively clear that a LTR\nmodel will perform poorly at token predictions,\nsince the token-level hidden states have no right-\nside context. 
In order to make a good faith at-\ntempt at strengthening the LTR system, we added\na randomly initialized BiLSTM on top. This does\nsignificantly improve results on SQuAD, but the\n5.2 Effect of Model Size\nIn this section, we explore the effect of model size\non fine-tuning task accuracy. We trained a number\nof BERT models with a differing number of layers,\nhidden units, and attention heads, while otherwise\nusing the same hyperparameters and training pro-\ncedure as described previously.\n Results on selected GLUE tasks are shown in\nTable 6. In this table, we report the average Dev\nSet accuracy from 5 random restarts of fine-tuning.\nWe can see that larger models lead to a strict ac-\ncuracy improvement across all four datasets, even\nfor MRPC which only has 3,600 labeled train-\ning examples, and is substantially different from\nthe pre-training tasks. It is also perhaps surpris-\ning that we are able to achieve such significant\nimprovements on top of models which are al-\nready quite large relative to the existing literature.\nFor example, the largest Transformer explored in\nVaswani et al. (2017) is (L=6, H=1024, A=16)\nwith 100M parameters for the encoder, and the\nlargest Transformer we have found in the literature\nis (L=64, H=512, A=2) with 235M parameters\n(Al-Rfou et al., 2018). By contrast, BERTBASE\ncontains 110M parameters and BERTLARGE con-\ntains 340M parameters.\n It has long been known that increasing the\nmodel size will lead to continual improvements\non large-scale tasks such as machine translation\nand language modeling, which is demonstrated\nby the LM perplexity of held-out training data\nshown in Table 6. However, we believe that\nthis is the first work to demonstrate convinc-\ningly that scaling to extreme model sizes also\nleads to large improvements on very small scale\ntasks, provided that the model has been suffi-\nciently pre-trained. Peters et al. 
", "mimetype": "text/plain", "start_char_idx": 1754, "end_char_idx": 5122, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "2b41dc46-a706-4035-8707-8bdac62c2cfc": {"__data__": {"id_": "2b41dc46-a706-4035-8707-8bdac62c2cfc", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "13d1b84a-4210-4825-933b-a6d498a30c00", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "131b956706902a0251fc74cfea9f60f7dea5bf0ed9d618204398e5e06339cc19", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "21c9c67d-01b5-49c7-b0c5-7c5856abacbe", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "877b4d729adb665b0fdce9ed776eaa6b3ebd1bb13eeaba93e912c0f85fc9c1c5", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "(2018b) presented", "mimetype": "text/plain", "start_char_idx": 5122, "end_char_idx": 5139, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "641ed924-8e3f-4b7c-ae65-66e3dc4da5d5": {"__data__": {"id_": "641ed924-8e3f-4b7c-ae65-66e3dc4da5d5", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "6ea55280-c096-4e63-a744-d6fbd76d6e91", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "b8d1715d2e0a931ca1d10b2617e5ac4bbad69540fa2ce884da0efa838dd17f72", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "60786148-3cfd-4e95-ab5d-256991f19a68", "node_type": "1", "metadata": {}, "hash": "98dacc97c1efac521fed40ac6eb5e115093ce4bbf9b2113f8d8e97493687b729", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "mixed results on the downstream task impact of\nincreasing the pre-trained bi-LM size from two\nto four layers and Melamud et al. 
(2016) men-\ntioned in passing that increasing hidden dimen-\nsion size from 200 to 600 helped, but increasing\nfurther to 1,000 did not bring further improve-\n\nments. Both of these prior works used a feature-\nbased approach \u2014 we hypothesize that when the\nmodel is fine-tuned directly on the downstream\ntasks and uses only a very small number of ran-\n\ndomly initialized additional parameters, the task-\nspecific models can benefit from the larger, more\nexpressive pre-trained representations even when\ndownstream task data is very small.\n5.3 Feature-based Approach with BERT\n\nAll of the BERT results presented so far have used\nthe fine-tuning approach, where a simple classifi-\ncation layer is added to the pre-trained model, and\nall parameters are jointly fine-tuned on a down-\nstream task. However, the feature-based approach,\nwhere fixed features are extracted from the pre-\ntrained model, has certain advantages. First, not\nall tasks can be easily represented by a Trans-\nformer encoder architecture, and therefore require\na task-specific model architecture to be added.\n", "mimetype": "text/plain", "start_char_idx": 0, "end_char_idx": 1203, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "60786148-3cfd-4e95-ab5d-256991f19a68": {"__data__": {"id_": "60786148-3cfd-4e95-ab5d-256991f19a68", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "6ea55280-c096-4e63-a744-d6fbd76d6e91", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "b8d1715d2e0a931ca1d10b2617e5ac4bbad69540fa2ce884da0efa838dd17f72", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "641ed924-8e3f-4b7c-ae65-66e3dc4da5d5", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "edee340ad5b5b677edb913fcea52cb0f2a57f73bae804a9a5b35312f9e7408f5", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "9f8093af-21a6-443b-a6ac-864fb66387f8", "node_type": "1", "metadata": {}, "hash": "ab5363432f20eadc223a052d06475f118c0d9467a583aa29b07563f8fea6bdfe", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "Second, there are major computational benefits\nto pre-compute an expensive representation of the\ntraining data once and then run many experiments\nwith cheaper models on top of this representation.\n In this section, we compare the two approaches\nby applying BERT to the CoNLL-2003 Named\nEntity Recognition (NER) task (Tjong Kim Sang\nand De Meulder, 2003). In the input to BERT, we\nuse a case-preserving WordPiece model, and we\ninclude the maximal document context provided\nby the data. 
Following standard practice, we for-\nmulate this as a tagging task but do not use a CRF\n\n Hyperparams Dev Set Accuracy\n #L #H #A LM (ppl) MNLI-m MRPC SST-2\n 3 768 12 5.84 77.9 79.8 88.4\n 6 768 3 5.24 80.6 82.2 90.7\n 6 768 12 4.68 81.9 84.8 91.3\n 12 768 12 3.99 84.4 86.7 92.9\n 12 1024 16 3.54 85.7 86.9 93.3\n 24 1024 16 3.23 86.6 87.8 93.7\n\nTable 6: Ablation over BERT model size. #L = the\nnumber of layers; #H = hidden size; #A = number of at-\ntention heads. \u201cLM (ppl)\u201d is the masked LM perplexity\nof held-out training data.\n System Dev F1 Test F1\n ELMo (Peters et al., 2018a) 95.7 92.2\n CVT (Clark et al., 2018) - 92.6\n CSE (Akbik et al., 2018) - 93.1\n Fine-tuning approach\n BERTLARGE 96.6 92.8\n BERTBASE 96.4 92.4\n Feature-based approach (BERTBASE )\n Embeddings 91.0 -\n Second-to-Last Hidden 95.6 -\n Last Hidden 94.9 -\n Weighted Sum Last Four Hidden 95.9 -\n Concat Last Four Hidden 96.1 -\n Weighted Sum All 12 Layers 95.5 -\nTable 7: CoNLL-2003 Named Entity Recognition re-\nsults. Hyperparameters were selected using the Dev\nset. ", "mimetype": "text/plain", "start_char_idx": 1203, "end_char_idx": 3320, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "9f8093af-21a6-443b-a6ac-864fb66387f8": {"__data__": {"id_": "9f8093af-21a6-443b-a6ac-864fb66387f8", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "6ea55280-c096-4e63-a744-d6fbd76d6e91", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "b8d1715d2e0a931ca1d10b2617e5ac4bbad69540fa2ce884da0efa838dd17f72", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "60786148-3cfd-4e95-ab5d-256991f19a68", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "2b552c060d99c83f74c0324521e3574349a02cadcdf8c75a2d697b610b9a929f", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "The reported Dev and Test scores are averaged over\n5 random restarts using those hyperparameters.\n\n\nlayer in the output. We use the representation of\nthe first sub-token as the input to the token-level\nclassifier over the NER label set.\n To ablate the fine-tuning approach, we apply the\nfeature-based approach by extracting the activa-\ntions from one or more layers without fine-tuning\nany parameters of BERT. These contextual em-\nbeddings are used as input to a randomly initial-\nized two-layer 768-dimensional BiLSTM before\nthe classification layer.\n\n Results are presented in Table 7. BERTLARGE\nperforms competitively with state-of-the-art meth-\nods. 
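As a rough illustration of the feature-based rows in Table 7, the sketch below freezes a pre-trained encoder, concatenates its last four hidden layers for every token, and trains only a randomly initialized two-layer 768-dimensional BiLSTM plus a token-level classifier on top. It leans on the Hugging Face transformers and PyTorch packages purely for convenience (the original work predates them), and it skips the first-sub-token alignment used for the NER labels, so treat it as a sketch of the idea rather than the authors' pipeline.

```python
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

# Feature-based sketch: frozen BERT activations -> concatenation of the last
# four hidden layers -> randomly initialized 2-layer 768-d BiLSTM -> classifier.

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")  # case-preserving WordPiece
encoder = AutoModel.from_pretrained("bert-base-cased", output_hidden_states=True)
encoder.eval()  # BERT parameters are never fine-tuned in this setup

class FeatureBasedTagger(nn.Module):
    def __init__(self, num_labels: int, hidden: int = 768):
        super().__init__()
        self.bilstm = nn.LSTM(input_size=4 * hidden, hidden_size=hidden,
                              num_layers=2, bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, num_labels)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        out, _ = self.bilstm(features)   # (batch, seq_len, 2 * hidden)
        return self.classifier(out)      # per-token label logits

sentence = "BERT was built at Google AI Language ."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():                    # features can be pre-computed once and cached
    hidden_states = encoder(**inputs).hidden_states
features = torch.cat(hidden_states[-4:], dim=-1)  # concat last four layers per token

tagger = FeatureBasedTagger(num_labels=9)          # e.g. a CoNLL-2003 style label set
logits = tagger(features)                          # only the tagger is trained
```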
The best performing method concatenates the\ntoken representations from the top four hidden lay-\ners of the pre-trained Transformer, which is only\n0.3 F1 behind fine-tuning the entire model. This\ndemonstrates that BERT is effective for both fine-\ntuning and feature-based approaches.\n\n6 Conclusion\n\nRecent empirical improvements due to transfer\nlearning with language models have demonstrated\nthat rich, unsupervised pre-training is an integral\npart of many language understanding systems. In\nparticular, these results enable even low-resource\n\ntasks to benefit from deep unidirectional architec-\ntures. Our major contribution is further general-\nizing these findings to deep bidirectional architec-\ntures, allowing the same pre-trained model to suc-\ncessfully tackle a broad set of NLP tasks.", "mimetype": "text/plain", "start_char_idx": 3320, "end_char_idx": 4773, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "e1c258d9-0291-4310-9fbd-a17f908a5826": {"__data__": {"id_": "e1c258d9-0291-4310-9fbd-a17f908a5826", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "5d240373-8164-4fbe-9ff4-02a17074549b", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "b9d83a9300be27e862369e9733ab621bde8059435a32ce45655d29b06eaad2ea", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "6808912a-ceb2-47ba-9281-2f1c06afe3d9", "node_type": "1", "metadata": {}, "hash": "c319b508831e30f2cdb555245d048d4409dd8e4ff6b09f4b73eb487c5b73c886", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "References\nAlan Akbik, Duncan Blythe, and Roland Vollgraf.\n ", "mimetype": "text/plain", "start_char_idx": 0, "end_char_idx": 65, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "6808912a-ceb2-47ba-9281-2f1c06afe3d9": {"__data__": {"id_": "6808912a-ceb2-47ba-9281-2f1c06afe3d9", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "5d240373-8164-4fbe-9ff4-02a17074549b", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": 
"b9d83a9300be27e862369e9733ab621bde8059435a32ce45655d29b06eaad2ea", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "e1c258d9-0291-4310-9fbd-a17f908a5826", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "439d13a98b6c08467cf9614fccd626de0db3605cb4fd6688c180248ab85390e7", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "d1afa468-be5c-4597-8f48-c90574dff711", "node_type": "1", "metadata": {}, "hash": "b635a18b19a68ad15d0895fc1a91db3a4767cb1c2c44fbbf02f3b13c4ea583f7", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "2018. Contextual string embeddings for sequence\n labeling. In Proceedings of the 27th International\n Conference on Computational Linguistics, pages\n 1638\u20131649.\n\nRami Al-Rfou, Dokook Choe, Noah Constant, Mandy\n Guo, and Llion Jones. 2018. Character-level lan-\n guage modeling with deeper self-attention. arXiv\n preprint arXiv:1808.04444.\nKevin Clark, Minh-Thang Luong, Christopher D Man-ning, and Quoc Le. 2018. Semi-supervised se-\n quence modeling with cross-view training. In Pro-\n ceedings of the 2018 Conference on Empirical Meth-\n ods in Natural Language Processing, pages 1914\u2013\n 1925.\n\nRonan Collobert and Jason Weston. 2008. A unified\n architecture for natural language processing: Deep\n neural networks with multitask learning. In Pro-\n ceedings of the 25th international conference on\n Machine learning, pages 160\u2013167. ACM.\n\n Rie Kubota Ando and Tong Zhang. 2005. A framework Alexis Conneau, Douwe Kiela, Holger Schwenk, Lo\u00a8\u0131c\n for learning predictive structures from multiple tasks Barrault, and Antoine Bordes. 2017. Supervised\n and unlabeled data. Journal of Machine Learning learning of universal sentence representations from\n Research, 6(Nov):1817\u20131853. natural language inference data. In Proceedings of\n the 2017 Conference on Empirical Methods in Nat-\n Luisa Bentivogli, Bernardo Magnini, Ido Dagan, ural Language Processing, pages 670\u2013680, Copen-\n Hoa Trang Dang, and Danilo Giampiccolo. 2009. hagen, Denmark. Association for Computational\n The fifth PASCAL recognizing textual entailment Linguistics.\n challenge. In TAC. NIST. Andrew M Dai and Quoc V Le. 
", "mimetype": "text/plain", "start_char_idx": 65, "end_char_idx": 2529, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "d1afa468-be5c-4597-8f48-c90574dff711": {"__data__": {"id_": "d1afa468-be5c-4597-8f48-c90574dff711", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "5d240373-8164-4fbe-9ff4-02a17074549b", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "b9d83a9300be27e862369e9733ab621bde8059435a32ce45655d29b06eaad2ea", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "6808912a-ceb2-47ba-9281-2f1c06afe3d9", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "226fd2b43894cd7d2595c0ca96cab1d872db2c703dd2ad5db2cc65ac79e93989", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "d854ece0-8e05-4e06-ba7d-442eb5a771eb", "node_type": "1", "metadata": {}, "hash": "bff0b7447221c74fc85f9a8b958569fe88d5566febc455d092f91aea495fe7fa", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "2015. Semi-supervised\n sequence learning. In Advances in neural informa-\n John Blitzer, Ryan McDonald, and Fernando Pereira. tion processing systems, pages 3079\u20133087.\n 2006. Domain adaptation with structural correspon-\n dence learning. In Proceedings of the 2006 confer- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-\n ence on empirical methods in natural language pro- Fei. 2009. ImageNet: A Large-Scale Hierarchical\n cessing, pages 120\u2013128. Association for Computa- Image Database. In CVPR09.\n tional Linguistics.\n William B Dolan and Chris Brockett. 2005. Automati-\n Samuel R. Bowman, Gabor Angeli, Christopher Potts, cally constructing a corpus of sentential paraphrases.\n and Christopher D. Manning. 2015. A large anno- In Proceedings of the Third International Workshop\n tated corpus for learning natural language inference. on Paraphrasing (IWP2005).\n In EMNLP. Association for Computational Linguis-\n tics. William Fedus, Ian Goodfellow, and Andrew M Dai.\n 2018. Maskgan: Better text generation via filling in\n Peter F Brown, Peter V Desouza, Robert L Mercer, the . arXiv preprint arXiv:1801.07736.\n Vincent J Della Pietra, and Jenifer C Lai. 1992.\n Class-based n-gram models of natural language. Dan Hendrycks and Kevin Gimpel. 2016. Bridging\n Computational linguistics, 18(4):467\u2013479. nonlinearities and stochastic regularizers with gaus-\n sian error linear units. CoRR, abs/1606.08415.\n Daniel Cer, Mona Diab, Eneko Agirre, Inigo Lopez- Felix Hill, Kyunghyun Cho, and Anna Korhonen. 2016.\n Gazpio, and Lucia Specia. 2017. 
Semeval-2017 Learning distributed representations of sentences\n task 1: Semantic textual similarity multilingual and from unlabelled data. In Proceedings of the 2016\n crosslingual focused evaluation. In Proceedings Conference of the North American Chapter of the\n of the 11th International Workshop on Semantic\n Evaluation (SemEval-2017), pages 1\u201314, Vancou- Association for Computational Linguistics: Human\n ver, Canada. Association for Computational Lin- Language Technologies. Association for Computa-\n guistics. tional Linguistics.\n Ciprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge, Jeremy Howard and Sebastian Ruder. 2018. Universal\n Thorsten Brants, Phillipp Koehn, and Tony Robin- language model fine-tuning for text classification. In\n son. 2013. One billion word benchmark for measur- ACL. Association for Computational Linguistics.\n ing progress in statistical language modeling. arXiv Minghao Hu, Yuxing Peng, Zhen Huang, Xipeng Qiu,\n preprint arXiv:1312.3005. Furu Wei, and Ming Zhou. ", "mimetype": "text/plain", "start_char_idx": 2529, "end_char_idx": 7696, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "d854ece0-8e05-4e06-ba7d-442eb5a771eb": {"__data__": {"id_": "d854ece0-8e05-4e06-ba7d-442eb5a771eb", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "5d240373-8164-4fbe-9ff4-02a17074549b", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "b9d83a9300be27e862369e9733ab621bde8059435a32ce45655d29b06eaad2ea", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "d1afa468-be5c-4597-8f48-c90574dff711", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "dac55530bb0e1e04298ea3174a6d8afeaf75774267b062d665bf2cd5d72b6fd1", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "f48e1778-26db-49b0-89e7-e04c961609cc", "node_type": "1", "metadata": {}, "hash": "c3b0b9664baec8cb5cb9882ee4c32a97bbb75d76c9b9c3345e8ba4e7a18baf07", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "2018. Reinforced\n mnemonic reader for machine reading comprehen-\n Z. Chen, H. Zhang, X. Zhang, and L. Zhao. 2018. sion. In IJCAI.\n Quora question pairs.\n Yacine Jernite, Samuel R. Bowman, and David Son-\n Christopher Clark and Matt Gardner. 2018. Simple tag. 
", "mimetype": "text/plain", "start_char_idx": 7696, "end_char_idx": 8436, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "f48e1778-26db-49b0-89e7-e04c961609cc": {"__data__": {"id_": "f48e1778-26db-49b0-89e7-e04c961609cc", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "5d240373-8164-4fbe-9ff4-02a17074549b", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "b9d83a9300be27e862369e9733ab621bde8059435a32ce45655d29b06eaad2ea", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "d854ece0-8e05-4e06-ba7d-442eb5a771eb", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "b3aa4da2f8a1cbc7f15222336bb2c50764469bdc5c7b461c11c4da23ea5bd7b2", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "f1a6746d-1e02-49e9-be62-360454d78ce3", "node_type": "1", "metadata": {}, "hash": "37a7841cfb432669c0b15be4123df867a05922b3a4ee0b5c9b96d0b79351ef6f", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "2017. Discourse-based objectives for fast un-\n and effective multi-paragraph reading comprehen- supervised sentence representation learning. CoRR,\n sion. 
", "mimetype": "text/plain", "start_char_idx": 8436, "end_char_idx": 8717, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "f1a6746d-1e02-49e9-be62-360454d78ce3": {"__data__": {"id_": "f1a6746d-1e02-49e9-be62-360454d78ce3", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "5d240373-8164-4fbe-9ff4-02a17074549b", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "b9d83a9300be27e862369e9733ab621bde8059435a32ce45655d29b06eaad2ea", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "f48e1778-26db-49b0-89e7-e04c961609cc", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "39cb5ba8f8c815af3d966b4807c50db18346ef77ac003de3e592e7a97b6f3648", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "In ACL. abs/1705.00557.", "mimetype": "text/plain", "start_char_idx": 8717, "end_char_idx": 8806, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "388d7ccb-8037-4295-9e61-5e7bc66581e2": {"__data__": {"id_": "388d7ccb-8037-4295-9e61-5e7bc66581e2", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "74359064-bf24-40e0-9818-ab1d000bff3e", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "63cb94e1411b5ccf6b63ba01e87da25eb67b1a39b41578fc4e232688d086a98b", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "8e29386d-e646-4e6b-8096-3545b568bd44", "node_type": "1", "metadata": {}, "hash": "7a1e1c12c727340235d4c97b34e78540c1cbc5f0d42532c29b4d78a71dc257af", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": " Mandar Joshi, Eunsol Choi, Daniel S Weld, and Luke\n Zettlemoyer. 2017. Triviaqa: A large scale distantly\n supervised challenge dataset for reading comprehen-\n sion. 
In ACL.\n\n ", "mimetype": "text/plain", "start_char_idx": 0, "end_char_idx": 185, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "8e29386d-e646-4e6b-8096-3545b568bd44": {"__data__": {"id_": "8e29386d-e646-4e6b-8096-3545b568bd44", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "74359064-bf24-40e0-9818-ab1d000bff3e", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "63cb94e1411b5ccf6b63ba01e87da25eb67b1a39b41578fc4e232688d086a98b", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "388d7ccb-8037-4295-9e61-5e7bc66581e2", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "4a2ff896ea82da46620dba0e1daf8725c78ea8ecb65c1904e66295ef2bd8b681", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "48a545c7-3964-4115-88cc-e2df29b360a0", "node_type": "1", "metadata": {}, "hash": "9de51d651b0539ae2768a76c255f5d2abaa2ca6bb83ccc5617edf623f7c82df5", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "Ryan Kiros, Yukun Zhu, Ruslan R Salakhutdinov,\n Richard Zemel, Raquel Urtasun, Antonio Torralba,\n and Sanja Fidler. 
", "mimetype": "text/plain", "start_char_idx": 185, "end_char_idx": 307, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "48a545c7-3964-4115-88cc-e2df29b360a0": {"__data__": {"id_": "48a545c7-3964-4115-88cc-e2df29b360a0", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "74359064-bf24-40e0-9818-ab1d000bff3e", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "63cb94e1411b5ccf6b63ba01e87da25eb67b1a39b41578fc4e232688d086a98b", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "8e29386d-e646-4e6b-8096-3545b568bd44", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "2c1af3ed8808aa9d2ef1739e0ec9a336ed0872382c0b1734d3f39e5a792bdf3d", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "4ae1f21b-eae6-41ac-a45b-c08dcc7346e5", "node_type": "1", "metadata": {}, "hash": "2ea676273b06bbd9b193563a4ebf99b2abf4c42a926139869658b1e4101b0e91", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "2015. Skip-thought vectors. In\n Advances in neural information processing systems,\n pages 3294\u20133302.\nQuoc Le and Tomas Mikolov. 2014. Distributed rep-\n resentations of sentences and documents. In Inter-\n national Conference on Machine Learning, pages\n 1188\u20131196.\nHector J Levesque, Ernest Davis, and Leora Morgen-\n stern. 2011. The winograd schema challenge. In\n Aaai spring symposium: Logical formalizations of\n commonsense reasoning, volume 46, page 47.\nLajanugen Logeswaran and Honglak Lee. 2018. An\n efficient framework for learning sentence represen-\n tations. In International Conference on Learning\n Representations.\nBryan McCann, James Bradbury, Caiming Xiong, and\n Richard Socher. 2017. Learned in translation: Con-\n textualized word vectors. In NIPS.\n\nOren Melamud, Jacob Goldberger, and Ido Dagan.\n 2016. context2vec: Learning generic context em-\n bedding with bidirectional LSTM. In CoNLL.\n Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Cor-\n rado, and Jeff Dean. 2013. Distributed representa-\n tions of words and phrases and their compositional-\n ity. In Advances in Neural Information Processing\n Systems 26, pages 3111\u20133119. Curran Associates,\n Inc.\n Andriy Mnih and Geoffrey E Hinton. 
", "mimetype": "text/plain", "start_char_idx": 307, "end_char_idx": 1570, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "4ae1f21b-eae6-41ac-a45b-c08dcc7346e5": {"__data__": {"id_": "4ae1f21b-eae6-41ac-a45b-c08dcc7346e5", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "74359064-bf24-40e0-9818-ab1d000bff3e", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "63cb94e1411b5ccf6b63ba01e87da25eb67b1a39b41578fc4e232688d086a98b", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "48a545c7-3964-4115-88cc-e2df29b360a0", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "32a352e883afde83740202d56a5eee63aa32aa05471451b5aecc0494b951ae4a", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "f7ae2b3e-8e5f-4317-9483-6029a02f4a66", "node_type": "1", "metadata": {}, "hash": "e759a0b2061099a51b8a4833c5e182068c3f6a6af5e2c637fed9c28fd53d3166", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "2009. A scal-\n able hierarchical distributed language model.\n D. Koller, D. Schuurmans, Y. Bengio, and L. Bot-\n tou, editors, Advances in Neural Information Pro-\n cessing Systems 21, pages 1081\u20131088. Curran As-\n sociates, Inc.\n Ankur P Parikh, Oscar T\u00a8ackstr\u00a8om, Dipanjan Das, and\n Jakob Uszkoreit. 
", "mimetype": "text/plain", "start_char_idx": 1570, "end_char_idx": 1887, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "f7ae2b3e-8e5f-4317-9483-6029a02f4a66": {"__data__": {"id_": "f7ae2b3e-8e5f-4317-9483-6029a02f4a66", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "74359064-bf24-40e0-9818-ab1d000bff3e", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "63cb94e1411b5ccf6b63ba01e87da25eb67b1a39b41578fc4e232688d086a98b", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "4ae1f21b-eae6-41ac-a45b-c08dcc7346e5", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "9d1fddbe4bf045683de124d183fd5d71823e0de08303edccda49b94f7a06bd6a", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "cd5b0464-f7f4-465c-894e-93dd7a8f1e77", "node_type": "1", "metadata": {}, "hash": "7134eafd2306060597d120c1c1d81ca67b6a38347527017f8162a7c010a88a70", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "2016. A decomposable attention\n model for natural language inference. In EMNLP.\n Jeffrey Pennington, Richard Socher, and Christo-\n pher D. Manning. 2014. Glove: Global vectors for\n word representation. In Empirical Methods in Nat-\n ural Language Processing (EMNLP), pages 1532\u2013\n 1543.\n Matthew Peters, Waleed Ammar, Chandra Bhagavat-\n ula, and Russell Power. 2017. 
Semi-supervised se-\n quence tagging with bidirectional language models.\n In ACL.\n\n ", "mimetype": "text/plain", "start_char_idx": 1887, "end_char_idx": 2359, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "cd5b0464-f7f4-465c-894e-93dd7a8f1e77": {"__data__": {"id_": "cd5b0464-f7f4-465c-894e-93dd7a8f1e77", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "74359064-bf24-40e0-9818-ab1d000bff3e", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "63cb94e1411b5ccf6b63ba01e87da25eb67b1a39b41578fc4e232688d086a98b", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "f7ae2b3e-8e5f-4317-9483-6029a02f4a66", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "b278f032dbb5c904b0a6d0e7d0e86eae73b5c83c0bd47274ac3e8fa844d2439a", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "Matthew Peters, Mark Neumann, Mohit Iyyer, Matt\n Gardner, Christopher Clark, Kenton Lee, and Luke\n Zettlemoyer. 2018a. Deep contextualized word rep-\n resentations. In NAACL.\n Matthew Peters, Mark Neumann, Luke Zettlemoyer,\n and Wen-tau Yih. 2018b. Dissecting contextual\n word embeddings: Architecture and representation.\n In Proceedings of the 2018 Conference on Empiri-\n cal Methods in Natural Language Processing, pages\n 1499\u20131509.\n\n Alec Radford, Karthik Narasimhan, Tim Salimans, and\n Ilya Sutskever. 2018. Improving language under-\n standing with unsupervised learning. Technical re-\n port, OpenAI.\n Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and\n Percy Liang. 2016. Squad: 100,000+ questions for\n machine comprehension of text. In Proceedings of\n the 2016 Conference on Empirical Methods in Nat-\n ural Language Processing, pages 2383\u20132392.\n Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, and\n Hannaneh Hajishirzi. 2017. Bidirectional attention\n flow for machine comprehension. In ICLR.\n Richard Socher, Alex Perelygin, Jean Wu, Jason\n Chuang, Christopher D Manning, Andrew Ng, and\n Christopher Potts. 2013. Recursive deep models\n for semantic compositionality over a sentiment tree-\n bank. In Proceedings of the 2013 conference on\n empirical methods in natural language processing,\n pages 1631\u20131642.\n Fu Sun, Linyang Li, Xipeng Qiu, and Yang Liu.\n 2018. U-net: Machine reading comprehension\n with unanswerable questions. arXiv preprint\n arXiv:1810.06638.\n Wilson L Taylor. 1953. Cloze procedure: A new\n tool for measuring readability. Journalism Bulletin,\n 30(4):415\u2013433.\n\n Erik F Tjong Kim Sang and Fien De Meulder.\n 2003. 
Introduction to the CoNLL-2003 shared task:\n Language-independent named entity recognition. In\n CoNLL.\n Joseph Turian, Lev Ratinov, and Yoshua Bengio. 2010.\n Word representations: A simple and general method\n for semi-supervised learning. In Proceedings of the\n 48th Annual Meeting of the Association for Compu-\n tational Linguistics, ACL \u201910, pages 384\u2013394.\n Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob\n Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz\n Kaiser, and Illia Polosukhin. 2017. Attention is all\n you need. In Advances in Neural Information Pro-\n cessing Systems, pages 6000\u20136010.\n Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and\n Pierre-Antoine Manzagol. 2008. Extracting and\n composing robust features with denoising autoen-\n coders. In Proceedings of the 25th international\n conference on Machine learning, pages 1096\u20131103.\n ACM.\n\n Alex Wang, Amanpreet Singh, Julian Michael, Fe-\n lix Hill, Omer Levy, and Samuel Bowman. 2018a.\n Glue: A multi-task benchmark and analysis platform
", "mimetype": "text/plain", "start_char_idx": 0, "end_char_idx": 707, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "3efbee33-f0fd-4f16-bf5c-3c7a897c1562": {"__data__": {"id_": "3efbee33-f0fd-4f16-bf5c-3c7a897c1562", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "166109f8-fb93-4127-abe5-964ed07035c2", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "65af298d8c0241a3ccbb049b875129d8cf8d84b8ce9413dff19e760c11ba7913", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "5042ecfe-b092-4370-a34d-f747863465a0", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "ac66a64796c41de33dcab9aa3e041ccd7824d97d64672665be8e42e4253b6f8e", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "87a4fe6d-ce60-4893-a3ab-faea7aa65407", "node_type": "1", "metadata": {}, "hash": "3e95f0caa5abb302a198469d6228b17e43a0e215ff938e3755e9541d660364a4", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "2018. A broad-coverage challenge corpus\n for sentence understanding through inference. 
In\n NAACL.\n\n", "mimetype": "text/plain", "start_char_idx": 707, "end_char_idx": 816, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "87a4fe6d-ce60-4893-a3ab-faea7aa65407": {"__data__": {"id_": "87a4fe6d-ce60-4893-a3ab-faea7aa65407", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "166109f8-fb93-4127-abe5-964ed07035c2", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "65af298d8c0241a3ccbb049b875129d8cf8d84b8ce9413dff19e760c11ba7913", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "3efbee33-f0fd-4f16-bf5c-3c7a897c1562", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "7289a56bcc3a329f219dbe8f3c17f35d43e9ecab375b0d711f807efe993d5ee9", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "4ee6eb46-8fd4-43f4-b321-1711067a516f", "node_type": "1", "metadata": {}, "hash": "2123f35ca78e606423d1e8bf69e910aa33a9a9e5abdfaba7613e8243792233e7", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V\n Le, Mohammad Norouzi, Wolfgang Macherey,\n Maxim Krikun, Yuan Cao, Qin Gao, Klaus\n Macherey, et al. 
", "mimetype": "text/plain", "start_char_idx": 816, "end_char_idx": 970, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "4ee6eb46-8fd4-43f4-b321-1711067a516f": {"__data__": {"id_": "4ee6eb46-8fd4-43f4-b321-1711067a516f", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "166109f8-fb93-4127-abe5-964ed07035c2", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "65af298d8c0241a3ccbb049b875129d8cf8d84b8ce9413dff19e760c11ba7913", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "87a4fe6d-ce60-4893-a3ab-faea7aa65407", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "d50d92f080832f7adf7adfca276409259aaeb76000588441dc7dfa1a5e5a4957", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "2016. Google\u2019s neural ma-\n chine translation system: Bridging the gap between\n human and machine translation. arXiv preprint\n arXiv:1609.08144.\nJason Yosinski, Jeff Clune, Yoshua Bengio, and Hod\n Lipson. 2014. How transferable are features in deep\n neural networks? In Advances in neural information\n processing systems, pages 3320\u20133328.\n\nAdams Wei Yu, David Dohan, Minh-Thang Luong, Rui\n Zhao, Kai Chen, Mohammad Norouzi, and Quoc V\n Le. 2018. QANet: Combining local convolution\n with global self-attention for reading comprehen-\n sion. In ICLR.\nRowan Zellers, Yonatan Bisk, Roy Schwartz, and Yejin\n Choi. 2018. Swag: A large-scale adversarial dataset\n for grounded commonsense inference. In Proceed-\n ings of the 2018 Conference on Empirical Methods\n in Natural Language Processing (EMNLP).\nYukun Zhu, Ryan Kiros, Rich Zemel, Ruslan Salakhut-\n dinov, Raquel Urtasun, Antonio Torralba, and Sanja\n Fidler. 2015. Aligning books and movies: Towards\n story-like visual explanations by watching movies\n and reading books. 
In Proceedings of the IEEE\n international conference on computer vision, pages\n 19\u201327.\n\n Appendix for \u201cBERT: Pre-training of\n Deep Bidirectional Transformers for\n Language Understanding\u201d\n We organize the appendix into three sections:\n\n \u2022 Additional implementation details for BERT\n are presented in Appendix A;\n \u2022 Additional details for our experiments are\n presented in Appendix B; and\n\n \u2022 Additional ablation studies are presented in\n Appendix C.\n We present additional ablation studies for\n BERT including:\n\n \u2013 Effect of Number of Training Steps; and\n\n \u2013 Ablation for Different Masking Proce-\n dures.\n\nA Additional Details for BERT\n\nA.1 Illustration of the Pre-training Tasks\nWe provide examples of the pre-training tasks in\nthe following.\n\nMasked LM and the Masking Procedure As-\nsuming the unlabeled sentence is my dog is\nhairy, and during the random masking procedure\n\nwe chose the 4-th token (which corresponding to\nhairy), our masking procedure can be further il-\nlustrated by\n\n \u2022 80% of the time: Replace the word with the\n [MASK] token, e.g., my dog is hairy \u2192\n my dog is [MASK]\n\n \u2022 10% of the time: Replace the word with a\n random word, e.g., my dog is hairy \u2192 my\n dog is apple\n\n \u2022 10% of the time: Keep the word un-\n changed, e.g., my dog is hairy \u2192 my dog\n is hairy. The purpose of this is to bias the\n representation towards the actual observed\n word.\n\n The advantage of this procedure is that the\nTransformer encoder does not know which words\nit will be asked to predict or which have been re-\nplaced by random words, so it is forced to keep\na distributional contextual representation of ev-\n\nery input token. Additionally, because random\nreplacement only occurs for 1.5% of all tokens\n(i.e., 10% of 15%), this does not seem to harm\nthe model\u2019s language understanding capability. 
In\nSection C.2, we evaluate the impact this proce-\ndure.\n Compared to standard langauge model training,\nthe masked LM only make predictions on 15% of\ntokens in each batch, which suggests that more\npre-training steps may be required for the model", "mimetype": "text/plain", "start_char_idx": 970, "end_char_idx": 4266, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "305ade7f-2710-4529-8972-2d49133a67ed": {"__data__": {"id_": "305ade7f-2710-4529-8972-2d49133a67ed", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "77ff5d8d-0dda-479c-9295-ad7f30935d02", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "0c80f68887fa99b2003a00b5141ac73dfc42f506fee14a4cc318865998b26184", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "8b857906-4aa7-4a72-9b9b-fe4e7472a16b", "node_type": "1", "metadata": {}, "hash": "eb9de8cb3ead53b34236e9a40d0e1851fd295484069b2b411cc0229cc5387b45", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": " BERT (Ours) OpenAI GPT ELMo\n T 1 T 2 ... T N T 1 T 2 ... T N T 1 T 2 ... T N\n\n Trm Trm ... Trm Trm Trm ... Trm\n Lstm Lstm ... Lstm Lstm Lstm Lstm\n ...\n\n Trm Trm ... Trm Trm Trm ... Trm Lstm Lstm ... Lstm Lstm Lstm ... Lstm\n\n E 1 E 2 ... E N E 1 E 2 ... E N E 1 E 2 ... E N\n\n Figure 3: Differences in pre-training model architectures. BERT uses a bidirectional Transformer. OpenAI GPT\n uses a left-to-right Transformer. ELMo uses the concatenation of independently trained left-to-right and right-to-\n left LSTMs to generate features for downstream tasks. Among the three, only BERT representations are jointly\n conditioned on both left and right context in all layers. In addition to the architecture differences, BERT and\n OpenAI GPT are fine-tuning approaches, while ELMo is a feature-based approach.\n\nto converge. In Section C.1 we demonstrate that\nMLM does converge marginally slower than a left-\nto-right model (which predicts every token), but\nthe empirical improvements of the MLM model\nfar outweigh the increased training cost.\nNext Sentence Prediction The next sentence\n\nprediction task can be illustrated in the following\nexamples.\nInput = [CLS] the man went to [MASK] store [SEP]\n\n he bought a gallon [MASK] milk [SEP]\n Label = IsNext\n\nInput = [CLS] the man [MASK] to the store [SEP]\n penguin [MASK] are flight ##less birds [SEP]\n Label = NotNext\n\nA.2 Pre-training Procedure\nTo generate each training input sequence, we sam-\nple two spans of text from the corpus, which we\nrefer to as \u201csentences\u201d even though they are typ-\nically much longer than single sentences (but can\nbe shorter also). The first sentence receives the A\nembedding and the second receives the B embed-\nding. 
50% of the time B is the actual next sentence\nthat follows A and 50% of the time it is a random\nsentence, which is done for the \u201cnext sentence pre-\ndiction\u201d task. They are sampled such that the com-\nbined length is \u2264 512 tokens. The LM masking is\napplied after WordPiece tokenization with a uni-\nform masking rate of 15%, and no special consid-\neration given to partial word pieces.\n We train with batch size of 256 sequences (256\nsequences * 512 tokens = 128,000 tokens/batch)\nfor 1,000,000 steps, which is approximately 40\nepochs over the 3.3 billion word corpus. We\nuse Adam with learning rate of 1e-4, \u03b21 = 0.9,\n\u03b22 = 0.999, L2 weight decay of 0.01, learning\nrate warmup over the first 10,000 steps, and linear\ndecay of the learning rate. We use a dropout prob-\nability of 0.1 on all layers. ", "mimetype": "text/plain", "start_char_idx": 0, "end_char_idx": 3338, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "8b857906-4aa7-4a72-9b9b-fe4e7472a16b": {"__data__": {"id_": "8b857906-4aa7-4a72-9b9b-fe4e7472a16b", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "77ff5d8d-0dda-479c-9295-ad7f30935d02", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "0c80f68887fa99b2003a00b5141ac73dfc42f506fee14a4cc318865998b26184", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "305ade7f-2710-4529-8972-2d49133a67ed", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "b07e8aeacea96fb9c4d2b4780c8fbfab3315c10acdbc5ebf1b197df3b56f1907", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "e4240080-f6c3-485f-b3b8-bc17acedd026", "node_type": "1", "metadata": {}, "hash": "020cf96cf4dc57fe2ffc75bbcf50b34c86db6c41f3e9cb83059389f2869d7b8f", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "We use a gelu acti-\nvation (Hendrycks and Gimpel, 2016) rather than\nthe standard relu, following OpenAI GPT. The\ntraining loss is the sum of the mean masked LM\nlikelihood and the mean next sentence prediction\nlikelihood.\n Training of BERTBASE was performed on 4\nCloud TPUs in Pod configuration (16 TPU chips\ntotal).13 Training of BERTLARGE was performed\non 16 Cloud TPUs (64 TPU chips total). Each pre-\ntraining took 4 days to complete.Longer sequences are disproportionately expen-\nsive because attention is quadratic to the sequence\nlength. To speed up pretraing in our experiments,\nwe pre-train the model with sequence length of\n128 for 90% of the steps. 
Then, we train the rest\n10% of the steps of sequence of 512 to learn the\npositional embeddings.\n\nA.3 Fine-tuning Procedure\nFor fine-tuning, most model hyperparameters are\nthe same as in pre-training, with the exception of\nthe batch size, learning rate, and number of train-\ning epochs. ", "mimetype": "text/plain", "start_char_idx": 3338, "end_char_idx": 4290, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "e4240080-f6c3-485f-b3b8-bc17acedd026": {"__data__": {"id_": "e4240080-f6c3-485f-b3b8-bc17acedd026", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "77ff5d8d-0dda-479c-9295-ad7f30935d02", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "0c80f68887fa99b2003a00b5141ac73dfc42f506fee14a4cc318865998b26184", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "8b857906-4aa7-4a72-9b9b-fe4e7472a16b", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "ffddeddb1979247b208f81899c271a3d56e3b04902804318e7d344dcc44b1280", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "The dropout probability was always\nkept at 0.1. 
The optimal hyperparameter values\nare task-specific, but we found the following range\nof possible values to work well across all tasks:\n\n \u2022 Batch size: 16, 32\n 13https://cloudplatform.googleblog.com/2018/06/Cloud-\nTPU-now-offers-preemptible-pricing-and-global-\navailability.html", "mimetype": "text/plain", "start_char_idx": 4290, "end_char_idx": 4621, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "7432a915-d9a3-49ac-bda3-f72850d06063": {"__data__": {"id_": "7432a915-d9a3-49ac-bda3-f72850d06063", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "aaadb951-b38a-4e3a-accf-f6bfdb2f05a6", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "1e00f7be7373d801694101a53b6f6c259fe3d81300983be5be248aaf28f5803c", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "f6bb6a46-a49b-41ef-b78b-84c7502bebb9", "node_type": "1", "metadata": {}, "hash": "35ac692a69e27b8d07e5c2696f898d3e00d1939ad4998df5fd3f1d2c8df44247", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": " \u2022 Learning rate (Adam): 5e-5, 3e-5, 2e-5\n \u2022 Number of epochs: 2, 3, 4\n\n We also observed that large data sets (e.g.,\n100k+ labeled training examples) were far less\nsensitive to hyperparameter choice than small data\nsets. 
", "mimetype": "text/plain", "start_char_idx": 0, "end_char_idx": 230, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "f6bb6a46-a49b-41ef-b78b-84c7502bebb9": {"__data__": {"id_": "f6bb6a46-a49b-41ef-b78b-84c7502bebb9", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "aaadb951-b38a-4e3a-accf-f6bfdb2f05a6", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "1e00f7be7373d801694101a53b6f6c259fe3d81300983be5be248aaf28f5803c", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "7432a915-d9a3-49ac-bda3-f72850d06063", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "054e75f13f4902325c847e83695ac16adca2a5333266a2405b7c56020376292b", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "7200837e-7b73-4376-b06f-d9b4c910cd3e", "node_type": "1", "metadata": {}, "hash": "e8ce4d0a1f79b183a9a86c72d13537be3d5b328bdf892e19901c89bc0de7d5c2", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "Fine-tuning is typically very fast, so it is rea-\nsonable to simply run an exhaustive search over\nthe above parameters and choose the model that\nperforms best on the development set.\n\nA.4 Comparison of BERT, ELMo ,and\n OpenAI GPT\nHere we studies the differences in recent popular\nrepresentation learning models including ELMo,\nOpenAI GPT and BERT. The comparisons be-\ntween the model architectures are shown visually\nin Figure 3. Note that in addition to the architec-\nture differences, BERT and OpenAI GPT are fine-\ntuning approaches, while ELMo is a feature-based\napproach.\n The most comparable existing pre-training\nmethod to BERT is OpenAI GPT, which trains a\nleft-to-right Transformer LM on a large text cor-\npus. In fact, many of the design decisions in BERT\nwere intentionally made to make it as close to\nGPT as possible so that the two methods could be\nminimally compared. 
The core argument of this\nwork is that the bi-directionality and the two pre-\ntraining tasks presented in Section 3.1 account for\nthe majority of the empirical improvements, but\nwe do note that there are several other differences\nbetween how BERT and GPT were trained:\n\n \u2022 GPT is trained on the BooksCorpus (800M\n words); BERT is trained on the BooksCor-\n pus (800M words) and Wikipedia (2,500M\n words).\n \u2022 GPT uses a sentence separator ([SEP]) and\n classifier token ([CLS]) which are only in-\n troduced at fine-tuning time; BERT learns\n [SEP], [CLS] and sentence A/B embed-\n dings during pre-training.\n \u2022 GPT was trained for 1M steps with a batch\n size of 32,000 words; BERT was trained for\n 1M steps with a batch size of 128,000 words.\n\n \u2022 GPT used the same learning rate of 5e-5 for\n all fine-tuning experiments; BERT chooses a\n task-specific fine-tuning learning rate which\n performs the best on the development set.\n To isolate the effect of these differences, we per-\nform ablation experiments in Section 5.1 which\ndemonstrate that the majority of the improvements\nare in fact coming from the two pre-training tasks\nand the bidirectionality they enable.\n\nA.5 Illustrations of Fine-tuning on Different\n Tasks\nThe illustration of fine-tuning BERT on different\ntasks can be seen in Figure 4. Our task-specific\nmodels are formed by incorporating BERT with\none additional output layer, so a minimal num-\nber of parameters need to be learned from scratch.\nAmong the tasks, (a) and (b) are sequence-level\ntasks while (c) and (d) are token-level tasks. In\nthe figure, E represents the input embedding, Ti\nrepresents the contextual representation of token i,\n[CLS] is the special symbol for classification out-\nput, and [SEP] is the special symbol to separate\nnon-consecutive token sequences.\n\nB Detailed Experimental Setup\nB.1 Detailed Descriptions for the GLUE\n Benchmark Experiments.\nOur GLUE results in Table1 are obtained\nfrom https://gluebenchmark.com/\nleaderboard and https://blog.\nopenai.com/language-unsupervised.\n", "mimetype": "text/plain", "start_char_idx": 230, "end_char_idx": 3269, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "7200837e-7b73-4376-b06f-d9b4c910cd3e": {"__data__": {"id_": "7200837e-7b73-4376-b06f-d9b4c910cd3e", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "aaadb951-b38a-4e3a-accf-f6bfdb2f05a6", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "1e00f7be7373d801694101a53b6f6c259fe3d81300983be5be248aaf28f5803c", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "f6bb6a46-a49b-41ef-b78b-84c7502bebb9", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", 
"file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "e3e97367c6c1af85c11cddda6d32ea6720ce5d064c10807902143af990bd19f1", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "The GLUE benchmark includes the following\ndatasets, the descriptions of which were originally\nsummarized in Wang et al. (2018a):\n\nMNLI Multi-Genre Natural Language Inference\nis a large-scale, crowdsourced entailment classifi-\ncation task (Williams et al., 2018). Given a pair of\nsentences, the goal is to predict whether the sec-\nond sentence is an entailment, contradiction, or\nneutral with respect to the first one.\nQQP Quora Question Pairs is a binary classifi-\ncation task where the goal is to determine if two\nquestions asked on Quora are semantically equiv-\nalent (Chen et al., 2018).\nQNLI Question Natural Language Inference is\na version of the Stanford Question Answering\nDataset (Rajpurkar et al., 2016) which has been\nconverted to a binary classification task (Wang\net al., 2018a). The positive examples are (ques-\ntion, sentence) pairs which do contain the correct\nanswer, and the negative examples are (question,\nsentence) from the same paragraph which do not\ncontain the answer.", "mimetype": "text/plain", "start_char_idx": 3269, "end_char_idx": 4277, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "bc03e397-6cf3-4852-a898-40b4c4063325": {"__data__": {"id_": "bc03e397-6cf3-4852-a898-40b4c4063325", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "de34d689-b92f-4d51-9747-504f3d6499a7", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "45e5affb0a2bcfbbd8b92364275d130db3665278f02b87ddc29514a66d4e5b6b", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "69f5ade7-59a5-4766-95c0-52706c102651", "node_type": "1", "metadata": {}, "hash": "b88fcb5a4ba36ce3506d6c3d76d2d277aa79e92199e8db9a72b70e22fac999ca", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": " Class Class\n Label Label\n\n C T 1 ... T N T [SEP] T \u20191... T \u2019M C T 1 T 2 ... T N\n\n BERT BERT\n\nE [CLS]E 1 ... E N E [SEP] E \u20191... E \u2019M\n[CLS] Tok Tok Tok Tok\n\n\n 1 ... N [SEP] 1 ... M\nE [CLS] E 1 E 2 ... E N\n[CLS]\n[CLS] Tok 1\n Tok 1 Tok 2 Tok N\n ...\n\n Sentence 1 Sentence 2 Single Sentence\n\n Start/End Span O B-PER ... O\n\n C T 1 ... T N T [SEP] T \u20191... T \u2019M C T 1 T 2 ... T N\n\n BERT BERT\n\n E [CLS]E 1 ... E N E [SEP] E \u20191 ... E \u2019M E [CLS] E 1 E 2 ... E N\n\n [CLS] Tok ... Tok [SEP] Tok ... Tok [CLS] Tok 1 Tok 2 ... 
Tok N\n 1 N 1 M\n\n Question Paragraph Single Sentence\n\n Figure 4: Illustrations of Fine-tuning BERT on Different Tasks.\n\nSST-2 The Stanford Sentiment Treebank is a\nbinary single-sentence classification task consist-\ning of sentences extracted from movie reviews\nwith human annotations of their sentiment (Socher\net al., 2013).\n\nCoLA The Corpus of Linguistic Acceptability is\na binary single-sentence classification task, where\nthe goal is to predict whether an English sentence\nis linguistically \u201cacceptable\u201d or not (Warstadt\net al., 2018).\n\nSTS-B The Semantic Textual Similarity Bench-\nmark is a collection of sentence pairs drawn from\nnews headlines and other sources (Cer et al.,\n2017). They were annotated with a score from 1\nto 5 denoting how similar the two sentences are in\nterms of semantic meaning.\nMRPC Microsoft Research Paraphrase Corpus\nconsists of sentence pairs automatically extracted\n\nfrom online news sources, with human annotations\nfor whether the sentences in the pair are semanti-\ncally equivalent (Dolan and Brockett, 2005).\n\nRTE Recognizing Textual Entailment is a bi-\nnary entailment task similar to MNLI, but with\nmuch less training data (Bentivogli et al., 2009).14\n\nWNLI Winograd NLI is a small natural lan-\nguage inference dataset (Levesque et al., 2011).\nThe GLUE webpage notes that there are issues\nwith the construction of this dataset, 15 and every\ntrained system that\u2019s been submitted to GLUE has\nperformed worse than the 65.1 baseline accuracy\nof predicting the majority class. ", "mimetype": "text/plain", "start_char_idx": 0, "end_char_idx": 3411, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "69f5ade7-59a5-4766-95c0-52706c102651": {"__data__": {"id_": "69f5ade7-59a5-4766-95c0-52706c102651", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "de34d689-b92f-4d51-9747-504f3d6499a7", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "45e5affb0a2bcfbbd8b92364275d130db3665278f02b87ddc29514a66d4e5b6b", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "bc03e397-6cf3-4852-a898-40b4c4063325", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "e1ade5db5b47d10cff19421ad494efc0d9bda59731a1f1a6ec892c986aaf732b", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "We therefore ex-\nclude this set to be fair to OpenAI GPT. For our\nGLUE submission, we always predicted the ma-\n 14Note that we only report single-task fine-tuning results\nin this paper. 
A multitask fine-tuning approach could poten-\ntially push the performance even further. For example, we\ndid observe substantial improvements on RTE from multi-\ntask training with MNLI.15https://gluebenchmark.com/faq", "mimetype": "text/plain", "start_char_idx": 3411, "end_char_idx": 3814, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "3ab56ff6-1afe-4901-b46a-0160b56c7a48": {"__data__": {"id_": "3ab56ff6-1afe-4901-b46a-0160b56c7a48", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "1d130fec-17ad-4a6a-a2dd-1409c65e4a7d", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "67ea5751ce9c00dc7ddd9fe026d2254e6bf5e49f6d210ca65dd9e2f7604edfa2", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "c6a86942-07f0-4b10-a27c-f02d201c542f", "node_type": "1", "metadata": {}, "hash": "428f9c0596ab18d35b02b20d26927b1c81e0ed04eccf0fb69154c366273dbbaf", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "jority class.\nC Additional Ablation Studies\nC.1 Effect of Number of Training Steps\n\nFigure 5 presents MNLI Dev accuracy after fine-\ntuning from a checkpoint that has been pre-trained\nfor k steps. This allows us to answer the following\nquestions:\n\n 1. Question: Does BERT really need such\n a large amount of pre-training (128,000\n words/batch * 1,000,000 steps) to achieve\n high fine-tuning accuracy?\n Answer: Yes, BERTBASE achieves almost\n 1.0% additional accuracy on MNLI when\n\n trained on 1M steps compared to 500k steps.\n\n 2. Question: Does MLM pre-training converge\n slower than LTR pre-training, since only 15%\n of words are predicted in each batch rather\n than every word?\n Answer: The MLM model does converge\n slightly slower than the LTR model. How-\n ever, in terms of absolute accuracy the MLM\n model begins to outperform the LTR model\n almost immediately.\n\nC.2 Ablation for Different Masking\n Procedures\nIn Section 3.1, we mention that BERT uses a\nmixed strategy for masking the target tokens when\npre-training with the masked language model\n(MLM) objective. The following is an ablation\nstudy to evaluate the effect of different masking\nstrategies.\n\n 84\n\n 82\n 80\n\n 78\n BERTBASE (Masked LM)\n 76 BERTBASE (Left-to-Right)\n Note that the purpose of the masking strategiesMNLI Dev Accuracyis to reduce the mismatch between pre-training\nand fine-tuning, as the [MASK] symbol never ap-\npears during the fine-tuning stage. We report the\nDev results for both MNLI and NER. 
For NER,\nwe report both fine-tuning and feature-based ap-\nproaches, as we expect the mismatch will be am-\nplified for the feature-based approach as the model\nwill not have the chance to adjust the representa-\ntions.\n\n Masking Rates Dev Set Results\nMASK SAME RND MNLI NER\n Fine-tune Fine-tune Feature-based\n 80% 10% 10% 84.2 95.4 94.9\n 100% 0% 0% 84.3 94.9 94.0\n 80% 0% 20% 84.1 95.2 94.6\n 80% 20% 0% 84.4 95.2 94.7\n 0% 20% 80% 83.7 94.8 94.6\n 0% 0% 100% 83.6 94.9 94.6\n\n Table 8: Ablation over different masking strategies.\n\n The results are presented in Table 8. ", "mimetype": "text/plain", "start_char_idx": 0, "end_char_idx": 2481, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "c6a86942-07f0-4b10-a27c-f02d201c542f": {"__data__": {"id_": "c6a86942-07f0-4b10-a27c-f02d201c542f", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "1d130fec-17ad-4a6a-a2dd-1409c65e4a7d", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "67ea5751ce9c00dc7ddd9fe026d2254e6bf5e49f6d210ca65dd9e2f7604edfa2", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "3ab56ff6-1afe-4901-b46a-0160b56c7a48", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "e9e54ed853ee5bbdbf469f59fa078b65885530ae445ff05a7b5b16287628e0f6", "class_name": "RelatedNodeInfo"}, "3": {"node_id": "d4f64e83-6625-43c1-8da5-3551fee253a5", "node_type": "1", "metadata": {}, "hash": "3d0387a38319c0c4a6a07cfa63d48bb879e376c7fd84b7549569a6e4d28f4631", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "In the table,\nMASK means that we replace the target token with\nthe [MASK] symbol for MLM; SAME means that\nwe keep the target token as is; RND means that\nwe replace the target token with another random\ntoken.\n The numbers in the left part of the table repre-\nsent the probabilities of the specific strategies used\nduring MLM pre-training (BERT uses 80%, 10%,\n10%). The right part of the paper represents the\nDev set results. For the feature-based approach,\nwe concatenate the last 4 layers of BERT as the\nfeatures, which was shown to be the best approach\nin Section 5.3.From the table it can be seen that fine-tuning is\nsurprisingly robust to different masking strategies.\nHowever, as expected, using only the MASK strat-\negy was problematic when applying the feature-\nbased approach to NER. 
Interestingly, using only\nthe RND strategy performs much worse than our\nstrategy as well.\n\n\n\n\n ", "mimetype": "text/plain", "start_char_idx": 2481, "end_char_idx": 3404, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}, "d4f64e83-6625-43c1-8da5-3551fee253a5": {"__data__": {"id_": "d4f64e83-6625-43c1-8da5-3551fee253a5", "embedding": null, "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "1d130fec-17ad-4a6a-a2dd-1409c65e4a7d", "node_type": "4", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "67ea5751ce9c00dc7ddd9fe026d2254e6bf5e49f6d210ca65dd9e2f7604edfa2", "class_name": "RelatedNodeInfo"}, "2": {"node_id": "c6a86942-07f0-4b10-a27c-f02d201c542f", "node_type": "1", "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}, "hash": "3304a0ebf9902880e1b4a1890134196408a77c6ef7a614889fba777106713ac9", "class_name": "RelatedNodeInfo"}}, "metadata_template": "{key}: {value}", "metadata_separator": "\n", "text": "200 400 600 800 1,000\n Pre-training Steps (Thousands)\n\n Figure 5: Ablation over number of training steps. This\n shows the MNLI accuracy after fine-tuning, starting\n from model parameters that have been pre-trained for\n k steps. 
The x-axis is the value of k.", "mimetype": "text/plain", "start_char_idx": 3404, "end_char_idx": 3801, "metadata_seperator": "\n", "text_template": "{metadata_str}\n\n{content}", "class_name": "TextNode"}, "__type__": "1"}}, "docstore/metadata": {"8d55e99e-029a-47c6-8fae-5bff1ee7672a": {"doc_hash": "2aca2bfa89e3ba6ce60d62fd9a99c06e1bf5d9cd3ff7a62bdec2bf59ae34d9f9", "ref_doc_id": "f94e48ce-e161-46d1-800f-430e38b4962c"}, "1d81b4fd-e928-4180-8ef6-a41371ac1fed": {"doc_hash": "d186919dd54075a6eda65382523733547789e6d01b2b18af581dc53ef19974ae", "ref_doc_id": "f94e48ce-e161-46d1-800f-430e38b4962c"}, "a58908ab-8178-40cd-b2a3-efcaf65228f2": {"doc_hash": "edd6518c76da299b4e2642067d6fafe03a3f884aa22306e6a84a97607f1a363c", "ref_doc_id": "dcba1f47-98fb-4b83-8b82-bb0a4e4c9db8"}, "ab890d58-e9ed-49ae-9c78-282a91366161": {"doc_hash": "c48e35db928469b51656e9cd7516a4f31009d3c2ebcf1de606c315125da340d6", "ref_doc_id": "dcba1f47-98fb-4b83-8b82-bb0a4e4c9db8"}, "9bbcaaca-e282-4f65-a5c2-9a5708a228eb": {"doc_hash": "01e61f2445fd436f8507df047920f5265c41b2975f6ff40cdae587854553c4bb", "ref_doc_id": "dcba1f47-98fb-4b83-8b82-bb0a4e4c9db8"}, "926f43b0-e58c-4e8f-b766-c90f2447361e": {"doc_hash": "42369be9595563bb5dd96a4dde08a6e205223c2c6f3edb12c542a50a943a6e42", "ref_doc_id": "f9edbae8-ed8d-4147-803e-936500d2d60f"}, "1a82bbed-7218-4c23-8cb5-402971703a30": {"doc_hash": "fc2e237c06d17ca2f6b8044e119b7a0ef6543d27c24ae314a1c3616d6ea6467e", "ref_doc_id": "f9edbae8-ed8d-4147-803e-936500d2d60f"}, "a93a9152-5892-4b84-87c4-d7d7e57b5d64": {"doc_hash": "f10e994c1d91ae4c8bc4ea64aa8b6ba381a3fe5278f5a865b368cd6658d1b01a", "ref_doc_id": "f9edbae8-ed8d-4147-803e-936500d2d60f"}, "abbf6d32-14ba-4c60-850d-cf4429b349e4": {"doc_hash": "41b6d9a30bcbf21c8b5930fbeae0cc97e5eaf1a435c890cf50c611130ed5513a", "ref_doc_id": "cc193b63-6a56-496d-92cc-812a0a7cf204"}, "f1f06b3a-489c-4861-b5be-72cd1c1d8e80": {"doc_hash": "7171358e6db4a7d03cbed4a1dde37cc4c3afdd7ed30c429f837a996a188fa018", "ref_doc_id": "cc193b63-6a56-496d-92cc-812a0a7cf204"}, "544c5e06-611b-4fe1-93ae-aa3441eb385e": {"doc_hash": "37cda6455ab0b252d0c7797c47cfa0c5fc72c8abbf7747e8f89c2d40fe6d69ab", "ref_doc_id": "cc193b63-6a56-496d-92cc-812a0a7cf204"}, "b635c2da-b277-4b49-936d-8ac5939da468": {"doc_hash": "984d97b14cd67774554e50dda8a526787755c1ed32bce97782170293021e249b", "ref_doc_id": "5a8026ec-62da-4652-ab01-3c38b64fda7c"}, "2f892821-777d-4fb6-ac8f-611ef0566d7d": {"doc_hash": "b1b75cf9b57f383786b17095af33f3371cec5a2680d9146d9a5a6747db75b6c9", "ref_doc_id": "5a8026ec-62da-4652-ab01-3c38b64fda7c"}, "6c2b1028-b0b0-4f92-af51-9660eb0f49f5": {"doc_hash": "e2dc9f57eb6a14717530ff0b2f4fe5ab98181c686c49d22a8de64969fd38e0b3", "ref_doc_id": "5a8026ec-62da-4652-ab01-3c38b64fda7c"}, "349ffcb2-b1dd-49ef-b7f7-37406a635c71": {"doc_hash": "17269988ca15a411246f71360ae77c4492f64e2d353d65412d6591e307bda7ee", "ref_doc_id": "bc534cc6-e535-4b99-b43b-605d5b37174d"}, "43c7eae6-7d68-47bc-8fd4-8f0584405231": {"doc_hash": "9e3b7dd2238d9acd968b8a769d0581f893d0699ebdb7bc690a8cf8e45cd3d9c4", "ref_doc_id": "bc534cc6-e535-4b99-b43b-605d5b37174d"}, "15b81cf1-d657-41b4-a9ff-703f8e5e6fac": {"doc_hash": "aaac0953e78e1481ec674f4706a04223fe34ac76e4aff2cd9a9f51f92ad56b09", "ref_doc_id": "bc534cc6-e535-4b99-b43b-605d5b37174d"}, "0af4d994-64d1-4bf4-8624-55ed2e4f7dbd": {"doc_hash": "5b203c46fac99619997fbe420c7403431c3da2285853ad72f69b2509fc694e47", "ref_doc_id": "7ab3456d-9fa7-49a8-856f-b2a713991fd8"}, "b6154034-ddde-4fd8-a018-126e613aa014": {"doc_hash": "605d062d81f57157a49ee24b715fbe954d1ac4da24bf69e5746ce728e1edcf3b", "ref_doc_id": 
"7ab3456d-9fa7-49a8-856f-b2a713991fd8"}, "61d2f504-bc66-41b5-b6b3-662d999e9f60": {"doc_hash": "1c716450597b1e4b1afe4cc8b954902e8f0e54f4f1b4b9ed4783671708cba627", "ref_doc_id": "7ab3456d-9fa7-49a8-856f-b2a713991fd8"}, "049233ae-7f17-4b97-b212-478744e93165": {"doc_hash": "1816ded9e583043856ebde339cf39eff11a9ecbdc0f6b580ae47bd0efb3766af", "ref_doc_id": "13d1b84a-4210-4825-933b-a6d498a30c00"}, "21c9c67d-01b5-49c7-b0c5-7c5856abacbe": {"doc_hash": "877b4d729adb665b0fdce9ed776eaa6b3ebd1bb13eeaba93e912c0f85fc9c1c5", "ref_doc_id": "13d1b84a-4210-4825-933b-a6d498a30c00"}, "2b41dc46-a706-4035-8707-8bdac62c2cfc": {"doc_hash": "4854d9c2319df1e9cd22d54da936e65c2a02eee859483e4bf9e628a3d1b6db23", "ref_doc_id": "13d1b84a-4210-4825-933b-a6d498a30c00"}, "641ed924-8e3f-4b7c-ae65-66e3dc4da5d5": {"doc_hash": "edee340ad5b5b677edb913fcea52cb0f2a57f73bae804a9a5b35312f9e7408f5", "ref_doc_id": "6ea55280-c096-4e63-a744-d6fbd76d6e91"}, "60786148-3cfd-4e95-ab5d-256991f19a68": {"doc_hash": "2b552c060d99c83f74c0324521e3574349a02cadcdf8c75a2d697b610b9a929f", "ref_doc_id": "6ea55280-c096-4e63-a744-d6fbd76d6e91"}, "9f8093af-21a6-443b-a6ac-864fb66387f8": {"doc_hash": "aaa375d627f38ef98fc757becb63f3dffc72afaad2793af7d20d9d196df6d314", "ref_doc_id": "6ea55280-c096-4e63-a744-d6fbd76d6e91"}, "e1c258d9-0291-4310-9fbd-a17f908a5826": {"doc_hash": "439d13a98b6c08467cf9614fccd626de0db3605cb4fd6688c180248ab85390e7", "ref_doc_id": "5d240373-8164-4fbe-9ff4-02a17074549b"}, "6808912a-ceb2-47ba-9281-2f1c06afe3d9": {"doc_hash": "226fd2b43894cd7d2595c0ca96cab1d872db2c703dd2ad5db2cc65ac79e93989", "ref_doc_id": "5d240373-8164-4fbe-9ff4-02a17074549b"}, "d1afa468-be5c-4597-8f48-c90574dff711": {"doc_hash": "dac55530bb0e1e04298ea3174a6d8afeaf75774267b062d665bf2cd5d72b6fd1", "ref_doc_id": "5d240373-8164-4fbe-9ff4-02a17074549b"}, "d854ece0-8e05-4e06-ba7d-442eb5a771eb": {"doc_hash": "b3aa4da2f8a1cbc7f15222336bb2c50764469bdc5c7b461c11c4da23ea5bd7b2", "ref_doc_id": "5d240373-8164-4fbe-9ff4-02a17074549b"}, "f48e1778-26db-49b0-89e7-e04c961609cc": {"doc_hash": "39cb5ba8f8c815af3d966b4807c50db18346ef77ac003de3e592e7a97b6f3648", "ref_doc_id": "5d240373-8164-4fbe-9ff4-02a17074549b"}, "f1a6746d-1e02-49e9-be62-360454d78ce3": {"doc_hash": "5ffb5da693a4515d8a99aa3c5378fe9fcceb8af750407a6fe2f027b6bb10d3cf", "ref_doc_id": "5d240373-8164-4fbe-9ff4-02a17074549b"}, "388d7ccb-8037-4295-9e61-5e7bc66581e2": {"doc_hash": "4a2ff896ea82da46620dba0e1daf8725c78ea8ecb65c1904e66295ef2bd8b681", "ref_doc_id": "74359064-bf24-40e0-9818-ab1d000bff3e"}, "8e29386d-e646-4e6b-8096-3545b568bd44": {"doc_hash": "2c1af3ed8808aa9d2ef1739e0ec9a336ed0872382c0b1734d3f39e5a792bdf3d", "ref_doc_id": "74359064-bf24-40e0-9818-ab1d000bff3e"}, "48a545c7-3964-4115-88cc-e2df29b360a0": {"doc_hash": "32a352e883afde83740202d56a5eee63aa32aa05471451b5aecc0494b951ae4a", "ref_doc_id": "74359064-bf24-40e0-9818-ab1d000bff3e"}, "4ae1f21b-eae6-41ac-a45b-c08dcc7346e5": {"doc_hash": "9d1fddbe4bf045683de124d183fd5d71823e0de08303edccda49b94f7a06bd6a", "ref_doc_id": "74359064-bf24-40e0-9818-ab1d000bff3e"}, "f7ae2b3e-8e5f-4317-9483-6029a02f4a66": {"doc_hash": "b278f032dbb5c904b0a6d0e7d0e86eae73b5c83c0bd47274ac3e8fa844d2439a", "ref_doc_id": "74359064-bf24-40e0-9818-ab1d000bff3e"}, "cd5b0464-f7f4-465c-894e-93dd7a8f1e77": {"doc_hash": "abfa47e008a706030ded92c08bb68bf666b50e88ed3bf341d502468d16b5e686", "ref_doc_id": "74359064-bf24-40e0-9818-ab1d000bff3e"}, "5042ecfe-b092-4370-a34d-f747863465a0": {"doc_hash": "ac66a64796c41de33dcab9aa3e041ccd7824d97d64672665be8e42e4253b6f8e", "ref_doc_id": 
"166109f8-fb93-4127-abe5-964ed07035c2"}, "3efbee33-f0fd-4f16-bf5c-3c7a897c1562": {"doc_hash": "7289a56bcc3a329f219dbe8f3c17f35d43e9ecab375b0d711f807efe993d5ee9", "ref_doc_id": "166109f8-fb93-4127-abe5-964ed07035c2"}, "87a4fe6d-ce60-4893-a3ab-faea7aa65407": {"doc_hash": "d50d92f080832f7adf7adfca276409259aaeb76000588441dc7dfa1a5e5a4957", "ref_doc_id": "166109f8-fb93-4127-abe5-964ed07035c2"}, "4ee6eb46-8fd4-43f4-b321-1711067a516f": {"doc_hash": "a2fa6a035ac9bdb9b7e8a7c6985270597942f33e99dd48074b349c6739f74a63", "ref_doc_id": "166109f8-fb93-4127-abe5-964ed07035c2"}, "305ade7f-2710-4529-8972-2d49133a67ed": {"doc_hash": "b07e8aeacea96fb9c4d2b4780c8fbfab3315c10acdbc5ebf1b197df3b56f1907", "ref_doc_id": "77ff5d8d-0dda-479c-9295-ad7f30935d02"}, "8b857906-4aa7-4a72-9b9b-fe4e7472a16b": {"doc_hash": "ffddeddb1979247b208f81899c271a3d56e3b04902804318e7d344dcc44b1280", "ref_doc_id": "77ff5d8d-0dda-479c-9295-ad7f30935d02"}, "e4240080-f6c3-485f-b3b8-bc17acedd026": {"doc_hash": "e032f9d9a50a15d1bd37544bf800a56bc1512e093defa96cdd8b10202dcca54e", "ref_doc_id": "77ff5d8d-0dda-479c-9295-ad7f30935d02"}, "7432a915-d9a3-49ac-bda3-f72850d06063": {"doc_hash": "054e75f13f4902325c847e83695ac16adca2a5333266a2405b7c56020376292b", "ref_doc_id": "aaadb951-b38a-4e3a-accf-f6bfdb2f05a6"}, "f6bb6a46-a49b-41ef-b78b-84c7502bebb9": {"doc_hash": "e3e97367c6c1af85c11cddda6d32ea6720ce5d064c10807902143af990bd19f1", "ref_doc_id": "aaadb951-b38a-4e3a-accf-f6bfdb2f05a6"}, "7200837e-7b73-4376-b06f-d9b4c910cd3e": {"doc_hash": "8073a590f45482b1f25286526e36f00c8dda79028b691774a20a673ccf997ad6", "ref_doc_id": "aaadb951-b38a-4e3a-accf-f6bfdb2f05a6"}, "bc03e397-6cf3-4852-a898-40b4c4063325": {"doc_hash": "e1ade5db5b47d10cff19421ad494efc0d9bda59731a1f1a6ec892c986aaf732b", "ref_doc_id": "de34d689-b92f-4d51-9747-504f3d6499a7"}, "69f5ade7-59a5-4766-95c0-52706c102651": {"doc_hash": "674935bb7c1ee9b4bd0e5e0bc5b5457267cafbbab5f75be6d12804d47edcc74c", "ref_doc_id": "de34d689-b92f-4d51-9747-504f3d6499a7"}, "3ab56ff6-1afe-4901-b46a-0160b56c7a48": {"doc_hash": "e9e54ed853ee5bbdbf469f59fa078b65885530ae445ff05a7b5b16287628e0f6", "ref_doc_id": "1d130fec-17ad-4a6a-a2dd-1409c65e4a7d"}, "c6a86942-07f0-4b10-a27c-f02d201c542f": {"doc_hash": "3304a0ebf9902880e1b4a1890134196408a77c6ef7a614889fba777106713ac9", "ref_doc_id": "1d130fec-17ad-4a6a-a2dd-1409c65e4a7d"}, "d4f64e83-6625-43c1-8da5-3551fee253a5": {"doc_hash": "257f9cea62315ec330ff4cbe534f75feaff0bd7ba072c73342b45ee9c61a5eea", "ref_doc_id": "1d130fec-17ad-4a6a-a2dd-1409c65e4a7d"}}, "docstore/ref_doc_info": {"f94e48ce-e161-46d1-800f-430e38b4962c": {"node_ids": ["8d55e99e-029a-47c6-8fae-5bff1ee7672a", "1d81b4fd-e928-4180-8ef6-a41371ac1fed"], "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}}, "dcba1f47-98fb-4b83-8b82-bb0a4e4c9db8": {"node_ids": ["a58908ab-8178-40cd-b2a3-efcaf65228f2", "ab890d58-e9ed-49ae-9c78-282a91366161", "9bbcaaca-e282-4f65-a5c2-9a5708a228eb"], "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}}, "f9edbae8-ed8d-4147-803e-936500d2d60f": {"node_ids": ["926f43b0-e58c-4e8f-b766-c90f2447361e", "1a82bbed-7218-4c23-8cb5-402971703a30", "a93a9152-5892-4b84-87c4-d7d7e57b5d64"], "metadata": {"file_path": 
"C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}}, "cc193b63-6a56-496d-92cc-812a0a7cf204": {"node_ids": ["abbf6d32-14ba-4c60-850d-cf4429b349e4", "f1f06b3a-489c-4861-b5be-72cd1c1d8e80", "544c5e06-611b-4fe1-93ae-aa3441eb385e"], "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}}, "5a8026ec-62da-4652-ab01-3c38b64fda7c": {"node_ids": ["b635c2da-b277-4b49-936d-8ac5939da468", "2f892821-777d-4fb6-ac8f-611ef0566d7d", "6c2b1028-b0b0-4f92-af51-9660eb0f49f5"], "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}}, "bc534cc6-e535-4b99-b43b-605d5b37174d": {"node_ids": ["349ffcb2-b1dd-49ef-b7f7-37406a635c71", "43c7eae6-7d68-47bc-8fd4-8f0584405231", "15b81cf1-d657-41b4-a9ff-703f8e5e6fac"], "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}}, "7ab3456d-9fa7-49a8-856f-b2a713991fd8": {"node_ids": ["0af4d994-64d1-4bf4-8624-55ed2e4f7dbd", "b6154034-ddde-4fd8-a018-126e613aa014", "61d2f504-bc66-41b5-b6b3-662d999e9f60"], "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}}, "13d1b84a-4210-4825-933b-a6d498a30c00": {"node_ids": ["049233ae-7f17-4b97-b212-478744e93165", "21c9c67d-01b5-49c7-b0c5-7c5856abacbe", "2b41dc46-a706-4035-8707-8bdac62c2cfc"], "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}}, "6ea55280-c096-4e63-a744-d6fbd76d6e91": {"node_ids": ["641ed924-8e3f-4b7c-ae65-66e3dc4da5d5", "60786148-3cfd-4e95-ab5d-256991f19a68", "9f8093af-21a6-443b-a6ac-864fb66387f8"], "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}}, "5d240373-8164-4fbe-9ff4-02a17074549b": {"node_ids": ["e1c258d9-0291-4310-9fbd-a17f908a5826", "6808912a-ceb2-47ba-9281-2f1c06afe3d9", "d1afa468-be5c-4597-8f48-c90574dff711", "d854ece0-8e05-4e06-ba7d-442eb5a771eb", "f48e1778-26db-49b0-89e7-e04c961609cc", "f1a6746d-1e02-49e9-be62-360454d78ce3"], "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}}, "74359064-bf24-40e0-9818-ab1d000bff3e": {"node_ids": ["388d7ccb-8037-4295-9e61-5e7bc66581e2", "8e29386d-e646-4e6b-8096-3545b568bd44", "48a545c7-3964-4115-88cc-e2df29b360a0", "4ae1f21b-eae6-41ac-a45b-c08dcc7346e5", "f7ae2b3e-8e5f-4317-9483-6029a02f4a66", "cd5b0464-f7f4-465c-894e-93dd7a8f1e77"], "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", 
"file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}}, "166109f8-fb93-4127-abe5-964ed07035c2": {"node_ids": ["5042ecfe-b092-4370-a34d-f747863465a0", "3efbee33-f0fd-4f16-bf5c-3c7a897c1562", "87a4fe6d-ce60-4893-a3ab-faea7aa65407", "4ee6eb46-8fd4-43f4-b321-1711067a516f"], "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}}, "77ff5d8d-0dda-479c-9295-ad7f30935d02": {"node_ids": ["305ade7f-2710-4529-8972-2d49133a67ed", "8b857906-4aa7-4a72-9b9b-fe4e7472a16b", "e4240080-f6c3-485f-b3b8-bc17acedd026"], "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}}, "aaadb951-b38a-4e3a-accf-f6bfdb2f05a6": {"node_ids": ["7432a915-d9a3-49ac-bda3-f72850d06063", "f6bb6a46-a49b-41ef-b78b-84c7502bebb9", "7200837e-7b73-4376-b06f-d9b4c910cd3e"], "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}}, "de34d689-b92f-4d51-9747-504f3d6499a7": {"node_ids": ["bc03e397-6cf3-4852-a898-40b4c4063325", "69f5ade7-59a5-4766-95c0-52706c102651"], "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}}, "1d130fec-17ad-4a6a-a2dd-1409c65e4a7d": {"node_ids": ["3ab56ff6-1afe-4901-b46a-0160b56c7a48", "c6a86942-07f0-4b10-a27c-f02d201c542f", "d4f64e83-6625-43c1-8da5-3551fee253a5"], "metadata": {"file_path": "C:\\MAIN\\it\\projects\\vs\\ds_rag\\data\\paper.pdf", "file_name": "paper.pdf", "file_type": "application/pdf", "file_size": 775166, "creation_date": "2024-12-05", "last_modified_date": "2024-12-08"}}}}