Update README.md
README.md
CHANGED
@@ -22,12 +22,34 @@ Quick links:
 
 The best way to use this model is via the [olmOCR toolkit](https://github.com/allenai/olmocr).
 
-##
+## Usage
 
 This model expects as input a single document image, rendered such that the longest dimension is 1024 pixels.
 
 The prompt must then contain the additional metadata from the document, and the easiest way to generate this
-
+
+
+## Manual Prompting
+
+```python
+image_base64 = [base64 image of PDF rendered down to 1024 px on longest edge]
+
+"messages": [
+    {
+        "role": "user",
+        "content": [
+            {"type": "text", "text": "Below is the image of one page of a document, as well as some raw textual content that was previously extracted for it. Just return the plain text representation of this document as if you were reading it naturally.
+Do not hallucinate.
+RAW_TEXT_START
+Page dimensions: 1836.8x2267.2
+[Image 0x0 to 1837x2267]
+
+RAW_TEXT_END"},
+            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_base64}"}},
+        ],
+    }
+],
+```
 
 ## License and use
 
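
Note on the added "Manual Prompting" block: as written it is pseudocode (a bracketed placeholder for `image_base64` and a bare `"messages"` key), so it will not run if pasted into Python. Below is a minimal sketch of how the same payload might be assembled. The helper names `scaled_size` and `build_messages` are hypothetical and are not part of the olmOCR toolkit, the prompt wording is copied from the README change, and rendering the PDF page to PNG bytes is assumed to happen elsewhere.

```python
import base64

# Prompt wording copied from the README change; {raw_text} is the extracted
# metadata placed between the RAW_TEXT_START and RAW_TEXT_END markers.
PROMPT_TEMPLATE = (
    "Below is the image of one page of a document, as well as some raw textual "
    "content that was previously extracted for it. Just return the plain text "
    "representation of this document as if you were reading it naturally.\n"
    "Do not hallucinate.\n"
    "RAW_TEXT_START\n{raw_text}\nRAW_TEXT_END"
)


def scaled_size(width: float, height: float, longest_edge: int = 1024) -> tuple[int, int]:
    """Hypothetical helper: pixel size whose longest side equals longest_edge."""
    scale = longest_edge / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def build_messages(png_bytes: bytes, raw_text: str) -> list[dict]:
    """Hypothetical helper: build the single user message shown in the README."""
    image_base64 = base64.b64encode(png_bytes).decode("ascii")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": PROMPT_TEMPLATE.format(raw_text=raw_text)},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_base64}"},
                },
            ],
        }
    ]
```

For the page in the README example, `scaled_size(1836.8, 2267.2)` gives the target render size, and `build_messages(png_bytes, "Page dimensions: 1836.8x2267.2\n[Image 0x0 to 1837x2267]\n")` reproduces the message structure from the diff. How the resulting list is sent to the model depends on the serving stack and is left out here.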