Capabilities

What is Optical Character Recognition (OCR)?

Upstage Document OCR is designed to efficiently detect and recognize text from a wide range of document images, ensuring high accuracy and versatility across various languages and image qualities.

Available models

Model | Release date | Description | Amazon SageMaker
ocr-2.2.1 | 2024-06-11 | An OCR model specialized for English and Korean, with additional support for Japanese and Chinese character sets (Hanja, Hanzi, and Kanji). Resilient against real-world images, including wrinkled papers and rotated text. | Document OCR

Understanding model output

Robustness on real-world documents

Our OCR model is designed to perform robustly across document processing scenarios that include rotated images, watermarks, noise, and checkboxes. It is trained on a high-quality dataset that covers a wide range of these scenarios. Because the model detects the upper-left corner of each word box even in rotated documents, and learns during training to ignore watermarks and checkboxes, only meaningful text is extracted from the document. This makes it a suitable choice for businesses and individuals who need accurate and efficient document processing.

namecard-ocr.png
Figure 1. Detecting words in rotated images.
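Because each word box starts at the word's upper-left corner, the tilt of a word can be estimated from its first two vertices. The following is a minimal sketch, assuming that vertices[0] is the upper-left corner and vertices[1] the upper-right corner, as the sample responses on this page suggest. The helper name word_angle_degrees is hypothetical and is not part of the API.

```python
import math


def word_angle_degrees(vertices):
    """Estimate the reading direction of a word box in degrees.

    Assumption: vertices[0] is the word's upper-left corner and
    vertices[1] its upper-right corner.
    """
    x0, y0 = vertices[0]["x"], vertices[0]["y"]
    x1, y1 = vertices[1]["x"], vertices[1]["y"]
    # Image coordinates grow downward, so a positive angle means the
    # top edge of the box tilts clockwise on screen.
    return math.degrees(math.atan2(y1 - y0, x1 - x0))


# Box of the word "hello," copied from the example response on this page.
hello_box = [
    {"x": 214, "y": 131},
    {"x": 470, "y": 149},
    {"x": 467, "y": 206},
    {"x": 210, "y": 188},
]
angle = word_angle_degrees(hello_box)
print(round(angle, 1))
```

An angle near 0 indicates upright text, while values near 90, 180, or -90 would indicate a page scanned sideways or upside down.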

Utilizing confidence score

During character recognition, Upstage OCR generates a confidence score that estimates how likely the recognized text is to be correct. The score is initially computed at the character level and then calibrated at the word level, which makes it more useful for applications. The confidence score can be used to visualize or verify the recognized content: lower scores highlight areas that need closer inspection or filtering. This helps assess the reliability of the extracted text and decide whether additional verification is needed, which improves overall accuracy and user trust.

ocr-highlight.png
Figure 2. An example application that highlights low-confidence entities in the OCR result.
hello.png
Figure 3. Example of Document OCR API response in JSON format with the confidence score.
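One way to act on word-level confidence scores is to collect the words that fall below a chosen threshold and send them for review. The sketch below assumes a response shaped like the JSON samples on this page. The helper name low_confidence_words, the threshold of 0.9, and the sample data are illustrative assumptions, not recommended values.

```python
def low_confidence_words(response, threshold=0.9):
    """Return (page_index, text, confidence) for words scoring below threshold."""
    flagged = []
    for page_index, page in enumerate(response.get("pages", [])):
        for word in page.get("words", []):
            if word["confidence"] < threshold:
                flagged.append((page_index, word["text"], word["confidence"]))
    return flagged


# Trimmed-down, made-up response used only to exercise the function.
sample = {
    "pages": [
        {
            "words": [
                {"text": "Invoice", "confidence": 0.95},
                {"text": "T0tal", "confidence": 0.62},
            ]
        }
    ]
}
flagged = low_confidence_words(sample, threshold=0.9)
print(flagged)
```

A suitable threshold depends on the application; a stricter value flags more words for manual verification.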

Requirements

  • Supported file formats: JPEG, PNG, BMP, PDF, TIFF, HEIC, DOCX, PPTX, XLSX
  • Maximum file size: 50MB
  • Maximum number of pages per file: 30 pages (For files exceeding 30 pages, the first 30 pages are processed)
  • Maximum pixels per page: 100,000,000 pixels. For non-image files, the pixel count is determined after converting to images at a standard of 300 DPI.
  • Supported character sets: Alphanumeric, Hangul, and Hanja are supported. Hanzi and Kanji are in beta, meaning they are available but not fully supported.
  • Text size: Optimized for text that is smaller than approximately 30% of the page size. Inputs that do not meet these requirements are considered unsuitable and may result in a response error.

Hanja, Hanzi, and Kanji are the forms of Chinese characters used in Korean, Chinese, and Japanese writing, respectively. Despite their similarities, they have distinct visual representations, pronunciations, meanings, and usage conventions within their respective linguistic contexts. For more information, see this article.
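The file-level limits in the requirements list can be checked on the client before a file is uploaded. The following is a rough sketch under stated assumptions: the helper check_document is hypothetical, the ".jpg" and ".tif" aliases are assumed to be accepted alongside ".jpeg" and ".tiff", and "50MB" is read as 50,000,000 bytes. The page limit and the text-size guideline are not checked here.

```python
import os

# Limits taken from the requirements list; the aliases ".jpg" and ".tif"
# are assumptions.
SUPPORTED_EXTENSIONS = {
    ".jpeg", ".jpg", ".png", ".bmp", ".pdf", ".tiff", ".tif",
    ".heic", ".docx", ".pptx", ".xlsx",
}
MAX_FILE_BYTES = 50 * 1000 * 1000  # "50MB" read as decimal megabytes (assumption)
MAX_PAGE_PIXELS = 100_000_000


def check_document(filename, size_bytes, page_size=None):
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_FILE_BYTES}")
    if page_size is not None:
        width, height = page_size
        if width * height > MAX_PAGE_PIXELS:
            problems.append(
                f"page has {width * height} pixels, limit is {MAX_PAGE_PIXELS}"
            )
    return problems


ok = check_document("scan.png", 1_200_000, page_size=(1200, 1600))
bad = check_document("notes.txt", 60_000_000)
print(ok)
print(bad)
```

For a file on disk, the size argument can be obtained with os.path.getsize(path).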

API reference

POST
/document-ai/ocr

Authorization

Authorization (header, required): Bearer <token>

The Authorization access token.

Request Body

multipart/form-data (required)

document (file, required)

The document file to be processed.

schema (string, optional)

Specifies the response format. If set, the output is converted to the format of the corresponding OCR API. Valid values are "clova", "google", or None. All formats are produced exclusively by Upstage models and are unrelated to the respective service providers. The default value is None.

model (string, optional)

Specifies the model version to use. Available models are listed in the "Available models" section of this document.
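The optional parameters can be validated before a request is built. This sketch assumes that schema and model are sent as ordinary form fields next to the document file, which the multipart/form-data request body suggests but this page does not show explicitly. The helper build_form_fields is hypothetical.

```python
VALID_SCHEMAS = {"clova", "google"}


def build_form_fields(schema=None, model=None):
    """Build the optional form fields; None means "omit and use the default"."""
    fields = {}
    if schema is not None:
        if schema not in VALID_SCHEMAS:
            raise ValueError(
                f"schema must be one of {sorted(VALID_SCHEMAS)} or None"
            )
        fields["schema"] = schema
    if model is not None:
        fields["model"] = model
    return fields


fields = build_form_fields(schema="clova", model="ocr-2.2.1")
print(fields)

# With the requests library, these fields could be passed next to the file:
# requests.post(url, headers=headers, files=files, data=fields)
```

Leaving both arguments as None produces an empty dictionary, so the request falls back to the default response format and model.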

import requests

api_key = "UPSTAGE_API_KEY"  # ex: up_xxxYYYzzzAAAbbbCCC
filename = "YOUR_FILE_NAME"  # ex: ./image.png

url = "https://api.upstage.ai/v1/document-ai/ocr"
headers = {"Authorization": f"Bearer {api_key}"}

# Open the file in a context manager so the handle is closed after the upload.
with open(filename, "rb") as document:
    files = {"document": document}
    response = requests.post(url, headers=headers, files=files)

print(response.json())

Success

{
  "apiVersion": "v1.1",
  "confidence": 0.98,
  "mimeType": "multipart/form-data",
  "modelVersion": "ocr-2.2.1",
  "numBilledPages": 5,
  "pages": [
    {
      "confidence": 0.97,
      "height": 1600,
      "width": 1200,
      "text": "This is the content of the page.",
      "words": [
        {
          "text": "Invoice",
          "confidence": 0.95,
          "boundingBox": {
            "vertices": [
              {
                "x": 50,
                "y": 75
              },
              {
                "x": 150,
                "y": 75
              },
              {
                "x": 150,
                "y": 100
              },
              {
                "x": 50,
                "y": 100
              }
            ]
          }
        }
      ]
    }
  ],
  "stored": true,
  "text": "This is the full document text.",
  "metadata": {
    "pageSize": "A4",
    "totalPages": 5
  }
}
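Each word carries a four-vertex bounding box, which is often easier to use once it is reduced to an axis-aligned rectangle, for example when cropping or drawing overlays. The sketch below assumes the response structure shown above. The helpers axis_aligned_box and word_rows are hypothetical names.

```python
def axis_aligned_box(vertices):
    """Return (left, top, right, bottom) enclosing a four-vertex word box."""
    xs = [vertex["x"] for vertex in vertices]
    ys = [vertex["y"] for vertex in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def word_rows(response):
    """Yield (page_index, text, confidence, box) for every word."""
    for page_index, page in enumerate(response["pages"]):
        for word in page["words"]:
            box = axis_aligned_box(word["boundingBox"]["vertices"])
            yield page_index, word["text"], word["confidence"], box


# The single word from the sample response above.
sample = {
    "pages": [
        {
            "words": [
                {
                    "text": "Invoice",
                    "confidence": 0.95,
                    "boundingBox": {
                        "vertices": [
                            {"x": 50, "y": 75},
                            {"x": 150, "y": 75},
                            {"x": 150, "y": 100},
                            {"x": 50, "y": 100},
                        ]
                    },
                }
            ]
        }
    ]
}
rows = list(word_rows(sample))
print(rows)
```

For rotated words, the axis-aligned rectangle is larger than the original quadrilateral, so it is better suited to cropping than to precise highlighting.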

Example

Request

hello.png
import requests

api_key = "UPSTAGE_API_KEY"
filename = "hello.png"

url = "https://api.upstage.ai/v1/document-ai/ocr"
headers = {"Authorization": f"Bearer {api_key}"}

# Open the file in a context manager so the handle is closed after the upload.
with open(filename, "rb") as document:
    files = {"document": document}
    response = requests.post(url, headers=headers, files=files)

print(response.json())

Response

{
    "apiVersion": "1.1",
    "confidence": 0.9924988460974842,
    "metadata": {
        "pages": [
            {
                "height": 256,
                "page": 1,
                "width": 786
            }
        ]
    },
    "mimeType": "multipart/form-data",
    "modelVersion": "ocr-2.2.1",
    "numBilledPages": 1,
    "pages": [
        {
            "confidence": 0.9924988460974842,
            "height": 256,
            "id": 0,
            "text": "Print the words \nhello, world",
            "width": 786,
            "words": [
                {
                    "boundingBox": {
                        "vertices": [
                            {
                                "x": 65,
                                "y": 52
                            },
                            {
                                "x": 221,
                                "y": 55
                            },
                            {
                                "x": 221,
                                "y": 104
                            },
                            {
                                "x": 64,
                                "y": 101
                            }
                        ]
                    },
                    "confidence": 0.9950619419121907,
                    "id": 0,
                    "text": "Print"
                },
                {
                    "boundingBox": {
                        "vertices": [
                            {
                                "x": 243,
                                "y": 49
                            },
                            {
                                "x": 341,
                                "y": 52
                            },
                            {
                                "x": 340,
                                "y": 105
                            },
                            {
                                "x": 241,
                                "y": 102
                            }
                        ]
                    },
                    "confidence": 0.9989913157886589,
                    "id": 1,
                    "text": "the"
                },
                {
                    "boundingBox": {
                        "vertices": [
                            {
                                "x": 368,
                                "y": 52
                            },
                            {
                                "x": 553,
                                "y": 51
                            },
                            {
                                "x": 553,
                                "y": 105
                            },
                            {
                                "x": 368,
                                "y": 105
                            }
                        ]
                    },
                    "confidence": 0.9890200556796326,
                    "id": 2,
                    "text": "words"
                },
                {
                    "boundingBox": {
                        "vertices": [
                            {
                                "x": 214,
                                "y": 131
                            },
                            {
                                "x": 470,
                                "y": 149
                            },
                            {
                                "x": 467,
                                "y": 206
                            },
                            {
                                "x": 210,
                                "y": 188
                            }
                        ]
                    },
                    "confidence": 0.9933670202605895,
                    "id": 3,
                    "text": "hello,"
                },
                {
                    "boundingBox": {
                        "vertices": [
                            {
                                "x": 527,
                                "y": 145
                            },
                            {
                                "x": 748,
                                "y": 143
                            },
                            {
                                "x": 749,
                                "y": 192
                            },
                            {
                                "x": 527,
                                "y": 194
                            }
                        ]
                    },
                    "confidence": 0.986053896846349,
                    "id": 4,
                    "text": "world"
                }
            ]
        }
    ],
    "stored": true,
    "text": "Print the words \nhello, world"
}
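The page-level text field separates lines with "\n", while the words array is flat. The sketch below regroups the word entries by line. It assumes that words are listed in reading order and that splitting each line of the page text on whitespace yields exactly the word texts. This holds for the response above, but it is not a documented guarantee. The helper words_by_line is hypothetical.

```python
def words_by_line(page):
    """Group a page's word entries according to the lines of page["text"]."""
    words = page["words"]
    lines = []
    position = 0
    for line in page["text"].split("\n"):
        tokens = line.split()
        group = words[position:position + len(tokens)]
        # Fail loudly if the reading-order assumption does not hold.
        if [word["text"] for word in group] != tokens:
            raise ValueError("page text and word list do not line up")
        lines.append(group)
        position += len(tokens)
    return lines


# Trimmed copy of the page from the response above (boxes omitted).
page = {
    "text": "Print the words \nhello, world",
    "words": [
        {"id": 0, "text": "Print"},
        {"id": 1, "text": "the"},
        {"id": 2, "text": "words"},
        {"id": 3, "text": "hello,"},
        {"id": 4, "text": "world"},
    ],
}
lines = words_by_line(page)
print([[word["text"] for word in line] for line in lines])
```

Each returned group keeps the full word entries, so bounding boxes and confidence scores remain available per line when the complete response is used.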
