What is Optical Character Recognition (OCR)?
Upstage Document OCR is designed to efficiently detect and recognize text from a wide range of document images, ensuring high accuracy and versatility across various languages and image qualities.
Available models
Model | Release date | Description | Amazon SageMaker |
---|---|---|---|
ocr-2.2.1 | 2024-06-11 | An OCR model specialized for English and Korean, with additional support for Japanese and Chinese character sets (Hanja, Hanzi, and Kanji). Resilient against real-world images, including wrinkled papers and rotated text. | Document OCR |
Understanding model output
Robustness on real-world documents
Our OCR model is designed to perform robustly across varied document processing scenarios, including rotated images, watermarks, noise, and checkboxes. It is trained on a high-quality dataset that covers a wide range of these scenarios. The model accurately detects the upper-left corner of word boxes in rotated documents, and it is trained to ignore watermarks and checkboxes, so only meaningful text is extracted from the document. This makes it well suited to businesses and individuals who need accurate and efficient document processing.

Utilizing confidence score
Upstage OCR generates a confidence score during character recognition that estimates how likely the recognized text is to be correct. The score is initially computed at the character level and then calibrated at the word level to make it more useful for applications. It can be used to visualize or verify the recognized content: lower scores highlight areas that need closer inspection or filtering. This helps assess the reliability of the extracted text and decide whether additional verification is needed.
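The filtering idea described above can be sketched as follows. This is a minimal illustration, not the documented response format: the `text` and `confidence` keys, the sample words, and the 0.9 threshold are all assumptions chosen for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical word record: {"text": str, "confidence": float in [0, 1]}.
# The real response fields may be named differently.
Word = Dict[str, object]


def split_by_confidence(
    words: List[Word], threshold: float = 0.9
) -> Tuple[List[Word], List[Word]]:
    """Split words into (accepted, flagged) by word-level confidence."""
    accepted: List[Word] = []
    flagged: List[Word] = []
    for word in words:
        if float(word["confidence"]) >= threshold:
            accepted.append(word)
        else:
            flagged.append(word)  # candidate for manual review or filtering
    return accepted, flagged


# Made-up sample data for illustration only.
sample_words = [
    {"text": "Invoice", "confidence": 0.98},
    {"text": "T0tal", "confidence": 0.41},
    {"text": "2024-06-11", "confidence": 0.93},
]

accepted, flagged = split_by_confidence(sample_words, threshold=0.9)
print([w["text"] for w in accepted])  # ['Invoice', '2024-06-11']
print([w["text"] for w in flagged])   # ['T0tal']
```

A suitable threshold depends on the application. A review workflow might route only the flagged words to a human, while a search index might simply drop them.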


Requirements
- Supported file formats: JPEG, PNG, BMP, PDF, TIFF, HEIC, DOCX, PPTX, XLSX
- Maximum file size: 50MB
- Maximum number of pages per file: 30 pages (For files exceeding 30 pages, the first 30 pages are processed)
- Maximum pixels per page: 100,000,000 pixels. For non-image files, the pixel count is determined after converting to images at a standard of 300 DPI.
- Supported character sets: Alphanumeric, Hangul, and Hanja are supported. Hanzi and Kanji are in beta: they are available but not fully supported.
- Text size: Optimized for text that is under approximately 30% of the page size. Inputs that do not meet this guideline are considered unsuitable and could result in a response error.
Hanja, Hanzi, and Kanji are the Chinese-character-based scripts used in Korean, Chinese, and Japanese writing, respectively. Despite their similarities, they differ in visual representation, pronunciation, meaning, and usage conventions within their respective languages. For more information, see this article.
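The file limits listed in the requirements can be checked on the client before uploading. The sketch below is one possible pre-flight check, not part of the Upstage API. The function names, the mapping from formats to file extensions, and the reading of 50MB as 50 × 1024 × 1024 bytes are assumptions.

```python
import os
from typing import List

# Extension mapping is an assumption; the source lists formats, not extensions.
SUPPORTED_EXTENSIONS = {
    ".jpeg", ".jpg", ".png", ".bmp", ".pdf", ".tiff", ".tif",
    ".heic", ".docx", ".pptx", ".xlsx",
}
MAX_FILE_BYTES = 50 * 1024 * 1024   # assumes 1 MB = 1024 * 1024 bytes
MAX_PAGES = 30
MAX_PIXELS_PER_PAGE = 100_000_000
RENDER_DPI = 300                    # conversion standard for non-image files


def rendered_pixels(width_in: float, height_in: float, dpi: int = RENDER_DPI) -> int:
    """Pixel count of a page of the given size (inches) rendered at `dpi`."""
    return round(width_in * dpi) * round(height_in * dpi)


def check_document(filename: str, size_bytes: int, page_pixels: List[int]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems: List[str] = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file format: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds the 50MB limit")
    if len(page_pixels) > MAX_PAGES:
        problems.append("only the first 30 pages will be processed")
    # Pages past the first 30 are not processed, so they are not checked.
    for number, pixels in enumerate(page_pixels[:MAX_PAGES], start=1):
        if pixels > MAX_PIXELS_PER_PAGE:
            problems.append(f"page {number} exceeds 100,000,000 pixels")
    return problems


# A US Letter page (8.5 x 11 in) at 300 DPI is 2550 x 3300 pixels.
letter = rendered_pixels(8.5, 11)
print(letter)                                          # 8415000
print(check_document("scan.png", 1_024, [letter]))     # []
```

For a PDF or Office file, the per-page pixel counts would come from the page dimensions, as in `rendered_pixels`; for an image file, they are simply width × height.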
API reference
Authorization
Bearer <token> (required)
In: header
Request Body
multipart/form-data
document (required)
file
The document file to be processed.
schema
string
An optional parameter that specifies the response format. If set, the output is converted to the format of the corresponding OCR API. Valid values are "clova", "google", or None. All output is produced exclusively by Upstage models and is unrelated to the named service providers. The default value is None.
model
string
An optional parameter that specifies the model version to be used. Available models can be found at the top of this document.
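The request described above (a Bearer token in the `Authorization` header and a `multipart/form-data` body with a required `document` file plus optional `schema` and `model` fields) could be assembled roughly as follows. This sketch uses only the Python standard library. The endpoint URL is a placeholder, because the real URL does not appear in this section, and the function name `build_ocr_request` is hypothetical.

```python
import urllib.request
import uuid
from typing import Optional

# Placeholder only: substitute the real endpoint from the official API reference.
OCR_URL = "https://api.example.com/ocr"

VALID_SCHEMAS = {"clova", "google"}


def build_ocr_request(
    api_key: str,
    filename: str,
    file_bytes: bytes,
    schema: Optional[str] = None,
    model: Optional[str] = None,
    url: str = OCR_URL,
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST request."""
    if schema is not None and schema not in VALID_SCHEMAS:
        raise ValueError(f"schema must be one of {sorted(VALID_SCHEMAS)} or None")

    boundary = uuid.uuid4().hex
    parts = []

    # Optional text fields are included only when set.
    for name, value in (("schema", schema), ("model", model)):
        if value is not None:
            header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            parts.append(header.encode() + value.encode() + b"\r\n")

    # Required file field named "document".
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="document"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode() + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())

    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(url, data=b"".join(parts), headers=headers, method="POST")
```

To send the request, pass the result to `urllib.request.urlopen` and decode the response body as JSON. An HTTP client library with built-in multipart support would shorten this considerably; the manual construction is shown only to make the header and each form field explicit.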
Success
Example
Request
