Azure AI Document Intelligence now previewing field extraction with Generative AI and more

Vinod_Kurpad · Aug 19, 2024

Azure AI Document Intelligence is an AI service that provides a simple set of APIs and a studio experience to extract content, structure (such as tables, paragraphs, sections and figures) and fields, whether predefined for specific document types or custom for any document or form. With the Document Intelligence APIs, you can split, classify and extract fields or content from any document or form at scale.

Document Intelligence continues to provide more value for your document processing needs. We recently announced a price reduction for the custom extraction model from $50 to $30 per 1,000 pages, and we also lowered the price of commitment tiers for volume discounts. Learn more about these pricing changes and how you can maximize your value from using Document Intelligence.

A new API version is now in public preview, adding new and improved features!

Document field extraction with Generative AI

Document processing with Generative AI typically involves the RAG (Retrieval Augmented Generation) pattern for tasks like field extraction. For the field extraction task, you no longer need to manage the complexities of RAG, such as chunking and vectorizing documents, building and managing a search index, and tuning prompts!

With the new custom field extraction capability, you simply define your schema and let the model extract the fields you need. The Generative AI-based model provides a simplified experience, with tools to correct and improve the predicted results if needed. Once you have built a model, you can integrate it into your document processing workflows. Model outputs include grounded results and confidence scores, providing guardrails that help ensure the extracted values align with your business scenario and your existing tools and processes.
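The confidence scores mentioned above can act as a simple guardrail in a workflow. The sketch below is only an illustration, not the service's own logic: it assumes extracted fields arrive as a dictionary in which each field carries `content` and `confidence` keys, and the sample payload, field names and 0.8 threshold are all hypothetical. It separates fields that can be accepted automatically from those that should go to human review.

```python
from typing import Any

# Hypothetical threshold; tune it for your own business scenario.
REVIEW_THRESHOLD = 0.8


def split_by_confidence(
    fields: dict[str, dict[str, Any]], threshold: float = REVIEW_THRESHOLD
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (accepted, needs_review) field values based on confidence."""
    accepted: dict[str, Any] = {}
    needs_review: dict[str, Any] = {}
    for name, field in fields.items():
        # Treat a missing confidence score as 0.0 so the field gets reviewed.
        confidence = field.get("confidence", 0.0)
        target = accepted if confidence >= threshold else needs_review
        target[name] = field.get("content")
    return accepted, needs_review


# Hypothetical extraction result, shaped only for illustration.
sample_fields = {
    "InvoiceNumber": {"content": "INV-1001", "confidence": 0.97},
    "TotalAmount": {"content": "1,250.00", "confidence": 0.62},
}

accepted, needs_review = split_by_confidence(sample_fields)
print("accepted:", accepted)
print("needs review:", needs_review)
```

A single global threshold is the simplest choice; a per-field threshold may fit better when some fields (totals, account numbers) are more sensitive than others.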

Try out the new Generative AI-based field extraction model in the AI Studio today. Follow the quickstart to build a model for any of your documents. This new capability is currently available only in the North Central US region.

New Prebuilt models

While custom models offer the flexibility of training a model to extract a specific schema for any document type you need to process, prebuilt models offer the simplicity and cost benefit of extracting a defined schema from a specific document type. Document Intelligence continues to expand its prebuilt models for financial services, tax and mortgage scenarios. New models for bank statements, pay stubs, checks and mortgage forms 1004 and 1005 make processing these common document types easy. A new unified prebuilt model for all tax forms further simplifies classifying and analyzing tax documents. Try any of the new prebuilt models in the Document Intelligence Studio.

Searchable PDF output

Analysis results from Document Intelligence have always been JSON. With the current preview API, we’re now adding a searchable PDF output. Start with a PDF file, analyze the document with the prebuilt read model and generate a searchable PDF response that you can render in your apps with support for copy, paste and search. The searchable PDF currently works only with PDF input files and will be extended to include images. Try the new searchable PDF response by simply adding an output=PDF query string parameter to the input request. Learn more about searchable PDF.
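As a rough sketch of what adding the query string parameter looks like, the helper below builds an analyze URL for the prebuilt read model. The path layout, the lowercase `pdf` value (this post writes it as output=PDF) and the `contoso` endpoint are assumptions for illustration; check the REST reference for the exact request shape before relying on it.

```python
from urllib.parse import urlencode

API_VERSION = "2024-07-31-preview"  # preview API version named in this post


def searchable_pdf_analyze_url(endpoint: str, output: str = "pdf") -> str:
    """Build an analyze URL for the prebuilt read model with PDF output."""
    query = urlencode({"api-version": API_VERSION, "output": output})
    # Path layout is an assumption; confirm it against the REST reference.
    return (
        f"{endpoint.rstrip('/')}/documentintelligence/documentModels/"
        f"prebuilt-read:analyze?{query}"
    )


# Hypothetical resource endpoint.
url = searchable_pdf_analyze_url("https://contoso.cognitiveservices.azure.com/")
print(url)
```

The request itself (authentication headers, the PDF payload and polling for the result) is left out here, since only the query string changes compared with a regular analyze call.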

Layout update for charts and figures

Figure processing has been enhanced in this release with an option to retrieve each figure extracted from a document. Figure IDs use dot notation: the page number followed by the figure's index within that page, so the first figure on the first page is 1.1. The Layout response lists the detected figures in its figures section. To retrieve a specific figure, call the get results API again and append the figures/1.1 path to the GET analyze result call to get the figure object. This is useful when you are processing a document with LLMs and need to process figures specifically. A common pattern is to convert each figure, such as a pie chart, into a table in markdown format that can be embedded back into the text. Learn more about the updated Layout API with figures today.
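The dot notation and the extra path segment can be sketched as follows. This is a minimal illustration under stated assumptions: the URL layout, the `prebuilt-layout` model ID, the result ID and the endpoint are placeholders, not values taken from this post.

```python
API_VERSION = "2024-07-31-preview"  # preview API version named in this post


def parse_figure_id(figure_id: str) -> tuple[int, int]:
    """Split a dot-notation figure ID such as "1.1" into (page, index)."""
    page_text, index_text = figure_id.split(".")
    return int(page_text), int(index_text)


def figure_url(endpoint: str, model_id: str, result_id: str, figure_id: str) -> str:
    """Append figures/<id> to the GET analyze result path."""
    parse_figure_id(figure_id)  # raises ValueError on malformed IDs
    # Path layout is an assumption; confirm it against the REST reference.
    return (
        f"{endpoint.rstrip('/')}/documentintelligence/documentModels/{model_id}"
        f"/analyzeResults/{result_id}/figures/{figure_id}"
        f"?api-version={API_VERSION}"
    )


# Hypothetical endpoint, model ID and result ID.
page, index = parse_figure_id("1.1")
print(f"page {page}, figure {index}")
print(figure_url("https://contoso.cognitiveservices.azure.com",
                 "prebuilt-layout", "my-result-id", "1.1"))
```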

Batch API

The new batch API simplifies processing large volumes of documents. You provide a storage location or a list of files to work with, and the batch API processes them with a single API call. The batch API status lets you check for completion and identify failed or skipped files. Try the new batch API to simplify the processing of large volumes of documents.
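To make the two halves of that flow concrete, here is a small sketch: one helper that describes a batch job over a storage location, and one that groups per-file results by status. The key names (`azureBlobSource`, `containerUrl`, `resultContainerUrl`, `sourceUrl`, `status`) and the sample payload are assumptions for illustration and should be checked against the batch API reference.

```python
from collections import defaultdict


def build_batch_request(container_url: str, result_container_url: str,
                        prefix: str = "") -> dict:
    """Describe a batch job that reads from one storage location."""
    # Key names are assumptions; confirm them against the batch API reference.
    source = {"containerUrl": container_url}
    if prefix:
        source["prefix"] = prefix
    return {"azureBlobSource": source, "resultContainerUrl": result_container_url}


def group_by_status(details: list[dict]) -> dict[str, list[str]]:
    """Group per-file results so failed or skipped files are easy to find."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for item in details:
        grouped[item["status"]].append(item["sourceUrl"])
    return dict(grouped)


# Hypothetical status details for a finished batch job.
sample_details = [
    {"sourceUrl": "https://example.blob.core.windows.net/in/a.pdf", "status": "succeeded"},
    {"sourceUrl": "https://example.blob.core.windows.net/in/b.pdf", "status": "failed"},
    {"sourceUrl": "https://example.blob.core.windows.net/in/c.pdf", "status": "skipped"},
]
summary = group_by_status(sample_details)
print("failed:", summary.get("failed", []))
print("skipped:", summary.get("skipped", []))
```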

Unified classification and extraction with the updated model compose

Previously, by composing multiple custom models into a single model with model compose, you could classify and analyze a document in a single API call. With the addition of the explicit classification API, this became two calls: first classification, then analysis or extraction. The new model compose brings this back to a single API call while retaining the benefits of an explicit classification model. With model compose you can now classify and split an input file into multiple documents, analyze each document with the appropriate analysis model, use confidence-based routing and extend the analysis calls with add-on features like query fields. The updated model compose makes it easy to process large binders containing multiple documents, or scenarios where you don’t know the type of file being processed. Try the updated model compose in the Document Intelligence Studio today or learn more about composed models.
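Confidence-based routing is easiest to see in a few lines of code. The sketch below is a client-side illustration of the idea only; the service performs this routing inside the composed model, and the document types, model IDs and thresholds here are all hypothetical.

```python
from typing import Optional

# Hypothetical routing table: document type -> (model ID, minimum confidence).
ROUTES = {
    "invoice": ("my-invoice-model", 0.80),
    "contract": ("my-contract-model", 0.90),
}


def route_document(doc_type: str, confidence: float) -> Optional[str]:
    """Return the model to analyze with, or None if routing is not confident."""
    route = ROUTES.get(doc_type)
    if route is None:
        return None  # unknown document type
    model_id, min_confidence = route
    return model_id if confidence >= min_confidence else None


# Hypothetical classifier output for a binder split into three documents.
classified = [("invoice", 0.95), ("contract", 0.70), ("receipt", 0.99)]
routed = [route_document(doc_type, conf) for doc_type, conf in classified]
print(routed)
```

Documents that come back as None (a low-confidence contract and an unrecognized receipt in this sample) would typically be sent to a fallback model or a manual review queue.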

OCR model updates

This release includes updates to the OCR model for improved text extraction for a variety of scenarios including dense forms and scanned documents with lower resolution.

Get started with the preview features!

The preview updates are available only in a few select regions, including North Central US, East US, West US2 and West Europe. The API version is 2024-07-31-preview. The Generative AI-based field extraction is available only in North Central US.
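If you deploy across several regions, a small check like the one below can catch an unsupported region before any request is sent. It is a sketch based only on the regions named in this post, which may not be exhaustive and will change over time; the helper names are hypothetical.

```python
API_VERSION = "2024-07-31-preview"  # preview API version named in this post

# Regions named in this post; treat this as a snapshot, not a full list.
PREVIEW_REGIONS = {"northcentralus", "eastus", "westus2", "westeurope"}
GENERATIVE_EXTRACTION_REGIONS = {"northcentralus"}


def normalize(region: str) -> str:
    """Turn a display name like "West US2" into "westus2"."""
    return region.replace(" ", "").lower()


def supports(region: str, generative_extraction: bool = False) -> bool:
    """Check whether a region is listed for the preview (or generative) features."""
    allowed = GENERATIVE_EXTRACTION_REGIONS if generative_extraction else PREVIEW_REGIONS
    return normalize(region) in allowed


print(supports("West US2"))
print(supports("East US", generative_extraction=True))
```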

Visit the what's new page to learn more about all the new capabilities in Azure AI Document Intelligence.
