Document Intelligence preview adds query fields, new prebuilt models and other improvements

BemaBonsu · Nov 16, 2023

Announcing Azure AI Document Intelligence preview

Azure AI Document Intelligence, formerly known as Form Recognizer, is an AI service for all your document understanding needs. With the latest update, Azure AI Document Intelligence is previewing new features such as markdown output for semantic chunking in the retrieval-augmented generation (RAG) pattern with large language models (LLMs), language expansion, field expansion, and new prebuilt models. We are also happy to announce structure analysis updates and quality improvements to tables, reading order, and section headings.

Azure AI Document Intelligence has three categories of models. General extraction models, which include Read and Layout, extract content, document structure, and fields from any form or document. Prebuilt models extract a defined schema for a specific document type; examples include invoice, W-2, ID document, and many more. Finally, custom models classify and extract fields from document types specific to your scenario or use case.

The current generally available (GA) version of the service is being enhanced with a new and updated set of preview capabilities.

What is new in Preview?

Form Recognizer is now Azure AI Document Intelligence

We are thrilled to announce a significant evolution in our product's journey. As we continue to integrate broader capabilities, it became clear that a new name was needed to reflect our service's full potential and vision. This change encapsulates our commitment to provide a comprehensive, intuitive, and seamless experience in document processing and analysis. Azure AI Document Intelligence continues to provide the trusted functionality you rely on, and under the new name we will continue to innovate on document scenarios. We are excited for you to join us as we continue to redefine what's possible with document analytics!

Try out the new Document Intelligence Studio to experience all the new and updated capabilities.

Layout

Often, the challenge in document processing lies in the complexity of layouts that traditional models overlook. The Layout model continues to help resolve these complexities with a list of notable advancements. Table recognition, region grouping, and reading order have all seen AI quality improvements. But that's not all: Layout now extends format support to Office document types such as Word, PowerPoint, and Excel, as well as HTML inputs, extracting checkboxes, tables, and paragraphs, thereby broadening the scope of document types manageable within your workflow.

Markdown output

Markdown is a popular input used to enable semantic chunking in RAG (retrieval-augmented generation). In this preview, we are adding markdown as an output option for Layout to make data preprocessing for LLMs easier with Azure AI Document Intelligence. Use the markdown content from Layout to split documents based on paragraph boundaries, create specific chunks for tables, and fine-tune your chunking strategy to improve the quality of the generated responses.
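One way to act on this is sketched below in Python: a minimal chunker that splits markdown at blank lines, keeps each pipe table as its own chunk, and tags every chunk with the nearest preceding heading. It assumes that tables arrive as markdown pipe tables and that blocks are separated by blank lines; the function name chunk_markdown and the sample text are illustrative only and are not part of the service.

```python
import re
from typing import Dict, List


def chunk_markdown(markdown: str) -> List[Dict[str, str]]:
    """Split markdown into paragraph and table chunks, each tagged with its heading."""
    chunks: List[Dict[str, str]] = []
    heading = ""
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", markdown.strip()):
        block = block.strip()
        if not block:
            continue
        lines = block.splitlines()
        # A leading '#' line updates the current heading instead of becoming a chunk.
        if lines[0].lstrip().startswith("#"):
            heading = lines[0].lstrip().lstrip("#").strip()
            lines = lines[1:]
            if not lines:
                continue
        # A block whose lines all start with '|' is treated as one table chunk.
        is_table = all(line.lstrip().startswith("|") for line in lines)
        chunks.append({
            "heading": heading,
            "kind": "table" if is_table else "text",
            "text": "\n".join(lines),
        })
    return chunks


# Made-up sample standing in for Layout's markdown output.
sample = """# Invoice

Payment is due in 30 days.

| Item | Qty |
| --- | --- |
| Widget | 2 |

## Notes

Ship by air."""

for chunk in chunk_markdown(sample):
    print(chunk["heading"], chunk["kind"], len(chunk["text"]))
```

Carrying the heading with each chunk is one design choice among several; it gives a retriever some section context without merging unrelated paragraphs.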

Key value pairs

The general document model identified key value pairs from documents in addition to the layout results. This release streamlines the APIs by removing the general document model and making the key value pairs output available as an optional add-on feature in the layout model. This add-on feature is ideal for identifying clearly defined key value pairs or form fields.

The general document model will continue to be available in previous versions of the API. Moving forward, we recommend using the new key value pairs add-on feature for this use case.

Try the updated layout model with key value pairs in the Document Intelligence Studio.

Extend model schema with Query Fields

Sometimes the fields you need to process a document effectively are not recognized by the prebuilt models or returned as key value pairs. With the new query fields capability, Azure AI Document Intelligence now leverages powerful models to identify and extract the specific fields you require to process your documents.

Query fields are supported as an add-on feature with Layout, prebuilt models, and custom extraction models.

Query fields on invoices

For example, query fields can extract two additional fields that the invoice prebuilt model does not currently support: “Requisitioner” and “Shipped Via”. With query fields, these additional fields are extracted and added to the prebuilt output.

Try the new query fields feature in the Document Intelligence Studio.
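Outside the Studio, a query fields request for an invoice might be prepared over REST roughly as follows. This Python sketch only builds the request and does not send it. The URL path, the features and queryFields query parameters, the urlSource body property, and the Ocp-Apim-Subscription-Key header are assumptions based on the 2023-10-31-preview REST reference and should be checked against it; the endpoint, key, document URL, the helper name build_analyze_request, and the field spelling ShippedVia are placeholders.

```python
import json
from typing import Sequence
from urllib.parse import urlencode
from urllib.request import Request

API_VERSION = "2023-10-31-preview"  # preview API version named in this post


def build_analyze_request(
    endpoint: str,
    key: str,
    model_id: str,
    document_url: str,
    features: Sequence[str] = (),
    query_fields: Sequence[str] = (),
) -> Request:
    """Prepare (but do not send) an analyze request with optional add-on features."""
    params = [("api-version", API_VERSION)]
    if features:
        # Assumed parameter name: comma-separated add-on features.
        params.append(("features", ",".join(features)))
    if query_fields:
        # Assumed parameter name: comma-separated field names to extract.
        params.append(("queryFields", ",".join(query_fields)))
    url = (
        f"{endpoint.rstrip('/')}/documentintelligence/documentModels/"
        f"{model_id}:analyze?{urlencode(params, safe=',')}"
    )
    body = json.dumps({"urlSource": document_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return Request(url, data=body, headers=headers, method="POST")


# Placeholder endpoint, key, and document URL; replace with your own values.
request = build_analyze_request(
    endpoint="https://example.cognitiveservices.azure.com",
    key="<your-key>",
    model_id="prebuilt-invoice",
    document_url="https://example.com/invoice.pdf",
    features=["queryFields"],
    query_fields=["Requisitioner", "ShippedVia"],
)
print(request.get_method(), request.full_url)
```

Under the same assumptions, the key value pairs add-on described earlier would be requested by passing features=["keyValuePairs"] with model_id="prebuilt-layout".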

Prebuilt Models

Prebuilt models offer an out-of-the-box solution that provides the information you want quickly and with minimal effort. There have been various improvements to many of the prebuilt models we offer.

New 1099 tax form

We continue to expand the supported US tax forms with the introduction of the 1099 tax form and its many variations including: A, B, C, CAP, DIV, G, H, INT, K, LS, LTC, MSC, NEC, OID, PTR, Q, QA, R, S, SA, SB.

Try the new US Tax 1099 model in the Document Intelligence Studio.

Invoices

The invoice model has added support for the KVK number in the Dutch locale and for B-pay in the Australian locale. Along with these new fields, AI quality has improved across many of the languages that the service supports.

Try these new fields in the invoice prebuilt model.

Health Insurance cards

The health insurance card model adds fields to support Medicare and Medicaid information as well as AI quality improvements.

Try these new fields in the Health Insurance Card model.

Language Expansion

Custom Classification

Custom classification models are deep-learning models that combine layout and language features to accurately detect and identify the documents you process within your application. Until this release, this model was available only for English-language documents. With this release, custom classification models support documents in 35 languages.

The classification model supports document splitting for scenarios where a single file contains multiple logical documents. The new splitMode parameter added to the API provides finer-grained control over the splitting behavior.
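As a rough illustration, the Python sketch below builds a classifier analyze URL with an explicit split setting. Treat the details as assumptions to verify against the REST reference: the setting is assumed to surface as a query parameter named split that accepts auto, none, or perPage, and the URL path is assumed as well. The endpoint, classifier name, and the helper name classifier_analyze_url are placeholders.

```python
from urllib.parse import urlencode

API_VERSION = "2023-10-31-preview"  # preview API version named in this post

# Assumed split values; verify against the REST reference.
ALLOWED_SPLIT_VALUES = ("auto", "none", "perPage")


def classifier_analyze_url(endpoint: str, classifier_id: str, split: str = "auto") -> str:
    """Build the analyze URL for a custom classifier with an explicit split setting."""
    if split not in ALLOWED_SPLIT_VALUES:
        raise ValueError(f"split must be one of {ALLOWED_SPLIT_VALUES}, got {split!r}")
    # Assumed query parameter name: 'split'.
    query = urlencode({"api-version": API_VERSION, "split": split})
    return (
        f"{endpoint.rstrip('/')}/documentintelligence/documentClassifiers/"
        f"{classifier_id}:analyze?{query}"
    )


# Placeholder endpoint and classifier name.
print(classifier_analyze_url("https://example.cognitiveservices.azure.com", "my-classifier", "perPage"))
```

Validating the value locally is a small convenience so that a typo fails before any request is made.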

Try out the updated custom classification model in the Document Intelligence Studio.

Read

Navigating multilingual documents is a breeze with the Read API's latest enhancement, which adds handwritten text support for Russian (RU), Arabic (AR), and Thai (TH).

See the complete list of over 300 supported languages, or try the updated Read model in the Document Intelligence Studio.

Get started with the preview features!

The preview updates are available only in select regions, including East US, West US2, and West Europe. The API version is 2023-10-31-preview. Check the documentation for the updated SDK.

Visit the what's new page to learn more about all the new capabilities in Azure AI Document Intelligence.
