Data Extractions

Extract information from an uploaded file

Documents (often in PDF format) are widely used within the quote-and-bind workflow and in the insurance industry broadly. For example, many brokers collect application PDFs as part of the quote process. Others may require insureds to submit documents such as financial statements or location summaries as part of the application. As a result, in many cases, the data that carriers require to generate quotes is already captured by brokers in these documents.

However, these documents are often not consistently structured or readily machine-readable, making it a challenge to reliably extract data from them.

This feature lets you extract data from documents for easy use with Herald’s API.

This feature is in Beta and is subject to change. If you are interested in accessing it, please reach out to your Herald Customer Success representative.

Extraction use cases

When Herald performs an extraction, we take a source document (e.g., a PDF) and output structured data. Here are two common use cases for Herald’s data extraction endpoints.

Gathering information to pre-fill an application for submission to Herald. In this case, Herald takes as a source a file and extracts all of the information that can be used to populate risk and coverage parameters in Herald’s library. These can then be added into a Herald application and submission.
Extracting data to populate into a client’s system. The goal of this extraction is to store structured data in a client system of record. In these cases, the data may or may not be sent to a Herald submission.

Data Extractions Flow

Herald performs two types of data extraction.

Deterministic extractions: We configure pre-defined sets of mapping rules for how data in a file maps to the library of Herald parameters or into a client’s structured data model.
AI-based extractions: Our AI agent reads and interprets an application and maps the data into the target data model (Herald parameters or a client’s data model).

Herald chains these two together within our GET /data_extractions endpoint. Specifically: we maintain a set of document templates that represent rule sets for deterministic extraction. When you send a file to /data_extractions, we first check if it corresponds to one of these document templates. If it does, we apply deterministic extraction. If it does not, we fall back to AI-based extraction.

You can view the list of document templates which are supported for deterministic extraction using the GET /document_templates endpoint.
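
The fallback order described above can be sketched in a few lines. This is only an illustration of the decision logic, not Herald's implementation: the DocumentTemplate class, its fingerprint field, and the exact-match comparison are hypothetical stand-ins, since this page does not describe how a file is matched to a document template.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DocumentTemplate:
    """Hypothetical stand-in for one entry returned by GET /document_templates."""
    template_id: str
    fingerprint: str  # assumed: some signature identifying a known form layout


def match_template(file_fingerprint: str,
                   templates: List[DocumentTemplate]) -> Optional[DocumentTemplate]:
    """Return the first template whose fingerprint equals the file's, else None."""
    for template in templates:
        if template.fingerprint == file_fingerprint:
            return template
    return None


def choose_extraction(file_fingerprint: str,
                      templates: List[DocumentTemplate],
                      ai_enabled: bool = True) -> str:
    """Pick the extraction path: deterministic first, then AI, else unsupported."""
    if match_template(file_fingerprint, templates) is not None:
        return "deterministic"
    if ai_enabled:
        return "ai"
    # With AI-based extraction turned off, an unrecognized file cannot be processed.
    return "unsupported"
```

For example, with a single known template, a matching file takes the deterministic path, and any other file falls back to AI-based extraction when it is enabled.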

A Data Extraction can have the following statuses:

Status	Description
available	The data extraction is complete
pending	The data extraction is in progress
failed	The data extraction process failed
unsupported	[only possible when AI-enabled extraction is turned off] The file sent for extraction is not recognized for deterministic extraction
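
A client will typically need to wait until an extraction reaches a final status. The sketch below shows one way to do that, assuming that pending means the extraction has not finished yet and that the response is a dictionary with a status key. The fetch callable is a hypothetical wrapper around whatever request retrieves the extraction; the retry count and delay are arbitrary.

```python
import time
from typing import Any, Callable, Dict


def wait_for_extraction(fetch: Callable[[], Dict[str, Any]],
                        max_attempts: int = 10,
                        delay_seconds: float = 1.0,
                        sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Call fetch() until the extraction is available or a terminal status appears.

    fetch is a hypothetical callable that returns the current extraction
    as a dict with a "status" key (an assumed response shape).
    """
    for _ in range(max_attempts):
        extraction = fetch()
        status = extraction.get("status")
        if status == "available":
            return extraction
        if status == "unsupported":
            raise ValueError("file is not recognized for deterministic extraction")
        if status != "pending":
            # Any other status is treated as terminal rather than retried.
            raise RuntimeError(f"extraction ended with status {status!r}")
        sleep(delay_seconds)
    raise TimeoutError("extraction was still pending after the last attempt")
```

Passing sleep as a parameter keeps the function easy to test without real delays.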

What files can Herald extract data from

Format:

For additional file type and format needs, please reach out to your Herald Customer Success representative.

How data extraction works

Upload a PDF file via the POST /files endpoint, and we will return the extracted data mapped to Herald parameters. You will not receive data that cannot be mapped to Herald parameters or that fails our input validation.
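
The two steps (upload, then extract) could be wired together on the client side roughly as follows. This is a sketch under stated assumptions: upload_file and request_extraction are hypothetical callables that wrap the POST /files and /data_extractions requests, and the idea that the upload returns a file identifier to pass to the extraction is an assumption, not something this page specifies. The only fixed fact used here is that PDF files begin with the bytes %PDF-.

```python
from typing import Any, Callable, Dict


def extract_from_pdf(pdf_bytes: bytes,
                     upload_file: Callable[[bytes], str],
                     request_extraction: Callable[[str], Dict[str, Any]]) -> Dict[str, Any]:
    """Upload a PDF and request an extraction for it.

    upload_file: hypothetical wrapper around POST /files, assumed to return a file id.
    request_extraction: hypothetical wrapper around /data_extractions.
    """
    # Cheap local check before uploading: every PDF starts with this header.
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("expected a PDF file")
    file_id = upload_file(pdf_bytes)
    return request_extraction(file_id)
```

Injecting the two callables keeps authentication, base URLs, and payload shapes out of the sketch, so they can follow the actual API reference.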

/data_extractions
