Data Extractions: Step-by-step

Learn how to streamline your workflows by extracting relevant values from PDF files.

Step 1: Upload your application form

The first step is to send the file to Herald. This can be accomplished using the [.h-endpoint-link]POST /files[.h-endpoint-link] endpoint.

Note that this endpoint requires the request body to be formatted as [.h-code]multipart/form-data[.h-code] instead of [.h-code]application/json[.h-code].

Here is an example to demonstrate this using the [.h-code]curl[.h-code] utility:

  
curl --request POST \
  --url https://sandbox.heraldapi.com/files \
  --header 'Accept: application/json' \
  --header 'Authorization: Bearer api-key-goes-here' \
  --header 'Content-Type: multipart/form-data' \
  --form file=@/path/to/your/file.pdf
  

The endpoint also takes an additional “type” field in the request body, which you can leave null for this workflow.

Retrieve the corresponding file id from the API response.

Example response:

POST /files

{
  "file": {
    "id": "d7c4579a-5450-4e79-bcfc-e918b3c8a564",
    "format": "pdf",
    "file_name": "herald_quote_summary_d7c4579a-5450-4e79-bcfc-e918b3c8a564",
    "text": "Application Prefill",
    "created_at": "2022-08-11",
    "size": 2470,
    "status": "available",
    "associations": null
  }
}
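
If you are not calling the API from the command line, the same upload can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_multipart` and `upload_file` are hypothetical, the `application/pdf` part type is an assumption, and the host and headers are copied from the curl example above.

```python
import json
import os
import urllib.request
import uuid

BASE_URL = "https://sandbox.heraldapi.com"  # sandbox host from the curl example


def build_multipart(field_name: str, file_name: str, content: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body.

    Returns the encoded body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{file_name}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"  # assumed part type for a PDF upload
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def upload_file(path: str, api_key: str) -> str:
    """POST the file to /files and return the file id from the response."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart("file", os.path.basename(path), fh.read())
    request = urllib.request.Request(
        f"{BASE_URL}/files",
        data=body,
        method="POST",
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {api_key}",
            "Content-Type": content_type,
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["file"]["id"]
```

Calling `upload_file("/path/to/your/file.pdf", "api-key-goes-here")` would return the id string to use in Step 2.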
 


Step 2: Create a data extraction

Create a data extraction from the uploaded file via the [.h-endpoint-link]POST /data_extractions[.h-endpoint-link] endpoint.

Example request:

POST /data_extractions

{"file_id": "d7c4579a-5450-4e79-bcfc-e918b3c8a564"}
 

You should expect a response that looks like the following, where the status is pending and the parameter values are null. This response indicates that your file is being processed.

Example response:

POST /data_extractions

{
  "data_extraction": {
    "id": "497f6eca-6276-4993-bfeb-53cbbbba6f08",
    "status": "pending",
    "risk_values": [
      {
        "risk_parameter_id": "rsk_14b8_fein",
        "value": "XX-XXXXXXX"
      }
    ],
    "coverage_values": [
      {
        "coverage_parameter_id": "cvg_wsz8_gl_general_aggregate_limit",
        "value": 2000000
      }
    ],
    "created_at": "2023-10-11T21:51:52.737Z",
    "updated_at": "2023-10-11T21:51:52.737Z"
  }
}
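
The request above can also be sent from Python. The sketch below assumes the same sandbox host and Bearer authorization header as the Step 1 curl example; `build_extraction_request` and `create_extraction` are hypothetical helper names, not part of any Herald client library.

```python
import json
import urllib.request

BASE_URL = "https://sandbox.heraldapi.com"  # sandbox host from the Step 1 example


def build_extraction_request(file_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST /data_extractions request."""
    payload = json.dumps({"file_id": file_id}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/data_extractions",
        data=payload,
        method="POST",
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def create_extraction(file_id: str, api_key: str) -> str:
    """Send the request and return the id of the new data extraction."""
    request = build_extraction_request(file_id, api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["data_extraction"]["id"]
```

Keep the returned id: it is the value you pass to the endpoint in Step 3.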
 


Step 3: Get your extraction results

Once the data extraction has been processed (expect a wait of under 15 seconds), you can send a request to [.h-endpoint-link]GET /data_extractions/{data_extraction_id}[.h-endpoint-link] with the data extraction id to retrieve the results. You can learn that processing has finished either by polling this endpoint periodically or by listening for webhooks.

The response body should include the set of risk and coverage values that have been extracted based on all applicable information in the PDF.
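
The polling option might be sketched as the loop below. It is a sketch under stated assumptions: `wait_for_extraction` is a hypothetical helper, the interval and timeout defaults are arbitrary, and any status other than "pending" is treated as final because this guide only shows "pending" and "available". The `fetch` argument is whatever function you use to call the GET endpoint and return the parsed JSON body.

```python
import time
from typing import Callable


def wait_for_extraction(
    fetch: Callable[[], dict],
    interval_seconds: float = 2.0,
    timeout_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the extraction is no longer pending.

    Raises TimeoutError if the extraction is still pending once
    timeout_seconds have elapsed.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        extraction = fetch()["data_extraction"]
        if extraction["status"] != "pending":
            return extraction
        if time.monotonic() >= deadline:
            raise TimeoutError("data extraction is still pending")
        sleep(interval_seconds)
```

Passing `fetch` and `sleep` in as arguments keeps the loop independent of any HTTP library and makes it easy to exercise with canned responses.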

Example response:

GET /data_extractions/{data_extraction_id}

{
  "data_extraction": {
    "id": "7aca2557-584f-40de-bbf0-9bdbee7c9fd5",
    "status": "available",
    "file_id": "7476b790-ed78-4d2e-a320-f238a9584a25",
    "risk_values": [
      {
        "value": "ACME Inc.",
        "risk_parameter_id": "rsk_m4p9_insured_name"
      },
      {
        "risk_parameter_id": "rsk_jsy2_primary_address",
        "value": {
          "line1": "100 Main St",
          "line2": null,
          "line3": null,
          "city": "Somerville",
          "state": "MA",
          "postal_code": "02144",
          "country_code": "USA",
          "organization": null
        }
      },
      {
        "value": 3000000,
        "risk_parameter_id": "rsk_vrb1_total_annual_revenue"
      },
      {
        "value": "yes",
        "risk_parameter_id": "rsk_7ahp_has_domain"
      },
      {
        "value": "example.com",
        "risk_parameter_id": "rsk_dy7r_domain_names"
      }
    ],
    "coverage_values": null,
    "created_at": "2024-10-24T14:57:43.624Z",
    "updated_at": "2024-10-24T14:59:12.031Z"
  }
}
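
To pre-fill a form from a response shaped like the example above, it can help to index the extracted values by parameter id. The helper below is a hypothetical sketch that relies only on the field names visible in the example, and it treats a null `risk_values` or `coverage_values` as an empty list.

```python
def values_by_parameter(extraction: dict) -> dict:
    """Map each risk or coverage parameter id to its extracted value."""
    values = {}
    for item in extraction.get("risk_values") or []:
        values[item["risk_parameter_id"]] = item["value"]
    for item in extraction.get("coverage_values") or []:
        values[item["coverage_parameter_id"]] = item["value"]
    return values
```

With the example response, `values_by_parameter(body["data_extraction"])["rsk_m4p9_insured_name"]` would be `"ACME Inc."`.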
 


[Optional] View all extractions associated with a file

If you created multiple extractions on the same file object (for example, because you were unsatisfied with the results of an earlier extraction), you can review all historical extractions associated with the file by querying the [.h-endpoint-link]GET /data_extractions[.h-endpoint-link] endpoint with a file id.

If you expect a large number of extractions associated with the file, you can also include a limit and a page parameter in the request to paginate the results.
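
A paginated listing URL could be assembled as in the sketch below. The query parameter name `file_id` is an assumption (the guide only says to query "with a file id"), `limit` and `page` follow the wording above, and `list_extractions_url` is a hypothetical helper; check the API reference for the exact parameter names.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://sandbox.heraldapi.com"  # sandbox host from the Step 1 example


def list_extractions_url(
    file_id: str, limit: Optional[int] = None, page: Optional[int] = None
) -> str:
    """Build the GET /data_extractions URL for one file, optionally paginated."""
    params = {"file_id": file_id}  # assumed query parameter name
    if limit is not None:
        params["limit"] = limit
    if page is not None:
        params["page"] = page
    return f"{BASE_URL}/data_extractions?{urlencode(params)}"
```

Send a GET request to the resulting URL with the same authorization header used in the earlier steps.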

[Optional] Simulate pre-fill workflow in HeRB

You can also test the above workflow in HeRB by clicking the “Extract data from a file” link.