Data Cleaning
Herald applies a set of cleaning rules that transform the values of risk, coverage, and admin parameters. The rules run both when values are submitted to an application and when a submission is made, so that every value sent to an institution is consistently formatted.
Rules
The table below lists every rule Herald applies to values. The Input Types section shows which of these rules apply to each parameter, based on its [.h-code-link]input_type[.h-code-link].
[.icon-circle-blue][.icon-circle-blue] This is the full set of rules as of 03/28/2024
Transformation | Description | Examples | Notes |
---|---|---|---|
stringToNumber | Converts a string to a number | "1,000,000" → 1000000; "$100" → 100 | Only handles correctly placed thousands commas and a single leading "$". Does not handle "10,00" or "$$100". |
cleanCharacter | Translates certain Unicode characters to ASCII replacements | ” (U+201D) → " (U+0022); ‘ (U+2018) → ' (U+0027) | See the full list in Unicode Translations. |
handleNone | If "None of the above" is submitted along with other selections, keeps only "None of the above" | ["None of the above", "Foobar"] → ["None of the above"] | |
cleanDomain | Removes a leading http:// or https:// | "https://foo.com" → "foo.com" | |
cleanDate | Converts dates to match the pattern YYYY-MM-DD | "2023-11-4" → "2023-11-04" | |
trimWhiteSpace | Removes leading and trailing whitespace | " foo " → "foo" | |
cleanPhone | Removes non-digit characters and, if present, a leading 1 (country code) | "+1 (222) 333-4444" → "2223334444" | |
cleanPostalCode | Removes non-digit characters and anything after the first 5 digits | "02144-1234" → "02144" | |
emptyStringToNull | Converts an empty string to null | "" → null | ⚠️ Currently applied only to address and claim_event parameters. |
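The following Python sketch approximates the rules above and reproduces each example from the table. It is an illustration under stated assumptions, not Herald's implementation: all function names are hypothetical, malformed numbers are assumed to pass through unchanged, cleanDate is assumed to receive a hyphen-separated year-month-day date, and the "None of the above" label is assumed to match exactly. cleanCharacter is left out because it is a lookup over the Unicode Translations table.

```python
from __future__ import annotations

import re
from datetime import date

# Hypothetical Python approximations of Herald's cleaning rules.
# Not Herald's code; behavior beyond the documented examples is assumed.

NONE_OF_THE_ABOVE = "None of the above"  # assumed exact option label

# Digits with correctly placed thousands commas, or plain digits, optionally with decimals.
_NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")


def string_to_number(value: str) -> int | float | str:
    """stringToNumber: strip one leading "$" and thousands commas."""
    text = value[1:] if value.startswith("$") else value
    if not _NUMBER.fullmatch(text):
        return value  # assumption: unhandled input ("10,00", "$$100") is left as-is
    digits = text.replace(",", "")
    return float(digits) if "." in digits else int(digits)


def handle_none(values: list[str]) -> list[str]:
    """handleNone: "None of the above" overrides any other selection."""
    return [NONE_OF_THE_ABOVE] if NONE_OF_THE_ABOVE in values else values


def clean_domain(value: str) -> str:
    """cleanDomain: remove a leading http:// or https://."""
    for prefix in ("https://", "http://"):
        if value.startswith(prefix):
            return value[len(prefix):]
    return value


def clean_date(value: str) -> str:
    """cleanDate: zero-pad year-month-day to YYYY-MM-DD; raises ValueError on invalid dates."""
    year, month, day = (int(part) for part in value.split("-"))
    return date(year, month, day).isoformat()


def trim_white_space(value: str) -> str:
    """trimWhiteSpace: remove leading and trailing whitespace."""
    return value.strip()


def clean_phone(value: str) -> str:
    """cleanPhone: keep digits only, then drop a leading 1 (country code)."""
    digits = re.sub(r"\D", "", value)
    return digits[1:] if digits.startswith("1") else digits


def clean_postal_code(value: str) -> str:
    """cleanPostalCode: keep digits only, truncated to the first five."""
    return re.sub(r"\D", "", value)[:5]


def empty_string_to_null(value: str | None) -> str | None:
    """emptyStringToNull: map "" to None (JSON null)."""
    return None if value == "" else value
```

For example, `clean_phone("+1 (222) 333-4444")` returns `"2223334444"` and `clean_postal_code("02144-1234")` returns `"02144"`, matching the table.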
Input Types
Based on a parameter's input type, the following rules are applied.
Input Type | Transformations |
---|---|
integer | stringToNumber |
currency | stringToNumber |
string | trimWhiteSpace cleanCharacter |
trimWhiteSpace | |
domain | cleanDomain trimWhiteSpace |
select_many | handleNone |
phone | cleanPhone |
address.line1 | trimWhiteSpace cleanCharacter |
address.line2 | trimWhiteSpace emptyStringToNull cleanCharacter |
address.line3 | trimWhiteSpace emptyStringToNull cleanCharacter |
address.organization | trimWhiteSpace cleanCharacter |
address.city | trimWhiteSpace cleanCharacter |
address.state | trimWhiteSpace |
address.postal_code | cleanPostalCode |
address.country_code | trimWhiteSpace |
claim_event.description | trimWhiteSpace emptyStringToNull cleanCharacter |
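One way to read the Input Types table is as a pipeline: look up the parameter's input type and run its transformations in turn. The sketch below assumes the transformations run left to right in the listed order and that input types not in the table pass through unchanged; neither is stated above. `clean_value`, `RULES_BY_INPUT_TYPE`, and `DEMO_RULES` are hypothetical names, and the rule implementations in `DEMO_RULES` are deliberately minimal stand-ins (its cleanCharacter handles only U+2019).

```python
from typing import Any, Callable, Dict, List

# Input type -> rule names, transcribed from the Input Types table.
# Assumption: rules run in the listed order.
RULES_BY_INPUT_TYPE: Dict[str, List[str]] = {
    "integer": ["stringToNumber"],
    "currency": ["stringToNumber"],
    "string": ["trimWhiteSpace", "cleanCharacter"],
    "domain": ["cleanDomain", "trimWhiteSpace"],
    "select_many": ["handleNone"],
    "phone": ["cleanPhone"],
    "address.line1": ["trimWhiteSpace", "cleanCharacter"],
    "address.line2": ["trimWhiteSpace", "emptyStringToNull", "cleanCharacter"],
    "address.line3": ["trimWhiteSpace", "emptyStringToNull", "cleanCharacter"],
    "address.organization": ["trimWhiteSpace", "cleanCharacter"],
    "address.city": ["trimWhiteSpace", "cleanCharacter"],
    "address.state": ["trimWhiteSpace"],
    "address.postal_code": ["cleanPostalCode"],
    "address.country_code": ["trimWhiteSpace"],
    "claim_event.description": ["trimWhiteSpace", "emptyStringToNull", "cleanCharacter"],
}

Rule = Callable[[Any], Any]


def clean_value(input_type: str, value: Any, rules: Dict[str, Rule]) -> Any:
    """Apply each rule for input_type in order; unknown types pass through (assumed)."""
    for name in RULES_BY_INPUT_TYPE.get(input_type, []):
        value = rules[name](value)
    return value


def _skip_none(fn: Callable[[str], Any]) -> Rule:
    """Let None (already nulled by emptyStringToNull) flow through later rules."""
    return lambda v: None if v is None else fn(v)


# Minimal stand-ins for the string rules only; other rule names would raise KeyError.
DEMO_RULES: Dict[str, Rule] = {
    "trimWhiteSpace": _skip_none(str.strip),
    "emptyStringToNull": lambda v: None if v == "" else v,
    "cleanCharacter": _skip_none(lambda s: s.replace("\u2019", "'")),
}
```

Under the left-to-right assumption, trimWhiteSpace runs before emptyStringToNull, so a whitespace-only `address.line2` such as `"   "` ends up as `None` rather than an empty string.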
Unicode Translations
Unicode characters translated by the [.h-code]cleanCharacter[.h-code] transformation, and the characters that replace them.
Code | Glyph | Name | Replacement Code | Replacement Glyph | Replacement Name |
---|---|---|---|---|---|
U+2013 | – | En Dash | U+002D | - | Hyphen-minus |
U+2014 | — | Em Dash | U+002D | - | Hyphen-minus |
U+2015 | ― | Horizontal Bar | U+002D | - | Hyphen-minus |
U+2018 | ‘ | Left single quotation mark | U+0027 | ' | Apostrophe |
U+2019 | ’ | Right single quotation mark | U+0027 | ' | Apostrophe |
U+201A | ‚ | Single low-9 quotation mark | U+0027 | ' | Apostrophe |
U+201B | ‛ | Single high-reversed-9 quotation mark | U+0027 | ' | Apostrophe |
U+201C | “ | Left double quotation mark | U+0022 | " | Quotation mark |
U+201D | ” | Right double quotation mark | U+0022 | " | Quotation mark |
U+201E | „ | Double low-9 quotation mark | U+0022 | " | Quotation mark |
U+2022 | • | Bullet | U+002D | - | Hyphen-minus |
U+2032 | ′ | Prime | U+0027 | ' | Apostrophe |
U+2033 | ″ | Double prime | U+0022 | " | Quotation mark |
U+00E0 | à | Lowercase A with grave accent | U+0061 | a | Latin Small Letter A |
U+00E1 | á | Lowercase A with acute accent | U+0061 | a | Latin Small Letter A |
U+00E7 | ç | Lowercase C with cedilla | U+0063 | c | Latin Small Letter C |
U+00E8 | è | Lowercase E with grave accent | U+0065 | e | Latin Small Letter E |
U+00E9 | é | Lowercase E with acute accent | U+0065 | e | Latin Small Letter E |
U+00EA | ê | Lowercase E with circumflex | U+0065 | e | Latin Small Letter E |
U+00EB | ë | Lowercase E with diaeresis | U+0065 | e | Latin Small Letter E |
U+00ED | í | Lowercase I with acute accent | U+0069 | i | Latin Small Letter I |
U+00EE | î | Lowercase I with circumflex | U+0069 | i | Latin Small Letter I |
U+00EF | ï | Lowercase I with diaeresis | U+0069 | i | Latin Small Letter I |
U+00F1 | ñ | Lowercase N with tilde | U+006E | n | Latin Small Letter N |
U+00F3 | ó | Lowercase O with acute accent | U+006F | o | Latin Small Letter O |
U+00F4 | ô | Lowercase O with circumflex | U+006F | o | Latin Small Letter O |
U+00FA | ú | Lowercase U with acute accent | U+0075 | u | Latin Small Letter U |
U+00FB | û | Lowercase U with circumflex | U+0075 | u | Latin Small Letter U |
U+00FC | ü | Lowercase U with diaeresis | U+0075 | u | Latin Small Letter U |