Data Cleaning

Herald transforms the values of risk, coverage, and admin parameters using a set of data cleaning rules. The rules are applied both when values are submitted in an application and when a submission is made, so that every value sent to an institution is formatted correctly.

Rules

The table below lists every rule Herald applies to values. The Input Types section shows which rules apply to each parameter's [.h-code-link]input_type[.h-code-link].

Note: This is the full set of rules as of 03/28/2024.
Transformation | Description | Examples | Notes
--- | --- | --- | ---
stringToNumber | Converts a string to a number. | "1,000,000" → 1000000; "$100" → 100 | Only handles correctly placed thousands separators and a single leading "$". Does not handle "10,00" or "$$100".
cleanCharacter | Translates certain Unicode characters to ASCII replacements. | ” (U+201D) → " (U+0022); ’ (U+2019) → ' (U+0027) | See the full list under Unicode Translations.
handleNone | When "None of the above" is submitted together with other selections, keeps only "None of the above". | ["None of the above", "Foobar"] → ["None of the above"] |
cleanDomain | Removes a leading http:// or https://. | "https://foo.com" → "foo.com" |
cleanDate | Converts dates to the pattern YYYY-MM-DD. | "2023-11-4" → "2023-11-04" |
trimWhiteSpace | Removes leading and trailing whitespace. | " foo " → "foo" |
cleanPhone | Removes non-digit characters and, if present, a leading 1 (country code). | "+1 (222) 333-4444" → "2223334444" |
cleanState | Converts full state names, and state codes with incorrect capitalization, to well-formed two-letter state codes. | "mA" → "MA"; "Massachusetts" → "MA" |
cleanPostalCode | Removes non-digit characters and anything after the first 5 digits. | "02144-1234" → "02144" |
emptyStringToNull | Converts an empty string to null. | "" → null | ⚠️ Currently applies only to address and claim_event parameters.
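As an illustration, most of these rules can be sketched in TypeScript. The sketch is inferred from the examples in the table and is not Herald's implementation: how each rule treats inputs outside those examples (here, unsupported strings are returned unchanged) is an assumption, and STATE_CODES is a hypothetical three-state subset rather than a full list. cleanCharacter is left out because it depends on the table in Unicode Translations.

```typescript
// Sketch of several cleaning rules, inferred from the examples in the table.
// Not Herald's code: behavior on inputs outside those examples is assumed.

// stringToNumber: digits with correctly placed thousands separators and at
// most one leading "$". Anything else is returned unchanged (assumption).
function stringToNumber(value: string): number | string {
  const match = /^\$?(\d{1,3}(?:,\d{3})+|\d+)$/.exec(value);
  return match ? Number(match[1].replace(/,/g, "")) : value;
}

// trimWhiteSpace: remove leading and trailing whitespace.
function trimWhiteSpace(value: string): string {
  return value.trim();
}

// cleanDomain: drop a leading "http://" or "https://".
function cleanDomain(value: string): string {
  return value.replace(/^https?:\/\//, "");
}

// cleanPhone: keep digits only, then drop a leading "1" country code if present.
function cleanPhone(value: string): string {
  const digits = value.replace(/\D/g, "");
  return digits.startsWith("1") ? digits.slice(1) : digits;
}

// cleanPostalCode: keep digits only, truncated to the first five.
function cleanPostalCode(value: string): string {
  return value.replace(/\D/g, "").slice(0, 5);
}

// cleanDate: zero-pad the month and day of a YYYY-M-D date. Other formats are
// returned unchanged here (assumption; the real rule may accept more formats).
function cleanDate(value: string): string {
  const m = /^(\d{4})-(\d{1,2})-(\d{1,2})$/.exec(value);
  return m ? `${m[1]}-${m[2].padStart(2, "0")}-${m[3].padStart(2, "0")}` : value;
}

// cleanState: map full names or mis-capitalized codes to two-letter codes.
// Hypothetical three-state subset, for illustration only.
const STATE_CODES = new Map<string, string>([
  ["massachusetts", "MA"],
  ["new york", "NY"],
  ["california", "CA"],
]);

function cleanState(value: string): string {
  const byName = STATE_CODES.get(value.toLowerCase());
  if (byName !== undefined) return byName;
  const upper = value.toUpperCase();
  return Array.from(STATE_CODES.values()).includes(upper) ? upper : value;
}

// handleNone: "None of the above" wins over any other selection.
const NONE = "None of the above";
function handleNone(values: string[]): string[] {
  return values.includes(NONE) ? [NONE] : values;
}

// emptyStringToNull: "" becomes null; other strings pass through.
function emptyStringToNull(value: string): string | null {
  return value === "" ? null : value;
}
```

Leaving unsupported strings unchanged is only one reading of "does not handle"; Herald may instead reject such values.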

Input Types

Based on a parameter's input type, the following rules are applied.

Input Type | Transformation(s)
--- | ---
integer | stringToNumber
currency | stringToNumber
string | trimWhiteSpace, cleanCharacter
email | trimWhiteSpace
domain | cleanDomain, trimWhiteSpace
select_many | handleNone
phone | cleanPhone
address.line1 | trimWhiteSpace, cleanCharacter
address.line2 | trimWhiteSpace, emptyStringToNull, cleanCharacter
address.line3 | trimWhiteSpace, emptyStringToNull, cleanCharacter
address.organization | trimWhiteSpace, cleanCharacter
address.city | trimWhiteSpace, cleanCharacter
address.state | trimWhiteSpace, cleanState
address.postal_code | cleanPostalCode
address.country_code | trimWhiteSpace
claim_event.description | trimWhiteSpace, emptyStringToNull, cleanCharacter
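One way to picture this mapping is a lookup from input type to an ordered list of rules. The sketch below is hypothetical: it assumes rules run in the order listed in the table, that a null result from emptyStringToNull ends the pipeline, and that unknown input types pass through unchanged. Only a few string-valued input types are included, cleanValue and RULES_BY_INPUT_TYPE are illustrative names, and cleanCharacter is a partial stand-in that only translates curly quotation marks.

```typescript
// Hypothetical sketch: look up a parameter's input type and apply its rules in
// table order. A null result (from emptyStringToNull) stops further rules.
type Rule = (value: string) => string | null;

const trimWhiteSpace: Rule = (v) => v.trim();
const emptyStringToNull: Rule = (v) => (v === "" ? null : v);
const cleanPostalCode: Rule = (v) => v.replace(/\D/g, "").slice(0, 5);
// Partial stand-in for cleanCharacter: curly quotation marks only.
const cleanCharacter: Rule = (v) =>
  v.replace(/[\u2018\u2019]/g, "'").replace(/[\u201C\u201D]/g, '"');

// Subset of the Input Types table, for illustration.
const RULES_BY_INPUT_TYPE = new Map<string, Rule[]>([
  ["string", [trimWhiteSpace, cleanCharacter]],
  ["email", [trimWhiteSpace]],
  ["address.line2", [trimWhiteSpace, emptyStringToNull, cleanCharacter]],
  ["address.postal_code", [cleanPostalCode]],
  ["claim_event.description", [trimWhiteSpace, emptyStringToNull, cleanCharacter]],
]);

function cleanValue(inputType: string, value: string): string | null {
  // Unknown input types get no rules (assumption).
  const rules = RULES_BY_INPUT_TYPE.get(inputType) ?? [];
  let current: string | null = value;
  for (const rule of rules) {
    if (current === null) break; // nothing left to clean
    current = rule(current);
  }
  return current;
}
```

Under these assumptions, cleanValue("address.line2", "   ") returns null, because trimWhiteSpace runs before emptyStringToNull.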

Unicode Translations

Unicode characters translated by the [.h-code]cleanCharacter[.h-code] transformation, and the characters they are replaced with.

Code | Glyph | Name | Replacement Code | Replacement Glyph | Replacement Name
--- | --- | --- | --- | --- | ---
U+2013 | – | En Dash | U+002D | - | Hyphen-minus
U+2014 | — | Em Dash | U+002D | - | Hyphen-minus
U+2015 | ― | Horizontal Bar | U+002D | - | Hyphen-minus
U+2018 | ‘ | Left single quotation mark | U+0027 | ' | Apostrophe
U+2019 | ’ | Right single quotation mark | U+0027 | ' | Apostrophe
U+201A | ‚ | Single low-9 quotation mark | U+0027 | ' | Apostrophe
U+201B | ‛ | Single high-reversed-9 quotation mark | U+0027 | ' | Apostrophe
U+201C | “ | Left double quotation mark | U+0022 | " | Quotation mark
U+201D | ” | Right double quotation mark | U+0022 | " | Quotation mark
U+201E | „ | Double low-9 quotation mark | U+0022 | " | Quotation mark
U+2022 | • | Bullet | U+002D | - | Hyphen-minus
U+2032 | ′ | Prime | U+0027 | ' | Apostrophe
U+2033 | ″ | Double prime | U+0022 | " | Quotation mark
U+00E0 | à | Lowercase A with grave accent | U+0061 | a | Latin Small Letter A
U+00E1 | á | Lowercase A with acute accent | U+0061 | a | Latin Small Letter A
U+00E7 | ç | Lowercase C with cedilla | U+0063 | c | Latin Small Letter C
U+00E8 | è | Lowercase E with grave accent | U+0065 | e | Latin Small Letter E
U+00E9 | é | Lowercase E with acute accent | U+0065 | e | Latin Small Letter E
U+00EA | ê | Lowercase E with circumflex | U+0065 | e | Latin Small Letter E
U+00EB | ë | Lowercase E with diaeresis | U+0065 | e | Latin Small Letter E
U+00ED | í | Lowercase I with acute accent | U+0069 | i | Latin Small Letter I
U+00EE | î | Lowercase I with circumflex | U+0069 | i | Latin Small Letter I
U+00EF | ï | Lowercase I with diaeresis | U+0069 | i | Latin Small Letter I
U+00F1 | ñ | Lowercase N with tilde | U+006E | n | Latin Small Letter N
U+00F3 | ó | Lowercase O with acute accent | U+006F | o | Latin Small Letter O
U+00F4 | ô | Lowercase O with circumflex | U+006F | o | Latin Small Letter O
U+00FA | ú | Lowercase U with acute accent | U+0075 | u | Latin Small Letter U
U+00FB | û | Lowercase U with circumflex | U+0075 | u | Latin Small Letter U
U+00FC | ü | Lowercase U with diaeresis | U+0075 | u | Latin Small Letter U
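A minimal TypeScript sketch of cleanCharacter built directly from this table might look like the following. It assumes a simple character-by-character lookup; Herald's actual implementation may differ. Characters that are not listed, including uppercase accented letters such as É and accented letters written as a base letter plus a combining mark, pass through unchanged in this sketch.

```typescript
// cleanCharacter sketch: translate each character listed in the table to its
// ASCII replacement; every other character passes through unchanged.
const TRANSLATIONS = new Map<string, string>([
  ["\u2013", "-"], // En Dash
  ["\u2014", "-"], // Em Dash
  ["\u2015", "-"], // Horizontal Bar
  ["\u2018", "'"], // Left single quotation mark
  ["\u2019", "'"], // Right single quotation mark
  ["\u201A", "'"], // Single low-9 quotation mark
  ["\u201B", "'"], // Single high-reversed-9 quotation mark
  ["\u201C", '"'], // Left double quotation mark
  ["\u201D", '"'], // Right double quotation mark
  ["\u201E", '"'], // Double low-9 quotation mark
  ["\u2022", "-"], // Bullet
  ["\u2032", "'"], // Prime
  ["\u2033", '"'], // Double prime
  ["\u00E0", "a"], ["\u00E1", "a"], // à á
  ["\u00E7", "c"], // ç
  ["\u00E8", "e"], ["\u00E9", "e"], ["\u00EA", "e"], ["\u00EB", "e"], // è é ê ë
  ["\u00ED", "i"], ["\u00EE", "i"], ["\u00EF", "i"], // í î ï
  ["\u00F1", "n"], // ñ
  ["\u00F3", "o"], ["\u00F4", "o"], // ó ô
  ["\u00FA", "u"], ["\u00FB", "u"], ["\u00FC", "u"], // ú û ü
]);

function cleanCharacter(value: string): string {
  // Array.from iterates by code point, so astral characters are not split.
  return Array.from(value, (ch) => TRANSLATIONS.get(ch) ?? ch).join("");
}
```

For example, cleanCharacter("“Café” – naïve") returns "\"Cafe\" - naive".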