Spring Hire Sale
Limited Time Deal: Unlock all premium questions for over 30% off
$10.42$7.08
08
:
04
:
48
:
09
Back to Dashboard
Employee Record Matcher
Hard
A data-quality team needs to detect duplicate or near-duplicate employee records in a large HR dataset. Each record is a row in a 2D string array records, where the first row is a header listing field names (always including "id"). Subsequent rows contain field values for each record.
You are also given an array weights where each element is a string formatted as "field:weight". Each field corresponds to a column in the header (never "id"), and each weight is a decimal in [0, 1]. The weights sum to exactly 1.0. A higher weight means that matching on that field is more significant when computing similarity.
...