Employee Record Matcher - Hard Interview Question

A data-quality team needs to detect duplicate or near-duplicate employee records in a large HR dataset. The dataset is given as a 2D string array records: the first row is a header listing the field names (always including "id"), and each subsequent row holds the field values for one record.

You are also given an array weights where each element is a string formatted as "field:weight". Each field corresponds to a column in the header (never "id"), and each weight is a decimal in [0, 1]. The weights sum to exactly 1.0. A higher weight means that matching on that field is more significant when computing similarity. ...
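
The input format described above can be parsed with a short helper. The following is a minimal sketch, not a solution to the full problem: the function name parse_inputs and the sample data are hypothetical, and because the rest of the statement (including the similarity rule) is elided here, the sketch only maps field names to column indices and validates the stated constraints on weights.

```python
import math
from typing import Dict, List, Tuple


def parse_inputs(
    records: List[List[str]], weights: List[str]
) -> Tuple[Dict[str, int], Dict[str, float]]:
    """Hypothetical helper: map header fields to columns and parse "field:weight" strings."""
    header = records[0]
    column_of = {name: idx for idx, name in enumerate(header)}
    if "id" not in column_of:
        raise ValueError('header must include "id"')

    weight_of: Dict[str, float] = {}
    for entry in weights:
        # Split on the last colon so the weight is always the final component.
        field, sep, raw = entry.rpartition(":")
        if not sep:
            raise ValueError(f"malformed weight entry: {entry!r}")
        if field == "id" or field not in column_of:
            raise ValueError(f"weight refers to an invalid field: {field!r}")
        value = float(raw)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"weight out of range [0, 1]: {value}")
        weight_of[field] = value

    # The statement says the weights sum to 1.0; allow for float rounding.
    if not math.isclose(sum(weight_of.values()), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1.0")
    return column_of, weight_of


# Made-up sample data, for illustration only.
sample_records = [
    ["id", "name", "email"],
    ["1", "Ann Lee", "ann@example.com"],
    ["2", "Ann Lee", "a.lee@example.com"],
]
sample_weights = ["name:0.4", "email:0.6"]
column_of, weight_of = parse_inputs(sample_records, sample_weights)
```

Keeping the parsing separate from the scoring step means the weights are validated once, and each record comparison can then look up a column index by field name.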