Check for duplicates - Fuzzy matching
Data quality administrator
A Fuzzy matching duplicate check applies fuzzy logic to find duplicates. For a selected record, it compares the values of several fields with the values of the same fields in other records.
When duplicate values are found in another record:
- The duplicate score is calculated from the field weightage defined for the duplicate check: the sum of the weightages of the fields with duplicate values, divided by the total weightage of all fields in the duplicate check, multiplied by 100.
- The calculated duplicate score is compared with the threshold defined for the duplicate check.
- If the duplicate score is equal to or higher than the threshold, the record is reported as a possible duplicate.
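The scoring steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the product's implementation: the actual fuzzy logic is not described here, so the comparison is stood in for by the standard-library `difflib.SequenceMatcher` similarity ratio with a hypothetical cutoff of 0.8. The names `values_match`, `duplicate_score`, and `is_possible_duplicate` are illustrative only.

```python
from difflib import SequenceMatcher

# Hypothetical cutoff (assumption): two values count as duplicates when their
# similarity ratio reaches this value. The product's fuzzy logic may differ.
SIMILARITY_CUTOFF = 0.8


def values_match(a: str, b: str) -> bool:
    """Stand-in fuzzy comparison based on difflib's similarity ratio."""
    a_norm, b_norm = a.strip().lower(), b.strip().lower()
    if not a_norm or not b_norm:
        return False  # in this sketch, empty values never count as duplicates
    return SequenceMatcher(None, a_norm, b_norm).ratio() >= SIMILARITY_CUTOFF


def duplicate_score(selected: dict, other: dict, weights: dict) -> float:
    """Weightage of the matching fields as a percentage of the total weightage."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    matched = sum(
        weight
        for field, weight in weights.items()
        if values_match(selected.get(field, ""), other.get(field, ""))
    )
    return matched * 100 / total


def is_possible_duplicate(score: float, threshold: float) -> bool:
    # Reported when the score is equal to or higher than the threshold.
    return score >= threshold


if __name__ == "__main__":
    weights = {"OrganizationName": 6, "PrimaryContactEmail": 3, "AddressZipCode": 1}
    selected = {"OrganizationName": "Contoso Ltd",
                "PrimaryContactEmail": "info@contoso.com",
                "AddressZipCode": "1011"}
    other = {"OrganizationName": "Contoso Ltd.",
             "PrimaryContactEmail": "sales@fabrikam.com",
             "AddressZipCode": "1011"}
    score = duplicate_score(selected, other, weights)
    print(score, is_possible_duplicate(score, 50))
```

In this sample, the organization names differ only by a trailing period, so they count as a fuzzy match; together with the identical ZIP/postal code, the matching weightage is 7 of 10, which gives a score of 70 and a possible duplicate at a 50% threshold.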
Example:
Duplicate check on CustTable
Threshold: 50%
| Table name | Datasource name | Field | Field label | Weightage |
| --- | --- | --- | --- | --- |
| CustTable | CustTable | AccountNum | Customer account | |
| CustCustomerV3Entity | CustCustomerV3Entity | AddressStreet | Street | 1 |
| CustCustomerV3Entity | CustCustomerV3Entity | AddressZipCode | ZIP/postal code | 1 |
| CustCustomerV3Entity | CustCustomerV3Entity | OrganizationName | Organization name | 6 |
| CustCustomerV3Entity | CustCustomerV3Entity | PrimaryContactEmail | Primary email | 3 |
| CustCustomerV3Entity | CustCustomerV3Entity | PrimaryContactPhone | Primary phone | 3 |
Calculation examples:
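- Duplicate values exist in the Primary email field and in the Primary phone field. The duplicate score is: 6 / 14 * 100 = 42.86 (weightage 3 + 3 out of a total weightage of 14). The score is lower than the threshold, so the record is not reported as a possible duplicate.
- Duplicate values exist in the Organization name field and in the ZIP/postal code field. The duplicate score is: 7 / 14 * 100 = 50 (weightage 6 + 1 out of a total weightage of 14). The score equals the threshold, so the record is reported as a possible duplicate.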
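The two calculations above can be reproduced with a short Python sketch that uses the weightages from the example table. It assumes that the blank weightage of AccountNum counts as 0, which is consistent with the total weightage of 14 used in the examples. `WEIGHTS`, `THRESHOLD`, and `duplicate_score` are hypothetical names for illustration only.

```python
# Weightage per field, taken from the example duplicate check on CustTable.
# Assumption: the blank weightage of AccountNum counts as 0 (total stays 14).
WEIGHTS = {
    "AccountNum": 0,
    "AddressStreet": 1,
    "AddressZipCode": 1,
    "OrganizationName": 6,
    "PrimaryContactEmail": 3,
    "PrimaryContactPhone": 3,
}
THRESHOLD = 50.0  # percent, as in the example


def duplicate_score(matching_fields: set) -> float:
    """Weightage of the fields with duplicate values, as a percentage of the total."""
    total = sum(WEIGHTS.values())
    matched = sum(WEIGHTS[field] for field in matching_fields)
    return matched * 100 / total


if __name__ == "__main__":
    examples = (
        {"PrimaryContactEmail", "PrimaryContactPhone"},
        {"OrganizationName", "AddressZipCode"},
    )
    for fields in examples:
        score = duplicate_score(fields)
        reported = score >= THRESHOLD
        print(f"{sorted(fields)}: {score:.2f} -> possible duplicate: {reported}")
```

Running it prints a score of 42.86 (not reported) for the email and phone match, and 50.00 (reported) for the organization name and ZIP/postal code match.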
Review duplicates
You can review the duplicates that are found. To resolve duplicates, you have several options, for example:
- Change or remove the checked record. To do so, go to the selected record and perform the required action.
- Change or remove the found duplicate record. To do so, go to the found duplicate record and perform the required action.
- Keep the records unchanged.