Use a fuzzy duplicate check to ensure that no (almost) similar records exist in a table.
To set up a fuzzy duplicate check rule, use a duplicate check of type Fuzzy matching. If the desired Fuzzy matching duplicate check does not exist, set up a new Fuzzy matching duplicate check.
A Fuzzy matching duplicate check checks for duplicates applying fuzzy logic. A Fuzzy matching duplicate check compares, for a selected record, several field values with the values of the same fields of other records. Based on the comparison, a duplicate score is calculated.
On the duplicate check, you define:
Duplicate check on CustTable
Threshold: 50%
Table name | Datasource name | Field | Field label | Weightage |
---|---|---|---|---|
CustTable | CustTable | AccountNum | Customer account | |
CustCustomerV3Entity | CustCustomerV3Entity | AddressStreet | Street | 1 |
CustCustomerV3Entity | CustCustomerV3Entity | AddressZipCode | ZIP/postal code | 1 |
CustCustomerV3Entity | CustCustomerV3Entity | OrganizationName | Organization name | 6 |
CustCustomerV3Entity | CustCustomerV3Entity | PrimaryContactEmail | Primary email | 3 |
CustCustomerV3Entity | CustCustomerV3Entity | PrimaryContactPhone | Primary phone | 3 |
Calculation examples:
1. | Click Data quality management. |
2. | Click Duplicate checks. |
3. | Click New. |
4. | In the Duplicate check name field, type a value. |
5. | In the Duplicate check type field, select 'Fuzzy matching'. |
6. | Use a dynamic query to define the records that are checked for duplicate values and the fields that can be checked for duplicate values. |
  | In the Query field, enter or select a value. |
7. | Define the threshold that is used to indicate if a record is a possible duplicate. The threshold is expressed in a percentage. Only if the calculated duplicate score for a record is equal to or higher than the threshold, a record is marked as possible duplicate. |
  | In the Threshold % field, enter a number. |
8. | Sub-task: Select the fields to be checked on duplicate values. |
8.1 | In the Fields section, click Add. |
8.2 | In the Select fields dialog, in the available grid, select the desired fields. |
8.3 | Click Add. |
8.4 | Click OK. |
8.5 | You can use a phonetic search algorithm to check on duplicate names that sound similar, for example, John and Jon. |
  | Select the Use phonetic search check box. |
8.6 | If you have selected the Use phonetic search check box, define the applicable phonetic search rule. |
  | In the Phonetic search rule field, enter or select a value. |
8.7 | The weightage expresses the importance of a duplicate value. Express the weightage in a number (with or without decimals). Define the weightage number in such a way that the importance is expressed compared to the other fields. |
  | In the Weightage field, enter a number. |
  |
Note: If you do not define a weightage for a field, the field value is not checked for duplicates. |
8.8 | You can hide a field and its duplicate check results from the Duplicate records found page. For example, if a field shows sensitive data, you can check the field for duplicate values but hide the results of the check. |
  | Select the Hide in results check box. |
9. | Sub-task: Select the fields that can be merged for a duplicate record. |
9.1 | In the Merge fields section, click Add. |
9.2 | On the dialog, in the list, find and select the desired fields. |
9.3 | Click Add (->). |
9.4 | Click OK. |
10. | Sub-task: Activate duplicate check. |
10.1 | If the duplicate check setup is finished, you can make it available for selection on the duplicate check rules. |
  | Select Yes in the Active field. |
  |
Note:
|
11. | Close the page. |
Related to | Notes |
---|---|
Set up duplicate check |
  |