To set up a fuzzy duplicate check rule, use a duplicate check of type Fuzzy matching. If the Fuzzy matching duplicate check that you need does not exist yet, set up a new one.

A Fuzzy matching duplicate check uses fuzzy logic to find duplicates. For a selected record, it compares the values of several fields with the values of the same fields in other records. Based on this comparison, a duplicate score is calculated.

On the duplicate check, you define:

  • Which dynamic query is used. The dynamic query defines the records that are checked for duplicate values and the fields that can be checked for duplicate values.
    In the dynamic query, the first defined table must be the main table, which is the table on which you want to check for duplicates. This table must be the same table that you define in the Table name field of the duplicate check header.
    A form can use several related tables. In this case, in the dynamic query, use data entities for the subsequent table records. For each data entity table record, define the applicable parent. Use the data entities to select the fields that you want to check for duplicates.
  • The fields whose values are checked for duplicates. You can only use fields that are defined in the dynamic query.
  • For each field, the weightage. The weightage expresses the importance of a duplicate value in that field. Enter the weightage as a number (with or without decimals), chosen so that it expresses the importance of the field relative to the other fields. If you do not define a weightage for a field, the field value is not checked for duplicates.
  • The threshold for a record to be marked as a duplicate. The threshold is expressed as a percentage. A record is marked as a potential duplicate only if its calculated duplicate score is equal to or higher than the threshold.
    The duplicate score is calculated as follows: [Weightage sum of fields with duplicates] / [Total weightage sum] * 100%

Example:

Duplicate check on CustTable

Threshold: 50%

Table name            Datasource name       Field                Field label        Weightage
CustTable             CustTable             AccountNum           Customer account
CustCustomerV3Entity  CustCustomerV3Entity  AddressStreet        Street             1
CustCustomerV3Entity  CustCustomerV3Entity  AddressZipCode       ZIP/postal code    1
CustCustomerV3Entity  CustCustomerV3Entity  OrganizationName     Organization name  6
CustCustomerV3Entity  CustCustomerV3Entity  PrimaryContactEmail  Primary email      3
CustCustomerV3Entity  CustCustomerV3Entity  PrimaryContactPhone  Primary phone      3

Calculation examples:

  • Duplicate values exist in the Primary email field and in the Primary phone field. The duplicate score is: (3 + 3) / 14 * 100% = 42.86%. This is below the 50% threshold, so the record is not reported as a possible duplicate.
  • Duplicate values exist in the Organization name field and in the ZIP/postal code field. The duplicate score is: (6 + 1) / 14 * 100% = 50%. This is equal to the threshold, so the record is reported as a possible duplicate.
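
The score calculation can also be sketched in a few lines of Python. This is only an illustration of the formula and the example values above, not the product's implementation. The names duplicate_score, is_potential_duplicate, WEIGHTAGES, and THRESHOLD are hypothetical. The sketch does not model how fuzzy or phonetic matching decides that two field values are duplicates; it assumes the set of matching fields is already known.

```python
from typing import Dict, Optional, Set


def duplicate_score(weightages: Dict[str, Optional[float]],
                    matching_fields: Set[str]) -> float:
    """Return the duplicate score as a percentage (0 to 100)."""
    # Fields without a weightage are not checked, so they are left out.
    checked = {field: w for field, w in weightages.items() if w is not None}
    total = sum(checked.values())
    if total == 0:
        return 0.0  # nothing to check (assumption made for this sketch)
    matched = sum(w for field, w in checked.items() if field in matching_fields)
    return matched / total * 100


def is_potential_duplicate(score: float, threshold: float) -> bool:
    """A record is flagged if the score is equal to or higher than the threshold."""
    return score >= threshold


# Weightages from the CustTable example; AccountNum has no weightage.
WEIGHTAGES = {
    "AccountNum": None,
    "AddressStreet": 1,
    "AddressZipCode": 1,
    "OrganizationName": 6,
    "PrimaryContactEmail": 3,
    "PrimaryContactPhone": 3,
}
THRESHOLD = 50.0

score_a = duplicate_score(WEIGHTAGES, {"PrimaryContactEmail", "PrimaryContactPhone"})
score_b = duplicate_score(WEIGHTAGES, {"OrganizationName", "AddressZipCode"})

print(round(score_a, 2), is_potential_duplicate(score_a, THRESHOLD))  # 42.86 False
print(round(score_b, 2), is_potential_duplicate(score_b, THRESHOLD))  # 50.0 True
```

Because Organization name carries 6 of the 14 weightage points, a match on that field plus any one other weighted field reaches the 50% threshold, while matches on both contact fields alone do not.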


Standard procedure

1. Click Data quality management.
2. Click Duplicate checks.
3. Click New.
4. In the Duplicate check name field, type a value.
5. In the Duplicate check type field, select 'Fuzzy matching'.
6. In the Query field, enter or select a value.
7. In the Threshold % field, enter a number.
8. Sub-task: Add fields that are checked for duplicates.
  8.1 Expand the Fields section.
  8.2 Click Add.
  8.3 In the Select fields dialog, in the available grid, select the desired fields.
  8.4 Click Add.
  8.5 Click OK.
  8.6 Select the Use phonetic search check box.
  8.7 In the Phonetic search rule field, enter or select a value.
  8.8 In the Weightage field, enter a number.
  8.9 If required, select the Hide in results check box to hide a field and its duplicate check results on the Duplicate records found page. For example, if a field shows sensitive data, you can check the field for duplicate values but hide the results of the check.