A Fuzzy matching duplicate check uses fuzzy logic to find duplicates. For a selected record, it compares the values of several fields with the values of the same fields in other records.


Data quality administrator

The data quality administrator (DQSDataQualityAdministrator) can set up and maintain:

  • Data quality policies
  • Data quality studio parameters
  • Data quality studio general setup

Procedure

  1. Go to the form from where you want to check if duplicates exist for a record.
  2. In the list, find and select the desired record.
  3. Start the fuzzy duplicate check. Click Check for duplicates.
     Only one duplicate check is done. This is the first found active Fuzzy matching duplicate check that:
       • Applies to the main table of the form.
       • Is used in a duplicate check rule of an active data quality policy.
     Note: On the applicable form, on the Action Pane, depending on the setup, the Check for duplicates button can be shown:
       • On the 'Data quality' tab, in the 'Duplicate check' button group.
       • As a separate button.
       • On an existing action pane tab, in the 'Duplicate check' button group.
  4. Close the page.

Notes

  • If a record is checked for fuzzy duplicates, you can view the fuzzy duplicate check history for that record. The history shows all fuzzy duplicate checks that are done for the record. To view the fuzzy duplicate check history, go to the applicable form, select the desired record, and click History. On the action pane, depending on the setup, the History button can be shown:
      • On the 'Data quality' tab, in the 'Duplicate check' button group.
      • As a button.
      • On an existing action pane tab, in the 'Duplicate check' button group.
  • If you run a quality assessment, duplicate checks of type 'Fuzzy matching' are done as well. If a duplicate record is found, in the Quality assessment results, a warning is shown for the record. The message shows the number of duplicate records found.

Activities

Name: Check for duplicates - Fuzzy matching

Responsible: Data quality administrator

Description:

A Fuzzy matching duplicate check uses fuzzy logic to find duplicates. For a selected record, it compares the values of several fields with the values of the same fields in other records.

When duplicate values are found in another record:

  1. The duplicate score is calculated based on the field weightage as defined for the duplicate check.
  2. The calculated duplicate score is compared with the threshold as defined for the duplicate check.
  3. If the duplicate score is equal to or higher than the threshold, the record is reported as a possible duplicate.

Example:

Duplicate check on CustTable

Threshold: 50%

Table name           | Datasource name      | Field               | Field label       | Weightage
---------------------|----------------------|---------------------|-------------------|----------
CustTable            | CustTable            | AccountNum          | Customer account  |
CustCustomerV3Entity | CustCustomerV3Entity | AddressStreet       | Street            | 1
CustCustomerV3Entity | CustCustomerV3Entity | AddressZipCode      | ZIP/postal code   | 1
CustCustomerV3Entity | CustCustomerV3Entity | OrganizationName    | Organization name | 6
CustCustomerV3Entity | CustCustomerV3Entity | PrimaryContactEmail | Primary email     | 3
CustCustomerV3Entity | CustCustomerV3Entity | PrimaryContactPhone | Primary phone     | 3

Calculation examples:

  • Duplicate values exist in the Primary email field and in the Primary phone field. The duplicate score is: (3 + 3) / 14 * 100 = 42.86. This is below the threshold of 50, so the record is not reported as a possible duplicate.
  • Duplicate values exist in the Organization name field and the ZIP/postal code field. The duplicate score is: (6 + 1) / 14 * 100 = 50. This is equal to the threshold, so the record is reported as a possible duplicate.
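
The calculation examples can be reproduced with a short Python sketch. It is an illustration only, not the product's implementation: the names WEIGHTAGE, THRESHOLD, duplicate_score, and is_possible_duplicate are hypothetical. It assumes that the score is the weightage of the matching fields divided by the total weightage of all fields, as the two examples suggest. It does not model how the fuzzy logic decides that two values match.

```python
from typing import Dict, Iterable

# Weightage per field label, copied from the example table.
# Customer account has no weightage in the example, so it is left out (assumption).
WEIGHTAGE: Dict[str, int] = {
    "Street": 1,
    "ZIP/postal code": 1,
    "Organization name": 6,
    "Primary email": 3,
    "Primary phone": 3,
}
THRESHOLD = 50.0  # percent, as in the example


def duplicate_score(matching_fields: Iterable[str],
                    weightage: Dict[str, int] = WEIGHTAGE) -> float:
    """Return the percentage of the total weightage carried by the matching fields."""
    total = sum(weightage.values())
    # set() ensures a field that is listed twice is only counted once.
    matched = sum(weightage[field] for field in set(matching_fields))
    return matched / total * 100


def is_possible_duplicate(score: float, threshold: float = THRESHOLD) -> bool:
    """A record is reported if its score is equal to or higher than the threshold."""
    return score >= threshold


for fields in (["Primary email", "Primary phone"],
               ["Organization name", "ZIP/postal code"]):
    score = duplicate_score(fields)
    print(f"{fields}: score {score:.2f}, reported: {is_possible_duplicate(score)}")
```

The first combination gives a score of 42.86 and is not reported. The second gives exactly 50 and is reported, because the comparison includes scores equal to the threshold.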

Review duplicates

You can review the duplicates that are found. To resolve duplicates, you have several options, for example:

  • Change or remove the checked record. To do so, go to the selected record and make the desired change.
  • Change or remove the found duplicate record. To do so, go to the found duplicate record and make the desired change.
  • Keep the records unchanged.
