On duplicate checks of type 'Fuzzy matching', you can apply a phonetic search algorithm. A phonetic search algorithm finds duplicate names that sound similar but are spelled differently, for example, John and Jon.

Setup

To apply a phonetic search algorithm to a duplicate check, set up a phonetic search rule and link it to a field in a duplicate check.

The supported phonetic search algorithm is Metaphone. You can apply these versions of the Metaphone algorithm:

  • Double metaphone
  • Metaphone 3

Advanced setup

You are advised to start with the basic setup, that is, with only the phonetic search algorithm selected.

Based on testing and experience, you can fine-tune the phonetic search rule setup by defining:

  • A maximum length: The maximum number of characters for a phonetic search key. The shorter a phonetic search key is, the fuzzier the duplicate check result is.
  • Words to be ignored: You can define the words for which you do not want to create a phonetic search key.
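
As an illustration of how the maximum length and the ignored words shape a phonetic search key, here is a minimal Python sketch. It is not the product's implementation: toy_phonetic_key is a hypothetical stand-in for Metaphone, and build_search_key, its parameters, and the choice to join per-word keys before truncating are assumptions made for this example only.

```python
from typing import Iterable, Optional

SKIPPED = "AEIOUYHW"  # letters the toy algorithm drops after the first letter


def toy_phonetic_key(word: str) -> str:
    """Hypothetical stand-in for Metaphone (NOT the real algorithm).

    Keeps the first letter, drops later vowels plus H, W and Y,
    and collapses directly adjacent doubled letters.
    """
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    key = [letters[0]]
    prev = letters[0]
    for c in letters[1:]:
        if c not in SKIPPED and c != prev:
            key.append(c)
        prev = c
    return "".join(key)


def build_search_key(
    value: str,
    max_length: Optional[int] = None,
    ignored_words: Iterable[str] = (),
) -> str:
    """Build one key for a field value (assumed behavior, for illustration)."""
    ignored = {w.upper() for w in ignored_words}
    # Words on the ignore list do not contribute to the key.
    words = [w for w in value.split() if w.upper() not in ignored]
    key = "".join(toy_phonetic_key(w) for w in words)
    # A shorter maximum length keeps less of the key, so more values collide.
    return key if max_length is None else key[:max_length]
```

With this stand-in, John and Jon both produce the key JN. Jonathan produces JNTN, but with a maximum length of 2 its key is cut to JN and it also matches John, which shows why a shorter key gives a fuzzier result. Ignoring the word Ltd makes 'Contoso Ltd' and 'Contoso' produce the same key.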

Synchronization of phonetic search keys

A fuzzy duplicate check with phonetic search rules applied uses phonetic search keys to search for possible duplicate values. For better performance of duplicate checks, the phonetic search keys for existing data must be generated and synchronized regularly. The synchronized phonetic search keys are stored in the DQSPhoneticKey table.

To keep the phonetic search keys up to date, you are advised to synchronize them several times per day. Regular synchronization is required because of:

  • New or changed data on which fuzzy duplicate checks are done.
  • New or changed phonetic search setup on fuzzy duplicate checks.

On synchronization, phonetic search keys are created only for:

  • Active fuzzy duplicate checks.
  • Table fields with phonetic search setup.

The phonetic search keys are created according to the setup of the applicable phonetic search rule:

  • Phonetic search algorithm.
  • Maximum phonetic search key length.
  • Words to be ignored.
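
The effect of synchronization can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's implementation: the dictionary is an in-memory stand-in for the DQSPhoneticKey table (whose real layout is not described here), and toy_key, synchronize_keys, and the record structure are hypothetical names introduced for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Mapping, Tuple

Record = Tuple[int, Mapping[str, str]]  # (record id, field name -> value)


def toy_key(value: str) -> str:
    """Hypothetical stand-in for Metaphone: drop vowels plus H, W and Y."""
    return "".join(c for c in value.upper() if c.isalpha() and c not in "AEIOUYHW")


def synchronize_keys(
    records: Iterable[Record],
    field: str,
    key_fn: Callable[[str], str],
) -> Dict[str, List[int]]:
    """Rebuild the key -> record ids index for one table field."""
    index: Dict[str, List[int]] = defaultdict(list)
    for record_id, row in records:
        value = row.get(field, "")
        if value:  # empty values get no phonetic search key
            index[key_fn(value)].append(record_id)
    return dict(index)


# Existing data at the time of the first synchronization.
records = [(1, {"Name": "John"}), (2, {"Name": "Jon"}), (3, {"Name": "Mary"})]
index = synchronize_keys(records, "Name", toy_key)

# A record added afterwards has no key until the next synchronization.
records.append((4, {"Name": "Jhon"}))
stale_hits = index.get(toy_key("Jhon"), [])      # does not contain record 4
index = synchronize_keys(records, "Name", toy_key)
fresh_hits = index.get(toy_key("Jhon"), [])      # now contains record 4
```

In this sketch, the new record can only be found through its key after the index is rebuilt, which is why new or changed data calls for regular synchronization. A change to the rule setup (here, a different key_fn) makes all stored keys stale in the same way.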

Phonetic search on fuzzy duplicate checks

When you do a fuzzy duplicate check with phonetic search:

  1. For the selected record, for a table field to which phonetic search applies, the field value is translated to a phonetic search key.
  2. This phonetic search key is compared with the phonetic search keys that exist for the same table field in the DQSPhoneticKey table.
  3. If a match is found, the Levenshtein distance is calculated. The Levenshtein distance is calculated based on the real field values, and not based on the phonetic search keys. The field value of the selected record is compared to the field value of the record for which the matching phonetic search key was found.
  4. The field weightage, as defined on the duplicate check, is multiplied by the calculated Levenshtein distance to determine the actual duplicate weightage of the field value. Example: If the Levenshtein distance is 0.6 and the weightage is 6, the considered weightage is 3.6.
  5. The calculated weightage is considered in the duplicate score calculation: [Weightage sum of fields with duplicates] / [Total weightage sum] * 100%. If the calculated duplicate score for a record is equal to or higher than the threshold, the record is marked as a potential duplicate.
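
The weightage and score calculation in steps 3 to 5 can be sketched in Python as follows. This is a minimal illustration, not the product's implementation. The text gives the Levenshtein value as a number such as 0.6, so the sketch assumes it is a similarity between 0 and 1, derived here as 1 - (edit distance / length of the longer value); that normalization, and all function names, are assumptions.

```python
from typing import Iterable, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization of the distance to a 0..1 similarity."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def duplicate_score(fields: Iterable[Tuple[float, float]]) -> float:
    """fields: (weightage, similarity) per field; 0.0 means no duplicate found."""
    fields = list(fields)
    total = sum(weight for weight, _ in fields)
    weighted = sum(weight * sim for weight, sim in fields)
    return weighted / total * 100


def is_potential_duplicate(score: float, threshold: float) -> bool:
    return score >= threshold


# Field 1: weightage 6 with similarity 0.6 (the example from step 4).
# Field 2: weightage 4 with an exact match on the real field values.
score = duplicate_score([(6, 0.6), (4, 1.0)])
```

Under these assumptions, the first field contributes about 3.6 and the second contributes 4, so the score is about 76%. With a threshold of 75, the record would be marked as a potential duplicate; with a threshold of 80, it would not.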
