Skip to main content

Fuzzy Match for Excel

The Fuzzy Match add-in provides two custom Excel functions — FUZZY.LOOKUP and FUZZY.MATCH — that let you search for approximate text matches directly in your spreadsheet. Unlike Excel's built-in XLOOKUP and XMATCH, which require exact matches, these functions tolerate typos, abbreviations, alternate spellings, and other variations using configurable string comparison algorithms.

All computation runs locally inside the Excel runtime using Python (leveraging the jellyfish library). Your data never leaves Excel.

AppSource

Quick Start

Inserting from the Taskpane

After installing from AppSource, a Fuzzy Match button appears in the Excel Home ribbon. Click the Fuzzy Match button to open the taskpane. The taskpane lists both available functions.

Function List The taskpane showing both available functions. Click a function name to see its full signature and parameter descriptions.

The taskpane provides a guided insertion workflow:

  1. Click a function name in the list (e.g., =FUZZY.MATCH). The panel expands to show the function signature, description, and a collapsible Details section listing each parameter.

    Function Selected After clicking a function, the taskpane shows its full signature and a ▶ Details toggle for parameter descriptions.

  2. Click Next. The taskpane advances to the parameter input step.

    Insert Parameters Each parameter gets its own input field. Required parameters are marked with a red asterisk (). Optional parameters like threshold and fuzzy_algo can be left blank to use their defaults.*

  3. Click Insert. The formula is written into the active cell and evaluated.

    If you want a static value instead of a formula, check Insert result instead of formula before inserting.

    Workbook Result The demo workbook after the sample result is written into cell F2.

Function Reference

FUZZY.LOOKUP

Like XLOOKUP, but for approximate matches. Searches lookup_array for the closest match to lookup_value and returns the corresponding value from return_array.

Syntax:

=FUZZY.LOOKUP(lookup_value, lookup_array, return_array, [threshold], [fuzzy_algo])
ParameterRequiredTypeDescription
lookup_value✅ YesRange/ValueThe text you are searching for. Can be a single cell or a column range to match multiple values.
lookup_array✅ YesRangeThe column of text values to search within. Must be the same length as return_array.
return_array✅ YesRangeThe column of values to return from. Each row corresponds to the same row in lookup_array.
thresholdOptionalNumber (0–1)Minimum similarity score required to accept a match. Any candidate scoring below this is ignored. Default: 0.6.
fuzzy_algoOptionalEnum stringThe string comparison algorithm. Default: "levenshtein". See the Algorithms section for valid values.

Constraint: lookup_array and return_array must have the same number of rows. If they differ, the function returns an error string.

Return value: An Excel Data Type (Entity). If a match is found, the basicValue is the result. If no match meets the threshold, the basicValue is the string "no match" and all properties (Result, Score, Match) are returned as empty strings.

FUZZY.MATCH

Like XMATCH, but for approximate matches. Searches lookup_array for the closest match to lookup_value and returns its 1-based row index.

Syntax:

=FUZZY.MATCH(lookup_value, lookup_array, [threshold], [fuzzy_algo])
ParameterRequiredTypeDescription
lookup_value✅ YesRange/ValueThe text you are searching for. Can be a single cell or a column range.
lookup_array✅ YesRangeThe column of text values to search within.
thresholdOptionalNumber (0–1)Minimum similarity score. Default: 0.6.
fuzzy_algoOptionalEnum stringThe string comparison algorithm. Default: "levenshtein".

Return value: An Excel Data Type (Entity). If a match is found, the basicValue is the 1-based index. If no match meets the threshold, the basicValue is the string "no match" and all properties (Index, Score, Match, Distance) are returned as empty strings.

Using the index with INDEX(): FUZZY.MATCH is often used together with INDEX to retrieve a value, just like native XMATCH:

=INDEX(B2:B500, FUZZY.MATCH(D2, A2:A500))

Examples & Scenarios

Let's look at how these functions perform with real-world data variations.

Example Data: Product Catalog

Suppose you have a master catalog in columns A and B, with search terms in D:

Name (A)Price (B)Search Term (D)
Apple$1.50Appl
Banana$0.50Bananna
Orange$0.80Ornage
Grapes$2.20Grapes

Scenario 1: Finding a Price with FUZZY.LOOKUP

You have a list of messy search terms in column D and want to retrieve the price.

Search Term (D)Formula in F2Result in F2
Appl=FUZZY.LOOKUP(D2, A2:A5, B2:B5)$1.50
Bananna=FUZZY.LOOKUP(D3, A2:A5, B2:B5)$0.50
Ornage=FUZZY.LOOKUP(D4, A2:A5, B2:B5)$0.80
Grapes=FUZZY.LOOKUP(D5, A2:A5, B2:B5)$2.20

Scenario 2: Getting the Match Index with FUZZY.MATCH

If you need to know which row the match was found on:

Search Term (D)Formula in F2Result in F2
Appl=FUZZY.MATCH(D2, A2:A5)1
Bananna=FUZZY.MATCH(D3, A2:A5)2
Ornage=FUZZY.MATCH(D4, A2:A5)3
Grapes=FUZZY.MATCH(D5, A2:A5)4
Unknown=FUZZY.MATCH("xyz", A2:A5)no match

Scenario 3: Inspecting Match Quality

By default, the functions return a Data Type. You can extract the similarity score to see how confident the match is:

=F2.Score -> returns 0.80 (for "Appl" vs "Apple")
=F2.Match -> returns "Apple"

Scenario 4: Handling "no match" with INDEX

When a match is not found, FUZZY.MATCH returns a Data Type with the value "no match". If you use this inside an INDEX function, it will cause a #VALUE! error because INDEX expects a numeric row index.

You can handle this easily with IFERROR:

=IFERROR(INDEX(B2:B5, FUZZY.MATCH(D2, A2:A5)), "Not Found")

This ensures your spreadsheet remains clean even when no high-confidence matches are found.

Understanding Results

Both functions return Excel Data Types.

  • If a match is found: The basicValue is as follows:
    • FUZZY.MATCH: The 1-based row index of the best match in lookup_array.
    • FUZZY.LOOKUP: The corresponding value from return_array at the matched row. All properties are populated with match details.
  • If no match is found: The basicValue is the string "no match". Crucially, all properties (like .Score, .Match, or .Index) are returned as empty strings rather than being omitted, allowing you to drag formulas down a column without encountering #FIELD! errors.

Properties Returned by FUZZY.MATCH

PropertyTypeDescription
IndexNumberThe 1-based row position of the best match in lookup_array.
ScoreNumberSimilarity score between the lookup_value and the matched string. Range is 0.0 to 1.0.
MatchTextThe actual string from lookup_array that was selected as the best match.
DistanceNumberThe raw edit distance (number of character-level operations). Only present for edit-distance algorithms.

Properties Returned by FUZZY.LOOKUP

PropertyTypeDescription
ResultAnyThe value from return_array at the matched row.
ScoreNumberSimilarity score between lookup_value and the matched string, 0.0 to 1.0.
MatchTextThe actual text from lookup_array that was the best match.

Accessing Properties in Formulas

You can extract any property into adjacent cells using dot notation:

=E2.Score ← returns the similarity score from the FUZZY.LOOKUP result in E2
=E2.Match ← returns the matched text
=F2.Index ← returns the row index from a FUZZY.MATCH result in F2
=F2.Distance ← returns the edit distance (if applicable)

Advanced Features

Multi-Value Inputs (Array Mode)

Both functions accept a range (not just a single cell) for lookup_value. If you provide a column of values, each row is matched independently and results spill down into multiple cells, one per input row. This lets you match an entire column at once:

=FUZZY.LOOKUP(D2:D100, A2:A500, B2:B500)

This returns up to 99 result cells, each containing a full data type with its own Score, Match, and Result.

Algorithms & Logic

Which Algorithm Should I Use?

AlgorithmBest Used ForAdvantage over Levenshtein (Default)Example Scenario
LevenshteinGeneral-purpose text (addresses, titles).(Default baseline) Safely treats all edits (insertions, deletions, substitutions) equally."Microsoft" vs "Microsft" (Missing 'o')
Damerau-LevenshteinManual data entry (names, typed forms).Counts swapped adjacent letters as 1 error instead of 2. Scores typing slips higher than Levenshtein."recieve" vs "receive" (Swapped 'e' and 'i')
JaroShort, single words (first names).Looks at overall character overlap rather than rigid edit paths. Often yields more intuitive similarity scores for short names than Levenshtein."Jonathon" vs "Jonathan"
Jaro-WinklerBrands, names, city names.Gives a significant score boost if strings share the same starting characters. Penalizes end-of-word typos far less than Levenshtein."dixon" vs "dicksonx" (Prefix "di" matches)
HammingFixed-length codes (SKUs, ZIPs).Completely rejects strings of different lengths (score = 0). Prevents Levenshtein from incorrectly matching a 4-digit code to a 5-digit code."90210" vs "90211" (Works)
"90210" vs "9021" (Fails - length mismatch)

Similarity Score Formula

For edit-distance algorithms (Levenshtein, Damerau-Levenshtein, Hamming), the raw distance is normalized into a 0–1 score:

score = 1.0 - (distance / max(len(string1), len(string2)))

For example, "Appl" vs "Apple": Levenshtein distance = 1, max length = 5, so score = 1 - 1/5 = 0.8.

Jaro and Jaro-Winkler produce similarity natively (no distance conversion needed).

Levenshtein — Default

Counts the minimum number of single-character edits — insertions, deletions, and substitutions — to transform one string into another. (Wikipedia)

ExampleDistanceScore
"Appl""Apple"10.80
"colour""color"10.83
"Microsoft""Microsft"10.89

Best for: General-purpose matching. As the "standard" edit distance, Levenshtein measures how many operations (insertions, deletions, or substitutions) are needed to transform one string into another. It is highly versatile and reliable for unstructured text like addresses or descriptions because it treats all single-character changes equally. However, it can be sensitive to length differences in very short strings and does not handle transpositions efficiently, treating a swapped pair like "teh" vs "the" as two separate errors.

Damerau-Levenshtein

Like Levenshtein, but also counts transpositions (swapping two adjacent characters) as a single operation rather than two substitutions. (Wikipedia)

ExampleLevenshteinDamerau-Levenshtein
"teh""the"2 (sub+sub)1 (transpose)
"recieve""receive"21

Best for: Data entry errors where fingers slip on adjacent keys. Damerau-Levenshtein extends the standard Levenshtein algorithm by including transpositions as a first-class operation. Since swapping two adjacent letters is one of the most common human typing errors, this algorithm provides a more intuitive similarity score for "fat-finger" slips on forms or names. It retains all the versatility of Levenshtein while being significantly more robust for manually typed input.

Jaro

Measures similarity based on the number of matching characters and transpositions between two strings. Matching characters must be within floor(max_len / 2) - 1 positions of each other. Score ranges from 0.0 to 1.0 natively. (Wikipedia)

Best for: Short strings where character position matters less than character presence — particularly personal names and short identifiers. Originally developed for record linkage, Jaro similarity focuses on character overlap and order rather than rigid edit paths. It often produces more "natural" feeling scores for short words and allows characters to be slightly "out of place" without failing the match. However, it can occasionally yield false positives for unrelated strings that happen to share many characters.

Jaro-Winkler

Extends Jaro with a prefix bonus: strings that share a common prefix (up to 4 characters) receive a boosted score. This rewards strings that start the same way, as the beginning of a word is statistically less likely to contain a typo than the end. (Wikipedia)

Best for: Names, brand names, and any data where the beginning of the string is more reliable than the end. Jaro-Winkler extends the Jaro algorithm by adding a "prefix bonus," rewarding matches that share the first few characters. This is based on the observation that typing errors are statistically less likely at the beginning of a word. It is highly effective at matching variations like "Microsoft" vs "Micro-soft," though it can be very sensitive to typos in those critical first four characters.

ExampleJaroJaro-Winkler
"MARTHA" vs "MARHTA"0.9440.961
"dixon" vs "dicksonx"0.7670.813
"Jonathon" vs "Jonathan"0.9660.980

Hamming

Counts the number of positions where the characters differ. Requires both strings to be the same length — if they differ, the score is immediately 0.0. (Wikipedia)

Best for: Fixed-length codes such as zip codes, product SKUs, or reference numbers where a string of equal length is always expected. Hamming distance is a simple and fast substitution-only metric that compares characters at the exact same index. While extremely efficient for IDs and SKUs where length consistency is guaranteed, it has zero tolerance for shifts; a single leading space or missing character at the start will shift every subsequent index and result in a similarity score of zero.

ExampleDistanceScore
"apple" vs "applz"10.80
"10032" vs "10042"10.80
"apple" vs "appl"0.00 (length mismatch)

FAQ

What happens when two candidates have the same similarity score?

The function iterates through lookup_array from top to bottom and returns the first candidate that achieves the maximum score. If two entries score identically, the one that appears earlier in the array wins.

Can I use this with non-English text?

Yes. The algorithms operate on Unicode characters, so they work with any script (Latin, Cyrillic, CJK, etc.). However, for languages with complex morphology or character composition, results may be less intuitive — the algorithms are not linguistically aware and treat each character as atomic.

Why does the Distance property disappear for Jaro and Jaro-Winkler results?

Jaro and Jaro-Winkler are similarity-native metrics — they produce a similarity score directly and do not operate on an integer edit distance. There is no meaningful Distance value to report, so the property is omitted from the result card for these algorithms.

Does this work offline?

Yes. All computation happens inside the Excel runtime using a bundled Python environment. Once the add-in is loaded, no network requests are made for formula execution. The only network requirement is the initial load of the add-in itself.

How is the similarity score calculated for Levenshtein?

score = 1.0 - (levenshtein_distance / max(len(lookup_value), len(lookup_array_item))). A score of 1.0 is an exact match; 0.0 means the strings share no structure. The Distance property in the result card gives you the raw integer edit count.

Why is my result showing "no match" when I expect a result?

The most common cause is the threshold. If the best candidate scores below your threshold (default 0.6), the function returns "no match". Check: (1) Is your lookup_value much longer than the entries in lookup_array? (2) Try lowering threshold to 0.4 temporarily to see what score the best candidate actually achieves. (3) Try "jaro_winkler" if your data has short strings — it sometimes scores short strings higher than Levenshtein.