Not logged in.  Login/Logout/Register | List snippets | | Create snippet | Upload image | Upload data

25
LINES

< > BotCompany Repo | #1023862 // levenSubstringIC - find leven-closest substring in larger string. complexity O(m*n)

JavaX fragment (include) [tags: use-pretranspiled]

Libraryless. Click here for Pure Java version (211L/2K).

// Calculates the fuzzy match of needle in haystack,
// using a modified version of the Levenshtein distance algorithm.
// ported from: http://ginstrom.com/scribbles/2007/12/01/fuzzy-substring-matching-with-levenshtein-distance-in-python/

static int levenSubstringIC(S haystack, S needle) {
  int m = strL(needle), n = strL(haystack);

  // base cases
  if (containsIgnoreCase(haystack, needle)) ret 0;

  int[] v0 = new int[n+1], v1 = new int[n+1];
  for i to m: { // iterate over needle
    v1[0] = i+1;
    for j to n: { // iterate over haystack
      int cost = neqic(needle.charAt(i), haystack.charAt(j)) ? 1 : 0;
      v1[j+1] = min3(v1[j]+1,       // insertion
                       v0[j+1]+1,   // deletion
                       v0[j]+cost); // substitution
    }
    int[] temp = v0;
    v0 = v1;
    v1 = temp;
  }
  ret minOfIntArray(v0);
}

download  show line numbers  debug dex  old transpilations   

Travelled to 6 computer(s): bhatertpkbcr, mqqgnosmbjvj, pyentgdyhuwx, pzhvpgtvlbxg, tvejysmllsmz, vouqrxazstgt

No comments. add comment

Snippet ID: #1023862
Snippet name: levenSubstringIC - find leven-closest substring in larger string. complexity O(m*n)
Eternal ID of this version: #1023862/18
Text MD5: 9b4b6ac905b2e9f6c10602294e1a53f2
Transpilation MD5: af3329563c64aeeca709f9d60c287c34
Author: stefan
Category: javax / a.i.
Type: JavaX fragment (include)
Public (visible to everyone): Yes
Archived (hidden from active list): No
Created/modified: 2019-11-11 14:59:59
Source code size: 878 bytes / 25 lines
Pitched / IR pitched: No / No
Views / Downloads: 322 / 504
Version history: 17 change(s)
Referenced in: #1006654 - Standard functions list 2 (LIVE, continuation of #761)
#1023873 - levenSubstringIC_withIndex - find leven-closest substring in larger string. complexity O(m*n). return rough index of substring & distance
#1023884 - levenSubstringIC_limited - levenSubstringIC with distance limit. not optimized yet
#1026033 - levenWithSwapsSubstringIC - find damerau-Levenshtein-closest substring in larger string. complexity O(m*n)