Not logged in.  Login/Logout/Register | List snippets | | Create snippet | Upload image | Upload data

31
LINES

< > BotCompany Repo | #1023873 // levenSubstringIC_withIndex - find leven-closest substring in larger string. complexity O(m*n). return rough index of substring & distance

JavaX fragment (include) [tags: use-pretranspiled]

Libraryless. Click here for Pure Java version (302L/3K).

1  
// Calculates the fuzzy match of needle in haystack,
2  
// using a modified version of the Levenshtein distance algorithm.
3  
// ported from: http://ginstrom.com/scribbles/2007/12/01/fuzzy-substring-matching-with-levenshtein-distance-in-python/
4  
5  
static Pair<Int> levenSubstringIC_withIndex(S haystack, S needle) {
6  
  int m = strL(needle), n = strL(haystack);
7  
8  
  // base cases
9  
  if (m == 1) {
10  
    int idx = indexOfIC(haystack, needle);
11  
    ret idx < 0 ? pair(0, 1) : pair(idx, 0);
12  
  }
13  
  if (m == 0)
14  
    ret pair(0, m);
15  
16  
  int[] row1 = new int[n+1], row2 = new int[n+1];
17  
  for i to m: {
18  
    row2[0] = i+1;
19  
    for j to n: {
20  
      int cost = neqic(needle.charAt(i), haystack.charAt(j)) ? 1 : 0;
21  
      row2[j+1] = min3(row1[j+1]+1,   // deletion
22  
                       row2[j]+1,     // insertion
23  
                       row1[j]+cost); // substitution
24  
    }
25  
    int[] temp = row1;
26  
    row1 = row2;
27  
    row2 = temp;
28  
  }
29  
  Pair<Int> p = minOfIntArray_withIndex(row1);
30  
  ret pair(max(0, p.a-m), p.b);
31  
}

Author comment

Began life as a copy of #1023862

download  show line numbers  debug dex  old transpilations   

Travelled to 6 computer(s): bhatertpkbcr, mqqgnosmbjvj, pyentgdyhuwx, pzhvpgtvlbxg, tvejysmllsmz, vouqrxazstgt

No comments. add comment

Snippet ID: #1023873
Snippet name: levenSubstringIC_withIndex - find leven-closest substring in larger string. complexity O(m*n). return rough index of substring & distance
Eternal ID of this version: #1023873/4
Text MD5: 8148251608469a0749475c3b3f2bcfab
Transpilation MD5: a05f7293185a2aac2acce6338168ce6d
Author: stefan
Category: javax / a.i.
Type: JavaX fragment (include)
Public (visible to everyone): Yes
Archived (hidden from active list): No
Created/modified: 2019-07-11 00:10:34
Source code size: 1017 bytes / 31 lines
Pitched / IR pitched: No / No
Views / Downloads: 198 / 300
Version history: 3 change(s)
Referenced in: [show references]