Not logged in.  Login/Logout/Register | List snippets | | Create snippet | Upload image Upload data

87
LINES

< > BotCompany Repo | #1008051 - leven_limitedIC - limited Levenshtein distance, ignoring case

JavaX fragment (include)

static int leven_limitedIC(S left, S right, int threshold) {
  if (--threshold < 0) ret 0;

  int n = l(left);  // length of left
  int m = l(right); // length of right

  // if one string is empty, the edit distance is necessarily the length
  // of the other
  if (n == 0)
    ret m <= threshold ? m : threshold+1;
  if (m == 0)
    ret n <= threshold ? n : threshold+1;
    
  // if length difference is > threshold, just bail out
  if (abs(m-n) > threshold) ret threshold+1;

  if (n > m) {
      // swap the two strings to consume less memory
      final S tmp = left;
      left = right;
      right = tmp;
      n = m;
      m = right.length();
  }

  int[] p = new int[n + 1]; // 'previous' cost array, horizontally
  int[] d = new int[n + 1]; // cost array, horizontally
  int[] tempD; // placeholder to assist in swapping p and d

  // fill in starting table values
  final int boundary = Math.min(n, threshold) + 1;
  for (int i = 0; i < boundary; i++) {
      p[i] = i;
  }
  // these fills ensure that the value above the rightmost entry of our
  // stripe will be ignored in following loop iterations
  Arrays.fill(p, boundary, p.length, Integer.MAX_VALUE);
  Arrays.fill(d, Integer.MAX_VALUE);

  // iterates through t - USE LONG to fix JVM SAFE POINTS.
  for (long _j = 1; _j <= m; _j++) {
      int j = (int) _j;
      final char rightJ = right.charAt(j - 1); // jth character of right
      d[0] = j;

      // compute stripe indices, constrain to array size
      final int min = Math.max(1, j - threshold);
      final int max = j > Integer.MAX_VALUE - threshold ? n : Math.min(
              n, j + threshold);

      // the stripe may lead off of the table if s and t are of different
      // sizes
      if (min > max) {
          return threshold+1;
      }

      // ignore entry left of leftmost
      if (min > 1) {
          d[min - 1] = Integer.MAX_VALUE;
      }

      // iterates through [min, max] in s
      for (int i = min; i <= max; i++) {
          if (equalsIgnoreCase(left.charAt(i - 1), rightJ)) {
              // diagonally left and up
              d[i] = p[i - 1];
          } else {
              // 1 + minimum of cell to the left, to the top, diagonally
              // left and up
              d[i] = 1 + Math.min(Math.min(d[i - 1], p[i]), p[i - 1]);
          }
      }

      // copy current distance counts to 'previous row' distance counts
      tempD = p;
      p = d;
      d = tempD;
  }

  // if p[n] is greater than the threshold, there's no guarantee on it
  // being the correct
  // distance
  if (p[n] <= threshold) {
      return p[n];
  }
  return threshold+1;
}

Author comment

Began life as a copy of #1007596

download  show line numbers  debug dex   

Travelled to 9 computer(s): aoiabmzegqzx, cbybwowwnfue, cfunsshuasjs, gwrvuhgaqvyk, ishqpsrjomds, lpdgvwnxivlt, mqqgnosmbjvj, tslmcundralx, tvejysmllsmz

No comments. add comment

Snippet ID: #1008051
Snippet name: leven_limitedIC - limited Levenshtein distance, ignoring case
Eternal ID of this version: #1008051/4
Text MD5: c334db1122f818c4d91fdcaf23447d72
Author: stefan
Category: javax
Type: JavaX fragment (include)
Public (visible to everyone): Yes
Archived (hidden from active list): No
Created/modified: 2019-08-13 14:58:34
Source code size: 2715 bytes / 87 lines
Pitched / IR pitched: No / No
Views / Downloads: 265 / 274
Version history: 3 change(s)
Referenced in: [show references]