BotCompany Repo | #1008388 // levenAttract - funny Levenshtein-based spell corrector. Converts input to lower-case

JavaX fragment (include)

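// attractors: maps each known word (lower case) to the maximum number of
// edits (Levenshtein distance) at which it still "attracts" a misspelling.
// Word tokens come back lower-cased; everything else in text is left as is.
static S levenAttract(Map<S, Int> attractors, S text) {
  L<S> tok = javaTok(text);
  // Odd indices hold the actual tokens, even indices the whitespace between them.
  for (int i = 1; i < l(tok); i += 2) {
    S t = tok.get(i);
    if (!startsWithLetter(t)) continue; // skip numbers, punctuation etc.
    tok.set(i, levenAttract_word(attractors, t));
  }
  ret join(tok);
}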

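// Returns the closest attractor within its own allowed distance,
// or the lower-cased word itself if no attractor is close enough.
static S levenAttract_word(Map<S, Int> attractors, S word) {
  word = toLower(word);
  if (attractors.containsKey(word)) ret word; // exact match, nothing to correct
  S best = null;
  int bestScore = 1000;
  for (S attractor : keys(attractors)) {
    int limit = attractors.get(attractor);
    // Cap = smallest distance that can no longer win: it would either
    // exceed this attractor's limit or fail to beat the best score so far.
    int diff = leven_limited(attractor, word, min(limit+1, bestScore));
    if (diff <= limit && diff < bestScore) {
      best = attractor;
      bestScore = diff;
    }
  }
  ret or(best, word);
}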
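
For readers without the JavaX standard functions at hand, here is a rough plain-Java sketch of the word-level step above. It is an illustration under assumptions, not the snippet's actual transpilation: the names LevenAttractSketch, levenshtein and attractWord are made up for this example, and an ordinary uncapped Levenshtein distance stands in for leven_limited, so the early cut-off is not reproduced.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-alone sketch; not part of the BotCompany standard functions.
class LevenAttractSketch {
  // Plain Levenshtein distance using two rolling rows (assumed stand-in for leven_limited).
  static int levenshtein(String a, String b) {
    int[] prev = new int[b.length() + 1];
    int[] cur = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) prev[j] = j;
    for (int i = 1; i <= a.length(); i++) {
      cur[0] = i;
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
      }
      int[] tmp = prev; prev = cur; cur = tmp; // swap rows
    }
    return prev[b.length()];
  }

  // Lower-cases the word, then returns the nearest attractor that is
  // within that attractor's own edit limit, or the word if none qualifies.
  static String attractWord(Map<String, Integer> attractors, String word) {
    word = word.toLowerCase();
    if (attractors.containsKey(word)) return word;
    String best = null;
    int bestScore = Integer.MAX_VALUE;
    for (Map.Entry<String, Integer> e : attractors.entrySet()) {
      int limit = e.getValue();
      int diff = levenshtein(e.getKey(), word);
      if (diff <= limit && diff < bestScore) {
        best = e.getKey();
        bestScore = diff;
      }
    }
    return best != null ? best : word;
  }

  public static void main(String[] args) {
    Map<String, Integer> attractors = new LinkedHashMap<>();
    attractors.put("hello", 1); // tolerate one edit
    attractors.put("world", 2); // tolerate two edits
    System.out.println(attractWord(attractors, "Helo"));   // prints "hello"
    System.out.println(attractWord(attractors, "wuld"));   // prints "world"
    System.out.println(attractWord(attractors, "Banana")); // prints "banana"
  }
}
```

Note that an unmatched word such as "Banana" still comes back lower-cased, which matches the "Converts input to lower-case" remark in the snippet title.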

Author comment

Began life as a copy of #1008387

Snippet ID: #1008388
Snippet name: levenAttract - funny Levenshtein-based spell corrector. Converts input to lower-case
Eternal ID of this version: #1008388/3
Text MD5: 549bc57595babfb4255020449d0f4dba
Author: stefan
Category: javax / nl processing
Type: JavaX fragment (include)
Public (visible to everyone): Yes
Archived (hidden from active list): No
Created/modified: 2017-05-11 01:53:23
Source code size: 809 bytes / 26 lines
Pitched / IR pitched: No / No
Views / Downloads: 575 / 543
Version history: 2 change(s)
Referenced in: #1006654 - Standard functions list 2 (LIVE, continuation of #761)