BotCompany Repo | #1008388 // levenAttract - funny Levenshtein-based spell corrector. Converts input to lower-case

JavaX fragment (include)

// attractors: map of words (lower case) to number of changes allowed
static S levenAttract(Map<S, Int> attractors, S text) {
  L<S> tok = javaTok(text);
  for (int i = 1; i < l(tok); i += 2) {
    S t = tok.get(i);
    if (!startsWithLetter(t)) continue;
    tok.set(i, levenAttract_word(attractors, t));
  }
  ret join(tok);
}

static S levenAttract_word(Map<S, Int> attractors, S word) {
  word = toLower(word);
  if (attractors.containsKey(word)) ret word;
  S best = null;
  int bestScore = 1000;
  for (S attractor : keys(attractors)) {
    int limit = attractors.get(attractor);
    int diff = leven_limited(attractor, word, min(limit+1, bestScore));
    if (diff <= limit && diff < bestScore) {
      best = attractor;
      bestScore = diff;
    }
  }
  ret or(best, word);
}
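
For readers without the JavaX helper library, the same idea can be sketched in plain Java. This is an approximation, not the snippet's actual dependencies: the class name LevenAttractSketch, the regex used in place of javaTok, and the levenLimited helper (a capped Levenshtein distance assumed to behave roughly like leven_limited) are all hypothetical stand-ins.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LevenAttractSketch {
    // Assumption: a "word" is a letter followed by letters, digits or underscores.
    // This regex only stands in for javaTok + startsWithLetter.
    private static final Pattern WORD = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");

    // Levenshtein distance (insert/delete/substitute), capped at `limit`.
    // Assumed to approximate what leven_limited returns.
    static int levenLimited(String a, String b, int limit) {
        // The distance is at least the length difference, so we can stop early.
        if (Math.abs(a.length() - b.length()) >= limit) return limit;
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp; // reuse the two rows
        }
        return Math.min(prev[b.length()], limit);
    }

    // Lower-cases the word, then snaps it to the closest attractor
    // whose per-word edit budget is not exceeded.
    static String correctWord(Map<String, Integer> attractors, String word) {
        word = word.toLowerCase();
        if (attractors.containsKey(word)) return word;
        String best = null;
        int bestScore = 1000;
        for (Map.Entry<String, Integer> e : attractors.entrySet()) {
            int limit = e.getValue();
            int diff = levenLimited(e.getKey(), word, Math.min(limit + 1, bestScore));
            if (diff <= limit && diff < bestScore) {
                best = e.getKey();
                bestScore = diff;
            }
        }
        return best != null ? best : word;
    }

    // Corrects every word in the text; everything between words is copied unchanged.
    static String correct(Map<String, Integer> attractors, String text) {
        Matcher m = WORD.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            out.append(correctWord(attractors, m.group()));
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> attractors = new LinkedHashMap<>();
        attractors.put("hello", 1); // at most 1 edit away
        attractors.put("world", 2); // at most 2 edits away
        System.out.println(correct(attractors, "Helo, Wrold!")); // prints "hello, world!"
    }
}
```

Passing min(limit + 1, bestScore) as the cap means the distance computation never needs to distinguish values that could not win anyway: anything above the attractor's budget, or no better than the current best, is rejected by the comparison that follows. Words with no attractor in range are still lower-cased, matching the snippet's title.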

Author comment

Began life as a copy of #1008387

Snippet ID: #1008388
Snippet name: levenAttract - funny Levenshtein-based spell corrector. Converts input to lower-case
Eternal ID of this version: #1008388/3
Text MD5: 549bc57595babfb4255020449d0f4dba
Author: stefan
Category: javax / nl processing
Type: JavaX fragment (include)
Public (visible to everyone): Yes
Archived (hidden from active list): No
Created/modified: 2017-05-11 01:53:23
Source code size: 809 bytes / 26 lines
Pitched / IR pitched: No / No
Views / Downloads: 576 / 545
Version history: 2 change(s)