// attractors: map of words (lower case) to number of changes allowed
static S levenAttract(Map<S, Int> attractors, S text) {
  L<S> tok = javaTok(text);
  // odd indices hold the tokens; even indices hold the text between them
  for (int i = 1; i < l(tok); i += 2) {
    S t = tok.get(i);
    if (!startsWithLetter(t)) continue; // only words are corrected
    tok.set(i, levenAttract_word(attractors, t));
  }
  ret join(tok);
}

// Returns the closest attractor within its allowed number of changes,
// or the lower-cased word itself if no attractor qualifies.
static S levenAttract_word(Map<S, Int> attractors, S word) {
  word = toLower(word);
  if (attractors.containsKey(word)) ret word; // exact match, nothing to correct
  S best = null;
  int bestScore = 1000;
  for (S attractor : keys(attractors)) {
    int limit = attractors.get(attractor);
    // a bound of min(limit+1, bestScore) is enough to decide both tests below
    int diff = leven_limited(attractor, word, min(limit+1, bestScore));
    if (diff <= limit && diff < bestScore) {
      best = attractor;
      bestScore = diff;
    }
  }
  ret or(best, word);
}
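
For readers without the JavaX helpers at hand, here is a minimal plain-Java sketch of the per-word step only. The class name `LevenAttractSketch` and the methods `levenshtein` and `attractWord` are hypothetical names chosen for illustration. The sketch swaps `leven_limited` for an unbounded two-row Levenshtein distance, so it lacks the early cut-off, and it leaves out the `javaTok` tokenization entirely.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LevenAttractSketch {

    // Plain Levenshtein distance using two rolling rows.
    // Assumption: stands in for leven_limited, without the distance bound.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1),
                                  prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp; // swap rows
        }
        return prev[b.length()];
    }

    // Mirrors the logic of levenAttract_word: lower-case the word, keep exact
    // matches, otherwise pick the closest attractor within its own limit.
    static String attractWord(Map<String, Integer> attractors, String word) {
        word = word.toLowerCase();
        if (attractors.containsKey(word)) return word;
        String best = null;
        int bestScore = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> e : attractors.entrySet()) {
            int limit = e.getValue();
            int diff = levenshtein(e.getKey(), word);
            if (diff <= limit && diff < bestScore) {
                best = e.getKey();
                bestScore = diff;
            }
        }
        return best != null ? best : word;
    }

    public static void main(String[] args) {
        Map<String, Integer> attractors = new LinkedHashMap<>();
        attractors.put("hello", 1);        // tolerate one change
        attractors.put("levenshtein", 3);  // tolerate three changes
        System.out.println(attractWord(attractors, "Helo"));       // hello
        System.out.println(attractWord(attractors, "Levenstien")); // levenshtein
        System.out.println(attractWord(attractors, "World"));      // world
    }
}
```

The last call shows the lower-casing side effect named in the snippet title: a word that matches no attractor still comes back in lower case.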
Began life as a copy of #1008387
| Snippet ID: | #1008388 |
| Snippet name: | levenAttract - funny Levenshtein-based spell corrector. Converts input to lower-case |
| Eternal ID of this version: | #1008388/3 |
| Text MD5: | 549bc57595babfb4255020449d0f4dba |
| Author: | stefan |
| Category: | javax / nl processing |
| Type: | JavaX fragment (include) |
| Public (visible to everyone): | Yes |
| Archived (hidden from active list): | No |
| Created/modified: | 2017-05-11 01:53:23 |
| Source code size: | 809 bytes / 26 lines |
| Pitched / IR pitched: | No / No |
| Views / Downloads: | 800 / 769 |
| Version history: | 2 change(s) |