BotCompany Repo | #1024242 // WordIndex - index a list by words (case-insensitive)

JavaX fragment (include) [tags: use-pretranspiled]

Libraryless. Pure Java version: 3618L/22K.

transient sclass WordIndex<A> {
  // if this is set, it's used for sorting the values
  // which can speed up lookups
  Comparator<A> valueComparator;
  
  S regexp = "\\w+"; // what counts as a word
  MultiSetMap<S, A> index = ciMultiSetMap(); // sets are better for lookups
  
  *() {}
  *(Comparator<A> *valueComparator) { index = ciMultiSetMap_innerTreeSet(valueComparator); }
  *(Map<A, S> map) { fOr (A a, S text : map) add(a, text); }
  
  // index value a under every distinct word found in text
  void add(A a, S text) {
    Set<S> words = extractWords(text);
    for (S word : words) addWord(a, word);
  }
  
  void addWord(A a, S word) {
    index.add(word, a);
  }
  
  // distinct words of text (case-insensitive set)
  Set<S> extractWords(S text) {
    ret asCISet(extractWords_list(text));
  }
  
  // all regexp matches in text, duplicates included
  LS extractWords_list(S text) {
    ret regexpExtractAll(regexp, text);
  }
  
  L<IntRange> wordRanges(S text) {
    ret regexpFindRanges(regexp, text);
  }
  
  // all values indexed under word
  Set<A> get(S word) {
    ret index.get(word);
  }
  
  // text should be the same text that a was added with
  void remove(A a, S text) {
    Set<S> words = extractWords(text);
    for (S word : words) index.remove(word, a);
  }
  
  NavigableSet<S> words() { ret (NavigableSet) keys(index); }
  
  int numWords() { ret index.keysSize(); }
  
  // These methods only work when A = S
  
  void add(S s) { add((A) s, s); }
  void remove(S s) { remove((A) s, s); }
}
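
For readers without the JavaX helper library, the following is a rough plain-Java sketch of the same idea: split each text into `\w+` words and map every word, case-insensitively, to the set of values whose text contains it. It is not the transpiled version of the snippet above. The class name `WordIndexSketch`, the use of `TreeMap` with `String.CASE_INSENSITIVE_ORDER` in place of `ciMultiSetMap()`, and the choice to drop a word once its value set becomes empty are all assumptions made for illustration.

```java
import java.util.*;
import java.util.regex.*;

// Hypothetical stand-in for WordIndex; not the author's implementation.
class WordIndexSketch<A> {
  // Same word pattern as the snippet's default regexp
  private final Pattern wordPattern = Pattern.compile("\\w+");

  // Assumption: a TreeMap with a case-insensitive comparator plays the role of ciMultiSetMap()
  private final TreeMap<String, Set<A>> index = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

  // Distinct words of the text, compared case-insensitively
  Set<String> extractWords(String text) {
    Set<String> words = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
    Matcher m = wordPattern.matcher(text);
    while (m.find()) words.add(m.group());
    return words;
  }

  // Register value a under every distinct word of text
  void add(A a, String text) {
    for (String word : extractWords(text))
      index.computeIfAbsent(word, k -> new LinkedHashSet<>()).add(a);
  }

  // Values indexed under word; empty set if the word is unknown
  Set<A> get(String word) {
    Set<A> values = index.get(word);
    return values == null ? Collections.<A>emptySet() : values;
  }

  // Unregister a from every word of text; empty entries are dropped (sketch's choice)
  void remove(A a, String text) {
    for (String word : extractWords(text)) {
      Set<A> values = index.get(word);
      if (values != null && values.remove(a) && values.isEmpty())
        index.remove(word);
    }
  }

  NavigableSet<String> words() { return index.navigableKeySet(); }

  int numWords() { return index.size(); }
}
```

With this sketch, adding "Hello World" and "hello there" under two different values and then calling `get("HELLO")` returns both values, since keys are compared without regard to case.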

Travelled to 7 computer(s): bhatertpkbcr, mqqgnosmbjvj, pyentgdyhuwx, pzhvpgtvlbxg, tvejysmllsmz, vouqrxazstgt, xrpafgyirdlv

Snippet ID: #1024242
Snippet name: WordIndex - index a list by words (case-insensitive)
Eternal ID of this version: #1024242/21
Text MD5: c57a78c4cfc36d03ca11afecdfddf74b
Transpilation MD5: 7a14d94d83b7c05323acbc0e6d68299c
Author: stefan
Category: javax
Type: JavaX fragment (include)
Public (visible to everyone): Yes
Archived (hidden from active list): No
Created/modified: 2020-07-16 18:43:16
Source code size: 1298 bytes / 51 lines
Pitched / IR pitched: No / No
Views / Downloads: 373 / 856
Version history: 20 change(s)