BotCompany Repo | #1024242 // WordIndex - index a list by words (case-insensitive)

JavaX fragment (include) [tags: use-pretranspiled]

transient sclass WordIndex<A> {
  // Optional comparator for the values. When passed to the constructor,
  // each word's value set is kept sorted by it (inner TreeSet),
  // which can speed up lookups.
  Comparator<A> valueComparator;
  
  S regexp = "\\w+"; // pattern defining what counts as a word
  MultiSetMap<S, A> index = ciMultiSetMap(); // case-insensitive word -> set of values; sets are better for lookups
  
  *() {}
  *(Comparator<A> *valueComparator) { index = ciMultiSetMap_innerTreeSet(valueComparator); }
  *(Map<A, S> map) { fOr (A a, S text : map) add(a, text); }
  
  void add(A a, S text) {
    Set<S> words = extractWords(text);
    for (S word : words) addWord(a, word);
  }
  
  void addWord(A a, S word) {
    index.add(word, a);
  }
  
  Set<S> extractWords(S text) {
    ret asCISet(extractWords_list(text));
  }
  
  LS extractWords_list(S text) {
    ret regexpExtractAll(regexp, text);
  }
  
  L<IntRange> wordRanges(S text) {
    ret regexpFindRanges(regexp, text);
  }
  
  Set<A> get(S word) {
    ret index.get(word);
  }
  
  void remove(A a, S text) {
    Set<S> words = extractWords(text);
    for (S word : words) index.remove(word, a);
  }
  
  NavigableSet<S> words() { ret (NavigableSet) keys(index); }
  
  int numWords() { ret index.keysSize(); }
  
  // Convenience methods: the text is its own value. These only work when A = S (unchecked cast).
  
  void add(S s) { add((A) s, s); }
  void remove(S s) { remove((A) s, s); }
}
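
For readers without the JavaX toolchain, the following is a minimal plain-Java sketch of the same idea, not the transpiled version of the snippet. The class name `WordIndexSketch` is hypothetical, and the JavaX helpers (`ciMultiSetMap`, `regexpExtractAll`, `regexpFindRanges`) are replaced here by a `TreeMap` with `String.CASE_INSENSITIVE_ORDER` and `java.util.regex`. Two behaviours are assumptions rather than facts about the original: `get` returns an empty set for an unknown word, and a word is dropped from the index once its last value is removed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical plain-Java counterpart of WordIndex (a sketch, not the original).
class WordIndexSketch<A> {
  private final Pattern wordPattern = Pattern.compile("\\w+");
  // Case-insensitive keys: "Hello" and "hello" share one entry.
  private final NavigableMap<String, Set<A>> index =
      new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
  private final Comparator<A> valueComparator; // may be null

  WordIndexSketch() { this(null); }
  WordIndexSketch(Comparator<A> valueComparator) { this.valueComparator = valueComparator; }

  // Distinct words of the text, compared case-insensitively.
  Set<String> extractWords(String text) {
    Set<String> words = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
    Matcher m = wordPattern.matcher(text);
    while (m.find()) words.add(m.group());
    return words;
  }

  // Start (inclusive) and end (exclusive) offsets of every word in the text.
  List<int[]> wordRanges(String text) {
    List<int[]> ranges = new ArrayList<>();
    Matcher m = wordPattern.matcher(text);
    while (m.find()) ranges.add(new int[] { m.start(), m.end() });
    return ranges;
  }

  void add(A a, String text) {
    for (String word : extractWords(text))
      index.computeIfAbsent(word, w -> newValueSet()).add(a);
  }

  void remove(A a, String text) {
    for (String word : extractWords(text)) {
      Set<A> values = index.get(word);
      if (values == null) continue;
      values.remove(a);
      if (values.isEmpty()) index.remove(word); // assumption: drop empty entries
    }
  }

  // Assumption: unknown words yield an empty set rather than null.
  Set<A> get(String word) {
    Set<A> values = index.get(word);
    if (values == null) return Collections.<A>emptySet();
    return Collections.unmodifiableSet(values);
  }

  NavigableSet<String> words() { return index.navigableKeySet(); }

  int numWords() { return index.size(); }

  // Sorted value sets when a comparator was supplied, insertion-ordered otherwise.
  private Set<A> newValueSet() {
    if (valueComparator != null) return new TreeSet<>(valueComparator);
    return new LinkedHashSet<>();
  }
}
```

As a usage example under these assumptions: after `add("doc1", "Hello World")` and `add("doc2", "hello there")`, a call to `get("HELLO")` returns both `doc1` and `doc2`, and `numWords()` is 3. Calling `remove("doc1", "Hello World")` then leaves only `hello` and `there` in the index.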

Snippet ID: #1024242
Snippet name: WordIndex - index a list by words (case-insensitive)
Eternal ID of this version: #1024242/21
Text MD5: c57a78c4cfc36d03ca11afecdfddf74b
Transpilation MD5: 7a14d94d83b7c05323acbc0e6d68299c
Author: stefan
Category: javax
Type: JavaX fragment (include)
Public (visible to everyone): Yes
Archived (hidden from active list): No
Created/modified: 2020-07-16 18:43:16
Source code size: 1298 bytes / 51 lines
Pitched / IR pitched: No / No
Views / Downloads: 372 / 854
Version history: 20 change(s)
Referenced in: #1029004 - DoubleWordIndex - words indexed forwards and backwards
#1029068 - WordIndexWithBitSets - index a list by words (case-insensitive by default)
#1034167 - Standard Classes + Interfaces (LIVE, continuation of #1003674)