transient sclass WordIndex<A> {
  // if this is set, it's used for sorting the values
  // which can speed up lookups
  Comparator<A> valueComparator;

  S regexp = "\\w+";
  MultiSetMap<S, A> index = ciMultiSetMap(); // sets are better for lookups

  *() {}
  *(Comparator<A> *valueComparator) {
    index = ciMultiSetMap_innerTreeSet(valueComparator);
  }
  *(Map<A, S> map) {
    fOr (A a, S text : map)
      add(a, text);
  }

  void add(A a, S text) {
    Set<S> words = extractWords(text);
    for (S word : words)
      addWord(a, word);
  }

  void addWord(A a, S word) {
    index.add(word, a);
  }

  Set<S> extractWords(S text) {
    ret asCISet(extractWords_list(text));
  }

  LS extractWords_list(S text) {
    ret regexpExtractAll(regexp, text);
  }

  L<IntRange> wordRanges(S text) {
    ret regexpFindRanges(regexp, text);
  }

  Set<A> get(S word) { ret index.get(word); }

  void remove(A a, S text) {
    Set<S> words = extractWords(text);
    for (S word : words)
      index.remove(word, a);
  }

  NavigableSet<S> words() { ret (NavigableSet) keys(index); }
  int numWords() { ret index.keysSize(); }

  // These methods only work when A = S
  void add(S s) { add((A) s, s); }
  void remove(S s) { remove((A) s, s); }
}
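For readers unfamiliar with the JavaX shorthand, the following is a minimal plain-Java sketch of the same idea: extract `\w+` words from a text and map each word, case-insensitively, to the set of values whose text contains it. It is not the transpiled "Pure Java" output. The class name `SimpleWordIndex` is hypothetical, and two behaviours are assumptions not confirmed by the snippet: `get` returns an empty set for an unknown word, and a word is dropped from the index once its last value is removed.

```java
import java.util.*;
import java.util.regex.*;

// Hypothetical plain-Java stand-in for WordIndex (illustration only).
class SimpleWordIndex<A> {
  private final Pattern wordPattern = Pattern.compile("\\w+");

  // Sorted, case-insensitive keys; each word maps to the values containing it.
  private final NavigableMap<String, Set<A>> index =
      new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

  // Index value a under every distinct word found in text.
  void add(A a, String text) {
    for (String word : extractWords(text))
      index.computeIfAbsent(word, k -> new LinkedHashSet<>()).add(a);
  }

  // Remove value a from every word found in text.
  void remove(A a, String text) {
    for (String word : extractWords(text)) {
      Set<A> values = index.get(word);
      if (values == null) continue;
      values.remove(a);
      if (values.isEmpty()) index.remove(word); // assumption: drop empty entries
    }
  }

  // Distinct words of text, compared case-insensitively.
  Set<String> extractWords(String text) {
    Set<String> words = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
    Matcher m = wordPattern.matcher(text);
    while (m.find()) words.add(m.group());
    return words;
  }

  // Assumption: unknown words yield an empty set rather than null.
  Set<A> get(String word) {
    Set<A> values = index.get(word);
    return values == null
        ? Collections.<A>emptySet()
        : Collections.unmodifiableSet(values);
  }

  NavigableSet<String> words() { return index.navigableKeySet(); }
  int numWords() { return index.size(); }
}

class SimpleWordIndexDemo {
  public static void main(String[] args) {
    SimpleWordIndex<Integer> idx = new SimpleWordIndex<>();
    idx.add(1, "Hello world");
    idx.add(2, "hello there");
    System.out.println(idx.get("HELLO")); // prints [1, 2]
    idx.remove(1, "Hello world");
    System.out.println(idx.get("hello")); // prints [2]
    System.out.println(idx.numWords());   // prints 2
  }
}
```

The sketch uses a `TreeMap` with `String.CASE_INSENSITIVE_ORDER` so that `words()` can return a `NavigableSet`, mirroring the cast in the original. The optional `valueComparator` constructor is left out to keep the example short.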
Snippet ID: | #1024242 |
Snippet name: | WordIndex - index a list by words (case-insensitive) |
Eternal ID of this version: | #1024242/21 |
Text MD5: | c57a78c4cfc36d03ca11afecdfddf74b |
Transpilation MD5: | 7a14d94d83b7c05323acbc0e6d68299c |
Author: | stefan |
Category: | javax |
Type: | JavaX fragment (include) |
Public (visible to everyone): | Yes |
Archived (hidden from active list): | No |
Created/modified: | 2020-07-16 18:43:16 |
Source code size: | 1298 bytes / 51 lines |
Pitched / IR pitched: | No / No |
Views / Downloads: | 372 / 854 |
Version history: | 20 change(s) |
Referenced in: | #1029004 - DoubleWordIndex - words indexed forwards and backwards |
 | #1029068 - WordIndexWithBitSets - index a list by words (case-insensitive by default) |
 | #1034167 - Standard Classes + Interfaces (LIVE, continuation of #1003674) |