

BotCompany Repo | #1029082 // DeepBitSetWordIndex

JavaX source code (desktop) [tags: use-pretranspiled] - run with: x30.jar


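// Two-stage word search: a global word matrix narrows down candidate texts,
// then each candidate's own word index checks the exact word combination.
transient sclass DeepBitSetWordIndex<A> {
  S regexp = "\\w+"; // what counts as a word
  new Map<A, SingleTextWordIndex> singleTextIndices; // one word index per text
  new ElementInstanceMatrix<A, S> mainIndex; // texts x uppercased words

  // index text under key a (mainIndex stores the words uppercased)
  void add(A a, S text) {
    singleTextIndices.put(a, new SingleTextWordIndex(regexp, text));
    mainIndex.add(a, mapToSet upper(regexpExtractAll(regexp, text));
  }
  
  // finalize mainIndex after the last add()
  void doneAdding {
    mainIndex.doneAdding();
  }
  
  // uppercased words of text, each paired with its start offset
  LPair<S, Int> wordsAndOffsets(S text) {
    ret map(regexpFindRanges(regexp, text),
      r -> pair(upper(substring(text, r)), r.start));
  }

  // assumes word boundaries left and right of query
  Cl<A> preSearch(S query, O... _) {
    optPar bool debug;
    LPair<S, Int> l = wordsAndOffsets(query);
    // stage 1: texts that contain every query word somewhere
    Cl<A> candidates = mainIndex.instancesContainingAllElements(pairsA(l));
    if (debug) {
      L<Int> lengths = map(candidates, a -> singleTextIndices.get(a).length);
      print(nCandidates(candidates) + ", total length: " + n2(intSum(lengths)) + ", lengths: " + lengths);
    }
    // stage 2: keep texts whose own index finds the query's word combination
    ret filter(candidates, a -> nempty(singleTextIndices.get(a).indicesOfWordCombination(l)));
  }
  
  int numWords() { ret mainIndex.numElements(); } // distinct words in mainIndex
}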
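For readers without the JavaX toolchain, here is a minimal plain-Java sketch of the same two-stage idea, under stated assumptions. `ElementInstanceMatrix` is modeled as a map from uppercased word to a `java.util.BitSet` of document ids, and `SingleTextWordIndex.indicesOfWordCombination` is approximated by requiring the query words to appear consecutively in the text. The class name `WordIndexSketch` and its methods are hypothetical and are not part of the snippet's API.

```java
import java.util.*;
import java.util.regex.*;

// Hypothetical plain-Java sketch; not the snippet's actual implementation.
class WordIndexSketch {
    static final Pattern WORD = Pattern.compile("\\w+"); // same word regexp as the snippet

    // Uppercased words of a text, in order.
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) out.add(m.group().toUpperCase(Locale.ROOT));
        return out;
    }

    private final List<String> names = new ArrayList<>();           // doc id -> name
    private final List<List<String>> docWords = new ArrayList<>();  // doc id -> word sequence
    private final Map<String, BitSet> wordToDocs = new HashMap<>(); // word -> doc ids containing it

    void add(String name, String text) {
        int id = names.size();
        names.add(name);
        List<String> ws = words(text);
        docWords.add(ws);
        for (String w : ws)
            wordToDocs.computeIfAbsent(w, k -> new BitSet()).set(id);
    }

    int numWords() { return wordToDocs.size(); } // distinct uppercased words

    // Stage 1: AND the bitsets of all query words -> docs containing every word.
    BitSet candidates(List<String> query) {
        BitSet result = new BitSet();
        result.set(0, names.size());
        for (String w : query) {
            BitSet docs = wordToDocs.get(w);
            if (docs == null) return new BitSet(); // unknown word: no candidates
            result.and(docs);
        }
        return result;
    }

    // Stage 2 (assumed semantics): query words must occur consecutively.
    static boolean containsSequence(List<String> doc, List<String> query) {
        return Collections.indexOfSubList(doc, query) >= 0;
    }

    List<String> search(String query) {
        List<String> q = words(query);
        List<String> hits = new ArrayList<>();
        BitSet cand = candidates(q);
        for (int i = cand.nextSetBit(0); i >= 0; i = cand.nextSetBit(i + 1))
            if (containsSequence(docWords.get(i), q)) hits.add(names.get(i));
        return hits;
    }

    public static void main(String[] args) {
        WordIndexSketch idx = new WordIndexSketch();
        idx.add("a", "The quick brown fox");
        idx.add("b", "brown quick fox, the end");
        idx.add("c", "A slow turtle");
        System.out.println(idx.search("quick brown")); // [a]
        System.out.println(idx.search("fox"));         // [a, b]
        System.out.println(idx.numWords());            // 8
    }
}
```

The bitset AND is cheap but only proves that every word occurs somewhere in a text. The per-text check is what rejects text `b` above, which contains both QUICK and BROWN but not as the sequence "QUICK BROWN".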

Author comment

Began life as a copy of #1029078


Travelled to 7 computer(s): bhatertpkbcr, mqqgnosmbjvj, pyentgdyhuwx, pzhvpgtvlbxg, tvejysmllsmz, vouqrxazstgt, xrpafgyirdlv


Snippet ID: #1029082
Snippet name: DeepBitSetWordIndex
Eternal ID of this version: #1029082/15
Text MD5: e26b556e2fd8d141d4bb5535451c02db
Transpilation MD5: 7d55bec89146d478c8b80ae0361a74a6
Author: stefan
Category: javax
Type: JavaX source code (desktop)
Public (visible to everyone): Yes
Archived (hidden from active list): No
Created/modified: 2020-07-19 02:39:47
Source code size: 1161 bytes / 33 lines
Pitched / IR pitched: No / No
Views / Downloads: 212 / 941
Version history: 14 change(s)