BotCompany Repo | #1014177 // Collect all sentences from 1000 Simple Wikipedia pages

JavaX source code [tags: use-pretranspiled] - run with: x30.jar

Uses 865K of libraries. A Pure Java version is also available (7845L/54K/184K).

!7

p {
  maxConsoleChars(10000);
  new LinkedHashMap<U, S> sentences; // sentence => title
  
  int n = 0, max = 1000;
  for (WikiPage page : indexedSimpleWikipedia_allPages()) {
    for (S s : sentencesAfterLines(page.text))
      putIfNotThere(sentences, toU(s), page.title);
    if ((++n % 1000) == 0) print(n + " / " + l(sentences));
    if (n >= max) break;
  }
  
  Pair<L<U>> p = splitAccordingToPredicate(f mechList_isEnglishSentence_u, keys(sentences));
  
  printAsciiHeading("Non-sentences");
  pnl(keysToLinkedHashMap(sentences, p.a));
  
  printAsciiHeading("Sentences");
  pnl(keysToLinkedHashMap(sentences, p.b));
}
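
For readers who do not know the JavaX helpers, the same idea can be sketched in plain Java. This is only an illustration under stated assumptions, not the transpiled output and not the author's implementation: `SentenceCollector`, `Page`, `splitSentences`, `collect` and `partition` are hypothetical names, the sentence splitter is a naive stand-in for `sentencesAfterLines`, the predicate in `main` is a toy stand-in for `mechList_isEnglishSentence_u`, and plain `String` keys are used in place of `U`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class SentenceCollector {
    // Hypothetical stand-in for a wiki page: just a title and its plain text.
    static final class Page {
        final String title, text;
        Page(String title, String text) { this.title = title; this.text = text; }
    }

    // Naive splitter (an assumption): break on line ends, then after '.', '!' or '?'.
    static List<String> splitSentences(String text) {
        List<String> out = new ArrayList<>();
        for (String line : text.split("\\R"))
            for (String s : line.split("(?<=[.!?])\\s+"))
                if (!s.trim().isEmpty()) out.add(s.trim());
        return out;
    }

    // Maps each distinct sentence to the title of the first page it was seen on,
    // stopping after at most `max` pages. Insertion order is preserved.
    static Map<String, String> collect(Iterable<Page> pages, int max) {
        Map<String, String> sentences = new LinkedHashMap<>();
        int n = 0;
        for (Page page : pages) {
            for (String s : splitSentences(page.text))
                sentences.putIfAbsent(s, page.title);
            if (++n >= max) break;
        }
        return sentences;
    }

    // Splits the map by a predicate on the key.
    // Index 0 holds the rejected entries, index 1 the accepted ones.
    static List<Map<String, String>> partition(Map<String, String> m,
                                               Predicate<String> isSentence) {
        Map<String, String> rejected = new LinkedHashMap<>();
        Map<String, String> accepted = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : m.entrySet())
            (isSentence.test(e.getKey()) ? accepted : rejected)
                .put(e.getKey(), e.getValue());
        return Arrays.asList(rejected, accepted);
    }

    public static void main(String[] args) {
        List<Page> pages = Arrays.asList(
            new Page("Cat", "Cats are animals. They purr.\nSee also"),
            new Page("Dog", "Cats are animals. Dogs bark."));
        // Toy predicate (an assumption): starts with a capital, ends with punctuation.
        Predicate<String> looksLikeSentence =
            s -> Character.isUpperCase(s.charAt(0)) && s.matches(".*[.!?]$");
        List<Map<String, String>> parts =
            partition(collect(pages, 1000), looksLikeSentence);
        System.out.println("Non-sentences: " + parts.get(0));
        System.out.println("Sentences: " + parts.get(1));
    }
}
```

Using a `LinkedHashMap` with `putIfAbsent` keeps the first title seen for a repeated sentence and preserves encounter order, which matches the role `putIfNotThere` appears to play in the snippet above.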

Author comment

Began life as a copy of #1014173

Travelled to 13 computer(s): aoiabmzegqzx, bhatertpkbcr, cbybwowwnfue, cfunsshuasjs, gwrvuhgaqvyk, ishqpsrjomds, lpdgvwnxivlt, mqqgnosmbjvj, pyentgdyhuwx, pzhvpgtvlbxg, tslmcundralx, tvejysmllsmz, vouqrxazstgt

Snippet ID: #1014177
Snippet name: Collect all sentences from 1000 Simple Wikipedia pages
Eternal ID of this version: #1014177/9
Text MD5: 13e3469e6546cd4c7bb3ab9e08966571
Transpilation MD5: 19cb66ab8562ac17bfaca3c3e6479c43
Author: stefan
Category: javax / a.i. / networking
Type: JavaX source code
Public (visible to everyone): Yes
Archived (hidden from active list): No
Created/modified: 2018-04-15 23:37:56
Source code size: 654 bytes / 22 lines
Pitched / IR pitched: No / No
Views / Downloads: 295 / 399
Version history: 8 change(s)