Uses 865K of libraries. A Pure Java version is also available (7845L/54K/184K).
!7

p {
  maxConsoleChars(10000);
  new LinkedHashMap<U, S> sentences; // sentence => title

  int n = 0, max = 1000;
  for (WikiPage page : indexedSimpleWikipedia_allPages()) {
    for (S s : sentencesAfterLines(page.text))
      putIfNotThere(sentences, toU(s), page.title);
    if ((++n % 1000) == 0) print(n + " / " + l(sentences));
    if (n >= max) break;
  }

  Pair<L<U>> p = splitAccordingToPredicate(f mechList_isEnglishSentence_u, keys(sentences));

  printAsciiHeading("Non-sentences");
  pnl(keysToLinkedHashMap(sentences, p.a));

  printAsciiHeading("Sentences");
  pnl(keysToLinkedHashMap(sentences, p.b));
}
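
For readers without the JavaX toolchain, here is a rough plain-Java sketch of the same flow: collect each distinct sentence with the title of the first page it appears on, stop after a page limit, then split the keys into two groups by a predicate. Everything in it is an assumption made for illustration: `Page`, `splitSentences` and `looksLikeSentence` are hypothetical stand-ins for `WikiPage`, `sentencesAfterLines` and `mechList_isEnglishSentence_u`, whose real behaviour is not shown in this snippet, and `Map.putIfAbsent` is used where the snippet calls `putIfNotThere`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class SentenceCollector {
    // Hypothetical stand-in for WikiPage: just a title and raw text.
    public static final class Page {
        public final String title, text;
        public Page(String title, String text) { this.title = title; this.text = text; }
    }

    // Toy splitter (assumption): break on line ends, then after '.', '!' or '?'.
    public static List<String> splitSentences(String text) {
        List<String> out = new ArrayList<>();
        for (String line : text.split("\\R"))
            for (String s : line.split("(?<=[.!?])\\s+")) {
                String t = s.trim();
                if (!t.isEmpty()) out.add(t);
            }
        return out;
    }

    // Maps sentence => title; the first page that contains a sentence keeps it.
    public static LinkedHashMap<String, String> collect(List<Page> pages, int maxPages) {
        LinkedHashMap<String, String> sentences = new LinkedHashMap<>();
        int n = 0;
        for (Page page : pages) {
            for (String s : splitSentences(page.text))
                sentences.putIfAbsent(s, page.title);
            if (++n >= maxPages) break;
        }
        return sentences;
    }

    // Toy predicate (assumption): starts with a capital, ends with '.', '!' or '?'.
    public static boolean looksLikeSentence(String s) {
        return s.length() > 1
            && Character.isUpperCase(s.charAt(0))
            && ".!?".indexOf(s.charAt(s.length() - 1)) >= 0;
    }

    // Returns [rejected, accepted], each keeping the original insertion order.
    public static List<LinkedHashMap<String, String>> split(
            Map<String, String> sentences, Predicate<String> pred) {
        LinkedHashMap<String, String> rejected = new LinkedHashMap<>();
        LinkedHashMap<String, String> accepted = new LinkedHashMap<>();
        sentences.forEach((s, title) -> (pred.test(s) ? accepted : rejected).put(s, title));
        return Arrays.asList(rejected, accepted);
    }

    public static void main(String[] args) {
        List<Page> pages = Arrays.asList(
            new Page("Cat", "Cats\nThe cat is a small animal. It purrs."),
            new Page("Dog", "Dogs\nIt purrs. The dog is a pet."));
        LinkedHashMap<String, String> all = collect(pages, 1000);
        List<LinkedHashMap<String, String>> parts = split(all, SentenceCollector::looksLikeSentence);
        System.out.println("Non-sentences: " + parts.get(0));
        System.out.println("Sentences: " + parts.get(1));
    }
}
```

The `LinkedHashMap` keeps sentences in the order they were first seen, so both groups print in page order, as in the snippet above. The rejected group comes first in the returned list, mirroring how the snippet prints `p.a` under "Non-sentences" and `p.b` under "Sentences".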
Began life as a copy of #1014173
Snippet ID: #1014177
Snippet name: Collect all sentences from 1000 Simple Wikipedia pages
Eternal ID of this version: #1014177/9
Text MD5: 13e3469e6546cd4c7bb3ab9e08966571
Transpilation MD5: 19cb66ab8562ac17bfaca3c3e6479c43
Author: stefan
Category: javax / a.i. / networking
Type: JavaX source code
Public (visible to everyone): Yes
Archived (hidden from active list): No
Created/modified: 2018-04-15 23:37:56
Source code size: 654 bytes / 22 lines
Pitched / IR pitched: No / No
Views / Downloads: 373 / 495
Version history: 8 change(s)