sclass WikiPage {
  S title, text;
  *() {}
  *(S *title, S *text) {}
}

static IterableIterator<WikiPage> streamInSimpleWikipedia() {
  File f = unpackSimpleWikipedia();
  final BufferedReader reader = utf8bufferedReader(f);
  please include function iteratorFromFunction.
  ret main.<WikiPage> iteratorFromFunction(new O {
    int lines = 0, pages = 0;
    StringBuilder pageBuf = null;

    WikiPage get() ctex {
      S line;
      while ((line = reader.readLine()) != null) {
        line = trim(line);
        if (eq(line, "<page>"))
          pageBuf = new StringBuilder;
        if (pageBuf != null)
          pageBuf.append(line).append("\n");
        if (eq(line, "</page>")) {
          L<S> tok = htmlTok(str(pageBuf));
          S title = trim(htmldecode(join(contentsOfContainerTag(tok, "title"))));
          S text = trim(htmldecode(join(contentsOfContainerTag(tok, "text"))));
          if ((++pages % 1000) == 0) {
            fractionDone(pages/228400.0);
            print("Pages: " + pages + " (" + title + ")");
            sleep(1);
          }
          ret new WikiPage(title, text);
        }
      }
      fractionDone(1);
      reader.close();
      null;
    }
  });
}
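The snippet above reads the dump line by line, buffers everything between `<page>` and `</page>`, and hands out one page per call, so the whole file never has to sit in memory. A minimal plain-Java sketch of that idea follows. It is not the JavaX implementation: `PageStreamSketch`, `Page`, `between` and `streamPages` are hypothetical names, the tag extraction is a naive string search standing in for `htmlTok`/`contentsOfContainerTag`, and HTML entity decoding and progress reporting are left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

class PageStreamSketch {
    /** Hypothetical stand-in for WikiPage: just a title and a body. */
    static final class Page {
        final String title, text;
        Page(String title, String text) { this.title = title; this.text = text; }
    }

    /** Naive extraction of the content of the first <tag ...>...</tag> pair; "" if absent. */
    static String between(String xml, String tag) {
        int open = xml.indexOf("<" + tag);
        if (open < 0) return "";
        int start = xml.indexOf('>', open);
        if (start < 0) return "";
        int end = xml.indexOf("</" + tag + ">", start);
        if (end < 0) return "";
        return xml.substring(start + 1, end).trim();
    }

    /** Lazily yields one Page per <page>...</page> block found in the reader. */
    static Iterator<Page> streamPages(final BufferedReader reader) {
        return new Iterator<Page>() {
            private Page lookahead;   // page read ahead by hasNext(), if any
            private boolean done;     // true once the reader is exhausted

            private Page readNext() {
                try {
                    StringBuilder buf = null;  // non-null while inside a <page> block
                    String line;
                    while ((line = reader.readLine()) != null) {
                        line = line.trim();
                        if (line.equals("<page>")) buf = new StringBuilder();
                        if (buf != null) buf.append(line).append('\n');
                        if (buf != null && line.equals("</page>")) {
                            String xml = buf.toString();
                            return new Page(between(xml, "title"), between(xml, "text"));
                        }
                    }
                    reader.close();            // end of input: release the file
                    return null;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }

            @Override public boolean hasNext() {
                if (lookahead == null && !done) {
                    lookahead = readNext();
                    done = (lookahead == null);
                }
                return lookahead != null;
            }

            @Override public Page next() {
                if (!hasNext()) throw new NoSuchElementException();
                Page p = lookahead;
                lookahead = null;
                return p;
            }
        };
    }
}
```

A caller would wrap a file in a `BufferedReader` and loop with `hasNext()`/`next()`. The one-element lookahead is only there because `java.util.Iterator` needs `hasNext()`, whereas the original relies on `iteratorFromFunction` and signals the end by returning null.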
Began life as a copy of #1008015
Travelled to 13 computer(s): aoiabmzegqzx, bhatertpkbcr, cbybwowwnfue, cfunsshuasjs, gwrvuhgaqvyk, ishqpsrjomds, lpdgvwnxivlt, mqqgnosmbjvj, pyentgdyhuwx, pzhvpgtvlbxg, tslmcundralx, tvejysmllsmz, vouqrxazstgt
Snippet ID: | #1008067 |
Snippet name: | streamInSimpleWikipedia |
Eternal ID of this version: | #1008067/8 |
Text MD5: | ebae3e08ef2f26ad05eaab4ab3c1888a |
Author: | stefan |
Category: | javax / a.i. / networking |
Type: | JavaX fragment (include) |
Public (visible to everyone): | Yes |
Archived (hidden from active list): | No |
Created/modified: | 2017-04-23 16:10:42 |
Source code size: | 1244 bytes / 41 lines |
Pitched / IR pitched: | No / No |
Views / Downloads: | 459 / 473 |
Version history: | 7 change(s) |