srecord IndexedWikiPage(S title, long start, int len) {}

please include function iteratorFromFunction.

// Lazily indexes the Simple Wikipedia XML dump: one entry per <page> element,
// holding the page title plus the byte offset and byte length of the element in the file.
static IterableIterator<IndexedWikiPage> indexSimpleWikipedia() {
  File f = unpackSimpleWikipedia();
  final ByteCountingLineReader reader = new(bufferedFileInputStream(f, 1024*1024));
  ret main.<IndexedWikiPage> iteratorFromFunction(new O {
    int lines = 0, pages = 0;

    // Returns the next indexed page, or null once the input is exhausted.
    IndexedWikiPage get() ctex {
      long pageStart = 0;
      StringBuilder pageBuf = null;
      while licensed {
        long offset = reader.byteCount(); // byte offset at which this line starts
        S line = reader.readLine();
        if (line == null) break;
        line = trim(line);
        if (eq(line, "<page>")) {
          pageStart = offset;
          pageBuf = new StringBuilder;
        }
        if (pageBuf != null)
          pageBuf.append(line).append("\n");
        if (eq(line, "</page>")) {
          // Extract the title from the buffered page text.
          L<S> tok = htmlTok(str(pageBuf));
          S title = trim(htmldecode(join(contentsOfContainerTag(tok, "title"))));
          // Progress report every 1000 pages; 228400 is a hard-coded expected page count.
          if ((++pages % 1000) == 0) {
            fractionDone(pages/228400.0);
            print("Pages: " + pages + " (" + title + ")");
            sleep(1);
          }
          ret new IndexedWikiPage(title, pageStart, toInt(reader.byteCount()-pageStart));
        }
      }
      fractionDone(1);
      reader.close();
      null;
    }
  });
}
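The JavaX helpers used above (`iteratorFromFunction`, `ByteCountingLineReader`, `unpackSimpleWikipedia`) are not available in plain Java. The following is therefore a minimal sketch of the same technique under stated assumptions, not the snippet's actual implementation. It records the byte offset at which each `<page>` line starts and emits a (title, start, length) entry when `</page>` is reached. It then uses `RandomAccessFile.seek` to re-read a single page without rescanning the file. The names `WikiIndexSketch`, `iteratorFromSupplier`, `indexPages` and `readPage` are hypothetical. The `ByteCountingLineReader` here is a small re-implementation, not the JavaX class. The sketch assumes an already-unpacked UTF-8 dump in which `<page>`, `</page>` and `<title>...</title>` each occupy a line of their own. It leaves out the original's entity decoding (`htmldecode`), its progress reporting and its `licensed` cancellation check.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.function.Supplier;

public class WikiIndexSketch {
    // Same shape as the snippet's srecord: page title plus byte range of its <page> element.
    record IndexedWikiPage(String title, long start, int len) {}

    // Hypothetical stand-in for JavaX's iteratorFromFunction: wraps a
    // "next element, or null when exhausted" supplier as a lazy Iterator.
    static <T> Iterator<T> iteratorFromSupplier(Supplier<T> source) {
        return new Iterator<T>() {
            private T pending;
            private boolean exhausted;
            public boolean hasNext() {
                if (pending == null && !exhausted) {
                    pending = source.get();
                    exhausted = (pending == null);
                }
                return pending != null;
            }
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                T result = pending;
                pending = null;
                return result;
            }
        };
    }

    // Re-implementation for this sketch: reads '\n'-terminated UTF-8 lines and counts raw bytes.
    // Splitting on byte 0x0A is safe because UTF-8 never uses it inside a multi-byte character.
    static final class ByteCountingLineReader {
        private final InputStream in;
        private long byteCount;
        ByteCountingLineReader(InputStream in) { this.in = in; }
        long byteCount() { return byteCount; }
        String readLine() throws IOException {
            ByteArrayOutputStream line = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) {
                byteCount++;
                if (b == '\n') break;
                line.write(b);
            }
            if (b == -1 && line.size() == 0) return null; // end of input
            return line.toString(StandardCharsets.UTF_8);
        }
    }

    // Lazily yields one entry per <page>...</page>; assumes <page>, </page> and <title>...</title>
    // each sit on their own line. Titles are not entity-decoded (the original uses htmldecode).
    static Iterator<IndexedWikiPage> indexPages(InputStream in) {
        ByteCountingLineReader reader = new ByteCountingLineReader(new BufferedInputStream(in, 1 << 20));
        return iteratorFromSupplier(() -> {
            try {
                long pageStart = -1;
                String title = null;
                while (true) {
                    long offset = reader.byteCount(); // byte offset of the line about to be read
                    String line = reader.readLine();
                    if (line == null) return null; // exhausted; the caller closes the stream
                    line = line.trim();
                    if (line.equals("<page>")) { pageStart = offset; title = null; }
                    else if (pageStart >= 0 && title == null && line.startsWith("<title>") && line.endsWith("</title>"))
                        title = line.substring("<title>".length(), line.length() - "</title>".length());
                    else if (pageStart >= 0 && line.equals("</page>")) // length includes the trailing newline
                        return new IndexedWikiPage(title, pageStart, Math.toIntExact(reader.byteCount() - pageStart));
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    // Random access with an index entry: seek to the start offset, read exactly len bytes.
    static String readPage(RandomAccessFile file, IndexedWikiPage page) throws IOException {
        byte[] bytes = new byte[page.len()];
        file.seek(page.start());
        file.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) { System.out.println("usage: java WikiIndexSketch.java <dump.xml>"); return; }
        File dump = new File(args[0]);
        List<IndexedWikiPage> index = new ArrayList<>();
        try (InputStream in = new FileInputStream(dump)) {
            indexPages(in).forEachRemaining(index::add);
        }
        System.out.println("Indexed " + index.size() + " pages");
        if (index.isEmpty()) return;
        try (RandomAccessFile raf = new RandomAccessFile(dump, "r")) {
            System.out.println(readPage(raf, index.get(0))); // first page, re-read by offset
        }
    }
}
```

Run it as `java WikiIndexSketch.java path/to/dump.xml` (Java 16 or later, because of the record). Counting raw bytes rather than decoded characters is the design choice that matters here. `seek` works on byte positions, and once titles or text contain multi-byte UTF-8 characters, byte and character offsets no longer agree. In this sketch the recorded length runs through the newline that ends the `</page>` line.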
Began life as a copy of #1008067
| Field | Value |
|---|---|
| Snippet ID: | #1014153 |
| Snippet name: | indexSimpleWikipedia |
| Eternal ID of this version: | #1014153/9 |
| Text MD5: | d3460f1311734490734180efba5af218 |
| Author: | stefan |
| Category: | javax / a.i. / networking |
| Type: | JavaX fragment (include) |
| Public (visible to everyone): | Yes |
| Archived (hidden from active list): | No |
| Created/modified: | 2018-04-15 14:11:58 |
| Source code size: | 1390 bytes / 43 lines |
| Pitched / IR pitched: | No / No |
| Views / Downloads: | 685 / 695 |
| Version history: | 8 change(s) |