BotCompany Repo | #1005092 // Extract raw text from .odt

JavaX source code [tags: use-pretranspiled] - run with: x30.jar

Libraryless. A Pure Java version is available (1706L/13K/41K).

!752

p {
  File odt = new File(userHome(), "Documents/super-state.odt");
  
  // unwrapContainerTags makes editable lists
  L<L<S>> paragraphs = unwrapContainerTags(paragraphsFromODT(odt));
  
  // drop leading empty strings and soft page breaks from each paragraph
  paragraphs = map(paragraphs, func(L<S> p) {
    dropListPrefix(p, "", "<text:soft-page-break/>")
  });
  
  // split paragraphs at explicit line breaks
  for i over paragraphs: {
    L<S> p = paragraphs.get(i);
    int idx;
    while ((idx = p.indexOf("<text:line-break/>")) >= 0) {
      // everything after the break becomes a new paragraph at i+1
      // (the outer loop visits it next and splits it further if needed)
      paragraphs.add(i+1, newSubList(p, idx+1));
      // cut p at the break so the text before it is kept
      // (removing 0..idx+1 here would drop that text and duplicate the tail)
      removeSubList(p, idx, p.size());
    }
  }
  
  // nb: we might need to remove formatting if user used any
  // (not doing that yet)
  L<S> lines = map(paragraphs, func(L<S> p) { join(" # ", p) });
  lines = map(lines, func(S s) { trim(htmldecode(s)) });
  
  // empty lines separate groups; each group is joined back into one text block
  paragraphs = groupParagraphs(lines);
  lines = map(paragraphs, func(L<S> p) { fromLines(p) });
  
  psl(lines);
}

static L<L<S>> groupParagraphs(L<S> lines) {
  ret groupNonEmpty(lines, func(S line) { empty(line) });
}
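
For readers who do not use JavaX, the two list-handling steps of the snippet (splitting a paragraph at "<text:line-break/>" and grouping lines separated by empty lines) can be sketched in plain Java. This is only an illustration under assumptions: the class OdtTextSketch and its methods are hypothetical names, not the JavaX library functions, and the grouping behaviour (runs of empty lines act as a single separator, no empty groups are emitted) is a guess at what groupNonEmpty does rather than a confirmed description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of JavaX or the transpiled snippet.
class OdtTextSketch {
    static final String LINE_BREAK = "<text:line-break/>";

    // Splits every paragraph at each line-break token, in place.
    // Each inner list must be mutable (e.g. an ArrayList).
    static void splitAtLineBreaks(List<List<String>> paragraphs) {
        for (int i = 0; i < paragraphs.size(); i++) {
            List<String> p = paragraphs.get(i);
            int idx = p.indexOf(LINE_BREAK);
            if (idx >= 0) {
                // The part after the break becomes a new paragraph at i+1;
                // the loop reaches it on the next iteration and splits it again if needed.
                paragraphs.add(i + 1, new ArrayList<>(p.subList(idx + 1, p.size())));
                // Cut p at the break so only the part before it remains.
                p.subList(idx, p.size()).clear();
            }
        }
    }

    // Groups consecutive non-empty lines; empty lines only separate groups.
    // Assumed behaviour, not taken from the JavaX implementation.
    static List<List<String>> groupNonEmpty(List<String> lines) {
        List<List<String>> groups = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                if (!current.isEmpty()) {
                    groups.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(line);
            }
        }
        if (!current.isEmpty()) groups.add(current);
        return groups;
    }

    public static void main(String[] args) {
        List<List<String>> paragraphs = new ArrayList<>();
        paragraphs.add(new ArrayList<>(List.of("a", LINE_BREAK, "b", LINE_BREAK, "c")));
        splitAtLineBreaks(paragraphs);
        System.out.println(paragraphs); // [[a], [b], [c]]
        System.out.println(groupNonEmpty(List.of("x", "y", "", "", "z"))); // [[x, y], [z]]
    }
}
```

The split keeps the text before each break in the original paragraph and moves the remainder to the following slot, so a paragraph with several breaks is unrolled one break per loop iteration.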

Author comment

Began life as a copy of #1005089


Travelled to 14 computer(s): aoiabmzegqzx, bhatertpkbcr, cbybwowwnfue, cfunsshuasjs, ddnzoavkxhuk, gwrvuhgaqvyk, ishqpsrjomds, lpdgvwnxivlt, mqqgnosmbjvj, pyentgdyhuwx, pzhvpgtvlbxg, tslmcundralx, tvejysmllsmz, vouqrxazstgt


Snippet ID: #1005092
Snippet name: Extract raw text from .odt
Eternal ID of this version: #1005092/1
Text MD5: b93af5d191c85759438c48cb1a36bf42
Transpilation MD5: 1711e685cb367b3381bdc4328a98c839
Author: stefan
Category: javax / loading
Type: JavaX source code
Public (visible to everyone): Yes
Archived (hidden from active list): No
Created/modified: 2016-10-16 17:23:14
Source code size: 996 bytes / 35 lines
Pitched / IR pitched: No / No
Views / Downloads: 606 / 695