BotCompany Repo | #1004692 // Apache Tika Test On Local PDF

JavaX source code [tags: use-pretranspiled] - run with: x30.jar

Uses 53509K of libraries. A Pure Java version is also available (327L/3K/10K).

!752

lib 1004690 // tika

// Only the Tika classes are needed; the org.apache.http imports were unused.
import org.apache.tika.metadata.Metadata;
import org.apache.tika.metadata.TikaCoreProperties;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

// Parses the file with Tika and returns its text, title and page count.
static Map<S, O> processFile(File file) ctex {
  new HashMap<S, O> map;
  InputStream input = new FileInputStream(file);
  try {
    // -1 disables BodyContentHandler's default write limit
    BodyContentHandler handler = new BodyContentHandler(-1);
    Metadata metadata = new Metadata();
    // AutoDetectParser detects the document type and picks a parser for it
    AutoDetectParser parser = new AutoDetectParser();
    ParseContext parseContext = new ParseContext();
    parser.parse(input, handler, metadata, parseContext);
    map.put("text", handler.toString());
    // Metadata.get returns null when the document lacks the property
    map.put("title", metadata.get(TikaCoreProperties.TITLE));
    map.put("pageCount", metadata.get("xmpTPg:NPages"));
  } finally {
    input.close();
  }
  return map;
}

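p {
  // Use the first command-line argument as the PDF path, else the default path
  Map<S, O> extractedMap = processFile(new File(or(get(args, 0), "/home/stefan/Desktop/maude-primer.pdf")));
  S text = (S) extractedMap.get("text");
  print(text);
  print(l(text)); // length of the extracted text
}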
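
The main block picks its input file with `or(get(args, 0), ...)`, two JavaX helpers that are not defined in this snippet. Below is a minimal plain-Java sketch of that fallback, assuming `get` returns null for an out-of-range index and `or` returns its second argument when the first is null. The class `ArgFallback` and both method bodies are hypothetical stand-ins written for illustration, not the JavaX library code.

```java
public class ArgFallback {
    // Hypothetical stand-in for JavaX get(array, index):
    // returns null instead of throwing when the index is out of range.
    static String get(String[] array, int index) {
        return (array != null && index >= 0 && index < array.length) ? array[index] : null;
    }

    // Hypothetical stand-in for JavaX or(value, fallback):
    // returns the first value unless it is null.
    static String or(String value, String fallback) {
        return value != null ? value : fallback;
    }

    public static void main(String[] args) {
        // Same selection as in the snippet: first argument if present, else the default path.
        String path = or(get(args, 0), "/home/stefan/Desktop/maude-primer.pdf");
        System.out.println(path);
    }
}
```

Run without arguments, this prints the default path; run with a path argument, it prints that argument instead.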

Author comment

Began life as a copy of #1004691

Snippet ID: #1004692
Snippet name: Apache Tika Test On Local PDF
Eternal ID of this version: #1004692/1
Text MD5: ad25d76a1835518f6836f7e72326db82
Transpilation MD5: 85844c0ce85c497d2b97f17138a92612
Author: stefan
Category: javax
Type: JavaX source code
Public (visible to everyone): Yes
Archived (hidden from active list): No
Created/modified: 2016-08-27 12:37:38
Source code size: 1269 bytes / 38 lines