

BotCompany Repo | #1024521 - dropPunctuation3 [experimental]

JavaX fragment (include) [tags: use-pretranspiled]

Libraryless. Pure Java version: 2326L/15K.

scope dropPunctuation3.

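// single-character tokens that are kept although they are neither letters nor digits
static LS #keep = ll("*", "<", ">");
// MRU cache for the string version (input string -> cleaned string)
static SS #cache = defaultSizeMRUCache();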

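// Drops single-character punctuation tokens from a javaTok-style list
// (whitespace at even indices, code tokens at odd indices).
// Works on a copy; the argument is not modified.
static LS dropPunctuation3(LS tok) {
  tok = new ArrayList<S>(tok);
  for (int i = 1; i < tok.size(); i += 2) {
    S t = tok.get(i);
    if (t.length() == 1 && !Character.isLetter(t.charAt(0)) && !Character.isDigit(t.charAt(0)) && !dropPunctuation3_keep.contains(t)) {
      // merge the whitespace on both sides; fall back to a single space if the result is empty
      tok.set(i-1, or2(tok.get(i-1) + tok.get(i+1), " "));
      tok.remove(i); // the punctuation token
      tok.remove(i); // the whitespace after it (already merged above)
      i -= 2; // re-examine the token that has moved to index i
    }
  }
  return tok;
}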

// String version: tokenize, drop punctuation, re-join. Results are cached per input string.
sS dropPunctuation3(S s) {
  ret getOrCreate_f0(cache, s,
    () -> join(dropPunctuation3(javaTokNoQuotes(s))));
}

end scope
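
For readers without the JavaX toolchain, the token filter above can be sketched in plain Java. This is only an illustration under stated assumptions: `simpleTok` is a hypothetical, much simplified stand-in for `javaTokNoQuotes` (it knows nothing about comments or quotes), and the class and method names are made up for this sketch. What it keeps from the snippet is the token layout (whitespace at even indices, tokens at odd indices), the keep list, and the merge-and-remove loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DropPunctuationSketch {
    // Single-character tokens that survive, as in the snippet's keep list.
    static final List<String> KEEP = Arrays.asList("*", "<", ">");

    // ASSUMPTION: simplified tokenizer, not the real javaTokNoQuotes.
    // Produces whitespace at even indices and tokens at odd indices;
    // a token is a run of letters/digits or one other character.
    static List<String> simpleTok(String s) {
        List<String> tok = new ArrayList<>();
        int i = 0, n = s.length();
        while (true) {
            int j = i;
            while (j < n && Character.isWhitespace(s.charAt(j))) j++;
            tok.add(s.substring(i, j)); // whitespace, possibly empty
            i = j;
            if (i >= n) break;
            if (Character.isLetterOrDigit(s.charAt(i))) {
                while (j < n && Character.isLetterOrDigit(s.charAt(j))) j++;
            } else {
                j++; // any other character is a token of its own
            }
            tok.add(s.substring(i, j));
            i = j;
        }
        return tok;
    }

    // Same loop as the snippet: drop the token, merge the whitespace around it.
    static List<String> dropPunctuation(List<String> in) {
        List<String> tok = new ArrayList<>(in);
        for (int i = 1; i < tok.size(); i += 2) {
            String t = tok.get(i);
            if (t.length() == 1 && !Character.isLetterOrDigit(t.charAt(0)) && !KEEP.contains(t)) {
                String merged = tok.get(i - 1) + tok.get(i + 1);
                tok.set(i - 1, merged.isEmpty() ? " " : merged);
                tok.remove(i); // the punctuation token
                tok.remove(i); // the whitespace that followed it
                i -= 2;        // look at the same position again
            }
        }
        return tok;
    }

    static String dropPunctuation(String s) {
        return String.join("", dropPunctuation(simpleTok(s)));
    }
}
```

With this sketch, `DropPunctuationSketch.dropPunctuation("a.b")` gives `"a b"`, while `"a*b"` is returned unchanged because `*` is on the keep list. Punctuation at the very end of the input leaves a single trailing space behind, since the merged whitespace is never allowed to be empty.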

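The string version relies on a size-bounded cache. A rough plain-Java equivalent could look like the following. It assumes that `defaultSizeMRUCache` behaves like a map that evicts the least recently used entry once a size limit is reached, and that `getOrCreate_f0` computes and stores a value only when the key is missing; both helpers are reimplemented here under hypothetical names, and the size limit is a free parameter rather than the real default.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class CachedTransformSketch {
    // ASSUMPTION: stand-in for defaultSizeMRUCache. An access-ordered
    // LinkedHashMap that drops its least recently used entry when it
    // grows beyond maxSize, wrapped for basic thread safety.
    static <K, V> Map<K, V> boundedCache(int maxSize) {
        return Collections.synchronizedMap(new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;
            }
        });
    }

    // ASSUMPTION: stand-in for getOrCreate_f0. Returns the cached value,
    // computing and storing it first if the key is not present.
    static <K, V> V getOrCreate(Map<K, V> cache, K key, Function<K, V> compute) {
        V value = cache.get(key);
        if (value == null) {
            value = compute.apply(key);
            cache.put(key, value);
        }
        return value;
    }
}
```

A caller would create one cache with `boundedCache(n)` and route every string through `getOrCreate(cache, s, ...)`, so repeated inputs skip tokenizing entirely.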
Author comment

Began life as a copy of #1000814




Snippet ID: #1024521
Snippet name: dropPunctuation3 [experimental]
Eternal ID of this version: #1024521/6
Text MD5: 25f30e08967e15026c63101ac92c444d
Transpilation MD5: 26bdc9c46e39aa7ad3cb8ccc3f975194
Author: stefan
Category:
Type: JavaX fragment (include)
Public (visible to everyone): Yes
Archived (hidden from active list): No
Created/modified: 2020-01-12 01:16:53
Source code size: 729 bytes / 26 lines
Pitched / IR pitched: No / No
Views / Downloads: 25 / 52
Version history: 5 change(s)

Formerly at http://tinybrain.de/1024521 & http://1024521.tinybrain.de