BotCompany Repo | #1007652 // splitIntoSentences

JavaX fragment (include)

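// Splits text into sentences, keeping only fragments that look like
// real sentence starts.
static L<S> splitIntoSentences(S s) {
  new L<S> sentences;
  for (S sentence : splitIntoSentences_split(s)) {
    // Safe: splitIntoSentences_split only returns fragments longer than 1 char
    char first = sentence.charAt(0);
    // Skip fragments starting with a lowercase letter or with , ; : =
    if (Character.isLowerCase(first) || ",;:=".indexOf(first) >= 0) continue;
    // Skip fragments rejected by hasCharacters
    if (!hasCharacters(sentence)) continue;
    sentences.add(sentence);
  }
  ret sentences;
}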

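// Raw split at "." and "?" tokens; text after the last terminator is dropped.
static L<S> splitIntoSentences_split(S s) {
  L<S> tok = javaTok(s); // To parse quoted things
  simpleSpaces(tok);
  new L<S> list;
  int i = 0; // start index of the current sentence in tok
  while (true) {
    int j = i;
    do {
      j = indexOfAny(tok, j+1, ".", "?");
      if (j < 0) return list; // no further terminator
    // Keep searching while the terminator is directly followed by another
    // token (empty token after it), as in "9.5"
    } while (j+1 < tok.size()-1 && tok.get(j+1).equals(""));

    S sentence = join(tok.subList(i, j+1)).trim();
    if (sentence.length() > 1) // ignore lone terminators
      list.add(sentence);
    i = j+1;
  }
}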
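
For readers without the JavaX helper library, here is a rough plain-Java sketch of the same two-step idea: split at terminators, then filter the fragments. It is an approximation under stated assumptions, not the snippet's actual behavior. It scans characters instead of using javaTok, so quoted text gets no special treatment. The hypothetical helper hasLetterOrDigit stands in for hasCharacters, whose exact definition is not shown here. The class name SentenceSplitSketch is also made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SentenceSplitSketch {

    // Step 1: cut after every '.' or '?' that is followed by whitespace or
    // ends the text. A terminator directly followed by another character
    // (as in "9.5") does not end a sentence.
    static List<String> split(String s) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '.' && c != '?') continue;
            boolean atEnd = i + 1 == s.length();
            if (!atEnd && !Character.isWhitespace(s.charAt(i + 1))) continue;
            String part = s.substring(start, i + 1).trim();
            if (part.length() > 1) parts.add(part); // ignore lone terminators
            start = i + 1;
        }
        return parts; // text after the last terminator is dropped
    }

    // Assumption: stand-in for hasCharacters (at least one letter or digit).
    static boolean hasLetterOrDigit(String s) {
        for (int i = 0; i < s.length(); i++)
            if (Character.isLetterOrDigit(s.charAt(i))) return true;
        return false;
    }

    // Step 2: keep only fragments that look like real sentence starts.
    static List<String> splitIntoSentences(String s) {
        List<String> sentences = new ArrayList<>();
        for (String part : split(s)) {
            char first = part.charAt(0); // safe: split() returns length > 1
            if (Character.isLowerCase(first) || ",;:=".indexOf(first) >= 0) continue;
            if (!hasLetterOrDigit(part)) continue;
            sentences.add(part);
        }
        return sentences;
    }

    public static void main(String[] args) {
        // prints [Hello world., It costs 9.5 dollars?, Ok.]
        System.out.println(splitIntoSentences(
            "Hello world. It costs 9.5 dollars? and then nothing. Ok."));
    }
}
```

The fragment "and then nothing." is dropped because it starts with a lowercase letter, and the dot in "9.5" does not split because a digit follows it directly.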


Snippet ID: #1007652
Snippet name: splitIntoSentences
Eternal ID of this version: #1007652/2
Text MD5: a6f583af3e90778edadc0da42d9796f2
Author: stefan
Category: javax / parsing
Type: JavaX fragment (include)
Public (visible to everyone): Yes
Archived (hidden from active list): No
Created/modified: 2017-03-30 19:46:59
Source code size: 830 bytes / 29 lines
Version history: 1 change(s)