
BotCompany Repo | #1000670 - htmlcoarsetok (function)

JavaX fragment (include) [tags: use-pretranspiled]

Libraryless. A Pure Java version is available (84L/1K).

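// Splits HTML into coarse tokens: even indices hold non-tag content
// (text, comments), odd indices hold tags. The result always has odd size.
// TODO: process CDATA, scripts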

static LS htmlcoarsetok(S s) {
  new LS tok;
  int l = s == null ? 0 : s.length();
  
  int i = 0;
  while (i < l) {
    int j = i;
    char c;
    
    // scan for non-tags
    while (j < l) {
      if (s.charAt(j) != '<')
        // regular character
        ++j;
      else if (s.startsWith("<!--", j)) {
        // HTML comment: skip to the closing "-->" (or to end of input if unterminated)
        j = j+4;
        while (j < l && !s.startsWith("-->", j)) ++j; // also matches right after "<!--", so "<!---->" closes
        j = Math.min(j+3, l);
      } else {
        char d = charAt(s, j+1); // character after <
        if (d == '/' || isLetter(d))
          // it's a tag
          break;
        else
          ++j;
      }
    }
    
    tok.add(s.substring(i, j)); // add non-tag content
    i = j;
    if (i >= l) break;
    c = s.charAt(i);

    // scan over tag
    if (c == '<') {
      ++j;
      
      while (j < l && s.charAt(j) != '>') ++j; // TODO: strings in tag?
      if (j < l) ++j;
    }

    tok.add(s.substring(i, j)); // add tag
    i = j;
  }
  
  if ((tok.size() & 1) == 0) tok.add("");
  return tok;
}
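
For readers without the JavaX transpiler, the following is a rough plain-Java sketch of the same scanning logic, not the pretranspiled version referenced above. The class name `HtmlCoarseTokDemo` and the helper `charAtOrNul` are made up for illustration; the helper is assumed to behave like JavaX's bounds-safe `charAt(s, i)`. The comment scan here tests for `-->` immediately after `<!--`, so `<!---->` counts as a complete comment.

```java
import java.util.ArrayList;
import java.util.List;

public class HtmlCoarseTokDemo {
    // Hypothetical stand-in for JavaX's charAt(s, i): returns '\0' when i is out of range.
    static char charAtOrNul(String s, int i) {
        return i >= 0 && i < s.length() ? s.charAt(i) : '\0';
    }

    static List<String> htmlcoarsetok(String s) {
        List<String> tok = new ArrayList<>();
        int l = s == null ? 0 : s.length();
        int i = 0;
        while (i < l) {
            int j = i;
            // Scan non-tag content: plain text, comments, and stray '<' characters.
            while (j < l) {
                if (s.charAt(j) != '<') {
                    ++j;
                } else if (s.startsWith("<!--", j)) {
                    j += 4;
                    while (j < l && !s.startsWith("-->", j)) ++j;
                    j = Math.min(j + 3, l);
                } else {
                    char d = charAtOrNul(s, j + 1); // character after '<'
                    if (d == '/' || Character.isLetter(d)) break; // start of a tag
                    ++j;
                }
            }
            tok.add(s.substring(i, j)); // even index: non-tag content
            i = j;
            if (i >= l) break;

            // Scan over the tag up to and including the next '>'.
            ++j;
            while (j < l && s.charAt(j) != '>') ++j;
            if (j < l) ++j;
            tok.add(s.substring(i, j)); // odd index: tag
            i = j;
        }
        if ((tok.size() & 1) == 0) tok.add(""); // keep the token count odd
        return tok;
    }

    public static void main(String[] args) {
        List<String> tok = htmlcoarsetok("Hi <b>there</b><!-- <i> -->!");
        for (int k = 0; k < tok.size(); k++)
            System.out.println((k % 2 == 0 ? "text: " : "tag:  ") + "[" + tok.get(k) + "]");
    }
}
```

Because text and tags strictly alternate, a caller can pick out all tags by stepping through the odd indices, and joining every token in order gives back the original string.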

Snippet ID: #1000670
Snippet name: htmlcoarsetok (function)
Eternal ID of this version: #1000670/5
Text MD5: c0662bf48368a88db6238c0b5936a3ee
Transpilation MD5: cac2eefdc401b7a0519fb162806dcdc1
Author: stefan
Category:
Type: JavaX fragment (include)
Public (visible to everyone): Yes
Archived (hidden from active list): No
Created/modified: 2020-11-25 14:50:03
Source code size: 1151 bytes / 51 lines
Pitched / IR pitched: No / No
Views / Downloads: 481 / 1627
Version history: 4 change(s)

Formerly at http://tinybrain.de/1000670 & http://1000670.tinybrain.de