Not logged in.  Login/Logout/Register | List snippets | | Create snippet | Upload image | Upload data

91
LINES

< > BotCompany Repo | #1000793 // htmldecode

JavaX fragment (include) [tags: use-pretranspiled]

Libraryless. Click here for Pure Java version (241L/3K).

static String htmldecode(final String input) {
  if (input == null) ret null;
  
  final int MIN_ESCAPE = 2;
  final int MAX_ESCAPE = 6;

  StringWriter writer = null;
  int len = input.length();
  int i = 1;
  int st = 0;
  while (true) {
      // look for '&'
      while (i < len && input.charAt(i-1) != '&')
          i++;
      if (i >= len)
          break;

      // found '&', look for ';'
      int j = i;
      while (j < len && j < i + MAX_ESCAPE + 1 && input.charAt(j) != ';')
          j++;
      if (j == len || j < i + MIN_ESCAPE || j == i + MAX_ESCAPE + 1) {
          i++;
          continue;
      }

      // found escape 
      if (input.charAt(i) == '#') {
          // numeric escape
          int k = i + 1;
          int radix = 10;

          final char firstChar = input.charAt(k);
          if (firstChar == 'x' || firstChar == 'X') {
              k++;
              radix = 16;
          }

          try {
              int entityValue = Integer.parseInt(input.substring(k, j), radix);

              if (writer == null) 
                  writer = new StringWriter(input.length());
              writer.append(input.substring(st, i - 1));

              if (entityValue > 0xFFFF) {
                  final char[] chrs = Character.toChars(entityValue);
                  writer.write(chrs[0]);
                  writer.write(chrs[1]);
              } else {
                  writer.write(entityValue);
              }

          } catch (NumberFormatException ex) { 
              i++;
              continue;
          }
      }
      else {
          // named escape
          CharSequence value = htmldecode_lookupMap().get(input.substring(i, j));
          if (value == null) {
              i++;
              continue;
          }

          if (writer == null) 
              writer = new StringWriter(input.length());
          writer.append(input.substring(st, i - 1));

          writer.append(value);
      }

      // skip escape
      st = j + 1;
      i = st;
  }

  if (writer != null) {
      writer.append(input.substring(st, len));
      return writer.toString();
  }
  return input;
}

static simplyCached HashMap<String, CharSequence> htmldecode_lookupMap() {
  var map = new HashMap<String, CharSequence>();
  for (CharSequence[] seq : htmldecode_escapes()) 
    map.put(seq[1].toString(), seq[0]);
  ret map;
}

Author comment

See http://unicode.e-workers.de/entities.php

download  show line numbers  debug dex  old transpilations   

Travelled to 14 computer(s): aoiabmzegqzx, bhatertpkbcr, cbybwowwnfue, cfunsshuasjs, gwrvuhgaqvyk, ishqpsrjomds, lpdgvwnxivlt, mqqgnosmbjvj, onxytkatvevr, pyentgdyhuwx, pzhvpgtvlbxg, tslmcundralx, tvejysmllsmz, vouqrxazstgt

No comments. add comment

Snippet ID: #1000793
Snippet name: htmldecode
Eternal ID of this version: #1000793/9
Text MD5: 1808854b076bfa92236ceff9f113f8ba
Transpilation MD5: 33eeb12667a48ee36b7c26df93e58d19
Author: stefan
Category: javax
Type: JavaX fragment (include)
Public (visible to everyone): Yes
Archived (hidden from active list): No
Created/modified: 2021-07-23 18:12:06
Source code size: 2453 bytes / 91 lines
Pitched / IR pitched: No / No
Views / Downloads: 803 / 1955
Version history: 8 change(s)
Referenced in: [show references]