
BotCompany Repo | #1012023 // reTok_multi - reTok multiple token ranges efficiently

JavaX fragment (include) [tags: use-pretranspiled]

Transpiled version (2678L) is out of date.

static LS reTok_multi(LS tok, L<IntRange> places) {
  if (empty(places)) ret tok;
  if (l(places) == 1) ret reTok(tok, first(places));
  L<S> orig = cloneList(tok); // copy to orig
  
  // sort, extend & merge ranges
  sortIntRangesInPlace(places);
  new L<IntRange> places2;
  for (IntRange p : places) {
    p = intRange(p.start & ~1, p.end | 1); // extend to N-to-N (even start, odd end)
    if (nempty(places2) && p.start <= last(places2).end)
      last(places2).end = max(last(places2).end, p.end); // merge if overlapping or touching; never shrink the previous range
    else
      places2.add(p);
  }
  
  ifdef reTok_multi_debug
    printStruct("places: ", places2);
  endifdef
 
  int iPlace = 0, n = l(orig);
  IntRange p = get(places2, iPlace);

  int next = p.start, i = next;
  tok.subList(next, tok.size()).clear();
  while (i < n)
    if (i < next)
      tok.add(orig.get(i++));
    else {
      int j = p.end;
      
      S s = joinSubList(orig, i, j);
      ifdef reTok_multi_debug
        printStruct("retokking: ", s);
      endifdef

      tok.addAll(javaTok(s));
      i = j;
      p = get(places2, ++iPlace);
      if (p == null) break;
      next = p.start;
    }
    
  while (i < n)
    tok.add(orig.get(i++));
    
  ifdef reTok_multi_check
    LS correct = javaTok(join(orig));
    if (neq(correct, tok)) {
      n = min(l(correct), l(tok));
      if (l(correct) != l(tok)) print("reTok_multi_check: size difference " + l(correct) + " / " + l(tok));
      for ii to n:
        if (!eq(tok.get(ii), correct.get(ii))) {
          for (int j = max(0, ii-1); j < min(n, ii+1); j++)
            print("reTok_multi_check diff @ " + j + "/" + n + ": " + quote(correct.get(j)) + " / " + quote(tok.get(j)));
          break;
        }
      fail("reTok_multi_check");
    }
  endifdef

  ret tok;
}

Author comment

Began life as a copy of #1003367
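
The "sort, extend & merge ranges" step can be tried in isolation. Below is a minimal plain-Java sketch of that step only, not the snippet's actual code: the class name `RangeMergeSketch`, the method `extendAndMerge`, and the use of `int[]{start, end}` pairs in place of `IntRange` are all assumptions made for illustration. It also assumes the usual javaTok layout, where even indices hold N tokens and odd indices hold C tokens. When merging it takes the maximum of both ends, so a range nested inside an earlier one cannot shrink it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RangeMergeSketch {
    // Hypothetical helper: takes [start, end) index pairs, extends each one to
    // N-token boundaries (even start, odd end), then merges ranges that
    // overlap or touch. The input list is not modified.
    static List<int[]> extendAndMerge(List<int[]> places) {
        List<int[]> sorted = new ArrayList<>(places);
        sorted.sort(Comparator.comparingInt((int[] r) -> r[0]));

        List<int[]> merged = new ArrayList<>();
        for (int[] r : sorted) {
            int start = r[0] & ~1; // round down to an even index
            int end = r[1] | 1;    // round up to an odd index
            if (!merged.isEmpty() && start <= merged.get(merged.size() - 1)[1]) {
                int[] last = merged.get(merged.size() - 1);
                last[1] = Math.max(last[1], end); // keep the larger end
            } else {
                merged.add(new int[] {start, end});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<int[]> in = List.of(new int[] {5, 6}, new int[] {1, 9}, new int[] {12, 13});
        for (int[] r : extendAndMerge(in))
            System.out.println(r[0] + "-" + r[1]);
    }
}
```

With the sample input, the range 5-6 lies inside 1-9 after sorting, so it is absorbed rather than replacing the earlier end, and 12-13 stays separate.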




Snippet ID: #1012023
Snippet name: reTok_multi - reTok multiple token ranges efficiently
Eternal ID of this version: #1012023/20
Text MD5: 401f70b190eb68515a7f9f6676c4c09a
Author: stefan
Category: javax
Type: JavaX fragment (include)
Public (visible to everyone): Yes
Archived (hidden from active list): No
Created/modified: 2021-06-04 22:09:13
Source code size: 1789 bytes / 63 lines
Pitched / IR pitched: No / No
Views / Downloads: 436 / 637
Version history: 19 change(s)