

BotCompany Repo | #1011236 // Scrape Google Spike [OK]

JavaX source code (desktop) [tags: use-pretranspiled] - run with: x30.jar


!7

p {
  S query = "gramophone";
  S userAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:53.0) Gecko/20100101 Firefox/53.0";
  // Request the results page; if the response carries a Location header (redirect), load that URL instead
  S html = loadPageWithUserAgent("http://google.com/search?q=" + urlencode(query) + "&lr=lang_en&hl=en", userAgent);
  S url = first(loadPage_responseHeaders->get("Location"));
  if (url != null)
    html = loadPageWithUserAgent(url, userAgent);
  //print(html);
  pnlStruct(loadPage_responseHeaders!);
  
  // Every h3 is a search result
  L<S> htmlTok = htmlTok(html);
  LL<S> h3s = findContainerTagDeep(htmlTok, "h3");
  pnlStruct(h3s);
  for (L<S> tok : h3s) {
    L<S> linkTok = first(findContainerTag(tok, "a"));
    if (empty(linkTok)) continue;
    
    S link = tagGet(second(linkTok), "href");
    S text = join(dropTags(contentsOfContainerTag(linkTok)));
    // Description: the first <span class="st"> found in the tokens that follow this h3
    L<S> sub = subList(htmlTok, magicIndexOfSubList(htmlTok, tok)+l(tok)-1);
    S desc = trim(htmldecode(dropTags(join(first(findContainerTagWithParams(sub, "span", "class" := "st"))))));
    
    print("Link: " + link);
    print("  Text: " + text);
    print("  Desc: " + desc);
  }
}
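
For readers without the JavaX runtime, the two core steps of the snippet (building the query URL, then reading a link and its text out of every h3) can be sketched in plain Java. This is only an illustration under assumptions: the class and method names (SearchResultSketch, buildSearchUrl, extractResults) are hypothetical and not part of JavaX, the regex-based parsing stands in for htmlTok/findContainerTagDeep and is far less robust, and it assumes double-quoted href attributes. No network access is performed.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the JavaX helpers; not the snippet's actual implementation.
class SearchResultSketch {
    // Matches an <h3>...</h3> container and captures its inner HTML.
    private static final Pattern H3 = Pattern.compile(
        "<h3\\b[^>]*>(.*?)</h3>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Matches the first <a href="...">...</a>; assumes a double-quoted href.
    private static final Pattern LINK = Pattern.compile(
        "<a\\b[^>]*\\bhref=\"([^\"]*)\"[^>]*>(.*?)</a>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Builds the same search URL the snippet requests.
    static String buildSearchUrl(String query) {
        return "http://google.com/search?q="
            + URLEncoder.encode(query, StandardCharsets.UTF_8)
            + "&lr=lang_en&hl=en";
    }

    // Returns one {link, text} pair per h3 that contains a link; h3s without a link are skipped.
    static List<String[]> extractResults(String html) {
        List<String[]> results = new ArrayList<>();
        Matcher h3 = H3.matcher(html);
        while (h3.find()) {
            Matcher a = LINK.matcher(h3.group(1));
            if (!a.find()) continue;
            String link = a.group(1);
            // Drop nested tags such as <b> to keep only the visible text.
            String text = a.group(2).replaceAll("<[^>]*>", "").trim();
            results.add(new String[] { link, text });
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(buildSearchUrl("gramophone"));
        String sample = "<h3 class=\"r\"><a href=\"http://example.com/\">Gramo<b>phone</b></a></h3>";
        for (String[] r : extractResults(sample)) {
            System.out.println("Link: " + r[0]);
            System.out.println("  Text: " + r[1]);
        }
    }
}
```

The sketch leaves out the description lookup and the redirect handling; both depend on details of the live page and of the JavaX HTTP helpers that are not reproduced here.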




Snippet ID: #1011236
Snippet name: Scrape Google Spike [OK]
Eternal ID of this version: #1011236/19
Text MD5: a9c63f65d196a19c0e501e4037e3a861
Transpilation MD5: d9873086d1d0da7681f357242c2010b2
Author: stefan
Category: javax / networking
Type: JavaX source code (desktop)
Public (visible to everyone): Yes
Archived (hidden from active list): No
Created/modified: 2017-11-14 09:06:41
Source code size: 1113 bytes / 30 lines
Pitched / IR pitched: No / No
Views / Downloads: 572 / 1272
Version history: 18 change(s)
Referenced in: #1011241 - quickGoogle - returns pairs of (link, text)