
BotCompany Repo | #1011236 // Scrape Google Spike [OK]

JavaX source code (desktop) [tags: use-pretranspiled] - run with: x30.jar


!7

p {
  S query = "gramophone";
  S userAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:53.0) Gecko/20100101 Firefox/53.0";
  S html = loadPageWithUserAgent("http://google.com/search?q=" + urlencode(query) + "&lr=lang_en&hl=en", userAgent);
  S url = first(loadPage_responseHeaders->get("Location"));
  if (url != null)
    html = loadPageWithUserAgent(url, userAgent);
  //print(html);
  pnlStruct(loadPage_responseHeaders!);
  
  // Every h3 is a search result
  L<S> htmlTok = htmlTok(html);
  LL<S> h3s = findContainerTagDeep(htmlTok, "h3");
  pnlStruct(h3s);
  for (L<S> tok : h3s) {
    L<S> linkTok = first(findContainerTag(tok, "a"));
    if (empty(linkTok)) continue;
    
    S link = tagGet(second(linkTok), "href");
    S text = join(dropTags(contentsOfContainerTag(linkTok)));
    L<S> sub = subList(htmlTok, magicIndexOfSubList(htmlTok, tok)+l(tok)-1);
    S desc = trim(htmldecode(dropTags(join(first(findContainerTagWithParams(sub, "span", "class" := "st"))))));
    
    print("Link: " + link);
    print("  Text: " + text);
    print("  Desc: " + desc);
  }
}
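
For readers without the JavaX runtime, the "every h3 is a search result" step can be approximated in plain Java. This is only a rough sketch under stated assumptions: the class name H3Results, the method extract and the sample HTML are hypothetical and not part of the snippet. It uses regular expressions instead of the snippet's HTML tokenizer, and it leaves out the description lookup (the span with class "st").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original snippet.
public class H3Results {
    // Matches an <h3> element and captures its inner HTML (non-greedy, across line breaks).
    private static final Pattern H3 = Pattern.compile(
        "<h3\\b[^>]*>(.*?)</h3>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Matches an <a href="..."> element and captures the href value and the link body.
    private static final Pattern LINK = Pattern.compile(
        "<a\\b[^>]*\\bhref=\"([^\"]*)\"[^>]*>(.*?)</a>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns one {href, text} pair for each h3 that contains a link. */
    public static List<String[]> extract(String html) {
        List<String[]> results = new ArrayList<>();
        Matcher h3 = H3.matcher(html);
        while (h3.find()) {
            Matcher a = LINK.matcher(h3.group(1));
            if (!a.find()) continue; // h3 without a link: skip it, as the snippet does
            String text = a.group(2).replaceAll("<[^>]*>", "").trim(); // drop nested tags
            results.add(new String[] { a.group(1), text });
        }
        return results;
    }

    public static void main(String[] args) {
        // Made-up sample markup; real result pages will differ.
        String sample = "<div><h3 class=\"r\"><a href=\"http://example.com/\">The <b>Gramophone</b></a></h3>"
            + "<h3>No link here</h3></div>";
        for (String[] r : extract(sample)) {
            System.out.println("Link: " + r[0]);
            System.out.println("  Text: " + r[1]);
        }
    }
}
```

Regular expressions are fragile on real-world HTML; a tokenizer or parser, as used in the snippet itself, is the more robust choice. The sketch is only meant to make the control flow visible.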


Travelled to 13 computer(s): aoiabmzegqzx, bhatertpkbcr, cbybwowwnfue, cfunsshuasjs, gwrvuhgaqvyk, ishqpsrjomds, lpdgvwnxivlt, mqqgnosmbjvj, pyentgdyhuwx, pzhvpgtvlbxg, tslmcundralx, tvejysmllsmz, vouqrxazstgt


Snippet ID: #1011236
Snippet name: Scrape Google Spike [OK]
Eternal ID of this version: #1011236/19
Text MD5: a9c63f65d196a19c0e501e4037e3a861
Transpilation MD5: d9873086d1d0da7681f357242c2010b2
Author: stefan
Category: javax / networking
Type: JavaX source code (desktop)
Public (visible to everyone): Yes
Archived (hidden from active list): No
Created/modified: 2017-11-14 09:06:41
Source code size: 1113 bytes / 30 lines
Pitched / IR pitched: No / No
Views / Downloads: 440 / 981
Version history: 18 change(s)