Nan's Blog: Python - Web Scraping - request needs to post json parameters

Thursday, June 13, 2019

Python - Web Scraping - request needs to post json parameters

There are some websites that taking json data as parameters in the url.
e.g.
https://rrcsearch3.neubus.com/esd3-rrc/index.php_module_=esd&_action_=keysearch&profile=28.

This kind of website does not need to do several get or post to get the final result even though there are lots of websites pop up before the final one, like:

What you really need is just the final page + the urlencoded json parameters:

params = urlencode({
    'json': json.dumps({"Neusearch": {"profile": "28", "api_key": "publicuser",

           "api_sig": "68461babf138014f252e64732ef3b1a0",

           "Searchitems": {"item": [{"key": "operator_number", "value": '243195'}]},

           "page": 0, "pageSize": 50, "strict": "true", "saveSearch": "true"}})

})

scraper.do.http_post(
    url=context['address'] + '&' + params,

    send_to=detail,

    data=params,

    context=context
)

But, remember the header should include the referer since you are not doing get and post from the very first page:

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5).....,
    'Content-Type': 'application/x-www-form-urlencoded',

    'Referer': 'https://XXXXXXXX/index.php?........}

Difference between json.dumps() and json.dump()
dumps是将dict转化成str格式，loads是将str转化成dict格式。

dump和load也是类似的功能，只是与文件操作结合起来了。

Nan's Blog

Thursday, June 13, 2019

Python - Web Scraping - request needs to post json parameters

No comments:

Post a Comment

Labels

Blog Archive