For example:
https://rrcsearch3.neubus.com/esd3-rrc/index.php?_module_=esd&_action_=keysearch&profile=28
For this kind of website you do not need to replay several GET or POST requests to reach the final result, even though several intermediate pages appear before the final one.
All you really need is the URL of the final page plus the URL-encoded JSON parameters:
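import json
from urllib.parse import urlencode  # Python 3; in Python 2: from urllib import urlencode
# One form field, 'json', carrying the serialized search request.
params = urlencode({'json': json.dumps({"Neusearch": {
    "profile": "28", "api_key": "publicuser",
    "api_sig": "68461babf138014f252e64732ef3b1a0",
    "Searchitems": {"item": [{"key": "operator_number", "value": '243195'}]},
    "page": 0, "pageSize": 50, "strict": "true", "saveSearch": "true"}})})

# scraper, detail and context come from the surrounding scraping framework.
scraper.do.http_post(
    url=context['address'] + '&' + params,
    send_to=detail,
    data=params,
    context=context
)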
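To see what the URL-encoded parameter string actually contains, here is a minimal sketch that builds a smaller payload of the same shape and decodes it again using only the standard library. It assumes Python 3; the helper name build_params is made up for illustration, and the payload is trimmed from the one above.

```python
import json
from urllib.parse import urlencode, parse_qs


def build_params(operator_number):
    """Return a form-encoded string with a single 'json' field (hypothetical helper)."""
    payload = {"Neusearch": {
        "profile": "28",
        "Searchitems": {"item": [{"key": "operator_number", "value": operator_number}]},
        "page": 0, "pageSize": 50}}
    return urlencode({"json": json.dumps(payload)})


params = build_params("243195")
# parse_qs reverses urlencode; each field name maps to a list of values.
decoded = json.loads(parse_qs(params)["json"][0])
print(params[:5])  # the string starts with the field name: json=
print(decoded["Neusearch"]["Searchitems"]["item"][0]["value"])
```

The whole JSON document travels as the value of one form field, which is why json.dumps is applied before urlencode.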
Remember, though, that the headers must include the Referer, since you are not issuing the GET and POST requests starting from the very first page:
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5).....",
           'Content-Type': 'application/x-www-form-urlencoded',
           'Referer': 'https://XXXXXXXX/index.php?........'}
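As a rough illustration of attaching such headers outside the scraper framework, the sketch below builds (but does not send) a POST request with the standard library's urllib.request. It assumes Python 3; the URL, User-Agent, and Referer values are placeholders rather than the site's real ones, and make_request is a hypothetical helper.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def make_request(url, payload, referer):
    """Build a form-encoded POST request carrying a Referer header (hypothetical helper)."""
    body = urlencode({"json": json.dumps(payload)}).encode("ascii")
    headers = {
        "User-Agent": "Mozilla/5.0",  # placeholder value
        "Content-Type": "application/x-www-form-urlencoded",
        "Referer": referer,
    }
    # Supplying data makes urllib treat the request as a POST.
    return Request(url, data=body, headers=headers)


req = make_request(
    "https://example.com/index.php?profile=28",  # placeholder URL
    {"Neusearch": {"profile": "28", "page": 0}},
    "https://example.com/index.php",             # placeholder Referer
)
print(req.get_method())
print(req.get_header("Referer"))
```

Passing req to urllib.request.urlopen would actually send it; the sketch stops short of that so it can run without network access.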
Difference between json.dumps() and json.dump()
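dumps converts a dict into a str (a JSON string), and loads converts a str back into a dict.
dump and load do the same thing, except that they write to and read from a file.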
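A small sketch of the difference, assuming Python 3. The record dict is made-up sample data, and io.StringIO stands in for a real open file so that the example does not touch the disk.

```python
import io
import json

record = {"key": "operator_number", "value": "243195"}

# dumps / loads work on strings held in memory.
text = json.dumps(record)
assert isinstance(text, str)
assert json.loads(text) == record

# dump / load work on file-like objects; StringIO stands in for an open file.
buffer = io.StringIO()
json.dump(record, buffer)
buffer.seek(0)  # rewind before reading back
restored = json.load(buffer)
print(restored == record)
```

With a real file, the same calls would sit inside a with open(...) block.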