Nan's Blog: Python - some tricks with web scraping (decompose(), zip(), modify html tags, etc.)

Monday, January 13, 2020

If there are junk HTML tags nested inside the tag you want to scrape, e.g.

<table align="center" border="0" cellpadding="0" cellspacing="0" height="0%" summary="Scout Ticket well data content table" width="98%">
......data you want to scrape.......
<table border="0" cellpadding="0" cellspacing="0" height="0%" summary="Plan View Table" width="100%">....junk table....</table>
-----data you want to scrape
</table>

then you can call decompose() on each unwanted tag. It removes the tag and everything inside it from the parse tree:

for table_useless in soup.find_all("table", {"summary": "Plan View Table"}):
    table_useless.decompose()
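
Here is a small self-contained sketch of the same idea. The HTML snippet and its cell contents are made up for illustration (only the two summary attributes come from the example above), and it assumes BeautifulSoup 4 with the built-in "html.parser":

```python
from bs4 import BeautifulSoup

# Made-up sample page: an outer table with a junk table nested inside it.
html_doc = """
<table summary="Scout Ticket well data content table">
  <tr><td>Well name: A-1</td></tr>
  <tr><td>
    <table summary="Plan View Table"><tr><td>junk</td></tr></table>
  </td></tr>
  <tr><td>Depth: 1200</td></tr>
</table>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Remove every junk table (and all of its children) from the tree.
for table_useless in soup.find_all("table", {"summary": "Plan View Table"}):
    table_useless.decompose()

# Scrape what is left of the outer table.
outer = soup.find("table", {"summary": "Scout Ticket well data content table"})
cells = [td.get_text(strip=True) for td in outer.find_all("td")]
cells = [text for text in cells if text]  # drop the cell that only held the junk table

print(cells)
```

Note that decompose() changes the soup in place, so do it before you start collecting the data.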

If there are tags nested within another tag, you can extract the outer and inner pieces of data separately and then pair them up with zip():

header_data = [html.get_contents(header.next) for header in data_points]

detail_data = [item.find('b').next if item.find('b') is not None else 'None' for item in data_points]

final_data = dict(zip(header_data, detail_data))
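
To make the zip() step easier to try out, here is a minimal runnable sketch. It is not the original page: the HTML is made up, data_points is assumed to be a list of <td> cells where the label is plain text and the value sits in a nested <b> tag, and a missing value is stored as None instead of the string 'None':

```python
from bs4 import BeautifulSoup

# Made-up sample: each cell holds a label, and the value is in a nested <b> tag.
html_doc = """
<table>
  <tr><td>Operator: <b>Acme Oil</b></td></tr>
  <tr><td>County: <b>Weld</b></td></tr>
  <tr><td>Status: </td></tr>
</table>
"""

soup = BeautifulSoup(html_doc, "html.parser")
data_points = soup.find_all("td")

# Outer text: the first text node of each cell, cleaned up to use as a key.
header_data = [td.find(string=True).strip().rstrip(":") for td in data_points]

# Inner tag: the text of <b> if the cell has one, otherwise None.
detail_data = [
    td.b.get_text(strip=True) if td.b is not None else None
    for td in data_points
]

# Pair the two lists position by position and build a dict.
final_data = dict(zip(header_data, detail_data))

print(final_data)
```

Keep in mind that zip() stops at the shorter list, so both lists should be built from the same data_points to stay aligned.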
