- If there is a junk HTML table nested inside the table you want to scrape, e.g.
<table align="center" border="0" cellpadding="0" cellspacing="0" height="0%" summary="Scout Ticket well data content table" width="98%">
......data you want to scrape.......
<table border="0" cellpadding="0" cellspacing="0" height="0%" summary="Plan View Table" width="100%">....junk table....</table>
-----data you want to scrape
</table>
then you can use: decompose(), which removes a tag and everything inside it from the parse tree
for table_useless in soup.find_all("table", {"summary": "Plan View Table"}):
    table_useless.decompose()
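To make the decompose() tip concrete, here is a small self-contained sketch. The markup is a made-up, trimmed-down version of the tables above (the cell contents are invented for illustration), and it assumes BeautifulSoup 4 with the built-in "html.parser".

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the example above:
# a junk table nested inside the table that holds the data.
html_doc = """
<table summary="Scout Ticket well data content table">
  <tr><td>Well name: Alpha 1</td></tr>
  <tr><td>
    <table summary="Plan View Table"><tr><td>junk cell</td></tr></table>
  </td></tr>
  <tr><td>Status: Producing</td></tr>
</table>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Remove every junk table (and all of its children) from the parse tree.
for table_useless in soup.find_all("table", {"summary": "Plan View Table"}):
    table_useless.decompose()

outer = soup.find("table", {"summary": "Scout Ticket well data content table"})

# Collect the text of the remaining cells, dropping the cell
# that was left empty once the junk table was removed.
cells = [td.get_text(strip=True) for td in outer.find_all("td")]
cells = [text for text in cells if text]
print(cells)
```

Because find_all() returns a list-like result, it is safe to call decompose() on each match while looping over it.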
- If there are tags nested within another tag, you can extract the outer and inner pieces separately and then pair them up with zip():
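# html.get_contents is not part of BeautifulSoup or the standard library; it is assumed to be a helper defined elsewhere
header_data = [html.get_contents(header.next) for header in data_points]
# .next is the older alias for .next_element; cells without a <b> tag get the string 'None'
detail_data = [item.find('b').next if item.find('b') is not None else 'None' for item in data_points]
# pair each header with its detail
final_data = dict(zip(header_data, detail_data))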
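Here is a runnable sketch of the same zip() idea. The markup is invented for illustration (a label followed by a bold value in each cell), and since the helper used in the snippet above is not shown, this version just uses plain string methods to clean up the header text. It assumes BeautifulSoup 4 with the built-in "html.parser".

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each cell holds a label followed by a <b> value.
html_doc = """
<table>
  <tr><td>Operator: <b>Acme Oil</b></td></tr>
  <tr><td>County: <b>Weld</b></td></tr>
  <tr><td>Status: </td></tr>
</table>
"""

soup = BeautifulSoup(html_doc, "html.parser")
data_points = soup.find_all("td")

# Header: the text node that opens each cell, trimmed of
# surrounding whitespace and the trailing colon.
header_data = [str(cell.next_element).strip().rstrip(":") for cell in data_points]

# Detail: the text inside the nested <b> tag, or the string 'None'
# when the cell has no <b> tag (same fallback as the snippet above).
detail_data = [
    cell.find("b").get_text(strip=True) if cell.find("b") is not None else "None"
    for cell in data_points
]

# Pair each header with its detail.
final_data = dict(zip(header_data, detail_data))
print(final_data)
```

Keep in mind that zip() pairs items by position and stops at the shorter list, so both lists should be built from the same data_points in the same order.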