Scraping an HTML table to CSV

Posted 2024-09-23 22:23:08

I am trying to scrape the table on the following page, but without success:

https://www.moneycontrol.com/financials/relianceindustries/ratiosVI/RI?classic=true#RI

import csv
from urllib.request import urlopen

from bs4 import BeautifulSoup

url = 'https://www.moneycontrol.com/financials/relianceindustries/ratiosVI/RI?classic=true#RI'
# Name the parser explicitly so BeautifulSoup does not have to guess one.
soup = BeautifulSoup(urlopen(url), 'html.parser')

table = soup.find('table', attrs={'class': 'table-horizontal-line'})
headers = [header.text for header in table.find_all('th')]

rows = []
for row in table.find_all('tr'):
    # Keep cells as str: on Python 3 the csv module writes text, not bytes.
    rows.append([val.text for val in row.find_all('td')])

# Text mode with newline='' is what csv.writer expects on Python 3 ('wb' fails).
with open('output_file.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(row for row in rows if row)
1 answer
User
#1 · Posted 2024-09-23 22:23:08

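You can use pandas for this. There are a few rows at the top that you may want to drop, and as cleanup you can replace the remaining NaN values with empty strings.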

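import pandas as pd

# read_html returns a list of DataFrames, one per <table> found on the page.
result = pd.read_html('https://www.moneycontrol.com/financials/relianceindustries/ratiosVI/RI?classic=true#RI')
# Index 3 is the table picked in this answer; drop all-NaN rows, then blank the remaining NaN.
df = result[3].dropna(how='all').fillna('')
df.to_csv(r'C:\Users\User\Desktop\Data.csv', sep=',', encoding='utf-8', index=False)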
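To see what the cleanup step does without hitting the site, here is a minimal offline sketch. The small DataFrame is a hypothetical stand-in for one table returned by `pd.read_html`. Its column names and values are placeholders, not data from the page, and the CSV is written to an in-memory buffer rather than to a file.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for one scraped table; names and values are placeholders.
raw = pd.DataFrame(
    {
        "Ratio": [np.nan, "Item A", "Item B"],
        "Year 1": [np.nan, "10.5", np.nan],
        "Year 2": [np.nan, "9.8", "6.0"],
    }
)

# Drop rows in which every cell is NaN, then blank out the NaN cells that remain.
cleaned = raw.dropna(how="all").fillna("")

# Write to an in-memory buffer instead of a file so the result is easy to inspect.
buffer = io.StringIO()
cleaned.to_csv(buffer, sep=",", index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Only the first row is dropped, because `how='all'` requires every cell in the row to be missing. The partly filled "Item B" row survives and its missing cell becomes an empty CSV field.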
