python:在数据帧中排列

2024-10-04 09:25:57 发布

您现在位置:Python中文网/ 问答频道 /正文

我从网页中提取数据,但希望将其排列到pandas dataframe表中

finviz = requests.get('https://finviz.com/screener.ashx?v=152&o=ticker&c=0,1,2,3,4,5,6,7,10,11,12,14,16,17,19,21,22,23,24,25,31,32,33,38,41,48,65,66,67&r=1')
finz = html.fromstring(finviz.content)
col = finz.xpath('//table/tr/td[@class="table-top"]/text()')
data = finz.xpath('//table/tr/td/a[@class="screener-link"]/text()')

Col是数据帧的列,数据列表中的28个数据点将相应地排列成行。第二行中的数据点29到56,以此类推。如何优雅地编写代码

datalist = []
for y in range (28):
       datalist.append(data[y])
>>> datalist
['1', 'Agilent Technologies, Inc.', 'Healthcare', 'Medical Laboratories & Research', 'USA', '23.00B', '29.27', '4.39', '4.53', '18.76', '1.02%', '5.00%', '5.70%', '3
24.30M', '308.52M', '2.07', '8.30%', '15.70%', '14.60%', '1.09', '1,775,149', '2', 'Alcoa Corporation', 'Basic Materials', 'Aluminum', 'USA', '1.21B', '-']

但是结果不是像dataframe那样的表格形式


Tags: 数据text网页dataframedatatablexpathtr
1条回答
网友
1楼 · 发布于 2024-10-04 09:25:57

Pandas有一个解析HTML的函数:^{}

您可以尝试以下操作:

# Modules
import pandas as pd
import requests

# HTML content
finviz = requests.get('https://finviz.com/screener.ashx?v=152&o=ticker&c=0,1,2,3,4,5,6,7,10,11,12,14,16,17,19,21,22,23,24,25,31,32,33,38,41,48,65,66,67&r=1')

# Convert to dataframe
df = pd.read_html(finviz.content)[-2]

# Set 1st row to columns names
df.columns = df.iloc[0]
# Drop 1st row
df = df.drop(df.index[0])

# df = df.set_index('No.')
print(df)
# 0  No. Ticker                                     Company            Sector                               Industry      Country  ... Debt/Eq Profit M  Beta   Price  Change    Volume
# 1    1      A                  Agilent Technologies, Inc.        Healthcare        Medical Laboratories & Research          USA  ...    0.51   14.60 % 1.20   72.47 - 0.28 % 177333
# 2    2     AA                           Alcoa Corporation   Basic Materials                               Aluminum          USA  ...    0.44 - 10.80 % 2.03    6.28   3.46 % 3021371
# 3    3   AAAU                Perth Mint Physical Gold ETF         Financial                   Exchange Traded Fund          USA  ... - - - 16.08 - 0.99 % 45991
# 4    4   AACG                       ATA Creativity Global          Services          Education & Training Services        China  ...    0.02 - 2.96    0.95 - 0.26 % 6177
# 5    5   AADR         AdvisorShares Dorsey Wright ADR ETF         Financial                   Exchange Traded Fund          USA  ... - - - 40.80   0.22 % 1605
# 6    6    AAL                American Airlines Group Inc.          Services                         Major Airlines          USA  ... - 3.70 % 1.83   12.81   4.57 % 16736506
# 7    7   AAMC     Altisource Asset Management Corporation         Financial                       Asset Management          USA  ... - -17.90 % 0.78   12.28   0.00 % 0
# 8    8   AAME               Atlantic American Corporation         Financial                         Life Insurance          USA  ...    0.28 - 0.40 % 0.29    2.20   3.29 % 26
# 9    9    AAN                               Aaron's, Inc.          Services              Rental & Leasing Services          USA  ...    0.20    0.80 % 1.23   22.47 - 0.35 % 166203
# 10  10   AAOI               Applied Optoelectronics, Inc.        Technology    Semiconductor - Integrated Circuits          USA  ...    0.49 - 34.60 % 2.02    7.80   2.63 % 61303
# 11  11   AAON                                  AAON, Inc.  Industrial Goods             General Building Materials          USA  ...    0.02   11.40 % 0.88   48.60   0.71 % 20533
# 12  12    AAP                    Advance Auto Parts, Inc.          Services                      Auto Parts Stores          USA  ...    0.21    5.00 % 1.04   95.94 - 0.58 % 165445
# 13  13   AAPL                                  Apple Inc.    Consumer Goods                   Electronic Equipment          USA  ...    1.22   21.50 % 1.19  262.39   2.97 % 11236642
# 14  14    AAT                 American Assets Trust, Inc.         Financial                          REIT - Retail          USA  ...    1.03   12.50 % 0.99   25.35   2.78 % 30158
# 15  15    AAU                       Almaden Minerals Ltd.   Basic Materials                                   Gold       Canada  ...    0.04 - 0.53    0.28 - 1.43 % 34671
# 16  16   AAWW          Atlas Air Worldwide Holdings, Inc.          Services                    Air Services, Other          USA  ...    1.33 - 10.70 % 1.65   22.79   2.70 % 56521
# 17  17   AAXJ  iShares MSCI All Country Asia ex Japan ETF         Financial                   Exchange Traded Fund          USA  ... - - - 60.13   1.18 % 161684
# 18  18   AAXN                       Axon Enterprise, Inc.  Industrial Goods  Aerospace/Defense Products & Services          USA  ...    0.00    0.20 % 0.77   71.11   2.37 % 187899
# 19  19     AB              AllianceBernstein Holding L.P.         Financial                       Asset Management          USA  ...    0.00   89.60 % 1.35   19.15   1.84 % 54588
# 20  20    ABB                                     ABB Ltd  Industrial Goods                  Diversified Machinery  Switzerland  ...    0.67    5.10 % 1.10   17.44   0.52 % 723739

# [20 rows x 29 columns]

如果HTML页面结构发生变化,我可以让您改进数据选择!父div id可能有用


解释“[-2]”:该read_html返回数据帧列表:

list_df = pd.read_html(finviz.content)
print(type(list_df))
# <class 'list'>

# Elements types in the lists
print(type(list_df [0]))
# <class 'pandas.core.frame.DataFrame' >

因此,为了获得所需的数据帧,我选择了结尾前的第二个元素[-2]。这个discussion解释了负索引

相关问题 更多 >