Exporting a DataFrame to Excel with pandas without overwriting

Posted 2024-05-20 02:44:47


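How can I export a DataFrame to Excel without overwriting it? For example: I'm web scraping a table that has pagination, so I save page 1 in a DataFrame, export it to Excel, and then do the same for page 2. But when I save, only the last records are kept and all the earlier ones are lost. Sorry for my English; here is my code: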

import time
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver

i=1
url = "https://stats.nba.com/players/traditional/?PerMode=Totals&Season=2019-20&SeasonType=Regular%20Season&sort=PLAYER_NAME&dir=-1"

driver = webdriver.Firefox(executable_path=r'C:/Users/Fabio\Desktop/robo/geckodriver.exe')

driver.get(url)
time.sleep(5)

driver.find_element_by_xpath("/html/body/main/div[2]/div/div[2]/div/div/nba-stat-table/div[2]/div[1]/table/thead/tr/th[9]").click()

contador = 1
#loop pagination
while(contador < 4):
    #finding table
    elemento = driver.find_element_by_xpath("/html/body/main/div[2]/div/div[2]/div/div/nba-stat-table/div[2]")
    html_content = elemento.get_attribute('outerHTML')

    # 2. Parse HTML - BeautifulSoup
    soup = BeautifulSoup(html_content, 'html.parser')
    table = soup.find(name='table')

    # 3. Data Frame - Pandas
    df_full = pd.read_html(str(table))[0]
    df = df_full[['PLAYER','TEAM', 'PTS']]
    df.columns = ['jogador','time', 'pontuacao']
    dados1 = pd.DataFrame(df)

    driver.find_element_by_xpath("/html/body/main/div[2]/div/div[2]/div/div/nba-stat-table/div[1]/div/div/a[2]").click()
    contador = contador + 1

#4. export to excel
dados = pd.DataFrame(df)
dados.to_excel("fabinho.xlsx")

driver.quit()

1 answer
Anonymous user
Reply #1 · posted 2024-05-20 02:44:47

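You are reassigning df to whatever data is retrieved on each pass through the loop, so by the time you export, only the last page is left. One solution is to append each page's DataFrame to a list and then call pd.concat on that list after the loop: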

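import time
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

url = "https://stats.nba.com/players/traditional/?PerMode=Totals&Season=2019-20&SeasonType=Regular%20Season&sort=PLAYER_NAME&dir=-1"

# Selenium 4 takes the geckodriver path through a Service object
# (the old executable_path= argument was removed in newer releases)
driver = webdriver.Firefox(service=Service(r'C:/Users/Fabio\Desktop/robo/geckodriver.exe'))

driver.get(url)
time.sleep(5)

# Click the header cell th[9] (sorts the table by that column)
driver.find_element(By.XPATH, "/html/body/main/div[2]/div/div[2]/div/div/nba-stat-table/div[2]/div[1]/table/thead/tr/th[9]").click()

contador = 1
df_list = []  # one DataFrame per scraped page

# loop over pagination (pages 1 to 3)
while contador < 4:

    # 1. Find the table
    elemento = driver.find_element(By.XPATH, "/html/body/main/div[2]/div/div[2]/div/div/nba-stat-table/div[2]")
    html_content = elemento.get_attribute('outerHTML')

    # 2. Parse HTML - BeautifulSoup
    soup = BeautifulSoup(html_content, 'html.parser')
    table = soup.find(name='table')

    # 3. DataFrame - pandas (newer pandas expects a file-like object, hence StringIO)
    df_full = pd.read_html(StringIO(str(table)))[0]
    df = df_full[['PLAYER', 'TEAM', 'PTS']].copy()
    df.columns = ['jogador', 'time', 'pontuacao']
    df_list.append(df)  # keep this page instead of overwriting it on the next pass

    # Go to the next page and give it a moment to render before scraping again
    driver.find_element(By.XPATH, "/html/body/main/div[2]/div/div[2]/div/div/nba-stat-table/div[1]/div/div/a[2]").click()
    time.sleep(2)

    contador = contador + 1

# 4. Combine all pages and export to Excel once
dados = pd.concat(df_list, ignore_index=True)
dados.to_excel("fabinho.xlsx", index=False)

driver.quit()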
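The effect of the change can be checked without Selenium or network access. This is a minimal sketch in which small made-up DataFrames (placeholder values, not real NBA data) stand in for the scraped pages; it contrasts rebinding df inside the loop with collecting the pages and concatenating them once:

```python
import pandas as pd

# Hypothetical placeholder pages standing in for the three scraped tables
pages = [
    pd.DataFrame({"jogador": ["A", "B"], "time": ["T1", "T2"], "pontuacao": [10, 20]}),
    pd.DataFrame({"jogador": ["C", "D"], "time": ["T3", "T4"], "pontuacao": [30, 40]}),
    pd.DataFrame({"jogador": ["E"], "time": ["T5"], "pontuacao": [50]}),
]

# Question's pattern: df is rebound on every pass, so only the last page survives
for page in pages:
    df = page
last_only = df

# Answer's pattern: keep every page, then concatenate once after the loop
df_list = []
for page in pages:
    df_list.append(page)
dados = pd.concat(df_list, ignore_index=True)  # renumber rows 0..n-1

print(len(last_only))  # 1
print(len(dados))      # 5
```

ignore_index=True is optional: without it the index restarts at 0 for every page, and to_excel writes that repeating index as the first column unless index=False is passed.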
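If you really want to write each page out as soon as it is scraped, as described in the question, rather than once at the end, one option is to append to a CSV file, which Excel can open. Note that this swaps CSV in for .xlsx because to_csv accepts mode="a" directly; the helper name export_page and the file name are hypothetical, chosen for illustration:

```python
import os
import tempfile

import pandas as pd


def export_page(df: pd.DataFrame, path: str) -> None:
    """Hypothetical helper: append one page to a CSV file without overwriting earlier pages."""
    write_header = not os.path.exists(path)  # write the header row only on the first call
    df.to_csv(path, mode="a", header=write_header, index=False)


# Demo with placeholder pages, written to a throwaway temporary directory
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fabinho.csv")
    export_page(pd.DataFrame({"jogador": ["A"], "pontuacao": [10]}), path)
    export_page(pd.DataFrame({"jogador": ["B"], "pontuacao": [20]}), path)
    result = pd.read_csv(path)  # both pages are present

print(result)
```

Because the file is opened in append mode, rows from a previous run are kept too, so delete the file before starting a new scrape. For a real .xlsx file, collecting the pages and calling pd.concat once is simpler.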
