Unable to separate names and dates from a dictionary in order to write them to an Excel file

Published 2024-09-21 05:27:28


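I've written a Python script that parses business names and dates from a web page and writes them to an Excel file with openpyxl. My intention is to put the names and dates in separate columns, like name1 date1 name2 date2, and so on.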

My current attempt collects the content into a dictionary and produces the following result:

{'NATIONAL OPERA STUDIO': '18 Nov 2010', 'NATIONAL THEATRE BALLET SCHOOL': '12 Aug 2005', 'NATIONAL THEATRE DRAMA SCHOOL': '12 Aug 2005', 'NATIONAL THEATRE': '30 Mar 2000'}

How can I put the names and dates into the Excel file as shown below?

column1                 column2       column3                           column4  
NATIONAL OPERA STUDIO   18 Nov 2010   NATIONAL THEATRE BALLET SCHOOL    12 Aug 2005

This is my attempt so far:

import re
import requests
from bs4 import BeautifulSoup
from openpyxl import load_workbook

wb = load_workbook('container.xlsx')
ws = wb['Sheet1']

url = "https://abr.business.gov.au/ABN/View?id=78007306283"

response = requests.get(url)
soup = BeautifulSoup(response.text,'lxml')
try:
    names_n_dates = {item.find("a").get_text(strip=True):' '.join(item.find("a").find_parent().find_next_sibling().text.split()) for item in soup.find("th",text=re.compile("Business name",re.I)).find_parent().find_next_siblings("tr")}
except AttributeError: names_n_dates = ""

items = {k:v for k,v in names_n_dates.items()}
print(items)

ws.append([items.split()])
wb.save("container.xlsx")

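I know I can't apply the split function to a dictionary, but I don't know of an alternative that achieves the same effect. I'm using ws.append([]) to write the fields to the Excel file, and I'd like to keep that command as it is, because other fields will be added to it later.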


2 Answers

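To solve this, you can iterate over the dictionary's items, which are (key, value) tuples, and index into each item as you would a list to get its key and value. The key is at position 0 of the item and the value is at position 1.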

import re
import requests
from bs4 import BeautifulSoup
from openpyxl import load_workbook

wb = load_workbook('container.xlsx')
ws = wb['Sheet1']

url = "https://abr.business.gov.au/ABN/View?id=78007306283"

response = requests.get(url)
soup = BeautifulSoup(response.text,'lxml')
try:
    names_n_dates = {item.find("a").get_text(strip=True):' '.join(item.find("a").find_parent().find_next_sibling().text.split()) for item in soup.find("th",text=re.compile("Business name",re.I)).find_parent().find_next_siblings("tr")}
except AttributeError:
    names_n_dates = {}  # fall back to an empty dict so .items() still works below

row = []

for item in names_n_dates.items():  # iterate over all (key, value) pairs
    row.append(item[0])  # key: business name
    row.append(item[1])  # value: date

ws.append(row)

wb.save("container.xlsx")
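The flattening step can be checked on its own, without fetching the page or opening a workbook. The following is a minimal sketch that assumes the dictionary has exactly the contents printed in the question; it uses tuple unpacking in place of positional indexing, which is only a stylistic choice.

```python
# Sample data copied from the output printed in the question.
names_n_dates = {
    'NATIONAL OPERA STUDIO': '18 Nov 2010',
    'NATIONAL THEATRE BALLET SCHOOL': '12 Aug 2005',
    'NATIONAL THEATRE DRAMA SCHOOL': '12 Aug 2005',
    'NATIONAL THEATRE': '30 Mar 2000',
}

row = []
for name, date in names_n_dates.items():  # unpack each (key, value) pair
    row.append(name)
    row.append(date)

# row now alternates name, date, name, date, ... in insertion order
print(row)
```

Passing this `row` to `ws.append(row)` should write one worksheet row with each name followed by its date in the next column.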

If you want to keep ws.append() the way you have it (appending one list as a single row), do the following:

import re
import requests
from bs4 import BeautifulSoup
from openpyxl import load_workbook

wb = load_workbook('container.xlsx')
ws = wb['Sheet1']

url = "https://abr.business.gov.au/ABN/View?id=78007306283"

response = requests.get(url)
soup = BeautifulSoup(response.text,'lxml')
try:
    names_n_dates = {item.find("a").get_text(strip=True):' '.join(item.find("a").find_parent().find_next_sibling().text.split()) for item in soup.find("th",text=re.compile("Business name",re.I)).find_parent().find_next_siblings("tr")}
except AttributeError:
    names_n_dates = {}  # fall back to an empty dict so .items() still works below

row = []

for item in names_n_dates.items():
   for column in item:
       row.append(column)

ws.append(row)

wb.save("container.xlsx")
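Since the question mentions that more fields will be added to the same `ws.append([...])` call later, one possible way to keep a single list literal is to flatten the dictionary with `itertools.chain.from_iterable` and unpack the result next to the other fields. This is only a sketch: the helper `flatten_pairs` and the extra field `abn` are hypothetical names introduced for illustration, and the sample data is copied from the question's output.

```python
from itertools import chain

# Sample data copied from the output printed in the question.
names_n_dates = {
    'NATIONAL OPERA STUDIO': '18 Nov 2010',
    'NATIONAL THEATRE BALLET SCHOOL': '12 Aug 2005',
    'NATIONAL THEATRE DRAMA SCHOOL': '12 Aug 2005',
    'NATIONAL THEATRE': '30 Mar 2000',
}


def flatten_pairs(mapping):
    """Return [key1, value1, key2, value2, ...] in insertion order."""
    return list(chain.from_iterable(mapping.items()))


# Hypothetical extra field (here the id from the question's URL),
# placed in front of the flattened name/date columns.
abn = "78007306283"
row = [abn, *flatten_pairs(names_n_dates)]

# With an openpyxl worksheet `ws`, this list could then be written as one row:
# ws.append(row)
print(row)
```

An empty dictionary flattens to an empty list, so the row would then contain only the extra fields.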
