Parsing with Python and saving as CSV

Posted 2024-10-03 04:26:05


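I am trying to parse a web page with Python and the BeautifulSoup package. Before saving, I print the parsed results to the console. In the cmd console, every value in every column prints correctly. But after saving to CSV, whenever a value contains a comma, the rest of that value spills into the next column. The columns before the comma are fine. I am using MS Professional Plus 2010 to open the CSV data.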

The code is as follows:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup


import requests
from lxml import html
import re

filename = "eng.csv"
f = open(filename, "w")

headers ="abc, def, cdf\n"
f.write(headers)
url = ""

r = requests.get(url, headers = {"User-Agent":"Chrome/56.0.2924.87"})

tree = html.fromstring(r.content)

patternAB= r'ab\s=\s"(.*?)"'

script = tree.xpath('//script[contains(., "ab")]/text()')[0]
eng_name=re.search(patternAB, script).group(1)

script1 = tree.xpath('//script[contains(., "ab")]/text()')[2]
regions=re.search(patternAB, script1).group(1)
# ... I use the above code repeatedly with different indexes

f.write(eng_name + ";" + regions + ";" + origins + ";" + "\n")
#I also tried "," as connector but to no avail.
f.close()

Any idea what is going wrong? Thanks in advance.
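
One likely cause, assuming the scraped values themselves contain commas: joining fields by hand writes them unquoted, so the spreadsheet program splits the value at every delimiter it finds. The following is a minimal sketch, not a definitive fix, that lets the standard library's csv module quote such fields. The helper names write_rows and save_csv and the sample values are hypothetical stand-ins for eng_name, regions and origins; the utf-8-sig encoding is an assumption intended to help Excel detect UTF-8.

```python
import csv
import io


def write_rows(stream, header, rows):
    """Write a header and data rows as CSV to an open text stream."""
    # Default dialect: comma delimiter, and any field that contains
    # a comma, a quote or a line break is wrapped in double quotes.
    writer = csv.writer(stream)
    writer.writerow(header)
    writer.writerows(rows)


def save_csv(path, header, rows):
    """Write the same data to a file on disk (hypothetical helper)."""
    # newline="" lets the csv module control line endings itself.
    # encoding="utf-8-sig" is an assumption, see the note above.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        write_rows(f, header, rows)


# Hypothetical sample values; two of them contain commas.
sample_rows = [["Acme, Inc.", "Europe, Asia", "Unknown"]]

# Demonstrate in memory so the quoting is visible.
buffer = io.StringIO()
write_rows(buffer, ["abc", "def", "cdf"], sample_rows)
csv_text = buffer.getvalue()
print(csv_text)
```

In the script above, this would mean replacing the manual f.write calls with something like save_csv("eng.csv", ["abc", "def", "cdf"], [[eng_name, regions, origins]]). Note that Excel chooses its CSV delimiter from the system's regional list separator, so if the columns still do not split as expected, passing delimiter=";" to csv.writer may be worth trying.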

