Here is the code that is giving me trouble:
import feedparser
import requests
from requests.auth import HTTPBasicAuth

# Fetch the search feed and collect the ticket numbers from the entry titles.
data = requests.get(searchURL, auth=HTTPBasicAuth(config.flxusername, config.flxpassword), verify=False)
feed_data = data.content
d = feedparser.parse(feed_data)
tickets = []
for ticketNum in d['entries']:
    tickets.append(ticketNum['title'])

# Fetch each ticket page through one session, skipping the 404s.
s = requests.Session()
s.get(ticketsBaseUrl, auth=HTTPBasicAuth(config.flxusername, config.flxpassword), verify=False)
for ticket in tickets:
    ticket_page = s.get(ticketsBaseUrl + ticket, auth=HTTPBasicAuth(config.flxusername, config.flxpassword), verify=False)
    if ticket_page.status_code == 404:
        print('ticket %s data 404, skipping' % ticket)
        continue
On its own, this code skips the three 404 responses, as expected.
But when I add an else:
data = requests.get(searchURL, auth=HTTPBasicAuth(config.flxusername, config.flxpassword), verify=False)
feed_data = data.content
d = feedparser.parse(feed_data)
tickets = []
for ticketNum in d['entries']:
    tickets.append(ticketNum['title'])
s = requests.Session()
s.get(ticketsBaseUrl, auth=HTTPBasicAuth(config.flxusername, config.flxpassword), verify=False)
for ticket in tickets:
    ticket_page = s.get(ticketsBaseUrl + ticket, auth=HTTPBasicAuth(config.flxusername, config.flxpassword), verify=False)
    if ticket_page.status_code == 404:
        print('ticket %s data 404, skipping' % ticket)
        continue
    else:
        etree = ET.fromstring(ticket_page.content)
        print(etree)
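The content of the last 404 page gets passed to etree, and the script errors out.
When the else branch only does print(ticket_page.status_code), it prints the three skip messages and 200 for everything else. Only when I put in the etree snippet does it start trying to parse the last 404. It is maddening.
What am I missing?
I tried another variant: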
data = requests.get(searchURL, auth=HTTPBasicAuth(config.flxusername, config.flxpassword), verify=False)
feed_data = data.content
d = feedparser.parse(feed_data)
tickets = []
for ticketNum in d['entries']:
    tickets.append(ticketNum['title'])
s = requests.Session()
s.get(ticketsBaseUrl, auth=HTTPBasicAuth(config.flxusername, config.flxpassword), verify=False)
for ticket in tickets:
    ticket_page = s.get(ticketsBaseUrl + ticket, auth=HTTPBasicAuth(config.flxusername, config.flxpassword), verify=False)
    if ticket_page.status_code == 404:
        print('ticket %s data 404, skipping' % ticket)
        continue
    etree = ET.fromstring(ticket_page.content)
This does not skip the last 404 either.
I tested smaller pieces of the code:
    if ticket_page.status_code == 404:
        print(str(ticket_page.status_code) + ' ' + ticket)
        continue
    else:
        print(ET.fromstring(ticket_page.content))
Fails; it tries to parse the last 404 in the list.
    if ticket_page.status_code == 404:
        print(str(ticket_page.status_code) + ' ' + ticket)
        continue
    else:
        print('continued')
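Works: it prints the three 404s and prints 'continued' for everything else. (That is technically not correct; it actually processed everything else.)
I tried it the other way around: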
    if ticket_page.status_code == 200:
        print(ET.fromstring(ticket_page.content))
    else:
        print(str(ticket_page.status_code) + ' ' + ticket)
        continue

    if ticket_page.status_code != 200:
        print(str(ticket_page.status_code) + ' ' + ticket)
        continue
    else:
        print(ET.fromstring(ticket_page.content))

    if ticket_page.status_code != 200:
        print(str(ticket_page.status_code) + ' ' + ticket)
        continue
    print(ET.fromstring(ticket_page.content))
Same result in each case. The final 404 still fails.
Even
for ticket in tickets:
    ticket_page = s.get(ticketsBaseUrl + ticket, auth=HTTPBasicAuth(config.flxusername, config.flxpassword), verify=False)
    if ticket_page.status_code != 200:
        tickets.pop()
leaves one 404 in the list.
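A note on that last experiment: `list.pop()` with no argument removes the last item of the list, not the ticket currently being checked, and it shrinks the list while the loop is still walking it. The sketch below reproduces the effect without any network calls. The `statuses` mapping and the ticket names are hypothetical stand-ins for the real responses, and the list comprehension at the end is just one possible way to filter, not necessarily what the original script needed.

```python
# Hypothetical status codes standing in for the real HTTP responses.
statuses = {"A": 200, "B": 404, "C": 200, "D": 404, "E": 200}

# Same pattern as above: pop() while iterating over the same list.
tickets = list(statuses)
for ticket in tickets:
    if statuses[ticket] != 200:
        tickets.pop()  # removes the LAST item, not `ticket`

# Any non-200 tickets that survived the loop.
leftover = [t for t in tickets if statuses[t] != 200]
print("after pop loop:", tickets, "still non-200:", leftover)

# Building a new list avoids mutating the list being iterated.
good_tickets = [t for t in statuses if statuses[t] == 200]
print("filtered:", good_tickets)
```

With this sample data, the pop loop ends with one 404 ticket still in the list and one 200 ticket dropped, while the filtered list keeps exactly the 200s.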
This is the XML that raises the parse error:
b'<?xml version="1.0" standalone="yes"?>\n\n<error><statusCode>404</statusCode><name>Not Found</name><description>The server has not found anything matching the request URI: Ticket not found</description></error>\n\n'
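A quick sanity check that may be worth running at this point is to feed this exact body to `ET.fromstring` on its own. The sketch below does only that; `body` is copied from the line above, and `ET` is assumed to be `xml.etree.ElementTree`, which the snippets in this post appear to use. As far as I can tell the body is well-formed, so if it parses cleanly in isolation, the parse error is probably coming from some other response.

```python
import xml.etree.ElementTree as ET

# The 404 body quoted above, copied verbatim.
body = b'<?xml version="1.0" standalone="yes"?>\n\n<error><statusCode>404</statusCode><name>Not Found</name><description>The server has not found anything matching the request URI: Ticket not found</description></error>\n\n'

# Parse it in isolation and pull out the status code element.
root = ET.fromstring(body)
status = root.findtext("statusCode")
print(root.tag, status)
```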
Latest test:
    if 'statusCode' in tree_root.decode():
        print(ticket)
        continue
This gives me the three expected tickets.
    if 'statusCode' in tree_root.decode():
        print(ticket)
        continue
    etree = ET.fromstring(ticket_page.content.decode())
    print(etree)
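This fails on the third 404 ticket. I also added a delay, on the theory that the failure was caused by the large number of 200 responses before the final 404, but that did not change the result.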
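The answer:
Check all of your 200s; the 404s are not causing the problem. One of the 200 responses has bad XML. Most of the variants I posted work fine. I had spot-checked my 200s, and every time I missed the one with the bad XML. I figured out how to handle the bad XML and was able to finish.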
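The answer does not say how the bad XML was handled, so the following is only a sketch of one way to find the offending ticket: wrap `ET.fromstring` in a `try`/`except ET.ParseError` and record the ticket name instead of letting the loop crash. The `responses` mapping, the ticket names, and the `parse_tickets` helper are all hypothetical and stand in for the real `s.get(...)` results.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the real responses: ticket id -> (status code, body).
responses = {
    "T-1": (200, b"<ticket><id>1</id></ticket>"),
    "T-2": (404, b"<error><statusCode>404</statusCode></error>"),
    "T-3": (200, b"<ticket><id>3</ticket>"),  # malformed: <id> is never closed
}

def parse_tickets(responses):
    """Return (parsed, skipped, bad_xml) for a ticket -> (status, body) mapping."""
    parsed, skipped, bad_xml = {}, [], []
    for ticket, (status, body) in responses.items():
        if status != 200:
            skipped.append(ticket)
            continue
        try:
            parsed[ticket] = ET.fromstring(body)
        except ET.ParseError as exc:
            # Record which ticket failed instead of stopping the whole run.
            bad_xml.append((ticket, str(exc)))
    return parsed, skipped, bad_xml

parsed, skipped, bad_xml = parse_tickets(responses)
print("skipped:", skipped)
print("bad XML:", [ticket for ticket, _ in bad_xml])
```

Logging the ticket name next to the parser's message makes it clear whether a failure belongs to a 404 or to a 200 with a malformed body, which is the confusion described in the question.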