Populating a Python list with a loop

Published 2024-05-07 17:58:34


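I'm having trouble getting the county list below populated with the results from my loop. When I print the result of each iteration together with the index of the item in the list, I get an index of 0 every time, which suggests the data is not persisting in the list after each pass through the loop. So when I try to index the county list after the loop has finished, there is of course no data in it at all, and I get a "list index out of range" error.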

I've looked into the "list index out of range" error I keep getting, and I understand that I get it because the county list is empty. But why is it empty?

The HTML source that makes up one entry of the target_divs list looks like this:

<div class="school-type-list-text">
<div class="table_cell_county"><a href='/alabama/autauga-county'>Autauga County</a></div>
<div class="change_div"></div>
<div class="table_cell_other">7<span> Schools</span></div>
<div class="table_cell_other">1,587<span> Students</span></div>
<div class="table_cell_other">8%<span> Minority</span></div>
<div class="break"></div>

Here is my script:

[code block missing from the source]

Update after changing the loop variable as @Software2 suggested. I still get the same error:

import urllib2
from bs4 import BeautifulSoup
import pandas
import csv

page1 = 'https://www.privateschoolreview.com/alabama'

alabama = urllib2.urlopen(page1)

soup = BeautifulSoup(alabama, "lxml")

target_divs = soup.find_all("div", class_= "school-type-list-text")

for div in target_divs:
    counties = div.find_all("div", class_= "table_cell_county")
    for county in counties:
        print county.text
        print counties.index(county) 

print counties

3 Answers

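I assume you want the list of counties in counties. As far as I can tell, the problem is the return value of div.find_all(): for each div it returns a list containing at most one county. Because counties is reassigned on every iteration of the outer loop, nothing accumulates from one iteration to the next. To populate counties, try the following: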

counties = []
for div in target_divs:
    county = div.find_all('div', class_= 'table_cell_county')
    for c in county:
        counties.append(c.text.encode('utf-8'))

print counties    # Returns: ['Autauga County', 'Baldwin County', 'Barbour County', 'Bibb County', 'Blount County', 'Bullock County', 'Butler County', 'Calhoun County', 'Chambers County', 'Chilton County', 'Choctaw County', 'Clarke County', 'Clay County', 'Coffee County', 'Colbert County', 'Conecuh County', 'Covington County', 'Crenshaw County', 'Cullman County', 'Dale County', 'Dallas County', 'Dekalb County', 'Elmore County', 'Escambia County', 'Etowah County', 'Greene County', 'Hale County', 'Henry County', 'Houston County', 'Jackson County', 'Jefferson County', 'Lauderdale County', 'Lee County', 'Limestone County', 'Lowndes County', 'Macon County', 'Madison County', 'Marengo County', 'Marion County', 'Marshall County', 'Mobile County', 'Monroe County', 'Montgomery County', 'Morgan County', 'Perry County', 'Pickens County', 'Pike County', 'Randolph County', 'Russell County', 'Saint Clair County', 'Shelby County', 'Sumter County', 'Talladega County', 'Tallapoosa County', 'Tuscaloosa County', 'Walker County', 'Wilcox County', 'Winston County']
print counties[0] # Returns: 'Autauga County'
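
The difference between reassigning a name and appending to one list can be seen without any scraping. The following is a minimal sketch written for Python 3 (the thread's own code is Python 2). The per_div_results data is a hypothetical stand-in for what div.find_all() returns for each div, and its empty last entry is an assumption chosen to reproduce the empty list reported in the question.

```python
# Hypothetical stand-in for the per-div results of div.find_all(...):
# each inner list holds at most one county name. The empty last entry
# is an assumption, chosen to mimic the empty list from the question.
per_div_results = [["Autauga County"], ["Baldwin County"], []]

# Pattern from the question: the name is rebound on every iteration,
# so only the result for the last div is left after the loop.
for result in per_div_results:
    counties = result
print(counties)  # []

# Pattern from the answer: create the list once, then append to it.
all_counties = []
for result in per_div_results:
    for name in result:
        all_counties.append(name)
print(all_counties)     # ['Autauga County', 'Baldwin County']
print(all_counties[0])  # Autauga County
```

Indexing counties[0] in the first pattern would raise IndexError here, while all_counties[0] works because the list was created once, before the loop.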

I may be wrong, but could you try this? It looks like you are using the same i in both nested loops.

for i in target_divs:
    county = i.find_all("div", class_= "table_cell_county")
    for j in county:
        print j.text
        print county.index(j) 

In the nested loops you use the same variable i for two different things, so the first one gets overwritten. Rename the second variable.
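
A small sketch of that overwriting, written for Python 3 with made-up data (rows is hypothetical). It shows only the rebinding itself; it does not by itself establish that this was the cause of the error in the question.

```python
rows = [["a", "b"], ["c"]]  # illustrative nested data

# Reusing the name i: the inner loop rebinds it, so after the inner
# loop finishes, i is the last inner item rather than the outer row.
seen_after_inner = []
for i in rows:
    for i in i:
        pass
    seen_after_inner.append(i)
print(seen_after_inner)  # ['b', 'c']

# Distinct names keep the outer value available after the inner loop.
rows_after_inner = []
for row in rows:
    for item in row:
        pass
    rows_after_inner.append(row)
print(rows_after_inner)  # [['a', 'b'], ['c']]
```

The outer loop still visits every row, because its iterator was created before the name was rebound. Only the value bound to i changes.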

Ideally, avoid variable names like i: they are not very descriptive, and they make this kind of mistake easy. Try something like:

for div in target_divs:
    counties = div.find_all("div", class_= "table_cell_county")
    for county in counties:
        print county.text
        print counties.index(county) 
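
The index of 0 printed on every iteration fits the observation that each div yields at most one county cell: counties.index(county) is the position inside that one-element list, not inside an overall list. A minimal Python 3 sketch, assuming made-up per-div results (per_div_results is hypothetical), compares that with a running position obtained from enumerate over an accumulated list:

```python
# Hypothetical per-div results, one county name per div.
per_div_results = [["Autauga County"], ["Baldwin County"], ["Barbour County"]]

# Position within each per-div list: always 0, since each list has one item.
local_indexes = [
    result.index(name) for result in per_div_results for name in result
]
print(local_indexes)  # [0, 0, 0]

# Accumulate all names first, then enumerate for a running position.
all_counties = [name for result in per_div_results for name in result]
numbered = list(enumerate(all_counties))
for position, name in numbered:
    print(position, name)
```

enumerate also avoids the repeated list search that index performs, and it does not return the wrong position when the same name appears twice.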
