Slicing inside a <for> loop

Published 2024-09-25 00:33:04


I'm a beginner, using Python 3.7.1 on Windows 10 with Visual Studio Code.

As an exercise, I'm trying to scrape some data organized in tables from a web page.

For now I only want to extract some information nested in values such as <td valign="top" style="width:25%;">Parte edibile, %</td><td align="left" valign="top" style="font-weight:bold;">75</td>. The delimiters here are <td> ... </td>.

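I have tried many ways to get only the first and second cells of each row, because the third one doesn't interest me and would just waste memory I don't need.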

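For this I used a 'for' loop, but as far as I understand BeautifulSoup, when it iterates in a loop all the nested arguments of each row are merged into one, so slicing with [0:1] to get the first and second "string" arguments of <td> </td> seems to be impossible.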

Here is my simple 'for' loop:

for alim in soup.find_all('td')[0:1]:
    return alim.text

Am I right? Can anyone suggest a smarter solution?

Thanks in advance for your suggestions. Max


3 answers

There are several ways to take the first two elements:

1) Use map with getattr. I like this approach because map is lazy, so you only iterate over the first two elements:

from bs4 import BeautifulSoup

soup = BeautifulSoup(your_html, 'lxml') 
r = soup.find_all('td')
gen_my_soup_text = map(lambda x: getattr(x, 'text'), r)

first_string = next(gen_my_soup_text)
second_string = next(gen_my_soup_text)

print(first_string)
print(second_string)

# output: 
# Parte edibile, %
# 75

2) Use map and slicing:

list(map(lambda x: getattr(x, 'text'), r))[:2]
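One difference between options 1 and 2 is that list(...) runs the map over every cell before the [:2] slice is applied. If you want a slice but also want to keep the laziness of option 1, itertools.islice can take the first two items from the map directly. The sketch below uses plain strings and a hypothetical get_text stand-in (instead of real BeautifulSoup tags) so it runs without a page, and counts how many cells each approach actually processes:

```python
from itertools import islice

calls = []

def get_text(cell):
    # Hypothetical stand-in for getattr(x, 'text'); it records each call
    # so we can see how many cells were actually processed.
    calls.append(cell)
    return cell.upper()

cells = ["a", "b", "c", "d"]

# islice stops pulling from the lazy map after two items.
first_two = list(islice(map(get_text, cells), 2))
print(first_two)   # ['A', 'B']
print(len(calls))  # 2

calls.clear()
# Materializing the whole map first processes every cell, then slices.
eager = list(map(get_text, cells))[:2]
print(eager)       # ['A', 'B']
print(len(calls))  # 4
```

With real tags the lazy version would look like list(islice(map(lambda x: x.text, r), 2)).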

3) Use a list comprehension and slicing:

[e.text for e in r][:2]
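In option 3 the comprehension reads .text for every cell and only then slices. Slicing the result set first, or asking find_all to stop early with its limit argument, avoids that. The snippet below is a sketch on a hypothetical one-row table modelled on the question's markup, using the built-in "html.parser" so lxml is not needed:

```python
from bs4 import BeautifulSoup

# Hypothetical sample row modelled on the question's markup.
html = """
<table><tr>
  <td valign="top" style="width:25%;">Parte edibile, %</td>
  <td align="left" valign="top" style="font-weight:bold;">75</td>
  <td>not needed</td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
r = soup.find_all("td")

# Slice first, so .text is only read for the two cells kept.
first_two = [e.text for e in r[:2]]
print(first_two)  # ['Parte edibile, %', '75']

# limit=2 makes find_all stop searching after two matches.
limited = [e.text for e in soup.find_all("td", limit=2)]
print(limited)    # ['Parte edibile, %', '75']
```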

To scrape the web page, you can try:

from bs4 import BeautifulSoup
import requests


req = requests.get('http://www.bda-ieo.it/test/Alphabetical.aspx?Lan=Ita')
soup = BeautifulSoup(req.text, "lxml")

# rows holds the <tr> elements that contain the cells of interest.
rows = soup.find_all("tr", attrs = {'class':'testonormale'})
first_second = [[e.text for e in row.find_all('td')][:2] for row in rows]

# output: 
#[['1300', 'ACCIUGHE o ALICI '],
# ['1502', 'ACCIUGHE o ALICI SOTTO SALE'],
# ['1501', "ACCIUGHE o ALICI SOTT'OLIO"],
# ['100205', 'ACETO'],
# ...
# ['602004', 'ASTICE '],
# ['600009', 'AVENA '],
# ['999692', 'AVOCADO ']]
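Some of the names in that output keep a trailing space ('ACCIUGHE o ALICI '). If that matters, get_text(strip=True) trims the whitespace around each cell's text. Here is a sketch on hypothetical markup imitating those rows, so it runs without a network request:

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating rows of the alphabetical list page;
# note the trailing space inside the second cell of the first row.
html = """
<table>
  <tr class="testonormale"><td>1300</td><td>ACCIUGHE o ALICI </td><td>x</td></tr>
  <tr class="testonormale"><td>100205</td><td>ACETO</td><td>y</td></tr>
  <tr class="header"><td>Codice</td><td>Nome</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
# Only rows with class "testonormale" are kept; the header row is skipped.
rows = soup.find_all("tr", attrs={"class": "testonormale"})

# get_text(strip=True) trims surrounding whitespace from each cell's text.
first_second = [[td.get_text(strip=True) for td in row.find_all("td")[:2]]
                for row in rows]
print(first_second)  # [['1300', 'ACCIUGHE o ALICI'], ['100205', 'ACETO']]
```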

If I understand correctly, your table has three or more columns and you are only interested in the first two.

There are several ways to extract the data from the first two columns. One is to use CSS selectors:

data = '''
    <table>
    <tr>
        <td valign="top" style="width:25%;">I. Parte edibile, %</td>
        <td align="left" valign="top" style="font-weight:bold;">I. 75</td>
        <td>This doesn't interest me</td>
    </tr>
    <tr>
        <td valign="top" style="width:25%;">II. Parte edibile, %</td>
        <td align="left" valign="top" style="font-weight:bold;">II. 75</td>
        <td>II. This doesn't interest me</td>
    </tr>
    <tr>
        <td valign="top" style="width:25%;">III. Parte edibile, %</td>
        <td align="left" valign="top" style="font-weight:bold;">III. 75</td>
        <td>III. This doesn't interest me</td>
    </tr>
    </table>'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'html.parser')

for col1, col2 in zip(soup.select('td:nth-of-type(1)'), soup.select('td:nth-of-type(2)')):
    print('{: <25} {}'.format(col1.text, col2.text))

Prints:

I. Parte edibile, %       I. 75
II. Parte edibile, %      II. 75
III. Parte edibile, %     III. 75

Or you can use list slicing:

rows = []
for tr in soup.select('tr'):
    rows.append([td.text for td in tr.select('td')[0:2]])

for row in rows:
    print('{: <25} {}'.format(*row))
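Since each row ends up as a [label, value] pair, you could also turn the rows into a dict and look values up by label. This is a sketch on a hypothetical two-row table in the same shape as the example above; it also skips rows with fewer than two cells, which an unguarded '{: <25} {}'.format(*row) would fail on:

```python
from bs4 import BeautifulSoup

# Small hypothetical table in the same shape as the example above.
data = """
<table>
  <tr><td>Parte edibile, %</td><td>75</td><td>ignored</td></tr>
  <tr><td>Colesterolo, mg</td><td>61</td><td>ignored</td></tr>
  <tr><td>only one cell</td></tr>
</table>
"""

soup = BeautifulSoup(data, "html.parser")

# Keep only the first two cells of each row, skipping rows that have fewer.
pairs = []
for tr in soup.select("tr"):
    cells = [td.get_text(strip=True) for td in tr.select("td")[:2]]
    if len(cells) == 2:
        pairs.append(cells)

# Map label -> value for lookups by name.
values = dict(pairs)
print(values["Colesterolo, mg"])  # 61
```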

Edit: to parse the page http://www.bda-ieo.it/test/ComponentiAlimento.aspx?Lan=Ita&foodid=1300_2, you can use the following code:

from bs4 import BeautifulSoup
import requests

url = 'http://www.bda-ieo.it/test/ComponentiAlimento.aspx?Lan=Ita&foodid=1300_2'

soup = BeautifulSoup(requests.get(url).text, 'html.parser')

for col1, col2 in zip(soup.select('#tblComponenti > tr.testonormale > td:nth-of-type(1)'), soup.select('#tblComponenti > tr.testonormale > td:nth-of-type(2)')):
    print('{: <70} {}'.format(col1.text, col2.text))

Prints:

Parte edibile, %                                                       75
Energia, ricalcolata, kJ                                               406
Energia, Ric con fibra, kJ                                             406
Energia, ricalcolata, kcal                                             96
Energia, Ric con fibra, kcal                                           96
Proteine totali, g                                                     16,8
   Proteine animali, g                                                 16,8
   Proteine vegetali, g                                                0,0
Lipidi totali, g                                                       2,6
   Lipidi animali, g                                                   2,6
   Lipidi vegetali, g                                                  0,0
Colesterolo, mg                                                        61
Carboidrati disponibili (MSE), g                                       1,5
   Amido (MSE), g                                                      0,0
   Carboidrati solubili (MSE), g                                       1,5
Fibra alimentare totale, g                                             0,0
Alcol, g                                                               0,0
Acqua, g                                                               76,5
Ferro, mg                                                              2,8
Calcio, mg                                                             148
Sodio, mg                                                              104
Potassio, mg                                                           278
Fosforo, mg                                                            196
Zinco, mg                                                              4,20
Magnesio, mg                                                           22
Rame, mg                                                               1,00
Selenio, µg                                                            37,0
Cloro, mg                                                              130
Iodio, µg                                                              29
Manganese, mg                                                          0,07
Zolfo, mg                                                              150
Vitamina B1, Tiamina, mg                                               0,06
Vitamina B2, Riboflavina, mg                                           0,26
Vitamina C, mg                                                         0
Niacina, mg                                                            14,00
Vitamina B6, mg                                                        0,14
Folati totali, µg                                                      9
Acido pantotenico, mg                                                  0,65
Biotina, µg                                                            6,0
Vitamina B12, µg                                                       0,6
Retinolo equivalente                                                   32
   Retinolo eq. (RE), µg                                               32
   Retinolo, µg                                                        tr
   ß-carotene eq., µg                                                  0,29
Vitamina E (ATE), mg                                                   11,00
Vitamina D, µg                                                         1,30
Acidi grassi saturi totali, g                                          0,00
Somma degli acidi butirrico, caproico, caprilico e caprico, g          0,00
Acido laurico, g                                                       0,14
Acido miristico, g                                                     1,01
Acido palmitico, g                                                     0,13
Acido stearico, g                                                      tr
Acido arachidico, g                                                    0,00
Acido beenico, g                                                       0,40
Acidi grassi monoinsaturi totali, g                                    0,00
Acido miristoleico, g                                                  0,10
Acido palmitoleico, g                                                  0,17
Acido oleico, g                                                        0,01
Acidi eicosenoico, g                                                   0,01
Acido erucico, g                                                       0,85
Acidi grassi polinsaturi totali, g                                     0,01
Acido linoleico, g                                                     0,01
Acido linolenico, g                                                    tr
Acido arachidonico, g                                                  0,27
Acido eicosapentaenoico (EPA), g                                       0,52
Acido decosaesaenoico (DHA), g                                         0,04
Altri acidi grassi polinsaturi, g                                      175
Triptofano, mg                                                         726
Treonina, mg                                                           823
Isoleucina, mg                                                         1330
Leucina, mg                                                            1379
Lisina, mg                                                             349
Metionina, mg                                                          183
Cistina, mg                                                            595
Fenilalanina, mg                                                       425
Tirosina, mg                                                           759
Valina, mg                                                             758
Arginina, mg                                                           675
Istidina, mg                                                           919
Alanina, mg                                                            1764
Acido aspartico, mg                                                    2261
Acido glutammico, mg                                                   722
Glicina, mg                                                            460
Prolina, mg                                                            650
Serina, mg                                                             1,5
Glucosio, g                                                            0,0
Fruttosio, g                                                           0,0
Galattosio, g                                                          0,0
Saccarosio (MSE), g                                                    0,0
Maltosio (MSE), g                                                      0,0
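One thing to watch with the selector above: '#tblComponenti > tr' uses the child combinator, so it only matches rows that sit directly under the <table>. If the HTML you receive wraps the rows in <tbody> (browser dev tools usually show one even when the source has none), the child selector matches nothing and a descendant selector is needed. A sketch on two hypothetical versions of the table, parsed with "html.parser", which does not add <tbody> on its own:

```python
from bs4 import BeautifulSoup

# Two hypothetical versions of the same table: without and with <tbody>.
without_tbody = """
<table id="tblComponenti">
  <tr class="testonormale"><td>Parte edibile, %</td><td>75</td></tr>
</table>
"""
with_tbody = """
<table id="tblComponenti"><tbody>
  <tr class="testonormale"><td>Parte edibile, %</td><td>75</td></tr>
</tbody></table>
"""

child = "#tblComponenti > tr.testonormale > td:nth-of-type(1)"
descendant = "#tblComponenti tr.testonormale > td:nth-of-type(1)"

for html in (without_tbody, with_tbody):
    soup = BeautifulSoup(html, "html.parser")
    # Child combinator: only <tr> directly under the table.
    # Descendant combinator: also <tr> nested inside <tbody>.
    print(len(soup.select(child)), len(soup.select(descendant)))
# 1 1
# 0 1
```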

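If the return type is a list, you should use [0:2], because the end index of a slice is exclusive. Also, return exits the loop on its first pass, so the code needs a small change: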

def first_two_texts(soup):  # hypothetical helper name; return needs a function
    result = []
    for alim in soup.find_all('td')[0:2]:
        result.append(alim.text)
    return result
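Both points can be checked without BeautifulSoup. In this sketch plain strings stand in for the cell texts, and the two function names are hypothetical:

```python
cells = ["Parte edibile, %", "75", "not needed"]

# The stop index is exclusive: [0:1] keeps one item, [0:2] keeps two.
print(cells[0:1])  # ['Parte edibile, %']
print(cells[0:2])  # ['Parte edibile, %', '75']

def first_with_return(items):
    # `return` inside the loop exits on the very first iteration.
    for item in items[0:2]:
        return item

def first_two_collected(items):
    # Collecting into a list lets the loop visit both items.
    result = []
    for item in items[0:2]:
        result.append(item)
    return result

print(first_with_return(cells))    # Parte edibile, %
print(first_two_collected(cells))  # ['Parte edibile, %', '75']
```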
