How to extract information into a table-like structure

Published 2024-09-26 22:54:12


I am trying to scrape an HTML table that looks like this:

Recent ratings:
thew              26-6-2014 11:02     Karma   +4      lucky you
user34            26-6-2014 10:34     Karma   +3      great!
godspeed          26-6-2014 06:50     Karma   +5      thanks!
                                                                [Report to Mod.]

I am using BeautifulSoup, and my code is:

^{pr2}$

As a result, the CSV file contains a single column that looks like this:

thew
ᅡᅠᅡᅠ26-6-2014 11:02ᅡᅠᅡᅠKarmaᅡᅠᅡᅠ+4
ᅡᅠᅡᅠlucky you
user34
ᅡᅠᅡᅠ26-6-2014 10:34ᅡᅠᅡᅠKarmaᅡᅠᅡᅠ+3
ᅡᅠᅡᅠgreat!
godspeed
ᅡᅠᅡᅠ26-6-2014 06:50ᅡᅠᅡᅠKarmaᅡᅠᅡᅠ+5
ᅡᅠᅡᅠthanks!

How can I separate the information into a table-like structure?

I tried adding .replace("\n", ""), but that puts all the information on a single line:

thewᅡᅠᅡᅠ26-6-2014 11:02ᅡᅠᅡᅠKarmaᅡᅠᅡᅠ+4ᅡᅠᅡᅠlucky youuser34ᅡᅠᅡᅠ26-6-2014 10:34ᅡᅠᅡᅠKarmaᅡᅠᅡᅠ+3ᅡᅠᅡᅠgreat!godspeedᅡᅠᅡᅠ26-6-2014 06:50ᅡᅠᅡᅠKarmaᅡᅠᅡᅠ+5ᅡᅠᅡᅠthanks!

This is what I get when I print five:

[<fieldset><legend><a href="misc.php?action=viewratings&amp;tid=50510&amp;pid=502926" title="View Rating Log">Recent Ratings</a></legend><br/>
<table border="0" cellpadding="0" cellspacing="0">
<tr><td><a href="viewpro.php?uid=21445" target="_blank">thew</a></td>
<td>  26-6-2014 11:02</td><td>  Karma</td><td>  <b>+4</b></td>
<td>  lucky you</td></tr>
<tr><td><a href="viewpro.php?uid=43867" target="_blank">user34</a></td>
<td>  26-6-2014 10:34</td><td>  Karma</td><td>  <b>+3</b></td>
<td>  great!</td></tr>
<tr><td><a href="viewpro.php?uid=68709" target="_blank">godspeed</a></td>
<td>  26-6-2014 06:50</td><td>  Karma</td><td>  <b>+5</b></td>
<td>  thanks!</td></tr>
</table>
</fieldset>]

The answer below works when I print the output, but not when I write it to a CSV file. An excerpt of my code:

five = soup.findAll("fieldset")

karmas = []

for i in five:
    for j in  i.findAll('td'):
        somevar = j.text
        print somevar           
        karmas.append(somevar.strip())

        csvfile = open('test.csv', 'ab')    
        writer = csv.writer(csvfile)

        for karma in zip(karmas):
                writer.writerow([karma])

        csvfile.close()

# output of print somevar

thew
  26-6-2014 11:02
  Karma
  +4
  lucky you
user34
  26-6-2014 10:34
  Karma
  +3
  great!
godspeed
  26-6-2014 06:50
  Karma
  +5
  thanks!

# output in csv

thew

1 answer
User
#1 · posted 2024-09-26 22:54:12

Use soup.findAll("tr") instead of soup.findAll("fieldset"), so that you loop over the table rows inside the fieldset and write one CSV row per <tr>:

html=''' <fieldset><legend><a href="misc.php?action=viewratings&amp;tid=50510&amp;pid=502926" title="View Rating Log">Recent Ratings</a></legend><br/>
<table border="0" cellpadding="0" cellspacing="0">
<tr><td><a href="viewpro.php?uid=21445" target="_blank">thew</a></td>
<td>  26-6-2014 11:02</td><td>  Karma</td><td>  <b>+4</b></td>
<td>  lucky you</td></tr>
<tr><td><a href="viewpro.php?uid=43867" target="_blank">user34</a></td>
<td>  26-6-2014 10:34</td><td>  Karma</td><td>  <b>+3</b></td>
<td>  great!</td></tr>
<tr><td><a href="viewpro.php?uid=68709" target="_blank">godspeed</a></td>
<td>  26-6-2014 06:50</td><td>  Karma</td><td>  <b>+5</b></td>
<td>  thanks!</td></tr>
</table>
</fieldset> '''

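from bs4 import BeautifulSoup
import csv

# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(html, "html.parser")
five = soup.find_all("tr")

# Open the file once, outside the loop. In Python 3, newline="" lets the
# csv module control line endings (avoids blank lines on Windows).
with open('some.csv', 'a', newline='') as f:
    writer = csv.writer(f)
    for i in five:
        # One CSV row per <tr>, one column per <td>, whitespace trimmed.
        writer.writerow([j.get_text(strip=True) for j in i.find_all('td')])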

#output

thew   26-6-2014 11:02   Karma  +4   lucky you
user34   26-6-2014 10:34   Karma  +3   great!
godspeed   26-6-2014 06:50   Karma  +5   thanks!
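
If you would rather keep the flat karmas list from the question, the same table shape can be recovered by grouping the cells afterwards. The following is only a sketch under two assumptions that the thread does not confirm: every rating has exactly five cells (name, date, label, score, comment), and the stray characters in the CSV come from non-breaking spaces in the cell text. The helper names clean_cell, group_cells and rows_to_csv are made up for illustration, and an in-memory buffer stands in for the real file.

```python
import csv
import io

CELLS_PER_ROW = 5  # assumption: name, date, label, score, comment


def clean_cell(text):
    """Turn non-breaking spaces into plain spaces and trim both ends."""
    return text.replace("\xa0", " ").strip()


def group_cells(cells, size=CELLS_PER_ROW):
    """Split a flat list of cell strings into rows of `size` cells each."""
    cleaned = [clean_cell(c) for c in cells]
    return [cleaned[i:i + size] for i in range(0, len(cleaned), size)]


def rows_to_csv(rows):
    """Serialise rows to CSV text; the buffer stands in for a real file."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


# Flat list as the question's loop over every <td> would collect it
# (the \xa0 prefixes are an assumption about the page's &nbsp; padding).
karmas = [
    "thew", "\xa0\xa026-6-2014 11:02", "\xa0\xa0Karma", "\xa0\xa0+4", "\xa0\xa0lucky you",
    "user34", "\xa0\xa026-6-2014 10:34", "\xa0\xa0Karma", "\xa0\xa0+3", "\xa0\xa0great!",
    "godspeed", "\xa0\xa026-6-2014 06:50", "\xa0\xa0Karma", "\xa0\xa0+5", "\xa0\xa0thanks!",
]

rows = group_cells(karmas)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Grouping by a fixed size breaks as soon as one rating has a missing or extra cell, so iterating over the <tr> elements as shown in the answer is the more robust choice when the HTML is available.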
