Extracting the body tag from an sgm file with Beautiful Soup and Python

Posted 2024-10-04 01:35:31


I have an sgm file in the following format:

<REUTERS TOPICS="NO" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="16321" NEWID="1001">
<DATE> 3-MAR-1987 09:18:21.26</DATE>
<TOPICS></TOPICS>
<PLACES><D>usa</D><D>ussr</D></PLACES>
<PEOPLE></PEOPLE>
<ORGS></ORGS>
<EXCHANGES></EXCHANGES>
<COMPANIES></COMPANIES>
<UNKNOWN> 
&#5;&#5;&#5;G T
&#22;&#22;&#1;f0288&#31;reute
d f BC-SANDOZ-PLANS-WEEDKILL   03-03 0095</UNKNOWN>
<TEXT>&#2;
<TITLE>SANDOZ PLANS WEEDKILLER JOINT VENTURE IN USSR</TITLE>
<DATELINE>    BASLE, March 3 - </DATELINE><BODY>Sandoz AG said it planned a joint venture
to produce herbicides in the Soviet Union.
    The company said it had signed a letter of intent with the
Soviet Ministry of Fertiliser Production to form the first
foreign joint venture the ministry had undertaken since the
Soviet Union allowed Western firms to enter into joint ventures
two months ago.
    The ministry and Sandoz will each have a 50 pct stake, but
a company spokeswoman was unable to give details of the size of
investment or planned output.
 Reuter
&#3;</BODY></TEXT>
</REUTERS>

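The same file contains 1000 such records, each with REUTERS as its root node. I want to extract the body tag from every record and do some processing on it, but I can't get it to work. Here is my code: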

^{pr2}$

The problem is that the for loop does not print the contents of the body tag; it prints the record itself.


1 answer
User
#1 · Posted 2024-10-04 01:35:31

As I said in the comments, for reasons I don't know, you should not name the tag body.

So, step one: rename the BODY tag to CONTENT.

<REUTERS TOPICS="NO" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="16321" NEWID="1001">
<DATE> 3-MAR-1987 09:18:21.26</DATE>
<TOPICS></TOPICS>
<PLACES><D>usa</D><D>ussr</D></PLACES>
<PEOPLE></PEOPLE>
<ORGS></ORGS>
<EXCHANGES></EXCHANGES>
<COMPANIES></COMPANIES>
<UNKNOWN> 
&#5;&#5;&#5;G T
&#22;&#22;&#1;f0288&#31;reute
d f BC-SANDOZ-PLANS-WEEDKILL   03-03 0095</UNKNOWN>
<TEXT>&#2;
<TITLE>SANDOZ PLANS WEEDKILLER JOINT VENTURE IN USSR</TITLE>
<DATELINE>    BASLE, March 3 - </DATELINE><CONTENT>Sandoz AG said it planned a joint venture
to produce herbicides in the Soviet Union.
    The company said it had signed a letter of intent with the
Soviet Ministry of Fertiliser Production to form the first
foreign joint venture the ministry had undertaken since the
Soviet Union allowed Western firms to enter into joint ventures
two months ago.
    The ministry and Sandoz will each have a 50 pct stake, but
a company spokeswoman was unable to give details of the size of
investment or planned output.
 Reuter
&#3;</CONTENT></TEXT>
</REUTERS>
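The rename can be done programmatically with a regular expression before the text is handed to Beautiful Soup. This is a minimal sketch, assuming the BODY tags carry no attributes (true for the records shown above); rename_body_tags is a hypothetical helper, not part of Beautiful Soup.

```python
import re

# Matches an opening or closing BODY tag case-insensitively: <BODY>, </BODY>, <body>, ...
# Assumption: the tag has no attributes, as in the Reuters records above.
BODY_TAG = re.compile(r"<(/?)body>", re.IGNORECASE)


def rename_body_tags(sgml: str, new_name: str = "CONTENT") -> str:
    """Return a copy of `sgml` with every <BODY>/</BODY> tag renamed to `new_name`."""
    # group(1) is "/" for a closing tag and "" for an opening tag
    return BODY_TAG.sub(lambda m: f"<{m.group(1)}{new_name}>", sgml)


if __name__ == "__main__":
    record = "<TEXT><BODY>Sandoz AG said ...</BODY></TEXT>"
    print(rename_body_tags(record))  # prints <TEXT><CONTENT>Sandoz AG said ...</CONTENT></TEXT>
```

For a whole file, read it into a string, run it through rename_body_tags, and parse the result as before; which encoding to pass to open() is an assumption you should check against your data.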

Here is the code:

^{pr2}$
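If you would rather not depend on how a particular Beautiful Soup tree builder treats body, Python's standard-library html.parser.HTMLParser is one alternative. It is an event-based tokenizer that reports tags as it meets them without building or repairing an HTML document tree, so the text of each record's CONTENT tag (or, on the untouched file, its BODY tag) comes out as one string per record. This is a swapped-in stdlib sketch, not the code from the answer; TagTextCollector and extract_texts are hypothetical names.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text inside every occurrence of one tag, one string per occurrence.

    HTMLParser lowercases tag names, so CONTENT in the file arrives here as "content".
    """

    def __init__(self, tag: str = "content") -> None:
        # convert_charrefs=True decodes references such as &#3; inside handle_data;
        # control-character references like that one may decode to nothing.
        super().__init__(convert_charrefs=True)
        self.tag = tag.lower()
        self.depth = 0                 # > 0 while we are inside the target tag
        self.current: list[str] = []   # text pieces of the occurrence being read
        self.texts: list[str] = []     # finished occurrences, in document order

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.texts.append("".join(self.current))
                self.current = []

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract_texts(sgml: str, tag: str = "content") -> list[str]:
    """Return the text of every `tag` element in `sgml`, in document order."""
    parser = TagTextCollector(tag)
    parser.feed(sgml)
    parser.close()
    return parser.texts


if __name__ == "__main__":
    demo = '<REUTERS NEWID="1"><TEXT><CONTENT>Sandoz AG said it planned a joint venture</CONTENT></TEXT></REUTERS>'
    print(extract_texts(demo))  # prints ['Sandoz AG said it planned a joint venture']
```

To process the file, read it into a string and call extract_texts(text) for the renamed tag, or extract_texts(text, "body") on the original file, then loop over the returned strings; the file path and encoding are up to you.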
