回答此问题可获得 20 贡献值,回答如果被采纳可获得 50 分。
<pre><code><div class="members_box_second">
<div class="members_box0">
<p>1</p>
</div>
<div class="members_box1">
<p class="clear"><b>Name:</b><span>Mr.Jagadhesan.S</span></p>
<p class="clear"><b>Designation:</b><span>Proprietor</span></p>
<p class="clear"><b>CODISSIA - Designation:</b><span>(Founder President, CODISSIA)</span></p>
<p class="clear"><b>Name of the Industry:</b><span>Govardhana Engineering Industries</span></p>
<p class="clear"><b>Specification:</b><span>LIFE</span></p>
<p class="clear"><b>Date of Admission:</b><span>19.12.1969</span></p>
</div>
<div class="members_box2">
<p>Ukkadam South</p>
<p class="clear"><b>Phone:</b><span>2320085, 2320067</span></p>
<p class="clear"><b>Email:</b><span><a href="mailto:jagadhesan@infognana.com">jagadhesan@infognana.com</a></span></p>
</div>
</div>
<div class="members_box">
<div class="members_box0">
<p>2</p>
</div>
<div class="members_box1">
<p class="clear"><b>Name:</b><span>Mr.Somasundaram.A</span></p>
<p class="clear"><b>Designation:</b><span>Proprietor</span></p>
<p class="clear"><b>Name of the Industry:</b><span>Everest Engineering Works</span></p>
<p class="clear"><b>Specification:</b><span>LIFE</span></p>
<p class="clear"><b>Date of Admission:</b><span>19.12.1969</span></p>
</div>
<div class="members_box2">
<p>Alagar Nivas, 284 NSR Road</p>
<p class="clear"><b>Phone:</b><span>2435674</span></p>
<h4>Factory Address</h4>
Coimbatore - 641 027
<p class="clear"><b>Phone:</b><span>2435674</span></p>
</div>
</div>
</code></pre>
<p>我有上面的结构。因此,我试图从<code>class</code><strong>成员框1</strong>和成员框2</strong>的<code>div</code>内的文本。在</p>
<p>我有下面的脚本,它只从成员\u box1获取数据</p>
^{pr2}$
<p>这就是我试图从两个盒子里得到数据的方法</p>
<pre><code>from bs4 import BeautifulSoup
import urllib2
import csv
import re
page = urllib2.urlopen("http://www.codissia.com/member/members-directory/?mode=paging&Keyword=&Type=&pg=1")
soup = BeautifulSoup(page.read())
eachbox2 = soup.findAll('div ', {'class':'members_box2'})
for eachuniversity in soup.findAll('div',{'class':'members_box1'}):
data = eachbox2 + [re.sub('\s+', ' ', text).strip().encode('utf8') for text in eachuniversity.find_all(text=True) if text.strip()]
print data
</code></pre>
<p>但我得到的结果和我对<strong>成员的结果是一样的</p>
<p><strong>更新</strong></p>
<p>我希望迭代的输出是这样的(在单行中)</p>
<pre><code>Name:,Mr.Srinivasan.N,Designation:,Proprietor,CODISSIA - Designation:,(Past President, CODISSIA),Name of the Industry:,Arian Soap Manufacturing Co,Specification:,LIFE,Date of Admission:,19.12.1969, "Parijaat" 26/1Shanker Mutt Road, Basavana Gudi,Phone:,2313861
</code></pre>
<p>但我得到的是:</p>
<pre><code>Name:,Mr.Srinivasan.N,Designation:,Proprietor,CODISSIA - Designation:,(Past President, CODISSIA),Name of the Industry:,Arian Soap Manufacturing Co,Specification:,LIFE,Date of Admission:,19.12.1969
"Parijaat" 26/1Shanker Mutt Road, Basavana Gudi,Phone:,2313861
</code></pre>