<p>Here we use <code>requests</code>, <code>BeautifulSoup</code>, and <code>pandas</code>:</p>
<pre><code>import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.programmableweb.com/category/all/apis?deadpool=1&page='
num = int(input('How many pages to parse?> '))
print('please wait....')

# One list per output column
name = []
desc = []
cat = []
sub = []

for i in range(0, num):
    # Fetch and parse one listing page
    r = requests.get(f"{url}{i}")
    soup = BeautifulSoup(r.text, 'html.parser')
    for item1 in soup.find_all('td', attrs={'class': 'views-field views-field-title col-md-3'}):
        name.append(item1.text)
    for item2 in soup.find_all('td', attrs={'class': 'views-field views-field-search-api-excerpt views-field-field-api-description hidden-xs visible-md visible-sm col-md-8'}):
        desc.append(item2.text)
    for item3 in soup.find_all('td', attrs={'class': 'views-field views-field-field-article-primary-category'}):
        cat.append(item3.text)
    for item4 in soup.find_all('td', attrs={'class': 'views-field views-field-created'}):
        sub.append(item4.text)

# Combine the four columns row by row
result = list(zip(name, desc, cat, sub))
df = pd.DataFrame(
    result, columns=['API Name', 'Description', 'Category', 'Submitted'])
df.to_csv('output.csv')
print('Task Completed, Result saved to output.csv file.')
</code></pre>
<p>The result can be viewed online: <a href="https://sheet.zoho.com/sheet/editor.do?doc=886bbe3d1c94a844b456f19ed051845db227f822eb0dd237fe8fa6a7529ed5707f42235af0d7c9e303b85a8def35465bd922dd454afc384e26085db3d391d38c" rel="nofollow noreferrer">Check Here</a></p>
<p>Sample output:</p>
<p><a href="https://i.ibb.co/CHz0dmD/Capture.png" rel="nofollow noreferrer"><img src="https://i.ibb.co/CHz0dmD/Capture.png" alt="enter image description here"/></a></p>
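<p>The first script collects each column into its own list and zips them afterwards, so a single missing cell on a page would shift every later row. A minimal sketch of a per-row alternative follows. The helper name <code>parse_rows</code> and the <code>COLUMNS</code> mapping are assumptions made for illustration; each entry reuses one class name from the selectors in the script above, and the page markup may differ from what this assumes.</p>
<pre><code>from bs4 import BeautifulSoup

# Hypothetical mapping: output column name to one CSS class of its cell.
COLUMNS = {
    'API Name': 'views-field-title',
    'Description': 'views-field-field-api-description',
    'Category': 'views-field-field-article-primary-category',
    'Submitted': 'views-field-created',
}


def parse_rows(html):
    """Return one dict per table row, keeping the cells of a row together."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for tr in soup.find_all('tr'):
        record = {}
        for column, css_class in COLUMNS.items():
            cell = tr.find('td', class_=css_class)
            # A missing cell becomes an empty string instead of shifting rows.
            record[column] = cell.get_text(strip=True) if cell else ''
        # Skip header rows and any row without a title cell.
        if record['API Name']:
            rows.append(record)
    return rows
</code></pre>
<p>Inside the page loop, the rows could then be accumulated with <code>rows.extend(parse_rows(r.text))</code> and passed to <code>pd.DataFrame(rows)</code>.</p>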
<p>Now for parsing the <code>href</code> links:</p>
<pre><code>import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.programmableweb.com/category/all/apis?deadpool=0&page='
num = int(input('How many pages to parse?> '))
print('please wait....')

# Collect the detail-page link of every API on each listing page
links = []
for i in range(0, num):
    r = requests.get(f"{url}{i}")
    soup = BeautifulSoup(r.text, 'html.parser')
    for link in soup.find_all('td', attrs={'class': 'views-field views-field-title col-md-3'}):
        for href in link.find_all('a'):
            result = 'https://www.programmableweb.com' + href.get('href')
            links.append(result)

# Visit each detail page and keep the text of its field spans
spans = []
for link in links:
    r = requests.get(link)
    soup = BeautifulSoup(r.text, 'html.parser')
    span = [span.text for span in soup.select('div.field span')]
    spans.append(span)

# One row per API; rows may differ in length, pandas pads the gaps with NaN
df = pd.DataFrame(spans)
df.to_csv('data.csv')
print('Task Completed, Result saved to data.csv file.')
</code></pre>
<p>Check the result online: <a href="https://sheet.zoho.com/sheet/editor.do?doc=8059035d83f7549efc8d1e42de7adb3b50c56978993e7379e34d6137213248a05ad614759e69f95eabd37cf3be45dcacd5eec2e45c58ce6fa96e73acf48e76c6" rel="nofollow noreferrer">Here</a></p>
<p>A sample view is shown below:</p>
<p><a href="https://i.stack.imgur.com/OThba.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/OThba.png" alt="enter image description here"/></a></p>
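<p>The link-collecting step builds absolute URLs by string concatenation, which only works while every <code>href</code> is site-relative. A small sketch using <code>urljoin</code> from the standard library handles relative and absolute values alike. The helper name <code>extract_links</code> is an assumption made for illustration, and the selector targets the same title cells by one of their classes.</p>
<pre><code>from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE = 'https://www.programmableweb.com'


def extract_links(html, base=BASE):
    """Collect absolute URLs from the title cells of one listing page."""
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    for anchor in soup.select('td.views-field-title a[href]'):
        # urljoin resolves relative hrefs and leaves absolute ones untouched.
        links.append(urljoin(base, anchor['href']))
    return links
</code></pre>
<p>In the script, <code>links.extend(extract_links(r.text))</code> would stand in for the two nested loops.</p>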
<p>If you want to combine these two <code>csv</code> files, here is the code:</p>
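<pre><code>import pandas as pd

a = pd.read_csv("output.csv")
b = pd.read_csv("data.csv")
# Both files were saved with their row index, which read_csv loads as
# 'Unnamed: 0'; it is the only column they share, so the merge joins on it.
merged = a.merge(b, on="Unnamed: 0")
merged.to_csv("final.csv", index=False)
</code></pre>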
<p>Online result: <a href="https://sheet.zoho.com/sheet/editor.do?doc=041e1cfa23ee5a62418b155925380d179f1934cbc78c3c9cfe8a089da65dec3175b8cad03a09cc8d5b330dcea65895657872c513b23346264b15bafff42004ba" rel="nofollow noreferrer">Here</a></p>
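<p>The merge works because both scripts call <code>to_csv</code> without <code>index=False</code>, so each file stores the row number as an unnamed first column, and that column is the only one the two files share. The rows therefore line up by position, which is only meaningful if both scripts crawled the same listing and the same number of pages. A minimal sketch with made-up in-memory data (the values and variable names are assumptions, not real output) shows the effect:</p>
<pre><code>import io

import pandas as pd

# Made-up stand-ins for output.csv and data.csv, both written with the index.
left = pd.DataFrame({'API Name': ['Foo', 'Bar']})
right = pd.DataFrame([['REST', 'JSON'], ['SOAP', 'XML']])

left_csv = io.StringIO()
right_csv = io.StringIO()
left.to_csv(left_csv)
right.to_csv(right_csv)
left_csv.seek(0)
right_csv.seek(0)

a = pd.read_csv(left_csv)
b = pd.read_csv(right_csv)

# The saved row index is the only column name present in both frames.
shared = [col for col in a.columns if col in b.columns]
print(shared)

merged = a.merge(b, on=shared)
print(merged)
</code></pre>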