Using BeautifulSoup to access available bikes in DC bikeshare

Posted 2024-06-25 06:37:53


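I'm new to programming and Python, and I'm trying to access the number of available bikes at a given station in the DC bikeshare program. I believe the best way to do this is with BeautifulSoup. The good news is that the data appears to be in a clean format here: https://www.capitalbikeshare.com/data/stations/bikeStations.xml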

Here is an example of one station:

<station>
    <id>1</id>
    <name>15th & S Eads St</name>
    <terminalName>31000</terminalName>
    <lastCommWithServer>1460217337648</lastCommWithServer>
    <lat>38.858662</lat>
    <long>-77.053199</long>
    <installed>true</installed>
    <locked>false</locked>
    <installDate>0</installDate>
    <removalDate/>
    <temporary>false</temporary>
    <public>true</public>
    <nbBikes>7</nbBikes>
    <nbEmptyDocks>8</nbEmptyDocks>
    <latestUpdateTime>1460192501598</latestUpdateTime>
</station>

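I'm looking for the <nbBikes> value. I have the start of a Python script that should display the value for the first 5 stations (I'll work out how to select the station I want once I have this under control), but it doesn't return any values. The script is below: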

# bikeShareParse.py - parses the capital bikeshare info page 


import bs4, requests

url = "https://www.capitalbikeshare.com/data/stations/bikeStations.xml"

res = requests.get(url)
res.raise_for_status()

#create the soup element from the file
soup = bs4.BeautifulSoup("res.text", "lxml")

# defines the part of the page we are looking for
nbikes = soup.select('#text')

#limits number of results for testing
numOpen = 5
for i in range(numOpen):
        print nbikes

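I think my problem (besides not understanding how to format code correctly in a Stack Overflow question) is that the value in nbikes = soup.select('#text') is incorrect. However, I can't seem to substitute anything for '#text' that returns any values at all, let alone the ones I want.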

Am I on the right track? If so, what am I missing?

Thanks


1 answer

Answer 1 · posted 2024-06-25 06:37:53

This script builds a list of (station_ID, bikes_remaining) pairs. It is adapted from: http://www.plotsofdots.com/archives/68

# adapted from http://www.plotsofdots.com/archives/68
# (updated for Python 3: urllib.request replaces urllib2, print is a function)

import xml.etree.ElementTree as ET
from urllib.request import urlopen

# fetch the feed and parse it as XML
site = 'https://www.capitalbikeshare.com/data/stations/bikeStations.xml'
with urlopen(site) as response:
    doc = ET.parse(response)

# get the root element; findall() below looks for <station> elements directly under it
root = doc.getroot()

# lists for the station IDs and the number of bikes at each station
sID = []
embikes = []
# loop over the stations and extract the fields we are interested in
for station in root.findall('station'):
    sID.append(station.find('id').text)
    embikes.append(int(station.find('nbBikes').text))

# this just tests that the process above works, can be commented out
#print(embikes)
#print(sID)

# use zip to pair each station ID with its bike count
# (in Python 3, zip returns an iterator, so wrap it in list() to index it)
prov = list(zip(sID, embikes))

print(prov[0])
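
If the goal is to look up one particular station, as the question mentions, a dictionary keyed by station ID may be more convenient than parallel lists. The following is only a minimal sketch under stated assumptions: it parses an inline sample instead of the live feed so it runs offline, the root element name `stations` and the second station record are made up for illustration, and `bikes_by_station` is a hypothetical helper name, not part of the original script.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-station sample in the same shape as the record in the question.
# The <stations> root name and the second station are assumptions for illustration.
SAMPLE_XML = """<stations>
  <station>
    <id>1</id>
    <name>15th &amp; S Eads St</name>
    <terminalName>31000</terminalName>
    <nbBikes>7</nbBikes>
    <nbEmptyDocks>8</nbEmptyDocks>
  </station>
  <station>
    <id>2</id>
    <name>Example St</name>
    <terminalName>31001</terminalName>
    <nbBikes>3</nbBikes>
    <nbEmptyDocks>12</nbEmptyDocks>
  </station>
</stations>"""


def bikes_by_station(xml_text):
    """Return a dict mapping each station's <id> text to its <nbBikes> count."""
    root = ET.fromstring(xml_text)
    return {
        station.findtext("id"): int(station.findtext("nbBikes"))
        for station in root.findall("station")
    }


bikes = bikes_by_station(SAMPLE_XML)
print(bikes["1"])  # bikes available at the station whose id is "1"
```

To run this against the real feed, the downloaded response text would be passed in place of SAMPLE_XML. As for the BeautifulSoup attempt in the question, two things stand out: "res.text" in quotes passes that literal string to the parser rather than the response body, and '#text' is a CSS id selector, which matches nothing in this feed.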
