Scraping the Fed speech calendar with Python

Posted 2024-09-29 21:27:46


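I want to scrape the Fed's calendar for upcoming speeches. I have already managed this for the ECB, but the Fed seems to be a different beast. The end goal is a DataFrame with the columns date, time, board member, and topic.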

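To illustrate my problem, here is how I did it for the ECB. 1) I build a list of dates for the coming weeks and, in a for loop, match them against the dates in the table on the ECB website (table = the HTML code). 2) For each match, I pull out the variables of interest, plus some formatting etc. that is not shown here.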
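Step 1 could be sketched roughly as follows. This is a minimal illustration, not the asker's actual code: the helper name `upcoming_dates` and the 14-day default are assumptions, and the format is chosen to match the `<dt>` text in the ECB HTML quoted in this question (for example `Monday, 20 Jul 2020`). How the site renders single-digit days is not visible in that HTML, so the unpadded day is a guess.

```python
from datetime import date, timedelta

def upcoming_dates(start, days=14):
    """Return date strings like 'Monday, 20 Jul 2020' for `days` days from `start`."""
    out = []
    for offset in range(days):
        d = start + timedelta(days=offset)
        # %A = weekday name, %b = abbreviated month; the day is inserted unpadded
        out.append(f"{d:%A}, {d.day} {d:%b %Y}")
    return out

dates = upcoming_dates(date(2020, 7, 20), days=3)
print(dates)
```

In practice `start` would be `date.today()`. The weekday and month names come from the default C locale, which is English unless the program changes it.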

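I don't know how to do something similar with the Fed's HTML. When you inspect it, most of it is gibberish, which I think is best shown with a picture.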

Code used to scrape the ECB

for day in dates:
    # find every <dt> whose text is exactly this date
    supertable = table.find_all("dt", string=day)
    for dt in supertable:
        # the <dd> that follows each <dt> holds the details of that entry
        subtable = dt.find_next("dd")
        print(dt)
        print(subtable)
        bm = subtable.find("span", class_="boardMember")
        ti = subtable.find("span", class_="time")
        ev = subtable.find("span", class_="event")
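
Two details are worth handling after this loop: `find` returns `None` when a span is missing (the first `<dd>` in the ECB HTML quoted in this question has no `boardMember` span), and each span's text still starts with its label, e.g. `Time:10:00 CET`. A small helper along these lines could deal with both. `clean_field` is a hypothetical name, and it assumes it is given `span.get_text()` or `None`, and that every non-empty text starts with a `Label:` prefix.

```python
def clean_field(raw):
    """Strip a leading 'Label:' from a span's text; return None if the span was missing."""
    if raw is None:
        return None
    # Split on the first colon only, so times like '10:00 CET' stay intact
    label, sep, value = raw.partition(":")
    return value.strip() if sep else raw.strip()

print(clean_field("Time:10:00 CET"))
print(clean_field("Board member:Luis de Guindos"))
print(clean_field(None))
```

Inside the loop it could be called as `clean_field(bm.get_text() if bm else None)`.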

ECB HTML (example for a specific date): https://www.ecb.europa.eu/press/calendars/weekly/html/index.en.html

<div id="ecb-content-col" >
                    <main>
                        <h1>Weekly schedule of public speaking engagements and other activities</h1>
<h3>Friday, 17 July 2020 - Sunday, 26 July 2020</h3>
<dl class="ecb-basicList">
<dt >Monday, 20 Jul 2020</dt>
<dd>
<span class="event"><span class="label">Event:</span>Euro area monthly balance of payments (Dataset: BP6)</span>
<span class="time"><span class="label">Time:</span>10:00 CET</span>
<span class="infoWeb"><span class="label">Info website:</span><a class="arrow" href="https://www.ecb.europa.eu/press/pr/stats/bop/html/index.en.html" target="_self">https://www.ecb.europa.eu/press/pr/stats/bop/html/index.en.html</a></span>
<span class="lastModified">Last modified: 20 July 2020, 11:05 CET</span>
</dd>
<dt >Monday, 20 Jul 2020</dt>
<dd>
<span class="boardMember"><span class="label">Board member:</span>Luis de Guindos</span>
<span class="event"><span class="label">Event:</span>Participation by Mr de Guindos in the panel &quot;La respuesta europea frente a la crisis&quot; organised by Universidad Complutense de Madrid as part of the Cursos de verano de El Escorial</span>
<span class="time"><span class="label">Time:</span>10:00 CET</span>
<span class="venue"><span class="label">Venue:</span>Real Colegio Universitario Mar&iacute;a Cristina. Paseo de los Alamillos, 2, 28200 San Lorenzo de El Escorial, Madrid, Spain</span>
<span class="contact"><span class="label">Contact:</span>Esther Tejedor - ECB Global Media Relations - Tel: +49 69 1344 95596 - Mob: +49 172 5171280</span>
<span class="email"><span class="label">E-mail:</span><a class="mail" href="mailto:esther.tejedor@ecb.europa.eu">esther.tejedor@ecb.europa.eu</a></span>
<span class="infoWeb"><span class="label">Info website:</span><a class="external" href="www.pp.es" target="_blank">www.pp.es</a></span>
<span class="text"><span class="label">Text:</span>No text will be made available.</span>
<span class="notes"><span class="label">Notes:</span>The event will be streamed in Spanish via the above-mentioned link.</span>
<span class="lastModified">Last modified: 20 July 2020, 11:05 CET</span>
</dd>
</dl>
<script type="text/javascript">
                                var currentContentsUrl = "/press/calendars/weekly/html/index_content.en.html";
                            </script>


                    </main>
                        
                </div>

Fed HTML: https://www.federalreserve.gov/newsevents.htm (under "Calendar")

[Screenshot of the Fed page's HTML; image not preserved]


1 answer
User
#1 · Posted 2024-09-29 21:27:46

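The Fed calendar page is loaded dynamically with JavaScript, so it needs a different approach. Using the developer tools in your browser, you can see the link to the resource that actually contains the data. Once you have that link and send a request to it, things become much simpler: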

import requests
import json
import pandas as pd

cookies = {
    'BIGipServerwww.federalreserve.gov_hsts.app~www.federalreserve.gov_hsts_pool': '!XzbhBUzoOQRgHRNSiGDasURiAFpsPA28LjvywchJo0mMdcFUyd/2zqN601BqfWI2JmSmmNuETixO1A==',
}

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:81.0) Gecko/20100101 Firefox/81.0',
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'en-US,en;q=0.5',
    'Connection': 'keep-alive',
    'Referer': 'https://www.federalreserve.gov/newsevents/calendar.htm',
    'Cache-Control': 'max-age=0',
}

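# Request the JSON resource that the calendar page loads its data from
response = requests.get('https://www.federalreserve.gov/json/calendar.json', headers=headers, cookies=cookies)
cal = json.loads(response.text)
# Each item in cal['events'] becomes one row of the DataFrame
df = pd.DataFrame(cal['events'])
print(df.head())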

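The output is the table you are looking for. You may need to clean it up a little, dropping irrelevant columns and so on, to get it into the final shape you want.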
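As a rough sketch of that clean-up step, the snippet below keeps only a few fields from each event and optionally filters on the event type before the table is built. The keys (`month`, `days`, `time`, `title`, `type`), the type value `Speeches`, and the sample records are all made up for illustration. Inspect `cal['events']` to see what the real response contains.

```python
def select_events(events, keep, event_type=None, type_key="type"):
    """Keep only the `keep` keys of each event, optionally filtering by type."""
    rows = []
    for ev in events:
        if event_type is not None and ev.get(type_key) != event_type:
            continue
        # Missing keys become None so that every row has the same columns
        rows.append({k: ev.get(k) for k in keep})
    return rows

# Hypothetical sample data, NOT the real Fed response
sample = [
    {"month": "2020-10", "days": "6", "time": "10:40 a.m.",
     "title": "Speech A", "type": "Speeches", "link": "/x"},
    {"month": "2020-10", "days": "7", "time": "2:00 p.m.",
     "title": "Minutes", "type": "FOMC", "link": "/y"},
]
rows = select_events(sample, keep=["month", "days", "time", "title"],
                     event_type="Speeches")
print(rows)
```

Passing `rows` to `pd.DataFrame` then gives a frame with just those columns. Doing the selection on the plain list of dicts keeps it independent of pandas, but `DataFrame.drop` or column indexing on the full frame would work as well.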
