pocket-recommendations
Unofficial Python library for getting information about a person's Pocket recommendations
Usage
Take a public Pocket profile, such as https://getpocket.com/@honzajavorek. Download its HTML using Python or any other tool:
$ curl "https://getpocket.com/@honzajavorek" > getpocket-com-honzajavorek.html
In your Python program, have the HTML ready as a string, named html_text in the examples that follow.
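One way to do that is to read the file saved by the curl command. A minimal sketch, in which the helper name read_html and the UTF-8 default are assumptions rather than anything the library prescribes:

```python
from pathlib import Path


def read_html(path, encoding="utf-8"):
    """Return the contents of a saved HTML file as a string."""
    return Path(path).read_text(encoding=encoding)


# Hypothetical usage, assuming the curl command above has been run
# in the current directory:
# html_text = read_html("getpocket-com-honzajavorek.html")
```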
Now you can use this library to parse the HTML:
>>> import pocket_recommendations
>>> items = pocket_recommendations.parse(html_text)
>>> len(items)
50
Each item looks like this:
>>> from pprint import pprint
>>> pprint(items[0])
{'pocket_comment': 'Šablona na váš úspěšný HackerNews post',
 'pocket_recommended_at': None,
 'pocket_url': 'https://getpocket.com/redirect?&url=https%3A%2F%2Fsaagarjha.com%2Fblog%2F2020%2F05%2F10%2Fwhy-we-at-famous-company-switched-to-hyped-technology%2F&h=eff6d8cac22c9b475463d037037b0efdcf44b762c9b0b7913de2104cab5fa67d',
 'title': 'Why we at $FAMOUS_COMPANY Switched to $HYPED_TECHNOLOGY',
 'url': 'https://saagarjha.com/blog/2020/05/10/why-we-at-famous-company-switched-to-hyped-technology/'}
The library forces HTTPS even though Pocket uses HTTP links for the redirects.
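In the sample item above, url matches the percent-decoded url query parameter of pocket_url. The sketch below illustrates that relationship, along with one way to upgrade a link's scheme to HTTPS using only the standard library. It is based on the sample output and is not the library's actual implementation; force_https and target_of_redirect are hypothetical names.

```python
from urllib.parse import parse_qs, urlsplit


def force_https(url):
    """Return the URL with an http scheme replaced by https."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return parts.geturl()


def target_of_redirect(pocket_url):
    """Return the decoded 'url' query parameter, or None if it is absent."""
    query = parse_qs(urlsplit(pocket_url).query)
    values = query.get("url")
    return values[0] if values else None
```

URLs that already use HTTPS pass through force_https unchanged.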
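Recommendation date
Pass the date on which the HTML was downloaded as today, so that the relative dates of the recommendations can be resolved into pocket_recommended_at: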
>>> from datetime import date
>>> items = pocket_recommendations.parse(html_text, today=date(2020, 6, 3))
>>> pprint(items[0])
{'pocket_comment': 'Šablona na váš úspěšný HackerNews post',
 'pocket_recommended_at': datetime.date(2020, 6, 2),
 'pocket_url': 'https://getpocket.com/redirect?&url=https%3A%2F%2Fsaagarjha.com%2Fblog%2F2020%2F05%2F10%2Fwhy-we-at-famous-company-switched-to-hyped-technology%2F&h=eff6d8cac22c9b475463d037037b0efdcf44b762c9b0b7913de2104cab5fa67d',
 'title': 'Why we at $FAMOUS_COMPANY Switched to $HYPED_TECHNOLOGY',
 'url': 'https://saagarjha.com/blog/2020/05/10/why-we-at-famous-company-switched-to-hyped-technology/'}
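Once pocket_recommended_at is populated, the items can be filtered by date with plain Python. A minimal sketch, assuming the items are dictionaries shaped like the output above; recommended_since is a hypothetical helper, not part of the library, and the sample data is made up for illustration:

```python
from datetime import date


def recommended_since(items, since):
    """Return items recommended on or after `since`, skipping undated ones."""
    return [
        item
        for item in items
        if item["pocket_recommended_at"] is not None
        and item["pocket_recommended_at"] >= since
    ]


# Made-up sample data with only the keys this helper needs.
sample = [
    {"title": "dated", "pocket_recommended_at": date(2020, 6, 2)},
    {"title": "undated", "pocket_recommended_at": None},
]
recent = recommended_since(sample, date(2020, 6, 1))
print([item["title"] for item in recent])  # prints ['dated']
```

Items parsed without today have pocket_recommended_at set to None, so the helper drops them rather than guessing a date.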
Missing comment
If an item has no comment, pocket_comment is set to None:
>>> from datetime import date
>>> items = pocket_recommendations.parse(html_text)
>>> pprint(items[15])
{'pocket_comment': None,
 'pocket_recommended_at': None,
 'pocket_url': 'https://getpocket.com/redirect?&url=https%3A%2F%2Falmad.blog%2Fessays%2Fwhat-is-employment%2F&h=ef4216c9df41763fa900b12815a280bf790f50960468a45ebed5f3682156dc6a',
 'title': "We Don't Know What an Employment Is",
 'url': 'https://almad.blog/essays/what-is-employment/'}
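Code that consumes the items therefore has to cope with a None comment. A small sketch of one way to do so when rendering an item as a line of text; describe is a hypothetical helper and the output format is an arbitrary choice:

```python
def describe(item):
    """Format an item as one line, appending the comment only if present."""
    line = f"{item['title']} <{item['url']}>"
    if item["pocket_comment"] is not None:
        line += f": {item['pocket_comment']}"
    return line
```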
Misinterpreted HTML entities
If a title contains misinterpreted HTML entities, the library deals with them:
>>> from datetime import date
>>> items = pocket_recommendations.parse(html_text)
>>> pprint(items[15])  # title: We Don't Know What an Employment Is
{'pocket_comment': None,
 'pocket_recommended_at': None,
 'pocket_url': 'https://getpocket.com/redirect?&url=https%3A%2F%2Falmad.blog%2Fessays%2Fwhat-is-employment%2F&h=ef4216c9df41763fa900b12815a280bf790f50960468a45ebed5f3682156dc6a',
 'title': "We Don't Know What an Employment Is",
 'url': 'https://almad.blog/essays/what-is-employment/'}
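How the library repairs such titles is not shown here. One common cause of this kind of damage is double escaping, where an apostrophe ends up as &amp;#39; in the markup. The sketch below assumes that cause and unescapes a string with the standard html module until it stops changing; unescape_fully and its pass limit are hypothetical choices, not the library's code.

```python
import html


def unescape_fully(text, max_passes=3):
    """Apply html.unescape repeatedly until the text no longer changes."""
    for _ in range(max_passes):
        unescaped = html.unescape(text)
        if unescaped == text:
            break
        text = unescaped
    return text


print(unescape_fully("We Don&amp;#39;t Know What an Employment Is"))
# prints: We Don't Know What an Employment Is
```

Repeated unescaping can over-decode text that was meant to contain a literal entity, so it is best limited to fields known to be affected.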