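Search for books and quotes on GoodReads.
Detailed description of the scrapereads Python project
Scrape GoodReads
A Python package for retrieving data from the goodreads.com website. Authors, books, and quotes can all be extracted.
The project was developed as part of a deep learning project that uses a GPT-2 model to generate poems.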
Installation
Install the scrapereads package from PyPI:
pip install scrapereads
Or from GitHub:
[GitHub installation commands not preserved in this copy]
Getting started
GoodReads API
You can search for an Author, a Book, or a Quote through the API:
from scrapereads import GoodReads

# Connect to the API
goodreads = GoodReads()

# Search for an author by ID
AUTHOR_ID = 3389
author = goodreads.search_author(AUTHOR_ID)

# Search for a book
BOOK_ID = 3048970
book = goodreads.search_book(AUTHOR_ID, BOOK_ID)

# Look for the first 10 books (set ``top_k=None`` to turn the limit off)
books = goodreads.search_books(AUTHOR_ID, top_k=10)

# ...or quotes
quotes = goodreads.search_quotes(AUTHOR_ID, top_k=5)
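The `top_k` convention used above (an integer caps the number of results, `None` removes the cap) can be illustrated with a small standalone sketch. The helper `take_top_k` is hypothetical and written only for this illustration; it is not part of scrapereads and makes no claim about how the library implements the limit.

```python
from itertools import islice


def take_top_k(items, top_k=None):
    """Return at most ``top_k`` items; ``None`` means no limit (hypothetical helper)."""
    # islice accepts None as the stop value, which consumes the whole iterable.
    return list(islice(items, top_k))


# Placeholder data standing in for a stream of search results.
results = (f"result {i}" for i in range(1, 26))

first_ten = take_top_k(results, top_k=10)
print(len(first_ten))
```

Using `islice` keeps the helper lazy: when the results come from a generator, nothing beyond the first `top_k` items is consumed.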
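A quote consists of text, but optional information can be attached to it (such as the number of likes, tags, the source reference, etc.).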
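As a rough picture of "text plus optional information", the sketch below models a quote as a small data class and prints it in a layout similar to the output shown in this section. The class `QuoteRecord` and its field names are hypothetical; they are not the scrapereads `Quote` class, whose attributes may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QuoteRecord:
    """Hypothetical container for a quote; not the scrapereads Quote class."""

    text: str
    author: str
    book: Optional[str] = None      # optional: source reference
    likes: Optional[int] = None     # optional: number of likes
    tags: List[str] = field(default_factory=list)  # optional: tags

    def __str__(self):
        lines = [f'"{self.text}"']
        source = f"- {self.author}"
        if self.book:
            source += f', from "{self.book}"'
        lines.append(source)
        # Only print the details that are actually present.
        details = []
        if self.likes is not None:
            details.append(f"Likes: {self.likes}")
        if self.tags:
            details.append("Tags: " + ", ".join(self.tags))
        if details:
            lines.append(", ".join(details))
        return "\n".join(lines)


quote = QuoteRecord(
    text="Get busy living or get busy dying.",
    author="Stephen King",
    book="Different Seasons",
    likes=9014,
    tags=["life"],
)
print(quote)
```

With the library itself, quotes are retrieved and printed as follows.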
quotes = goodreads.search_quotes(AUTHOR_ID, top_k=5)
for quote in quotes:
    print(quote)
    print()
Output:
"Books are a uniquely portable magic."
- Stephen King, from "On Writing: A Memoir Of The Craft"
Likes: 16225, Tags: books, magic, reading

"If you don't have time to read, you don't have the time (or the tools) to write. Simple as that."
- Stephen King
Likes: 12565, Tags: reading, writing

"Get busy living or get busy dying."
- Stephen King, from "Different Seasons"
Likes: 9014, Tags: life

"Books are the perfect entertainment: no commercials, no batteries, hours of enjoyment for each dollar spent. What I wonder is why everybody doesn't carry a book around for those inevitable dead spots in life."
- Stephen King
Likes: 8667, Tags: books

"When his life was ruined, his family killed, his farm destroyed, Job knelt down on the ground and yelled up to the heavens, "Why god? Why me?" and the thundering voice of God answered, There's just something about you that pisses me off."
- Stephen King, from "Storm Of The Century"
Likes: 7686, Tags: god, humor, religion
Structure
The package is organized as follows:
- Author
- Book, which inherits from Author
- Quote, which inherits from Book
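The inheritance chain listed above can be pictured with a minimal sketch. Only the three class names come from the list; the constructor arguments and attributes below are hypothetical and chosen for illustration, so this should not be read as the actual scrapereads implementation.

```python
class Author:
    """Illustrative stand-in for an author (attributes are assumptions)."""

    def __init__(self, author_id, author_name=None):
        self.author_id = author_id
        self.author_name = author_name


class Book(Author):
    """A book keeps its author's fields and adds its own."""

    def __init__(self, author_id, book_id, author_name=None, book_name=None):
        super().__init__(author_id, author_name=author_name)
        self.book_id = book_id
        self.book_name = book_name


class Quote(Book):
    """A quote keeps the book and author fields and adds the quote text."""

    def __init__(self, author_id, book_id, text, author_name=None, book_name=None):
        super().__init__(author_id, book_id,
                         author_name=author_name, book_name=book_name)
        self.text = text


# Placeholder values; the IDs reuse the ones from the examples in this document.
quote = Quote(3389, 3048970, "An example quote.", author_name="Stephen King")
print(isinstance(quote, Book), isinstance(quote, Author))
```

One consequence of this layout is that a quote always carries enough information (author and book identifiers) to locate its parents.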
Retrieving data
Once you have one of these objects, you can also access its data directly through its methods:
author = goodreads.search_author(AUTHOR_ID)
books = author.get_books()
quotes = author.get_quotes()

# Same for a book
book = goodreads.search_book(AUTHOR_ID, BOOK_ID)
quotes = book.get_quotes()
You can also retrieve the parent objects from a child object:
author = goodreads.search_author(AUTHOR_ID)
quotes = author.get_quotes(top_k=10)
quote = quotes[0]

# Access the parent objects
book = quote.get_book()
author = quote.get_author()
You can get the description, links, and other details from these objects:
author = goodreads.search_author(AUTHOR_ID)
# Description of the author (genre, description, references, etc.)
info = author.get_info()
Finally, you can retrieve authors similar to a given author with:
author = goodreads.search_author(AUTHOR_ID)
authors = author.get_similar_authors(top_k=5)
Saving and exporting
You can save the data in JSON format (and encode it as ASCII if needed).
author = goodreads.search_author(AUTHOR_ID)
# Same for book and quote
author_data = author.to_json(encode='ascii')
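To show what ASCII-encoding JSON means in practice, here is a sketch that uses only the standard library `json` module rather than scrapereads. The `sample_data` dictionary is made-up sample data, and the structure actually returned by `to_json` may differ.

```python
import json

# Made-up sample record (an assumption, not real scrapereads output).
sample_data = {"author_id": 3389, "author_name": "Stephen King", "note": "café"}

# ensure_ascii=True (the default) escapes non-ASCII characters as \uXXXX.
ascii_json = json.dumps(sample_data, ensure_ascii=True, indent=2)

# ensure_ascii=False keeps characters such as "é" unchanged.
utf8_json = json.dumps(sample_data, ensure_ascii=False, indent=2)

print(ascii_json)

# Both forms decode back to the same data.
assert json.loads(ascii_json) == json.loads(utf8_json) == sample_data
```

The ASCII form is safer for tools that mishandle Unicode, while the unescaped form is easier to read; both carry the same information.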