The purpose of this package is to give people the ability to scrape raw National Hockey League (NHL) and National Women's Hockey League (NWHL) data from their respective APIs and websites.

Detailed description of the hockey-scraper Python project


[PyPI version badge: https://badge.fury.io/py/hockey-scraper.svg] [Documentation Status badge]

Hockey Scraper

Purpose

This package is designed to allow people to scrape both NHL and NWHL data. For the NHL, one can scrape play-by-play and shift data from the NHL API and website for all preseason, regular season, and playoff games since the 2007-2008 season. For the NWHL, one can scrape play-by-play data from their API and website for all preseason, regular season, and playoff games since the 2015-2016 season.

Prerequisites

You need to have Python installed for this. It should work for both Python 2.7 and 3 (I recommend at least version 3.6.0, but earlier versions should be fine).

If you don't have Python installed on your machine, I recommend installing it through the Anaconda distribution. Anaconda comes with a bunch of libraries preinstalled, which makes it easier to get started.

Installation

To install, simply open a terminal and type:

pip install hockey_scraper
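
To verify the install worked, a minimal sanity check (nothing here beyond the import itself) is:

import hockey_scraper

# If this import succeeds without errors, the package is installed
print("hockey_scraper installed")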

NHL Usage

Standard Scrape Functions

Scrape data on a season-by-season level:

import hockey_scraper

# Scrapes the 2015 & 2016 season with shifts and stores the data in a Csv file
hockey_scraper.scrape_seasons([2015, 2016], True)

# Scrapes the 2008 season without shifts and returns a dictionary containing the pbp Pandas DataFrame
scraped_data = hockey_scraper.scrape_seasons([2008], False, data_format='Pandas')

Scrape a list of games:

import hockey_scraper

# Scrapes the first game of 2014, 2015, and 2016 seasons with shifts and stores the data in a Csv file
hockey_scraper.scrape_games([2014020001, 2015020001, 2016020001], True)

# Scrapes the first game of 2007, 2008, and 2009 seasons with shifts and returns a Dictionary with the Pandas DataFrames
scraped_data = hockey_scraper.scrape_games([2007020001, 2008020001, 2009020001], True, data_format='Pandas')
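
As a side note, the game IDs above follow the NHL's usual convention: the first four digits are the season's starting year, the next two are the game type (02 for regular season, 03 for playoffs), and the last four are the game number. A small helper for building them (make_game_id is my own illustration, not part of the package):

import hockey_scraper

# Hypothetical helper, not part of hockey_scraper: builds an NHL game ID
# from its parts, assuming the SSSSTTNNNN convention described above
def make_game_id(season, game_type, game_number):
    return int("{}{:02d}{:04d}".format(season, game_type, game_number))

# make_game_id(2016, 2, 1) -> 2016020001, the first regular season game of 2016-17
hockey_scraper.scrape_games([make_game_id(2016, 2, 1)], False)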

Scrape all games in a given date range:

import hockey_scraper

# Scrapes all games between 2016-10-10 and 2016-10-20 without shifts and stores the data in a Csv file
hockey_scraper.scrape_date_range('2016-10-10', '2016-10-20', False)

# Scrapes all games between 2015-1-1 and 2015-1-15 without shifts and returns a Dictionary with the pbp Pandas DataFrame
scraped_data = hockey_scraper.scrape_date_range('2015-1-1', '2015-1-15', False, data_format='Pandas')

The dictionary returned by setting the argument 'data_format' equal to 'Pandas' is structured like:

{
  # Both of these are always included
  'pbp': pbp_df,
  'errors': scraping_errors,

  # This is only included when the argument 'if_scrape_shifts' is set equal to True
  'shifts': shifts_df
}
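
For example, here is one way to consume that returned dictionary (a minimal sketch; the keys are exactly the ones listed above):

import hockey_scraper

# Scrape one season with shifts and get everything back in memory
scraped_data = hockey_scraper.scrape_seasons([2016], True, data_format='Pandas')

pbp_df = scraped_data['pbp']        # Play-by-play DataFrame (always present)
shifts_df = scraped_data['shifts']  # Only present because shifts were scraped
print(scraped_data['errors'])       # Any games that couldn't be scraped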

The scraped files can also be saved in a separate directory if desired. This allows games to be re-scraped much faster, since the pages don't need to be retrieved again. This is done by setting the keyword argument 'docs_dir' to True, which creates, stores in, and looks in a directory in your home directory. Alternatively, you can provide your own directory to store them in (it must already exist).

import hockey_scraper

# Create or try to refer to a directory in the home directory
# Will create a directory called 'hockey_scraper_data' in the home directory (if it doesn't exist)
hockey_scraper.scrape_seasons([2015, 2016], True, docs_dir=True)

# Path to the given directory
USER_PATH = "/...."

# Scrapes the 2015 & 2016 season with shifts and stores the data in a Csv file
# Also includes a path for an existing directory for the scraped files to be placed in or retrieved from.
hockey_scraper.scrape_seasons([2015, 2016], True, docs_dir=USER_PATH)

# One could choose to re-scrape previously saved files by setting the keyword argument rescrape=True
hockey_scraper.scrape_seasons([2015, 2016], True, docs_dir=USER_PATH, rescrape=True)

Live Scraping

Here is a simple example of one way to set up live scraping. I strongly suggest checking out this section of the documentation if you plan on using this.

import hockey_scraper as hs


def to_csv(game):
    """
    Store each game DataFrame in a file

    :param game: LiveGame object

    :return: None
    """

    # If the game:
    # 1. Has started - we recorded at least one event
    # 2. Is not in intermission
    # 3. Is not over
    if game.is_ongoing():
        # Get both DataFrames
        pbp_df = game.get_pbp()
        shifts_df = game.get_shifts()

        # Print the description of the last event
        print(game.game_id, "->", pbp_df.iloc[-1]['Description'])

        # Store in CSV files
        pbp_df.to_csv(f"../hockey_scraper_data/{game.game_id}_pbp.csv", sep=',')
        shifts_df.to_csv(f"../hockey_scraper_data/{game.game_id}_shifts.csv", sep=',')

if __name__ == "__main__":
    # Before we start, set the directory to store the files in
    # You don't have to do this but I recommend it
    hs.live_scrape.set_docs_dir("../hockey_scraper_data")

    # Scrape the info for all the games on 2018-11-15
    games = hs.ScrapeLiveGames("2018-11-15", if_scrape_shifts=True, pause=20)

    # While all the games aren't finished
    while not games.finished():
        # Update for all the games currently being played
        games.update_live_games(sleep_next=True)

        # Go through every LiveGame object and apply some function
        # You can of course do whatever you want here.
        for game in games.live_games:
            to_csv(game)

NWHL Usage

Scrape data on a season-by-season level:

import hockey_scraper

# Scrapes the 2015 & 2016 season and stores the data in a Csv file
hockey_scraper.nwhl.scrape_seasons([2015, 2016])

# Scrapes the 2017 season and returns a Pandas DataFrame containing the pbp
scraped_data = hockey_scraper.nwhl.scrape_seasons([2017], data_format='Pandas')

Scrape a list of games:

import hockey_scraper

# Scrape some games and store the results in a Csv file
# Also saves the scraped pages
hockey_scraper.nwhl.scrape_games([14694271, 14814946, 14689491], docs_dir="...Path you specified")

Scrape all games in a given date range:

import hockey_scraper

# Scrapes all games between 2016-10-10 and 2017-01-01 and returns a Pandas DataFrame containing the pbp
hockey_scraper.nwhl.scrape_date_range('2016-10-10', '2017-01-01', data_format='pandas')

The full documentation can be found here.

Contact

Please contact me with any questions or suggestions. For any bugs or anything related to the code, please open an issue. Otherwise you can email me at Harryshomer@gmail.com.
