A Python wrapper for the R SDK of GOR, with pandas serialization.
Detailed description of the gorpyter Python project
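Highlights
- A Python package that wraps the R SDK of the GOR Query API (with pandas serialization).
- A Docker image of JupyterLab (Python & R kernels) with the Python & R SDK dependencies installed.
- gp.query() converts R tibble dataframes into pandas dataframes on the fly.
- The rpy2 package is used to wrap the gorr library functions in Python.
- The Jupyter R kernel comes with the tidyverse (a tricky install) and gorr (non-CRAN) packages installed.
- The Docker image also includes OpenJDK 1.8 in case the user wants to install Spark.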
TL;DR
$ docker pull hashrocketsyntax/gorpyter:augustus
$ docker run -it -p 8888:8888 hashrocketsyntax/gorpyter:augustus
Read the rest of this document for complete installation and usage instructions.
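1. Docker Environment
Local notebook folder
Create a folder on your local machine's desktop for storing notebooks. Keep the output of pwd handy, because we will use it under the volumes key of the YAML file below. You can name the folder anything you like; here we call it "notebooks".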
$ cd ~/Desktop
$ mkdir notebooks
$ cd notebooks
$ pwd
'<PATH_TO_YOUR_NEW_FOLDER>'
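Docker hardware resources
To convert large (1M-row) R dataframes into pandas dataframes, the Docker environment may need access to more memory. Memory is the most important setting below.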
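- Stop any running containers.
- Click the Docker icon in the system tray.
- Navigate to "Preferences".
- Depending on your Docker version, click the "Resources" or "Advanced" tab.
- Set the resources to the following values:
CPU: <keep default, should already be at 4 CPU>
Memory: <half of what's available in 'About this Mac', 4 or 8 GB>
Swap: <set to maximum, 4GB>
Disk Image Size: <keep default>
- Click "Apply & Restart".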
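As a back-of-the-envelope sketch of why memory matters for a 1M-row result: during conversion, the R dataframe and its pandas copy may both be in memory at once. The helper below is hypothetical and not part of gorpyter. It assumes every column holds 8-byte numeric values; string columns usually cost considerably more, so treat the result as a lower bound.

```python
def estimate_conversion_bytes(rows, numeric_cols, copies=2, bytes_per_value=8):
    """Rough lower bound on memory needed while converting a dataframe.

    Assumptions (not taken from gorpyter): every column holds 8-byte
    numeric values, and `copies` dataframes (the R one and the pandas
    one) are alive at the same time.
    """
    return rows * numeric_cols * bytes_per_value * copies


if __name__ == "__main__":
    # Example: 1,000,000 rows with 20 numeric columns.
    needed = estimate_conversion_bytes(rows=1_000_000, numeric_cols=20)
    print(f"about {needed / 1024**2:.0f} MiB")
```

Compare the estimate with the Memory value you give Docker, leaving headroom for Jupyter, R and Python themselves.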
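Docker image and manifest
Pull this prebuilt image, which contains a Jupyter environment equipped with R and Python 3.7 kernels as well as the gorpyter dependencies. It is built on top of Jupyter's latest Docker Hub image. If you want to customize your own image, see Section 3.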
$ docker pull hashrocketsyntax/gorpyter:augustus
Create a file named docker-compose.yml and open it with a text editor (nano or Sublime Text).
$ touch docker-compose.yml
$ nano docker-compose.yml
Paste the text below into that file. Under the volumes key, paste the output of pwd from above.
#docker-compose.yml
version: "3"
services:
  jupyter:
    image: "hashrocketsyntax/gorpyter:augustus"
    ports:
      - "8888:8888"
    volumes:
      - <PATH_TO_YOUR_NEW_FOLDER>:/usr/local/share/man/user_notebooks
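If you would rather not paste the path by hand, the manifest can also be generated from Python. This is a minimal sketch, not part of gorpyter: render_compose and write_compose are hypothetical helpers that reproduce the file above with the host folder filled in.

```python
from pathlib import Path

# Same content as the manifest above; only the host folder is a placeholder.
COMPOSE_TEMPLATE = """\
version: "3"
services:
  jupyter:
    image: "hashrocketsyntax/gorpyter:augustus"
    ports:
      - "8888:8888"
    volumes:
      - {host_folder}:/usr/local/share/man/user_notebooks
"""


def render_compose(host_folder):
    """Return the docker-compose.yml text with the host folder filled in."""
    return COMPOSE_TEMPLATE.format(host_folder=host_folder)


def write_compose(host_folder, target="docker-compose.yml"):
    """Write the rendered manifest to `target` and return its path."""
    path = Path(target)
    path.write_text(render_compose(host_folder))
    return path


if __name__ == "__main__":
    # The current working directory plays the role of the `pwd` output.
    print(render_compose(Path.cwd()))
```

Calling write_compose(Path.cwd()) from inside the notebooks folder would write the file next to your notebooks.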
Make sure you are in the same directory as the .yml file, then run:
$ docker-compose up
From the console output, copy the URL that looks like http://127.0.0.1:8888/?token=<YOUR_TOKEN> and paste it into your browser.
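If you capture the console output in a script, the URL can be picked out with a regular expression. The sketch below is an illustration only: find_token_url is a hypothetical helper, the exact log wording depends on the Jupyter version, and the token is assumed to be alphanumeric.

```python
import re

# Matches the URL form shown above: http://127.0.0.1:8888/?token=<YOUR_TOKEN>
# Assumption: the token consists only of letters and digits.
TOKEN_URL = re.compile(r"http://127\.0\.0\.1:8888/\?token=[0-9A-Za-z]+")


def find_token_url(console_output):
    """Return the first Jupyter URL with a token in the text, or None."""
    match = TOKEN_URL.search(console_output)
    return match.group(0) if match else None


if __name__ == "__main__":
    # Illustrative line only; real console output will look different.
    sample = "    or http://127.0.0.1:8888/?token=0123abcd"
    print(find_token_url(sample))
```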
2. JupyterLab Notebooks
Tutorial notebooks
The Docker environment ships with example notebooks for the Python and R SDKs.
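If you run these notebooks in the prebuilt Docker environment, be aware that only files in the user_notebooks folder are saved/persisted. In fact, you will not be able to add, delete, copy, or save changes to files outside the user_notebooks directory.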
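Before saving results from a notebook, it can help to check that the target path lies inside the persisted folder. The following is a minimal sketch under the assumption that the volume is mounted at the container path used in docker-compose.yml; is_persisted is a hypothetical helper, and it expects absolute paths without ".." components.

```python
from pathlib import PurePosixPath

# Container-side folder that docker-compose.yml maps to the local notebooks folder.
PERSISTED_ROOT = PurePosixPath("/usr/local/share/man/user_notebooks")


def is_persisted(path):
    """Return True if the absolute `path` lies inside user_notebooks."""
    candidate = PurePosixPath(path)
    return candidate == PERSISTED_ROOT or PERSISTED_ROOT in candidate.parents


if __name__ == "__main__":
    print(is_persisted("/usr/local/share/man/user_notebooks/results.csv"))
    print(is_persisted("/usr/local/share/man/python_sdk.ipynb"))
```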
#python_sdk_gorpyter.ipynb
pip install gorpyter --upgrade
import gorpyter as gp
gp.setup()
"""
CHECKLIST
=============================================
✓ -- The version of your Jupyter Python environment is '3.7.3'.
✓ -- The path of the Jupyter R enviroment being accessed by `rpy2` is '/opt/conda/lib/R'.
✓ -- The Python dependencies of `gorpyter` are installed.
✓ -- The `tidyverse` R library is installed in your R environment.
✓ -- The `gorr` R library is installed in your R environment.
✓ -- Python was able to successfully load `gorr` as a module via `rpy2`.
=============================================
"""
api_key = "<YOUR_API_KEY>"
project = "<YOUR_PROJECT_NAME>"
conn = gp.connect(api_key, project)
gp.query("<YOUR_GOR_QUERY>", conn)
"""
nor example -- "nor ./"
gor example -- "gor -p chr10 #dbsnp# | TOP 100"
Tested successfully on a 1,000,000 row result.
Despite being run in Python, interrupting the client's execution
of this function with `ctrl+c` is still gracefully intercepted
by the gorr R library, so the server-side execution of the
query is cleaned up at the same time.
"""
Python package
pip install gorpyter --upgrade
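- conda install will not work, because this package has not yet been published to conda-forge.
- Compare the output of pip show gorpyter with the latest version number shown at https://pypi.org/project/gorpyter.
- Installing gorpyter also installs these dependencies: rpy2>=3.0.5, tzlocal>=2.0.0, pandas>=0.25.0, numpy>=1.17.0.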
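To verify that an environment satisfies these minimums, something along the following lines could be used. It is a sketch, not part of gorpyter: the helper names are hypothetical, it assumes Python 3.8 or newer for importlib.metadata (on the 3.7 kernel the importlib_metadata backport would be needed), and it only compares the leading numeric parts of each version.

```python
import re
from importlib import metadata  # standard library in Python 3.8+

# Minimum versions of the gorpyter dependencies listed above.
MINIMUMS = {
    "rpy2": "3.0.5",
    "tzlocal": "2.0.0",
    "pandas": "0.25.0",
    "numpy": "1.17.0",
}


def as_tuple(version):
    """Turn '1.17.0' into (1, 17, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group(0)))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings after padding them to equal length."""
    a, b = as_tuple(installed), as_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_installed(minimums=MINIMUMS):
    """Map each package to True/False, or None if it is not installed."""
    report = {}
    for name, minimum in minimums.items():
        try:
            report[name] = meets_minimum(metadata.version(name), minimum)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ok in check_installed().items():
        print(name, ok)
```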
GOR query language
3. Optional -- Custom Docker Image
To create your own Docker image based on jupyter/datascience-notebook:latest, follow the instructions below.
Place these files in the same directory:
- Dockerfile
- python_sdk.ipynb
- r_sdk.ipynb
From that directory, run docker build -t your-image-name:your-new-tag .
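A quick pre-flight check can confirm that the build directory contains everything the build expects. This is a minimal sketch with a hypothetical helper, missing_build_files; the file names are the three listed above.

```python
from pathlib import Path

# Files the build expects to find side by side in the build directory.
REQUIRED_FILES = ("Dockerfile", "python_sdk.ipynb", "r_sdk.ipynb")


def missing_build_files(directory="."):
    """Return the required files that are absent from `directory`."""
    root = Path(directory)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_build_files()
    if missing:
        print("Missing before docker build:", ", ".join(missing))
    else:
        print("All build files are present.")
```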
Below are the commands contained in the Dockerfile.
#Dockerfile
FROM jupyter/datascience-notebook:latest
MAINTAINER layne sadler <lsadler@wuxinextcode.com>
# ====== PRE SUDO ======
ENV JUPYTER_ENABLE_LAB=yes
# If you run pip as sudo it continually prints errors.
# Tidyverse is already installed, and installing gorpyter installs the correct versions of other Python dependencies.
RUN pip install gorpyter
RUN Rscript -e "install.packages('https://cdn.nextcode.com/public/libraries/gorr_0.2.5.tar.gz', repos = NULL, type = 'source')"
ENV R_HOME=/opt/conda/lib/R
# https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch04s09.html
# Looks like /usr/local/man is symlinking all R/W toward /usr/local/share/man instead
COPY python_sdk.ipynb /usr/local/share/man
COPY r_sdk.ipynb /usr/local/share/man
ENV NOTEBOOK_DIR=/usr/local/share/man
WORKDIR /usr/local/share/man
# ====== SUDO ======
USER root
# Spark requires Java 8.
RUN sudo apt-get update && sudo apt-get install openjdk-8-jdk -y
# JAVA_HOME must point at the JDK directory, not at the java binary.
ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
# If you COPY files into the same VOLUME that you mount in docker-compose.yml, then those files will disappear at runtime.
# `user_notebooks/` is the folder that gets mapped as a VOLUME to the user's local folder during runtime.
RUN mkdir /usr/local/share/man/user_notebooks