A Python wrapper around the GOR R SDK, with pandas serialization.

gorpyter: detailed project description



Highlights

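  1. A Python package that wraps the R SDK for the GOR query API (with pandas serialization).
  2. A Docker image of JupyterLab (Python & R kernels) with the Python & R SDK dependencies installed.
  • gp.query() converts R tibble data frames into pandas DataFrames on the fly.
  • The rpy2 package is used to wrap the gorr library functions in Python.
  • The Jupyter R kernel has the tidyverse (a tricky install) and gorr (non-CRAN) packages installed.
  • The Docker image also includes OpenJDK 1.8, in case users install Spark.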
TL;DR
$ docker pull hashrocketsyntax/gorpyter:augustus
$ docker run -it -p 8888:8888 hashrocketsyntax/gorpyter:augustus

Read the rest of this document for complete installation and usage instructions.


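1. Docker Environment

Local notebooks folder

Create a folder on your local machine's desktop for storing notebooks. Keep the output of pwd handy, because we will use it under the volumes key of the YAML file below. You can name the folder whatever you like; here we call it "notebooks".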

$ cd ~/Desktop
$ mkdir notebooks
$ cd notebooks
$ pwd
'<PATH_TO_YOUR_NEW_FOLDER>'
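Docker hardware resources

Converting a large (1M-row) R data frame into a pandas DataFrame may require giving the Docker environment access to more memory. Memory is the most important of the settings below.

  • Stop any running containers.
  • Click the Docker icon in the system tray.
  • Navigate to "Preferences".
  • Depending on your Docker version, click the "Resources" or "Advanced" tab.
  • Set the resources to the values listed below.
  • Click "Apply & Restart".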
CPU:              <keep default, should already be at 4 CPU>
Memory:           <half of what's available in 'About this Mac', 4 or 8 GB>
Swap:             <set to maximum, 4GB>
Disk Image Size:  <keep default>
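
As a rough way to judge how much memory to give Docker, the sketch below estimates the footprint of a purely numeric result set. It assumes 8 bytes per value (64-bit numbers) and that the R copy and the pandas copy both exist while the conversion runs. Both are simplifying assumptions, and string columns will need more. `estimate_conversion_bytes` is a hypothetical helper, not part of gorpyter, and the 20-column figure is only an illustration.

```python
def estimate_conversion_bytes(rows, numeric_cols, bytes_per_value=8, copies=2):
    """Rough memory needed to hold `copies` of a numeric table at once.

    Assumption: every column is numeric and 8 bytes wide, and the R data
    frame and the pandas data frame coexist during conversion (copies=2).
    """
    return rows * numeric_cols * bytes_per_value * copies


if __name__ == "__main__":
    # Illustrative shape only: 1,000,000 rows with 20 numeric columns.
    needed = estimate_conversion_bytes(rows=1_000_000, numeric_cols=20)
    print(f"~{needed / 1024**3:.2f} GiB for the table and its converted copy")
```

Treat the result as a lower bound; the notebook kernel, R, and Python each need working memory on top of the data itself.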
Docker image and manifest

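Pull this prebuilt image, which contains a Jupyter environment equipped with R and Python 3.7 kernels plus the gorpyter dependencies. It is built on top of Jupyter's latest Docker Hub image. If you want to customize your own image, see Section 3.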

$ docker pull hashrocketsyntax/gorpyter:augustus

Create a file named docker-compose.yml and open it in a text editor (nano or Sublime Text).

$ touch docker-compose.yml
$ nano docker-compose.yml

Paste the text below into that file. Under the volumes key, paste the output of pwd from above.

#docker-compose.yml
version: "3"
services:
  jupyter:
    image: "hashrocketsyntax/gorpyter:augustus"
    ports:
      - "8888:8888"
    volumes:
      - <PATH_TO_YOUR_NEW_FOLDER>:/usr/local/share/man/user_notebooks
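
If you would rather not paste the path by hand, the same file can be produced with a few lines of Python. This is a sketch using only the standard library; `render_compose` is a hypothetical helper, and it assumes you run it from inside your notebooks folder on a POSIX system.

```python
from pathlib import Path

COMPOSE_TEMPLATE = """\
version: "3"
services:
  jupyter:
    image: "hashrocketsyntax/gorpyter:augustus"
    ports:
      - "8888:8888"
    volumes:
      - {host_dir}:/usr/local/share/man/user_notebooks
"""


def render_compose(host_dir):
    """Return the docker-compose.yml text with the host folder filled in."""
    return COMPOSE_TEMPLATE.format(host_dir=Path(host_dir).expanduser())


if __name__ == "__main__":
    # Assumption: the current directory is the notebooks folder created earlier.
    # Printing (rather than writing) lets you inspect the text before saving it.
    print(render_compose(Path.cwd()))
```

Redirect the printed text into docker-compose.yml once it looks right.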

Make sure you are in the same directory as the .yml file, then run:

$ docker-compose up

From the console output, copy the URL that looks like http://127.0.0.1:8888/?token=<YOUR_TOKEN> and paste it into your browser.


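2. JupyterLab Notebooks

Tutorial notebooks

The Docker environment ships with example notebooks for the Python and R SDKs.

If you run these notebooks in the prebuilt Docker environment, be aware that only files in the user_notebooks folder are saved/persisted. In practice, you will not be able to add, delete, copy, or save changes to files outside the user_notebooks directory.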
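
Before saving work from inside the container, you can check whether a path falls under the persisted folder. The sketch below assumes the container-side mount point used in docker-compose.yml (`/usr/local/share/man/user_notebooks`); `is_persisted` is a hypothetical helper name, and the check is purely lexical, so it does not resolve symlinks or `..` segments.

```python
from pathlib import PurePosixPath

# Assumption: this is the container path mounted as a volume in docker-compose.yml.
PERSISTED_DIR = PurePosixPath("/usr/local/share/man/user_notebooks")


def is_persisted(path):
    """Return True if `path` is the persisted folder or lies inside it."""
    try:
        PurePosixPath(path).relative_to(PERSISTED_DIR)
        return True
    except ValueError:
        # relative_to raises ValueError when `path` is outside PERSISTED_DIR.
        return False


if __name__ == "__main__":
    for candidate in (
        "/usr/local/share/man/user_notebooks/analysis.ipynb",
        "/usr/local/share/man/python_sdk.ipynb",
    ):
        status = "persisted" if is_persisted(candidate) else "not persisted"
        print(candidate, "->", status)
```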

#python_sdk_gorpyter.ipynb


pip install gorpyter --upgrade
import gorpyter as gp


gp.setup()
"""
  CHECKLIST
  =============================================

	✓ -- The version of your Jupyter Python environment is '3.7.3'.
	✓ -- The path of the Jupyter R enviroment being accessed by `rpy2` is '/opt/conda/lib/R'.

	✓ -- The Python dependencies of `gorpyter` are installed.
	✓ -- The `tidyverse` R library is installed in your R environment.
	✓ -- The `gorr` R library is installed in your R environment.
	✓ -- Python was able to successfully load `gorr` as a module via `rpy2`.

  =============================================
"""


api_key = "<YOUR_API_KEY>"
project = "<YOUR_PROJECT_NAME>"
conn = gp.connect(api_key, project)


gp.query("<YOUR_GOR_QUERY>", conn)
"""
	nor example -- "nor ./"
	gor example -- "gor -p chr10 #dbsnp# | TOP 100"

	Tested successfully on a 1,000,000 row result.

	Despite being run in Python, interrupting the client's execution
	of this function with Ctrl+C is, surprisingly, still gracefully
	intercepted by the gorr R library, so the server-side execution
	of the query is cleaned up at the same time.
"""
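
Because interrupts are handled gracefully, a long-running query can be wrapped so that Ctrl+C hands control back to the notebook with an empty result. This is a sketch, assuming the interrupt surfaces in Python as `KeyboardInterrupt`; `run_query` is a hypothetical helper, and the query function is passed in as an argument (in practice `gp.query`) so the pattern can be tried without a GOR server.

```python
def run_query(query_fn, gor_query, conn):
    """Run `query_fn(gor_query, conn)`; return None if the user interrupts it."""
    try:
        return query_fn(gor_query, conn)
    except KeyboardInterrupt:
        print("Query interrupted; no result returned.")
        return None


if __name__ == "__main__":
    # Stand-in for gp.query so the sketch runs anywhere; it simulates Ctrl+C.
    def fake_query(gor_query, conn):
        raise KeyboardInterrupt

    result = run_query(fake_query, "gor -p chr10 #dbsnp# | TOP 100", conn=None)
    print(result)
```

With gorpyter installed and a connection open, the call would look like `run_query(gp.query, "<YOUR_GOR_QUERY>", conn)`.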
Python package
pip install gorpyter --upgrade
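  • conda install will not work, because this package has not yet been published to conda-forge.
  • Compare the output of pip show gorpyter with the latest version number shown at https://pypi.org/project/gorpyter
  • Installing gorpyter also installs these dependencies: rpy2>=3.0.5, tzlocal>=2.0.0, pandas>=0.25.0, numpy>=1.17.0.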
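
The minimum versions above can also be checked from Python. The sketch below uses `importlib.metadata`, which is in the standard library from Python 3.8; on the 3.7 kernel you would need the `importlib_metadata` backport instead, which is an assumption about your environment. `parse_version` and `check_minimums` are hypothetical helpers, and the parser only understands simple dotted numbers.

```python
from importlib import metadata

# Minimum versions as listed in the dependency notes above.
MINIMUMS = {
    "rpy2": "3.0.5",
    "tzlocal": "2.0.0",
    "pandas": "0.25.0",
    "numpy": "1.17.0",
}


def parse_version(text):
    """Turn '1.17.0' into (1, 17, 0); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_minimums(minimums=MINIMUMS):
    """Map each package to True/False (meets minimum) or None (not installed)."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
            continue
        report[name] = parse_version(installed) >= parse_version(minimum)
    return report


if __name__ == "__main__":
    labels = {True: "ok", False: "too old", None: "not installed"}
    for package, ok in check_minimums().items():
        print(package, labels[ok])
```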
GOR query language

http://docs.wuxinextcode.com/gor/basicGORqueries.html


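3. Optional -- Custom Docker Image

To create your own Docker image based on jupyter/datascience-notebook:latest, follow the instructions below.

Place these files in the same directory:

  • Dockerfile
  • python_sdk.ipynb
  • r_sdk.ipynb

From that directory, run docker build -t your-image-name:your-new-tag .

Below are the commands contained in the Dockerfile.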

#Dockerfile
FROM jupyter/datascience-notebook:latest
MAINTAINER layne sadler <lsadler@wuxinextcode.com>


# ====== PRE SUDO ======
ENV JUPYTER_ENABLE_LAB=yes

# If you run pip as sudo it continually prints errors.
# Tidyverse is already installed, and installing gorpyter installs the correct versions of other Python dependencies.
RUN pip install gorpyter
RUN Rscript -e "install.packages('https://cdn.nextcode.com/public/libraries/gorr_0.2.5.tar.gz', repos = NULL, type = 'source')"
ENV R_HOME=/opt/conda/lib/R

# https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch04s09.html
# Looks like /usr/local/man is symlinking all R/W toward /usr/local/share/man instead
COPY python_sdk.ipynb /usr/local/share/man
COPY r_sdk.ipynb /usr/local/share/man
ENV NOTEBOOK_DIR=/usr/local/share/man
WORKDIR /usr/local/share/man


# ====== SUDO ======
USER root

# Spark requires Java 8.
RUN sudo apt-get update && sudo apt-get install openjdk-8-jdk -y
ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

# If you COPY files into the same VOLUME that you mount in docker-compose.yml, then those files will disappear at runtime.
# `user_notebooks/` is the folder that gets mapped as a VOLUME to the user's local folder during runtime.
RUN mkdir /usr/local/share/man/user_notebooks
