BigQuery-DatasetManager
BigQuery-DatasetManager is a simple file-based CLI management tool for BigQuery Datasets.
Requirements
- Python
  - version 2.7, 3.4, 3.5, 3.6
Installation
$ pip install BigQuery-DatasetManager
Resource representation
Datasets and tables are represented as resources in YAML format.
Dataset
name: dataset1
friendly_name: null
description: null
default_table_expiration_ms: null
location: US
access_entries:
  - role: OWNER
    entity_type: specialGroup
    entity_id: projectOwners
  - role: WRITER
    entity_type: specialGroup
    entity_id: projectWriters
  - role: READER
    entity_type: specialGroup
    entity_id: projectReaders
  - role: OWNER
    entity_type: userByEmail
    entity_id: aaa@bbb.gserviceaccount.com
  - role: null
    entity_type: view
    entity_id:
      datasetId: view1
      projectId: project1
      tableId: table1
labels:
  foo: bar
Key name | Value | Description
---|---|---
dataset_id | str | ID of the dataset.
friendly_name | str | Title of the dataset.
description | str | Description of the dataset.
default_table_expiration_ms | int | Default expiration time for tables in the dataset.
location | str | Location in which the dataset is hosted.
access_entries | seq | Represents grant of an access role to an entity.
access_entries.role | str | Role granted to the entity. The following string values are supported: `OWNER`, `WRITER`, `READER`. It may also be `null` if the `entity_type` is `view`.
access_entries.entity_type | str | Type of entity being granted the role, such as `userByEmail`, `specialGroup`, or `view`.
access_entries.entity_id | str/map | If the `entity_type` is not `view`, the `entity_id` is the string ID of the entity being granted the role. If the `entity_type` is `view`, the `entity_id` is a map representing the view from a different dataset to grant access to.
access_entries.entity_id.datasetId | str | ID of the dataset containing this table. (Specified when `entity_type` is `view`.)
access_entries.entity_id.projectId | str | ID of the project containing this table. (Specified when `entity_type` is `view`.)
access_entries.entity_id.tableId | str | ID of the table. (Specified when `entity_type` is `view`.)
labels | map | Labels for the dataset.
NOTE: See the official documentation of BigQuery Datasets for details of key names.
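The rules in the key table for `access_entries` can be expressed as a small check. The following is a minimal sketch under stated assumptions: `validate_access_entry` is a hypothetical helper written for illustration, not part of bqdm or google-cloud-bigquery, and it takes an already-parsed entry as a plain dict. It reads "may also be null" as allowing a view entry either with or without a role.

```python
# Hypothetical helper for illustration only; not part of bqdm.
ROLES = {"OWNER", "WRITER", "READER"}
VIEW_KEYS = {"datasetId", "projectId", "tableId"}


def validate_access_entry(entry):
    """Return a list of problems found in one access_entries item."""
    problems = []
    role = entry.get("role")
    entity_type = entry.get("entity_type")
    entity_id = entry.get("entity_id")

    if entity_type == "view":
        # A view entry may leave the role null and uses a map as entity_id.
        if role is not None and role not in ROLES:
            problems.append("unsupported role: %r" % (role,))
        if not isinstance(entity_id, dict) or set(entity_id) != VIEW_KEYS:
            problems.append("view entity_id must map datasetId, projectId, tableId")
    else:
        # Every other entity type needs a supported role and a string ID.
        if role not in ROLES:
            problems.append("unsupported role: %r" % (role,))
        if not isinstance(entity_id, str):
            problems.append("entity_id must be a string")
    return problems


owner = {"role": "OWNER", "entity_type": "specialGroup", "entity_id": "projectOwners"}
view = {
    "role": None,
    "entity_type": "view",
    "entity_id": {"datasetId": "view1", "projectId": "project1", "tableId": "table1"},
}
print(validate_access_entry(owner))
print(validate_access_entry(view))
```

Both sample entries are taken from the dataset example above and should pass with no problems reported.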
Table
table_id: table1
friendly_name: null
description: null
expires: null
partitioning_type: null
view_use_legacy_sql: null
view_query: null
schema:
  - name: column1
    field_type: STRING
    mode: REQUIRED
    description: null
    fields: null
  - name: column2
    field_type: RECORD
    mode: NULLABLE
    description: null
    fields:
      - name: column2_1
        field_type: STRING
        mode: NULLABLE
        description: null
        fields: null
      - name: column2_2
        field_type: INTEGER
        mode: NULLABLE
        description: null
        fields: null
      - name: column2_3
        field_type: RECORD
        mode: REPEATED
        description: null
        fields:
          - name: column2_3_1
            field_type: BOOLEAN
            mode: NULLABLE
            description: null
            fields: null
labels:
  foo: bar

table_id: view1
friendly_name: null
description: null
expires: null
partitioning_type: null
view_use_legacy_sql: false
view_query: |
  select * from `project1.dataset1.table1`
schema: null
labels: null
Key name | Value | Description
---|---|---
table_id | str | ID of the table.
friendly_name | str | Title of the table.
description | str | Description of the table.
expires | str | Datetime at which the table will be deleted. (ISO8601 format)
partitioning_type | str | Time partitioning of the table if it is partitioned. The only partitioning type that is currently supported is `DAY`.
view_use_legacy_sql | bool | Specifies whether to use BigQuery’s legacy SQL for this view.
view_query | str | SQL query defining the table as a view.
schema | seq | The schema of the table destination for the row.
schema.name | str | The name of the field.
schema.field_type | str | The type of the field, such as `STRING`, `INTEGER`, `BOOLEAN`, or `RECORD`.
schema.mode | str | The mode of the field. One of `NULLABLE`, `REQUIRED`, `REPEATED`.
schema.description | str | Description for the field.
schema.fields | seq | Describes the nested schema fields if the type property is set to `RECORD`.
labels | map | Labels for the table.
NOTE: See the official documentation of BigQuery Tables for details of key names.
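Because `fields` nests only under `RECORD` columns, a schema can be walked recursively. The sketch below is illustrative only: `column_paths` is a hypothetical helper, and the schema is the table example above written as plain Python data rather than loaded from a file.

```python
def column_paths(schema, prefix=""):
    """Yield (dotted_path, field_type, mode) for every field, depth first."""
    for field in schema or []:
        path = prefix + field["name"]
        yield path, field["field_type"], field["mode"]
        # Only RECORD fields carry nested fields; the others use fields: null.
        if field["field_type"] == "RECORD":
            yield from column_paths(field.get("fields"), path + ".")


def leaf(name, field_type, mode="NULLABLE", fields=None):
    """Build one schema entry with the same keys as the YAML example."""
    return {"name": name, "field_type": field_type, "mode": mode,
            "description": None, "fields": fields}


schema = [
    leaf("column1", "STRING", "REQUIRED"),
    leaf("column2", "RECORD", fields=[
        leaf("column2_1", "STRING"),
        leaf("column2_2", "INTEGER"),
        leaf("column2_3", "RECORD", "REPEATED", fields=[
            leaf("column2_3_1", "BOOLEAN"),
        ]),
    ]),
]

paths = list(column_paths(schema))
for path, field_type, mode in paths:
    print(path, field_type, mode)
```

The dotted paths (for example `column2.column2_3.column2_3_1`) are a display convention chosen for this sketch, not a format that bqdm defines.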
Directory structure
.
├── dataset1 # Directory storing the table configuration file of dataset1.
│ ├── table1.yml # Configuration file of table1 in dataset1.
│ └── table2.yml # Configuration file of table2 in dataset1.
├── dataset1.yml # Configuration file of dataset1.
├── dataset2 # Directory storing the table configuration file of dataset2.
│ └── .gitkeep # Placeholder file that keeps the otherwise empty dataset2 directory.
├── dataset2.yml # Configuration file of dataset2.
└── dataset3.yml # Configuration file of dataset3. It has no directory, so its tables are not managed.
NOTE: If you do not want to manage tables, delete the directory with the same name as the dataset.
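The layout convention above can be sketched in a few lines of Python. This is one reading of the convention, not bqdm's actual loader: `discover` is a hypothetical function that maps each dataset to its table configuration names, or to `None` when no matching directory exists and tables are therefore not managed.

```python
import tempfile
from pathlib import Path


def discover(conf_dir):
    """Map dataset name -> sorted table names, or None if tables are unmanaged."""
    conf_dir = Path(conf_dir)
    layout = {}
    for dataset_file in sorted(conf_dir.glob("*.yml")):
        name = dataset_file.stem
        table_dir = conf_dir / name
        if table_dir.is_dir():
            # A directory named after the dataset means its tables are managed.
            layout[name] = sorted(p.stem for p in table_dir.glob("*.yml"))
        else:
            layout[name] = None
    return layout


# Recreate the example tree in a temporary directory and inspect it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ["dataset1/table1.yml", "dataset1/table2.yml", "dataset1.yml",
                "dataset2/.gitkeep", "dataset2.yml", "dataset3.yml"]:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    layout = discover(root)

print(layout)
```

In this sketch `dataset2` maps to an empty list (directory present, no tables) while `dataset3` maps to `None` (no directory), which is the distinction the note above draws.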
Usage
Usage: bqdm [OPTIONS] COMMAND [ARGS]...

Options:
  -c, --credential-file PATH  Location of credential file for service accounts.
  -p, --project TEXT          Project ID for the project which you’d like to manage with.
  --color / --no-color        Enables output with coloring.
  --parallelism INTEGER       Limit the number of concurrent operation.
  --debug                     Debug output management.
  -h, --help                  Show this message and exit.

Commands:
  apply    Builds or changes datasets.
  destroy  Specify subcommand `plan` or `apply`
  export   Export existing datasets into file in YAML format.
  plan     Generate and show an execution plan.
export
Usage: bqdm export [OPTIONS] [OUTPUT_DIR]

  Export existing datasets into file in YAML format.

Options:
  -d, --dataset TEXT          Specify the ID of the dataset to manage.
  -e, --exclude-dataset TEXT  Specify the ID of the dataset to exclude from managed.
  -h, --help                  Show this message and exit.
plan
Usage: bqdm plan [OPTIONS] [CONF_DIR]

  Generate and show an execution plan.

Options:
  --detailed_exitcode         Return a detailed exit code when the command exits.
                              When provided, this argument changes
                              the exit codes and their meanings to provide
                              more granular information about what the
                              resulting plan contains:
                              0 = Succeeded with empty diff
                              1 = Error
                              2 = Succeeded with non-empty diff
  -d, --dataset TEXT          Specify the ID of the dataset to manage.
  -e, --exclude-dataset TEXT  Specify the ID of the dataset to exclude from managed.
  -h, --help                  Show this message and exit.
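The detailed exit codes make `bqdm plan` usable from a script or CI job. The sketch below shows one way to interpret them. The helper names `describe_plan_exit` and `run_plan` are hypothetical, `run_plan` assumes `bqdm` is installed and on `PATH`, and the flag spelling is passed in as a parameter because the help texts in this document spell it differently for `plan` and `destroy plan`.

```python
import subprocess

# Exit code meanings as listed in the help text for the detailed exit code flag.
PLAN_EXIT_MEANINGS = {
    0: "succeeded with empty diff",
    1: "error",
    2: "succeeded with non-empty diff",
}


def describe_plan_exit(code):
    """Translate a detailed plan exit code into a readable message."""
    return PLAN_EXIT_MEANINGS.get(code, "unexpected exit code %d" % code)


def run_plan(conf_dir, flag="--detailed_exitcode"):
    """Run `bqdm plan` on conf_dir and return its exit code.

    Not called here because it needs bqdm and valid credentials.
    """
    completed = subprocess.run(["bqdm", "plan", flag, conf_dir])
    return completed.returncode


for code in (0, 1, 2):
    print(code, describe_plan_exit(code))
```

A CI job could, for example, treat code 2 as "changes pending review" rather than as a failure, while still failing on code 1.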
apply
Usage: bqdm apply [OPTIONS] [CONF_DIR]

  Builds or changes datasets.

Options:
  -d, --dataset TEXT          Specify the ID of the dataset to manage.
  -e, --exclude-dataset TEXT  Specify the ID of the dataset to exclude from managed.
  -m, --mode [select_insert|select_insert_backup|replace|replace_backup|drop_create|drop_create_backup]
                              Specify the migration mode when changing the schema.
                              Choice from `select_insert`, `select_insert_backup`,
                              `replace`, `replace_backup`, `drop_create`,
                              `drop_create_backup`. [required]
  -b, --backup-dataset TEXT   Specify the ID of the dataset to store the backup at migration
  -h, --help                  Show this message and exit.
destroy
Usage: bqdm destroy [OPTIONS] COMMAND [ARGS]...

  Specify subcommand `plan` or `apply`

Options:
  -h, --help  Show this message and exit.

Commands:
  apply  Destroy managed datasets.
  plan   Generate and show an execution plan for...
destroy plan
Usage: bqdm destroy plan [OPTIONS] [CONF_DIR]

  Generate and show an execution plan for datasets destruction.

Options:
  --detailed-exitcode         Return a detailed exit code when the command exits.
                              When provided, this argument changes
                              the exit codes and their meanings to provide
                              more granular information about what the
                              resulting plan contains:
                              0 = Succeeded with empty diff
                              1 = Error
                              2 = Succeeded with non-empty diff
  -d, --dataset TEXT          Specify the ID of the dataset to manage.
  -e, --exclude-dataset TEXT  Specify the ID of the dataset to exclude from managed.
  -h, --help                  Show this message and exit.
destroy apply
Usage: bqdm destroy apply [OPTIONS] [CONF_DIR]

  Destroy managed datasets.

Options:
  -d, --dataset TEXT          Specify the ID of the dataset to manage.
  -e, --exclude-dataset TEXT  Specify the ID of the dataset to exclude from managed.
  -h, --help                  Show this message and exit.
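Migration mode
select_insert
- TODO
LIMITATIONS: TODO
select_insert_backup
- TODO
LIMITATIONS: TODO
replace
- TODO
LIMITATIONS: TODO
replace_backup
- TODO
LIMITATIONS: TODO
drop_create
- TODO
drop_create_backup
- TODO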
Authentication
See the authentication section in the official documentation of google-cloud-python.
If you’re running in Compute Engine or App Engine, authentication should “just work”.
If you’re developing locally, the easiest way to authenticate is using the Google Cloud SDK:
$ gcloud auth application-default login
Note that this command generates credentials for client libraries. To authenticate the CLI itself, use:
$ gcloud auth login
Previously, gcloud auth login was used for both use cases. If your gcloud installation does not support the new command, please update it:
$ gcloud components update
If you’re running your application elsewhere, you should download a service account JSON keyfile and point to it using an environment variable:
$ export GOOGLE_APPLICATION_CREDENTIALS="/path/to/keyfile.json"
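Before running bqdm with a keyfile, it can help to confirm that the environment variable points at a readable JSON file. The following is a minimal sketch: `keyfile_status` is a hypothetical helper, it only checks that the file exists and parses as JSON, and it does not verify that the key is valid or has any permissions. The demonstration uses a throwaway file rather than a real key.

```python
import json
import os
import tempfile


def keyfile_status(environ):
    """Classify GOOGLE_APPLICATION_CREDENTIALS as unset, missing, not-json or ok."""
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "unset"
    if not os.path.isfile(path):
        return "missing"
    try:
        with open(path) as handle:
            json.load(handle)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        return "not-json"
    return "ok"


# Demonstration with a placeholder file instead of a real service account key.
with tempfile.TemporaryDirectory() as tmp:
    fake_key = os.path.join(tmp, "keyfile.json")
    with open(fake_key, "w") as handle:
        json.dump({"type": "service_account"}, handle)
    statuses = [
        keyfile_status({}),
        keyfile_status({"GOOGLE_APPLICATION_CREDENTIALS": os.path.join(tmp, "nope.json")}),
        keyfile_status({"GOOGLE_APPLICATION_CREDENTIALS": fake_key}),
    ]

print(statuses)
```

In real use the function would be called as `keyfile_status(os.environ)`.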
Testing
Testing depends on the following environment variables:
$ export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
$ export GOOGLE_CLOUD_PROJECT=YOUR_PROJECT_ID
Run tests
$ pip install pipenv
$ pipenv install --dev
$ pipenv run pytest
Run tests on multiple Python versions
$ pip install pipenv
$ pipenv install --dev
$ pyenv local 3.6.5 3.5.5 3.4.8 2.7.14
$ pipenv run tox
TODO
- Support encryption configuration for tables
- Support external data configuration for tables
- Schema replication
- Integration tests