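An SDK for integrating cloud solutions such as SageMaker and Databricks with Hopsworks.
Detailed description of the hopsworks-cloud-sdk Python project
hopsworks-cloud-sdk is an SDK for integrating existing cloud solutions, such as Amazon SageMaker or Databricks, with the Hopsworks platform.
It allows SageMaker and Databricks notebooks to access the Hopsworks feature store.
Quick Start
Ensure that your Hopsworks installation is set up correctly: Setting up Hopsworks for the cloud
To install: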
pip install hopsworks-cloud-sdk
Sample usage:
Documentation
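Hopsworks Feature Store
Hopsworks has a data management layer for machine learning, called a feature store. The feature store enables simple and efficient versioning, sharing, governance, and definition of features that can be used both to train machine learning models and to serve inference requests. The feature store serves as a natural interface between data engineering and data science.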
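To make the versioning idea concrete, here is a toy, in-memory sketch of a versioned feature registry. It is not how Hopsworks is implemented; the class `ToyFeatureRegistry` and its methods are hypothetical names chosen purely for illustration.

```python
from collections import defaultdict


class ToyFeatureRegistry:
    """Hypothetical in-memory registry; NOT the Hopsworks implementation."""

    def __init__(self):
        # Maps a feature group name to {version: rows}.
        self._groups = defaultdict(dict)

    def latest_version(self, name):
        """Return the highest stored version of `name`, or 0 if none exists."""
        versions = self._groups.get(name)
        return max(versions) if versions else 0

    def save(self, name, rows):
        """Store `rows` as a new version and return that version number."""
        version = self.latest_version(name) + 1
        self._groups[name][version] = list(rows)
        return version

    def load(self, name, version=None):
        """Return the rows of `name` at `version` (the latest if omitted)."""
        if version is None:
            version = self.latest_version(name)
        return self._groups[name][version]


# Made-up sample data, for illustration only.
registry = ToyFeatureRegistry()
first = registry.save("iris_features", [{"sepal_length": 5.1}])
second = registry.save("iris_features", [{"sepal_length": 5.1}, {"sepal_length": 4.9}])
```

Because every save creates a new version instead of overwriting the previous one, a model trained against an older version can still be reproduced by loading that version explicitly.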
Reading from the feature store:
from hops import featurestore
features_df = featurestore.get_features(["team_budget", "average_attendance", "average_player_age"])
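The call above requests features by name without saying which feature group holds them. The sketch below illustrates one way such a lookup could work: find the group that contains each feature, then join the rows on a shared key. It is a toy model under stated assumptions; `toy_get_features`, the `team_id` join key, and the sample values are hypothetical and are not part of the hops API.

```python
def toy_get_features(feature_groups, features, key="team_id"):
    """Join the requested features across feature groups on `key`.

    `feature_groups` maps a group name to a list of row dicts.
    Hypothetical helper for illustration; not part of the hops API.
    """
    joined = {}
    for feature in features:
        # Find the first group whose rows carry this feature.
        owner = next(
            (rows for rows in feature_groups.values() if rows and feature in rows[0]),
            None,
        )
        if owner is None:
            raise KeyError(f"feature not found: {feature}")
        for row in owner:
            joined.setdefault(row[key], {key: row[key]})[feature] = row[feature]
    # Keep only keys for which every requested feature was found (inner join).
    return [r for r in joined.values() if all(f in r for f in features)]


# Made-up sample data, for illustration only.
groups = {
    "teams": [
        {"team_id": 1, "team_budget": 100.0},
        {"team_id": 2, "team_budget": 80.0},
    ],
    "attendance": [{"team_id": 1, "average_attendance": 5000.0}],
}
rows = toy_get_features(groups, ["team_budget", "average_attendance"])
```

In this sample, team 2 has no attendance row, so the inner join drops it and only team 1 remains in `rows`.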
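Integration with scikit-learn:
from hops import featurestore
from sklearn.neighbors import KNeighborsClassifier

# Read the feature group as a pandas DataFrame.
train_df = featurestore.get_featuregroup("iris_features", dataframe_type="pandas")
x_df = train_df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
y_df = train_df[["label"]]
X = x_df.values
y = y_df.values.ravel()
# Fit a k-nearest-neighbors classifier on the features.
iris_knn = KNeighborsClassifier()
iris_knn.fit(X, y)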
Integration with TensorFlow:
import tensorflow as tf  # the calls below use the TensorFlow 1.x API (tf.gfile, tf.parse_single_example)
from hops import featurestore

# SHUFFLE_BUFFER_SIZE, BATCH_SIZE and NUM_EPOCHS are assumed to be defined elsewhere.

feature_list = ["team_budget", "average_attendance", "average_player_age", "team_position",
                "sum_attendance", "average_player_rating", "average_player_worth", "sum_player_age",
                "sum_player_rating", "sum_player_worth", "sum_position", "average_position"]

# Create the next version of the training dataset.
latest_version = featurestore.get_latest_training_dataset_version("team_position_prediction")
featurestore.create_training_dataset(
    features=feature_list,
    training_dataset="team_position_prediction",
    descriptive_statistics=False,
    feature_correlation=False,
    feature_histograms=False,
    cluster_analysis=False,
    training_dataset_version=latest_version + 1,
)

def create_tf_dataset():
    dataset_dir = featurestore.get_training_dataset_path("team_position_prediction")
    input_files = tf.gfile.Glob(dataset_dir + "/part-r-*")
    dataset = tf.data.TFRecordDataset(input_files)
    tf_record_schema = ...  # Add tf schema
    feature_names = ["team_budget", "average_attendance", "average_player_age", "sum_attendance",
                     "average_player_rating", "average_player_worth", "sum_player_age",
                     "sum_player_rating", "sum_player_worth", "sum_position", "average_position"]
    label_name = "team_position"

    def decode(example_proto):
        # Parse one serialized record and split it into features (x) and label (y).
        example = tf.parse_single_example(example_proto, tf_record_schema)
        x = []
        for feature_name in feature_names:
            x.append(example[feature_name])
        y = [tf.cast(example[label_name], tf.float32)]
        return x, y

    dataset = dataset.map(decode).shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE).repeat(NUM_EPOCHS)
    return dataset

tf_dataset = create_tf_dataset()
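The `decode` step above turns each parsed record into a feature vector and a label. The same split can be sketched in plain Python on an ordinary dict, which makes it easy to check the feature ordering without a TensorFlow session. The function name `split_example` and the sample record are hypothetical and used only for illustration.

```python
def split_example(example, feature_names, label_name):
    """Return (x, y): features in `feature_names` order, label as a one-element list."""
    x = [example[name] for name in feature_names]
    y = [float(example[label_name])]
    return x, y


# Made-up sample record, for illustration only.
record = {"team_budget": 100.0, "average_attendance": 5000.0, "team_position": 3}
x, y = split_example(record, ["team_budget", "average_attendance"], "team_position")
```

Note that in the TensorFlow example `team_position` appears in `feature_list` when the training dataset is created, but is left out of `feature_names` because it serves as the label.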
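Feature visualizations:
Development Instructions
For development details, such as how to test and build the documentation, see this reference: Development.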