An SDK for integrating cloud solutions such as SageMaker and Databricks with Hopsworks.

Project description for hopsworks-cloud-sdk



The Hopsworks Cloud SDK integrates existing cloud solutions, such as Amazon SageMaker or Databricks, with the Hopsworks platform.

It lets you access the Hopsworks Feature Store from SageMaker and Databricks notebooks.

Quick Start

Make sure your Hopsworks installation is set up correctly: Setting up Hopsworks for the cloud

To install:

    pip install hopsworks-cloud-sdk

Example usage:


Documentation

Hopsworks Feature Store

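Hopsworks has a data management layer for machine learning called the feature store. The feature store supports simple and efficient versioning, sharing, governance, and definition of features that can be used both to train machine learning models and to serve inference requests. It is the natural interface between data engineering and data science.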
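To make the idea of feature versioning concrete, here is a toy, in-memory sketch. `ToyFeatureStore` and its methods are hypothetical names invented for illustration and are not part of the Hopsworks API. The sketch only assumes that each write to a feature group produces a new, immutable version, so older versions stay readable (for example, to reproduce the training data of an earlier model).

```python
import copy
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


class ToyFeatureStore:
    """Hypothetical in-memory stand-in for a feature store (NOT the Hopsworks API).

    Every write to a feature group creates a new version; earlier versions
    remain readable unchanged.
    """

    def __init__(self) -> None:
        # feature group name -> {version number -> rows}
        self._groups: Dict[str, Dict[int, List[Row]]] = {}

    def create_featuregroup(self, name: str, rows: List[Row]) -> int:
        versions = self._groups.setdefault(name, {})
        version = max(versions, default=0) + 1  # new version = latest + 1
        versions[version] = copy.deepcopy(rows)  # store a private copy
        return version

    def get_latest_version(self, name: str) -> int:
        return max(self._groups[name])

    def get_featuregroup(self, name: str, version: Optional[int] = None) -> List[Row]:
        if version is None:
            version = self.get_latest_version(name)
        # Return a copy so callers cannot mutate a stored version
        return copy.deepcopy(self._groups[name][version])


# Usage: two writes to the same (hypothetical) feature group
fs = ToyFeatureStore()
v1 = fs.create_featuregroup("teams_features", [{"team_id": 1, "team_budget": 10.0}])
v2 = fs.create_featuregroup("teams_features", [{"team_id": 1, "team_budget": 12.5}])
latest_rows = fs.get_featuregroup("teams_features")             # reads version 2
old_rows = fs.get_featuregroup("teams_features", version=v1)    # version 1 is still available
```

Returning deep copies keeps stored versions immutable from the caller's point of view; a real feature store would persist versions in durable storage rather than a Python dict.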

API documentation

Reading from the feature store

    from hops import featurestore

    # Read three features from the feature store into a dataframe
    features_df = featurestore.get_features(["team_budget", "average_attendance", "average_player_age"])
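`get_features` takes only feature names, so the feature store has to work out which feature groups hold them and combine the results. The pandas sketch below shows one plausible way to do that, assuming every feature group shares a hypothetical `team_id` key column and that the pieces are inner-joined on it. The data frames and the local `get_features` function are illustrative stand-ins, not the hops implementation.

```python
from typing import List, Sequence

import pandas as pd

# Hypothetical feature groups, both keyed by a shared "team_id" column.
# Rows are deliberately in different orders to show the join is by key.
teams_features = pd.DataFrame({
    "team_id": [1, 2, 3],
    "team_budget": [12.5, 8.0, 20.1],
    "average_attendance": [15000, 9000, 30000],
})
players_features = pd.DataFrame({
    "team_id": [3, 1, 2],
    "average_player_age": [27.8, 26.1, 24.3],
})


def get_features(feature_names: Sequence[str],
                 featuregroups: List[pd.DataFrame],
                 join_key: str = "team_id") -> pd.DataFrame:
    """Illustrative only: pick each requested feature from the first group
    that contains it, then inner-join the pieces on join_key."""
    if not feature_names:
        raise ValueError("at least one feature name is required")
    remaining = list(feature_names)
    parts = []
    for fg in featuregroups:
        cols = [c for c in remaining if c in fg.columns]
        if cols:
            parts.append(fg[[join_key] + cols])
            remaining = [c for c in remaining if c not in cols]
    if remaining:
        raise KeyError(f"features not found in any feature group: {remaining}")

    result = parts[0]
    for part in parts[1:]:
        # Inner join keeps only keys present in every group, in left-key order
        result = result.merge(part, on=join_key, how="inner")
    return result[list(feature_names)]


features_df = get_features(
    ["team_budget", "average_attendance", "average_player_age"],
    [teams_features, players_features],
)
```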

Integration with scikit-learn

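    from hops import featurestore
    from sklearn.neighbors import KNeighborsClassifier

    # Read the "iris_features" feature group as a pandas DataFrame
    train_df = featurestore.get_featuregroup("iris_features", dataframe_type="pandas")

    # Split into the four input features and the label column
    x_df = train_df[["sepal_length", "sepal_width", "petal_length", "petal_width"]]
    y_df = train_df[["label"]]
    X = x_df.values
    y = y_df.values.ravel()  # flatten the (n, 1) label array to shape (n,)

    # Fit a k-nearest-neighbors classifier on the feature store data
    iris_knn = KNeighborsClassifier()
    iris_knn.fit(X, y)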
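To try the same scikit-learn code without a Hopsworks connection, the sketch below substitutes scikit-learn's bundled iris dataset for the `iris_features` feature group and renames its columns to the names used above (an assumption about that feature group's schema). It also holds out a test split, which the snippet above does not do, so the fitted model can be scored.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for featurestore.get_featuregroup("iris_features", dataframe_type="pandas"):
# assumes the feature group has these column names.
iris = load_iris(as_frame=True)
train_df = iris.frame.rename(columns={
    "sepal length (cm)": "sepal_length",
    "sepal width (cm)": "sepal_width",
    "petal length (cm)": "petal_length",
    "petal width (cm)": "petal_width",
    "target": "label",
})

feature_cols = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
X = train_df[feature_cols].values
y = train_df["label"].values  # same result as train_df[["label"]].values.ravel()

# Hold out 30% of the rows, keeping class proportions equal in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

iris_knn = KNeighborsClassifier()  # k = 5 neighbors by default
iris_knn.fit(X_train, y_train)
accuracy = iris_knn.score(X_test, y_test)  # mean accuracy on the held-out split
```

`load_iris(as_frame=True)` requires scikit-learn 0.23 or later.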

Integration with TensorFlow

    import tensorflow as tf
    from hops import featurestore

    # Example input-pipeline settings (assumed values; adjust for your data)
    SHUFFLE_BUFFER_SIZE = 10000
    BATCH_SIZE = 32
    NUM_EPOCHS = 10

    feature_list = ["team_budget", "average_attendance", "average_player_age",
                    "team_position", "sum_attendance", "average_player_rating",
                    "average_player_worth", "sum_player_age", "sum_player_rating",
                    "sum_player_worth", "sum_position", "average_position"]

    # Create a new version of the training dataset from the features above
    latest_version = featurestore.get_latest_training_dataset_version("team_position_prediction")
    featurestore.create_training_dataset(
        features=feature_list,
        training_dataset="team_position_prediction",
        descriptive_statistics=False,
        feature_correlation=False,
        feature_histograms=False,
        cluster_analysis=False,
        training_dataset_version=latest_version + 1,
    )

    def create_tf_dataset():
        # Locate the TFRecord files that make up the training dataset
        dataset_dir = featurestore.get_training_dataset_path("team_position_prediction")
        input_files = tf.io.gfile.glob(dataset_dir + "/part-r-*")
        dataset = tf.data.TFRecordDataset(input_files)
        tf_record_schema = ...  # Add tf schema
        feature_names = ["team_budget", "average_attendance", "average_player_age",
                         "sum_attendance", "average_player_rating", "average_player_worth",
                         "sum_player_age", "sum_player_rating", "sum_player_worth",
                         "sum_position", "average_position"]
        label_name = "team_position"

        def decode(example_proto):
            # Parse one serialized tf.train.Example into input features and a float label
            example = tf.io.parse_single_example(example_proto, tf_record_schema)
            x = [example[feature_name] for feature_name in feature_names]
            y = [tf.cast(example[label_name], tf.float32)]
            return x, y

        dataset = (dataset.map(decode)
                   .shuffle(SHUFFLE_BUFFER_SIZE)
                   .batch(BATCH_SIZE)
                   .repeat(NUM_EPOCHS))
        return dataset

    tf_dataset = create_tf_dataset()
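The `.shuffle(...).batch(...).repeat(...)` chain at the end of `create_tf_dataset` determines the order in which examples reach the model. The plain-Python generators below are a simplified model of those semantics, not TensorFlow code: a buffered shuffle that holds at most `buffer_size` elements and emits a random one, batching that keeps the final partial batch (as `tf.data` does by default), and a repeat applied last so each epoch is shuffled and batched on its own. The function names are hypothetical.

```python
import random
from typing import Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], buffer_size: int,
                     rng: random.Random) -> Iterator[T]:
    """Fill a buffer of buffer_size elements, then repeatedly emit a random one."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    buffer: List[T] = []
    for item in items:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]  # move pick to the end
            yield buffer.pop()
    rng.shuffle(buffer)  # input exhausted: drain the rest in random order
    yield from buffer


def batch(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group items into lists of batch_size; keep the last partial batch."""
    current: List[T] = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current


def pipeline(records: Sequence[T], shuffle_buffer_size: int, batch_size: int,
             num_epochs: int, seed: int = 0) -> Iterator[List[T]]:
    """shuffle -> batch -> repeat: every epoch is reshuffled and batched separately."""
    rng = random.Random(seed)
    for _ in range(num_epochs):
        yield from batch(buffered_shuffle(records, shuffle_buffer_size, rng), batch_size)


example_batches = list(pipeline(list(range(10)), shuffle_buffer_size=4,
                                batch_size=3, num_epochs=2))
```

Because `repeat` comes after `batch`, a partial batch can appear at the end of every epoch; calling `repeat` before `batch` would instead let batches span epoch boundaries.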

Feature visualization

Visualizing feature distributions
Visualizing feature correlations
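Feature distributions are commonly plotted as histograms, and feature correlations as a Pearson correlation matrix. The sketch below computes both with pandas and NumPy on a small, made-up frame. The column values are hypothetical, and this is not necessarily how Hopsworks computes the statistics behind its visualizations.

```python
import numpy as np
import pandas as pd

# Hypothetical feature values for six teams
features_df = pd.DataFrame({
    "team_budget": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "average_attendance": [3.0, 5.0, 7.0, 9.0, 11.0, 13.0],  # exactly 2 * budget + 1
    "average_player_age": [30.0, 28.0, 29.0, 27.0, 26.0, 25.0],
})

# Distribution: per-column summary statistics plus histogram counts for one feature
summary = features_df.describe()  # count, mean, std, min, quartiles, max
counts, bin_edges = np.histogram(features_df["team_budget"], bins=3)

# Correlation: pairwise Pearson coefficients between all feature columns
correlations = features_df.corr(method="pearson")
```

The `counts` and `bin_edges` arrays are what a histogram plot draws, and `correlations` is the matrix a correlation heatmap visualizes.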

Development notes

For development details, such as how to run the tests and build the documentation, see: Development.

空字符串检查在java中未按预期工作   JavaSpringWebClient:自动计算主体的HMAC签名并将其作为头传递   foreach是否有一个Java等效的foreach循环和一个引用变量?   java如何在Eclipse中导入jar   使用特定第三方或java时lombok触发错误。*方法或构造函数   安卓 java将对象数组转换为int数组   java使一定百分比的JUnit测试通过   java Android:将Seekbar的一个值与另一个值进行比较   java将int数组(图像数据)写入文件的最佳方式是什么   java取代了系统。yml的构造函数内的getProperty   sqlite Java将公钥和私钥转换为字符串,然后再转换回字符串   安卓获取白色像素并将其保存到java opencv中的数组中   java为什么是ServerSocket。setSocketFactory静态?   Java数组似乎在不直接修改的情况下更改值