Can't pickle _thread.RLock objects with a PySpark model

Posted 2024-10-10 18:23:04



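I built a random forest model with PySpark.
I need to save this model as a file with the .pkl extension, so I used the pickle library, but when I do I get the following error: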
TypeError                                 Traceback (most recent call last)

<ipython-input-76-bf32d5617a63> in <module>()
      2 
      3 filename = "drive/My Drive/Progetto BigData/APPOGGIO/Modelli/SVM/svm_sentiment_analysis"
----> 4 pickle.dump(model, open(filename, "wb"))

TypeError: can't pickle _thread.RLock objects
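The same TypeError can be reproduced without Spark. The sketch below is only an illustration: `fake_model` is a hypothetical stand-in, not a PySpark class, and the assumption is that the fitted model fails for a similar reason, most likely because it holds a handle to a JVM object whose gateway contains thread locks. The exact wording of the message varies between Python versions.

```python
import pickle
import threading
from types import SimpleNamespace

# Hypothetical stand-in for an object that keeps a live thread lock,
# as a JVM-backed model handle is assumed to do.
fake_model = SimpleNamespace(num_trees=100, lock=threading.RLock())


def try_pickle(obj):
    """Return the pickled bytes, or the error text if pickling fails."""
    try:
        return pickle.dumps(obj)
    except TypeError as exc:
        return f"TypeError: {exc}"


result = try_pickle(fake_model)
print(result)  # a TypeError message that mentions _thread.RLock
```

Plain attributes such as `num_trees` pickle without trouble; it is the lock that pickle refuses to serialize.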

Can pickle be used with PySpark models such as RandomForest, or only with scikit-learn models?

Here is my code:

import pickle

from pyspark.ml.classification import RandomForestClassifier

# Train the random forest on train_df
rf = RandomForestClassifier(labelCol="label", featuresCol="word2vect", weightCol="classWeigth", seed=0, maxDepth=10, numTrees=100, impurity="gini")
model = rf.fit(train_df)

# Try to save the model to a file with the pickle library
filename = "drive/My Drive/Progetto BigData/APPOGGIO/Modelli/SVM/svm_sentiment_analysis"
pickle.dump(model, open(filename, "wb"))  # this line raises the TypeError shown above

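My environment is Google Colab.
I need to turn the model into a pickle file in order to build a web app. I normally save models with the .save(path) method, but in this case that kind of save is not what I need.
Is it possible that PySpark models simply cannot be converted to a file?
Thanks in advance.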
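For comparison, here is a minimal sketch of the .save(path) route mentioned above, using PySpark's own persistence API. The helper name `save_and_reload` and its `path` argument are hypothetical, and the sketch assumes an active SparkSession and a fitted `RandomForestClassificationModel` such as the `model` from the question. It does not produce a .pkl file: Spark writes a directory of metadata and data files.

```python
def save_and_reload(model, path):
    """Persist a fitted RandomForestClassificationModel and load it back.

    Assumes an active SparkSession; `model` and `path` come from the caller.
    """
    # Imported lazily so the helper can be defined even where Spark is absent.
    from pyspark.ml.classification import RandomForestClassificationModel

    # write().overwrite() replaces any model already stored at `path`.
    model.write().overwrite().save(path)

    # Loading requires a running Spark session in the process that calls it.
    return RandomForestClassificationModel.load(path)
```

A web app that loads the model this way would need Spark available at runtime, which may be the reason a single pickle file looked attractive in the first place.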

