I'm hitting a library error when running pyspark (from an IPython notebook). I want to use Statistics.chiSqTest(obs) inside an RDD operation on an RDD of (key, list(int)) pairs.
On the driver (master) node, if I collect the RDD as a map and then iterate over the values, I have no problem:
keys_to_bucketed = vectors.collectAsMap()
keys_to_chi = {key:Statistics.chiSqTest(value).pValue for key,value in keys_to_bucketed.iteritems()}
But if I do the same thing directly on the RDD (calling Statistics.chiSqTest on each value inside the transformation), it fails with the following exception:
Traceback (most recent call last):
File "<ipython-input-80-c2f7ee546f93>", line 3, in chi_sq
File "/Users/atbrew/Development/Spark/spark-1.4.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/mllib/stat/_statistics.py", line 238, in chiSqTest
jmodel = callMLlibFunc("chiSqTest", _convert_to_vector(observed), expected)
File "/Users/atbrew/Development/Spark/spark-1.4.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/mllib/common.py", line 127, in callMLlibFunc
api = getattr(sc._jvm.PythonMLLibAPI(), name)
AttributeError: 'NoneType' object has no attribute '_jvm'
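The AttributeError at the bottom of the trace is the generic failure of looking up an attribute on None: inside pyspark's common.py, the SparkContext handle (sc) is None in the worker process, so sc._jvm blows up. A minimal sketch of the same failure mode (the names here are illustrative, not pyspark's actual internals):

```python
class Registry:
    # Populated only on the driver, mirroring how pyspark's
    # SparkContext is only defined in the driver process.
    _active_context = None

def call_jvm(name):
    # On a worker, _active_context is still None, so the
    # attribute lookup fails before any real work happens.
    return getattr(Registry._active_context._jvm, name)

try:
    call_jvm("chiSqTest")
except AttributeError as e:
    print(e)  # 'NoneType' object has no attribute '_jvm'
```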
Earlier on, while installing Spark, I hit a problem with macOS having two Python installs (one from brew, one shipped with the OS), but I thought I had resolved that. The odd part is that this failure is in one of the Python libraries bundled with the Spark install itself (my earlier problem was with numpy).
PYSPARK_PYTHON=/usr/bin/python
PYTHONPATH=/usr/local/lib/python2.7/site-packages:$PYTHONPATH:$EA_HOME/omnicat/src/main/python:$SPARK_HOME/python/
As you noticed in the comments, sc is None on the worker nodes: the SparkContext is only defined on the driver. Statistics.chiSqTest is a thin wrapper that calls into the JVM through sc._jvm (as the traceback shows in callMLlibFunc), so it can only be invoked on the driver, not inside RDD transformations that run on workers.
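One workaround is to keep the per-key computation pure Python, so the workers never need the JVM. A sketch (the uniform expected distribution below matches chiSqTest's default for a goodness-of-fit test; swapping in scipy.stats.chisquare would also give you the p-value, assuming SciPy is installed on the workers):

```python
def chi_sq_statistic(observed):
    """Pearson goodness-of-fit statistic against a uniform expected
    distribution (the same default Statistics.chiSqTest uses)."""
    expected = sum(observed) / float(len(observed))
    return sum((o - expected) ** 2 / expected for o in observed)

# Runs entirely in the Python worker process, no SparkContext needed:
# keys_to_chi = vectors.mapValues(chi_sq_statistic).collectAsMap()

print(chi_sq_statistic([16, 18, 16, 14, 12, 12]))  # 2.0
```

This sidesteps the NoneType error because nothing in the closure touches sc; only the driver ever talks to the JVM.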