I tried spark-3.0.2 with bin-hadoop2.7 on Windows 10 and it works fine. ✅
However... 😟
When I try to install pyspark-3.1.1 with bin-hadoop3.2, I hit an error on df.write.
I want a pyspark version 3.0+ with hadoop 3.2+, because I need to connect Spark to azure ADL, and that is a requirement.
I followed the same steps as before (installed the same version of pyspark, added winutils, set PYSPARK_HOME and HADOOP_HOME, and added them to PATH). I can start a Spark session successfully and create a df with df = spark.range(0, 5), but I get the following error ❌ when I run df.write.csv('mycsv.csv'), df.write.parquet('myp.parquet'), etc. ❌
The error is as follows:
Traceback (most recent call last):
File "<input>", line 23, in <module>
File "C:\Users\user\Work\RAP2.0\.rap2_env\lib\site-packages\pyspark\sql\readwriter.py", line 1158, in saveAsTable
self._jwrite.saveAsTable(name)
File "C:\Users\user\Work\RAP2.0\.rap2_env\lib\site-packages\py4j\java_gateway.py", line 1305, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "C:\Users\user\Work\RAP2.0\.rap2_env\lib\site-packages\pyspark\sql\utils.py", line 111, in deco
return f(*a, **kw)
File "C:\Users\user\Work\RAP2.0\.rap2_env\lib\site-packages\py4j\protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o49.saveAsTable.
: java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.createDirectoryWithMode0(Ljava/lang/String;I)V
at org.apache.hadoop.io.nativeio.NativeIO$Windows.createDirectoryWithMode0(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$Windows.createDirectoryWithMode(NativeIO.java:560)
at org.apache.hadoop.fs.RawLocalFileSystem.mkOneDirWithMode(RawLocalFileSystem.java:534)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:587)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:559)
at org.apache.hadoop.fs.ChecksumFileSystem.mkdirs(ChecksumFileSystem.java:705)
at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.liftedTree1$1(InMemoryCatalog.scala:121)
at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.createDatabase(InMemoryCatalog.scala:118)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:137)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:124)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.$anonfun$catalog$1(BaseSessionStateBuilder.scala:140)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:98)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:98)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.databaseExists(SessionCatalog.scala:266)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.requireDbExists(SessionCatalog.scala:191)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableRawMetadata(SessionCatalog.scala:487)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:474)
at org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.loadTable(V2SessionCatalog.scala:65)
at org.apache.spark.sql.connector.catalog.DelegatingCatalogExtension.loadTable(DelegatingCatalogExtension.java:68)
at org.apache.spark.sql.delta.catalog.DeltaCatalog.loadTable(DeltaCatalog.scala:147)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:637)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:623)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Unknown Source)
I suspect this is related to the winutils files I got from here for 3.2.0: https://github.com/cdarlint/winutils. Maybe they are not valid?
Any ideas?
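For what it's worth, an UnsatisfiedLinkError on NativeIO$Windows usually indicates that hadoop.dll (not just winutils.exe) is missing from HADOOP_HOME\bin or does not match the Hadoop version. A small sanity-check sketch, assuming that layout (missing_native_binaries is a hypothetical helper name, not part of any library):

```python
import os

# Native binaries that df.write typically needs on Windows with Hadoop 3.x:
# winutils.exe for filesystem shims, hadoop.dll for the NativeIO JNI calls.
REQUIRED = ("winutils.exe", "hadoop.dll")

def missing_native_binaries(hadoop_home):
    """Return the required native binaries absent from <hadoop_home>/bin."""
    bin_dir = os.path.join(hadoop_home, "bin")
    return [name for name in REQUIRED
            if not os.path.isfile(os.path.join(bin_dir, name))]
```

Running something like missing_native_binaries(os.environ["HADOOP_HOME"]) before starting the session would show whether hadoop.dll was actually picked up from the cdarlint repo for the 3.2.0 build.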