PySpark and Python package performance

1 answer

User

#1 · Posted 2024-09-29 23:32:55

if I'm loading data into a pandas df with the read_csv method, I'm assuming this is NOT a distributed task, right?

Right, it is not.

Likewise, are any python packages you import and use on pyspark also not distributed?

Correct, they are not distributed either.

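PySpark does not change the behavior of any other package you may use. In practice it behaves much like any other Python package (it is now even available on PyPI). The distributed part only kicks in once you start moving data into its own structures (RDDs, Spark DataFrames, etc.), and even then it depends on how you run your script (for example, running in local mode does not distribute anything).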

I'm wondering if anyone has any resources to help explain what goes on in the background when someone uses, say, Pandas from an anaconda install

As should be clear by now, there are no such resources, simply because nothing special happens when you use pandas (or NumPy, or scikit-learn...) in a PySpark script.
