PySpark DataFrame: "condition should be string or Column"

Posted 2024-10-17 06:30:40

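I cannot apply a filter to my DataFrame. I keep getting the error TypeError("condition should be string or Column").

I have already tried changing the filter to use a col object, but it still does not work.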

path = 'dbfs:/FileStore/tables/TravelData.txt'
data = spark.read.text(path)
from pyspark.sql.types import StructType, StructField, IntegerType , StringType, DoubleType
schema = StructType([
  StructField("fromLocation", StringType(), True),
  StructField("toLocation", StringType(), True),
  StructField("productType", IntegerType(), True)
])
df = spark.read.option("delimiter", "\t").csv(path, header=False, schema=schema)
from pyspark.sql.functions import col
answerthree = df.select("toLocation").groupBy("toLocation").count().sort("count", ascending=False).take(10)  # works fine
display(answerthree)

I then added a filter to the variable answerthree, as follows:

[code block missing from the source]

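The errors thrown are: "cannot resolve 'productType' given input columns" and "condition should be string or Column".

In short, I am trying to solve problem 3 from the link below using PySpark instead of Scala. The dataset is also provided at the URL below. https://acadgild.com/blog/spark-use-case-travel-data-analysis?fbclid=IwAR0fgLr-8aHVBsSO_yWNzeyh7CoiGraFEGddahDmDixic6wmumFwUlLgQ2c

I expect to get the result only for rows where productType is 1.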


1 Answer

Answer 1 · Posted 2024-10-17 06:30:40

Since you have no variable referencing the DataFrame, the simplest approach is to use a string condition:

# Filter first: productType is no longer available after select("toLocation")
answerthree = df.filter("productType = 1")\
                .select("toLocation").groupBy("toLocation").count()\
                .sort(...

Alternatively, you can use a DataFrame variable and a column-based filter:

[code block missing from the source]
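
A note on column-based filters in general: the failing line from the question is not shown here, so the following is only a guess at one common cause of the TypeError. In PySpark versions that report "condition should be string or Column", DataFrame.filter accepts either a SQL expression string or a Column object. If the column name is compared as a plain Python string, Python evaluates the comparison immediately and passes a bool to filter, which is neither. The sketch below shows that evaluation in plain Python; the two accepted forms appear only as comments because they need a DataFrame named df.

```python
# Plain Python compares a str with an int right away; no Column is built.
condition = ("productType" == 1)
print(type(condition).__name__, condition)  # bool False

# A bool is neither a str nor a Column, so filter would reject it.
is_sql_string = isinstance(condition, str)
print(is_sql_string)  # False

# Forms that filter accepts (assuming `df` still has a productType column):
#   df.filter("productType = 1")            # SQL expression string
#   df.filter(col("productType") == 1)      # Column expression, needs
#                                           # from pyspark.sql.functions import col
#   df.filter(df["productType"] == 1)       # Column taken from the DataFrame variable
```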
