I want to extract "text" from the sentiment column in my table and filter by city=london
I have a table like this:
name city sentiment
harry london "[
Row(score='0.999926',
sentiment=Row(score='-0.640237'),
text='happy'),
Row(score='0.609836',
sentiment=Row(score='-0.607594'),
text='sad'),
Row(score='0.58564',
sentiment=Row(score='-0.6833'),
text='mad')
]"
sally london "[
Row(score='0.999926',
sentiment=Row(score='-0.640237'),
text='sad'),
Row(score='0.609836',
sentiment=Row(score='-0.607594'),
text='mad'),
Row(score='0.58564',
sentiment=Row(score='-0.6833'),
text='agitated')
]"
gary london "[
Row(score='0.999926',
sentiment=Row(score='-0.640237'),
text='excited'),
Row(score='0.609836',
sentiment=Row(score='-0.607594'),
text='down'),
Row(score='0.58564',
sentiment=Row(score='-0.6833'),
text='agitated')
]"
mary manchester "[
Row(score='0.999926',
sentiment=Row(score='-0.640237'),
text='sad'),
Row(score='0.609836',
sentiment=Row(score='-0.607594'),
text='low'),
Row(score='0.58564',
sentiment=Row(score='-0.6833'),
text='content')
]"
gerry manchester "[
Row(score='0.999926',
sentiment=Row(score='-0.640237'),
text='ecstatic'),
Row(score='0.609836',
sentiment=Row(score='-0.607594'),
text='good'),
Row(score='0.58564',
sentiment=Row(score='-0.6833'),
text='bad')
]"
My code currently looks like this, but it doesn't work:
from pyspark.sql import functions as F
from pyspark.sql import types as T
data= spark.read.parquet("INSERT S3 TABLE").where("city LIKE 'london' AND sentiment['text=']")
df = sharethis.toPandas()
print (df)
I would like the output to look like this:
name city sentiment
harry london happy
harry london sad
harry london mad
sally london sad
sally london mad
sally london agitated
gary london excited
gary london down
gary london agitated
Does anyone know how I can access the array in the sentiment column to extract the text?
Thanks in advance
Let's first create a DataFrame using the data from your example:
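What you have is a DataFrame with the columns name, city, and sentiment, where sentiment holds an array of structs.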
Once we have the DataFrame, you need to explode the sentiment column. The result is one row for each element of the array.
Finally, let's create a column that contains only the text, filter by city, and select the 3 columns you want:
The result will be the table requested in the question: one row per name and text, for london only.