<pre><code>from pyspark.sql import SparkSession
from pyspark.sql import functions as F
spark = SparkSession.builder.appName('Test').getOrCreate()

# Every person carries the same three (score, sentiment-score) pairs;
# only the text labels differ, so build the structs from one template.
_SCORE_PAIRS = [
    ("0.999926", "-0.640237"),
    ("0.609836", "-0.607594"),
    ("0.58564", "-0.6833"),
]


def _sentiments(*texts):
    """Return the list of sentiment dicts for one person, pairing each
    text label with the shared score template in order."""
    return [
        {"score": score, "sentiment": {"score": sub_score}, "text": text}
        for (score, sub_score), text in zip(_SCORE_PAIRS, texts)
    ]


data = [
    ("harry", "london", _sentiments("happy", "sad", "mad")),
    ("sally", "london", _sentiments("sad", "mad", "agitated")),
    ("gary", "london", _sentiments("excited", "down", "agitated")),
    ("mary", "manchester", _sentiments("sad", "low", "content")),
    ("gerry", "manchester", _sentiments("ecstatic", "good", "bad")),
]

df = spark.createDataFrame(data=data, schema=["name", "city", "sentiment"])
df.show()

# Keep only London rows, turn each element of the `sentiment` array into
# its own row, then project the nested struct's `text` field.
london_exploded = (
    df.where(F.col("city") == "london")
      .select("name", "city", F.explode("sentiment").alias("sentiment"))
)
london_exploded.select(
    "name", "city", F.col("sentiment.text").alias("sentiment")
).show()
Output:
+-----+------+---------+
| name| city|sentiment|
+-----+------+---------+
|harry|london| happy|
|harry|london| sad|
|harry|london| mad|
|sally|london| sad|
|sally|london| mad|
|sally|london| agitated|
| gary|london| excited|
| gary|london| down|
| gary|london| agitated|
+-----+------+---------+
</code></pre>