<p>In PySpark SQL, I have a DataFrame named <code>bmd2</code> with the following schema:</p>
<pre><code>DataFrame[genres: string, id: int, tagline: string, title: string, vote_average: double, vote_count: int]
</code></pre>
<p>The data in <code>bmd2['genres']</code> looks like this:</p>
<pre><code>bmd2.select('genres').show()
</code></pre>
<pre class="lang-none prettyprint-override"><code>+--------------------+
|              genres|
+--------------------+
|[{'id': 16, 'name...|
|[{'id': 12, 'name...|
|[{'id': 10749, 'n...|
|[{'id': 35, 'name...|
|[{'id': 35, 'name...|
|[{'id': 28, 'name...|
|[{'id': 35, 'name...|
|[{'id': 28, 'name...|
|[{'id': 28, 'name...|
|[{'id': 12, 'name...|
|[{'id': 35, 'name...|
|[{'id': 35, 'name...|
|[{'id': 10751, 'n...|
|[{'id': 36, 'name...|
|[{'id': 28, 'name...|
|[{'id': 18, 'name...|
|[{'id': 18, 'name...|
|[{'id': 80, 'name...|
|[{'id': 80, 'name...|
|[{'id': 28, 'name...|
+--------------------+
only showing top 20 rows
</code></pre>
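<p>The values in the "genres" column are strings, but in plain Python each one can be converted to a list of dicts with the <code>eval</code> function. How should I apply <code>eval()</code> here so that the string in every row is converted to a list? I have tried several approaches:</p>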
<blockquote>
<ol>
<li><code>bmd2.select('genres'.astype('list'))</code>: <code>AttributeError: 'str' object has no attribute 'astype'</code></li>
<li><code>bmd2.select(eval('genres'))</code>: <code>NameError: name 'genres' is not defined</code></li>
<li><code>bmd2.withColumn('genres', eval('genres'))</code>: <code>NameError: name 'genres' is not defined</code></li>
</ol>
</blockquote>
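<p>For context on the attempts above: <code>eval('genres')</code> runs once on the driver and looks up a Python variable called <code>genres</code>, so it never sees the column values. One possible direction is a UDF that parses each row's string. The sketch below is only an illustration under assumptions: the helper names <code>parse_genres</code> and <code>with_parsed_genres</code> are hypothetical, every cell is assumed to hold a Python-literal list of dicts, and all values are cast to strings so they fit a single <code>array&lt;map&lt;string,string&gt;&gt;</code> type. It swaps <code>eval</code> for <code>ast.literal_eval</code>, which accepts only literals and therefore cannot execute code embedded in the data.</p>

```python
import ast


def parse_genres(raw):
    """Parse one 'genres' cell (a string) into a list of dicts, or None.

    Hypothetical helper: assumes the cell is a Python-literal list of dicts.
    """
    if raw is None:
        return None
    try:
        # literal_eval only accepts literals, unlike eval()
        parsed = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return None
    if not isinstance(parsed, list) or not all(isinstance(x, dict) for x in parsed):
        return None
    # Cast keys and values to str so they match MapType(StringType, StringType)
    return [{str(k): str(v) for k, v in item.items()} for item in parsed]


def with_parsed_genres(df):
    """Return df with 'genres' replaced by an array<map<string,string>> column."""
    # Imported here so parse_genres can be tried without a Spark installation
    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, MapType, StringType

    parse_udf = udf(parse_genres, ArrayType(MapType(StringType(), StringType())))
    return df.withColumn("genres", parse_udf(df["genres"]))
```

<p>Under these assumptions, <code>bmd2 = with_parsed_genres(bmd2)</code> would replace the string column with an array column, and rows that fail to parse would become null rather than raising an error. If the real data needs <code>id</code> kept as an integer, a <code>StructType</code> return type would likely be a better fit than a map.</p>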