<p>In Spark 2.2 there are two ways to add a constant value to a DataFrame column:</p>
<p>1) Using <code>lit</code></p>
<p>2) Using <code>typedLit</code>.</p>
<p>The difference between the two is that <code>typedLit</code> can also handle parameterized Scala types, e.g. List, Seq, and Map.</p>
<p><b>Sample DataFrame:</b></p>
<pre><code>val df = spark.createDataFrame(Seq((0,"a"),(1,"b"),(2,"c"))).toDF("id", "col1")
+---+----+
| id|col1|
+---+----+
|  0|   a|
|  1|   b|
|  2|   c|
+---+----+
</code></pre>
<p><b>1) Using <code>lit</code>:</b> Add a constant string value in a new column named <code>newcol</code>:</p>
<pre><code>import org.apache.spark.sql.functions.lit
val newdf = df.withColumn("newcol",lit("myval"))
</code></pre>
<p>Result:</p>
<pre><code>+---+----+------+
| id|col1|newcol|
+---+----+------+
|  0|   a| myval|
|  1|   b| myval|
|  2|   c| myval|
+---+----+------+
</code></pre>
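<p><code>lit</code> is not limited to strings. As a short sketch (assuming the same <code>df</code> and SparkSession as above, with hypothetical column names <code>intcol</code> and <code>nullcol</code>), it also accepts numeric literals, and a typed null constant can be produced by casting:</p>
<pre><code>import org.apache.spark.sql.functions.lit

// Numeric literal: the column type is inferred from the Scala value (here, integer)
val withInt = df.withColumn("intcol", lit(10))

// A null constant needs an explicit cast to give the column a concrete type
val withNull = df.withColumn("nullcol", lit(null).cast("string"))
</code></pre>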
<p><b>2) Using <code>typedLit</code>:</b></p>
<pre><code>import org.apache.spark.sql.functions.typedLit
df.withColumn("newcol", typedLit(("sample", 10, .044)))
</code></pre>
<p>Result:</p>
<pre><code>+---+----+-----------------+
| id|col1|           newcol|
+---+----+-----------------+
|  0|   a|[sample,10,0.044]|
|  1|   b|[sample,10,0.044]|
|  2|   c|[sample,10,0.044]|
+---+----+-----------------+
</code></pre>
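<p>To illustrate the parameterized-type support mentioned above, here is a sketch (assuming the same <code>df</code>, with hypothetical column names <code>seqcol</code> and <code>mapcol</code>) of <code>typedLit</code> with a Seq and a Map, which <code>lit</code> cannot handle:</p>
<pre><code>import org.apache.spark.sql.functions.typedLit

// Seq literal: every row gets the same array-typed column
val withSeq = df.withColumn("seqcol", typedLit(Seq(1, 2, 3)))

// Map literal: every row gets the same map-typed column
val withMap = df.withColumn("mapcol", typedLit(Map("a" -> 1, "b" -> 2)))
</code></pre>
<p>Passing a <code>Seq</code> or <code>Map</code> to plain <code>lit</code> would fail at runtime, since <code>lit</code> only supports simple literal types; <code>typedLit</code> derives the Spark SQL type from the Scala type via an encoder.</p>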