<p>Using a regular expression with <code>regexp_extract</code>:</p>
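<pre><code>from pyspark.sql.functions import regexp_extract
# Assumes an active SparkSession named `spark`.
df = spark.createDataFrame([("8841673_3", )], ("id", ))
# Raw string so that \d reaches the regex engine unchanged;
# group 1 captures the digits before the first underscore.
df.select(regexp_extract("id", r"^(\d+)_.*", 1)).show()
# +--------------------------------+
# |regexp_extract(id, ^(\d+)_.*, 1)|
# +--------------------------------+
# |                         8841673|
# +--------------------------------+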
</code></pre>
<p><code>regexp_replace</code>:</p>
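<p>A minimal sketch of the <code>regexp_replace</code> approach, assuming the same single-column <code>df</code> as above. The pattern <code>_.*$</code> and the helper name <code>strip_suffix</code> are illustrative assumptions rather than part of the original answer; the idea is to delete the first underscore and everything after it. The pattern is also checked locally with Python's <code>re</code> module, which treats this simple pattern the same way.</p>

```python
import re

# Illustrative pattern (an assumption): the first underscore and everything after it.
SUFFIX_PATTERN = "_.*$"


def strip_suffix(df, column="id"):
    """Return a DataFrame with `column` truncated at its first underscore.

    Hypothetical helper; pyspark is imported here so the rest of the
    snippet can run without a Spark installation.
    """
    from pyspark.sql.functions import regexp_replace

    # Replace the matched suffix with an empty string and keep the column name.
    return df.select(regexp_replace(column, SUFFIX_PATTERN, "").alias(column))


# Local check of the same pattern, no Spark required.
local_result = re.sub(SUFFIX_PATTERN, "", "8841673_3")
print(local_result)
```

<p>With a Spark session available, <code>strip_suffix(df).show()</code> would display the truncated <code>id</code> column. Values without an underscore pass through unchanged because the pattern simply does not match.</p>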
<p>Or simply <code>split</code>:</p>
<pre><code>from pyspark.sql.functions import split
df.select(split("id", "_")[0]).show()
# +---------------+
# |split(id, _)[0]|
# +---------------+
# |        8841673|
# +---------------+
</code></pre>