<p>Use <code>split</code> to break <code>text</code> at the first <code>-</code> that is followed by lowercase letters, or at the first <code>-</code> followed by a word that starts with a capital letter and continues in lowercase (e.g. <code>-Birmingham</code>).</p>
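To see where that pattern actually splits the sample strings, here is a small plain-Python sketch using the standard <code>re</code> module with the same regex. Spark's <code>split</code> uses Java regex, but the constructs here (escaped hyphen, character classes, lookahead) behave the same in Python. <code>first_split_point</code> is a hypothetical helper for illustration only.

```python
import re

# Same pattern as the Spark answer, written as a Python raw string.
SPLIT_PATTERN = re.compile(r"\-(?=[a-z]+)|(\-[A-Z][a-z]+)")


def first_split_point(text):
    """Return (index, matched_text) of the first separator found, or None."""
    m = SPLIT_PATTERN.search(text)
    return (m.start(), m.group(0)) if m else None


print(first_split_point("MEM-BEN-BTN-CLK-healthandwellness-love-and-meaning-after-50"))
print(first_split_point("MEM-BEN-LOC-MODAL-LOCATION-INPUT-Birmingham, AL, USA"))
```

Hyphens followed by all-caps segments such as <code>-BEN</code> or <code>-INPUT</code> match neither alternative, so the first match is the <code>-</code> before <code>healthandwellness</code> in the first row and <code>-Birmingham</code> in the second.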
<p>After splitting, take the first element of the resulting array; that gives us <code>upper</code>.</p>
<p>Once we have <code>upper</code>, remove it from the full text to leave <code>lower</code>.</p>
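The split-then-remove steps can be sketched in plain Python as follows, assuming the same regex as the Spark code. <code>split_upper_lower</code> is a hypothetical helper, not part of PySpark. Because <code>upper</code> is always a prefix of <code>text</code>, slicing it off gives the same result as <code>regexp_replace</code> on this data without treating <code>upper</code> as a regex pattern.

```python
import re

# Same pattern as the Spark answer.
SPLIT_PATTERN = re.compile(r"\-(?=[a-z]+)|(\-[A-Z][a-z]+)")


def split_upper_lower(text):
    """Return (upper, lower) the way the Spark answer computes them."""
    # Like F.split(...)[0]: everything before the first separator match.
    # (Python's split also returns captured groups, but we only need item 0.)
    upper = SPLIT_PATTERN.split(text, maxsplit=1)[0]
    # upper is a prefix of text, so drop that many characters to get lower.
    lower = text[len(upper):]
    return upper, lower


for row in [
    "MEM-BEN-BTN-CLK-healthandwellness-love-and-meaning-after-50",
    "MEM-BEN-LOC-MODAL-LOCATION-INPUT-Birmingham, AL, USA",
]:
    print(split_upper_lower(row))
```

Note that <code>lower</code> keeps the leading <code>-</code>, as the Spark version does; apply <code>lower.lstrip('-')</code> if you don't want it.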
<p>See the code below, and happy coding.</p>
<p>Data</p>
<pre><code>from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

data = [
    (1, "MEM-BEN-BTN-CLK-healthandwellness-love-and-meaning-after-50"),
    (2, "MEM-BEN-LOC-MODAL-LOCATION-INPUT-Birmingham, AL, USA"),
]
df = spark.createDataFrame(data, ['id', 'text'])
df.show(truncate=False)
</code></pre>
<p>Code</p>
<pre><code>df.withColumn('upper', F.split('text','\\-(?=[a-z]+)|(\\-[A-Z][a-z]+)')[0]).withColumn("lower",expr("regexp_replace(text,upper,'')")).show(truncate=False)
</code></pre>