How can I get a column's name, or rename an existing column?

# TEST Capitalization and punctuation (4b)
testPunctDF = sqlContext.createDataFrame([(" The Elephant's 4 cats. ",)])
testPunctDF.show()
Test.assertEquals(testPunctDF.select(removePunctuation(col('_1'))).first()[0],
                  'the elephants 4 cats',
                  'incorrect definition for removePunctuation function')

def removePunctuation(column):
    """Removes punctuation, changes to lower case, and strips leading and trailing spaces.

    Note:
        Only spaces, letters, and numbers should be retained. Other characters should be
        eliminated (e.g. it's becomes its). Leading and trailing spaces should be removed
        after punctuation is removed.

    Args:
        column (Column): A Column containing a sentence.

    Returns:
        Column: A Column named 'sentence' with clean-up operations applied.
    """
    return lower(trim(regexp_replace("column_name", "[\W_]+"," "))).alias("sentence");

2 Answers

User

#1 · Edited 2024-10-04 11:35:10

I would try:

stringWithPunctuation.translate(None, string.punctuation)

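It runs in C under the hood, which makes it the most efficient option. Note that this two-argument form of translate is Python 2 syntax, and it works on an ordinary Python string rather than on a Spark Column.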
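On Python 3 the same idea needs a translation table, because str.translate no longer accepts a deletechars argument. The following is a minimal sketch, not the asker's required solution: the helper name strip_punctuation is hypothetical, and string.punctuation covers ASCII punctuation only. It cleans a plain Python string, so applying it to a DataFrame column would still require something like a UDF.

```python
import string

# Translation table that maps every ASCII punctuation character to "deleted".
_DELETE_PUNCT = str.maketrans("", "", string.punctuation)


def strip_punctuation(text):
    """Delete ASCII punctuation, lowercase the text, and trim surrounding spaces.

    Hypothetical helper for illustration; it works on a plain str,
    not on a Spark Column.
    """
    return text.translate(_DELETE_PUNCT).lower().strip()


print(strip_punctuation(" The Elephant's 4 cats. "))  # the elephants 4 cats
```

Building the table once at module level avoids recreating it for every string.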

Your attempt:

return lower(trim(regexp_replace("column_name", "[\W_]+"," "))).alias("sentence");

It does not seem to use the parameter column anywhere, which may explain the error.

User

#2 · Edited 2024-10-04 11:35:10

Surprisingly, I could only pass a Column object as the argument to regexp_replace(), not a column name.
