How do I get a column's name or rename an existing column?

Posted 2024-10-04 11:35:10


My task is to build a function, removePunctuation, that strips punctuation so that this test passes:

# TEST Capitalization and punctuation (4b)
testPunctDF = sqlContext.createDataFrame([(" The Elephant's 4 cats. ",)])
testPunctDF.show()
Test.assertEquals(testPunctDF.select(removePunctuation(col('_1'))).first()[0],
                  'the elephants 4 cats',
                  'incorrect definition for removePunctuation function')

This is what I have managed to write:

def removePunctuation(column):
    """Removes punctuation, changes to lower case, and strips leading and trailing spaces.

    Note:
        Only spaces, letters, and numbers should be retained.  Other characters should be
        eliminated (e.g. it's becomes its).  Leading and trailing spaces should be removed after
        punctuation is removed.

    Args:
        column (Column): A Column containing a sentence.

    Returns:
        Column: A Column named 'sentence' with clean-up operations applied.
    """

    return lower(trim(regexp_replace("column_name", "[\W_]+"," "))).alias("sentence");

But I still cannot get regexp_replace to work with the alias "sentence". I get this error:

AnalysisException: u"cannot resolve 'sentence' given input columns: [_1];"


2 Answers

I would try:

stringWithPunctuation.translate(None, string.punctuation)

Under the hood, this is the most efficient way to do it.
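Note that the two-argument `translate(None, ...)` form exists only on Python 2 `str`, and it works on a plain string, not on a Spark Column. A rough Python 3 equivalent might look like the sketch below; `strip_punctuation` is a hypothetical helper name, not something from the question, and it only covers the ASCII characters in `string.punctuation`.

```python
import string

# Translation table that deletes every ASCII punctuation character.
# str.maketrans("", "", chars) maps each character in `chars` to None.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def strip_punctuation(text):
    """Hypothetical helper: drop punctuation, lower-case, trim spaces."""
    return text.translate(_PUNCT_TABLE).lower().strip()

print(strip_punctuation(" The Elephant's 4 cats. "))
```

To apply something like this to a DataFrame column it would have to be wrapped in a UDF, which gives up most of the efficiency argument, so the built-in column functions are probably the better fit for the exercise.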


Your attempt:

return lower(trim(regexp_replace("column_name", "[\W_]+"," "))).alias("sentence");

does not seem to use the parameter column anywhere, which probably explains the error.

Somewhat surprisingly, I could only pass a Column object as the argument to regexp_replace(), not a column name.
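Apart from the unused parameter, the pattern itself may also trip the test: replacing each run of punctuation with a space turns "Elephant's" into "Elephant s", while the test expects "elephants". The sketch below shows the difference in plain Python. It uses the `re` module as a stand-in for Spark's regex engine (an assumption; the two should agree on this ASCII input), and `remove_punctuation` is a hypothetical name for illustration.

```python
import re

def remove_punctuation(text):
    """Keep only letters, digits and spaces, then trim and lower-case."""
    # Delete unwanted characters instead of replacing them with a space.
    cleaned = re.sub(r"[^a-zA-Z0-9 ]", "", text)
    return cleaned.strip().lower()

sample = " The Elephant's 4 cats. "

# Pattern and replacement from the question: each run becomes one space.
with_space = re.sub(r"[\W_]+", " ", sample).strip().lower()

print(repr(with_space))                  # apostrophe became a space
print(repr(remove_punctuation(sample)))  # apostrophe was deleted
```

In Spark the same idea would presumably be written as `lower(trim(regexp_replace(column, "[^a-zA-Z0-9 ]", ""))).alias("sentence")`, passing the `column` argument itself rather than a quoted name.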
