re.sub raises "expected string or bytes-like object" error

def fix_Plan(location):
    letters_only = re.sub("[^a-zA-Z]",  # Search for all non-letters
                          " ",          # Replace all non-letters with spaces
                          location)     # Column and row to search
    words = letters_only.lower().split()
    stops = set(stopwords.words("english"))
    meaningful_words = [w for w in words if not w in stops]
    return (" ".join(meaningful_words))

col_Plan = fix_Plan(train["Plan"][0])
num_responses = train["Plan"].size
clean_Plan_responses = []
for i in range(0,num_responses):
    clean_Plan_responses.append(fix_Plan(train["Plan"][i]))

Traceback (most recent call last):
  File "C:/Users/xxxxx/PycharmProjects/tronc/tronc2.py", line 48, in <module>
    clean_Plan_responses.append(fix_Plan(train["Plan"][i]))
  File "C:/Users/xxxxx/PycharmProjects/tronc/tronc2.py", line 22, in fix_Plan
    location) # Column and row to search
  File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36\lib\re.py", line 191, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
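
For reference, the same TypeError can be reproduced without the DataFrame. This is only a minimal sketch, assuming the offending cell holds a float such as NaN (a value pandas commonly uses for empty cells); the names bad_value and error_message are illustrative and do not come from the original script.

```python
import re

# Assumption: one cell of the "Plan" column is a float (NaN) instead of text.
bad_value = float("nan")

try:
    re.sub("[^a-zA-Z]", " ", bad_value)
    error_message = None
except TypeError as exc:
    # re.sub only accepts str or bytes-like input as the string to search
    error_message = str(exc)

print(error_message)
```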

3 Answers

User

Answer 1 · edited 2024-05-17 21:02:01

I think it is best to use the re.match() function. Here is an example that may help you.

import re
import nltk
from nltk.tokenize import word_tokenize
nltk.download('punkt')
sentences = word_tokenize("I love to learn NLP \n 'a :(")
#for i in range(len(sentences)):
sentences = [word.lower() for word in sentences if re.match('^[a-zA-Z]+', word)]  
sentences
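
One detail worth keeping in mind with the pattern above: re.match only anchors at the start of the string, so '^[a-zA-Z]+' keeps any token that begins with a letter, even when digits or punctuation follow. The sketch below contrasts it with re.fullmatch. It uses str.split as a stand-in for word_tokenize so that it runs without NLTK data, and the extra token abc123 was added to the sample text; both are assumptions made purely for illustration.

```python
import re

text = "I love to learn NLP \n 'a :( abc123"
tokens = text.split()  # stand-in for word_tokenize (assumption, avoids NLTK data)

# re.match: the token only has to START with one or more letters
starts_with_letters = [t.lower() for t in tokens if re.match(r"[a-zA-Z]+", t)]

# re.fullmatch: the WHOLE token has to consist of letters
letters_only = [t.lower() for t in tokens if re.fullmatch(r"[a-zA-Z]+", t)]

print(starts_with_letters)
print(letters_only)
```

With this input, abc123 survives the re.match filter but is dropped by re.fullmatch, so the choice depends on whether mixed tokens should be kept.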

User

Answer 2 · edited 2024-05-17 21:02:01

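As you stated in the comments, some of the values appear to be floats rather than strings. They need to be converted to strings before being passed to re.sub. The simplest way is to change location to str(location) in the re.sub call. Doing so is harmless even when the value is already a str.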

letters_only = re.sub("[^a-zA-Z]",  # Search for all non-letters
                          " ",          # Replace all non-letters with spaces
                          str(location))
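
Put together, the whole function might look like the sketch below. It is not the asker's final code: STOPS is a small hypothetical stand-in for stopwords.words("english") so the example runs without NLTK downloads, and the guard for missing values is an added assumption. The guard is there because str(float("nan")) is the text "nan", which would otherwise pass through the regex and end up in the cleaned output as a word.

```python
import math
import re

# Hypothetical stand-in for set(stopwords.words("english"))
STOPS = {"the", "a", "an", "to", "of", "and", "is"}

def fix_plan(location):
    """Strip non-letters, lowercase, and drop stop words from one cell value."""
    # Assumption: missing cells arrive as None or float NaN; treat them as empty
    if location is None or (isinstance(location, float) and math.isnan(location)):
        return ""
    # str() makes the call safe for floats and other non-string values
    letters_only = re.sub("[^a-zA-Z]", " ", str(location))
    words = letters_only.lower().split()
    return " ".join(w for w in words if w not in STOPS)

cleaned_text = fix_plan("Go to the 2nd plan!")
cleaned_nan = fix_plan(float("nan"))
cleaned_number = fix_plan(3.5)
print(repr(cleaned_text), repr(cleaned_nan), repr(cleaned_number))
```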

User

Answer 3 · edited 2024-05-17 21:02:01

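The simplest solution is to apply Python's str function to the column you are trying to loop over.

If you are using pandas, this can be done with

dataframe['column_name'] = dataframe['column_name'].apply(str)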
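
As a rough illustration of what that conversion does to mixed data, the sketch below builds a small hypothetical train frame (the values are invented for the example) and compares converting directly with blanking out missing cells first. The fillna("") step is an assumption about what may be wanted, not part of the answer above.

```python
import pandas as pd

# Hypothetical data: one text cell, one missing cell (NaN), one numeric cell
train = pd.DataFrame({"Plan": ["Plan A", float("nan"), 3.5]})

# Direct conversion, as in the answer: NaN becomes the literal text "nan"
as_str = train["Plan"].apply(str)

# Variation (assumption): replace missing cells with "" before converting
blank_missing = train["Plan"].fillna("").apply(str)

print(as_str.tolist())
print(blank_missing.tolist())
```

Either version gives re.sub a string for every row; the second simply avoids carrying the word "nan" into later text processing.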
