Replacing special characters in Python

import pandas as pd
import string

new_title = []
alphabests = list(string.ascii_letters)  # this list includes all letters
title_file = pd.read_csv('D:\\titles.csv', sep=';')
title = title_file['title']
x = 0  # index into titles
while x < len(title):
    y = 0  # index into letters
    while y < len(alphabests):
        check_about_alpabets = [w.replace(',{}'.format(alphabests[y]), '{}'.format(alphabests[y])) for w in title[x]]
        y += 1
    new_title.append(title[x])
    x += 1

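3 Answers

User

#1 · Edited 2024-10-01 00:34:57

I would suggest two edits that may help you track down the error. First, use a for loop instead of a while loop and iterate over the items in the title list. Second, before trying to handle the error, print the values in that list: one of the titles may be a float, i.e. a number, which you cannot iterate over (or strip special characters from). If there is such a title, use an if statement to create two branches that handle each type differently, like this: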

for x in title:  # iterate over the titles
    if isinstance(x, (float, int)):
        pass  # do something, e.g. skip the value or convert it with str(x)
    else:
        pass  # do something else, e.g. clean the string
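As a rough sketch of that branching, the loop below cleans string titles and passes numeric ones through unchanged. The sample list and the `clean_titles` helper are assumptions made up for illustration; in pandas, a missing title typically shows up as a float NaN, which is one way a float can end up in the column.

```python
import re

def clean_titles(titles):
    """Remove a comma that is directly followed by a word character."""
    cleaned = []
    for item in titles:
        if isinstance(item, str):
            # Drop the comma but keep the character that follows it.
            cleaned.append(re.sub(r',(\w)', r'\1', item))
        else:
            # Numbers have no string methods; pass them through untouched.
            cleaned.append(item)
    return cleaned

# Hypothetical sample: two strings, one float, one int.
sample_titles = ['The S,even Samurai', 1917.0, 'I, Tonya', 300]
cleaned_titles = clean_titles(sample_titles)
print(cleaned_titles)
```

Whether numeric titles should be skipped, converted with `str()`, or dropped depends on the data; passing them through is only the most conservative choice.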

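User

#2 · Edited 2024-10-01 00:34:57

Two improvements can be made to the posted code:

Use the pandas apply method rather than a Python for or while loop to process each title (the explicit loop is very slow)
Use a regular expression to check whether a comma is followed by a letter, rather than looping over every letter of the alphabet (also slow)

Code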

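import re
import pandas as pd

def clean_title(title):
    """Remove any comma that is immediately followed by a word character."""
    # \w matches letters, digits and underscore; group 1 keeps that character
    return re.sub(r',(\w)', lambda m: m.group(1), title)

# Clean titles (df is the DataFrame created in the test below)
df['title'] = df['title'].apply(clean_title)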

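Test

Build a small dataset of film titles and release years
The titles contain both wanted and unwanted commas

Example of an unwanted comma:

The S,even Samurai

Example of a wanted comma:

"I, Tonya"

Create the dataset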

df = pd.DataFrame({'title':['Lock, Stock and Two Smoking Barrels', 'The S,even Samurai', 'B,onnie and C,lyde', 'Reser,voir Dogs', 'A,irplane!', 'Doct,or Zhiva,go', 'I, Tonya'], 
                    'Year':['1998', '1954', '1967', '1992', '1980', '1965', '2017']})
  
print(df)

Dataset before cleaning

                                title  Year
0  Lock, Stock and Two Smoking Barrels  1998
1                   The S,even Samurai  1954
2                   B,onnie and C,lyde  1967
3                      Reser,voir Dogs  1992
4                           A,irplane!  1980
5                     Doct,or Zhiva,go  1965
6                             I, Tonya  2017

Dataset after cleaning

                                 title  Year
0  Lock, Stock and Two Smoking Barrels  1998
1                    The Seven Samurai  1954
2                     Bonnie and Clyde  1967
3                       Reservoir Dogs  1992
4                            Airplane!  1980
5                       Doctor Zhivago  1965
6                             I, Tonya  2017
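One caveat worth checking against your own data: `\w` matches digits and underscores as well as letters, so the pattern above would also strip a thousands separator. The sketch below compares it with a lookahead restricted to ASCII letters. The title '10,000 BC' is an example added here for illustration and is not part of the dataset above.

```python
import re

titles = ['The S,even Samurai', 'I, Tonya', '10,000 BC']

# \w also matches digits, so the thousands separator is removed too.
word_pattern = [re.sub(r',(\w)', r'\1', t) for t in titles]

# Restricting the lookahead to ASCII letters leaves '10,000 BC' alone.
letter_pattern = [re.sub(r',(?=[A-Za-z])', '', t) for t in titles]

print(word_pattern)    # ['The Seven Samurai', 'I, Tonya', '10000 BC']
print(letter_pattern)  # ['The Seven Samurai', 'I, Tonya', '10,000 BC']
```

If the titles contain accented or non-Latin letters, the ASCII-only class would miss them, so the right character class depends on the data.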

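User

#3 · Edited 2024-10-01 00:34:57

This error occurs because the variable "title" is a pandas Series object, not a list. If you want to change the column names in the DataFrame, you can do the following: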

      column_name = list(title_file.columns)
      column_dict = {}
      for name in column_name:
          # enumerate yields (index, character), in that order
          for idx, char in enumerate(name):
              if char == ',':
                  # keep only the text after the comma
                  new_name = name[idx+1:]
                  column_dict[name] = new_name
      title_file.rename(columns=column_dict, inplace=True)

But before setting the inplace parameter to True, check the output first.
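One way to do that check, as a minimal sketch: `rename` returns a new DataFrame by default, so calling it without `inplace` gives a preview and leaves the original untouched. The sample frame and its comma-laden column names are assumptions for illustration, and the mapping here simply deletes commas rather than keeping only the text after the comma.

```python
import pandas as pd

# Hypothetical frame whose column names picked up stray commas.
title_file = pd.DataFrame({',title': ['I, Tonya'], 'ye,ar': ['2017']})

# Map only the names that contain a comma to their cleaned form.
column_dict = {name: name.replace(',', '')
               for name in title_file.columns if ',' in name}

# Without inplace=True, rename() returns a renamed copy: a safe preview.
preview = title_file.rename(columns=column_dict)
preview_columns = list(preview.columns)
original_columns = list(title_file.columns)

print(preview_columns)   # ['title', 'year']
print(original_columns)  # [',title', 'ye,ar']
```

Once the preview looks right, assigning the result back (`title_file = preview`) has the same effect as `inplace=True`.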
