Replacing special characters in a string

Posted 2024-10-01 19:30:50


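I have raw input in text format, and the strings contain special characters. I want to replace these special characters so that, after running the code, none of them remain.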


I tried writing the code below. I am not sure whether it is right or wrong.

def avoid(x):
    #print(x)
    #value=[]
    for ele in range(0, len(x)):
        p=invalidcharch(ele)
        #value.append(p)
        #value=''.join(p)
        print(p)
    return p

def invalidcharch(e):
    items={"ä":"a","ç":"c","è":"e","º":"","Ã":"A","Í":"I","í":"i","Ü":"U","â":"a","ò":"o","¿":"","ó":"o","á":"a","à":"a","õ":"o","¡":"","Ó":"O","ù":"u","Ú":"U","´":"","Ñ":"N","Ò":"O","ï":"i","Ï":"I","Ç":"C","À":"A","É":"E","ë":"e","Á":"A","ã":"a","Ö":"O","ú":"u","ñ":"n","é":"e","ê":"e","·":"-","ª":"a","°":"","ü":"u","ô":"o"}
    for i, j in items.items():
        e = e.replace(i, j)
    return e

for col in df.columns:
    df[col]=df[col].apply(lambda x:avoid(x))

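But in the code above I cannot store the whole string in the variable p. I need p to hold the whole string value so that it stores the replaced cell value. The data contains mixed data types, such as strings and integers.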

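Col A
Junto à Estação de Carcavelos
Bragança
Situado en el núcleo de Es Caló de Sant Agustí frente al Hostal Rafalet.
Cartão MOBI.E R. Conselheiro Emídio Navarro (frente ao ISEL)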

After the change
Junto a Estacao de Carcavelos
Braganca
Situado en el nucleo de Es Calo de Sant Agusti frente al Hostal Rafalet.
Cartao MOBI.E R. Conselheiro Emidio Navarro (frente ao ISEL)
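
The loop in avoid walks over the integer indices produced by range(len(x)) and passes each index, not the text, to invalidcharch; p is also overwritten on every pass. A minimal sketch of one way to restructure this follows, assuming each cell is either a string or some other scalar such as an integer. The name clean_cell and the shortened mapping are illustrative and not part of the original post.

```python
# Shortened mapping for illustration; the full `items` dict from the question
# can be dropped in unchanged.
ITEMS = {"ã": "a", "ç": "c", "í": "i", "à": "a", "º": ""}


def clean_cell(value):
    """Return `value` with mapped characters replaced; non-strings pass through."""
    if not isinstance(value, str):
        return value  # e.g. integers in a mixed-type column
    for special, plain in ITEMS.items():
        value = value.replace(special, plain)
    return value


print(clean_cell("Cartão MOBI.E R.Conselheiro Emídio Navarro"))
print(clean_cell(12345))
```

With pandas, a function like this could be applied per column, for example with df[col].map(clean_cell).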


3 Answers

We can use Series.str.translate, which is equivalent to str.maketrans + str.translate in Python:

converter = str.maketrans(items) # `items` is the special chars dict.
df['col A'].str.translate(converter)

0                                              Junto a Estacao de Carcavelos;
1                                                                    Braganca
2    Situado en el nucleo de Es Calo de Sant Agusti frente al Hostal Rafalet.
3                Cartao MOBI.E R. Conselheiro Emidio Navarro (frente ao ISEL)
Name: col A, dtype: object
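
As a plain-Python illustration of what the pandas call does to each string, here is a small sketch using str.maketrans and str.translate directly. The mapping is a shortened, illustrative subset of the dictionary from the question.

```python
# Shortened mapping for illustration; every key must be a single character.
items = {"ã": "a", "ç": "c", "í": "i", "º": "", "·": "-"}

# Build the translation table once, then reuse it for every string.
converter = str.maketrans(items)

print("Bragança".translate(converter))  # Braganca
print("1º·2".translate(converter))      # 1-2
```

Characters mapped to the empty string are removed, and characters that are not in the table are left unchanged.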

I don't fully understand what you are trying to achieve, but you could try:

import pandas as pd

items={"ä":"a","ç":"c","è":"e","º":"","Ã":"A","Í":"I","í":"i","Ü":"U","â":"a","ò":"o","¿":"","ó":"o","á":"a","à":"a","õ":"o","¡":"","Ó":"O","ù":"u","Ú":"U","´":"","Ñ":"N","Ò":"O","ï":"i","Ï":"I","Ç":"C","À":"A","É":"E","ë":"e","Á":"A","ã":"a","Ö":"O","ú":"u","ñ":"n","é":"e","ê":"e","·":"-","ª":"a","°":"","ü":"u","ô":"o"}

df = pd.DataFrame([
    'abcä',
    'Ãbcd12345'
], columns=['colA'])

# Use get() with a default so that characters mapped to '' are removed rather than turned into '_'.
df['colA'] = df['colA'].str.replace(r'[^\x00-\x7F]', lambda x: items.get(x.group(0), '_'), regex=True)

df
    colA
0   abca
1   Abcd12345

For the pattern r'[^\x00-\x7F]', see "Regular expression that finds and replaces non-ascii characters with Python".
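
The same idea can be sketched without pandas using re.sub. The character class [^\x00-\x7F] matches any single character outside the ASCII range, and the callback looks it up in the dictionary. The function name replace_non_ascii and the shortened mapping are illustrative assumptions. The lookup uses dict.get with a default, so a character mapped to the empty string is removed rather than replaced by the fallback.

```python
import re

# Shortened mapping for illustration.
items = {"ä": "a", "Ã": "A", "º": ""}

# Matches exactly one character outside the ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def replace_non_ascii(text, fallback="_"):
    """Replace mapped non-ASCII characters; unmapped ones become `fallback`."""
    return NON_ASCII.sub(lambda m: items.get(m.group(0), fallback), text)


print(replace_non_ascii("abcä"))       # abca
print(replace_non_ascii("Ãbcd12345"))  # Abcd12345
print(replace_non_ascii("nº 5 €"))     # n 5 _
```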

In addition to Achille Huet's comment linking to this question, you can use the following on a pandas DataFrame column:

import unidecode
df['col A'] = df['col A'].apply(lambda x: unidecode.unidecode(x))

import unidecode
for col in df.columns:
    df[col]=df[col].apply(lambda x: unidecode.unidecode(x))
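
unidecode is a third-party package. As a hedged alternative that stays within the standard library (a swapped-in technique, not what this answer uses), accents can be stripped by decomposing each character with unicodedata.normalize and dropping the combining marks. The helper name strip_accents is hypothetical, and the results can differ from unidecode and from the hand-written dictionary for symbols that have no decomposition.

```python
import unicodedata


def strip_accents(value):
    """Drop combining marks from strings; return non-strings unchanged."""
    if not isinstance(value, str):
        return value  # leave integers and other scalars alone
    # NFD splits "í" into "i" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", value)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Emídio"))  # Emidio
print(strip_accents(42))        # 42
```

The isinstance check matters for mixed-type columns, since passing an integer to a string-only function would raise an error.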

However, since you have already created a dictionary of special characters, you can use it:

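Just create a dictionary special_chars and replace the values across the whole DataFrame, passing regex=True. This should also be faster. I don't know whether there is a faster solution using unicode. It also depends on what you are doing with the data. For example, if you are writing to a .csv file, I believe to_csv() also has a parameter for this, but I'm not sure whether that is relevant: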

special_chars = {"ä":"a","ç":"c","è":"e","º":"","Ã":"A","Í":"I","í":"i","Ü":"U","â":"a","ò":"o","¿":"",
"ó":"o","á":"a","à":"a","õ":"o","¡":"","Ó":"O","ù":"u","Ú":"U","´":"","Ñ":"N",
"Ò":"O","ï":"i","Ï":"I","Ç":"C","À":"A","É":"E","ë":"e","Á":"A","ã":"a","Ö":"O",
"ú":"u","ñ":"n","é":"e","ê":"e","·":"-","ª":"a","°":"","ü":"u","ô":"o"}

df.replace(special_chars, regex=True)
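
Note that df.replace returns a new DataFrame, so the result has to be assigned to keep it. With regex=True, each dictionary key is interpreted as a regular expression. That is harmless for the characters listed above, but it would matter for keys such as "." or "?". The sketch below shows the same dictionary-driven replacement in plain Python, escaping the keys with re.escape. The "." entry and the name replace_special are illustrative additions, not part of the answer.

```python
import re

# Shortened mapping; the "." entry is added only to show why escaping matters.
special_chars = {"ç": "c", "ã": "a", "·": "-", ".": ""}

# Escape each key so that "." matches a literal dot, not any character.
pattern = re.compile("|".join(re.escape(key) for key in special_chars))


def replace_special(text):
    """Replace every mapped character in a single pass over the text."""
    return pattern.sub(lambda m: special_chars[m.group(0)], text)


print(replace_special("Bragança·Cartão."))  # Braganca-Cartao
```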
