Removing unwanted characters from column names in Python

M1
Out[347]: 
         a1        b1        a2        b2
0  0.238066  0.976816  0.238066  0.976816
1  0.373340  1.469728  0.373340  1.469728
2  0.968814  1.248595  0.968814  1.248595
3  0.886586  3.451292  0.886586  3.451292
4  0.244301  2.206757  0.244301  2.206757
5  0.389688  2.893761  0.389688  2.893761
6  0.704340  2.621483  0.704340  2.621483
7  0.301238  1.678316  0.301238  1.678316
8  0.375927  0.574135  0.375927  0.574135
9  0.065749  2.259736  0.065749  2.259736

print(M1.columns.tolist())
['\ufeffa1', 'b1', 'a2', 'b2']

M1.columns = M1.columns.str.strip().str.lower().str.replace(' ', '_').str.replace('(', '').str.replace(')', '')
print(M1.columns.tolist())
['\ufeffa1', 'b1', 'a2', 'b2']

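3 Answers

User

#1 · Edited 2024-10-02 18:27:24

This is an encoding problem. The column name still carries a BOM character, which can be removed by re-decoding it as 'utf-8-sig':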

In [3]: import numpy as np
   ...: import pandas as pd
   ...: df = pd.DataFrame(np.random.randint(3,10,16).reshape(4,4), columns=['\ufeffa1', 'b1', 'a2', 'b2'])
   ...: df.head()
Out[3]: 
   a1  b1  a2  b2
0    7   7   9   6
1    5   9   6   7
2    4   8   4   3
3    6   9   8   7

In [4]: df.columns
Out[4]: Index(['a1', 'b1', 'a2', 'b2'], dtype='object')

In [5]: df.columns.to_list()
Out[5]: ['\ufeffa1', 'b1', 'a2', 'b2']

In [6]: df.columns = pd.Series(df.columns).apply(lambda x:x.encode('utf-8').decode('utf-8-sig'))

In [7]: df.columns
Out[7]: Index(['a1', 'b1', 'a2', 'b2'], dtype='object')

In [8]: df.columns.to_list()
Out[8]: ['a1', 'b1', 'a2', 'b2']
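
A more direct variant, as a minimal sketch: Python's str.strip() does not treat U+FEFF as whitespace, which is why the chain in the question leaves the name unchanged. Passing the character explicitly to lstrip() removes it. The helper name strip_bom is hypothetical and not part of pandas.

```python
BOM = "\ufeff"  # U+FEFF, how the byte order mark appears after decoding


def strip_bom(names):
    """Return the names as a list, with any leading BOM removed."""
    return [str(name).lstrip(BOM) for name in names]


columns = ["\ufeffa1", "b1", "a2", "b2"]

# A plain strip() leaves the BOM in place because it is not whitespace.
unchanged = [name.strip() for name in columns]

cleaned = strip_bom(columns)
print(unchanged)
print(cleaned)
```

With a DataFrame, assigning df.columns = strip_bom(df.columns) should apply the same cleanup, since a plain list of strings is accepted as new column labels.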

User

#2 · Edited 2024-10-02 18:27:24

Use 'Some String'.encode('ascii', 'ignore'), which returns bytes, then call decode to turn the bytes back into a string.

Code:

lst = ['\ufeffa1', 'b1', 'a2', 'b2']
print(lst)
newlst = [s.encode('ascii', 'ignore').decode("utf-8") for s in lst]

print(newlst)

Output:

['\ufeffa1', 'b1', 'a2', 'b2']
['a1', 'b1', 'a2', 'b2']
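
One caveat worth checking before using this approach: encoding to ASCII with 'ignore' drops every non-ASCII character, not only the BOM. The sketch below compares it with a replacement that targets U+FEFF alone. The sample names are made up for illustration.

```python
# Hypothetical column names: the first has a BOM and a legitimate accented letter.
names = ["\ufeffprix_é", "b1"]

# ASCII round trip: removes the BOM, but also silently removes 'é'.
ascii_only = [s.encode("ascii", "ignore").decode("ascii") for s in names]

# Targeted removal: only U+FEFF is dropped, other characters survive.
bom_only = [s.replace("\ufeff", "") for s in names]

print(ascii_only)
print(bom_only)
```

If the column names can contain accented or non-Latin characters, the targeted replacement is probably the safer choice.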

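User

#3 · Edited 2024-10-02 18:27:24

The character \ufeff (U+FEFF) is a byte order mark (BOM), a special character that tells the reader the "endianness" (little-endian vs. big-endian) of the encoding. For UTF-8 the BOM is optional and is usually not written. You are probably reading a UTF-8 file that has a BOM with the default encoding, 'utf-8' (UTF-8 without a BOM). Try 'utf-8-sig' (UTF-8 with a BOM) instead.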

# Your file is probably encoded with 'utf-8-sig'.
# You are decoding it with encoding='utf-8' (the default).
# This is what happens:
'hi there'.encode('utf-8-sig').decode('utf-8')
Out[14]: '\ufeffhi there'

'hi there'.encode('utf-8-sig').decode('utf-8-sig')
Out[15]: 'hi there'
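
The same effect can be reproduced on an actual file. This is a minimal sketch using only the standard library: it writes a small CSV with a BOM to a temporary file, then reads the header back with both encodings. The sample data and the helper name read_header are assumptions for illustration. pandas.read_csv accepts an encoding argument as well, so encoding='utf-8-sig' can be passed there in the same way.

```python
import csv
import os
import tempfile

# Write a small CSV that starts with a UTF-8 BOM (hypothetical sample data).
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8-sig", newline=""
) as tmp:
    tmp.write("a1,b1\n1,2\n")
    path = tmp.name


def read_header(path, encoding):
    """Return the first CSV row of the file, decoded with the given encoding."""
    with open(path, encoding=encoding, newline="") as f:
        return next(csv.reader(f))


header_utf8 = read_header(path, "utf-8")      # BOM stays glued to the first name
header_sig = read_header(path, "utf-8-sig")   # BOM is consumed by the decoder
os.remove(path)

print(header_utf8)
print(header_sig)
```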

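Edit: "So how should I handle the file? Changing the encoding does not solve the problem."

You can open the file in Notepad++ and choose Format -> Convert to UTF-8. Or in Python: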

with open(input_path, encoding='utf-8-sig') as fin:
    text = fin.read()
with open(output_path, 'w', encoding='utf-8') as fout:
    fout.write(text)

This removes the BOM.
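
Before converting, it may help to confirm that the file really starts with a BOM. A rough sketch working on raw bytes, using the codecs.BOM_UTF8 constant from the standard library. The function names has_utf8_bom and remove_utf8_bom are hypothetical.

```python
import codecs


def has_utf8_bom(raw: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return raw.startswith(codecs.BOM_UTF8)


def remove_utf8_bom(raw: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM; other input is unchanged."""
    return raw[len(codecs.BOM_UTF8):] if has_utf8_bom(raw) else raw


# Sample bytes as they would appear at the start of a BOM-prefixed file.
sample = "a1,b1\n".encode("utf-8-sig")

print(has_utf8_bom(sample))
print(remove_utf8_bom(sample))
```

For a real file, the bytes could be obtained with open(input_path, 'rb').read(), where input_path is the same placeholder used in the answer above.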
