Replacing special characters with "N/A" in Python

df['Comments'][:6]
0    nice
1    Insane3
2    😻😻❤️
3    @bertelsen1986
4    20 or 30 mm rise on the Renthal Fatbar?
5    Luckily I have one to 🔥💪🏻

df['Comments'][:6]
0    nice
1    Insane3
2    nan
3    @bertelsen1986
4    20 or 30 mm rise on the Renthal Fatbar?
5    Luckily I have one to 🔥💪🏻

2 Answers

User

Answer 1 · edited 2024-10-03 23:17:46

By iterating over the Unicode characters in each row (using the emoji and unicodedata packages), you can detect rows that contain only emojis:

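import unicodedata
import numpy as np
import emoji  # pip install emoji (version 2.0 or later)

df = {}
df['Comments'] = ["Test", "Hello 😉", "😉😉😉"]

for i in range(len(df['Comments'])):
    pure_emoji = True
    # Normalize first so that composed and decomposed forms compare equal
    for unicode_char in unicodedata.normalize('NFC', df['Comments'][i]):
        # EMOJI_DATA is keyed by emoji; it takes the place of UNICODE_EMOJI in emoji >= 2.0
        if unicode_char not in emoji.EMOJI_DATA:
            pure_emoji = False
            break
    if pure_emoji:
        # Rows made up entirely of emoji become NaN
        df['Comments'][i] = np.nan
print(df['Comments'])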
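
If installing the emoji package is not an option, a rough approximation is possible with the standard library alone. The sketch below is not the method from the answer above: it swaps the emoji lookup for a heuristic based on Unicode general categories. The choice of categories and the names `EMOJI_LIKE_CATEGORIES`, `is_symbols_only`, and `blank_out_symbol_rows` are assumptions made for illustration.

```python
import unicodedata

# Categories treated as "emoji-like" in this sketch (an assumption, not a standard):
# So = other symbols (most emoji), Sk = modifier symbols (skin tones),
# Mn = non-spacing marks (variation selectors), Cf = format characters (zero-width joiner).
EMOJI_LIKE_CATEGORIES = {"So", "Sk", "Mn", "Cf"}


def is_symbols_only(text: str) -> bool:
    """Return True if text has at least one 'So' symbol and nothing but emoji-like characters."""
    has_symbol = any(unicodedata.category(ch) == "So" for ch in text)
    return has_symbol and all(
        unicodedata.category(ch) in EMOJI_LIKE_CATEGORIES for ch in text
    )


def blank_out_symbol_rows(comments):
    """Return a new list in which symbol-only entries are replaced by float('nan')."""
    return [float("nan") if is_symbols_only(c) else c for c in comments]


comments = ["nice", "Insane3", "😻😻❤️", "@bertelsen1986", "Luckily I have one to 🔥💪🏻"]
cleaned = blank_out_symbol_rows(comments)
print(cleaned)
```

Because this is only a heuristic, it also flags rows made of non-emoji symbols such as ©, and it misses keycap emoji, whose base character is an ordinary digit. With a pandas Series, the same predicate could be passed to `Series.apply`.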

User
Answer 2 · edited 2024-10-03 23:17:46

For the function (remove_emoji), see https://stackoverflow.com/a/61839832/6075699
Try this.
First install the emoji library: pip install emoji
import re
import emoji
import numpy as np

df.Comments.apply(lambda x: x if (re.sub(r'(:[!_\-\w]+:)', '', emoji.demojize(x)) != "") else np.nan)

0                         nice
1                      Insane3
2                          NaN
3               @bertelsen1986
4    Luckily I have one to 🔥💪🏻
Name: a, dtype: object
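
To see what the regular expression in this one-liner does on its own, the following sketch applies the same pattern to strings that stand in for demojized comments. The sample strings are hand-written illustrations rather than captured library output, and the helper name `is_only_shortcodes` is hypothetical.

```python
import re

# Same pattern as in the answer: matches a single ":shortcode:" token.
SHORTCODE = re.compile(r'(:[!_\-\w]+:)')


def is_only_shortcodes(demojized: str) -> bool:
    """Return True if nothing is left after deleting every :shortcode: token."""
    return SHORTCODE.sub('', demojized) == ""


# Hand-written stand-ins for what a demojized comment might look like (assumed).
samples = [
    "nice",                          # plain text
    ":red_heart::red_heart:",        # only shortcodes, back to back
    "Luckily I have one to :fire:",  # text mixed with a shortcode
    ":fire: :fire:",                 # shortcodes separated by a space
]

for s in samples:
    print(repr(s), is_only_shortcodes(s))
```

Two caveats follow from this: a comment whose emoji are separated by whitespace is not treated as emoji-only, because the space survives the substitution, and a comment that was typed literally as a shortcode, such as `:fire:`, is indistinguishable from a real emoji once demojized.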
