TypeError: name() argument 1 must be a unicode character, not str (Python)

Posted 2024-10-03 19:20:28

I am trying to use Gender Computer to add gender data to my DataFrame. Here is my code:

import os
import pandas as pd
import numpy as np
import re

crd = os.getcwd()
df_hash = pd.read_csv(crd +"\\hashtag4.csv", encoding="utf-8")

from genderComputer import GenderComputer
gc = GenderComputer()

df_hash['gender'] = gc.resolveGender(df_hash['full_name'], None)

But I get this error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\Desktop\recsys\genderComputer-master\nameUtils.py in is_cyrillic(uchr)
    109 def is_cyrillic(uchr):
--> 110     try: return cyrillic_letters[uchr]
    111     except KeyError:

KeyError: 'CoffeeCaine'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-29-6fd7ed6bd781> in <module>
----> 1 df_hash['gender'] = gc.resolveGender(df_hash['full_name'], None)

~\Desktop\recsys\genderComputer-master\genderComputer.py in resolveGender(self, name, country)
    558         def resolveGender(self, name, country):
    559                 '''Check if name is written in Cyrillic or Greek script, and transliterate'''
--> 560                 if only_cyrillic_chars(name) or only_greek_chars(name):
    561                         name = unidecode(name)
    562 

~\Desktop\recsys\genderComputer-master\nameUtils.py in only_cyrillic_chars(unistr)
    115 def only_cyrillic_chars(unistr):
    116     return all(is_cyrillic(uchr)
--> 117         for uchr in unistr if uchr.isalpha())
    118 
    119 '''Check whether a given character is written in Greek'''

~\Desktop\recsys\genderComputer-master\nameUtils.py in <genexpr>(.0)
    115 def only_cyrillic_chars(unistr):
    116     return all(is_cyrillic(uchr)
--> 117         for uchr in unistr if uchr.isalpha())
    118 
    119 '''Check whether a given character is written in Greek'''

~\Desktop\recsys\genderComputer-master\nameUtils.py in is_cyrillic(uchr)
    110     try: return cyrillic_letters[uchr]
    111     except KeyError:
--> 112         return cyrillic_letters.setdefault(uchr, 'CYRILLIC' in unicodedata.name(uchr))
    113 
    114 '''Check whether a given string is written in Cyrillic'''

TypeError: name() argument 1 must be a unicode character, not str

This is what df_hash['full_name'] contains:

[Image: contents of df_hash['full_name']]

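My understanding is that I need to encode all the values in the column. You can see from the error that it fails on CoffeeCaine, at index 3. I have already tried encoding the column or the whole DataFrame, e.g. df_hash['full_name'].str.encode("utf-8") or df_hash.full_name.str.encode('utf-8'), loading the CSV with an explicit encoding, and loading it into a DataFrame and then saving it back to CSV with an encoding, but none of this helped.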

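I tried passing a single string such as "John" instead of the column, and that works: it creates a new column in which every value is "male". Also, when I delete the CoffeeCaine row, the same error comes back on some other value. Is there a way to fix this?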

1 answer
User
#1 · Posted 2024-10-03 19:20:28

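I suspect the problem is that you are passing the whole column (a pandas Series) rather than a single str. As a result, the function iterates over the entries of the column instead of over the characters of a str, so unicodedata.name() receives a whole name such as 'CoffeeCaine' where it expects a single character. Encoding is not the issue. Try applying the function to each value instead: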

df_hash['gender'] = df_hash['full_name'].apply(lambda s: gc.resolveGender(s, None))
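To see why this helps, here is a minimal sketch that reproduces the error without pandas or genderComputer. The two helper functions are copied from the nameUtils.py lines shown in the traceback; the list `names` and its sample values are assumptions that stand in for the full_name column.

```python
import unicodedata

# Cache of per-character results, as in nameUtils.py (per the traceback).
cyrillic_letters = {}

def is_cyrillic(uchr):
    # unicodedata.name() accepts exactly one character.
    try:
        return cyrillic_letters[uchr]
    except KeyError:
        return cyrillic_letters.setdefault(
            uchr, 'CYRILLIC' in unicodedata.name(uchr))

def only_cyrillic_chars(unistr):
    # Iterates over whatever it is given: characters of a str,
    # or whole entries of a list / Series.
    return all(is_cyrillic(uchr) for uchr in unistr if uchr.isalpha())

# Hypothetical sample data standing in for df_hash['full_name'].
names = ['Иван', 'CoffeeCaine']

# One name at a time: each call iterates over single characters.
per_name = [only_cyrillic_chars(n) for n in names]

# The whole collection at once: each "character" is an entire name,
# so unicodedata.name() raises TypeError.
try:
    only_cyrillic_chars(names)
    error = None
except TypeError as exc:
    error = str(exc)

print(per_name)
print(error)
```

This would also explain why only some rows trigger the error: an entry containing a space or digit fails isalpha() as a whole string and is skipped, while a purely alphabetic entry such as 'CoffeeCaine' is passed on to unicodedata.name().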