How to remove Unicode from a dataset in Python

1 answer

User

#1 · Posted 2024-06-28 14:31:27

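You can use Series.str.decode() on the columns with the offending encoding, but I'm not fond of that approach if you are able to re-read the data from the source directly.

You can pass encoding='utf-8' when reading the data, and pandas will try to sort this out for you. Assuming your data is in CSV format and UTF-8 encoded: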

import pandas as pd

df = pd.read_csv("yourfile.csv", encoding="utf-8")
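
To try the encoding argument without a file on disk, here is a minimal sketch that reads CSV bytes from memory. The sample data and the column name `name` are made up for illustration, and `io.BytesIO` simply stands in for yourfile.csv.

```python
import io

import pandas as pd

# Made-up CSV content, encoded as UTF-8 bytes (stands in for a file on disk)
raw = "name\ncafé\nnaïve\n".encode("utf-8")

# Tell pandas which encoding to use when decoding the bytes
df = pd.read_csv(io.BytesIO(raw), encoding="utf-8")
print(df["name"].tolist())
```

If the file was written in a different encoding, pass that encoding's name instead, for example encoding="latin-1".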

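Edit: You mentioned that your data is imported from a database, and pandas.read_sql has no encoding parameter. In that case I'd go with my first suggestion, Series.str.decode(). You can apply it to a column: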

df["column_name"] = df["column_name"].str.decode("encoding_name")

If you run into errors, you can pass the errors keyword argument. It defaults to "strict", but "ignore" is also accepted.

df["column_name"] = df["column_name"].str.decode("encoding_name", errors="policy")
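
Here is a small sketch of how the errors policy changes the result. It assumes the column holds raw bytes objects (as some database drivers return them) and that the intended encoding is UTF-8. The sample values are invented, and the last one deliberately contains an invalid byte.

```python
import pandas as pd

# Invented sample: a column of raw bytes; b"\xff" is not valid UTF-8
df = pd.DataFrame({"column_name": [b"caf\xc3\xa9", b"na\xc3\xafve", b"bad\xff"]})

# errors="ignore" silently drops bytes that cannot be decoded
decoded_ignore = df["column_name"].str.decode("utf-8", errors="ignore")

# errors="replace" substitutes U+FFFD (the replacement character) instead
decoded_replace = df["column_name"].str.decode("utf-8", errors="replace")

print(decoded_ignore.tolist())
print(decoded_replace.tolist())
```

With "ignore" you lose the bad bytes without any trace, so "replace" can be the safer choice when you want to see afterwards which rows were affected.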