How do I start a for loop from a selected pandas.df row?

1 answer

User

#1 · Posted 2024-10-02 16:22:46

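If your API requires data encoded to GBK, just encode to that codec with an error handler other than 'strict' (the default).

'ignore' drops any codepoints that cannot be encoded to GBK: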

dfs['ssentence_encoded'] = dfs['ssentence'].str.encode('gbk', 'ignore')
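To see what 'ignore' does without a DataFrame, here is a minimal sketch on a plain string. The sample text is made up for illustration; the emoji is used only because it has no GBK representation.

```python
# Hypothetical sample text: the emoji cannot be encoded to GBK.
text = "你好😀world"

# With the default 'strict' handler the same call raises UnicodeEncodeError.
try:
    text.encode("gbk")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# With 'ignore' the unencodable codepoint is silently dropped.
encoded = text.encode("gbk", "ignore")
round_trip = encoded.decode("gbk")

print(strict_failed)
print(round_trip)
```

Note that the result is a bytes object, so the new column holds bytes rather than strings.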

See the Error Handlers section of Python's codecs documentation.
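If dropping characters is too lossy, the other standard handlers keep a visible trace of what was there. This is a small sketch comparing a few of them on a made-up string; which one suits your API is an assumption you would need to check against what it accepts.

```python
# Hypothetical sample text with one codepoint that GBK cannot represent.
text = "ok😀"

# Encode the same text with several standard codecs error handlers.
results = {
    handler: text.encode("gbk", handler)
    for handler in ("ignore", "replace", "backslashreplace", "xmlcharrefreplace")
}

for handler, value in results.items():
    print(handler, value)
```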

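If you need to pass in strings, but only strings that can safely be encoded to GBK, then I'd create a translation map suitable for the str.translate() method: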

class InvalidForEncodingMap(dict):
    """Translation map that removes codepoints the given encoding cannot represent."""

    def __init__(self, encoding):
        self._encoding = encoding
        # codepoints known to be encodable; these must be left untouched
        self._negative = set()

    def __missing__(self, codepoint):
        if codepoint in self._negative:
            # LookupError tells str.translate() to keep the character as-is
            raise LookupError(codepoint)
        if chr(codepoint).encode(self._encoding, 'ignore'):
            # can be encoded: record as a negative and raise
            self._negative.add(codepoint)
            raise LookupError(codepoint)
        # cannot be encoded: map to None so the character is removed
        self[codepoint] = None
        return None

only_gbk = InvalidForEncodingMap('gbk')
dfs['ssentence_gbk_safe'] = dfs['ssentence'].str.translate(only_gbk)

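The InvalidForEncodingMap class creates entries lazily as codepoints are looked up, so only the codepoints actually present in your data are processed. I'd still keep the map instance around if you need to use it more than once; the cache it has built up can then be reused.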
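The lazy caching can be observed with plain str.translate(), without pandas. The sketch below repeats the class so that it runs on its own; the sample strings are made up for illustration.

```python
class InvalidForEncodingMap(dict):
    """Same class as in the answer, repeated so this snippet is self-contained."""

    def __init__(self, encoding):
        self._encoding = encoding
        self._negative = set()

    def __missing__(self, codepoint):
        if codepoint in self._negative:
            raise LookupError(codepoint)
        if chr(codepoint).encode(self._encoding, "ignore"):
            self._negative.add(codepoint)
            raise LookupError(codepoint)
        self[codepoint] = None
        return None


only_gbk = InvalidForEncodingMap("gbk")

# Hypothetical inputs; the emoji are not representable in GBK.
samples = ["你好😀", "world😀🎉"]
cleaned = [s.translate(only_gbk) for s in samples]

print(cleaned)
# The dict itself now holds only the codepoints that had to be removed.
print([hex(cp) for cp in sorted(only_gbk)])
```

After the first string is processed, the emoji it contained is already cached as a removal, so the second string resolves it with an ordinary dict lookup instead of another encode attempt.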
