Handling Unicode characters in HTTP user agents in Python

Posted 2024-09-30 00:38:26

Male | A programmer who enjoys writing Python code.

I'm completely new to Python, but I found a package I need to use and am testing it. The package in question is pywurfl.

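Based on an example, I wrote a simple script that reads user agent (UA) strings from a column in a plain text file. There are a large number of UAs, and some may contain non-ASCII characters. The file containing the UAs was generated by a Perl script whose output was redirected with the shell operator ">", for example: perl somescript.pl > output_file.txt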

However, when I run the following code on that file, I get an error.

#!/usr/bin/python

import fileinput
import sys

from wurfl import devices
from pywurfl.algorithms import LevenshteinDistance


for line in fileinput.input():
    line = line.rstrip("\r\n")    # equiv of chomp
    H = line.split('\t')

    if H[27]=='Mobile':

        user_agent = H[23].decode('utf8')           
        search_algorithm = LevenshteinDistance()
        device = devices.select_ua(user_agent, search=search_algorithm)

        sys.stdout.write( "%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s" % (user_agent, device.devid, device.devua, device.fall_back, device.actual_device_root, device.brand_name, device.marketing_name, device.model_name, device.device_os, device.device_os_version, device.mobile_browser, device.mobile_browser_version, device.model_extra_info, device.pointing_method, device.has_qwerty_keyboard, device.is_tablet, device.has_cellular_radio, device.max_data_rate, device.wifi, device.dual_orientation, device.physical_screen_height, device.physical_screen_width,device.resolution_height, device.resolution_width, device.full_flash_support, device.built_in_camera, device.built_in_recorder, device.receiver, device.sender, device.can_assign_phone_number, device.is_wireless_device, device.sms_enabled) + "\n")

    else:
        # do something else
        pass

Here H[23] is the column that holds the UA string. But I get an error:

[error output not preserved in the source]

When I replace 'utf8' with 'latin1', I get the following error instead:

 sys.stdout.write(................) # with the .... as in the code
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position 0: ordinal not in range(128).

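Am I doing something wrong? I need to convert the UA string to Unicode because the package expects it. I'm not well versed in Unicode, especially in Python. How should I handle this error? For example, how can I find the UA string that causes it, so that I can ask a more informed question?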


1 Answer

User

#1 · Posted 2024-09-30 00:38:26

It looks like you have two separate problems.

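The first is that you're assuming the input file is UTF-8 when it isn't. Changing the input encoding to Latin-1 gets around that, because Latin-1 maps every possible byte to a character, so decoding can never fail.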
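To find out which UA strings are not valid UTF-8, one option is to read the file as raw bytes and try to decode each line. The following is a minimal sketch written for Python 3 (the question's script is Python 2); the helper names find_undecodable and decode_ua and the sample data are hypothetical and not part of pywurfl.

```python
def find_undecodable(lines, encoding="utf-8"):
    """Return (line_number, raw_bytes, error) for each line that fails to decode."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        try:
            raw.decode(encoding)
        except UnicodeDecodeError as exc:
            bad.append((number, raw, exc))
    return bad


def decode_ua(raw, primary="utf-8", fallback="latin-1"):
    """Decode with the primary encoding; fall back to Latin-1, which accepts any byte."""
    try:
        return raw.decode(primary)
    except UnicodeDecodeError:
        return raw.decode(fallback)


# Made-up sample lines: the second starts with byte 0xA9, which is the
# copyright sign in Latin-1 but is not valid as the start of a UTF-8 sequence.
sample = [b"Mozilla/5.0 (Linux; Android)", b"\xa9 SomeVendor Browser/1.0"]

for number, raw, exc in find_undecodable(sample):
    print(number, repr(raw), exc.reason)
```

With a real file, the lines could come from open(path, "rb"), so that no decoding happens before the check. Falling back to Latin-1 avoids the exception, but it only yields the intended characters if the bytes really were written in Latin-1.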

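The second problem is that stdout appears to be set to ASCII output, so the write fails as soon as the string contains a non-ASCII character such as u'\xa9'. For that, this question may help.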
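One way to sidestep the stdout encoding is to encode each output line explicitly before writing it. The sketch below assumes Python 3 and UTF-8 as the desired output encoding; write_row is a hypothetical helper, and an in-memory buffer stands in for the real output stream.

```python
import io


def write_row(fields, stream, encoding="utf-8"):
    """Join the fields with tabs and write them to a binary stream as encoded bytes."""
    line = "\t".join(str(field) for field in fields) + "\n"
    stream.write(line.encode(encoding))


# Demonstration with an in-memory binary buffer. For real output in Python 3,
# sys.stdout.buffer could be passed as the stream instead.
buffer = io.BytesIO()
write_row(["\u00a9 SomeVendor Browser/1.0", "generic", True], buffer)
print(buffer.getvalue())
```

In the question's Python 2 script, the closest equivalent would be calling .encode('utf8') on the formatted unicode string before passing it to sys.stdout.write. Setting the PYTHONIOENCODING environment variable to utf-8 before running the script is another option that leaves the code unchanged.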
