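Collecting data from a .txt URL with a strange encoding in Python

2 answers

User

Answer 1 · edited 2024-06-26 19:38:44

I don't know whether this is the best approach, but it produces the result you want. If anyone has a better way, please share it.

Here it is: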

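import requests

URL = "http://wps.aw.com/wps/media/objects/8992/9208383/Data_Sets/Ascii/Chapter1/HTWT1.txt"

response = requests.get(URL)

data = dict()

# Read as ISO-8859-1, the payload has a NUL after every character and two junk
# characters at the start (it looks like UTF-16 with a byte-order mark), so
# remove the NULs and drop the first two characters.
text = response.content.decode('ISO-8859-1').replace('\x00', '').strip()[2:]
for row in text.splitlines()[1:]:  # skip the header row
    OBS, x, y = row.split('\t')
    data[int(OBS)] = dict(x=int(x), y=int(y))

print(data)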

Output:

{
    1: {
        'y': 140,
        'x': 5
    },
    2: {
        'y': 157,
        'x': 9
    },
    3: {
        'y': 205,
        'x': 13
    },
    4: {
        'y': 198,
        'x': 12
    },
    5: {
        'y': 162,
        'x': 10
    },
    6: {
        'y': 174,
        'x': 11
    },
    7: {
        'y': 150,
        'x': 8
    },
    8: {
        'y': 165,
        'x': 9
    },
    9: {
        'y': 170,
        'x': 10
    },
    10: {
        'y': 180,
        'x': 12
    },
    11: {
        'y': 170,
        'x': 11
    },
    12: {
        'y': 162,
        'x': 9
    },
    13: {
        'y': 165,
        'x': 10
    },
    14: {
        'y': 180,
        'x': 12
    },
    15: {
        'y': 160,
        'x': 8
    },
    16: {
        'y': 155,
        'x': 9
    },
    17: {
        'y': 165,
        'x': 10
    },
    18: {
        'y': 190,
        'x': 15
    },
    19: {
        'y': 185,
        'x': 13
    },
    20: {
        'y': 155,
        'x': 11
    }
}
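
The NUL bytes and the two stray leading characters removed above are what a UTF-16 file with a byte-order mark looks like when it is read as ISO-8859-1. Assuming that is what the server sends (an inference from the cleanup steps, not verified against the file), decoding as UTF-16 directly should make the manual cleanup unnecessary. A minimal sketch; `parse_htwt` and the sample bytes are made up for illustration:

```python
def parse_htwt(raw):
    """Parse tab-separated OBS/X/Y rows from bytes assumed to be UTF-16 with a BOM."""
    # The 'utf-16' codec reads the BOM to pick the byte order, then drops it.
    text = raw.decode('utf-16').strip()
    data = {}
    for row in text.splitlines()[1:]:  # skip the header row
        obs, x, y = row.split('\t')
        data[int(obs)] = {'x': int(x), 'y': int(y)}
    return data


# Made-up sample standing in for response.content (assumption: UTF-16 with BOM).
sample = 'OBS\tX\tY\r\n1\t5\t140\r\n2\t9\t157\r\n'.encode('utf-16')
result = parse_htwt(sample)
print(result)
```

With requests, this would be called as `parse_htwt(response.content)`.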

ADDED:

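If you need code that parses this particular txt format, a more general script like the one below can be used. You only need to change the headers list to match the header row of the txt file (without OBS):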

import requests

def wrapper(thelist):
    # Split a row into its first field (OBS) and the remaining fields.
    return thelist[0], thelist[1:]

# URL = "http://wps.aw.com/wps/media/objects/8992/9208383/Data_Sets/Ascii/Chapter1/HTWT1.txt"
URL = "http://wps.aw.com/wps/media/objects/8992/9208383/Data_Sets/Ascii/Chapter7/CARS7.txt"

response = requests.get(URL)

data = dict()

# headers = ['X', 'Y']
headers = ['Make', 'Model', 'Time', 'Speed', 'Top', 'Weight', 'HP'] # Must be in order and without OBS

# Remove the NULs and the two leading junk characters left by the ISO-8859-1 decode.
text = response.content.decode('ISO-8859-1').replace('\x00', '').strip()[2:]
for row in text.splitlines()[1:]:  # skip the header row
    OBS, extras = wrapper(row.split('\t'))
    # Pair headers and values by position; looking a value up with
    # extras.index() mislabels a field whenever a row repeats a value.
    data[int(OBS)] = dict(zip(headers, extras))

print(data)

Output:

{
    1: {
        'Weight': '1335',
        'Make': 'Audi',
        'Time': '8.9',
        'HP': '150',
        'Model': 'TT Roadster',
        'Speed': '133',
        'Top': '0'
    },
    2: {
        'Weight': '1240',
        'Make': 'Mini ',
        'Time': '7.4',
        'HP': '168',
        'Model': 'Cooper S',
        'Speed': '134',
        'Top': '0'
    },
    3: {
        'Weight': '1711',
        'Make': 'Volvo',
        'Time': '7.4',
        'HP': '220',
        'Model': 'C70 T5 Sport',
        'Speed': '150',
        'Top': '0'
    },
    4: {
        'Weight': '1680',
        'Make': 'Saab',
        'Time': '7.9',
        'HP': '247',
        'Model': ' Nine-Three ',
        'Speed': '149',
        'Top': '0'
    },
    5: {
        'Weight': '1825',
        'Make': 'Mercedes-Benz',
        'Time': '6.6',
        'HP': '268',
        'Model': 'SL350',
        'Speed': '155',
        'Top': '0'
    },
    6: {
        'Weight': '1703',
        'Make': 'Jaguar',
        'Time': '6.7',
        'HP': '290',
        'Model': 'XK8',
        'Speed': '154',
        'Top': '0'
    },
    7: {
        'Weight': '1950',
        'Make': 'Bugatti',
        'Time': '2.4',
        'HP': '1000',
        'Model': 'Veyron 16.4',
        'Speed': '253',
        'Top': '1'
    },
    8: {
        'Weight': '875',
        'Make': 'Lotus',
        'Time': '4.9',
        'HP': '189',
        'Model': 'Exige',
        'Speed': '147',
        'Top': '1'
    },
    9: {
        'Weight': '1257',
        'Make': 'BMW',
        'Time': '6.7',
        'HP': '220',
        'Model': 'M3 (E30)',
        'Speed': '144',
        'Top': '1'
    },
    10: {
        'Weight': '1510',
        'Make': 'BMW',
        'Time': '5.9',
        'HP': '231',
        'Model': '330i Sport',
        'Speed': '155',
        'Top': '1'
    },
    11: {
        'Weight': '1350',
        'Make': 'Porsche',
        'Time': '5.3',
        'HP': '291',
        'Model': 'Cayman S',
        'Speed': '171',
        'Top': '1'
    },
    12: {
        'Weight': '1560',
        'Make': 'Nissan',
        'Time': '4.7',
        'HP': '276',
        'Model': 'Skyline GT-R (R34)',
        'Speed': '165',
        'Top': '1'
    },
    13: {
        'Weight': '1270',
        'Make': 'Porsche',
        'Time': '4.7',
        'HP': '300',
        'Model': '911 RS',
        'Speed': '172',
        'Top': '1'
    },
    14: {
        'Weight': '1584',
        'Make': 'Ford',
        'Time': '5',
        'HP': '319',
        'Model': 'Shelby GT',
        'Speed': '150',
        'Top': '1'
    },
    15: {
        'Weight': '1260',
        'Make': 'Mitsubishi',
        'Time': '4.4',
        'HP': '320',
        'Model': 'Evo VII RS Sprint',
        'Speed': '150',
        'Top': '1'
    },
    16: {
        'Weight': '1630',
        'Make': 'Aston Martin',
        'Time': '5.2',
        'HP': '380',
        'Model': 'V8 Vantage',
        'Speed': '175',
        'Top': '1'
    },
    17: {
        'Weight': '1540',
        'Make': 'Mercedes-Benz',
        'Time': '4.8',
        'HP': '355',
        'Model': 'SLK55 AMG',
        'Speed': '155',
        'Top': '1'
    },
    18: {
        'Weight': '1930',
        'Make': 'Maserati',
        'Time': '5.1',
        'HP': '394',
        'Model': 'Quattroporte Sport GT',
        'Speed': '171',
        'Top': '1'
    },
    19: {
        'Weight': '1275',
        'Make': 'Spyker',
        'Time': '4.5',
        'HP': '400',
        'Model': 'C8',
        'Speed': '187',
        'Top': '1'
    },
    20: {
        'Weight': '1161',
        'Make': 'Ferrari',
        'Time': '4.9',
        'HP': '400',
        'Model': '288GTO',
        'Speed': '189',
        'Top': '1'
    },
    21: {
        'Weight': '1130',
        'Make': 'Mosler',
        'Time': '3.9',
        'HP': '435',
        'Model': 'MT900',
        'Speed': '190',
        'Top': '1'
    },
    22: {
        'Weight': '1447',
        'Make': 'Lamborghini',
        'Time': '4.9',
        'HP': '455',
        'Model': 'Countach QV',
        'Speed': '180',
        'Top': '1'
    },
    23: {
        'Weight': '1290',
        'Make': 'Chrysler',
        'Time': '4',
        'HP': '460',
        'Model': 'Viper GTS-R',
        'Speed': '190',
        'Top': '1'
    },
    24: {
        'Weight': '2585',
        'Make': 'Bentley',
        'Time': '5.2',
        'HP': '500',
        'Model': 'Arnage T',
        'Speed': '179',
        'Top': '1'
    },
    25: {
        'Weight': '1350',
        'Make': 'Ferrari',
        'Time': '3.5',
        'HP': '503',
        'Model': '430 Scuderia',
        'Speed': '198',
        'Top': '1'
    },
    26: {
        'Weight': '1247',
        'Make': 'Saleen',
        'Time': '3.3',
        'HP': '550',
        'Model': 'S7',
        'Speed': '240',
        'Top': '1'
    },
    27: {
        'Weight': '1650',
        'Make': 'Lamborghini',
        'Time': '4',
        'HP': '570',
        'Model': 'Murcielago',
        'Speed': '205',
        'Top': '1'
    },
    28: {
        'Weight': '1230',
        'Make': 'Pagani',
        'Time': '3.6',
        'HP': '602',
        'Model': 'Zonda F',
        'Speed': '214',
        'Top': '1'
    },
    29: {
        'Weight': '1140',
        'Make': 'McLaren',
        'Time': '3.2',
        'HP': '627',
        'Model': 'F1',
        'Speed': '240',
        'Top': '1'
    },
    30: {
        'Weight': '1180',
        'Make': 'Koenigsegg ',
        'Time': '3.2',
        'HP': '806',
        'Model': 'CCR',
        'Speed': '242',
        'Top': '1'
    }
}
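
Instead of editing the `headers` list for each file, the column names could be taken from the file's own first row. A sketch using the standard `csv` module, assuming the bytes are UTF-16 with a byte-order mark and that the first column is named `OBS`; `parse_table` and the trimmed sample are hypothetical:

```python
import csv
import io


def parse_table(raw, key='OBS'):
    """Build {OBS: {column: value}} using the header row found in the data."""
    text = raw.decode('utf-16')  # assumption: UTF-16 with a BOM
    # newline='' leaves line endings untouched so the csv module can handle them.
    reader = csv.DictReader(io.StringIO(text, newline=''), delimiter='\t')
    data = {}
    for row in reader:
        obs = int(row.pop(key))  # the key column becomes the dictionary key
        data[obs] = dict(row)    # remaining columns, values kept as strings
    return data


# Made-up sample with only three of the CARS7 columns.
sample = (
    'OBS\tMake\tModel\tTime\r\n'
    '1\tAudi\tTT Roadster\t8.9\r\n'
    '2\tMini \tCooper S\t7.4\r\n'
).encode('utf-16')
cars = parse_table(sample)
print(cars)
```

As in the script above, every value stays a string, and stray whitespace such as the trailing space in `'Mini '` is preserved.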

User

Answer 2 · edited 2024-06-26 19:38:44

Try this in Python 3:

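import urllib3

# Assumed here: the HTWT1.txt address from the first answer
url = "http://wps.aw.com/wps/media/objects/8992/9208383/Data_Sets/Ascii/Chapter1/HTWT1.txt"

http_pool = urllib3.connection_from_url(url)
# Submit request, and write data locally
response = http_pool.urlopen('GET', url)

with open('local.txt', 'wb') as f:
    f.write(response.data)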

Python 2 (untested):

import urllib2

# url must hold the address of the .txt file
req = urllib2.Request(url)
response = urllib2.urlopen(req)
the_page = response.read()  # raw bytes, not yet decoded
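
Both snippets only download the raw bytes; the odd encoding still has to be dealt with when the file is read back. Assuming the bytes are UTF-16 with a byte-order mark (an inference from the first answer's cleanup, not confirmed here), passing `encoding='utf-16'` to `open` in Python 3 should be enough. The sketch below writes a made-up sample to a temporary file in place of `local.txt` so that it runs without network access:

```python
import os
import tempfile

# Made-up bytes standing in for the downloaded response data.
sample = 'OBS\tX\tY\r\n1\t5\t140\r\n'.encode('utf-16')

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'local.txt')
    # Save the raw bytes unchanged, as the download snippet does.
    with open(path, 'wb') as f:
        f.write(sample)
    # Reopen in text mode; the codec consumes the BOM and decodes each line.
    with open(path, encoding='utf-16') as f:
        rows = [line.rstrip('\n').split('\t') for line in f]

print(rows)
```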
