Formatting an ugly string into a list/dict

Posted 2024-05-19 11:30:44


Thanks for taking a look. I have an ugly dataset that I read back with `shelve`:

import shelve, os, re

os.chdir(r'.\wikiUSstates')  # raw string: '\w' and '\I' are not valid escape sequences

shelfFile = shelve.open(r'.\IndiaTEMP')
k = shelfFile['indiaprovinces']
print(type(k))  # This is type str
print(k)
shelfFile.close()

This returns the following: the provinces (states) of India, each followed by its capital.

<class 'str'>
['Andhra Pradesh – Hyderabad (Proposed Capital: Amaravati in Guntur district. See ', 'below.)', 'Arunachal Pradesh – Itanagar', 'Assam – Dispur', 'Bihar – Patna', 'Goa – Panaji', 'Gujarat – Gandhinagar', 'Haryana – Chandigarh', 'Himachal Pradesh – Shimla', 'Jammu & Kashmir – Srinagar (Winter : Jammu)', 'Karnataka – Bangalooru', 'Kerala – Thiruvananthapuram', 'Madhya Pradesh – Bhopal', 'Maharashtra – Mumbai', 'Manipur – Imphal', 'Meghalaya – Shillong', 'Mizoram – Aizawl', 'Nagaland – Kohima', 'Orissa – Bhubaneswar', 'Punjab – Chandigarh', 'Rajasthan – Jaipur', 'Sikkim – Gangtok', 'Tamil Nadu – Chennai', 'Tripura – Agartala', 'Uttar Pradesh – Lucknow', 'West Bengal – Kolkata', 'Chhattisgarh – Raipur', 'Uttarakhand – Dehradun', 'Jharkhand – Ranchi', 'Telangana – Hyderabad (see ', ' below)', 'Delhi (National Capital Territory of Delhi or NCT) – New Delhi *', 'Andaman & Nicobar Islands – Port Blair', 'Chandigarh – Chandigarh', 'Dadra & Nagar Haveli – Silvasa', 'Daman & Diu – Daman', 'Lakshadweep – Kavaratti', 'Puducherry – Puducherry', '\n', '\n', '\n', '\n']
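Note that `print(type(k))` reports `<class 'str'>`: what came back from the shelf is the *text* of a list, so looping over `k` directly would walk over single characters. A minimal sketch, assuming the stored text is a valid Python literal (as the printed output suggests), is to turn it back into a real list with `ast.literal_eval` from the standard library. `raw` below is a shortened, hypothetical stand-in for `k`:

```python
import ast

# Shortened, hypothetical stand-in for the string read from the shelf
raw = "['Assam – Dispur', 'Bihar – Patna', '\\n', '\\n']"

# literal_eval parses Python literals (lists, strings, ...) without executing code
provinces = ast.literal_eval(raw)

print(type(provinces))  # <class 'list'>
print(provinces[0])     # Assam – Dispur
```

Unlike `eval`, `ast.literal_eval` refuses anything that is not a plain literal, which makes it a safer choice for data read back from disk.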

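I'm trying to make this data usable: either a list I can iterate over by index, where list[0] is the province and list[1] is the capital, or, better still, a dictionary (key = province, value = capital).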
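The two target shapes are easy to convert between: a list of `[province, capital]` pairs can be passed straight to `dict()`. A minimal sketch with a few entries taken from the data above:

```python
pairs = [['Assam', 'Dispur'], ['Bihar', 'Patna'], ['Goa', 'Panaji']]

# Index-based access: pair[0] is the province, pair[1] is the capital
for pair in pairs:
    print(pair[0], '->', pair[1])

# dict() accepts any iterable of 2-item sequences
capitals = dict(pairs)
print(capitals['Bihar'])  # Patna
```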

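I tried using a regex to remove the hyphens and replace them with commas, but it didn't work. The ugly newlines at the end of the string also need to be dealt with (k.replace??).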
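One likely reason the regex attempt failed: the separator in this data is not an ASCII hyphen `-` (U+002D) but an EN DASH `–` (U+2013), so a pattern written with `-` never matches. A small sketch of splitting on the en dash with `re.split`, and of dropping the whitespace-only `'\n'` entries by filtering instead of `str.replace`, assuming the data has already been turned into a list of strings (the sample list is hypothetical):

```python
import re

items = ['Assam – Dispur', 'Goa – Panaji', '\n', '\n']

print('-' == '\u2013')  # False: a hyphen is not an en dash
print('–' == '\u2013')  # True

# Split on the en dash, swallowing any spaces around it
dash = re.compile(r'\s*\u2013\s*')

# item.strip() is '' for '\n', so whitespace-only entries are skipped
rows = [dash.split(item.strip()) for item in items if item.strip()]
print(rows)  # [['Assam', 'Dispur'], ['Goa', 'Panaji']]
```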

Cheers, and thanks for the help!

An enthusiastic beginner.


1 answer
Anonymous user
Answer 1 · posted 2024-05-19 11:30:44

Here's a start that you can modify to make it work the way you want.

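>>> import ast
>>> k = ast.literal_eval(k)  # the shelf stored the *text* of a list, so parse it first
>>> def clean_string(string):
...     # Cut off any parenthetical remark, e.g. "(Winter : Jammu)"
...     bracket = string.find('(')
...     if bracket != -1:  # str.find returns -1 when '(' is absent
...         string = string[:bracket]
...     return string.strip()
... 
>>> for line in k:
...     # '–' here is an EN DASH (U+2013), not an ASCII hyphen
...     print(list(map(clean_string, line.split('–'))))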
... 
['Andhra Pradesh', 'Hyderabad']
['below.)']
['Arunachal Pradesh', 'Itanagar']
['Assam', 'Dispur']
['Bihar', 'Patna']
['Goa', 'Panaji']
['Gujarat', 'Gandhinagar']
['Haryana', 'Chandigarh']
['Himachal Pradesh', 'Shimla']
['Jammu & Kashmir', 'Srinagar']
['Karnataka', 'Bangalooru']
['Kerala', 'Thiruvananthapuram']
['Madhya Pradesh', 'Bhopal']
['Maharashtra', 'Mumbai']
['Manipur', 'Imphal']
['Meghalaya', 'Shillong']
['Mizoram', 'Aizawl']
['Nagaland', 'Kohima']
['Orissa', 'Bhubaneswar']
['Punjab', 'Chandigarh']
['Rajasthan', 'Jaipur']
['Sikkim', 'Gangtok']
['Tamil Nadu', 'Chennai']
['Tripura', 'Agartala']
['Uttar Pradesh', 'Lucknow']
['West Bengal', 'Kolkata']
['Chhattisgarh', 'Raipur']
['Uttarakhand', 'Dehradun']
['Jharkhand', 'Ranchi']
['Telangana', 'Hyderabad']
['below)']
['Delhi', 'New Delhi *']
['Andaman & Nicobar Islands', 'Port Blair']
['Chandigarh', 'Chandigarh']
['Dadra & Nagar Haveli', 'Silvasa']
['Daman & Diu', 'Daman']
['Lakshadweep', 'Kavaratti']
['Puducherry', 'Puducherry']
['']
['']
['']
['']

It may help to check whether '–' is present in the string before splitting, so that we don't get junk values like ['below.)'] and [''].
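Following that suggestion, here is a sketch that skips any entry without an en dash and collects the rest into a dictionary, reusing the answer's `clean_string` idea (written with `!= -1`, since `str.find` returns `-1` when `(` is absent). It assumes `k` has already been converted from its string form into a list; `to_capitals` is a hypothetical helper name and the sample is an excerpt of the data above:

```python
def clean_string(text):
    """Drop any parenthetical remark and surrounding whitespace."""
    bracket = text.find('(')
    if bracket != -1:
        text = text[:bracket]
    return text.strip()


def to_capitals(lines, sep='\u2013'):
    """Build {province: capital} from 'Province – Capital' strings."""
    capitals = {}
    for line in lines:
        if sep not in line:
            continue  # skips fragments such as 'below.)' and '\n'
        # maxsplit=1 keeps any later dash inside the capital part
        province, capital = line.split(sep, 1)
        capitals[clean_string(province)] = clean_string(capital)
    return capitals


sample = [
    'Andhra Pradesh – Hyderabad (Proposed Capital: Amaravati in Guntur district. See ',
    'below.)',
    'Jammu & Kashmir – Srinagar (Winter : Jammu)',
    'Delhi (National Capital Territory of Delhi or NCT) – New Delhi *',
    '\n',
]

capitals = to_capitals(sample)
print(capitals)
# {'Andhra Pradesh': 'Hyderabad', 'Jammu & Kashmir': 'Srinagar', 'Delhi': 'New Delhi *'}
```

The stray `*` after New Delhi survives this cleaning; calling `.rstrip(' *')` on the capital would remove it if it is unwanted.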
