Creating a nested list from a string

Posted 2024-10-01 15:31:07


Here is a list of zones in Singapore and their respective subzones:

Bishan[1]
Bishan East
Marymount
Upper Thomson
Bukit Merah[2] (Not to be confused with Bukit Merah subzone.)
Alexandra Hill
Alexandra North
Bukit Ho Swee
Bukit Merah (Not to be confused with Bukit Merah planning area.)
City Terminals (Formerly called "Tanjong Pagar" subzone.)
Depot Road
Everton Park
Henderson Hill
Kampong Tiong Bahru
Maritime Square (Formerly called "HarbourFront" subzone.)
Redhill
Singapore General Hospital
Telok Blangah Drive
Telok Blangah Rise
Telok Blangah Way
Tiong Bahru
Tiong Bahru Station
Bukit Timah[3]
Anak Bukit
Coronation Road
Farrer Court
Hillcrest
Holland Road
Leedon Park
Swiss Club
Ulu Pandan
Downtown Core[4]
Anson
Bayfront
Bugis
Cecil
Central
City Hall
Clifford Pier
Marina Centre
Maxwell
Phillip
Raffles Place
Tanjong Pagar
Geylang[5]
Aljunied
Geylang East
Kallang Way
MacPherson
Kampong Ubi
Kallang[6]
Bendemeer
Boon Keng
Crawford
Geylang Bahru
Kallang Bahru
Kampong Bugis
Kampong Java
Lavender
Tanjong Rhu

Or, as a Python string:

data = 'Bishan[1]\nBishan East\nMarymount\nUpper Thomson\nBukit Merah[2] (Not to be confused with Bukit Merah subzone.)\nAlexandra Hill\nAlexandra North\nBukit Ho Swee\nBukit Merah (Not to be confused with Bukit Merah planning area.)\nCity Terminals (Formerly called "Tanjong Pagar" subzone.)\nDepot Road\nEverton Park\nHenderson Hill\nKampong Tiong Bahru\nMaritime Square (Formerly called "HarbourFront" subzone.)\nRedhill\nSingapore General Hospital\nTelok Blangah Drive\nTelok Blangah Rise\nTelok Blangah Way\nTiong Bahru\nTiong Bahru Station\nBukit Timah[3]\nAnak Bukit\nCoronation Road\nFarrer Court\nHillcrest\nHolland Road\nLeedon Park\nSwiss Club\nUlu Pandan\nDowntown Core[4]\nAnson\nBayfront\nBugis\nCecil\nCentral\nCity Hall\nClifford Pier\nMarina Centre\nMaxwell\nPhillip\nRaffles Place\nTanjong Pagar\nGeylang[5]\nAljunied\nGeylang East\nKallang Way\nMacPherson\nKampong Ubi\nKallang[6]\nBendemeer\nBoon Keng\nCrawford\nGeylang Bahru\nKallang Bahru\nKampong Bugis\nKampong Java\nLavender\nTanjong Rhu\n'

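Lines containing square brackets [] are zones, and each is followed by its subzones, separated by the newline character \n. What I want is a list of zones in which each zone holds a sublist of its subzones, like this (I will remove the square brackets, the parentheses, and their contents later):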

1. Bishan[1]

- Bishan East
- Marymount
- Upper Thomson

2. Bukit Merah[2] (Not to be confused with Bukit Merah subzone.)

- Alexandra Hill
- Alexandra North
- Bukit Ho Swee
- Bukit Merah (Not to be confused with Bukit Merah planning area.)
- City Terminals (Formerly called "Tanjong Pagar" subzone.)

So far I have only managed to extract the zones, using split() and a regex:

import re
zones_and_subzones = data.split('\n')
zones = [zone for zone in zones_and_subzones if re.match(r'(.*?)\[', zone)]

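This is where I am stuck: I am having trouble extracting the subzones of each zone. I tried using

regex = r'(\].*?\[)'

to extract the text between a closing square bracket and the next opening one, but the result is incomplete. I have been at this for a while and would really appreciate your help. If there is a better approach than the one I have now, please share it. Thanks, everyone.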


2 Answers

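Split on newlines, then go through the lines one by one and decide whether each is a "header" or "content". Use a dictionary to access the content by header: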

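s = data  # the string from the question
result = {}
for item in s.splitlines():
    if '[' in item:
        # A line with a square bracket is a zone header: start a new list
        key = item
        result[key] = []
    else:
        # Any other line is a subzone of the most recent header
        result[key].append(item)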

The result is a dictionary like {'Bishan[1]': ['Bishan East', 'Marymount', 'Upper Thomson'], ...}.
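Since the question asks for a nested list rather than a dictionary, the same header/content loop can be adapted. The following is only a sketch under assumptions: the function name `group_zones` and the shortened `sample` string are made up for illustration, and a line counts as a zone header whenever it contains '['.

```python
def group_zones(text):
    """Group subzone lines under the most recent line that contains '['."""
    nested = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if '[' in line:
            nested.append([line, []])  # start a new [zone, subzones] pair
        elif nested:
            nested[-1][1].append(line)  # attach to the current zone
    return nested


# Shortened sample in the same format as the question's `data` string
sample = 'Bishan[1]\nBishan East\nMarymount\nUpper Thomson\nKallang[6]\nBendemeer\nBoon Keng\n'
nested = group_zones(sample)
print(nested)
```

Each element of the result is a two-item list: the zone line and the list of its subzones. Lines that appear before the first header are silently skipped here, which may or may not be what you want.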

A dictionary is the better choice here, and for a quicker implementation I would use a defaultdict:

from collections import defaultdict

dicti = defaultdict(list)
for word in data.splitlines():  # splitlines() avoids a trailing empty string
    if '[' in word and ']' in word:
        name = word  # zone header
    else:
        dicti[name].append(word)  # or alternatively -> `dicti[name] += [word]`

>>> dicti  # output wrapped and abridged for readability
defaultdict(<class 'list'>,
            {'Bishan[1]': ['Bishan East', 'Marymount', 'Upper Thomson'],
             'Bukit Merah[2] (Not to be confused with Bukit Merah subzone.)': ['Alexandra Hill',
              'Alexandra North',
              'Bukit Ho Swee',
              'Bukit Merah (Not to be confused with Bukit Merah planning area.)',
              'City Terminals (Formerly called "Tanjong Pagar" subzone.)',
              'Depot Road',
              'Everton Park',
              'Henderson Hill',
              'Kampong Tiong Bahru',
              'Maritime Square (Formerly called "HarbourFront" subzone.)',
              'Redhill',
              'Singapore General Hospital',
              'Telok Blangah Drive',
              'Telok Blangah Rise',
              'Telok Blangah Way',
              'Tiong Bahru',
              'Tiong Bahru Station'],
   #...
})
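The question mentions removing the square brackets, the parentheses, and their contents afterwards. One possible way to do that with re.sub is sketched below; the helper name `clean_name` and the small `raw` dictionary are assumptions for illustration, and the patterns do not handle nested parentheses.

```python
import re


def clean_name(name):
    """Drop '[n]' reference markers and '(...)' notes from a name."""
    name = re.sub(r'\[[^\]]*\]', '', name)  # remove [1], [2], ...
    name = re.sub(r'\([^)]*\)', '', name)   # remove (Not to be confused ...)
    return name.strip()


# A small excerpt shaped like the dictionaries built in the answers
raw = {
    'Bukit Merah[2] (Not to be confused with Bukit Merah subzone.)': [
        'Alexandra Hill',
        'City Terminals (Formerly called "Tanjong Pagar" subzone.)',
    ],
}
cleaned = {clean_name(zone): [clean_name(sub) for sub in subs]
           for zone, subs in raw.items()}
print(cleaned)
```

Note that if two zone names became identical after cleaning, the later one would overwrite the earlier one in the dictionary, so it is worth checking the key count before and after.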
