Python regular expression: a general solution

Posted 2024-09-22 20:20:43


I need to split this string into a dictionary like the one below. Please note that the order of the keys in the string may vary.

String = 'Specialty: "Neurology: Neurology, NeuroScience", Profession: Nurse Practitioner, Source: TestSource'

Dict = { 'Specialty': "Neurology: Neurology, NeuroScience", 'Profession': 'Nurse Practitioner', 'Source': 'TestSource' }

A regular expression solution to this would be much appreciated.


2 answers

You can do it like this:

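def create_dict(string, splitter=',', dict_splitter=':'):
    """Naive split: only works if no value contains `splitter`."""
    _dict = {}

    for item in string.split(splitter):
        # maxsplit=1 keeps any further colons inside the value
        key, value = item.split(dict_splitter, 1)
        # strip surrounding spaces, then surrounding double quotes
        _dict[key.strip()] = value.strip().strip('"')

    return _dict

# Note: the comma inside the quoted value was removed from the input,
# because this simple split cannot handle commas inside quotes.
string = 'Specialty: "Neurology; Neurology NeuroScience", Profession: Nurse Practitioner, Source: TestSource'

_dict = create_dict(string)

for k, v in _dict.items():
    print(k, '\t', v)

# Output should look roughly like this:
#   Specialty    Neurology; Neurology NeuroScience
#   Profession   Nurse Practitioner
#   Source       TestSource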
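A quick diagnostic, using the question's original string, shows why the answer above had to alter its input: splitting on every comma also splits inside the quoted value, so one resulting piece has no key at all. This is only an illustrative sketch, not part of the original answer.

```python
original = ('Specialty: "Neurology: Neurology, NeuroScience", '
            'Profession: Nurse Practitioner, Source: TestSource')

# A plain split ignores the double quotes, so the comma inside
# "Neurology: Neurology, NeuroScience" produces an extra piece.
pieces = original.split(',')
for piece in pieces:
    print(repr(piece))

# The second piece (' NeuroScience"') contains no ':' at all,
# so splitting it into a key and a value cannot succeed.
print([':' in piece for piece in pieces])
```

Handling this case correctly requires something quote-aware, such as a parser or a regex that matches a quoted value as a unit.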

The simplest way is to use a proper parser, such as pyparsing (pip install pyparsing):

from pyparsing import (Combine, Optional, QuotedString, Suppress, Word,
                       ZeroOrMore, alphas, dictOf)

text = 'Specialty: "Neurology: Neurology, NeuroScience", Profession: Nurse Practitioner, Source: TestSource'

word = Word(alphas)
key = word + Suppress(':')
words = Combine(word + ZeroOrMore(" " + word))
value = (QuotedString('"') ^ words) + Optional(Suppress(', '))

dictionary = dictOf(key, value)

print(dictionary.parseString(text).asDict())
# => {'Specialty': 'Neurology: Neurology, NeuroScience', 'Profession': 'Nurse Practitioner', 'Source': 'TestSource'}

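We define a grammar: a word is a sequence of letters; a key is a word followed by a colon (which we discard); words is one or more words separated by single spaces; a value is either words or a double-quoted string, optionally followed by a comma (which we also discard); and dictionary is a sequence of key-value pairs. Then we let the parser do its thing.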

Edit: but I suppose if you really want a regexp solution...

import re
# group 1: key; group 2: quoted value (without quotes); group 3: unquoted value
print({m[0]: m[1] or m[2]
       for m in re.findall(r'([^,:\s]+): (?:"([^"]*)"|([^,]+))', text)})
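The same regex can be written more readably with `re.VERBOSE` and named groups, and wrapped in a reusable function. Because it scans for `key: value` pairs wherever they occur, the order of the keys does not matter. This is a sketch: the names `PAIR_RE` and `parse_pairs` are hypothetical, and it makes the same assumptions as the one-liner above (keys contain no commas, colons or whitespace; a colon is followed by exactly one space; quoted values contain no double quotes; unquoted values contain no commas).

```python
import re

# Hypothetical compiled pattern, equivalent to the one-liner above.
PAIR_RE = re.compile(r'''
    (?P<key>[^,:\s]+)          # key: no commas, colons or whitespace
    :[ ]                       # colon followed by a single space
    (?:
        "(?P<quoted>[^"]*)"    # a double-quoted value (quotes dropped)
      | (?P<plain>[^,]+)       # or everything up to the next comma
    )
''', re.VERBOSE)


def parse_pairs(s):
    """Hypothetical helper: turn 'Key: value, ...' text into a dict."""
    result = {}
    for m in PAIR_RE.finditer(s):
        # Exactly one of the two value groups takes part in each match;
        # testing for None keeps an empty quoted value "" as ''.
        quoted = m.group('quoted')
        result[m.group('key')] = quoted if quoted is not None else m.group('plain')
    return result


text = ('Specialty: "Neurology: Neurology, NeuroScience", '
        'Profession: Nurse Practitioner, Source: TestSource')
shuffled = ('Source: TestSource, '
            'Specialty: "Neurology: Neurology, NeuroScience", '
            'Profession: Nurse Practitioner')

print(parse_pairs(text))
print(parse_pairs(shuffled) == parse_pairs(text))  # True: key order does not matter
```

Using `finditer` with named groups avoids depending on tuple positions, which makes the pattern easier to extend if more value formats are needed.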
