Separating joined words in a string with Python

Posted 2024-10-03 04:35:37


"10JAN2015AirMail standard envelope from HyderabadAddress details:John Cena Palm DriveAdelaide.Also Contained:NilAction Taken:Goods referred to HGI QLD for further action.Attachments:Nil34FEB2004"

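What I want to do is read this string with Python and separate the words that have been joined together. What I really need is a regular expression that splits the joined words in the string.

I want to read the string above from a file, and the output should look like this: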

"10 JAN 2015 AirMail standard envelope from Hyderabad Address details : John Cena Palm Drive Adelaide. Also Contained: Nil Action Taken: Goods referred to HGI QLD for further action. Attachments : Nil 34 FEB 2004."

(with the joined words separated)

I need to write a regular expression that splits:

'10Jan2015AirMail', 'HyderabadAddress', 'details:John', 'DriveAdelaide'

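I need a regular expression that recognizes joined words like the ones above and separates them with spaces within the same string, like this:

'10 Jan 2015 AirMail', 'Hyderabad Address', 'details : John'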

text = open('C:\sample.txt', 'r').read().replace("\n","").replace("\t","").replace("-","").replace("/"," ")

newtext = re.sub('[a-zA-Z0-9_:]','',text) #This regex does not work.Please assist

print text
print newtext

The code above does not work.
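As a hedged illustration of why the attempt fails: the character class `[a-zA-Z0-9_:]` matches every letter, digit, underscore, and colon, and `re.sub` replaces each match with an empty string, so the text is deleted rather than split. The shortened `sample` string below is an assumption made for the demonstration, not the asker's file contents, and the second substitution is just one possible way to insert a space at a single kind of boundary (lowercase followed by uppercase).

```python
import re

# Shortened stand-in for the asker's data (an assumption for illustration).
sample = "10JAN2015AirMail details:John"

# Every letter, digit, "_" and ":" is replaced with "", so only the
# single space between the two chunks survives.
stripped = re.sub('[a-zA-Z0-9_:]', '', sample)
print(repr(stripped))  # ' '

# Capturing both sides of a boundary and re-emitting them with a space
# inserts a separator instead of deleting characters.
spaced = re.sub(r'([a-z])([A-Z])', r'\1 \2', sample)
print(spaced)  # 10JAN2015Air Mail details:John
```

The second call handles only the lowercase-to-uppercase case; digit and punctuation boundaries would need additional rules.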


1 answer
Anonymous user
#1 · Posted 2024-10-03 04:35:37

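I know this could be done much more simply by classifying the characters into sets (uppercase, lowercase, digit), but I preferred to write a more verbose solution: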

test_text = "10JAN2015AirMail standard envelope from HyderabadAddress details:John Cena Palm DriveAdelaide.Also Contained:NilAction Taken:Goods referred to HGI QLD for further action.Attachments:Nil34FEB2004"
splitted_text = test_text.split(' ')
num = False
low = False
upp = False
result = []

for word in splitted_text:
  new_word = ''
  if not word.isupper() and not word.islower():
    if word[0].isnumeric():
        num = True
        low = False
        upp = False
    elif word[0].islower():
        num = False
        low = True
        upp = False
    elif word[0].isupper():
        num = False
        low = False
        upp = True
    for letter in word:
      if letter.isnumeric():
        if num:
            new_word += letter
        else:
            new_word += ' ' + letter
        low = False
        upp = False
        num = True
      elif letter.islower():
        if low or upp:
            new_word += letter
        else:
            new_word += ' ' + letter
        low = True
        upp = False
        num = False
      elif letter.isupper():
        if low or num:
            new_word += ' ' + letter
        else:
            new_word += letter
        low = False
        upp = True
        num = False
      else:
        new_word += ' ' + letter
    result.append(new_word)
  else:
    result.append(word)
print(' '.join(result))
# Output:
# 10 JAN 2015 Air Mail standard envelope from Hyderabad Address details : John Cena Palm Drive Adelaide . Also Contained : Nil Action Taken : Goods referred to HGI QLD for further action . Attachments : Nil 34 FEB 2004
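The simpler route mentioned at the top of this answer, classifying characters as uppercase, lowercase, or digit, can be sketched with a single regular expression. This is only a sketch under stated assumptions: the names `TOKEN` and `split_joined_words` are made up for illustration, the input is assumed to contain only ASCII letters and digits, and every other non-space character is treated as a separate token. On the sample string it should give the same result as the loop above.

```python
import re

# Hypothetical pattern (not from the original post). A token is one of:
#   - one or more uppercase letters followed by optional lowercase letters
#   - a run of lowercase letters
#   - a run of digits
#   - any single character that is not whitespace, a letter, or a digit
# Assumes ASCII letters and digits only.
TOKEN = re.compile(r"[A-Z]+[a-z]*|[a-z]+|[0-9]+|[^\sA-Za-z0-9]")


def split_joined_words(text):
    """Return text with its tokens separated by single spaces."""
    return " ".join(TOKEN.findall(text))


test_text = (
    "10JAN2015AirMail standard envelope from HyderabadAddress "
    "details:John Cena Palm DriveAdelaide.Also Contained:NilAction "
    "Taken:Goods referred to HGI QLD for further action."
    "Attachments:Nil34FEB2004"
)
print(split_joined_words(test_text))
```

Like the loop, this splits `AirMail` into `Air Mail`, whereas the question's desired output keeps it as one word. Character classes alone cannot tell those cases apart, so keeping such words intact would need an explicit list of exceptions.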

Sometimes one just needs to be pointed in the right direction.
