Splitting text by character groups in Python

Posted 2024-09-29 07:35:21


I'm trying to parse some text into pieces at every occurrence of a character group. In my case the character groups are "* ((" and "))".

import re
file = "Name* ((Bla Bla Bla (Bla Bla) A40 & A41)) Name2* ((Bla Bla Bla (Bla Bla) A42 & A43)) Name3* ((Bla Bla Bla (Bla Bla) A44 & A45)) Name4* ((Bla Bla Bla (Bla Bla) A46 & A47)) Name5* ((Bla Bla Bla (Bla Bla) A48 & A49)) Name6* ((Bla Bla Bla (Bla Bla) A50 & A51)) Name7* ((Bla Bla Bla (Bla Bla) A452 & A53)) Name8* ((Bla Bla Bla (Bla Bla) A54 & A55)) Name9* ((Bla Bla Bla (Bla Bla) A56 & A57)) Name10* ((Bla Bla Bla (Bla Bla) A58 & A59)) Name11* ((Bla Bla Bla (Bla Bla) A60 & A61)) Name12* ((Bla Bla Bla (Bla Bla) A62 & A63)) Name13* ((Bla Bla Bla (Bla Bla) A64 & A65)) Name14* ((Bla Bla Bla (Bla Bla) A66 & A67)) Name14* ((Bla Bla Bla (Bla Bla) A68 & A69))"
parse = re.split('[* ((][)) ]', file)
print(parse)

My result is:

['Name', '((Bla Bla Bla (Bla Bla) A40 & A41)) Name2', '((Bla Bla Bla (Bla Bla) A42 & A43)) Name3', '((Bla Bla Bla (Bla Bla) A44 & A45)) Name4', '((Bla Bla Bla (Bla Bla) A46 & A47)) Name5', '((Bla Bla Bla (Bla Bla) A48 & A49)) Name6', '((Bla Bla Bla (Bla Bla) A50 & A51)) Name7', '((Bla Bla Bla (Bla Bla) A452 & A53)) Name8', '((Bla Bla Bla (Bla Bla) A54 & A55)) Name9', '((Bla Bla Bla (Bla Bla) A56 & A57)) Name10', '((Bla Bla Bla (Bla Bla) A58 & A59)) Name11', '((Bla Bla Bla (Bla Bla) A60 & A61)) Name12', '((Bla Bla Bla (Bla Bla) A62 & A63)) Name13', '((Bla Bla Bla (Bla Bla) A64 & A65)) Name14', '((Bla Bla Bla (Bla Bla) A66 & A67)) Name14', '((Bla Bla Bla (Bla Bla) A68 & A69))']

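It seems to split the text only at the "*". I can't figure out how to set up multiple multi-character delimiters. Does anyone have any suggestions? Thanks.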


3 answers

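I wanted to share the solution I ended up using, in case it helps anyone else. There is a mix of regular expressions in there, but I used findall instead of split. Now that I've gotten this far, I need to work further on controlling the output. The data is dumped into 3 fields (From_Node, To_Node, Link). I need the value of the first To_Node to become the From_Node value of the next row, and so on. Picture points along a line: point A to B, then point B to C, then point C to D, etc. With my limited knowledge, I don't even know where to start looking for a solution. Any ideas?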

import re
import arcpy

# Local variables (raw strings so the backslashes are taken literally)
Table1 = r"D:\Database1.mdb\Table1"
RAW_Data = r"D:\Database1.mdb\RAW_Data"

# Create cursors and insert rows
insertcursor = arcpy.da.InsertCursor(Table1, ["From_Node", "To_Node", "Link"])
with arcpy.da.SearchCursor(RAW_Data, ["Field1", "Field1", "Field1"]) as searchcursor:
    try:
        for row in searchcursor:
            # Word characters immediately before a "*" are the node names
            listFrom_Node = re.findall(r'\w+(?=\*\s*)', row[0])  # From Node
            print(listFrom_Node)
            print("From Node List Success")
            listTo_Node = re.findall(r'\w+(?=\*\s*)', row[1])  # To Node
            print(listTo_Node)
            print("To Node List Success")
            # Everything between "((" and the next "))" is a link description
            listLink = re.findall(r'\(\((.*?)\)\)', row[2])  # Link descriptions
            print(listLink)
            print("Link List Success")
            for n, Value in enumerate(listFrom_Node):
                insertcursor.insertRow((listFrom_Node[n], listTo_Node[n], listLink[n]))
    except Exception as exc:
        # Report the real error instead of assuming the cursor was empty
        print('Insert failed: {}'.format(exc))

# Release the insert cursor so the table lock is dropped
del insertcursor
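
One possible way to get the chaining described above (each row's To_Node becoming the next row's From_Node) is to pair every node with its successor using `zip`. This is only a sketch: the `record` string is made-up sample data, and attaching each link description to the segment that starts at its node is an assumption, not something the answer states.

```python
import re

# Hypothetical sample record in the same shape as the question's text.
record = "A* ((link one)) B* ((link two)) C* ((link three)) D* ((link four))"

# Node names are the word characters directly before each "*".
nodes = re.findall(r'\w+(?=\*)', record)

# Link descriptions are whatever sits between "((" and the next "))".
links = re.findall(r'\(\((.*?)\)\)', record)

# Pair each node with the one after it: (A, B), (B, C), (C, D).
# ASSUMPTION: the link listed with a node describes the segment that
# starts at that node, so the last link has no segment and is dropped.
rows = list(zip(nodes, nodes[1:], links))

for from_node, to_node, link in rows:
    print(from_node, to_node, link)
```

Each tuple in `rows` has the same shape as the argument passed to `insertcursor.insertRow(...)` in the code above, so the inner loop could iterate over `rows` instead of indexing three parallel lists.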

Can you use the string's split function? Add a list comprehension and you're done.

In[31]: [i for s in [s.split(')) ') for s in file.split('* ((')] for i in s]
Out[31]: 
['Name',
 'Bla Bla Bla (Bla Bla) A40 & A41',
 'Name2',
 'Bla Bla Bla (Bla Bla) A42 & A43',
 'Name3',
 'Bla Bla Bla (Bla Bla) A44 & A45',
 'Name4',
 'Bla Bla Bla (Bla Bla) A46 & A47',
 'Name5',
 'Bla Bla Bla (Bla Bla) A48 & A49',
 'Name6',
 'Bla Bla Bla (Bla Bla) A50 & A51',
 'Name7',
 'Bla Bla Bla (Bla Bla) A452 & A53',
 'Name8',
 'Bla Bla Bla (Bla Bla) A54 & A55',
 'Name9',
 'Bla Bla Bla (Bla Bla) A56 & A57',
 'Name10',
 'Bla Bla Bla (Bla Bla) A58 & A59',
 'Name11',
 'Bla Bla Bla (Bla Bla) A60 & A61',
 'Name12',
 'Bla Bla Bla (Bla Bla) A62 & A63',
 'Name13',
 'Bla Bla Bla (Bla Bla) A64 & A65',
 'Name14',
 'Bla Bla Bla (Bla Bla) A66 & A67',
 'Name14',
 'Bla Bla Bla (Bla Bla) A68 & A69))']
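
Note that the last element in the output above still ends in "))", because the final record has no space after its closing parentheses. A minimal sketch of one way to tidy that up and group the flat list into (name, description) pairs follows; the shortened `text` value and the `pairs` name are illustrative only.

```python
# Shortened sample in the same format as the question's string.
text = ("Name* ((Bla Bla Bla (Bla Bla) A40 & A41)) "
        "Name2* ((Bla Bla Bla (Bla Bla) A42 & A43))")

# Same two-step split as above: first on '* ((', then on ')) '.
parts = [i for s in text.split('* ((') for i in s.split(')) ')]

# The final record has no trailing space after '))', so trim it explicitly.
if parts and parts[-1].endswith('))'):
    parts[-1] = parts[-1][:-2]

# Even positions hold names, odd positions hold descriptions.
pairs = list(zip(parts[0::2], parts[1::2]))
print(pairs)
```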

I tried the following regular expression:

import re
file = "your....string.... content" #your string goes here.

parse = re.split(r"\*|\)\)|\(\(", file)

OUTPUT:

['Name', ' ', 'Bla Bla Bla (Bla Bla) A40 & A41', ' Name2', ' ', 'Bla Bla Bla (Bla Bla) A42 & A43', ' Name3', ' ', 'Bla Bla Bla (Bla Bla) A44 & A45', ' Name4', ' ', 'Bla Bla Bla (Bla Bla) A46 & A47', ' Name5', ' ', 'Bla Bla Bla (Bla Bla) A48 & A49', ' Name6', ' ', 'Bla Bla Bla (Bla Bla) A50 & A51', ' Name7', ' ', 'Bla Bla Bla (Bla Bla) A452 & A53', ' Name8', ' ', 'Bla Bla Bla (Bla Bla) A54 & A55', ' Name9', ' ', 'Bla Bla Bla (Bla Bla) A56 & A57', ' Name10', ' ', 'Bla Bla Bla (Bla Bla) A58 & A59', ' Name11', ' ', 'Bla Bla Bla (Bla Bla) A60 & A61', ' Name12', ' ', 'Bla Bla Bla (Bla Bla) A62 & A63', ' Name13', ' ', 'Bla Bla Bla (Bla Bla) A64 & A65', ' Name14', ' ', 'Bla Bla Bla (Bla Bla) A66 & A67', ' Name14', ' ', 'Bla Bla Bla (Bla Bla) A68 & A69', '']
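
The stray ' ' entries, the leading spaces, and the trailing '' in that output come from splitting on "*", "((" and "))" separately. A possible refinement, offered as a sketch rather than the definitive pattern, treats "* ((" as one delimiter and lets the pattern absorb the surrounding whitespace. It also shows why the question's `[* ((][)) ]` did not work: a bracketed character class matches a single character, not a sequence.

```python
import re

# Shortened sample in the same format as the question's string.
text = ("Name* ((Bla Bla Bla (Bla Bla) A40 & A41)) "
        "Name2* ((Bla Bla Bla (Bla Bla) A42 & A43))")

# A character class such as [* ((] matches ONE character from the set,
# so multi-character delimiters are written as alternatives instead:
#   \*\s*\(\(   an asterisk, optional whitespace, then "(("
#   \)\)\s*     "))" followed by optional whitespace
delimiter = re.compile(r"\*\s*\(\(|\)\)\s*")

# Drop the empty string left behind by the delimiter at the very end.
parts = [p for p in delimiter.split(text) if p]
print(parts)
```

The single parentheses inside each description, as in "(Bla Bla)", are left alone because both alternatives require two consecutive parentheses.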
