How to extract strings using regular expressions in Python

for line in verses:
    for item in topten:
        count = line.count(item)
        ARFF_FILE.write(str(count) + ",")
    # Here is where I could use regular expressions to extract the desired
    # substrings (verse and chapter), then write these to the end of a line
    # in the ARFF file.
    ARFF_FILE.write("\n")

for line in verses:
    # split once per line; parts[0] is the chapter, parts[1] is the verse
    parts = line.split('|')
    for item in topten:
        count = line.count(item)
        ARFF_FILE.write(str(count) + ",")
    ARFF_FILE.write(parts[0] + ",")
    ARFF_FILE.write(parts[1])
    ARFF_FILE.write("\n")

3 Answers

User

Answer 1 · edited 2024-05-17 05:04:10

With parentheses (capture groups)? Isn't that how all regular expressions work?
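As a rough illustration of what this answer hints at: parentheses in a pattern create capture groups, and the text each group matched can be read back from the match object. The sketch below assumes every line has the `chapter|verse|text` layout used by the sample data in this thread; `LINE_PATTERN` and `parse_line` are hypothetical names, not part of the question's code.

```python
import re

# Assumed line layout: "<chapter>|<verse>|<text>"
LINE_PATTERN = re.compile(r'(\d+)\|(\d+)\|(.*)')

def parse_line(line):
    """Return (chapter, verse, text) as strings, or None if the line does not match."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    return match.groups()

sample = "2|12|Of a surety, they are the ones who make mischief, but they realise (it) not."
chapter, verse, text = parse_line(sample)
print(chapter, verse)
```

In the question's loop, the first two values could then be written to the ARFF file after the counts.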

User

Answer 2 · edited 2024-05-17 05:04:10

If all of your lines have the format A|B|C, you don't need a regular expression at all; just split them.

for line in fp:
    parts = line.split('|') # or line.split('|', 2) if the last part can contain |
    # use parts[0], parts[1]
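To make the `maxsplit` remark concrete, here is a minimal sketch of the split approach with a limit of 2, so that a `|` inside the verse text stays in the last part. The helper name `split_verse_line` and the sample line are made up for illustration.

```python
def split_verse_line(line):
    """Split 'chapter|verse|text' into three parts, keeping any '|' inside the text."""
    chapter, verse, text = line.rstrip("\n").split("|", 2)
    return chapter, verse, text

# Hypothetical sample line whose text itself contains a pipe character.
print(split_verse_line("2|12|left | right\n"))  # ('2', '12', 'left | right')
```

Note that the tuple unpacking raises `ValueError` on a line with fewer than two pipes, which may or may not be what you want for malformed input.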

User

Answer 3 · edited 2024-05-17 05:04:10

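I think the simplest approach is to use re.split() to get the verse text and re.findall() to get the chapter and verse numbers. The results are stored in lists that can be used later. Here is a code example: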

#!/usr/bin/env python

import re

# string to be parsed
Quran= '''2|12|Of a surety, they are the ones who make mischief, but they realise (it) not.
2|242|Thus doth Allah Make clear His Signs to you: In order that ye may understand.'''

# list containing the text of all the verses
verses=re.split(r'[0-9]+\|[0-9]+\|',Quran)
verses.remove("")

# list containing the chapter and verse number:
#
#   if you look closely, the regex could have been r'[0-9]+\|[0-9]+\|';
#   I omitted the last pipe character so that later, when you split
#   the string to get the chapter and verse numbers, you won't have an
#   empty string at the end of the list
#
chapter_verse=re.findall(r'[0-9]+\|[0-9]+',Quran)


# loop over the verse texts, assuming len(verses) == len(chapter_verse)
for index in range(len(verses)):
    chapterNumber, verseNumber = chapter_verse[index].split("|")
    print("Chapter :", chapterNumber, "\tVerse :", verseNumber)
    print(verses[index])
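A possible single-pass variant of the same idea: with three capture groups and the `re.MULTILINE` flag, `re.findall()` returns one `(chapter, verse, text)` tuple per line, so two separate lists never have to be kept in sync. This is only a sketch, assuming every verse occupies exactly one line; `VERSE_PATTERN` and `parse_verses` are illustrative names, and the sample text is a shortened stand-in.

```python
import re

# ^ and $ match at every line boundary because of re.MULTILINE.
VERSE_PATTERN = re.compile(r'^(\d+)\|(\d+)\|(.*)$', re.MULTILINE)

def parse_verses(text):
    """Return a list of (chapter, verse, verse_text) with the numbers as ints."""
    return [(int(c), int(v), body) for c, v, body in VERSE_PATTERN.findall(text)]

# Shortened stand-in for the real data (assumption: one verse per line).
sample = "2|12|first verse text\n2|242|second verse text"
for chapter, verse, body in parse_verses(sample):
    print("Chapter :", chapter, "\tVerse :", verse)
    print(body)
```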
