Can you spot the problem with this regular expression?

Posted 2024-10-02 10:34:42


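I run .txt files through a for loop that is supposed to cut out keywords and then .append them to a list. For some reason, my regular expression returns very strange results.

My first statement goes through the full file names and cuts out the keyword, and it works well: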

import os
import re

# Build a work list of the file names in the target directory for later iteration
stack = os.listdir(
  "/Users/me/Documents/software_development/my_python_code/random/countries"
)

# Declare the list to be filled and the regular expression used
# in the primary loop
names = []
name_pattern = r"-\s(.*)\.txt"

# PRIMARY LOOP
for entry in stack:
  if entry == ".DS_Store":
    continue

  # Extract the country name from the file name into the `names` list
  name_match = re.search(name_pattern, entry)
  name = name_match.group(1)
  names.append(name)

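This works fine and creates the list I expect.

However, once I move on to a similar process that handles the actual contents of the files, it stops working: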

religions = []
reli_pattern = r"religion\s=\s(.+)."

# PRIMARY LOOP
for entry in stack:
  if entry == ".DS_Store":
    continue
  # Open the file and read it into the `contents` variable
  file_path = (
    "/Users/me/Documents/software_development/my_python_code/random/countries" + "/" + entry
  )
  selection = open(file_path, "rb")
  contents = str(selection.read())

  # Extract the religion type and place it into the `religions` list
  reli_match = re.search(reli_pattern, contents)
  religion = reli_match.group(1)
  religions.append(religion)

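The result should be: "therevada", "catholic", "sunni", and so on. Instead, I get what looks like random text from the documents that has nothing to do with my regex, such as ruler names and stat values that don't contain the word "religion".

To troubleshoot, I isolated some of the code like this: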

import re

contents = "religion = catholic"
reli_pattern = r"religion\s=\s(.*)\s"

reli_match = re.search(reli_pattern, contents)

print(reli_match)

and None was printed to the console, so I assume the problem lies in my regex. What silly mistake am I making that causes all this?


1 answer
User
#1 · Posted 2024-10-02 10:34:42

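The regular expression (religion\s=\s(.*)\s) requires a whitespace character at the end (the final \s). Your string doesn't have one, so the search finds nothing and re.search returns None.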
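
To make the effect of the final \s concrete, here is a minimal sketch that runs the pattern from the question against the test string with and without a trailing space. The variable names `no_trailing` and `with_trailing` are made up for illustration.

```python
import re

# Pattern from the question: the final \s demands one more whitespace character
reli_pattern = r"religion\s=\s(.*)\s"

# No whitespace after "catholic", so the final \s has nothing to match
no_trailing = re.search(reli_pattern, "religion = catholic")

# With a trailing space, (.*) backtracks by one character and \s takes the space
with_trailing = re.search(reli_pattern, "religion = catholic ")

print(no_trailing)             # None
print(with_trailing.group(1))  # catholic
```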

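You should do one of the following:

  1. Change the regular expression to r"religion\s=\s(.*)"
  2. Change the string being searched so that it has a trailing space (i.e. 'religion = catholic' becomes 'religion = catholic ')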
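
A minimal sketch of option 1 follows. It also includes one observation that goes beyond the answer above and may explain the "random text" from the real files: the loop in the question opens each file in "rb" mode and wraps the result in str(), which produces the `b'...'` representation of the bytes. In that representation a line break is a literal backslash followed by `n`, not a real newline, so `.` never stops at the end of the line. The helper name `extract_religion` and the sample file content are hypothetical, chosen only for illustration.

```python
import re

# Option 1 from the answer: drop the trailing \s
reli_pattern = r"religion\s=\s(.*)"


def extract_religion(text):
    """Return the captured value, or None when the pattern does not match."""
    match = re.search(reli_pattern, text)
    return match.group(1) if match else None


print(extract_religion("religion = catholic"))  # catholic

# Hypothetical file content, for illustration only
sample_bytes = b"ruler = someone\nreligion = catholic\nstat = 3\n"

# Decoded text contains real newlines, so "." stops at the end of the line
as_text = sample_bytes.decode("utf-8")
print(extract_religion(as_text))  # catholic

# str() of a bytes object gives its repr: no real newlines, so the
# greedy capture runs on past the value into the rest of the content
as_repr = str(sample_bytes)
print(extract_religion(as_repr))
```

If this matches what is happening with the real files, opening them in text mode (for example `open(file_path, encoding="utf-8")`, assuming the files are UTF-8) or decoding the bytes before searching would keep each capture on its own line.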
