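I'm running .txt files through a for loop that should slice out keywords and .append them to lists. For some reason, my regex is returning really odd results.
My first statement iterates over the full file names and slices out the keyword, and it works well: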
import os
import re

# Creates a workflow list of file names within target directory for further iteration
stack = os.listdir(
    "/Users/me/Documents/software_development/my_python_code/random/countries"
)
# declares list, to be filled, and their associated regular expression, to be used,
# in the primary loop
names = []
name_pattern = r"-\s(.*)\.txt"
# PRIMARY LOOP
for entry in stack:
    if entry == ".DS_Store":
        continue
    # extraction of country name from file name into `names` list
    name_match = re.search(name_pattern, entry)
    name = name_match.group(1)
    names.append(name)
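This works fine and creates the list I expect.
However, once I move on to a similar process that works on the actual contents of the files, it no longer works: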
religions = []
reli_pattern = r"religion\s=\s(.+)."
# PRIMARY LOOP
for entry in stack:
    if entry == ".DS_Store":
        continue
    # opens and reads file within `contents` variable
    file_path = (
        "/Users/me/Documents/software_development/my_python_code/random/countries" + "/" + entry
    )
    selection = open(file_path, "rb")
    contents = str(selection.read())
    # extraction of religion type and placement into `religions` list
    reli_match = re.search(reli_pattern, contents)
    religion = reli_match.group(1)
    religions.append(religion)
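The result should be: "therevada", "catholic", "sunni", etc.
Instead, I get seemingly random text from the document that has nothing to do with my regex, such as ruler names and stat values that don't contain the word "religion".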
To try to figure this out, I isolated some of the code like this:
contents = "religion = catholic"
reli_pattern = r"religion\s=\s(.*)\s"
reli_match = re.search(reli_pattern, contents)
print(reli_match)
and None is printed to the console, so I assume the problem is in my regex. What silly mistake am I making that causes all of this?
The regex (religion\s=\s(.*)\s) requires a whitespace character after the value (the final \s). Because your string doesn't have one, the search finds nothing, so re.search returns None.
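A quick check of this, using the pattern and string from the question. The second string, with a trailing space, is added here only for comparison:

```python
import re

reli_pattern = r"religion\s=\s(.*)\s"

# No whitespace after "catholic", so the final \s has nothing to match.
no_trailing = re.search(reli_pattern, "religion = catholic")

# With a trailing space, the final \s can match and the search succeeds.
with_trailing = re.search(reli_pattern, "religion = catholic ")

print(no_trailing)             # None
print(with_trailing.group(1))  # catholic
```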
You should either change the regex to:
r"religion\s=\s(.*)"
or change the string from 'religion = catholic' to 'religion = catholic ' (with a trailing space).
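Here is a minimal sketch of the first option applied to a whole file, assuming Python 3 and plain-text files. The sample contents, the temporary file, and the helper name find_religion are made up for illustration. It also reads the file in text mode, which goes beyond the fix above and may not apply to your files. The loop in the question opens files with "rb" and wraps the bytes in str(), which gives the printable form b'...' with line breaks turned into literal backslash-n characters. In that form "." no longer stops at the end of the line, which could explain the unrelated text being captured.

```python
import os
import re
import tempfile

# (.*) stops at the end of the line because "." does not match a newline.
reli_pattern = r"religion\s=\s(.*)"


def find_religion(path):
    """Return the text after 'religion = ' in the file, or None if absent."""
    # Text mode keeps real newlines, unlike str() applied to bytes.
    with open(path, "r", encoding="utf-8") as handle:
        contents = handle.read()
    match = re.search(reli_pattern, contents)
    return match.group(1).strip() if match else None


# Hypothetical file contents, written to a temporary file for the demo.
sample = "ruler = someone\nreligion = catholic\ngold = 100\n"
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)

result = find_religion(tmp.name)
os.remove(tmp.name)
print(result)  # catholic

# For comparison: str() on bytes gives the repr, so the capture overruns the line.
as_repr = str(sample.encode("utf-8"))
overrun = re.search(reli_pattern, as_repr).group(1)
print(overrun != "catholic")  # True
```

Returning None when nothing matches, instead of calling .group(1) directly, also keeps the loop from crashing on a file that has no religion line.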