Extract a year preceded by a month name using a Python regular expression

  File "<ipython-input-216-a995358d0957>", line 1, in <module>
    runfile('C:/Users/Muntabir/nltk_data/corpora/cookbook/clean_data/text-classification_year(clean).py', wdir='C:/Users/Muntabir/nltk_data/corpora/cookbook/clean_data')
  File "C:\Users\Muntabir\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)
  File "C:\Users\Muntabir\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/Muntabir/nltk_data/corpora/cookbook/clean_data/text-classification_year(clean).py", line 76, in <module>
    year_data = re.findall('^(?<month>)\w+(\1)\s[0-9]{4}$|(^(?<fmonth>)\w+,\s[0-9]{4}$)', tokenized_string)
  File "C:\Users\Muntabir\Anaconda3\lib\re.py", line 222, in findall
    return _compile(pattern, flags).findall(string)
  File "C:\Users\Muntabir\Anaconda3\lib\re.py", line 301, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Users\Muntabir\Anaconda3\lib\sre_compile.py", line 562, in compile
    p = sre_parse.parse(p, flags)
  File "C:\Users\Muntabir\Anaconda3\lib\sre_parse.py", line 855, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "C:\Users\Muntabir\Anaconda3\lib\sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "C:\Users\Muntabir\Anaconda3\lib\sre_parse.py", line 691, in _parse
    len(char) + 2)
error: unknown extension ?<m

3 Answers

User

#1 · Edited 2024-10-01 00:31:23

In Python, a named capture group is written (?P<name>...), not ~~(?<name>...)~~

Usage: ^(?P<month>\w+),?\s[0-9]{4}$

Demo & explanation
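A minimal sketch of the corrected syntax in use, assuming each token is a complete string such as 'September 1980'. The sample strings, the helper name extract_year, and the second named group year are illustrative additions, not part of the answer above. Note that \w+ accepts any word, not only month names, so this version would also accept something like 'Page 1234'.

```python
import re

# Python spells named groups (?P<name>...); writing (?<name>...) raises
# re.error ("unknown extension"), as in the traceback from the question.
pattern = re.compile(r"^(?P<month>\w+),?\s(?P<year>[0-9]{4})$")

def extract_year(token):
    """Return the 4-digit year from 'Month YYYY' or 'Month, YYYY', else None."""
    m = pattern.match(token)
    return m.group("year") if m else None

samples = ["September 1980", "October, 1978", "no date here"]
print([extract_year(s) for s in samples])  # ['1980', '1978', None]
```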

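User

#2 · Edited 2024-10-01 00:31:23

I really appreciate your contribution. However, @Joan Lara Ganau's solution gave me a guideline for what the regexp could look like. @Joan, your regexp will match any year that is preceded by a month and a day. It also does not look for the comma and the whitespace. As I mentioned, I have thousands of datasets from which I want to extract the year that follows a month name. I am looking for the following formats:

a.) Month Year b.) Month, Year

Anyway, after a lot of experimenting I found a solution to my problem. The solution is: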

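import re

# Month name (full or abbreviated), an optional comma, then whitespace and a 4-digit year.
year_result = re.compile(
    r"(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|"
    r"Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|"
    r"Dec(ember)?)(,?)(\s\d{4})")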

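In addition, the match() method returns None when the pattern does not match. In that case, calling group() raises an AttributeError. The error reads something like: 'NoneType' object has no attribute 'group'. So I fixed it in the following way: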

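def matched(document):
    # search() scans the whole string; match() would only test the start.
    year = year_result.search(document)
    if year is None:
        return '0'
    # Group 14 is the (\s\d{4}) group, so the result includes the leading whitespace.
    return year.group(14)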

Now you can pass the text document you want to extract the year from to the function above.
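Counting parentheses to arrive at group 14 is easy to get wrong, so here is a hedged variant of the same idea, not the code used above. It makes the month sub-groups non-capturing and gives the year a name. The names MONTHS, year_pattern and find_year, the \b word boundaries, and the sample sentences are assumptions made for illustration.

```python
import re

# Month alternatives with non-capturing (?:...) groups, so they do not
# shift the numbering of the group that holds the year.
MONTHS = (r"Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|"
          r"Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|"
          r"Dec(?:ember)?")

# Month, optional comma, one whitespace character, then a 4-digit year.
year_pattern = re.compile(r"\b(?:" + MONTHS + r"),?\s(?P<year>\d{4})\b")

def find_year(document, default="0"):
    """Return the first year that follows a month name, or `default`."""
    m = year_pattern.search(document)
    return m.group("year") if m else default

print(find_year("Published in March, 2004 by the society"))  # 2004
print(find_year("No month here, just 2004"))                 # 0
```

Because the whitespace sits outside the named group, the returned year has no leading space.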

Thanks

User

#3 · Edited 2024-10-01 00:31:23

import re

year = re.compile(r'(\b\d{1,2}\D{0,3})?\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(Nov|Dec)(?:ember)?)\D?\D?(\d{1,4})')
print(year.match('September 1980').group(3))
print(year.match('October, 1978').group(3))

Output:

1980
1978
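The two calls above work because each string starts with the date. A small sketch of how the same pattern might be applied to running text, where match() returns None and calling .group() on it would raise AttributeError; the sample sentence is made up for illustration.

```python
import re

# Same pattern as above: group 1 is an optional leading day, group 2 is
# the (Nov|Dec) alternative, group 3 is the year.
year = re.compile(r'(\b\d{1,2}\D{0,3})?\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(Nov|Dec)(?:ember)?)\D?\D?(\d{1,4})')

text = "Issued 12 March 1999; revised October, 2003."

# match() only tries position 0, so it finds nothing in this sentence.
print(year.match(text))  # None

# finditer() scans the whole string and yields every non-overlapping match.
years = [m.group(3) for m in year.finditer(text)]
print(years)  # ['1999', '2003']
```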
