Python: extract yyyyMMddhhmmss from a file using a regular expression

Posted 2024-10-02 10:32:52


I'm trying to use a regex to get a date (in the format yyyyMMddhhmmss) from a string, but I can't find the right pattern.

I'm trying the following code:

import re
string = "date file /20190529050003/folder "
regex = re.compile(r'\b\d{4}\d{2}\d{2}\s\d{2}\d{2}\d{2}\b')
result = regex.findall(string)[0],
print(result)

But I get the following error:

result = regex.findall(string)[0],
IndexError: list index out of range

How can I use a regex to return "20190529050003" from the string in my script?

Thanks!


3 answers

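Remove the \s from the expression. The timestamp has no whitespace between the date and time parts, so \s can never match, findall() returns an empty list, and indexing it with [0] raises IndexError.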

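import re

string = "date file /20190529050003/folder "
# No \s between the date and time parts: the timestamp is 14 contiguous digits
regex = re.compile(r'\b\d{4}\d{2}\d{2}\d{2}\d{2}\d{2}\b')
result = regex.findall(string)[0]
print(result)  # 20190529050003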
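Indexing with [0] still raises IndexError whenever a string contains no timestamp. A minimal sketch that uses re.search() instead and returns None in that case; the helper name extract_timestamp is hypothetical, and \d{14} is just a shorter spelling of the six \d{...} pieces above:

```python
import re
from typing import Optional

# 14 contiguous digits bounded by non-word characters (yyyyMMddhhmmss)
TIMESTAMP_RE = re.compile(r'\b\d{14}\b')

def extract_timestamp(text: str) -> Optional[str]:
    """Return the first 14-digit run in text, or None if there is none."""
    match = TIMESTAMP_RE.search(text)
    return match.group(0) if match else None

print(extract_timestamp("date file /20190529050003/folder "))  # 20190529050003
print(extract_timestamp("no timestamp here"))                  # None
```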

If the date always comes right after a slash, we can simply use the following expression:

.+\/(\d{4})(\d{2})(\d{2}).+

If needed, we can also capture the time fields and add more boundaries, for example:

.+\/(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2}).+


Or:

^.+\/(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})\/.+$

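Because the longer patterns put each field in its own group, the groups can be turned straight into a datetime. A minimal sketch, assuming the timestamp always sits between two slashes as in the question's sample string; the name parse_path_timestamp is hypothetical:

```python
import re
from datetime import datetime
from typing import Optional

# Anchored pattern: six groups for year, month, day, hour, minute, second
PATTERN = re.compile(r'^.+\/(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})\/.+$')

def parse_path_timestamp(text: str) -> Optional[datetime]:
    match = PATTERN.match(text)
    if match is None:
        return None
    # Convert each captured string to int and pass them positionally to datetime()
    return datetime(*(int(g) for g in match.groups()))

print(parse_path_timestamp("date file /20190529050003/folder "))  # 2019-05-29 05:00:03
```

Note that datetime() raises ValueError for out-of-range fields such as month 13, which the regex alone would accept.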

Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r".+\/(\d{4})(\d{2})(\d{2}).+"

test_str = "date file /20190529050003/folder "

subst = "\\1-\\2-\\3"  # rebuild the match as YYYY-MM-DD

# count=0 replaces every match; pass a positive count to limit the number of replacements
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)

if result:
    print(result)  # 2019-05-29

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

If we want to get all of the digits, we can use a different expression:

.+\/(\d+)\/.+

Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r".+\/(\d+)\/.+"

test_str = "date file /20190529050003/folder "

subst = "\\1"  # keep only the captured digits

# count=0 replaces every match; pass a positive count to limit the number of replacements
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)

if result:
    print(result)  # 20190529050003

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.


[Figure: RegEx circuit, a jex.im visualization of the regular expression]
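The re.sub() tests above rewrite the whole string down to the captured part. If the goal is only to pull the digits out, re.search() with a group is arguably more direct. A small sketch using the simplified pattern /(\d+)/ (an adaptation, not the answer's exact expression), which takes the first run of digits sitting directly between two slashes:

```python
import re

test_str = "date file /20190529050003/folder "

# Capture one or more digits that sit directly between two slashes
match = re.search(r"/(\d+)/", test_str)
digits = match.group(1) if match else None
print(digits)  # 20190529050003
```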

Your regex pattern is off because there is no whitespace in the target timestamp. Here is one simple way to do the search:

import re

string = "date file /20190529050003/folder "
matches = re.findall(r'\b\d{14}\b', string)
print(matches)

This prints:

['20190529050003']

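We could try to make the pattern more targeted, for example by only allowing valid values for the hour, minute, and other fields. However, that would be a lot more work, and if you don't expect your text to contain any 14-digit numbers that are not timestamps, I'd suggest not making the pattern more complicated.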
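If invalid values do become a concern, one alternative to a more complicated pattern is to keep \b\d{14}\b and let datetime.strptime() with the format "%Y%m%d%H%M%S" reject candidates that are not real dates and times. A minimal sketch; the helper name find_valid_timestamps is hypothetical:

```python
import re
from datetime import datetime
from typing import List

def find_valid_timestamps(text: str) -> List[datetime]:
    """Return every 14-digit run in text that parses as yyyyMMddhhmmss."""
    results = []
    for candidate in re.findall(r'\b\d{14}\b', text):
        try:
            results.append(datetime.strptime(candidate, "%Y%m%d%H%M%S"))
        except ValueError:
            # e.g. month 13 or hour 99: 14 digits, but not a valid timestamp
            continue
    return results

print(find_valid_timestamps("/20190529050003/ and /20191399999999/"))
# [datetime.datetime(2019, 5, 29, 5, 0, 3)]
```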
