Python regex: extracting dates from different format combinations

>>> from dateutil.parser import parse
>>> test_cases = ['04/30/2009', '06/20/95', '8/2/69', '1/25/2011', '9/3/2002', '4-13-82', 'Mar-02-2009', 'Jan 20, 1974',
...               'March 20, 1990', 'Dec. 21, 2001', 'May 25 2009', '01 Mar 2002', '2 April 2003', '20 Aug. 2004',
...               '20 November, 1993', 'Aug 10th, 1994', 'Sept 1st, 2005', 'Feb. 22nd, 1988', 'Sept 2002', 'Sep 2002',
...               'December, 1998', 'Oct. 2000', '6/2008', '12/2001', '1998', '2002']
>>> for date_string in test_cases:
...     print(date_string, parse(date_string).strftime("%Y%m%d"))
...
04/30/2009 20090430
06/20/95 19950620
8/2/69 19690802
----- etc --------

description: colasas|04/18/2017|NXP
description: colasas|04/18/2017|NXP
description: Remedy Tkt 01212152 Orcad move
description: FTP Permanent|09|10|2012|FTP
description: Remedy Tkt 01212152 Orcad move
description: TDA Drop12 Account|July 2004|TDA Drop12 Account
description: ftp|121210|ftp
description: Design Foundry Project|16 July 2005|Design Foundry Project
description: FTP Permanent|10/10/2010|FTP
description: WFS-JP|7-31-05|WFS-JP
description: FTP Permanent|10|11|2010|FTP

The dates are enclosed between pipe characters ("|").

#!/usr/bin/python3
# ./dataparse.py
from __future__ import print_function
from signal import signal, SIGPIPE, SIG_DFL
signal(SIGPIPE,SIG_DFL)
import re

with open('test2', 'r') as f:
    for line in f:
        line = line.strip()
        data = f.read()
        regex = (r"dn:(.*?)\nftpuser: (.*)\ndescription:* (.*)")
        matchObj = re.findall(regex, data)
        for index in matchObj:
            #print(index)
            index_str = ' '.join(index)
            new_str = re.sub(r'[=,]', ' ', index_str)
            new_str = new_str.split()
            print("{0:<30}{1:<20}{2:<50}".format(new_str[1],new_str[8],new_str[9]))

Resulting output:

$ ./dataparse.py
ab02                          disabled_5Mar07     Remedy
mela                          Y                   ROYALS|none|customer
ab01                          Y                   VGVzdGluZyA
tt@regg.com                   T                   REG-JP|7-31-05|REG-JP

3 Answers

User

Answer 1 · edited 2024-05-04 07:23:53

Use some string manipulation.

Demo:

s = """description: colasas|04/18/2017|NXP
description: colasas|04/18/2017|NXP
description: Remedy Tkt 01212152 Orcad move
description: FTP Permanent|09|10|2012|FTP
description: Remedy Tkt 01212152 Orcad move
description: TDA Drop12 Account|July 2004|TDA Drop12 Account
description: ftp|121210|ftp
description: Design Foundry Project|16 July 2005|Design Foundry Project
description: FTP Permanent|10/10/2010|FTP
description: WFS-JP|7-31-05|WFS-JP
description: FTP Permanent|10|11|2010|FTP"""


from dateutil.parser import parse

for i in s.split("\n"):
    val = i.split("|", 1)                            #Split by first "|"
    if len(val) > 1:                                 #Check if Date in string.
        val = val[1].rpartition("|")[0]               #Split by right "|"
        print( parse(val, fuzzy=True) )

Output:

2017-04-18 00:00:00
2017-04-18 00:00:00
2012-07-03 00:00:00
2004-07-03 00:00:00
2010-12-12 00:00:00
2005-07-16 00:00:00
2010-10-10 00:00:00
2005-07-31 00:00:00
2010-07-03 00:00:00
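To see what actually reaches `parse` for each line, the split-and-rpartition step above can be pulled into a small function. This is only a sketch using the standard library; `extract_date_field` is a hypothetical name, not part of the answer or of dateutil.

```python
def extract_date_field(line):
    """Return the text between the first and last "|" of a line, or None."""
    head, sep, rest = line.partition("|")
    if not sep:
        return None  # no pipe at all, so there is no date field
    field, sep, _tail = rest.rpartition("|")
    if not sep:
        return None  # only one pipe, so nothing is enclosed
    return field


lines = [
    "description: colasas|04/18/2017|NXP",
    "description: Remedy Tkt 01212152 Orcad move",
    "description: FTP Permanent|09|10|2012|FTP",
    "description: ftp|121210|ftp",
]
fields = [extract_date_field(line) for line in lines]
print(fields)
# ['04/18/2017', None, '09|10|2012', '121210']
```

Note that the pipe-separated entry is handed over as `09|10|2012`, inner pipes included, which is why it does not come out as September 10, 2012 in the output above.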

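As for the datetime error: remove `from datetime import datetime`. The demo below uses `import datetime` and calls `datetime.datetime.strptime`, which only works while the name `datetime` refers to the module rather than the class.

Demo: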

import re
import datetime
strh = "description: colasas|04/18/2017|NXP"
match = re.search(r'\d{2}/\d{2}/\d{4}', strh)
date = datetime.datetime.strptime(match.group(), '%m/%d/%Y').date()
print(date)
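The question does not quote the error, so the following is a guess at what went wrong: it assumes both import styles ended up in the same file. A minimal sketch of how the second import rebinds the name:

```python
import datetime

# Module-style access works while `datetime` names the module.
d = datetime.datetime.strptime("04/18/2017", "%m/%d/%Y").date()
print(d)  # 2017-04-18

# This import rebinds the name `datetime` to the class.
from datetime import datetime

try:
    datetime.datetime.strptime("04/18/2017", "%m/%d/%Y")
    shadowed = False
except AttributeError:
    # The class datetime.datetime has no attribute named `datetime`.
    shadowed = True
print(shadowed)  # True

# With the class bound to the name, call strptime on it directly.
d2 = datetime.strptime("04/18/2017", "%m/%d/%Y").date()
```

Either style is fine on its own; the point is to pick one and stick to it within a file.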

User

Answer 2 · edited 2024-05-04 07:23:53

The `parse` method you are using accepts a keyword argument that lets it ignore the irrelevant parts of the string:

:param fuzzy:
    Whether to allow fuzzy parsing, allowing for string like "Today is
    January 1, 2047 at 8:21:00AM".

Demo:

>>> parse('colasas|04/18/2017|NXP', fuzzy=True)
datetime.datetime(2017, 4, 18, 0, 0)

There is also an option that returns a tuple, including the parts of the string that were ignored:

>>> parse('colasas|04/18/2017|NXP', fuzzy_with_tokens=True)
(datetime.datetime(2017, 4, 18, 0, 0), ('colasas|', '|NXP'))

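This approach will not handle every input string perfectly, but it should get you most of the way there. You may need to do some preprocessing for the stranger ones.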
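One of the stranger inputs in the question is the pipe-separated form `|09|10|2012|`, where the field separator is also the date separator. As a rough sketch of such preprocessing, the inner pipes could be rewritten to slashes before parsing. `normalize_pipe_date` is a hypothetical helper, and the pattern assumes one or two digits, one or two digits, then a two- to four-digit year.

```python
import re

# Matches |d|d|yyyy| style fields: three digit groups separated by pipes.
PIPE_DATE = re.compile(r"\|(\d{1,2})\|(\d{1,2})\|(\d{2,4})\|")


def normalize_pipe_date(text):
    """Rewrite |09|10|2012| as |09/10/2012|; leave other text unchanged."""
    return PIPE_DATE.sub(r"|\1/\2/\3|", text)


fixed = normalize_pipe_date("FTP Permanent|09|10|2012|FTP")
untouched = normalize_pipe_date("colasas|04/18/2017|NXP")
print(fixed)      # FTP Permanent|09/10/2012|FTP
print(untouched)  # colasas|04/18/2017|NXP
```

The normalized string can then be passed to `parse(..., fuzzy=True)` as before.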

User

Answer 3 · edited 2024-05-04 07:23:53

text="""
description: colasas|04/18/2017|NXP
description: colasas|04/18/2017|NXP
description: Remedy Tkt 01212152 Orcad move
description: FTP Permanent|09|10|2012|FTP
description: Remedy Tkt 01212152 Orcad move
description: TDA Drop12 Account|July 2004|TDA Drop12 Account
description: ftp|121210|ftp
description: Design Foundry Project|16 July 2005|Design Foundry Project
description: FTP Permanent|10/10/2010|FTP
description: WFS-JP|7-31-05|WFS-JP
description: FTP Permanent|10|11|2010|FTP
"""
import re

reg=re.compile(r"(?ms)\|(\d\d)(\d\d)(\d\d)\||\|(\d{1,2})[\|/\-](\d{1,2})[\|/\-](\d{2,4})\||\|(\d*)\s*(\w+)\s*(\d{4})\|")

dates= [ t[:3] if t[1] else t[3:6] if t[4] else t[6:] for t in reg.findall(text) ]
print(dates)

# regexp for |121210|       ---> \|(\d\d)(\d\d)(\d\d)\|
# for |16 July 2005|        ---> \|(\d*)\s*(\w+)\s*(\d{4})\|
# for the others            ---> \|(\d{1,2})[\|/\-](\d{1,2})[\|/\-](\d{2,4})\|
Output: [('04', '18', '2017'), ('04', '18', '2017'), ('09', '10', '2012'), ('', 'July', '2004'), ('12', '12', '10'), ('16', 'July', '2005'), ('10', '10', '2010'), ('7', '31', '05'), ('10', '11', '2010')]

To get the dates as they appear in the text:

reg=re.compile(r"(?ms)\|(\d{6})\||\|(\d{1,2}[\|/\-]\d{1,2}[\|/\-]\d{2,4})\||\|(\d*\s*\w+\s+\d{4})\|")

dates= [ t[0] or t[1] or t[2] for t in reg.findall(text) ]
print(dates)

Output:
['04/18/2017', '04/18/2017', '09|10|2012', 'July 2004', '121210', '16 July 2005', '10/10/2010', '7-31-05', '10|11|2010']
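If actual date objects are wanted rather than raw strings, one option is to try a short list of `strptime` formats on each match. This is only a sketch: `to_date` and `FORMATS` are hypothetical names, month-first ordering and reading six digits as MMDDYY are assumptions the sample data does not settle, and `%B` relies on an English locale.

```python
from datetime import datetime

# Candidate formats, tried in order. Month-first order and MMDDYY for the
# six-digit form are assumptions, not something the data confirms.
FORMATS = ("%m/%d/%Y", "%m/%d/%y", "%m%d%y", "%d %B %Y", "%B %Y")


def to_date(raw):
    """Return a datetime.date for raw, or None if no candidate format fits."""
    # Treat "|" and "-" as the same separator as "/".
    text = raw.strip().replace("|", "/").replace("-", "/")
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # this format did not fit, try the next one
    return None


raw_dates = ['04/18/2017', '09|10|2012', 'July 2004', '121210',
             '16 July 2005', '7-31-05']
for raw in raw_dates:
    print(raw, to_date(raw))
```

A string with no day, such as `July 2004`, comes back as the first of the month, because `strptime` defaults the missing day to 1.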

