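I am trying to extract all dates from a list of long strings. Each string contains several dates in different formats, and I want to get all of them. I tried: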
import datefinder
import dateutil.parser as dparser
input_string = 'the document is valid from 2018-11-20 until 2021-11-19, or 25 October 2020 until 25 October 2021, or 3/14/2020 to 3/13/2021, or April 4, 2015 until April 3 2018, or 3rd March 2007 to 4th March 2008'
print(list(datefinder.find_dates(input_string)))
print(dparser.parse(input_string,fuzzy=True))
Output:
[datetime.datetime(2020, 3, 14, 0, 0), datetime.datetime(2021, 3, 13, 0, 0), datetime.datetime(2007, 3, 3, 0, 0), datetime.datetime(2008, 3, 4, 0, 0)]
---------------------------------------------------------------------------
ParserError Traceback (most recent call last)
<ipython-input-1244-b5979411a38b> in <module>
4 print(list(datefinder.find_dates(input_string)))
5
----> 6 print(dparser.parse(input_string,fuzzy=True))
~\Anaconda3\lib\site-packages\dateutil\parser\_parser.py in parse(timestr, parserinfo, **kwargs)
1372 return parser(parserinfo).parse(timestr, **kwargs)
1373 else:
-> 1374 return DEFAULTPARSER.parse(timestr, **kwargs)
1375
1376
~\Anaconda3\lib\site-packages\dateutil\parser\_parser.py in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
647
648 if res is None:
--> 649 raise ParserError("Unknown string format: %s", timestr)
650
651 if len(res) == 0:
ParserError: Unknown string format: the document is valid from 2018-11-20 until 2021-11-19, or 25 October 2020 until 25 October 2021, or 3/14/2020 to 3/13/2021, or April 4, 2015 until April 3 2018, or 3rd March 2007 to 4th March 2008
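datefinder found only 4 of the 10 dates in the string. dparser can recognize a date when the string contains just one, but raises an error when a single string contains several dates.
PS: The formats are not limited to those in the example. The strings are extracted by pytesseract, so they contain wrong characters and similar problems, which makes regex a complicated option. I am looking for a better alternative.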
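Since single-date strings parse fine, one possible direction is to cut each string into chunks that hold at most one date and parse the chunks separately. The sketch below is only an illustration under stated assumptions: the connector words in SPLIT_RE, the format list in FORMATS, and the helper names parse_chunk and find_all_dates are all hypothetical and were chosen to fit the sample string. It uses only the standard library and does nothing about OCR noise.

```python
import re
from datetime import datetime

# Assumption: these connector words separate the dates in the sample string.
SPLIT_RE = re.compile(r"\b(?:from|until|to|or)\b", flags=re.IGNORECASE)
# Drop ordinal suffixes so that "3rd" becomes "3".
ORDINAL_RE = re.compile(r"(?<=\d)(?:st|nd|rd|th)\b", flags=re.IGNORECASE)
# Assumption: an illustrative, non-exhaustive list of formats seen in the sample.
FORMATS = ("%Y-%m-%d", "%d %B %Y", "%m/%d/%Y", "%B %d, %Y", "%B %d %Y")


def parse_chunk(chunk):
    """Return a datetime if the chunk matches one known format, else None."""
    text = ORDINAL_RE.sub("", chunk).strip(" ,.;")
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def find_all_dates(text):
    """Split on connector words and keep every chunk that parses as a date."""
    dates = []
    for chunk in SPLIT_RE.split(text):
        parsed = parse_chunk(chunk)
        if parsed is not None:
            dates.append(parsed)
    return dates


sample = ('the document is valid from 2018-11-20 until 2021-11-19, '
          'or 25 October 2020 until 25 October 2021, '
          'or 3/14/2020 to 3/13/2021, '
          'or April 4, 2015 until April 3 2018, '
          'or 3rd March 2007 to 4th March 2008')

for found in find_all_dates(sample):
    print(found.date())
```

For formats outside this list, parse_chunk could presumably be swapped for dparser.parse(chunk, fuzzy=True) wrapped in a try/except for the ParserError shown in the traceback, since each chunk would then contain at most one date. Whether that copes with the pytesseract errors is untested here.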