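<p>The example below uses <code>re</code>, <code>datetime</code>, and the third-party package <code>inflect</code> to apply text modifiers. Each modifier pairs a regular expression with either a formatting function or a fixed replacement string.</p>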
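<p>In the example below, function modifiers and string modifiers go through separate helpers. Because <code>re.sub</code> already accepts either a replacement string or a function that receives each match object, the same kind of list could also be applied with one call per modifier. Here is a minimal stdlib-only sketch. <code>apply_modifiers</code> is a hypothetical helper, and the <code>DIGIT_WORDS</code> table is a hypothetical stand-in for <code>inflect</code> that only covers single digits:</p>
<pre><code>import re

# Hypothetical stand-in for inflect: words for single digits only
DIGIT_WORDS = ['zero', 'one', 'two', 'three', 'four',
               'five', 'six', 'seven', 'eight', 'nine']


def digit_to_word(match):
    """Replacement function: re.sub calls it once per match."""
    return ' {} '.format(DIGIT_WORDS[int(match.group(1))])


def apply_modifiers(text, modifiers):
    """Apply each (pattern, replacement) pair in order.

    The replacement may be a plain string or a callable that takes a
    match object; re.sub handles both forms natively.
    """
    for pattern, repl in modifiers:
        text = re.sub(pattern, repl, text)
    return text


modifiers = [
    (r' (\d) ', digit_to_word),
    (r'\bculpa\b', 'culpae'),
]

print(apply_modifiers('ex 5 ea, in culpa qui 6 mollit', modifiers))
# ex five ea, in culpae qui six mollit
</code></pre>
<p>This sketch differs from the example in two ways. Each modifier sees the output of the previous one instead of the original text, and no spans are recorded. Also, <code>re.sub</code> processes backslash escapes such as <code>\1</code> in a replacement string, so literal backslashes must be escaped.</p>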
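<p>For each modifier, the example code below returns the modified text together with the <code>[begin, end]</code> positions of the words it changed.</p>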
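<p>In the example below, each <code>[begin, end]</code> pair is the span of a match taken just before that match is replaced. When the replacement has a different length (for example <code>5</code> becoming <code>five</code>), <code>end</code> no longer marks the end of the new word in the returned text. If you need positions in the final text, one option is a single pass over <code>re.finditer</code> that keeps a running output length. This is a minimal sketch; <code>sub_with_spans</code> and its callable <code>repl</code> argument are hypothetical names chosen for illustration:</p>
<pre><code>import re


def sub_with_spans(pattern, repl, text):
    """Replace every match of pattern and record where each
    replacement ends up in the *returned* text.

    repl is a function that takes a match object and returns a string
    (a hypothetical signature chosen for this sketch).
    """
    pieces = []
    spans = []
    last = 0      # end of the previous match in the input text
    out_len = 0   # length of the output built so far
    for match in re.finditer(pattern, text):
        before = text[last:match.start()]
        new = repl(match)
        pieces.append(before)
        out_len += len(before)
        spans.append([out_len, out_len + len(new)])
        pieces.append(new)
        out_len += len(new)
        last = match.end()
    pieces.append(text[last:])
    return ''.join(pieces), spans


words = {'5': 'five', '6': 'six'}
new_text, spans = sub_with_spans(r'\d', lambda m: words[m.group()], 'ex 5 ea 6')
print(new_text, spans)   # ex five ea six [[3, 7], [11, 14]]
for begin, end in spans:
    print(new_text[begin:end])   # five, then six
</code></pre>
<p>Every span is computed against the output being built, so <code>new_text[begin:end]</code> always yields the replacement itself, however much earlier replacements changed the length of the text.</p>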
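<p><strong>PS:</strong> It would help if you explained in more detail what you are trying to do. In the meantime, you can use this code as a starting point and adapt it to your needs.</p>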
<p>Install <code>inflect</code>: <code>pip install inflect</code></p>
<p><strong>Example code:</strong></p>
<pre><code>import re
from datetime import datetime

import inflect

ENGINE = inflect.engine()


def num2words(num):
    """Number to words using the inflect package"""
    return ENGINE.number_to_words(num)


def pretty_format_date(pattern, date_found, text):
    """Rewrite one matched date, e.g. 'Apr. 6th, 2009', in words"""
    _month, _day, _year = date_found.groups()
    # '%b' only accepts abbreviated month names such as 'Apr';
    # a full name such as 'June' raises ValueError
    month = datetime.strptime('{day}/{month}/{year}'.format(
        day=_day, month=_month.strip('.'), year=_year
    ), '%d/%b/%Y').strftime('%B')
    day, year = num2words(_day), num2words(_year)
    date = '{month} {day}, {year}'.format(month=month, day=day, year=year)
    begin, end = date_found.span()
    _text = re.sub(pattern, date, text[begin:end])
    text = text[:begin] + _text + text[end:]
    return text, begin, end


def format_date(pattern, text):
    """Rewrite every date matched by pattern"""
    regex = re.compile(pattern)
    spans = []
    pos = 0
    # Bounding the loop by the number of matches prevents an infinite loop
    # on malformed text or a bad regex
    for _ in regex.findall(text):
        date_found = regex.search(text, pos)
        if not date_found:
            break
        try:
            new_text, begin, end = pretty_format_date(pattern, date_found, text)
        except ValueError:
            # Unparseable date: leave it unchanged and search past it
            pos = date_found.end()
            continue
        spans.append([begin, end])
        # Resume searching right after the inserted replacement
        pos = end + len(new_text) - len(text)
        text = new_text
    return text, spans


def number_to_words(pattern, text):
    """Number to words, with spans"""
    spans = []
    # Bounding the loop by the number of matches prevents an infinite loop
    # on malformed text or a bad regex
    for _ in re.findall(pattern, text):
        number_found = re.search(pattern, text)
        if not number_found:
            break
        # group(1) is the digit string; groups() would return a tuple
        number = num2words(number_found.group(1))
        begin, end = number_found.span()
        spans.append([begin, end])
        _text = re.sub(pattern, number, text[begin:end])
        text = text[:begin] + ' {} '.format(_text) + text[end:]
    return text, spans


def custom_func(pattern, text, output):
    """Replace every match of pattern with the fixed string output"""
    spans = []
    for _ in re.findall(pattern, text):
        _found = re.search(pattern, text)
        if not _found:
            break
        begin, end = _found.span()
        spans.append([begin, end])
        _text = re.sub(pattern, output, text[begin:end])
        text = text[:begin] + ' {} '.format(_text) + text[end:]
    return text, spans


text = '''
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua. On Apr. 6th, 2009 Ut enim culpa minim veniam, quis nostrud exercitation ullamco
laboris nisi ut aliquip ex 5 ea commodo consequat. Duis aute irure dolor in reprehenderit in
voluptate velit esse cillum dolore eu fugiat nulla pariatur. On June 23rd, 3004 excepteur sint occaecat
cupidatat non proident, sunt in culpa qui officia deserunt 6 mollit anim id est laborum.
'''

# Each modifier is (regex, function) or (regex, replacement string)
modifiers = [
    (
        r'([\w]+\.?)\s+(\d{1,2})\w{2},\s+(\d{4})',
        format_date
    ),
    (
        r' (\d) ',
        number_to_words
    ),
    (
        r'( \bculpa\b)',  # \b word boundaries match 'culpa' only as a whole word
        'culpae'
    )
]

# Every modifier is applied to the original text independently
for regex, func in modifiers:
    if not isinstance(func, str):
        print('\n{} {} {}'.format('#' * 20, func.__name__, '#' * 20))
        _text, spans = func(regex, text)
    else:
        print('\n{} {} {}'.format('#' * 20, func, '#' * 20))
        _text, spans = custom_func(regex, text, func)
    print(_text, spans)
</code></pre>
<p>Output:</p>
<pre><code>#################### format_date ####################
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua. On April six, two thousand and nine Ut enim culpa minim veniam, quis nostrud exercitation ullamco
laboris nisi ut aliquip ex 5 ea commodo consequat. Duis aute irure dolor in reprehenderit in
voluptate velit esse cillum dolore eu fugiat nulla pariatur. On June 23rd, 3004 excepteur sint occaecat
cupidatat non proident, sunt in culpa qui officia deserunt 6 mollit anim id est laborum.
[[128, 142]]
#################### number_to_words ####################
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua. On Apr. 6th, 2009 Ut enim culpa minim veniam, quis nostrud exercitation ullamco
laboris nisi ut aliquip ex five ea commodo consequat. Duis aute irure dolor in reprehenderit in
voluptate velit esse cillum dolore eu fugiat nulla pariatur. On June 23rd, 3004 excepteur sint occaecat
cupidatat non proident, sunt in culpa qui officia deserunt six mollit anim id est laborum.
[[231, 234], [463, 466]]
#################### culpae ####################
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua. On Apr. 6th, 2009 Ut enim culpae minim veniam, quis nostrud exercitation ullamco
laboris nisi ut aliquip ex 5 ea commodo consequat. Duis aute irure dolor in reprehenderit in
voluptate velit esse cillum dolore eu fugiat nulla pariatur. On June 23rd, 3004 excepteur sint occaecat
cupidatat non proident, sunt in culpae qui officia deserunt 6 mollit anim id est laborum.
[[150, 156], [435, 441]]
</code></pre>
<p><strong>Demo on <a href="https://repl.it/repls/InternationalUpbeatGenres" rel="nofollow noreferrer">Replit</a></strong></p>
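<p>In the output above, <code>June 23rd, 3004</code> is left unchanged. <code>'%b'</code> only accepts abbreviated month names, so <code>strptime</code> raises <code>ValueError</code> for <code>June</code> and the date is skipped. One possible fix is to fall back from <code>%b</code> to <code>%B</code> (full month names). This is a sketch, and <code>normalize_month</code> is a hypothetical helper. Month names are matched for the current locale, which is English under Python's default C locale:</p>
<pre><code>from datetime import datetime


def normalize_month(token):
    """Return the full month name for tokens like 'Apr.', 'Apr' or 'June'.

    Tries the abbreviated form (%b) first, then the full name (%B);
    returns None if neither matches.
    """
    name = token.strip('.')
    for fmt in ('%b', '%B'):
        try:
            return datetime.strptime(name, fmt).strftime('%B')
        except ValueError:
            continue
    return None


print(normalize_month('Apr.'))   # April
print(normalize_month('June'))   # June
print(normalize_month('Ut'))     # None
</code></pre>
<p>Inside <code>pretty_format_date</code>, the result could replace the <code>strptime(...).strftime('%B')</code> call. The day and year would still need their own validation, since this helper only checks the month.</p>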