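Question: My goal is to extract every phone number from a file. I can get all of them except the one belonging to the user "suneja, amit" on the second-to-last line of the data file. Everything works through step 3 of my code, where I use three groups, but as soon as I add the fourth group that number no longer shows up.
Here is the data file: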
Love, Kenneth kenneth@teamtreehouse.com +1 (555) 555-5555 Teacher, Treehouse @kennethlove
McFarland, Dave dave@teamtreehouse.com (555) 555-5554 Teacher, Treehouse
Arthur, King king_arthur@camelot.co.uk King, Camelot
Österberg, Sven-Erik governor@norrbotten.co.se Governor, Norrbotten @sverik
, Tim tim@killerrabbit.com Enchanter, Killer Rabbit Cave
Carson, Ryan ryan@teamtreehouse.com (555) 555-5543 CEO, Treehouse @ryancarson
Doctor, The doctor+companion@tardis.co.uk Time Lord, Gallifrey
Exampleson, Example me@example.com +1-555-555-5552 Example, Example Co. @example
Obama, Barack president.44@us.gov 555 555-5551 President, United States of America @potus44
Chalkley, Andrew andrew@teamtreehouse.com (555) 555-5553 Teacher, Treehouse @chalkers
Vader, Darth darth-vader@empire.gov (555).555.4444 Sith Lord, Galactic Empire @darthvader
suneja, amit amit.suneja007@gmail.com 444-444444 B102, City Center @programmer
Fernández de la Vega Sanz, María Teresa mtfvs@spain.gov First Deputy Prime Minister, Spanish Govt.
Here is my code:
import re
data_file = 'names.txt'
with open(data_file, 'r', encoding="utf-8") as myfile:
    data_dump = myfile.read()
print("___________________________________")
print(re.findall(r"(\+\d[\-\s])", data_dump))
print("___________________________________")
print(re.findall(r"(\+\d[\s\-])?(\(?\d{3}\)?)", data_dump))
print("___________________________________")
print(re.findall(r"(\+\d[\s\-])?(\(?\d{3}\)?)([\s\-.]\d{3})", data_dump))
print("___________________________________")
print(re.findall(r"(\+\d[\s\-])?(\(?\d{3}\)?)([\s\-.]\d{3})([\s.-]\d{4,6})", data_dump))
print(len(re.findall(r"(\+\d[\s\-])?(\(?\d{3}\)?)([\s\-.]\d{3})([\s.-]\d{4,6})", data_dump)))
Here is the output of my code:
___________________________________
['+1 ', '+1-']
___________________________________
[('+1 ', '(555)'), ('', '555'), ('', '555'), ('', '(555)'), ('', '555'), ('', '555'), ('', '(555)'), ('', '555'), ('', '554'), ('+1-', '555'), ('', '555'), ('', '555'), ('', '555'), ('', '555'), ('', '555'), ('', '(555)'), ('', '555'), ('', '555'), ('', '(555)'), ('', '555'), ('', '444'), ('', '007'), ('', '444'), ('', '444'), ('', '444'), ('', '102')]
___________________________________
[('+1 ', '(555)', ' 555'), ('', '(555)', ' 555'), ('', '(555)', ' 555'), ('+1-', '555', '-555'), ('', '555', ' 555'), ('', '(555)', ' 555'), ('', '(555)', '.555'), ('', '444', '-444')]
___________________________________
[('+1 ', '(555)', ' 555', '-5555'), ('', '(555)', ' 555', '-5554'), ('', '(555)', ' 555', '-5543'), ('+1-', '555', '-555', '-5552'), ('', '555', ' 555', '-5551'), ('', '(555)', ' 555', '-5553'), ('', '(555)', '.555', '.4444')]
7
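Answer: You only need a small change to your last regular expression to make it work.
The change is confined to the last capture group: add a question mark after the separator class and lower the minimum digit count from 4 to 3:
([\s.-]?\d{3,6})
The question mark makes [\s.-] optional. Your last phone number, 444-444444, has no separator between its second and third digit groups, so the separator has to be optional. Once the second and third groups have consumed 444 and -444, only three digits are left, which is why the quantifier is {3,6} rather than {4,6}.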
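As a minimal sketch of that fix, the snippet below applies the adjusted pattern to a few lines copied from the data file in the question. The names SAMPLE, PHONE_RE, and find_phones are illustrative and do not come from the original post. It uses re.finditer with group(0) rather than re.findall, on the assumption that you want each number as one string instead of a tuple of groups.

```python
import re

# A few lines copied from the data file in the question.
SAMPLE = """\
Love, Kenneth kenneth@teamtreehouse.com +1 (555) 555-5555 Teacher, Treehouse @kennethlove
Exampleson, Example me@example.com +1-555-555-5552 Example, Example Co. @example
Obama, Barack president.44@us.gov 555 555-5551 President, United States of America @potus44
Vader, Darth darth-vader@empire.gov (555).555.4444 Sith Lord, Galactic Empire @darthvader
suneja, amit amit.suneja007@gmail.com 444-444444 B102, City Center @programmer
"""

# The first three groups are unchanged from the question; only the last differs.
PHONE_RE = re.compile(
    r"(\+\d[\s\-])?"      # optional country code such as "+1 " or "+1-"
    r"(\(?\d{3}\)?)"      # three digits, with or without parentheses
    r"([\s\-.]\d{3})"     # a separator followed by three digits
    r"([\s.-]?\d{3,6})"   # optional separator followed by three to six digits
)


def find_phones(text):
    """Return every full match as a single string (hypothetical helper)."""
    return [match.group(0) for match in PHONE_RE.finditer(text)]


phones = find_phones(SAMPLE)
print(phones)
# ['+1 (555) 555-5555', '+1-555-555-5552', '555 555-5551', '(555).555.4444', '444-444444']
```

If you prefer the tuples from your original code, PHONE_RE.findall(SAMPLE) still works; the last tuple is then ('', '444', '-444', '444').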