Using regex on a list of quoted strings

Posted 2024-10-05 13:07:28


I have a list of strings, and I want to use regex to filter the list down to specific strings.

For example, here is the original list:

quoteTitle = ['\r\n      ', ' ', '\r\n    ', '\r\n    ', '\r\n    ', '\r\n    ', '\r\n  ', '30. Loyalty', '29. Speed Scale', '28. Security', '27. Every Position', '26. Superior Brain Power', '25. A Long Line of Fighters', '24. Dwight Surveillance', '23. Friends ', '22. Pull the Plug ', '21. Second Life', '20. Accidentally vs. On Purpose', '19. Menstruation Wishes ', '18. Ideal Choice', '17. Healthcare in the Wild', '16. Superior Cousins', '15. Regular Ideas', '14. Immunity Logic', '13. The Person You Least, Medium and Most Suspect', '12. Real Heroes ', '11. Water Cooler Gossip', '10. Stress', '9. All These People!', '8. The “R” Sound', '7. A Woman’s Defects ', '6.Werewolf Hunting Experience ', '5. An Ideal World ', '4. Attention', '3. The Thing About Bear Attacks ', '2. Resume Critiquing', '1. Yeast Infections ', 'Tags', 'Recently in TV', '5/8/2018 3:45:00 PM', '5/3/2018 2:00:00 PM', '5/3/2018 12:00:00 PM', 'Most Popular', '5/3/2018 11:00:00 AM', '5/3/2018 12:00:00 PM', '5/8/2018 3:45:00 PM', '4/13/2018 5:00:00 PM', '4/18/2018 8:22:31 PM', '5/3/2018 2:00:00 PM', '5/8/2018 6:02:54 PM', '5/7/2018 4:52:04 PM', '5/7/2018 2:57:00 PM', '5/3/2018 5:04:43 PM', '5/3/2018 1:06:18 PM', 'Music', '5/3/2018 12:00:00 AM', 'Music', '5/3/2018 12:00:00 AM', 'Music', '5/2/2018 12:00:00 AM', '11/28/2017 8:00:00 AM', '11/30/2017 11:00:00 AM', '5/3/2018 11:00:00 AM', '5/3/2018 2:00:00 PM', '5/3/2018 12:00:00 PM', '12/11/2017 1:00:00 PM', '12/15/2017 8:00:00 AM', '1/9/2018 3:00:00 PM', '1/4/2017 12:30:00 PM', '4/13/2013 12:13:00 PM', 'TV', '4/3/2018 10:00:00 AM', 'TV', '4/3/2018 9:25:00 AM', 'Comedy', '3/22/2018 1:00:28 PM', 'TV', '3/15/2018 10:00:00 AM', 'Comedy', '3/13/2018 2:00:00 PM', 'TV', '3/10/2018 10:00:00 AM', 'TV', '3/2/2018 11:00:00 AM', 'TV', '2/25/2018 10:30:00 PM', 'TV', '2/23/2018 1:00:00 PM', '5/3/2018 11:00:00 AM', '5/3/2018 12:00:00 PM', '5/8/2018 3:45:00 PM', '4/13/2018 5:00:00 PM', '4/18/2018 8:22:31 PM', '5/3/2018 2:00:00 PM', '5/3/2018 10:00:00 AM', '5/7/2018 
10:00:00 AM', '4/26/2018 2:00:00 PM', '5/6/2018 10:00:00 PM', '5/8/2018 6:02:54 PM', '5/7/2018 4:52:04 PM', '5/7/2018 2:57:00 PM', '5/3/2018 5:04:43 PM', '5/3/2018 1:06:18 PM', '5/3/2018 12:00:00 AM', '5/3/2018 12:00:00 AM', '5/2/2018 12:00:00 AM', '5/2/2018 12:00:00 AM', '5/2/2018 12:00:00 AM', '11/28/2017 8:00:00 AM', '11/30/2017 11:00:00 AM', '5/3/2018 11:00:00 AM', '5/3/2018 2:00:00 PM', '5/3/2018 12:00:00 PM', '12/11/2017 1:00:00 PM', '12/15/2017 8:00:00 AM', '1/9/2018 3:00:00 PM', '1/4/2017 12:30:00 PM', '4/13/2013 12:13:00 PM', '4/3/2018 10:00:00 AM', '4/3/2018 9:25:00 AM', '3/22/2018 1:00:28 PM', '3/15/2018 10:00:00 AM', '3/13/2018 2:00:00 PM']

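I only want the numbered items, from 30 down to 1, together with the text that follows each number. I can successfully filter out anything that doesn't start with a number: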

p = re.compile(r'\w')
q = filter(p.match, quoteTitle)
p = re.compile(r'^\d+')
q = filter(p.match, q)

This gives me:

print(list(q)) --> ['30. Loyalty', '29. Speed Scale', '28. Security', '27. Every Position', '26. Superior Brain Power', '25. A Long Line of Fighters', '24. Dwight Surveillance', '23. Friends ', '22. Pull the Plug ', '21. Second Life', '20. Accidentally vs. On Purpose', '19. Menstruation Wishes ', '18. Ideal Choice', '17. Healthcare in the Wild', '16. Superior Cousins', '15. Regular Ideas', '14. Immunity Logic', '13. The Person You Least, Medium and Most Suspect', '12. Real Heroes ', '11. Water Cooler Gossip', '10. Stress', '9. All These People!', '8. The “R” Sound', '7. A Woman’s Defects ', '6.Werewolf Hunting Experience ', '5. An Ideal World ', '4. Attention', '3. The Thing About Bear Attacks ', '2. Resume Critiquing', '1. Yeast Infections ', 'Tags', 'Recently in TV', '5/8/2018 3:45:00 PM', '5/3/2018 2:00:00 PM', '5/3/2018 12:00:00 PM', 'Most Popular', '5/3/2018 11:00:00 AM', '5/3/2018 12:00:00 PM', '5/8/2018 3:45:00 PM', '4/13/2018 5:00:00 PM', '4/18/2018 8:22:31 PM', '5/3/2018 2:00:00 PM', '5/8/2018 6:02:54 PM', '5/7/2018 4:52:04 PM', '5/7/2018 2:57:00 PM', '5/3/2018 5:04:43 PM', '5/3/2018 1:06:18 PM', 'Music', '5/3/2018 12:00:00 AM', 'Music', '5/3/2018 12:00:00 AM', 'Music', '5/2/2018 12:00:00 AM', '11/28/2017 8:00:00 AM', '11/30/2017 11:00:00 AM', '5/3/2018 11:00:00 AM', '5/3/2018 2:00:00 PM', '5/3/2018 12:00:00 PM', '12/11/2017 1:00:00 PM', '12/15/2017 8:00:00 AM', '1/9/2018 3:00:00 PM', '1/4/2017 12:30:00 PM', '4/13/2013 12:13:00 PM', 'TV', '4/3/2018 10:00:00 AM', 'TV', '4/3/2018 9:25:00 AM', 'Comedy', '3/22/2018 1:00:28 PM', 'TV', '3/15/2018 10:00:00 AM', 'Comedy', '3/13/2018 2:00:00 PM', 'TV', '3/10/2018 10:00:00 AM', 'TV', '3/2/2018 11:00:00 AM', 'TV', '2/25/2018 10:30:00 PM', 'TV', '2/23/2018 1:00:00 PM', '5/3/2018 11:00:00 AM', '5/3/2018 12:00:00 PM', '5/8/2018 3:45:00 PM', '4/13/2018 5:00:00 PM', '4/18/2018 8:22:31 PM', '5/3/2018 2:00:00 PM', '5/3/2018 10:00:00 AM', '5/7/2018 10:00:00 AM', '4/26/2018 2:00:00 PM', '5/6/2018 10:00:00 PM', '5/8/2018 
6:02:54 PM', '5/7/2018 4:52:04 PM', '5/7/2018 2:57:00 PM', '5/3/2018 5:04:43 PM', '5/3/2018 1:06:18 PM', '5/3/2018 12:00:00 AM', '5/3/2018 12:00:00 AM', '5/2/2018 12:00:00 AM', '5/2/2018 12:00:00 AM', '5/2/2018 12:00:00 AM', '11/28/2017 8:00:00 AM', '11/30/2017 11:00:00 AM', '5/3/2018 11:00:00 AM', '5/3/2018 2:00:00 PM', '5/3/2018 12:00:00 PM', '12/11/2017 1:00:00 PM', '12/15/2017 8:00:00 AM', '1/9/2018 3:00:00 PM', '1/4/2017 12:30:00 PM', '4/13/2013 12:13:00 PM', '4/3/2018 10:00:00 AM', '4/3/2018 9:25:00 AM', '3/22/2018 1:00:28 PM', '3/15/2018 10:00:00 AM', '3/13/2018 2:00:00 PM']

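Now I want to remove the dates from the list.

I've tried many combinations like the one below, but I feel I'm missing or misunderstanding something. My idea is to keep every string in the list that does not follow the date-entry format.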

p = re.compile(r'[^'\d+/]')
q = filter(p.match, q)

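The entries start with an apostrophe because each one is a quoted string, and I suspect that may be my problem. Apart from that, the format is: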

apostrophe, a number (between 1 and 12, so \d+), /

That should be enough to filter out the date entries, as long as I can get it to work.
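One point about the format described above: the apostrophe is only how Python displays the list, not a character inside each string, so the pattern does not need to account for it. A minimal sketch of that idea, assuming a shortened sample list (the names sample, date_start and no_dates are illustrative and not from the original post):

```python
import re

# Shortened, illustrative sample; the real list is much longer.
sample = ['30. Loyalty', '6.Werewolf Hunting Experience ', '5/8/2018 3:45:00 PM',
          '11/28/2017 8:00:00 AM', '1. Yeast Infections ']

# One or two digits followed by a slash. Pattern.match() already anchors
# at the start of the string, and the quotes are not part of the data.
date_start = re.compile(r'\d{1,2}/')

# Keep only the entries that do NOT start like a date.
no_dates = [s for s in sample if not date_start.match(s)]
print(no_dates)
```

filter() keeps the items for which the test is truthy, so excluding a pattern needs the test inverted, which is why the sketch uses "not" inside a list comprehension.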

Update: I even tried searching for elements in the list that contain AM or PM, but still no luck.

p = re.compile(r'[^(AM|PM)]')
q = filter(p.search, q)
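A likely reason this attempt removes nothing: square brackets define a character class, so [^(AM|PM)] matches any single character that is not one of (, A, M, |, P or ), and almost every string contains such a character. The sketch below contrasts that with a grouped alternation; it assumes a small illustrative sample, and the variable names are not from the original post.

```python
import re

# Illustrative sample, not the full list from the question.
sample = ['2. Resume Critiquing', 'Tags', '5/3/2018 12:00:00 PM', '4/3/2018 9:25:00 AM']

# Negated character class: matches the first character that is not
# one of ( A M | P ), so every string here produces a match.
char_class = re.compile(r'[^(AM|PM)]')
kept_by_class = [s for s in sample if char_class.search(s)]

# Alternation inside a non-capturing group, anchored to the end of the string.
am_pm = re.compile(r'(?:AM|PM)$')

# Invert the test to drop the entries that end in AM or PM.
without_times = [s for s in sample if not am_pm.search(s)]
print(kept_by_class)
print(without_times)
```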

1 answer
User
Reply #1 · Posted 2024-10-05 13:07:28

You can search for strings that start with one or more digits followed by a period:

import re
quoteTitle = ['\r\n      ', ' ', '\r\n    ', '\r\n    ', '\r\n    ', '\r\n    ', '\r\n  ', '30. Loyalty', '29. Speed Scale', '28. Security', '27. Every Position', '26. Superior Brain Power', '25. A Long Line of Fighters', '24. Dwight Surveillance', '23. Friends ', '22. Pull the Plug ', '21. Second Life', '20. Accidentally vs. On Purpose', '19. Menstruation Wishes ', '18. Ideal Choice', '17. Healthcare in the Wild', '16. Superior Cousins', '15. Regular Ideas', '14. Immunity Logic', '13. The Person You Least, Medium and Most Suspect', '12. Real Heroes ', '11. Water Cooler Gossip', '10. Stress', '9. All These People!', '8. The “R” Sound', '7. A Woman’s Defects ', '6.Werewolf Hunting Experience ', '5. An Ideal World ', '4. Attention', '3. The Thing About Bear Attacks ', '2. Resume Critiquing', '1. Yeast Infections ', 'Tags', 'Recently in TV', '5/8/2018 3:45:00 PM', '5/3/2018 2:00:00 PM', '5/3/2018 12:00:00 PM', 'Most Popular', '5/3/2018 11:00:00 AM', '5/3/2018 12:00:00 PM', '5/8/2018 3:45:00 PM', '4/13/2018 5:00:00 PM', '4/18/2018 8:22:31 PM', '5/3/2018 2:00:00 PM', '5/8/2018 6:02:54 PM', '5/7/2018 4:52:04 PM', '5/7/2018 2:57:00 PM', '5/3/2018 5:04:43 PM', '5/3/2018 1:06:18 PM', 'Music', '5/3/2018 12:00:00 AM', 'Music', '5/3/2018 12:00:00 AM', 'Music', '5/2/2018 12:00:00 AM', '11/28/2017 8:00:00 AM', '11/30/2017 11:00:00 AM', '5/3/2018 11:00:00 AM', '5/3/2018 2:00:00 PM', '5/3/2018 12:00:00 PM', '12/11/2017 1:00:00 PM', '12/15/2017 8:00:00 AM', '1/9/2018 3:00:00 PM', '1/4/2017 12:30:00 PM', '4/13/2013 12:13:00 PM', 'TV', '4/3/2018 10:00:00 AM', 'TV', '4/3/2018 9:25:00 AM', 'Comedy', '3/22/2018 1:00:28 PM', 'TV', '3/15/2018 10:00:00 AM', 'Comedy', '3/13/2018 2:00:00 PM', 'TV', '3/10/2018 10:00:00 AM', 'TV', '3/2/2018 11:00:00 AM', 'TV', '2/25/2018 10:30:00 PM', 'TV', '2/23/2018 1:00:00 PM', '5/3/2018 11:00:00 AM', '5/3/2018 12:00:00 PM', '5/8/2018 3:45:00 PM', '4/13/2018 5:00:00 PM', '4/18/2018 8:22:31 PM', '5/3/2018 2:00:00 PM', '5/3/2018 10:00:00 AM', '5/7/2018 
10:00:00 AM', '4/26/2018 2:00:00 PM', '5/6/2018 10:00:00 PM', '5/8/2018 6:02:54 PM', '5/7/2018 4:52:04 PM', '5/7/2018 2:57:00 PM', '5/3/2018 5:04:43 PM', '5/3/2018 1:06:18 PM', '5/3/2018 12:00:00 AM', '5/3/2018 12:00:00 AM', '5/2/2018 12:00:00 AM', '5/2/2018 12:00:00 AM', '5/2/2018 12:00:00 AM', '11/28/2017 8:00:00 AM', '11/30/2017 11:00:00 AM', '5/3/2018 11:00:00 AM', '5/3/2018 2:00:00 PM', '5/3/2018 12:00:00 PM', '12/11/2017 1:00:00 PM', '12/15/2017 8:00:00 AM', '1/9/2018 3:00:00 PM', '1/4/2017 12:30:00 PM', '4/13/2013 12:13:00 PM', '4/3/2018 10:00:00 AM', '4/3/2018 9:25:00 AM', '3/22/2018 1:00:28 PM', '3/15/2018 10:00:00 AM', '3/13/2018 2:00:00 PM']
# Raw string so that \d and \. reach the regex engine unchanged.
new_result = list(filter(lambda x: re.findall(r'^\d+\.', x), quoteTitle))

Output:

['30. Loyalty', '29. Speed Scale', '28. Security', '27. Every Position', '26. Superior Brain Power', '25. A Long Line of Fighters', '24. Dwight Surveillance', '23. Friends ', '22. Pull the Plug ', '21. Second Life', '20. Accidentally vs. On Purpose', '19. Menstruation Wishes ', '18. Ideal Choice', '17. Healthcare in the Wild', '16. Superior Cousins', '15. Regular Ideas', '14. Immunity Logic', '13. The Person You Least, Medium and Most Suspect', '12. Real Heroes ', '11. Water Cooler Gossip', '10. Stress', '9. All These People!', '8. The \xe2\x80\x9cR\xe2\x80\x9d Sound', '7. A Woman\xe2\x80\x99s Defects ', '6.Werewolf Hunting Experience ', '5. An Ideal World ', '4. Attention', '3. The Thing About Bear Attacks ', '2. Resume Critiquing', '1. Yeast Infections ']
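The same filter can also be written with a precompiled pattern and a list comprehension. This is only a sketch on a shortened sample (sample, numbered and titles are illustrative names); because Pattern.match() only matches at the start of the string, the leading ^ is not needed here.

```python
import re

# Shortened, illustrative sample covering whitespace, titles, labels and dates.
sample = ['\r\n  ', '30. Loyalty', '6.Werewolf Hunting Experience ', 'Tags',
          '5/8/2018 3:45:00 PM', 'Music', '1. Yeast Infections ']

# Digits followed by a literal period at the start of the string.
numbered = re.compile(r'\d+\.')

# Dates fail the test because their digits are followed by '/', not '.'.
titles = [s for s in sample if numbered.match(s)]
print(titles)
```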

Edit: to find all the data between quotes, you can use the non-greedy pattern .*?

quote = ['i dont want this', '\r\n ', '\r\n ', ' "this is the quote i want to extract" ', '" and also this one"', '\r\n "and me"']
new_results = list(map(lambda x:x[0], filter(None, [re.findall('"(.*?)"', i) for i in quote])))

Output:

['this is the quote i want to extract', ' and also this one', 'and me']
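Note that x[0] keeps only the first quoted span found in each string. If a string might hold several quoted spans, a sketch along these lines collects all of them; the sample data and the name all_quotes are illustrative assumptions, not part of the original answer.

```python
import re

# Illustrative sample; the third entry contains two quoted spans.
quote = ['i dont want this', '\r\n ', '"first" and "second"', '\r\n "and me"']

# re.findall returns every non-overlapping match per string; the nested
# comprehension flattens those per-string lists into a single list.
all_quotes = [m for s in quote for m in re.findall(r'"(.*?)"', s)]
print(all_quotes)
```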
