How to find all words that start with a capital letter in multiple strings in a list

['Remember', 'The', 'Common', 'App', 'Do', 'Your', 'Often', 'We', 'Monica', 'Lannom', 'Co', 'Founder', 'Campus', 'Ventures', 'One', 'Break', 'Campus', 'Ventures', 'Universities', 'Undermatching', 'Stanford', 'Yale', 'Undermatching', 'What', 'A', 'Yale', 'Lannom', 'There', 'During', 'Some', 'The', 'Lannom', 'That', 'It', 'Lannom', 'Institutions', 'University', 'Chicago', 'Boston', 'College', 'These', 'Students', 'If', 'Lannom', 'Recruiting', 'Elite', 'Campus', 'Ventures', 'Understanding', 'Campus', 'Ventures', 'The', 'For', 'Lannom', 'What', 'I', 'Wish', 'I', 'Knew', 'Before', 'Starting', 'Company', 'I', 'Even', 'I', 'Lannom', 'The', 'There']

3 answers

User

Answer 1 · edited 2024-09-29 17:15:18

Assuming sentences are separated by a single space, you can use re.findall with the following regular expression:

r'(?m)(?<!^)(?<![.?!] )[A-Z][A-Za-z]*'

Python's regex engine performs the following operations:

(?m)         : set multiline mode so that ^ and $ match the beginning
               and the end of a line
(?<!^)       : negative lookbehind asserts current location is not
               at the beginning of a line
(?<![.?!] )  : negative lookbehind asserts current location is not
               preceded by '.', '?' or '!', followed by a space
[A-Z]        : match an uppercase letter
[A-Za-z]*    : match 0 or more letters
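A minimal sketch of the pattern above in use with re.findall. The function name capitalized_non_initial and the sample text are illustrative assumptions, not part of the answer.

```python
import re

# Pattern from the answer; assumes sentences are separated by exactly one space.
PATTERN = re.compile(r'(?m)(?<!^)(?<![.?!] )[A-Z][A-Za-z]*')


def capitalized_non_initial(text):
    """Return capitalized words that do not start a line or a sentence."""
    return PATTERN.findall(text)


sample = ("Remember your college application process? The tedious Common App "
          "applications. Do you remember? Your family helped.")
print(capitalized_non_initial(sample))  # ['Common', 'App']

# With (?m), a word at the start of any line is skipped, not only the first one.
print(capitalized_non_initial("First line\nSecond Word"))  # ['Word']
```

One caveat: the lookbehinds only inspect the characters just before the match, so an all-caps word at the start of a sentence, such as "ACT", still yields the fragment "CT".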

If sentences may be separated by one or two spaces, insert a second negative lookbehind, (?<![.?!]  ) (with two spaces), after (?<![.?!] ).
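A sketch comparing the single-space pattern with the one-or-two-space variant. The sample string is an assumption chosen to contain a period followed by two spaces.

```python
import re

ONE_SPACE = r'(?m)(?<!^)(?<![.?!] )[A-Z][A-Za-z]*'
# The extra fixed-width lookbehind also rejects a word after a period plus two spaces.
ONE_OR_TWO = r'(?m)(?<!^)(?<![.?!] )(?<![.?!]  )[A-Z][A-Za-z]*'

text = "First one.  Next Word here. Then More"
print(re.findall(ONE_SPACE, text))   # ['Next', 'Word', 'More']
print(re.findall(ONE_OR_TWO, text))  # ['Word', 'More']
```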

If you use the PyPI regex module instead, you can use a single variable-length lookbehind, (?<![.?!] +).
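A sketch contrasting the two engines. The standard re module only supports fixed-width lookbehind, so it refuses the variable-length pattern at compile time; the third-party regex package (pip install regex) accepts it. The code below assumes regex may be missing and skips that part if it is.

```python
import re

VARIABLE = r'(?m)(?<!^)(?<![.?!] +)[A-Z][A-Za-z]*'

# The standard library rejects variable-width lookbehind when compiling.
try:
    re.compile(VARIABLE)
    stdlib_ok = True
except re.error as exc:
    stdlib_ok = False
    print("re:", exc)

try:
    import regex  # third-party package from PyPI
except ImportError:
    regex = None

if regex is not None:
    # One lookbehind now rejects any number of spaces after ., ? or !
    print(regex.findall(VARIABLE, "One.   Two Three"))  # ['Three']
```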

User

Answer 2 · edited 2024-09-29 17:15:18

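The simplest approach is a for loop that checks whether the first character of each list element is uppercase. If it is, the element is appended to the output list.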

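output = []
for i in list_3:  # list_3: the list from the question
    # str.isupper() is False for digits and punctuation (unlike i[0] == i[0].upper());
    # `i and` skips empty strings, where i[0] would raise IndexError
    if i and i[0].isupper():
        output.append(i)
print(output)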

We can also use a list comprehension and do it in one line. Again, we check whether the first character of each element is uppercase:

output = [x for x in list_3 if x[:1].isupper()]
print(output)
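A small self-contained check of the list comprehension, wrapped in a function. The helper names capitalized and capitalized_naive and the sample words are illustrative; the sample includes a number and an empty string to show why str.isupper() on a one-character slice is safer than comparing x[0] with x[0].upper().

```python
def capitalized(words):
    """Keep words whose first character is an uppercase letter."""
    # w[:1] is '' for an empty string, and ''.isupper() is False,
    # so empty strings are skipped instead of raising IndexError.
    return [w for w in words if w[:1].isupper()]


def capitalized_naive(words):
    """The x[0] == x[0].upper() comparison: also keeps digits and punctuation."""
    return [w for w in words if w and w[0] == w[0].upper()]


words = ['Remember', 'the', 'Common', 'App', '2024', '', 'ACT/SAT']
print(capitalized(words))        # ['Remember', 'Common', 'App', 'ACT/SAT']
print(capitalized_naive(words))  # ['Remember', 'Common', 'App', '2024', 'ACT/SAT']
```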

Edit

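You want each sentence to be one element of the list, so here is the solution. We iterate over list_3, then use split() to iterate over each word of each sentence. If a word starts with an uppercase letter, we add it to output: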

list_3 = ["Remember your college application process? The tedious Common App applications, hours upon hours of research, ACT/SAT, FAFSA, visiting schools, etc. Do you remember who helped you through this process? Your family and guidance counselors perhaps, maybe your peers or you may have received little to no help"]
output = []
for i in list_3:
    for j in i.split():     # split each sentence on whitespace
        if j[0].isupper():  # split() never yields empty strings, so j[0] is safe
            output.append(j)
print(output)
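split() keeps punctuation attached, so the loop above returns tokens such as 'FAFSA,'. A hypothetical refinement, not part of the original answer, trims leading and trailing punctuation with str.strip(string.punctuation) before the check. The function name capitalized_words is illustrative.

```python
import string


def capitalized_words(sentences):
    """Collect capitalized words from a list of sentences."""
    out = []
    for sentence in sentences:
        for word in sentence.split():
            # Trim punctuation such as the trailing comma in "FAFSA,".
            word = word.strip(string.punctuation)
            if word[:1].isupper():
                out.append(word)
    return out


text = ["Remember your college application process? The tedious Common App "
        "applications, hours upon hours of research, ACT/SAT, FAFSA, visiting "
        "schools, etc. Do you remember who helped you through this process?"]
print(capitalized_words(text))
# ['Remember', 'The', 'Common', 'App', 'ACT/SAT', 'FAFSA', 'Do']
```

Like the loop in the answer, this still keeps sentence-initial words such as 'The' and 'Do'.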

User

Answer 3 · edited 2024-09-29 17:15:18

As I understand it, you have a list like this:

list_3 = [
  'First sentence. Another Sentence',
  'And yet one another. Sentence',
]

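You are iterating over the list, but each iteration overwrites the test variable, so only the last item's result survives. You need to either accumulate the results in a separate variable or print them immediately on each iteration: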

import re

acc = []
for item in list_3:
  # regexp: the pattern from your question
  acc.extend(re.findall(regexp, item))
print(acc)

or

for item in list_3:
  print(re.findall(regexp, item))
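The overwrite problem can be reproduced with a throwaway pattern. The variable name test and the pattern [A-Z]\w+ are assumptions standing in for the question's code.

```python
import re

list_3 = ['First sentence. Another Sentence', 'And yet one another. Sentence']
regexp = r'[A-Z]\w+'  # hypothetical stand-in: any capitalized word

# Buggy: `test` is reassigned each time, so only the last item's matches survive.
for item in list_3:
    test = re.findall(regexp, item)
print(test)  # ['And', 'Sentence']

# Fixed: accumulate matches across iterations.
acc = []
for item in list_3:
    acc.extend(re.findall(regexp, item))
print(acc)  # ['First', 'Another', 'Sentence', 'And', 'Sentence']
```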

As for a regexp that skips the first word of each sentence, you can use

re.findall(r'(?<!\A)(?<!\.)\s+[A-Z]\w+', s)

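(?<!\A) - not at the start of the string
(?<!\.) - the whitespace is not immediately preceded by a period, i.e. skip the first word after a period
\s+ - one or more whitespace characters before the word (required, and included in the match)
[A-Z]\w+ - an uppercase letter followed by one or more word characters, so single-letter words such as "I" are not matched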

The matched words may carry leading whitespace, so strip it. Here is the final example:

import re

acc = []
for item in list_3:
  # strip() removes the leading whitespace captured by \s+
  words = [w.strip() for w in re.findall(r'(?<!\A)(?<!\.)\s+[A-Z]\w+', item)]
  acc.extend(words)
print(acc)
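The final example wrapped in a reusable function and run on the two-sentence list from this answer. The function name non_initial_capitalized is hypothetical. The second call shows a limitation of [A-Z]\w+: it needs at least two characters, so single-letter words such as "I" are not returned.

```python
import re

PATTERN = re.compile(r'(?<!\A)(?<!\.)\s+[A-Z]\w+')


def non_initial_capitalized(sentences):
    """Capitalized words that neither start the string nor follow a period."""
    acc = []
    for item in sentences:
        # Each match starts with whitespace, so strip it off.
        acc.extend(w.strip() for w in PATTERN.findall(item))
    return acc


list_3 = ['First sentence. Another Sentence', 'And yet one another. Sentence']
print(non_initial_capitalized(list_3))                 # ['Sentence']
print(non_initial_capitalized(['What I Wish I Knew']))  # ['Wish', 'Knew']
```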
