Consider Text 1:
What is Lorem Ipsum:
Lorem Ipsum is simply dummy text of the printing and typesetting industry.Where does it come from:
Contrary to popular belief, Lorem Ipsum is not simply random text.Why do we use it:
It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout.
Text 2:
What is Lorem Ipsum:
Lorem Ipsum is simply dummy text of the printing and typesetting industry.Other Topic:
There are many variations of passages of Lorem Ipsum available.Why do we use it:
It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout.
Text 3:
What is Lorem Ipsum:
Lorem Ipsum is simply dummy text of the printing and typesetting industry.Where does it come from:
Contrary to popular belief, Lorem Ipsum is not simply random text.Some other topic:
Various versions have evolved over the years.
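I can process this text in Python by extracting everything between a start string and an end string. The code I use:
# This code is run once separately for each text variation
import sys

s = "text1 or text2 or text3"  # one at a time

# Locate the first and last headings of interest
start_String = s.find("What is Lorem Ipsum:")
end_String = s.find("Why do we use it:")

# str.find returns -1 when the substring is absent
if start_String == -1 or end_String == -1:
    print("Not found")
    sys.exit(0)

# Prints from the start heading up to, but not including, the end heading
print(s[start_String:end_String])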
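But my requirement is different. I only need the text that belongs to the topics "What is Lorem Ipsum:", "Where does it come from:", and "Why do we use it:".
Expected result:
Text 1: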
What is Lorem Ipsum:
Lorem Ipsum is simply dummy text of the printing and typesetting industry.Where does it come from:
Contrary to popular belief, Lorem Ipsum is not simply random text.Why do we use it:
It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout.
Text 2:
What is Lorem Ipsum:
Lorem Ipsum is simply dummy text of the printing and typesetting industry.Why do we use it:
It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout.
Text 3:
What is Lorem Ipsum:
Lorem Ipsum is simply dummy text of the printing and typesetting industry.Where does it come from:
Contrary to popular belief, Lorem Ipsum is not simply random text.
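I have collected texts like the ones above in a huge dataset. All I need to do is extract only the sub-texts that belong to the required topics. How can I achieve this in Python? I hope this makes sense.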
This does exactly what you want:
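A minimal sketch of one possible approach, not necessarily the code originally posted. It assumes that every topic heading is a capitalized phrase ending in a colon at the end of a line, and that it starts either at the beginning of a line or directly after a sentence-ending period, as in the samples above. The names HEADING, WANTED, and extract_topics are illustrative and do not come from the question.

```python
import re

# Assumption: a heading is a capitalized phrase with no period in it that
# ends with a colon at the end of a line, and that begins either at the
# start of a line or right after a period (e.g. "industry.Where does it come from:").
HEADING = re.compile(r"(?:^|(?<=\.))[ \t]*([A-Z][^.:\n]*:)[ \t]*$", re.MULTILINE)

# The only topics whose text should be kept.
WANTED = (
    "What is Lorem Ipsum:",
    "Where does it come from:",
    "Why do we use it:",
)


def extract_topics(text, wanted=WANTED):
    """Return only the sections of `text` whose heading is in `wanted`."""
    matches = list(HEADING.finditer(text))
    kept = []
    for i, match in enumerate(matches):
        heading = match.group(1)
        if heading not in wanted:
            continue  # skip unwanted topics together with their body
        # A section's body runs until the next heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end].strip()
        kept.append(f"{heading}\n{body}")
    return "\n".join(kept)


if __name__ == "__main__":
    sample = (
        "What is Lorem Ipsum:\n"
        "Lorem Ipsum is simply dummy text of the printing and typesetting industry.Other Topic:\n"
        "There are many variations of passages of Lorem Ipsum available.Why do we use it:\n"
        "It is a long established fact that a reader will be distracted by the "
        "readable content of a page when looking at its layout."
    )
    print(extract_topics(sample))
```

Unlike the expected result shown in the question, this sketch places each kept heading on its own line instead of leaving it attached to the end of the previous sentence. For a large dataset, the same function could be applied to each text in turn. If the real headings do not follow the assumed pattern, the HEADING expression would need to be adjusted.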