我正在努力做到以下几点:
我已经能够通读文本并从中获取我需要的其他数据,但到目前为止,我还无法做到上面提到的
我已尝试实现以下示例:Python - Search Text File For Any String In a List 但我没能让它读对
我还试图修改以下内容:https://www.geeksforgeeks.org/python-finding-strings-with-given-substring-in-list/ 但我也同样不成功
以下是我的一些代码:
import re
from itertools import islice
import os
# list of all countries
oneCountries = "Afghanistan, Albania, Algeria, Andorra, Angola, Antigua & Deps, Argentina, Armenia, Australia, Austria, Azerbaijan, Bahamas, Bahrain, Bangladesh, Barbados, Belarus, Belgium, Belize, Benin, Bhutan, Bolivia, Bosnia Herzegovina, Botswana, Brazil, Brunei, Bulgaria, Burkina, Burma, Burundi, Cambodia, Cameroon, Canada, Cape Verde, Central African Rep, Chad, Chile, China, Republic of China, Colombia, Comoros, Democratic Republic of the Congo, Republic of the Congo, Costa Rica,, Croatia, Cuba, Cyprus, Czech Republic, Danzig, Denmark, Djibouti, Dominica, Dominican Republic, East Timor, Ecuador, Egypt, El Salvador, Equatorial Guinea, Eritrea, Estonia, Ethiopia, Fiji, Finland, France, Gabon, Gaza Strip, The Gambia, Georgia, Germany, Ghana, Greece, Grenada, Guatemala, Guinea, Guinea-Bissau, Guyana, Haiti, Holy Roman Empire, Honduras, Hungary, Iceland, India, Indonesia, Iran, Iraq, Republic of Ireland, Israel, Italy, Ivory Coast, Jamaica, Japan, Jonathanland, Jordan, Kazakhstan, Kenya, Kiribati, North Korea, South Korea, Kosovo, Kuwait, Kyrgyzstan, Laos, Latvia, Lebanon, Lesotho, Liberia, Libya, Liechtenstein, Lithuania, Luxembourg, Macedonia, Madagascar, Malawi, Malaysia, Maldives, Mali, Malta, Marshall Islands, Mauritania, Mauritius, Mexico, Micronesia, Moldova, Monaco, Mongolia, Montenegro, Morocco, Mount Athos, Mozambique, Namibia, Nauru, Nepal, Newfoundland, Netherlands, New Zealand, Nicaragua, Niger, Nigeria, Norway, Oman, Ottoman Empire, Pakistan, Palau, Panama, Papua New Guinea, Paraguay, Peru, Philippines, Poland, Portugal, Prussia, Qatar, Romania, Rome, Russian Federation, Rwanda, St Kitts & Nevis, St Lucia, Saint Vincent & the Grenadines, Samoa, San Marino, Sao Tome & Principe, Saudi Arabia, Senegal, Serbia, Seychelles, Sierra Leone, Singapore, Slovakia, Slovenia, Solomon Islands, Somalia, South Africa, Spain, Sri Lanka, Sudan, Suriname, Swaziland, Sweden, Switzerland, Syria, Tajikistan, Tanzania, Thailand, Togo, Tonga, Trinidad & Tobago, Tunisia, Turkey, Turkmenistan, Tuvalu, Uganda, Ukraine, United Arab Emirates, United Kingdom, United States, Uruguay, Uzbekistan, Vanuatu, Vatican City, Venezuela, Vietnam, Yemen, Zambia, Zimbabwe"
countries = oneCountries.split(",")
path = "C:/Users/me/Desktop/read.txt"
thefile = open(path, errors='ignore')
countryParsing = False
for line in thefile:
line = line.strip()
# if line.startswith("Submitting Author:"):
# if re.match(r"Submitting Author:", line):
# print("blahblah1")
# countryParsing = True
# if countryParsing == True:
# print("blahblah2")
#
# res = [x for x in line if re.search(countries, x)]
# print("blah blah3: " + str(res))
# elif re.match(r"Running Head:", line):
# countryParsing = False
# if countryParsing == True:
# res = [x for x in line if re.search(countries, x)]
# print("blah blah4: " + str(res))
# for x in countries:
# if x in thefile:
# print("a country is: " + x)
# if any(s in line for s in countries):
# listOfAuthorCountries = listOfAuthorCountries + s + ", "
# if re.match(f"Submitting Author:, line"):
#注释掉的行是我尝试过的代码版本,但未能正常工作
根据要求,这是我试图从中获取数据的文本文件的一个示例。我对其进行了修改以删除敏感信息,但在这种情况下,“新列表”应附加一定数量的“法国”条目:
txt above....
Submitting Author:
asdf, asdf (proxy)
France
asdfasdf
blah blah
asdfasdf
asdf, Provence-Alpes-Côte d'Azu 13354
France
blah blah
France
asdf
Running Head:
...more text below
我认为您的主要问题是,在
oneCountries
中,国家名称用逗号+空格分隔,但您只使用逗号分隔,因此countries
中的第二个条目是" Albania"
,前面有空格。您需要更改:致:
在那之后,在你注释掉的代码中似乎有足够的有用的东西来实现你想要的
基于您所陈述的三点,即您希望实现的目标,以及我从您的代码中所了解的内容(这可能不是您想要的),我建议:
编辑摘要
调整代码以处理提供的文本示例
修正了国家列表(最初哥斯达黎加之后有两个逗号)
相关问题 更多 >
编程相关推荐