How do I ignore the first line (or certain lines) when parsing a string/file in Python?

from bs4 import BeautifulSoup
import urllib.request

country_id = 56451
country_url = "http://www.blocgame.com/stats.php?id=" + str(country_id)
country_source = urllib.request.urlopen(country_url)
country_page = BeautifulSoup(country_source, 'html.parser')
country_text = country_page.get_text()

# This is a game where each player owns their country in the cold war
# This checks for the airforce level
def check_airforce_cosmetic():
    cforce_slice1 = int(country_text.index("Airforce:"))
    cforce_slice2 = int(country_text.index("Navy:"))
    country_airforce_cosmetic = country_text[cforce_slice1:cforce_slice2]
    print(country_airforce_cosmetic + "\n\n")

# However, the player might have something in their description bragging about their airforce.

2 Answers

User

Answer 1 · edited 2024-10-02 20:43:45

I have solved a problem like this by using bs4's decompose to remove the potentially dangerous parts of the HTML. For example, given a page_html soup like this:

<about>Not important, and possibly dangerous!</about> <stats>This is the important part.</stats>

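I would do something like this:

not_needed = page_html.about
not_needed.decompose()

You are left with only <stats>This is the important part.</stats>. That way you can remove the user's personal description entirely and then safely extract whatever you need.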
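The decompose approach above can be sketched end to end on an inline HTML string. This is only a minimal illustration: the `<about>` and `<stats>` tag names come from the example in this answer, the sample contents are made up, and the real page's markup may well differ.

```python
from bs4 import BeautifulSoup

# Hypothetical page: <about> holds user-written text, <stats> holds the data.
sample_html = (
    "<about>My Airforce: is the best!</about>"
    "<stats>Airforce: Powerful Navy: 12 ships</stats>"
)

page_html = BeautifulSoup(sample_html, "html.parser")

# Remove every <about> element, together with its contents, from the tree.
for tag in page_html.find_all("about"):
    tag.decompose()

# Only the text of the remaining elements is left to search.
stats_text = page_html.get_text()
print(stats_text)
```

Looping over `find_all` rather than touching `page_html.about` directly avoids an error when the page has no such element, since `page_html.about` would then be `None`.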

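User

Answer 2 · edited 2024-10-02 20:43:45

If you don't want to use indexes, you can pull the text out as a string and then parse that string. It is not the most elegant approach, but it works.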

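from bs4 import BeautifulSoup
import urllib.request

country_id = 56451
country_url = "http://www.blocgame.com/stats.php?id=" + str(country_id)
country_source = urllib.request.urlopen(country_url)

# Name the parser explicitly so bs4 does not have to guess one
country_page = BeautifulSoup(country_source, "html.parser")
country_text = country_page.text
# Text following the first "Airforce:" label
rawAirforce = country_text.split("Airforce:")[1]
# Split the remainder at "Navy:" into the airforce value and the rest of the page
navyArray = rawAirforce.split("Navy:")
airforce = navyArray[0].strip()
navy = navyArray[1].split("Chemical Weapons:")[0].strip()
print("Airforce: " + airforce + " Navy: " + navy)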

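The same label-splitting idea can be wrapped in a small helper that does not raise an IndexError when a label is missing. This is a sketch under assumptions: `text_between` is a hypothetical helper name, and `sample_text` is invented stand-in text rather than real output from the stats page.

```python
def text_between(text, start_label, end_label):
    """Return the stripped text between two labels, or None if either is missing."""
    # partition splits at the first occurrence and reports whether it was found.
    _, found_start, rest = text.partition(start_label)
    if not found_start:
        return None
    value, found_end, _ = rest.partition(end_label)
    if not found_end:
        return None
    return value.strip()


# Hypothetical page text; the real stats page layout may differ.
sample_text = "Army: 50k troops Airforce: Powerful Navy: 12 ships Chemical Weapons: None"

airforce = text_between(sample_text, "Airforce:", "Navy:")
navy = text_between(sample_text, "Navy:", "Chemical Weapons:")
missing = text_between(sample_text, "Spies:", "Navy:")
print("Airforce:", airforce, "Navy:", navy)
```

Like the split version, this matches the first occurrence of each label, so a player description that appears earlier in the text and mentions "Airforce:" would still need to be stripped out first.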
