Parsing a file with links in Python

Posted 2024-09-27 21:32:53


I have a file I need to parse. It contains many links, and here is a sample of what it looks like:

  <hm><w syst="whatrudoing" please="http://facebook.com.u/qwe-
  pls/facebook?funn=wordlis&sys;sys;colorsdif_id=11908675">colors</p></hm>

  <hm><w syst="whatrudoing" please="http://facebook.com.u/qwe-
  pls/facebook?funn=wordlis&sys;sys;colorsdif_id=45103481">yelloW</p></hm>

  <td>I have a dream, and it is all good 2</hm>

  <hm><w syst="whatrudoing" please="http://facebook.com.u/qwe-
  pls/facebook?funn=wordlis&sys;sys;colorsdif_id=40984930">orangE</p></hm>

  <hm><w syst="whatrudoing" please="http://facebook.com.u/qwe-
  pls/facebook?funn=wordlis&sys;sys;colorsdif_id=90648361">pinK</p></hm>

I only need to keep the word that sits between > and <, as in >colors<, so I also want >yelloW<, >orangE< and >pinK<.

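In this example, what the entries have in common is the whole link, except for the number (the id, which is different in every link) and the word itself.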
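A minimal sketch of matching on that shared part, assuming every link ends in colorsdif_id= followed by digits, then "> and the word. The fixed text is matched literally, \d+ absorbs the varying id, and (\w+) captures the word. The sample string below is the question's data with each link joined onto one line; the names sample and pattern are illustrative only.

```python
import re

# The question's sample, with each wrapped link joined onto a single line.
sample = '''
<hm><w syst="whatrudoing" please="http://facebook.com.u/qwe-pls/facebook?funn=wordlis&sys;sys;colorsdif_id=11908675">colors</p></hm>
<hm><w syst="whatrudoing" please="http://facebook.com.u/qwe-pls/facebook?funn=wordlis&sys;sys;colorsdif_id=45103481">yelloW</p></hm>
<td>I have a dream, and it is all good 2</hm>
<hm><w syst="whatrudoing" please="http://facebook.com.u/qwe-pls/facebook?funn=wordlis&sys;sys;colorsdif_id=40984930">orangE</p></hm>
<hm><w syst="whatrudoing" please="http://facebook.com.u/qwe-pls/facebook?funn=wordlis&sys;sys;colorsdif_id=90648361">pinK</p></hm>
'''

# Literal common part + varying numeric id (\d+) + the word we want (\w+).
pattern = re.compile(r'colorsdif_id=\d+">(\w+)<')

words = pattern.findall(sample)
print(words)  # ['colors', 'yelloW', 'orangE', 'pinK']
```

The <td> line is skipped simply because it contains no colorsdif_id= link.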

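After finding all the words, I want to save them in a dictionary, using the first word as the key and the remaining words as its values, so the final result would be: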

   d = {"colors": ["yelloW", "orangE", "pinK"]}
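One way to get that shape, sketched under the assumption that the words have already been extracted into a list in file order (hard-coded here as words): unpack the first word as the key and collect the rest with star-unpacking.

```python
# Words as an extraction step would return them, in file order (hard-coded for the demo).
words = ["colors", "yelloW", "orangE", "pinK"]

# The first word becomes the key; all remaining words become its list of values.
key, *rest = words
d = {key: rest}

print(d)  # {'colors': ['yelloW', 'orangE', 'pinK']}
```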


1 answer

User

#1 · Posted 2024-09-27 21:32:53

You can try the following:

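import re
from pathlib import Path
ree = Path("file.txt").read_text()  # "file.txt" is a placeholder for the file you want to parse
re.findall(r"http://[^>]+>(\w+)", ree)  # on the sample above: ['colors', 'yelloW', 'orangE', 'pinK']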

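Where:

[^>]+ - matches one or more characters other than >
\w+ - matches the word itself (one or more letters, digits or underscores)
(...) - returns the part between the parentheses as a group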
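To see the answer's pattern at work on the data exactly as posted, including the links wrapped across two lines, here is a self-contained run; text is just the question's sample pasted into a string. Because a negated class like [^>] also matches newline characters, the line break inside each URL does not stop the match.

```python
import re

# The question's sample, keeping each URL wrapped across two lines.
text = """
<hm><w syst="whatrudoing" please="http://facebook.com.u/qwe-
  pls/facebook?funn=wordlis&sys;sys;colorsdif_id=11908675">colors</p></hm>
<hm><w syst="whatrudoing" please="http://facebook.com.u/qwe-
  pls/facebook?funn=wordlis&sys;sys;colorsdif_id=45103481">yelloW</p></hm>
<td>I have a dream, and it is all good 2</hm>
<hm><w syst="whatrudoing" please="http://facebook.com.u/qwe-
  pls/facebook?funn=wordlis&sys;sys;colorsdif_id=40984930">orangE</p></hm>
<hm><w syst="whatrudoing" please="http://facebook.com.u/qwe-
  pls/facebook?funn=wordlis&sys;sys;colorsdif_id=90648361">pinK</p></hm>
"""

# http://  -> literal start of each link
# [^>]+    -> everything up to (not including) the next '>', newlines included
# >(\w+)   -> the '>' followed by the word, captured as group 1
words = re.findall(r"http://[^>]+>(\w+)", text)
print(words)  # ['colors', 'yelloW', 'orangE', 'pinK']
```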

Python dictionaries do not support duplicate keys. You can take a look at this question.
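A small illustration of the duplicate-key point, using a hard-coded word list: storing each word directly under the same key overwrites the previous value, so only the last word survives, while keeping a single key whose value is a list (built here with dict.setdefault) keeps all of them, which is the shape the question asks for.

```python
words = ["yelloW", "orangE", "pinK"]

# Assigning to the same key repeatedly replaces the old value each time.
overwritten = {}
for w in words:
    overwritten["colors"] = w
print(overwritten)  # {'colors': 'pinK'}

# One key whose value is a list: append each word instead of replacing.
grouped = {}
for w in words:
    grouped.setdefault("colors", []).append(w)
print(grouped)  # {'colors': ['yelloW', 'orangE', 'pinK']}
```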
