<p>稍微重构一下代码,使其更具可读性。我用<code>urllib.parse</code>做最后一部分</p>
<pre><code>import re
import urllib.parse
example = pd.Series(['None', 'http://fakeurl.com/example/fakeurl', 'https://www.qwer.com/example/qwer', 'None', 'test.com/example/test', 'None', '123135123', 'nourlhere', 'lol', 'hello.tv', 'nolink', 'ihavenowebsite.com'])
re1 = r'([-a-zA-Z0-9\u0080-\u024F@:%._\+~#=]{1,256})\.[a-zA-Z0-9()]{1,6}\b'
re2 = r'^((http[s]?|ftp):\/)?\/?([-a-zA-Z0-9\u0080-\u024F@:%._\+~#=]{1,256})\.[a-zA-Z0-9()]{1,6}\b$'
re3 = r'www\.([\w]*)'
def modurl(s):
    """Append the site's www-name to a URL whose path is exactly '/example'.

    e.g. 'http://www.hello.tv/example' -> 'http://www.hello.tv/example/hello'.
    Any other string (no netloc, or a different path) is returned unchanged.
    """
    u = urllib.parse.urlparse(s)
    # Only URLs that parsed to a host and end in the bare '/example' path
    # need the site name appended; everything else passes through untouched.
    if not u.netloc or u.path != "/example":
        return s
    # Reuse the already-parsed result instead of calling urlparse(s) again
    # (the original re-parsed s here).  Pattern inlined from re3.
    names = re.findall(r'www\.([\w]*)', u.netloc)
    if not names:
        # Defensive guard: a netloc without a 'www.' prefix used to raise
        # IndexError here; leave such strings unchanged instead.
        return s
    return f"{s}/{names[0]}"
# Normalize every entry, then rebuild a canonical URL for the domain-like ones:
#   1. strip any scheme / 'www.' prefix so each candidate is a bare string,
#   2. prepend 'http://www.' to anything that looks like a domain (re1),
#   3. append '/example' to entries that are *only* a domain (anchored re2),
#   4. let modurl add the site name after a bare '/example' path.
# Plain conditional expressions replace the original np.where calls:
# np.where on a scalar condition returns a 0-d ndarray, not a str, which
# breaks the subsequent re.search/str operations.
example = (
    example
    .map(lambda x: x.replace('https://www.', ''))
    .map(lambda x: x.replace('www.', ''))
    .map(lambda x: x.replace('https://', ''))
    .map(lambda x: x.replace('http://', ''))
    .map(lambda x: "http://www." + x if re.search(re1, x) else x)
    .map(lambda x: x + "/example" if re.search(re2, x) else x)
    .map(modurl)
)
print(example.to_string())
</code></pre>
<p><strong>输出</strong></p>
<pre><code>0 None
1 http://www.fakeurl.com/example/fakeurl
2 http://www.qwer.com/example/qwer
3 None
4 http://www.test.com/example/test
5 None
6 123135123
7 nourlhere
8 lol
9 http://www.hello.tv/example/hello
10 nolink
11 http://www.ihavenowebsite.com/example/ihavenow...
</code></pre>