<p>Group by <code>url</code>, then apply a custom function to each group:</p>
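<pre><code>def summarize(group):
    # Flag rows whose link is an .xml feed or an archive page
    has_xml = group['links'].str.contains(r'\.xml')
    has_archive = group['links'].str.contains('archive')
    # Keep the matching rows only if the group contains both kinds of link
    return group[has_xml | has_archive] if has_xml.any() and has_archive.any() else None
# df is the DataFrame from the question (columns: url, links, title)
df.groupby('url').apply(summarize).reset_index(0, drop=True)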
</code></pre>
<p>Result:</p>
<pre><code>                       url      links       title
8   https://example333.com  /atom.xml  EXAMPLE333
9   https://example333.com  /archives  EXAMPLE333
11  https://example333.com  /archives  EXAMPLE333
</code></pre>
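<p>For reference, here is a minimal self-contained sketch of the same filter written without <code>apply</code>, using <code>groupby(...).transform('any')</code> to broadcast the per-group checks back onto each row. This is an alternative technique, not the code above, and the sample data is hypothetical; it only assumes the question's DataFrame has <code>url</code>, <code>links</code> and <code>title</code> columns.</p>

```python
import pandas as pd

# Hypothetical sample data standing in for the question's DataFrame
df = pd.DataFrame({
    "url": ["https://example111.com"] * 2
           + ["https://example222.com"] * 2
           + ["https://example333.com"] * 3,
    "links": ["/atom.xml", "/about", "/archives", "/contact",
              "/atom.xml", "/archives", "/home"],
    "title": ["EXAMPLE111"] * 2 + ["EXAMPLE222"] * 2 + ["EXAMPLE333"] * 3,
})

# Row-level flags: is the link an .xml feed, is it an archive page
has_xml = df["links"].str.contains(r"\.xml")
has_archive = df["links"].str.contains("archive")

# Broadcast "does this url's group contain at least one such row" to every row
group_has_xml = has_xml.groupby(df["url"]).transform("any")
group_has_archive = has_archive.groupby(df["url"]).transform("any")

# Keep matching rows, but only for urls that have both kinds of link
result = df[group_has_xml & group_has_archive & (has_xml | has_archive)]
print(result)
```

<p>Because the result is built with a boolean mask, the original row index is preserved and no <code>reset_index</code> call is needed.</p>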