<p>If the part of each string before the <code>?</code> or the next <code>/</code> has the same length in every row, you can select the column with <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iloc.html" rel="nofollow"><code>iloc</code></a> and slice it using <a href="http://pandas.pydata.org/pandas-docs/stable/text.html#indexing-with-str" rel="nofollow">indexing with str</a>:</p>
<pre><code># first 7 characters of column 0
print(df.iloc[:, 0].str[:7])
0    /page_1
1    /page_1
2    /page_1
3    /page_2
Name: 0, dtype: object

# group on that prefix and sum
print(df.groupby(df.iloc[:, 0].str[:7]).sum().reset_index())
         0   1
0  /page_1   4
1  /page_2  10
</code></pre>
<p>Or, grouping by the second column as well:</p>
<pre><code>print(df.groupby([df.iloc[:, 0].str[:7], df.iloc[:, 1]]).sum().reset_index())
         0        1   2
0  /page_1    China   3
1  /page_1       US   1
2  /page_2  Britain  10
</code></pre>
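<p>For reference, here is a minimal self-contained sketch of the fixed-length approach. The sample data is an assumption reconstructed from the outputs above, and the names <code>page</code>, <code>country</code>, <code>by_page</code> and <code>by_page_country</code> are only illustrative. It selects the numeric column explicitly before summing, which should avoid trouble on pandas versions that no longer silently drop string columns in <code>sum()</code>:</p>

```python
import pandas as pd

# Assumed sample data, reconstructed from the outputs shown in this answer.
df = pd.DataFrame([
    ["/page_1", "China", 1],
    ["/page_1?x=123", "China", 2],
    ["/page_1/subpage_1", "US", 1],
    ["/page_2", "Britain", 10],
])

# First 7 characters of column 0, e.g. "/page_1" (illustrative name "page").
page = df.iloc[:, 0].str[:7].rename("page")
country = df.iloc[:, 1].rename("country")

# Select the numeric column (label 2) so only the counts are summed.
by_page = df.groupby(page)[2].sum().reset_index()
by_page_country = df.groupby([page, country])[2].sum().reset_index()

print(by_page)
print(by_page_country)
```

<p>Renaming the grouping keys is optional. It just keeps the key labels from colliding with the existing column labels <code>0</code> and <code>1</code> when <code>reset_index()</code> is called.</p>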
<p>If the lengths differ, select the column with <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iloc.html" rel="nofollow"><code>iloc</code></a> and pull out the page name with <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.extract.html" rel="nofollow"><code>str.extract</code></a>:</p>
<pre><code>import re

print(df)
                   0        1   2
0         /paaaage_1    China   1
1   /paaaage_1?x=123    China   2
2  /page_1/subpage_1       US   1
3            /page_2  Britain  10

# capture the text between the leading "/" and the next "/" or "?"
xpr = re.compile('/([^/?]+)')
# expand=False returns a Series rather than a DataFrame
print(df.iloc[:, 0].str.extract(xpr, expand=False))
0    paaaage_1
1    paaaage_1
2       page_1
3       page_2

print(df.groupby([df.iloc[:, 0].str.extract(xpr, expand=False), df.iloc[:, 1]]).sum().reset_index())
           0        1   2
0  paaaage_1    China   3
1     page_1       US   1
2     page_2  Britain  10
</code></pre>
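<p>The regex variant can be sketched as a standalone script along the same lines. The frame below copies the sample data from this answer. The names <code>page</code>, <code>country</code> and <code>totals</code> are illustrative, and passing the pattern as a plain string instead of a compiled regex is just a simplification:</p>

```python
import pandas as pd

# Sample data copied from the example in this answer.
df = pd.DataFrame([
    ["/paaaage_1", "China", 1],
    ["/paaaage_1?x=123", "China", 2],
    ["/page_1/subpage_1", "US", 1],
    ["/page_2", "Britain", 10],
])

# Capture everything between the leading "/" and the next "/" or "?".
# expand=False returns a Series for a single capture group.
page = df.iloc[:, 0].str.extract(r"/([^/?]+)", expand=False).rename("page")
country = df.iloc[:, 1].rename("country")

# Sum only the numeric column (label 2) per (page, country) pair.
totals = df.groupby([page, country])[2].sum().reset_index()
print(totals)
```

<p>Because the pattern stops at the first <code>/</code> or <code>?</code> after the leading slash, it does not depend on how long the page name is.</p>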