<p>Solution without <code>groupby</code>:</p>
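<pre><code># rename the input column so the new country column can be called v1
df = df.rename(columns={'v1':'v2'})
# running group id: increases by 1 at every row that contains 'country'
counter = df.v2.str.contains('country').cumsum()
# insert v1 as the first column; only the first row of each group
# (the country row) gets a value, all other rows are NaN
df.insert(0, 'v1', df.loc[counter.ne(counter.shift()), 'v2'])
# forward fill the NaNs so every row carries its country
df.v1 = df.v1.ffill()
# remove the country rows themselves (where v1 == v2)
df = df[df.v1.ne(df.v2)].reset_index(drop=True)
print(df)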
                 v1               v2
0  Belgium[country]    Antwerp[city]
1  Belgium[country]       Gent[city]
2   France[country]      Paris[city]
3   France[country]  Marseille[city]
4   France[country]   Toulouse[city]
5    Spain[country]     Madrid[city]
</code></pre>
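<p>For anyone who wants to run this end to end, here is a minimal sketch. The input frame is an assumption reconstructed from the printed result (a single column <code>v1</code> in which each country row is followed by its city rows), and <code>split_country_city</code> is a hypothetical helper name. It is a slight variant of the steps above: it uses <code>Series.where</code> on the boolean mask instead of the cumulative counter, and it drops the country rows by mask rather than by comparing <code>v1</code> with <code>v2</code>.</p>

```python
import pandas as pd

# Assumed sample input, reconstructed from the printed result:
# one column `v1` where each country row is followed by its cities.
df = pd.DataFrame({'v1': [
    'Belgium[country]', 'Antwerp[city]', 'Gent[city]',
    'France[country]', 'Paris[city]', 'Marseille[city]', 'Toulouse[city]',
    'Spain[country]', 'Madrid[city]',
]})


def split_country_city(frame):
    """Hypothetical helper: return a frame with columns v1 (country), v2 (city)."""
    out = frame.rename(columns={'v1': 'v2'})
    # True on the rows that hold a country label (plain substring match)
    is_country = out['v2'].str.contains('country', regex=False)
    # keep the label on country rows, NaN elsewhere, then forward fill
    out.insert(0, 'v1', out['v2'].where(is_country).ffill())
    # drop the country rows so that only city rows remain
    return out[~is_country].reset_index(drop=True)


result = split_country_city(df)
print(result)
```

<p>On this sample the two approaches should agree, because the only rows where <code>v1</code> equals <code>v2</code> are the country rows.</p>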
<p>Timings:</p>
<pre><code>In [189]: %timeit (jez(df))
100 loops, best of 3: 2.47 ms per loop
In [191]: %timeit (IanS(df1))
100 loops, best of 3: 5.06 ms per loop
</code></pre>
<p><strong>Code for timings</strong>:</p>
<pre><code>def jez(df):
    df = df.rename(columns={'v1':'v2'})
    counter = df.v2.str.contains('country').cumsum()
    df.insert(0, 'v1', df.loc[counter.ne(counter.shift()), 'v2'])
    df.v1 = df.v1.ffill()
    df = df[df.v1.ne(df.v2)].reset_index(drop=True)
    return df

def IanS(df):
    counter = df['v1'].str.contains('country').cumsum()
    result = df.groupby(counter).apply(lambda g: g[1:]).reset_index(level=1, drop=True)
    result = result.rename(columns={'v1': 'v2'}).reset_index(drop=False)
    result['v1'] = result['v1'].replace(df.groupby(counter).first().squeeze())
    return result
</code></pre>
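<p><code>%timeit</code> is an IPython magic, so outside IPython a rough equivalent can be sketched with the standard <code>timeit</code> module. The sample frame below is an assumption (the frame used for the timings above is not shown), and <code>jez</code> is repeated with bracket indexing only so that the snippet is self-contained. Absolute numbers will differ by machine and pandas version.</p>

```python
import timeit

import pandas as pd

# Assumed sample input; the frame used for the original timings is not shown.
df = pd.DataFrame({'v1': [
    'Belgium[country]', 'Antwerp[city]', 'Gent[city]',
    'France[country]', 'Paris[city]', 'Marseille[city]', 'Toulouse[city]',
    'Spain[country]', 'Madrid[city]',
]})


def jez(df):
    # same steps as the solution without groupby
    df = df.rename(columns={'v1': 'v2'})
    counter = df['v2'].str.contains('country').cumsum()
    df.insert(0, 'v1', df.loc[counter.ne(counter.shift()), 'v2'])
    df['v1'] = df['v1'].ffill()
    return df[df['v1'].ne(df['v2'])].reset_index(drop=True)


# 3 runs of 100 calls each; report the best average time per call
n = 100
best = min(timeit.repeat(lambda: jez(df), number=n, repeat=3)) / n
print(f'jez: {best * 1e3:.2f} ms per loop')
```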