<h3><code>pandas.factorize</code> and <code>numpy.bincount</code></h3>
<ol>
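<li>If immediately repeated values should not be counted, remove them first.</li>
<li>Do a normal count of the values that remain.</li>
<li>That count is one more than what was asked for, so subtract one.</li>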
</ol>
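<p>The three steps above can be sketched in plain Python before reaching for pandas. This is only an illustration: the <code>cookies</code> list and the <code>count_returns</code> helper are hypothetical, since the original <code>df.Cookie</code> data is not shown here.</p>

```python
from collections import Counter
from itertools import groupby

# Hypothetical sample data; the original df.Cookie values are not shown.
cookies = ["A", "A", "B", "A", "C", "D", "D", "B", "D", "A", "D", "E"]


def count_returns(values):
    """Count how often each value reappears after something else came in between."""
    # Step 1: collapse every run of immediately repeated values to a single entry.
    collapsed = [key for key, _ in groupby(values)]
    # Step 2: count what is left in the ordinary way.
    counts = Counter(collapsed)
    # Step 3: each count is one too high, so subtract one.
    return {key: n - 1 for key, n in counts.items()}


result = count_returns(cookies)
print(result)
```

<p>With this sample list, <code>A</code> and <code>D</code> each start three separate runs, so they end up with a count of 2.</p>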
<hr/>
<ol>
<li><code>factorize</code></li>
<li>Filter out immediate repeats</li>
<li><code>bincount</code></li>
<li>Produce a <code>pandas.Series</code></li>
</ol>
<hr/>
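<pre><code># Assumes: import numpy as np; import pandas as pd
i, r = pd.factorize(df.Cookie)            # integer codes and the unique labels
mask = np.append(True, i[:-1] != i[1:])   # True where a value differs from its predecessor
cnts = np.bincount(i[mask]) - 1           # runs per code, minus one
pd.Series(cnts, r)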
A 2
B 1
C 0
D 2
E 0
dtype: int64
</code></pre>
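<p>To make the intermediate arrays easier to follow, here is the same pipeline with each step named. The <code>DataFrame</code> below is hypothetical sample data, chosen only so that the final counts match the output shown; it is not the original <code>df</code>.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the original df is not shown.
df = pd.DataFrame({"Cookie": list("AABACDDBDADE")})

# factorize: one integer code per row, plus the labels in order of first appearance.
codes, uniques = pd.factorize(df.Cookie)

# Keep the first row, and every row whose code differs from the row before it.
keep = np.append(True, codes[:-1] != codes[1:])

# bincount: how many runs each code starts.
run_counts = np.bincount(codes[keep])

# One run is the baseline, so subtract one and label with the unique values.
result = pd.Series(run_counts - 1, index=uniques)
print(result)
```

<p>Because <code>factorize</code> returns labels in order of first appearance, the resulting index follows that order rather than sorted order; the two coincide for this sample.</p>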
<hr/>
<h3><code>pandas.value_counts</code></h3>
<p><code>zip</code> the cookies with a lagged copy of themselves and keep only the values that differ from their predecessor.</p>
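<pre><code>c = df.Cookie.tolist()
# Keep each value that differs from the one before it, then count and subtract one.
# Note: newer pandas releases deprecate pd.value_counts; pd.Series(...).value_counts() is the equivalent.
pd.value_counts([a for a, b in zip(c, [None] + c) if a != b]).sort_index() - 1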
A 2
B 1
C 0
D 2
E 0
dtype: int64
</code></pre>
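<p>The lagged <code>zip</code> is easier to see on a small list. The sketch below uses a hypothetical list <code>c</code> (not the original data) and shows the pairs that <code>zip(c, [None] + c)</code> produces, along with what survives the <code>a != b</code> filter.</p>

```python
# Hypothetical sample data; the original df.Cookie values are not shown.
c = ["A", "A", "B", "A", "C", "D", "D", "B", "D", "A", "D", "E"]

# Prepending None shifts the list by one, so each value is paired with its predecessor.
# zip stops at the shorter input, so the extra trailing element is ignored.
pairs = list(zip(c, [None] + c))
print(pairs[:4])  # [('A', None), ('A', 'A'), ('B', 'A'), ('A', 'B')]

# Keep a value only when it differs from the one before it.
kept = [a for a, b in pairs if a != b]
print(kept)  # ['A', 'B', 'A', 'C', 'D', 'B', 'D', 'A', 'D', 'E']
```

<p>The first element is always kept because it is compared with <code>None</code>.</p>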
<hr/>
<h3><code>defaultdict</code></h3>
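<pre><code>from collections import defaultdict

def count(s):
    d = defaultdict(lambda: -1)   # each key starts at -1, so its first run brings it to 0
    x = None                      # previous value seen
    for y in s:
        d[y] += y != x            # add 1 only when a new run starts
        x = y
    return pd.Series(d)

count(df.Cookie)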
A 2
B 1
C 0
D 2
E 0
dtype: int64
</code></pre>