<p>To keep everything before the first underscore:</p>
<pre><code># n=1 splits only at the first underscore; column 0 holds the text before it
key = df.Type.str.split(r'_', n=1, expand=True)[0]
key
# out:
# 0    Hello-HEL-HE-A6123-123A-12T
# 1    Hello-HEL-HE-A6123-123A-12T
# 2    Hello-HEL-HE-A6123-123A-12T
# 3    Hello-HEL-HE-A6123-123A-50T
# 4    Hello-HEL-HE-A6123-123A-50T
# 5     Happy-HAP-HA-R650-570A-90T
</code></pre>
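<p>A self-contained sketch of the same step, for anyone who wants to run it without the original data. The rows of <code>df</code> below are assumptions: the text after each underscore and the <code>Value</code> column are made up for illustration.</p>

```python
import pandas as pd

# Hypothetical sample data: the suffixes after "_" and the Value column are invented.
df = pd.DataFrame({
    "Type": [
        "Hello-HEL-HE-A6123-123A-12T_alpha",
        "Hello-HEL-HE-A6123-123A-12T_beta_x",   # a second underscore is left alone by n=1
        "Hello-HEL-HE-A6123-123A-50T_gamma",
        "Happy-HAP-HA-R650-570A-90T_delta",
    ],
    "Value": [1, 2, 3, 4],
})

# Split once at the first underscore and keep the left-hand part.
key = df["Type"].str.split("_", n=1, expand=True)[0]
print(key.tolist())
```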
<p>If you would rather use the first three words plus the last word before the underscore:</p>
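<pre><code># take the part before the first underscore, then split it into hyphen-separated words
a = df.Type.str.split(r'_', n=1, expand=True)[0].str.split(r'-', expand=True)
sel = list(a.columns)
# columns 1 and 2 (second and third words) plus the last column
sel = sel[1:3] + sel[-1:]
# join the first word with the selected columns using '-'
key = a[0].str.cat(a[sel], sep='-')
key
# out:
# 0    Hello-HEL-HE-12T
# 1    Hello-HEL-HE-12T
# 2    Hello-HEL-HE-12T
# 3    Hello-HEL-HE-50T
# 4    Hello-HEL-HE-50T
# 5    Happy-HAP-HA-90T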
</code></pre>
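<p>One caveat: with <code>expand=True</code>, rows that have fewer hyphen-separated words are padded with missing values in the trailing columns, so the last column is not always the last word. If the number of words can vary, a possible variant is to keep each row as a list and slice it. This is only a sketch; the sample strings (including the shorter second row) are assumptions.</p>

```python
import pandas as pd

types = pd.Series([
    "Hello-HEL-HE-A6123-123A-12T_alpha",  # suffix after "_" is invented
    "Happy-HAP-HA-R650-90T_beta",         # hypothetical row with one word fewer
])

# Part before the first underscore, split into a list of words per row.
parts = types.str.split("_", n=1).str[0].str.split("-")

# First three words plus the last word of each row, joined with hyphens.
key = parts.apply(lambda words: "-".join(words[:3] + words[-1:]))
print(key.tolist())
```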
<p>In either case, you can then group by that key:</p>
<pre><code># number of non-null Value entries per key, then each key's share as a percentage
cnt = df.groupby(key)['Value'].count()
100 * cnt / cnt.sum()
# out:
# Happy-HAP-HA-90T    16.666667
# Hello-HEL-HE-12T    50.000000
# Hello-HEL-HE-50T    33.333333
</code></pre>
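<p>If <code>Value</code> has no missing entries, the same percentages can probably be obtained more directly with <code>value_counts(normalize=True)</code> on the key itself. The sketch below rebuilds the keys from the output shown above; the <code>Value</code> numbers are made up.</p>

```python
import pandas as pd

# Keys mirroring the output above; the Value numbers are invented for illustration.
key = pd.Series(
    ["Hello-HEL-HE-12T"] * 3 + ["Hello-HEL-HE-50T"] * 2 + ["Happy-HAP-HA-90T"]
)
df = pd.DataFrame({"Value": [10, 20, 30, 40, 50, 60]})

# Approach from the answer: count rows per key, then convert to percentages.
cnt = df.groupby(key)["Value"].count()
pct = 100 * cnt / cnt.sum()

# Shortcut: relative frequency of each key, scaled to percent.
# Matches pct only when Value contains no missing entries, since count() skips them.
pct_alt = 100 * key.value_counts(normalize=True)

print(pct.round(6).to_dict())
```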