Complex splitting of a column in Python

Published 2024-10-04 11:31:28


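I have a DataFrame, df, and I want to display percentages after splitting one column on specific values. From each entry I want the first "word" (the part before the first "-") together with its #T value (for example 12T). In other words, I am grouping the types whenever the first word and the #T value (12T, 50T, etc.) both match. Some #T values look like 6.4T, and some first words look like AY14.5.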

Update

There are        2  unique counts    of   Hello-HEL-HE-A6123-123A-12T
                 2  unique counts    of   Hello-HEL-HE-A6123-123A-50T
                 1  unique count(s)  of   Happy-HAP-HA-R650-570A-90T
                 2  unique counts    of   Kind-KIN-KI-T490-NET_14T-A.0
                 2  unique counts    of   AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A

Data:

Type

Hello-HEL-HE-A6123-123A-12T_TYPE-v.A    
Hello-HEL-HE-A6123-123A-12T_TYPE-v.E    
Hello-HEL-HE-A6123-123A-50T_TYPE-v.C    
Hello-HEL-HE-A6123-123A-50T_TYPE-v.A    
Happy-HAP-HA-R650-570A-90T_version-v.A  
Kind-KIN-KI-T490-NET_14T-A.0
Kind-KIN-KI-T490-NET_14T-A.0       
AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A
AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A

Desired:

Type                                    Percent

Hello    12T                        22.2%
Hello    50T                        22.2%
Happy    90T                        11.1%
Kind     14T                        22.2%
AY14.5   6.4T                       22.2%                 

What I'm doing:

df = df.assign(Type=df.Type.str.split('_').str[0])
df2 = (df['Type'].value_counts(normalize=True) * 100).to_frame('%')
print(df2.rename_axis(index='Type'))

However, my output shows the full type name, and it does not handle the cases where the "_" comes before the #T value (for example NET_14T).

Type     
                         
Hello-HEL-HE-A6123-123A-12T 

Any suggestions are appreciated; I am still troubleshooting.
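A minimal sketch, using only the standard library, of why splitting on "_" is not enough here. It shows what `split("_")` keeps for three of the sample strings, next to what a regular expression finds. The pattern (digits, an optional decimal part, then "T") and the names `samples`, `kept`, `SIZE` and `sizes` are assumptions made for illustration, not part of the original post.

```python
import re

samples = [
    "Hello-HEL-HE-A6123-123A-12T_TYPE-v.A",
    "Kind-KIN-KI-T490-NET_14T-A.0",
    "AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A",
]

# What the question's split("_") keeps: everything before the first "_".
# The long prefix survives, and "NET_14T" is cut before its 14T token.
kept = [s.split("_")[0] for s in samples]

# Assumed pattern: digits, an optional decimal part, then a literal "T".
SIZE = re.compile(r"\d+(?:\.\d+)?T")
sizes = [SIZE.search(s).group(0) for s in samples]

for s, k, t in zip(samples, kept, sizes):
    print(f"{s!r}: split keeps {k!r}, pattern finds {t!r}")
```

Searching for the #T token directly avoids depending on where the underscore happens to sit in the string.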


2 answers
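import re

import pandas as pd

def extract_value(s):
    # Group 1: the first '-'-separated word (lazy match up to the first '-').
    # Group 2: the first token made of digits, an optional decimal part, and 'T'.
    regex = re.search(r'(^.+?)-.+?(\d+(?:\.\d+)?T)', s)
    if regex:
        first_word = regex.group(1)
        code = regex.group(2)
        return f'{first_word} {code}'
    # No #T token found: leave the value unchanged.
    return s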

df = pd.DataFrame([['Hello-HEL-HE-A6123-123A-12T_TYPE-v.A'],
                   ['Hello-HEL-HE-A6123-123A-12T_TYPE-v.E'],
                   ['Hello-HEL-HE-A6123-123A-50T_TYPE-v.C'],
                   ['Hello-HEL-HE-A6123-123A-50T_TYPE-v.A'],
                   ['Happy-HAP-HA-R650-570A-90T_version-v.A'],
                   ['Kind-KIN-KI-T490-NET_14T-A.0'],
                   ['Kind-KIN-KI-T490-NET_14T-A.0'],
                   ['AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A'],
                   ['AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A']])

df.columns = ['Type']
df['Type'] = df['Type'].apply(extract_value)
df2 = (df['Type'].value_counts(normalize=True)*100).to_frame('%')
print(df2.rename_axis(index='Type').reset_index())

Output:

          Type          %
0  AY14.5 6.4T  22.222222
1     Kind 14T  22.222222
2    Hello 50T  22.222222
3    Hello 12T  22.222222
4    Happy 90T  11.111111
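As a possible vectorized variant of the same idea, here is a sketch that uses `Series.str.extract` with named groups, so the first word and the #T value land in separate columns, and formats the share with one decimal and a percent sign as in the desired output. The column names `Name` and `Size` and the exact pattern are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Type": [
    "Hello-HEL-HE-A6123-123A-12T_TYPE-v.A",
    "Hello-HEL-HE-A6123-123A-12T_TYPE-v.E",
    "Hello-HEL-HE-A6123-123A-50T_TYPE-v.C",
    "Hello-HEL-HE-A6123-123A-50T_TYPE-v.A",
    "Happy-HAP-HA-R650-570A-90T_version-v.A",
    "Kind-KIN-KI-T490-NET_14T-A.0",
    "Kind-KIN-KI-T490-NET_14T-A.0",
    "AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A",
    "AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A",
]})

# Assumed pattern: the first "-"-separated word, then the first token made of
# digits (optionally with a decimal part) followed by "T".
pattern = r"^(?P<Name>[^-]+)-.*?(?P<Size>\d+(?:\.\d+)?T)"
parts = df["Type"].str.extract(pattern)  # one column per named group

percent = (
    parts.groupby(["Name", "Size"], sort=False)  # keep first-appearance order
    .size()
    .div(len(parts))
    .mul(100)
    .map("{:.1f}%".format)  # e.g. 22.222... becomes "22.2%"
    .rename("Percent")
    .reset_index()
)
print(percent)
```

On the nine sample rows this gives 22.2% for every group except Happy 90T, which gets 11.1%. Rows that do not match the pattern would end up as NaN in `parts` and be left out of the groups, so real data may need a fallback.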
import pandas as pd

df = pd.DataFrame({'Type': {0: 'Hello-HEL-HE-A6123-123A-12T_TYPE-v.A',
  1: 'Hello-HEL-HE-A6123-123A-12T_TYPE-v.E',
  2: 'Hello-HEL-HE-A6123-123A-50T_TYPE-v.C',
  3: 'Hello-HEL-HE-A6123-123A-50T_TYPE-v.A',
  4: 'Happy-HAP-HA-R650-570A-90T_version-v.A',
  5: 'Kind-KIN-KI-T490-NET_14T-A.0',
  6: 'Kind-KIN-KI-T490-NET_14T-A.0'}})

# expand=False makes str.extract return a Series, so the concatenation is
# element-wise. The raw string avoids the invalid "\d" escape, and the
# optional decimal part also covers values such as 6.4T.
size = df['Type'].str.extract(r'(\d+(?:\.\d+)?T)', expand=False)
df['Type'] = df['Type'].str.split('-').str[0] + ' ' + size
result = (df.groupby('Type').size() / len(df) * 100).to_frame('Percent').reset_index()
print(result)

Output:

        Type    Percent
0  Happy 90T  14.285714
1  Hello 12T  28.571429
2  Hello 50T  28.571429
3   Kind 14T  28.571429
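The sample in this answer leaves out the two AY14.5 rows from the question. As a pandas-free cross-check over all nine rows, the sketch below counts the labels with `collections.Counter` and formats each share as a percentage. The helper name `label` and the pattern are illustrative assumptions.

```python
import re
from collections import Counter

rows = [
    "Hello-HEL-HE-A6123-123A-12T_TYPE-v.A",
    "Hello-HEL-HE-A6123-123A-12T_TYPE-v.E",
    "Hello-HEL-HE-A6123-123A-50T_TYPE-v.C",
    "Hello-HEL-HE-A6123-123A-50T_TYPE-v.A",
    "Happy-HAP-HA-R650-570A-90T_version-v.A",
    "Kind-KIN-KI-T490-NET_14T-A.0",
    "Kind-KIN-KI-T490-NET_14T-A.0",
    "AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A",
    "AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A",
]

# Assumed pattern: digits, an optional decimal part, then a literal "T".
SIZE = re.compile(r"\d+(?:\.\d+)?T")

def label(s):
    """Build 'first-word #T'; fall back to the raw string if no #T is found."""
    m = SIZE.search(s)
    return f"{s.split('-')[0]} {m.group(0)}" if m else s

counts = Counter(label(r) for r in rows)
percent = {k: f"{100 * n / len(rows):.1f}%" for k, n in counts.items()}

for k, v in percent.items():
    print(k, v)
```

The percentages agree with the desired table in the question: 22.2% for each group with two rows and 11.1% for Happy 90T.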
