TypeError: '<=' not supported between instances of 'int' and 'str' when running Seaborn histplot

Posted 2024-05-03 14:14:19


I get the TypeError above when creating a grid of subplots, each containing a different histogram.

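For some context: I have a large dataset that I had to split into several separate chunks for cleaning, to avoid memory issues. I saved each chunk separately and then concatenated them in another notebook.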

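When I run the subplot code on the chunked DataFrames, it works fine. When I run the same subplot code on the concatenated data, I get a TypeError. I don't understand why, because I didn't really change anything.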
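Why would concatenation change anything? One plausible mechanism fits the EDIT at the end of this post. Suppose a column is inferred as integers in one chunk but as strings in another. `pd.concat` then falls back to an `object` column that holds both kinds of value. The sketch below shows this using two hypothetical chunks (`chunk_a`, `chunk_b`) and a hypothetical helper `find_mixed_type_columns` that flags such columns. The `sorted()` call only shows that ordering a mixed int/str column raises a TypeError of the same kind as the title. It is not a claim that seaborn calls `sorted()` internally.

```python
import pandas as pd

# Two hypothetical chunks: in chunk_b the year column came back as strings
chunk_a = pd.DataFrame({'YearOnboarded': [2018, 2019], 'Revenue2': [100, 50]})
chunk_b = pd.DataFrame({'YearOnboarded': ['2020', '2016'], 'Revenue2': [25, 30]})

combined = pd.concat([chunk_a, chunk_b], ignore_index=True)
print(combined.dtypes)  # YearOnboarded is now object, holding int and str values


def find_mixed_type_columns(df):
    """Return the names of columns whose cells hold more than one Python type."""
    type_counts = df.apply(lambda col: col.map(type).nunique())
    return type_counts[type_counts > 1].index.tolist()


print(find_mixed_type_columns(combined))  # ['YearOnboarded']

# Ordering a mixed int/str column fails with a TypeError
try:
    sorted(combined['YearOnboarded'])
except TypeError as err:
    print(err)
```

Running `find_mixed_type_columns` on the concatenated frame before plotting narrows the search to the offending column(s).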

The error occurs here:

[screenshot of the error location]

My full code:

import matplotlib.pyplot as plt
import seaborn as sns

#Overall
CRev_All_age1 = df_optimized.groupby(['YearOnboarded', 'age_buckets']).sum().reset_index()
#Europe
CRev_EU = df_optimized.loc[df_optimized['Continents'] == 'Europe']
Plot_CRev_EU_age1 = CRev_EU.groupby(['YearOnboarded', 'age_buckets']).sum().reset_index()
#Asia
CRev_Asia = df_optimized.loc[df_optimized['Continents'] == 'Asia']
Plot_CRev_Asia_age1 = CRev_Asia.groupby(['YearOnboarded', 'age_buckets']).sum().reset_index()
#Other
CRev_Other = df_optimized.loc[(df_optimized['Continents'] != 'Europe') & (df_optimized['Continents'] != 'Asia')]
Plot_CRev_Other_age1 = CRev_Other.groupby(['YearOnboarded', 'age_buckets']).sum().reset_index()
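In current pandas (2.x), the `groupby(...).sum()` calls above do not reject a text column. This can hide a type problem until plotting. The sketch below uses a hypothetical three-row frame in which `Revenue2` was loaded as strings. It shows that pandas sums such a column by concatenating the strings, so the totals come out wrong rather than missing. Converting with `pd.to_numeric` first gives the intended numbers.

```python
import pandas as pd

# Hypothetical mini-frame where Revenue2 was loaded as text
df = pd.DataFrame({'YearOnboarded': [2018, 2018, 2019],
                   'Revenue2': ['100', '50', '25']})

summed = df.groupby('YearOnboarded')['Revenue2'].sum()
print(summed.tolist())  # concatenated strings, not numeric totals

# Converting to numbers first gives the intended totals
df['Revenue2'] = pd.to_numeric(df['Revenue2'])
totals = df.groupby('YearOnboarded')['Revenue2'].sum()
print(totals.tolist())  # [150, 25]
```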

fig, axes = plt.subplots(2,2, constrained_layout=True, figsize=(14,12))
ax1, ax2, ax3, ax4 =axes.flatten()

#plot1
ax1 = sns.histplot( data=CRev_All_age1, x="YearOnboarded", hue="age_buckets",weights="Revenue2", multiple="stack", discrete=True, shrink=.9, ax=ax1)
ax1.set_title('Overall - Client Revenue (Million)', fontsize=16, fontweight='bold')
ax1.tick_params('x', labelrotation=15)
ax1.set_ylabel('Revenue', fontsize=12)
ax1.set_xlabel('Year Onboarded', fontsize=12)
#plot2
ax2 = sns.histplot( data=Plot_CRev_EU_age1, x="YearOnboarded", hue="age_buckets",weights="Revenue2", multiple="stack", discrete=True, shrink=.9, ax=ax2)
ax2.set_title('Europe - Client Revenue (Million)', fontsize=14, fontweight='bold')
plt.setp(ax2.xaxis.get_majorticklabels(), rotation=15)
ax2.set_ylabel('Revenue', fontsize=12)
ax2.set_xlabel('Year Onboarded', fontsize=12)
#plot3
ax3 = sns.histplot( data=Plot_CRev_Asia_age1, x="YearOnboarded", hue="age_buckets",weights="Revenue2", multiple="stack", discrete=True, shrink=.9, ax=ax3)
ax3.set_title('Asia - Client Revenue (Million)', fontsize=14, fontweight='bold')
for tick in ax3.get_xticklabels():
    tick.set_rotation(15)
ax3.set_ylabel('Revenue', fontsize=12)
ax3.set_xlabel('Year Onboarded', fontsize=12)
#plot4
ax4 = sns.histplot( data=Plot_CRev_Other_age1, x="YearOnboarded", hue="age_buckets",weights="Revenue2", multiple="stack", discrete=True, shrink=.9, ax=ax4)
ax4.set_title('Other Continents - Client Revenue (Million)', fontsize=14, fontweight='bold')
ax4.tick_params('x', labelrotation=15)
ax4.set_ylabel('Revenue', fontsize=12)
ax4.set_xlabel('Year Onboarded', fontsize=12)

plt.show()

Toy data:

import pandas as pd

dataset = {'YearOnboarded': [2018,2019,2020,2016,2019,2020,2017,2019,2020,2018,2019,2020,2016,2016,2016,2017,2016,2018,2016],
           'Revenue2': [100,50,25,30,40,50,60,100,20,40,100,20,5,5,8,4,10,20,8],
           'age_buckets': ['18-30','30-39','40-49','50-59','18-30','30-39','40-49','50-59','18-30','30-39','40-49','50-59',
                           '18-30','30-39','40-49','50-59','18-30','30-39','40-49'],
           'Continents': ['Europe','Asia','Africa','Africa','Other','Asia','Africa','Other','America','America','Europe','Europe',
                      'Other','Europe','Asia','Africa','Asia','Europe','Other']}
df_optimized = pd.DataFrame(data=dataset)

I'd really appreciate it if someone could help me understand why this happens and how to fix it.

Thanks, everyone!

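EDIT: I found the source of the problem and a fix. When each chunk dataset was imported into the new kernel, one of the columns had mixed data types. Converting that column with .astype('category') did not solve the problem. What worked was setting the data type when importing the data with read_csv().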
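The post does not say which column was affected or which dtype was passed to `read_csv()`. The sketch below is a minimal illustration under assumptions. It uses two hypothetical in-memory CSV chunks, assumes the mixed column is `YearOnboarded`, and picks `str` as the forced dtype. Without an explicit dtype, each chunk is inferred on its own: the first as integers, the second as text, because of a stray `unknown`. Passing `dtype={...}` makes every chunk parse the same way.

```python
import io
import pandas as pd

# Two hypothetical chunk files; the second contains a non-numeric year
chunk_1 = "YearOnboarded,Revenue2\n2018,100\n2019,50\n"
chunk_2 = "YearOnboarded,Revenue2\n2020,25\nunknown,30\n"
chunks = [chunk_1, chunk_2]

# Default: each chunk's dtype is inferred separately (integers vs text)
inferred = pd.concat([pd.read_csv(io.StringIO(c)) for c in chunks],
                     ignore_index=True)
print(inferred['YearOnboarded'].map(type).value_counts())  # ints and strs mixed

# Explicit dtype: every chunk is parsed identically, so the column stays uniform
fixed = pd.concat([pd.read_csv(io.StringIO(c), dtype={'YearOnboarded': str})
                   for c in chunks],
                  ignore_index=True)
print(sorted(fixed['YearOnboarded']))  # ['2018', '2019', '2020', 'unknown']
```

If the column should be numeric instead, you could read it as `str` like this and convert it afterwards with `pd.to_numeric`.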


1 answer
Anonymous user
#1 · Posted 2024-05-03 14:14:19

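This problem may come from a character in Revenue2 that pandas cannot recognize as an integer when it loads the data from whatever file type you used to save the chunks. If even a single element cannot be interpreted as an integer, pandas reads the entire column with object dtype. In this example, I use - to stand for such a character, which has no integer equivalent.
If you run this code: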

import pandas as pd
import seaborn as sns
df = pd.DataFrame({'YearOnboarded': [2018,2019,2020,2016,2019,2020,2017,2019,2020,2018,2019,2020,2016,2016,2016,2017,2016,2018,2016],
           'Revenue2': ["-",50,25,30,40,50,60,100,20,40,100,20,5,5,8,4,10,20,8],
           'age_buckets': ['18-30','30-39','40-49','50-59','18-30','30-39','40-49','50-59','18-30','30-39','40-49','50-59',
                           '18-30','30-39','40-49','50-59','18-30','30-39','40-49'],
           'Continents': ['Europe','Asia','Africa','Africa','Other','Asia','Africa','Other','America','America','Europe','Europe',
                      'Other','Europe','Asia','Africa','Asia','Europe','Other']})

df['Revenue2'] = df['Revenue2'].astype(int)

you will get the following error:

ValueError: invalid literal for int() with base 10: '-'

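This is useful because it points to the first offending value. You can then replace that value with a filler and try again:

# Replace only cells that are exactly '-'; a substring replace would also turn '-5' into '05'
df['Revenue2'] = df['Revenue2'].replace('-', 0).astype(int)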

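If you repeat this until astype(int) succeeds, you should be able to remove every invalid value and end up with a column that contains only integers.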
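The approach above surfaces one bad value per attempt. A sketch of a one-pass alternative, which swaps in `pd.to_numeric(..., errors='coerce')`, is below. It turns every unparseable cell into NaN, so you can list all offenders at once. The CSV text and the `n.a.` placeholder are hypothetical. Filling with 0 mirrors the filler used above, but whether 0 is the right replacement (rather than dropping those rows) depends on your data.

```python
import io
import pandas as pd

# Hypothetical CSV chunk with two non-numeric placeholders among the revenues
csv_text = "Revenue2\n100\n-\n25\nn.a.\n30\n"
df = pd.read_csv(io.StringIO(csv_text))
raw_dtype = df['Revenue2'].dtype
print(raw_dtype)  # object: one bad cell is enough

# Parse what can be parsed; everything else becomes NaN
numeric = pd.to_numeric(df['Revenue2'], errors='coerce')

# Cells that had a value but could not be parsed
bad_mask = numeric.isna() & df['Revenue2'].notna()
bad_values = df.loc[bad_mask, 'Revenue2'].unique().tolist()
print(bad_values)  # ['-', 'n.a.']

# Assumption: treat unparseable entries as 0, then store as int
df['Revenue2'] = numeric.fillna(0).astype(int)
print(df['Revenue2'].tolist())  # [100, 0, 25, 0, 30]
```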
