How to create function-based calculated columns using the row index

Posted 2024-09-28 01:26:29


My data looks like this:

BINS
SKILL      object
LOGIN      object
50.0      float64
100.0     float64
150.0     float64
200.0     float64
250.0     float64
300.0     float64
350.0     float64
400.0     float64
450.0     float64
500.0     float64
550.0     float64
600.0     float64
650.0     float64
700.0     float64
750.0     float64
800.0     float64
850.0     float64
900.0     float64
950.0     float64
1000.0    float64
dtype: object

Here is a sample of the data, from HMDrr.head().values:

array([['Skill1', 'loginA', 0.07090909090909091, 0.25, 0.35,
        0.147619047619047616, 0.057823529411764705, 0.0,
        0.0, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
        nan],
       ['Skill1', 'loginB', nan, nan, nan, nan, nan, nan, nan,
        nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       ['Skill1', 'loginC', 0.15, nan, nan, nan, nan, nan, nan,
        nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       ['Skill1', 'loginD', 0.3333333333333333,
        0.1857142857142857, 0.0, 0.15, 0.1, 0.0, 0.05666666666666667,
        0.06692307692307693, 0.05692307692307693, 0.13529411764705882, 0.1,
        0.0, nan, nan, nan, nan, nan, nan, nan, nan],
       ['Skill1', 'loginE', 0.1, 0.0, nan, nan, nan, nan, nan,
        nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]], dtype=object)

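I have employee data (LOGIN) listed by job type (SKILL). The numeric columns are bins. Each bin contains the performance result for the first 50 interactions, then 100 interactions, and so on. I need to calculate the slope and intercept by skill and login so that I can create a new performance ramp plan for employees.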

To do this, I built the following:

import numpy as np
from scipy.stats import linregress

# Bins for contacts
startBin = 0.0
stopBin = 1000.0
incrementBin = 50.0
sortBins = np.arange(startBin, stopBin + incrementBin, incrementBin)
binLabels = np.arange(startBin + incrementBin, stopBin + incrementBin, incrementBin)

# Calculate logarithmic slope in HMDrr dataset
def calc_slope(z):
    y = HMDrr.loc[z,binLabels].dropna()
    number = y.count()+1
    y = y.values.astype(float)
    x = np.log(range(1,number,1))
    slope, intercept, r, p, stderr = linregress(x, y)
    return slope
# Calculate logarithmic intercept in HMDrr dataset
def calc_intercept(z):
    y = HMDrr.loc[z,binLabels].dropna()
    number = y.count()+1
    y = y.values.astype(float)
    x = np.log(range(1,number,1))
    slope, intercept, r, p, stderr = linregress(x, y)
    return intercept

When I run it with a manually supplied z value, it works fine:

calc_slope(10)
-0.018236067481219649

I want to create SLOPE and INTERCEPT columns in the df using the functions above.

I have tried several approaches, for example:

HMDrr['SLOPE'] = calc_slope(HMDrr.index)

TypeError                                 Traceback (most recent call last)
<ipython-input-717-4a58ad29d7b0> in <module>()
----> 1 HMDrr['SLOPE'] = calc_slope(HMDrr.index)

<ipython-input-704-26a18390e20c> in calc_slope(z)
      7 def calc_slope(z):
      8     y = HMDrr.loc[z,binLabels].dropna()
----> 9     x = np.log(range(1,y.count()+1,1))
     10     slope, intercept, r, p, stderr = linregress(x, y)
     11     return slope

C:\Anaconda\lib\site-packages\pandas\core\series.pyc in wrapper(self)
     67             return converter(self.iloc[0])
     68         raise TypeError(
---> 69             "cannot convert the series to {0}".format(str(converter)))
     70     return wrapper
     71 

TypeError: cannot convert the series to <type 'int'>
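A likely reading of this traceback, sketched on a hypothetical two-bin miniature of HMDrr (the names `df` and `bins` are illustrative, not from the original data): selecting with a single row label gives a Series whose `count()` is a scalar, while selecting with the whole index gives a DataFrame whose `count()` is a Series with one entry per column, and `range()` cannot accept that.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of HMDrr with only two bin columns (assumption).
bins = [50.0, 100.0]
df = pd.DataFrame({50.0: [0.1, 0.2, np.nan], 100.0: [0.3, np.nan, np.nan]})

# One row label: .loc returns a Series, so count() is a single number.
one_row = df.loc[0, bins].dropna()
single_count = one_row.count()

# The whole index: .loc returns a DataFrame, so count() is a Series
# holding one count per column rather than a single number.
many_rows = df.loc[df.index, bins].dropna()
per_column_counts = many_rows.count()

# range() needs an integer, so a Series of counts raises TypeError.
error_message = None
try:
    range(1, per_column_counts + 1, 1)
except TypeError as exc:
    error_message = str(exc)

print(type(single_count).__name__, type(per_column_counts).__name__)
print("TypeError:", error_message)
```

The exact wording of the error differs between pandas versions, but the cause sketched here is the same: the function was written for one row and was handed all of them at once.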

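I have also tried the apply function, but most likely I did it wrong. My guess is that either I am not applying the function to the column correctly, or the values I get back are not integers. I have been at this for several days, so now I am giving in and asking for help.

How can I use the functions above to generate the columns so that each row gets its own row-specific values?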


1 Answer
User
#1 · Posted 2024-09-28 01:26:29

Perhaps not the best approach, but I solved the problem as follows.

I built a calc_linear function that returns both the slope and the intercept:

# Calculate logarithmic slope and intercept in HMDrr dataset
def calc_linear(z):
    y = HMDrr.loc[z,binLabels].dropna()
    number = y.count()+1
    y = y.values.astype(float)
    x = np.log(range(1,number,1))
    slope, intercept, r, p, stderr = linregress(x, y)
    return slope, intercept

Created empty columns for the data:

#Create metric columns
HMDrr['SLOPE'] = ""
HMDrr['INTERCEPT'] = ""

Ran a for loop to fill the columns:

# Loop over the row labels and fill in the metrics.
# .loc avoids the chained assignment of HMDrr.SLOPE[x] = ...
for idx in HMDrr.index:
    slope, intercept = calc_linear(idx)
    HMDrr.loc[idx, 'SLOPE'] = slope
    HMDrr.loc[idx, 'INTERCEPT'] = intercept

If there is a cleaner way, I would be happy to hear it :)
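One possibly cleaner route is a minimal sketch using `DataFrame.apply` with `axis=1`, so that each row is passed to the fitting function directly instead of being looked up by label. The frame `df`, the list `bin_labels`, and the helper `fit_log_curve` below are hypothetical stand-ins, not the original HMDrr data. Returning NaN for rows with fewer than two non-missing bins is also an assumption; the original functions do not guard against that case.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress


def fit_log_curve(row):
    """Fit y = slope * ln(k) + intercept over one row's non-missing bins."""
    y = row.dropna().to_numpy(dtype=float)
    if len(y) < 2:
        # Assumption: a line cannot be fitted, so report NaN for both metrics.
        return pd.Series({"SLOPE": np.nan, "INTERCEPT": np.nan})
    # k = 1, 2, ..., n numbers the remaining bins, as in the original x values.
    x = np.log(np.arange(1, len(y) + 1))
    fit = linregress(x, y)
    return pd.Series({"SLOPE": fit.slope, "INTERCEPT": fit.intercept})


# Hypothetical stand-in for HMDrr with three bins instead of twenty.
bin_labels = [50.0, 100.0, 150.0]
df = pd.DataFrame(
    [
        ["Skill1", "loginA", 0.1, 0.2, 0.3],
        ["Skill1", "loginB", np.nan, np.nan, np.nan],
        ["Skill1", "loginC", 0.5, 0.4, np.nan],
    ],
    columns=["SKILL", "LOGIN"] + bin_labels,
)

# axis=1 hands each row to fit_log_curve; the returned Series become columns.
metrics = df[bin_labels].apply(fit_log_curve, axis=1)
result = df.join(metrics)
print(result[["SKILL", "LOGIN", "SLOPE", "INTERCEPT"]])
```

Because the function receives the row itself, it does not depend on a global frame or on the index being 0, 1, 2, and both metrics come out of a single regression per row.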
