Mapping/plotting distances between values in a 1-D array/Series

Posted 2024-09-28 12:15:13


I have a pandas Series that looks like this:

(['StartGame', 'TutorialEnded',  'FBConnect',
  'StartGame', 'Sale', 'FBConnect', 'InviteSent',
  'StartGame', 'Finish_1', 'Sale', 'Bought',
  'Finish_22',  'FBConnect', 'Finish_2',
  'TutorialEnded', 'Finish_18', ...])

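I want to plot the distance between values containing the string Finish and occurrences of the value Sale, to see whether the two are correlated, and then check how other words relate to occurrences of Sale in the same way. In other words, can I use the occurrence of any value in the series to predict a Sale nearby? Even a scatter plot along a line, with a different color assigned to each value so I can get a feel for the pattern, would help, but I don't know how to do it.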


Tags: string, distance, color, plot, series, sale, word, occurrence
1 Answer
User
#1 · Posted 2024-09-28 12:15:13

Setup

import re

import numpy as np
import pandas as pd

df = pd.DataFrame(['StartGame', 'TutorialEnded',  'FBConnect',
  'StartGame', 'Sale', 'FBConnect', 'InviteSent',
  'StartGame', 'Finish_1', 'Sale', 'Bought',
  'Finish_22',  'FBConnect', 'Finish_2',
  'TutorialEnded', 'Finish_18'], columns=['Value'])
df.index.name = 'position'
df.reset_index(inplace=True)

Helper functions

def isFinish(x):
    """Return True if Value contains 'Finish', False otherwise."""
    # .ix was removed in pandas 1.0; plain label indexing works on a row
    return bool(re.search(r'Finish', x['Value']))

def isSale(x):
    """Return True if Value contains 'Sale', False otherwise."""
    return bool(re.search(r'Sale', x['Value']))

df['Finish'] = df.apply(isFinish, axis=1)
df['Sale'] = df.apply(isSale, axis=1)
df['FinishCount'] = df.Finish.cumsum()

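def cumargmax(x):
    """Return the position of the latest Finish row, or NaN if none yet."""
    if x['FinishCount'] == 0:
        return np.nan
    else:
        # FinishCount first reaches its running maximum at the latest
        # Finish row; idxmax returns the label of that first maximum.
        return df.FinishCount.loc[:x['position']].idxmax()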

df['Distance'] = df.position - df.apply(cumargmax, axis=1)
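
As a possible alternative to the row-wise apply above, the same Distance column can be sketched in vectorized form: keep the position only on Finish rows, forward-fill it, and subtract. This is a minimal sketch, not part of the original answer; the name df_vec is made up for illustration and a reasonably recent pandas is assumed.

```python
import pandas as pd

values = ['StartGame', 'TutorialEnded', 'FBConnect',
          'StartGame', 'Sale', 'FBConnect', 'InviteSent',
          'StartGame', 'Finish_1', 'Sale', 'Bought',
          'Finish_22', 'FBConnect', 'Finish_2',
          'TutorialEnded', 'Finish_18']

# df_vec is a hypothetical name, used to avoid clobbering df above
df_vec = pd.DataFrame({'Value': values})
df_vec['position'] = range(len(df_vec))

# Boolean masks built without a row-wise apply
df_vec['Finish'] = df_vec['Value'].str.contains('Finish')
df_vec['Sale'] = df_vec['Value'].str.contains('Sale')

# Keep the position only on Finish rows, then carry it forward so that
# every row knows where the most recent Finish occurred (NaN before the first)
last_finish = df_vec['position'].where(df_vec['Finish']).ffill()
df_vec['Distance'] = df_vec['position'] - last_finish

print(df_vec[df_vec['Sale']])
```

On a long series this should avoid the repeated slicing inside cumargmax, which rescans the column for every row.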

Demo

print(df)

    position          Value Finish   Sale  FinishCount  Distance
0          0      StartGame  False  False            0       NaN
1          1  TutorialEnded  False  False            0       NaN
2          2      FBConnect  False  False            0       NaN
3          3      StartGame  False  False            0       NaN
4          4           Sale  False   True            0       NaN
5          5      FBConnect  False  False            0       NaN
6          6     InviteSent  False  False            0       NaN
7          7      StartGame  False  False            0       NaN
8          8       Finish_1   True  False            1       0.0
9          9           Sale  False   True            1       1.0
10        10         Bought  False  False            1       2.0
11        11      Finish_22   True  False            2       0.0
12        12      FBConnect  False  False            2       1.0
13        13       Finish_2   True  False            3       0.0
14        14  TutorialEnded  False  False            3       1.0
15        15      Finish_18   True  False            4       0.0

Or only at the rows where a Sale occurs

print(df[df.Sale])

   position Value Finish  Sale  FinishCount  Distance
4         4  Sale  False  True            0       NaN
9         9  Sale  False  True            1       1.0
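
The question also asks about words other than Finish and about a colored scatter plot. One way to explore that, sketched here under assumptions that are not in the original answer, is to measure for every row the distance to the next Sale and average it per kind of event. The names events, Kind, ToNextSale and plot_events are hypothetical, and the plotting helper assumes matplotlib is installed. With only 16 sample values the averages say little; the point is the shape of the computation.

```python
import pandas as pd

values = ['StartGame', 'TutorialEnded', 'FBConnect',
          'StartGame', 'Sale', 'FBConnect', 'InviteSent',
          'StartGame', 'Finish_1', 'Sale', 'Bought',
          'Finish_22', 'FBConnect', 'Finish_2',
          'TutorialEnded', 'Finish_18']

events = pd.DataFrame({'Value': values})
events['position'] = range(len(events))

# Position of the next Sale at or after each row (NaN if no Sale follows)
is_sale = events['Value'].eq('Sale')
next_sale = events['position'].where(is_sale).bfill()
events['ToNextSale'] = next_sale - events['position']

# Group Finish_1, Finish_22, ... under the single label 'Finish'
events['Kind'] = events['Value'].str.replace(r'_\d+$', '', regex=True)

# Mean distance to the next Sale per kind of event, Sale rows excluded;
# kinds that are never followed by a Sale come out as NaN
summary = (events[~is_sale]
           .groupby('Kind')['ToNextSale']
           .mean()
           .sort_values())
print(summary)


def plot_events(frame):
    """Scatter the events along a line, one color per kind."""
    import matplotlib.pyplot as plt  # imported lazily; only needed to plot
    fig, ax = plt.subplots(figsize=(10, 2))
    for kind, group in frame.groupby('Kind'):
        # Each scatter call takes the next color from the default cycle
        ax.scatter(group['position'], [0] * len(group), label=kind)
    ax.set_yticks([])
    ax.set_xlabel('position')
    ax.legend(ncol=3, fontsize='small')
    return fig
```

Calling plot_events(events) returns a figure in which each kind of event has its own color, so clusters of events near a Sale can be spotted by eye.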
