Pandas: make new columns from a list of dicts stored as a column value

Posted 2024-09-26 18:17:22


I have a DataFrame with a column whose values look like this:

[
    {
      "OrderID" : "0",
      "TimeStamp" : "2019-09-24 10:17:48 +0000",
      "Screen" : "Home_Screen",
      "StateVars" : "",
      "Event" : "A"
    },
    {
      "Event" : "B",
      "TimeStamp" : "2019-09-24 10:17:38 +0000",
      "Screen" : "Home_Screen",
      "StateVars" : "",
      "OrderID" : "0"
    },
    {
      "OrderID" : "0",
      "TimeStamp" : "2019-09-24 10:17:35 +0000",
      "Screen" : "Home_Screen",
      "StateVars" : "",
      "Event" : "D"
    },
    {
      "Event" : "V",
      "TimeStamp" : "2019-09-24 10:17:33 +0000",
      "Screen" : "Home_Screen",
      "StateVars" : "",
      "OrderID" : "0"
    },
    {
      "OrderID" : "0",
      "TimeStamp" : "2019-09-24 10:17:32 +0000",
      "Screen" : "Home_Screen",
      "StateVars" : "",
      "Event" : "C"
    }
  ]

I want to turn all of these keys into columns. The original DataFrame looks like this:


+----+------------+-------------+---------+---------------------------------------+----------------------------------------------------+-------------+------+------+------+------+------+-----+
|    | O          | v           | S       |               I                       |                     EventLog                       | CustomerID  |  a   |  b   |  c   |  d   |  e   |  f  |
+----+------------+-------------+---------+---------------------------------------+----------------------------------------------------+-------------+------+------+------+------+------+-----+
| 0  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         1   | NaN  | NaN  | NaN  | NaN  | NaN  | NaN |
| 1  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         1   | NaN  | NaN  | NaN  | NaN  | NaN  | NaN |
| 2  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         1   | NaN  | NaN  | NaN  | NaN  | NaN  | NaN |
| 3  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         1   | NaN  | NaN  | NaN  | NaN  | NaN  | NaN |
| 4  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         15  | NaN  | NaN  | NaN  | NaN  | NaN  | NaN |
+----+------------+-------------+---------+---------------------------------------+----------------------------------------------------+-------------+------+------+------+------+------+-----+

I'm looking for something like this:


+----+------------+-------------+---------+---------------------------------------+----------------------------------------------------+-------------+---------+----------------------------+--------------+------------+------+
|    | O          | v           | S       |               I                       |                     EventLog                       | CustomerID  | OrderID |  TimeStamp                 |Screen        | StateVars  |Event |
+----+------------+-------------+---------+---------------------------------------+----------------------------------------------------+-------------+---------+----------------------------+--------------+------------+------+
| 0  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         1   | 0       | 2019-09-24 10:17:33 +0000  | Home_Screen  | NaN        | A    |
| 1  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         1   | 0       | 2019-09-24 10:17:33 +0000  | Home_Screen  | NaN        | B    |
| 2  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         1   | 0       | 2019-09-24 10:17:33 +0000  | Home_Screen  | NaN        | C    |
| 3  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         1   | 0       | 2019-09-24 10:17:33 +0000  | Home_Screen  | NaN        | D    |
| 4  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         1   | 0       | 2019-09-24 10:17:33 +0000  | Home_Screen  | NaN        | E    |
+----+------------+-------------+---------+---------------------------------------+----------------------------------------------------+-------------+---------+----------------------------+--------------+------------+------+

Dropping columns the way the output above does is not required. Any ideas?
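The `EventLog` cells may actually hold JSON text like the snippet at the top of the question. That is an assumption: the table's `[{'OrderID': ...` display suggests they may already be Python lists. If they are text, they have to be parsed into lists of dicts before pandas can expand them. Below is a minimal sketch using `json.loads` on the first two events from the question:

```python
import json

import pandas as pd

# Assumption: a cell holds JSON text like the question's snippet (trimmed to two events)
raw = """[
  {"OrderID": "0", "TimeStamp": "2019-09-24 10:17:48 +0000",
   "Screen": "Home_Screen", "StateVars": "", "Event": "A"},
  {"Event": "B", "TimeStamp": "2019-09-24 10:17:38 +0000",
   "Screen": "Home_Screen", "StateVars": "", "OrderID": "0"}
]"""

# json.loads turns the text into a Python list of dicts
records = json.loads(raw)

# Each dict becomes one row and each key becomes a column
events = pd.DataFrame(records)
print(events)
```

If the cells are strings in Python's own repr format (single quotes, as in the table), the standard library's `ast.literal_eval` parses them where `json.loads` would fail.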


1 answer
Anonymous user
#1 · Posted 2024-09-26 18:17:22

First, build a new DataFrame from the dicts with the DataFrame constructor:

df1 = pd.DataFrame(df['EventLog'].values.tolist())
print (df1)
  OrderID                  TimeStamp       Screen StateVars Event
0       0  2019-09-24 10:17:48 +0000  Home_Screen               A
1       0  2019-09-24 10:17:38 +0000  Home_Screen               B
2       0  2019-09-24 10:17:35 +0000  Home_Screen               D
3       0  2019-09-24 10:17:33 +0000  Home_Screen               V
4       0  2019-09-24 10:17:32 +0000  Home_Screen               C
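The constructor call above relies on how `pd.DataFrame` treats a list of dicts:

- each dict becomes a row;
- the keys of all dicts are merged into one set of columns;
- a key missing from one dict leaves NaN in that row.

In recent pandas versions the columns follow the order in which keys first appear, so the differing key order in the question's dicts (`Event` first in some of them) does not matter. Here is a small self-contained check. The records are made up and modeled on the question; the third one deliberately lacks `TimeStamp`:

```python
import pandas as pd

# Made-up records: same keys in a different order, plus one record missing a key
records = [
    {"OrderID": "0", "TimeStamp": "2019-09-24 10:17:48 +0000", "Event": "A"},
    {"Event": "B", "TimeStamp": "2019-09-24 10:17:38 +0000", "OrderID": "0"},
    {"OrderID": "0", "Event": "C"},  # no TimeStamp key -> NaN in that cell
]

# One row per dict; columns are the union of all keys
events = pd.DataFrame(records)
print(events)
```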

To add the new columns to the original DataFrame:

df = df.join(df1)
print (df)
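`join` aligns rows by index label, not by position. The snippet above works because `df1` gets a default 0..n-1 index, and that matches `df` only if `df` has a default index too. The sketch below passes `index=df.index` to the constructor so the alignment is explicit. The frame and its index labels `[10, 11]` are hypothetical:

```python
import pandas as pd

# Hypothetical frame whose index is not 0..n-1 (e.g. after filtering rows)
df = pd.DataFrame(
    {
        "CustomerID": [1, 15],
        "EventLog": [
            {"OrderID": "0", "Event": "A"},
            {"OrderID": "0", "Event": "B"},
        ],
    },
    index=[10, 11],
)

# Reuse df's index so join() lines each expanded row up with its source row
df1 = pd.DataFrame(df["EventLog"].tolist(), index=df.index)
df = df.join(df1)
print(df)
```

Without `index=df.index`, `df1` would be labeled 0 and 1. Nothing would match labels 10 and 11, so the new columns would come out entirely NaN.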

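EDIT: I think some values are missing. The workaround is to replace them with empty dicts, which the constructor then turns into rows of missing values (NaN):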

print (df)
                                            EventLog
0  {'OrderID': '0', 'TimeStamp': '2019-09-24 10:1...
1  {'Event': 'B', 'TimeStamp': '2019-09-24 10:17:...
2  {'OrderID': '0', 'TimeStamp': '2019-09-24 10:1...
3  {'Event': 'V', 'TimeStamp': '2019-09-24 10:17:...
4  {'OrderID': '0', 'TimeStamp': '2019-09-24 10:1...
5                                                NaN

# NaN != NaN, so a missing cell fails x == x and is replaced by {} (an all-NaN row)
df = pd.DataFrame([x if x == x else {} for x in df['EventLog']])
print (df)
  OrderID                  TimeStamp       Screen StateVars Event
0       0  2019-09-24 10:17:48 +0000  Home_Screen               A
1       0  2019-09-24 10:17:38 +0000  Home_Screen               B
2       0  2019-09-24 10:17:35 +0000  Home_Screen               D
3       0  2019-09-24 10:17:33 +0000  Home_Screen               V
4       0  2019-09-24 10:17:32 +0000  Home_Screen               C
5     NaN                        NaN          NaN       NaN   NaN
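The `x == x` test works because NaN is the only common value that is not equal to itself. The sketch below is an equivalent version that checks the type explicitly with `isinstance`. It also turns `None` or any other non-dict cell into an empty row. The sample data is made up:

```python
import numpy as np
import pandas as pd

# Made-up column with one missing cell
events = pd.Series([
    {"OrderID": "0", "Event": "A"},
    np.nan,
    {"Event": "B", "OrderID": "0"},
])

# Anything that is not a dict becomes {} -> that row is all NaN
rows = [x if isinstance(x, dict) else {} for x in events]
expanded = pd.DataFrame(rows, index=events.index)
print(expanded)
```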

Another solution, for when each cell holds a list of dicts:

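import pandas as pd

# NaN != NaN, so this keeps only the non-missing cells
a = [x for x in df['EventLog'].tolist() if x == x]
# Flatten the lists of dicts and build the frame in one call
# (DataFrame.append was removed in pandas 2.0 and is slow inside a loop anyway)
empty_df = pd.DataFrame([c for b in a for c in b])
# join aligns on the index: the i-th event overall is attached to row i of df
df = df.join(empty_df)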
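The index-based join in the last solution attaches the i-th event of the flattened list to row i of `df`. That only lines up when every row holds exactly one event.

A different technique may fit better if the goal is one output row per event, with the parent row's other columns repeated: `DataFrame.explode`. This is a suggested alternative, not the answerer's method. The sketch assumes pandas 1.1 or later for `ignore_index`, and the input frame is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical input: each EventLog cell is a list of dicts, or NaN
df = pd.DataFrame({
    "CustomerID": [1, 15, 2],
    "EventLog": [
        [{"OrderID": "0", "Event": "A"}, {"Event": "B", "OrderID": "0"}],
        [{"OrderID": "0", "Event": "C"}],
        np.nan,
    ],
})

# One row per dict; CustomerID is repeated for every event of its row.
# ignore_index=True gives a fresh 0..n-1 index so the join below is one-to-one.
long = df.explode("EventLog", ignore_index=True)

# Expand the dicts into columns; NaN cells become {} (an all-NaN row)
events = pd.DataFrame(
    [x if isinstance(x, dict) else {} for x in long["EventLog"]],
    index=long.index,
)

# Dropping the raw list column is optional
out = long.drop(columns="EventLog").join(events)
print(out)
```

Resetting the index matters here. After a plain `explode` the original index labels repeat, and joining two frames that share duplicate labels pairs every duplicate with every other one.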
