How to convert the Duration column of an Azure Backup report to a datetime with decimals

appended_data['Backup Size'] = appended_data['Backup Size'].str.replace('MB','')
appended_data['DurationFixed'] = pd.to_timedelta(df['Duration'].str.split(':',expand=True)\
                    .stack()\
                    .astype(float)\
                    .round()\
                    .astype(int).astype(str).unstack(1).fillna('00').agg(':'.join,axis=1),
               unit='s')
appended_data['DurationHours'] = appended_data['DurationFixed'] / np.timedelta64(1,'h')

appended_data['Duration']
1 04:01:22.7756139
1 03:31:17.0678262
1 04:41:32.7253765
1 03:11:18.3396588
1 04:51:20.2017034
...
1 02:21:17.8554095
1 02:21:19.5547075
1 03:41:23.8876812
1 02:21:32.5529160
1 02:01:20.3247238

appended_data['DurationFixed']
1 02:01:20
1 02:01:20
1 02:01:20
1 02:01:20
1 02:01:20
...
1 02:01:20
1 02:01:20
1 02:01:20
1 02:01:20
1 02:01:20

2 answers

User

Answer 1 · edited 2024-10-02 16:26:42

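This looks like a strange error, as I have never seen logs like this in Azure. Either way, unless there is some built-in method for handling data like this, we need to parse it manually.

We will split on : and then round the numbers before rebuilding the timedelta string.

I have to be clear that this is not a real fix, because you need to define what 1.05 means: is it 1 hour and x minutes?

If you don't care about the above, then the following will do.

Method 1: no precision, string formatting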

print(df)

             Duration
0  1.05:27:39.9470724
1             21:17.7
2             21:41.4
3  1.02:42:37.1136811
4             21:17.2

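# Split on ':', round each piece to an integer, pad missing pieces with '00',
# then re-join into an 'h:m:s' string that pd.to_timedelta can parse.
# unit= is left out: the inputs are strings, and recent pandas versions
# reject a unit argument when the input contains strings.
df['DurationFixed'] = pd.to_timedelta(df['Duration'].str.split(':',expand=True)\
                    .stack()\
                    .astype(float)\
                    .round()\
                    .astype(int).astype(str).unstack(1).fillna('00').agg(':'.join,axis=1))
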

print(df)

           Duration DurationFixed
0  1.05:27:39.9470724      01:27:40
1             21:17.7      21:18:00
2             21:41.4      21:41:00
3  1.02:42:37.1136811      01:42:37
4             21:17.2      21:17:00

If you only need the hours, you can convert it with np.timedelta64

import numpy as np

df['DurationFixed'] / np.timedelta64(1,'h')
0     1.461111
1    21.300000
2    21.683333
3     1.710278
4    21.283333
Name: DurationFixed, dtype: float64
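The same conversion can also be written without numpy, by going through total seconds. This is only a minimal sketch on two hand-made timedeltas (the `durations` series is a stand-in for the DurationFixed column, not data from the report):

```python
import pandas as pd

# Stand-in for df['DurationFixed']: two values taken from the output above
durations = pd.Series(pd.to_timedelta(["01:27:40", "21:18:00"]))

# Total seconds divided by 3600 gives decimal hours
hours = durations.dt.total_seconds() / 3600
print(hours)
```

Both forms should give the same floats; which one to use is a matter of taste.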

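Method 2: more precise

If your data always has the same format, i.e. Hours : Minutes : Seconds

we can stack it, apply a cumulative count, and map a metadata (unit) field, so that pd.to_timedelta can be used at the row level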

s = df['Duration'].str.split(':',expand=True)\
                    .stack()\
                    .astype(float).to_frame('time_delta')

print(s)

     time_delta
0 0   1.050000
  1  27.000000
  2  39.947072
1 0  21.000000
  1  17.700000
2 0  21.000000
  1  41.400000
3 0   1.020000
  1  42.000000
  2  37.113681
4 0  21.000000
  1  17.200000

s['metadata'] = s.groupby(level=0).cumcount().map({0 : 'h', 1 : 'm', 2 : 's' })

print(s)

    time_delta metadata
0 0   1.050000        h
  1  27.000000        m
  2  39.947072        s
1 0  21.000000        h
  1  17.700000        m
2 0  21.000000        h
  1  41.400000        m
3 0   1.020000        h
  1  42.000000        m
  2  37.113681        s
4 0  21.000000        h
  1  17.200000        m

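Finally, we use apply at the row level to convert each row according to its unit, then sum per original row and round to the nearest n seconds. I chose 10.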

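# Convert each piece with its own unit, sum the pieces per original row,
# then round to the nearest 10 seconds.
df['DurationPrecise'] = (
    s.apply(lambda x: pd.to_timedelta(x.time_delta, unit=x.metadata, errors='coerce'), axis=1)
     .groupby(level=0).sum()
     .dt.round('10s')
)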


print(df)

             Duration DurationFixed DurationPrecise
0  1.05:27:39.9470724      01:27:40        01:30:40
1             21:17.7      21:18:00        21:17:40
2             21:41.4      21:41:00        21:41:20
3  1.02:42:37.1136811      01:42:37        01:43:50
4             21:17.2      21:17:00        21:17:10
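For larger frames, the row-level apply can probably be avoided. The sketch below keeps the same interpretation as Method 2 (first field is decimal hours, second is decimal minutes, third is seconds, which is an assumption about the data, not a confirmed format) and does the arithmetic column-wise. The names `parts`, `seconds` and `precise` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"Duration": ["1.05:27:39.9470724", "21:17.7", "21:41.4",
                                "1.02:42:37.1136811", "21:17.2"]})

# One column per ':'-separated field; missing fields become NaN
parts = df["Duration"].str.split(":", expand=True).astype(float)

# Weight the columns as hours, minutes, seconds and add them up (NaN is skipped)
seconds = parts.mul([3600, 60, 1], axis=1).sum(axis=1)

precise = pd.to_timedelta(seconds, unit="s").dt.round("10s")
print(precise)
```

This assumes at least one row has all three fields, so that the split produces exactly three columns.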

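User

Answer 2 · edited 2024-10-02 16:26:42

From analysing the data, I conclude that the number before the dot in the hh part is actually days. Example: 2.4:30:30 = 2 days, 4 hours, 30 minutes, 30 seconds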
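As a quick check of that example, assuming the days reading is right, the arithmetic can be worked through with pd.Timedelta (the variable names are only for illustration):

```python
import pandas as pd

# 2.4:30:30 read as 2 days, 4 hours, 30 minutes, 30 seconds
td = pd.Timedelta(days=2, hours=4, minutes=30, seconds=30)

# Decimal hours: 2*24 + 4 + 30/60 + 30/3600
hours = td / pd.Timedelta(hours=1)
print(hours)
```

So under this reading the value is a bit over 52.5 hours, not roughly 2.4 hours.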

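def cleanhours(x):
    # Expects "[d.]hh:mm:ss[.fffffff]" and returns the duration in decimal hours
    hms=x.split(":")
    dh=hms[0].split(".")
    if len(dh)>1:
        # Fold the leading day count into the hours field
        hms[0]=str(int(dh[-1])+24*int(dh[-2]))
    # Drop the fractional seconds
    hms[2] = hms[2].split(".")[0]
    return int(hms[0])+int(hms[1])/60.0+int(hms[2])/3600.0
#     return ":".join(hms)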
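If you would rather stay in pandas and keep the fractional seconds, something along these lines might work. It is a sketch that assumes every value follows the [d.]hh:mm:ss[.fffffff] layout described above; anything that does not match (for example a two-field value such as 21:17.7) comes back as NaT. `parse_duration` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

def parse_duration(series):
    """Parse '[d.]hh:mm:ss[.fffffff]' strings into timedeltas (NaT if no match)."""
    pattern = (r"^(?:(?P<days>\d+)\.)?(?P<hours>\d+):(?P<minutes>\d+)"
               r":(?P<seconds>\d+(?:\.\d+)?)$")
    parts = series.str.extract(pattern).astype(float)
    # The day count is optional, so treat a missing one as zero
    parts["days"] = parts["days"].fillna(0)
    total_seconds = (parts["days"] * 86400 + parts["hours"] * 3600
                     + parts["minutes"] * 60 + parts["seconds"])
    return pd.to_timedelta(total_seconds, unit="s")

sample = pd.Series(["2.4:30:30", "04:01:22.7756139", "21:17.7"])
parsed = parse_duration(sample)
decimal_hours = parsed / pd.Timedelta(hours=1)
print(parsed)
print(decimal_hours)
```

Unlike cleanhours, this returns timedeltas, so rounding with .dt.round and division into hours stay available afterwards.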

