Generating rows from values in multiple columns

value_id, area_id_number, value_type
1, 01293091302390, c000
2, 01293091302390, c000
3, 01293091302390, c001
4, 01293091302390, c001
5, 01293091302391, c000
6, 01293091302391, c000
7, 01293091302392, c000
8, 01293091302392, c000
9, 01293091302392, c000
10, 01293091302392, c001
11, 01293091302392, c002
...

1 Answer

User

#1 · Posted 2024-09-26 18:15:16

The main work is done by ndarray.repeat(). I don't have enough memory to test this with 11M rows, but here is the code:

First, create the test data:

import numpy as np
import pandas as pd

# Create sample data: a count matrix with one row per id and one column per type
nrows = 500000
ncols = 21

nones = int(70e6)
ntwos = int(20e6)
nthrees = int(10e6)

rint = np.random.randint

counts = np.zeros((nrows, ncols), dtype=np.int8)
counts[rint(0, nrows, nones), rint(0, ncols, nones)] = 1
counts[rint(0, nrows, ntwos), rint(0, ncols, ntwos)] = 2
counts[rint(0, nrows, nthrees), rint(0, ncols, nthrees)] = 3

columns = ["c%03d" % i for i in range(ncols)]
index = ["%014d" % i for i in range(nrows)]

df = pd.DataFrame(counts, index=index, columns=columns)
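
Since memory is the concern here, it may help to estimate the size of the result before expanding anything. The sketch below is not part of the original answer: `demo` is a hypothetical small stand-in for the count matrix, and the idea is simply that the number of output rows equals the sum of all counts.

```python
import numpy as np
import pandas as pd

# Hypothetical small stand-in for the count matrix (same layout, far fewer cells).
demo = pd.DataFrame(
    np.array([[1, 0, 2], [0, 3, 0]], dtype=np.int8),
    index=["%014d" % i for i in range(2)],
    columns=["c%03d" % i for i in range(3)],
)

# Each cell contributes "count" output rows, so the total is the sum of all cells.
# A 64-bit accumulator is requested explicitly so the int8 counts cannot overflow.
expected_rows = int(demo.to_numpy().sum(dtype=np.int64))

# Number of (id, type) pairs that produce at least one row.
nonzero_cells = int(np.count_nonzero(demo.to_numpy()))

print(expected_rows, nonzero_cells)
```

If the estimated row count is too large for the available memory, the same expansion could presumably be run on slices of the rows and the pieces written out one at a time.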

Here is the processing code:

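# Row and column positions of every non-zero cell
idx, col = np.where(df.values)
# The count stored in each of those cells
n = df.values[idx, col]
# Repeat each position n times, then look up the matching labels
idx2 = df.index.values[idx.repeat(n)]
col2 = df.columns.values[col.repeat(n)]
df2 = pd.DataFrame({"id":idx2, "type":col2})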
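
To see what the repeat step produces, here is a minimal sketch on a toy input chosen to reproduce the eleven rows listed in the question. The input layout (ids as the index, value types as columns, counts as cell values) is assumed from the test data above, since the question's own input is not shown here. The name `small` and the 1-based `value_id` column are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy input: each cell says how many output rows
# the (area id, value type) pair should generate.
small = pd.DataFrame(
    [[2, 2, 0], [2, 0, 0], [3, 1, 1]],
    index=["01293091302390", "01293091302391", "01293091302392"],
    columns=["c000", "c001", "c002"],
)

values = small.to_numpy()
# Positions of the non-zero cells, returned in row-major order.
row_pos, col_pos = np.nonzero(values)
# Repeat count for each of those cells.
reps = values[row_pos, col_pos]

expanded = pd.DataFrame({
    "area_id_number": small.index.to_numpy()[row_pos.repeat(reps)],
    "value_type": small.columns.to_numpy()[col_pos.repeat(reps)],
})
# Assumption: value_id is just a running number starting at 1.
expanded.insert(0, "value_id", np.arange(1, len(expanded) + 1))

print(expanded)
```

Because `np.nonzero` walks the matrix row by row, the output stays grouped by id and then by type, which matches the ordering in the question's example.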
