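<p>Here is a solution with no explicit <code>for</code> loops (<code>DataFrame.apply</code> with <code>axis=1</code> still iterates over rows internally, so it is not vectorized in the NumPy sense). The idea is to build a temporary column that holds, for each row, the list of all its dates, and then expand that column into one row per date. The <code>expand_column</code> function is based on <a href="https://stackoverflow.com/a/27266225/304209">this answer</a>.</p>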
<pre><code>import pandas as pd


def expand_column(dataframe, column):
    """Transform iterable column values into multiple rows.

    Source: https://stackoverflow.com/a/27266225/304209.

    Args:
        dataframe: DataFrame to process.
        column: name of the column to expand.

    Returns:
        copy of the DataFrame with the following updates:
        * for rows where column contains only 1 value, keep them as is.
        * for rows where column contains a list of values, transform them
          into multiple rows, each of which contains one value from the list in column.
    """
    # Wide frame: one row per original row, one column per list element.
    # stack() turns it into a long Series indexed by (row label, position);
    # dropna() discards the NaN/NaT padding of shorter lists (older pandas
    # versions of stack() already drop it); the position level is then dropped.
    tmp_df = dataframe.apply(
        lambda row: pd.Series(row[column]), axis=1).stack().dropna().reset_index(level=1, drop=True)
    tmp_df.name = column
    # Join the expanded values back on the original row labels; a label that
    # appears several times in tmp_df repeats the corresponding row.
    return dataframe.drop(column, axis=1).join(tmp_df)


df = pd.DataFrame([['2015-01-01', 1, 1200, 'CB04 box', 'USD'],
                   ['2015-01-01', 3, 1500, 'AB01 box', 'USD'],
                   ['2015-01-02', 2, 550, 'CB03 box', 'USD'],
                   ], columns=['hold_date', 'day_count', 'qty', 'item', 'ccy'])

# For each row, the list of day_count consecutive dates starting at hold_date.
range_col = lambda row: list(pd.date_range(start=pd.to_datetime(row.hold_date), periods=row.day_count))
df = df.assign(hold_date=df.apply(range_col, axis=1))

print(expand_column(df, 'hold_date')[['hold_date', 'qty', 'item', 'ccy']])
# Output:
#    hold_date   qty      item  ccy
# 0 2015-01-01  1200  CB04 box  USD
# 1 2015-01-01  1500  AB01 box  USD
# 1 2015-01-02  1500  AB01 box  USD
# 1 2015-01-03  1500  AB01 box  USD
# 2 2015-01-02   550  CB03 box  USD
# 2 2015-01-03   550  CB03 box  USD
</code></pre>
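<p>On pandas 0.25 or newer, the "expand a list column into rows" step can also be done with the built-in <code>DataFrame.explode</code> instead of a custom helper. The sketch below is an alternative to the answer's <code>expand_column</code>, not the original method; it assumes pandas &gt;= 0.25 and reuses the same sample data. <code>explode</code> leaves the exploded column with <code>object</code> dtype, so the sketch converts it back with <code>pd.to_datetime</code>.</p>

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame([['2015-01-01', 1, 1200, 'CB04 box', 'USD'],
                   ['2015-01-01', 3, 1500, 'AB01 box', 'USD'],
                   ['2015-01-02', 2, 550, 'CB03 box', 'USD']],
                  columns=['hold_date', 'day_count', 'qty', 'item', 'ccy'])

# Temporary list column: day_count consecutive dates starting at hold_date.
date_lists = df.apply(
    lambda row: list(pd.date_range(start=row.hold_date, periods=row.day_count)),
    axis=1)

result = (
    df.assign(hold_date=date_lists)
    # One output row per list element; the original index label is repeated.
    .explode('hold_date')
    # explode() leaves an object column, so convert it back to datetime64.
    .assign(hold_date=lambda d: pd.to_datetime(d['hold_date']))
    .loc[:, ['hold_date', 'qty', 'item', 'ccy']]
)
print(result)
```

<p>As with <code>expand_column</code>, each original index label is repeated once per generated date; append <code>.reset_index(drop=True)</code> if a fresh 0..n-1 index is preferred.</p>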