<p>You can use <code>pivot_longer</code> from <code>pyjanitor</code>. For this case, pass a list of regular expressions to <code>names_pattern</code> and the matching new column names, in the same order, to <code>names_to</code>:</p>
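<pre><code># pip install pyjanitor
import janitor  # importing janitor registers pivot_longer on DataFrames
import pandas as pd

# df: the wide DataFrame being reshaped
df.pivot_longer(
    index='refnum',
    names_to=['year', 'REV', 'GP'],
    # raw strings so that \d is passed to the regex engine unchanged
    names_pattern=[r'^y\d$', r'.*rev$', r'.*gp$'],
)
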
   refnum  year  REV   GP
0   10001  2021  300  200
1   10002  2020  300  200
2   10003  2021  300  200
3   10001  2022  100  600
4   10002  2021  200  500
5   10003  2022  500  500
6   10001  2023  300  300
7   10002  2022  300  300
8   10003  2023  300  300
</code></pre>
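<p>To make the pairing between <code>names_pattern</code> and <code>names_to</code> concrete, here is a minimal sketch using only the standard library. It is not pyjanitor's implementation. The column labels are an assumption inferred from the output above, the helper <code>assign_targets</code> is hypothetical, and the sketch assumes a label goes to the first regex that matches it:</p>

```python
import re

# Assumed column labels, inferred from the output (the input frame is not shown).
columns = ["refnum", "y1", "y1rev", "y1gp", "y2", "y2rev", "y2gp", "y3", "y3rev", "y3gp"]

names_to = ["year", "REV", "GP"]
names_pattern = [r"^y\d$", r".*rev$", r".*gp$"]


def assign_targets(labels, names_to, names_pattern):
    """Hypothetical helper: map each label to the name paired with the first matching regex."""
    mapping = {}
    for label in labels:
        for target, pattern in zip(names_to, names_pattern):
            if re.search(pattern, label):
                mapping[label] = target
                break  # assumption: the first match wins
    return mapping


mapping = assign_targets(columns, names_to, names_pattern)
for label, target in mapping.items():
    print(f"{label} -> {target}")
```

<p>Under these assumptions, <code>y1</code>, <code>y2</code> and <code>y3</code> feed the <code>year</code> column, the <code>*rev</code> labels feed <code>REV</code>, and the <code>*gp</code> labels feed <code>GP</code>. <code>refnum</code> matches none of the patterns, which is consistent with it being passed as <code>index</code>.</p>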
<p>If you also want to keep the base year, rename the columns whose labels end in a digit before calling <code>pivot_longer</code>:</p>
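<pre><code>(df
 # y1 -> y1YEAR, y2 -> y2YEAR, ... so every label has a suffix after the digit
 .rename(columns=lambda col: f"{col}YEAR"
                 if col.endswith(('1', '2', '3'))
                 else col)
 .pivot_longer(index='refnum',
               names_to=("Base Year", ".value"),
               # group 1: the digit (Base Year); group 2: the suffix (.value)
               names_pattern=r".(\d)(.+)",
               sort_by_appearance=True)
)
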
   refnum Base Year  YEAR  rev   gp
0   10001         1  2021  300  200
1   10001         2  2022  100  600
2   10001         3  2023  300  300
3   10002         1  2020  300  200
4   10002         2  2021  200  500
5   10002         3  2022  300  300
6   10003         1  2021  300  200
7   10003         2  2022  500  500
8   10003         3  2023  300  300
</code></pre>
<p>The label parts paired with <code>.value</code> stay as column headers, while the remaining parts are collected into a new column (<code>Base Year</code>).</p>
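<p>As a rough illustration of how the rename and the two capture groups interact, the sketch below applies the same regex with the standard library. The column labels are again an assumption inferred from the output, <code>add_year_suffix</code> is a hypothetical helper mirroring the lambda, and this only shows the string splitting, not the reshaping itself:</p>

```python
import re

# Assumed non-index column labels, inferred from the output above.
columns = ["y1", "y1rev", "y1gp", "y2", "y2rev", "y2gp", "y3", "y3rev", "y3gp"]


def add_year_suffix(col):
    """Hypothetical helper mirroring the rename lambda: y1 -> y1YEAR."""
    return f"{col}YEAR" if col.endswith(("1", "2", "3")) else col


renamed = [add_year_suffix(col) for col in columns]

# Group 1 is paired with "Base Year", group 2 with ".value".
pattern = re.compile(r".(\d)(.+)")
pieces = {label: pattern.search(label).groups() for label in renamed}

for label, (base_year, value_name) in pieces.items():
    print(f"{label}: Base Year={base_year}, .value={value_name}")
```

<p>Without the rename, a bare label such as <code>y1</code> would leave nothing for the second group <code>(.+)</code> to capture, which is why the <code>YEAR</code> suffix is added first.</p>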