Answers to: How to split a DataFrame column's values into multiple columns



1 answer

Anonymous, 1 day ago


As suggested in the comments, a regular expression should work well here.
<h2>Sample DataFrame:</h2>
<pre><code>>>> df
                PLUGS\nDESIGN\nGEAR
0         700\nDaewoo 8000 Gearless
1        300\nHyundai 4400 Gearless
2            600\nSTX 2600 Gearless
3                200\nB170 \nGeared
4  362 Wenchong 1700 Mk II \nGeared
5        252\nRichMax 1550 Gearless
6        220\nCV 1100 Plus \nGeared
7       232\nOrskov Mk VII Gearless
8          119\nKouan 1000 Gearless
9          100\nHanjin 700 Gearless
</code></pre>
First, replace the literal <code>\n</code> sequences in the column name with spaces so that the name is easier to read and use:
<pre><code>>>> df.columns = df.columns.str.replace(r"\\n", " ", regex=True)
</code></pre>
Now the column name has no special characters:
<pre><code>>>> df
                  PLUGS DESIGN GEAR
0         700\nDaewoo 8000 Gearless
1        300\nHyundai 4400 Gearless
2            600\nSTX 2600 Gearless
3                200\nB170 \nGeared
4  362 Wenchong 1700 Mk II \nGeared
5        252\nRichMax 1550 Gearless
6        220\nCV 1100 Plus \nGeared
7       232\nOrskov Mk VII Gearless
8          119\nKouan 1000 Gearless
9          100\nHanjin 700 Gearless
</code></pre>
Now we can use <a href="https://pandas.pydata.org/docs/reference/api/pandas.Series.str.extract.html" rel="nofollow noreferrer">pandas.Series.str.extract</a>. With <code>str.extract</code>, every capture group <code>()</code> in the regex becomes a column in the result. Named groups use their names as column labels; the groups here are unnamed, so the columns get the default labels <code>0, 1, 2</code>, and we rename them to get the desired result:
<pre><code>>>> df = df['PLUGS DESIGN GEAR'].str.extract(r"^(\d+)[\\n\s]+([^\\]+)[\\n\s]+([\\n|^Gear][a-z]+)").rename(columns={0: 'PLUGS', 1: 'DESIGN', 2: 'GEAR'})
</code></pre>
<h2>Result:</h2>
<pre><code>>>> print(df)
   PLUGS               DESIGN      GEAR
0    700          Daewoo 8000  Gearless
1    300         Hyundai 4400  Gearless
2    600             STX 2600  Gearless
3    200                 B170    Geared
4    362  Wenchong 1700 Mk II    Geared
5    252         RichMax 1550  Gearless
6    220         CV 1100 Plus    Geared
7    232        Orskov Mk VII  Gearless
8    119           Kouan 1000  Gearless
9    100           Hanjin 700  Gearless
</code></pre>
Regex explanation (you can check it on <a href="https://regex101.com/" rel="nofollow noreferrer">regex101.com</a>). Note that the third group is shown here as <code>[\|^Gear]</code>, while the code above uses <code>[\\n|^Gear]</code>; both are character classes (not alternations) that match one character such as <code>G</code>.
<pre><code>(\d+)[\\n\s]+([^\\]+)[\\n\s]+([\|^Gear][a-z]+)
</code></pre>
First capturing group (\d+)
<pre><code>\d matches a digit (equivalent to [0-9])
  + matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
Match a single character present in the list below [\\n\s]
  + matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
  \\ matches the character \ literally (case sensitive)
  n matches the character n literally (case sensitive)
  \s matches any whitespace character (equivalent to [\r\n\t\f\v ])
</code></pre>
Second capturing group ([^\\]+)
<pre><code>Match a single character not present in the list below [^\\]
  + matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
  \\ matches the character \ literally (case sensitive)
Match a single character present in the list below [\\n\s]
  + matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
  \\ matches the character \ literally (case sensitive)
  n matches the character n literally (case sensitive)
  \s matches any whitespace character (equivalent to [\r\n\t\f\v ])
</code></pre>
Third capturing group ([\|^Gear][a-z]+)
<pre><code>Match a single character present in the list below [\|^Gear]
  \| matches the character | literally (case sensitive)
  ^Gear matches a single character in the list ^Gear (case sensitive)
Match a single character present in the list below [a-z]
  + matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
  a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
Global pattern flags
  g modifier: global. All matches (don't return after first match)
  m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
</code></pre>
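As a possible variation on the approach above, the sketch below uses named capture groups so that <code>str.extract</code> labels the columns directly and no <code>rename</code> step is needed. It assumes, as the answer's regex does, that the data contains a literal backslash followed by <code>n</code> rather than a real newline. The three sample rows, the variable names (<code>raw</code>, <code>pattern</code>, <code>result</code>) and the simplified pattern are illustrative assumptions, not the original author's code.

```python
import pandas as pd

# Hypothetical subset of the sample data. The raw strings keep "\n" as two
# characters (backslash + n), which is the assumption made here.
raw = [
    r"700\nDaewoo 8000 Gearless",
    r"200\nB170 \nGeared",
    r"362 Wenchong 1700 Mk II \nGeared",
]
df = pd.DataFrame({r"PLUGS\nDESIGN\nGEAR": raw})

# Replace the literal backslash-n in the column name with a space
# (plain string replacement, no regex needed).
df.columns = df.columns.str.replace(r"\n", " ", regex=False)

# Named groups (?P<name>...) become the column names of the result.
# (?:\\n|\s)+ matches one or more separators: literal "\n" or whitespace.
# The lazy .+? stops at the last separator before the "Gear..." word.
pattern = (
    r"^(?P<PLUGS>\d+)(?:\\n|\s)+"
    r"(?P<DESIGN>.+?)(?:\\n|\s)+"
    r"(?P<GEAR>Gear[a-z]+)$"
)
result = df["PLUGS DESIGN GEAR"].str.extract(pattern)
print(result)
```

Spelling out <code>Gear</code> in the last group, instead of using a character class, makes the intent explicit, but it only works if every gear value starts with "Gear", as it does in the sample data.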
