How to extract part of a string in a DataFrame cell and create a new column from it

                                              Comments  Image
0    Row 1 Ch475 Vi 17.0V BF27 Sclk 100ns 1in24 24segs      0
1    Row 1 Ch475 Vi 17.0V BF27 Sclk 100ns 1in24 24segs      0
2    Row 1 Ch475 Vi 17.0V BF27 Sclk 100ns 1in24 24segs      0
3    Row 1 Ch475 Vi 17.0V BF27 Sclk 100ns 1in24 24segs      0
4    Row 1 Ch475 Vi 17.0V BF27 Sclk 100ns 1in24 24segs      0
..                                                 ...    ...
706  Row 2 Ch475 Vi 17.5V BF27 Sclk 100ns 1in24 24segs      0
707  Row 2 Ch475 Vi 17.5V BF27 Sclk 100ns 1in24 24segs      0
708  Row 2 Ch475 Vi 17.5V BF27 Sclk 100ns 1in24 24segs      0
709  Row 2 Ch475 Vi 17.5V BF27 Sclk 100ns 1in24 24segs      0
710  Row 2 Ch475 Vi 17.5V BF27 Sclk 100ns 1in24 24segs      0

import pandas as pd
import numpy as np

path = "/Users/.../Desktop/tk_gui_grid/"
file = "orig_data.txt"
filepath = path+file
df = pd.read_csv(filepath, sep='\t', lineterminator='\r')
com = df.loc[:,['Comments']]
dfLen = len(com)
image = [0]*dfLen
com['Image'] = image
print(com)

2 Answers

User

Answer 1 · Edited 2024-06-29 00:53:08

Here is a quick solution using a regular expression with named capture groups.

Regex vs. `split`:

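Some people have commented that a regex isn't necessary, and that's true. From a data-validation standpoint, however, a regular expression helps keep "stray" data from slipping in. A "blind" `split()` simply breaks the data on a delimiter character, but what if the source data changes? `split()` can't tell; it returns whatever token happens to sit at a given position. A regex, on the other hand, exposes the problem, because the pattern simply won't match. Yes, you may end up with `NaN` values (which is what `str.extract` returns for a row that doesn't match) or an error further down the line, but that's a good thing: you are alerted that the data format has changed, which gives you the chance to fix the problem or update the regex pattern.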
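To make the point concrete, here is a minimal sketch contrasting the two approaches. The third row is a hypothetical malformed record (its `Vi` token is missing) invented purely for illustration, and the short pattern is a simplified stand-in for the full expression used in this answer.

```python
import pandas as pd

# The third row is a hypothetical malformed record (the "Vi" token is
# missing), invented only to show how each approach reacts.
comments = pd.Series([
    "Row 1 Ch475 Vi 17.0V BF27 Sclk 100ns 1in24 24segs",
    "Row 2 Ch475 Vi 17.5V BF27 Sclk 100ns 1in24 24segs",
    "Row 3 Ch475 17.5V BF27 Sclk 100ns 1in24 24segs",
])

# Blind split: assumes token 4 is always the voltage.
split_volts = comments.str.split(" ").str[4]

# Simplified regex: the voltage must follow "Vi " and end in "V".
regex_volts = comments.str.extract(r"Vi\s(?P<volts>\d+\.\d+)V", expand=False)

print(split_volts.tolist())         # ['17.0V', '17.5V', 'BF27']  (wrong value, no warning)
print(regex_volts.isna().tolist())  # [False, False, True]

# Rows the pattern could not parse are easy to flag for review.
bad_rows = comments[regex_volts.isna()]
print(bad_rows.tolist())
```

The split version silently puts `BF27` in the voltage column, while the regex version leaves a `NaN` that can be detected and reported.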

Source data:

(Additional rows mocked up for demonstration)

0    Row 1 Ch475 Vi 17.0V BF27 Sclk 100ns 1in24 24segs
1    Row 2 Ch475 Vi 17.1V BF27 Sclk 101ns 1in24 25segs
2    Row 3 Ch475 Vi 17.2V BF27 Sclk 102ns 1in24 26segs
3    Row 4 Ch475 Vi 17.3V BF27 Sclk 103ns 1in24 27segs
4    Row 5 Ch475 Vi 17.4V BF27 Sclk 104ns 1in24 28segs

Code:

import pandas as pd
import re

path = './orig_data.txt'
cols = ['rownumber', 'volts', 'wfm', 'sclk', 'image', 'segment']
exp = re.compile(r'^\d+\s+Row\s'
                 r'(?P<rownumber>\d+).*\s'
                 r'(?P<volts>\d+\.\d+)V\s'
                 r'(?P<wfm>\w+)\sSclk\s'
                 r'(?P<sclk>\d+)ns\s'
                 r'(?P<image>\w+)\s'
                 r'(?P<segment>\d+)segs.*$')

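# '|' does not occur in the sample data, so each whole line is read into a single column
df = pd.read_csv(path, sep='|', header=None, names=['comment'])
# Each named group becomes one new column; rows that don't match get NaN
df[cols] = df['comment'].str.extract(exp, expand=True)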

Output:

                                             comment rownumber volts   wfm  \
0  0    Row 1 Ch475 Vi 17.0V BF27 Sclk 100ns 1in2...         1  17.0  BF27   
1  1    Row 2 Ch475 Vi 17.1V BF27 Sclk 101ns 1in2...         2  17.1  BF27   
2  2    Row 3 Ch475 Vi 17.2V BF27 Sclk 102ns 1in2...         3  17.2  BF27   
3  3    Row 4 Ch475 Vi 17.3V BF27 Sclk 103ns 1in2...         4  17.3  BF27   
4  4    Row 5 Ch475 Vi 17.4V BF27 Sclk 104ns 1in2...         5  17.4  BF27   

  sclk  image segment  
0  100  1in24      24  
1  101  1in24      25  
2  102  1in24      26  
3  103  1in24      27  
4  104  1in24      28
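Note that `str.extract` returns every captured field as text. A minimal, self-contained variant of the code above might look like the following; reading two of the mock rows from an in-memory string via `io.StringIO` instead of `orig_data.txt`, joining the result with `join()`, and converting with `pd.to_numeric` are choices of this sketch, not part of the original answer.

```python
import io
import re

import pandas as pd

# In-memory stand-in for orig_data.txt (two of the mock rows above),
# so the sketch runs without the file on disk.
raw = (
    "0    Row 1 Ch475 Vi 17.0V BF27 Sclk 100ns 1in24 24segs\n"
    "1    Row 2 Ch475 Vi 17.1V BF27 Sclk 101ns 1in24 25segs\n"
)

# Same pattern as above: each named group becomes a column name.
exp = re.compile(r'^\d+\s+Row\s'
                 r'(?P<rownumber>\d+).*\s'
                 r'(?P<volts>\d+\.\d+)V\s'
                 r'(?P<wfm>\w+)\sSclk\s'
                 r'(?P<sclk>\d+)ns\s'
                 r'(?P<image>\w+)\s'
                 r'(?P<segment>\d+)segs.*$')

df = pd.read_csv(io.StringIO(raw), sep='|', header=None, names=['comment'])

# extract() labels the new columns after the groups, so no separate
# column list is needed; join() aligns them on the shared index.
parts = df['comment'].str.extract(exp, expand=True)

# extract() returns text; convert the purely numeric fields.
num_cols = ['rownumber', 'volts', 'sclk', 'segment']
parts[num_cols] = parts[num_cols].apply(pd.to_numeric)

df = df.join(parts)
print(df.dtypes)
```

Because the group names double as column labels, renaming a field only requires editing the pattern.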

User

Answer 2 · Edited 2024-06-29 00:53:08

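Go through the Series' `.str` accessor, which treats each value as a string, and split it. After that, you can access each element by its position with `.str[i]`: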

df['Comments'].str.split(' ')

0    [Row, 1, Ch475, Vi, 17.0V, BF27, Sclk, 100ns, ...

df['Comments'].str.split(' ').str[0]

Out[7]: 
0    Row

df['Comments'].str.split(' ').str[4]

Out[8]: 
0    17.0V

Once you know how to access each element of the split, you can assign it to a new column in the DataFrame, for example:

df['RowNumber'] = df['Comments'].str.split(' ').str[1]
df['Volts'] = df['Comments'].str.split(' ').str[4]
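Calling `str.split` separately for every new column re-splits each string each time. A minimal sketch, assuming a small hand-built DataFrame in place of the real file, splits once with `expand=True` (which returns one column per token) and then picks the fields it needs; stripping the trailing `V` with `str.rstrip` and converting with `pd.to_numeric` are extra steps not in the answer above.

```python
import pandas as pd

# Hand-built stand-in for the question's DataFrame.
df = pd.DataFrame({'Comments': [
    "Row 1 Ch475 Vi 17.0V BF27 Sclk 100ns 1in24 24segs",
    "Row 2 Ch475 Vi 17.5V BF27 Sclk 100ns 1in24 24segs",
]})

# Split once; expand=True yields columns labelled 0, 1, 2, ... per token.
tokens = df['Comments'].str.split(' ', expand=True)

df['RowNumber'] = pd.to_numeric(tokens[1])
# Drop the trailing "V" unit so the voltage can be stored as a number.
df['Volts'] = pd.to_numeric(tokens[4].str.rstrip('V'))

print(df[['RowNumber', 'Volts']])
```

The positional indices (1 and 4) still assume every row has the same token layout, which is exactly the weakness the regex answer points out.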
