Loading numeric data from lines that match a regex with numpy fromregex

# random line of text
# random line of text
# keywordtomatch wordtext, wordtext, wordtext with spaces, 751791.000000, text with alphanumeric characters
# keywordtomatch wordtext, wordtext, wordtext with spaces, 751791.000000, text with alphanumeric characters
# random line of text
# random line of text

1 Answer

User

Reply #1 · Posted 2024-09-29 23:26:19

Start small:

In [57]: import re
In [58]: txt = "# keywordtomatch wordtext, wordtext, wordtext with spaces, 751791.000000, text with alphanumeric characters"
In [60]: re.match(r'# keywordtomatch (\w+)', txt)
Out[60]: <_sre.SRE_Match object; span=(0, 25), match='# keywordtomatch wordtext'>
In [64]: _.groups()
Out[64]: ('wordtext',)

Let's simplify the text:

(The code block that defined the simplified txt and pat is missing from this copy.)
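The snippet that belongs at this step did not survive, so the following is only a sketch of what it may have looked like. The simplified line and the four-group pattern are assumptions, reconstructed from the `np.fromregex` output shown further down (`'word', 'another', 123, 'word'`) and from the `start ...` pattern used in the generalized version.

```python
import re

# Assumed simplified line and pattern (not the original code):
# four comma-separated fields after a literal "start " prefix.
txt = "start word, another, 123, word"
pat = r"start (\w+), (\w+), (\d+), (\w+)"

m = re.match(pat, txt)
groups = m.groups()
print(groups)
```

Each capture group becomes one field of the structured array once the same pattern is handed to `np.fromregex`.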

Now copy the txt line into a file:

In [80]: dt = np.dtype('U10,U10,int,U10')
In [81]: np.fromregex('stack42659805.txt', pat, dtype=dt)
Out[81]: 
array([('word', 'another', 123, 'word')], 
      dtype=[('f0', '<U10'), ('f1', '<U10'), ('f2', '<i4'), ('f3', '<U10')])

It works when several lines match, and it skips the lines that don't match.
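A small sketch of that behaviour, using made-up file contents. `np.fromregex` accepts either a filename or an open file object, so an `io.StringIO` stands in for the file here; the sample text, the pattern, and the explicit `i8` field type are assumptions for illustration.

```python
import io
import numpy as np

# Hypothetical file contents: two matching lines mixed with unrelated ones.
text = """# random line of text
start word, another, 123, word
# random line of text
start more, stuff, 456, end
"""

pat = r"start (\w+), (\w+), (\d+), (\w+)"
dt = np.dtype("U10,U10,i8,U10")  # one field per capture group

# Lines without a match simply contribute no row.
arr = np.fromregex(io.StringIO(text), pat, dtype=dt)
print(arr)
print(arr["f2"])  # the integer column only
```

Fields are named `f0`, `f1`, ... by default, so the numeric column can be pulled out by name.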

So the remaining problem is just working out the right pattern.
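Because `np.fromregex` builds its array from every match of the pattern in the file contents, one way to iterate on a pattern is to try it with `re.findall` on a sample of the text first. The sketch below uses hypothetical sample text and two candidate patterns to show how a field containing spaces changes what matches.

```python
import re

# Hypothetical sample: the second field of the first "start" line has a space.
sample = """# random line of text
start one, two words, 3.4, four
start five, six, 7, eight
"""

# narrow: second field must be a single word
# wide:   second field may contain spaces
narrow = r"start (\w+), (\w+), ([\d.]+), (\w+)"
wide = r"start (\w+), ([\w ]+), ([\d.]+), (\w+)"

narrow_hits = re.findall(narrow, sample)
wide_hits = re.findall(wide, sample)
print(len(narrow_hits), len(wide_hits))
```

If the number of tuples from `re.findall` is lower than the number of lines you expect, the pattern is too strict for some field.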

Generalizing:

In [89]: re.match(r'start (\w+), ([\w ]+), ([\d\.]+), (\w+)', 'start one, two words, 3.4, four')
Out[89]: <_sre.SRE_Match object; span=(0, 31), match='start one, two words, 3.4, four'>
In [90]: _.groups()
Out[90]: ('one', 'two words', '3.4', 'four')

In [91]: pat = r'start (\w+), ([\w ]+), ([\d\.]+), (\w+)'
In [95]: dt = np.dtype('U10,U20,float,U10')
In [96]: np.fromregex('stack42659805.txt', pat, dtype=dt)
Out[96]: 
array([('word', 'another',  123.  , 'word'),
       ('word', 'another word',  123.  , 'word'),
       ('word', 'another and more',  123.43, 'word')], 
      dtype=[('f0', '<U10'), ('f1', '<U20'), ('f2', '<f8'), ('f3', '<U10')])
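Applied to the line format from the question, the same approach might look like the sketch below. The field widths, the use of `[\w ]+` for the free-text fields, and the in-memory `io.StringIO` in place of a real file are all assumptions; adjust them to the actual data.

```python
import io
import numpy as np

# Sample text in the question's format (an in-memory stand-in for the file).
text = """# random line of text
# random line of text
# keywordtomatch wordtext, wordtext, wordtext with spaces, 751791.000000, text with alphanumeric characters
# keywordtomatch wordtext, wordtext, wordtext with spaces, 751791.000000, text with alphanumeric characters
# random line of text
"""

# Five groups: two single words, text with spaces, a number, trailing text.
pat = r"# keywordtomatch (\w+), (\w+), ([\w ]+), ([\d.]+), ([\w ]+)"
dt = np.dtype("U20,U20,U40,f8,U40")  # assumed widths

arr = np.fromregex(io.StringIO(text), pat, dtype=dt)
values = arr["f3"]  # the numeric column as a float array
print(values)
```

Only the `# keywordtomatch` lines produce rows; the random comment lines are ignored because the pattern never matches them.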
