Get the string between brackets and special characters in Python

Posted 2024-09-30 01:34:43


I have a very similar question.

I would really like to know why my result is NaN.

I have a dataframe with this column:

Action
Player[J♡, K♧] won the $5.40 main pot with a Straight
Player [5, 2] won the $21.00 main pot with a flush

I want to build a new column containing the cards that were played:

[J♡, K♧]
[5, 2]

Or even:

[J, K]
[5, 2]

However, when I experiment with a regex, I use: dfpot['cards'] = dfpot['Action'].str.extract(r'\[([A-Za-z0-9_]+)\]', expand=False)

and I only get NaN.


3 Answers

Try this pattern (I am assuming you use () in your text rather than [], as in the regex demo you posted):

\([^,]+,[^\)]+\)

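Explanation:

\( - matches ( literally

[^,]+ - matches one or more characters other than ,

, - matches , literally

[^\)]+ - matches one or more characters other than )

\) - matches ) literally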
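
A minimal sketch of trying this pattern with Python's re module. The sample line is a hypothetical variant of the question's data rewritten with parentheses, since that is what this answer assumes; on the original square-bracket text the pattern finds nothing.

```python
import re

# Pattern from the answer above; it assumes the cards are wrapped in ( ) rather than [ ].
pattern = re.compile(r'\([^,]+,[^\)]+\)')

# Hypothetical sample line, rewritten with parentheses for illustration only.
text = 'Player (J♡, K♧) won the $5.40 main pot with a Straight'

match = pattern.search(text)
# The pattern has no capture group, so group(0) is the whole match, parentheses included.
cards = match.group(0) if match else None
print(cards)  # (J♡, K♧)
```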


Use:

>>> import pandas as pd
>>> df = pd.DataFrame({'Action':['Player[J♡, K♧] won the $5.40 main pot with a Straight', 'Player [5, 2] won the $21.00 main pot with a flush']})
>>> df['cards'] = df['Action'].str.findall(r'(\w+)(?=[^][]*])')
>>> df
                                              Action   cards
0  Player[J♡, K♧] won the $5.40 main pot with a S...  [J, K]
1  Player [5, 2] won the $21.00 main pot with a f...  [5, 2]
>>> 

Regular expression: (\w+)(?=[^][]*])

Explanation:

                                        
  (                        group and capture to \1:
                                        
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
                                        
  )                        end of \1
                                        
  (?=                      look ahead to see if there is:
                                        
    [^][]*                   any character except: ']', '[' (0 or
                             more times (matching the most amount
                             possible))
                                        
    ]                        ']'
                                        
  )                        end of look-ahead
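
To see what the look-ahead contributes, here is a small sketch using plain re.findall on one line from the question. The variable names are only illustrative. Without the look-ahead, \w+ matches every word-like run in the line; with it, a run is kept only when a ] follows before any other bracket.

```python
import re

text = 'Player[J♡, K♧] won the $5.40 main pot with a Straight'

# Without the look-ahead, \w+ picks up every word-like run in the line.
all_words = re.findall(r'\w+', text)

# With the look-ahead, a run is kept only if a ']' follows it
# with no '[' or ']' in between, i.e. the run sits inside the brackets.
in_brackets = re.findall(r'(\w+)(?=[^][]*])', text)

print(all_words)
print(in_brackets)  # ['J', 'K']
```

The suit symbols are dropped because ♡ and ♧ are not word characters, which gives the [J, K] form the question asks about.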

You can add the missing characters to the character class inside the capture group, as in the pattern \[([A-Za-z0-9_♤♡♢♧, ]+)\], or make the pattern more specific:

\[([A-Za-z0-9_][♤♡♢♧]?,\s*[A-Za-z0-9_][♤♡♢♧]?)]

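The pattern matches:

  • \[ matches [
  • ( opens capture group 1
    • [A-Za-z0-9_] matches one of the listed characters
    • [♤♡♢♧]? optionally matches one of the listed characters
    • ,\s*[A-Za-z0-9_][♤♡♢♧]? matches a comma, optional whitespace, and then the same logic as before the comma
  • ) closes group 1
  • ] matches ]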


For example:

import pandas as pd

dfpot = pd.DataFrame({'Action':['Player[J♡, K♧] won the $5.40 main pot with a Straight', 'Player [5, 2] won the $21.00 main pot with a flush']})
dfpot['cards'] = dfpot['Action'].str.extract(r'\[([A-Za-z0-9_][♤♡♢♧]?,\s*[A-Za-z0-9_][♤♡♢♧]?)]', expand=False)
print(dfpot)

Output:

                                              Action   cards
0  Player[J♡, K♧] won the $5.40 main pot with a S...  J♡, K♧
1  Player [5, 2] won the $21.00 main pot with a f...    5, 2
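
The broader character-class pattern mentioned at the start of this answer can be compared with the pattern from the question in the same way. The sketch below uses plain re rather than pandas, and first_group is a hypothetical helper introduced only for illustration. A None here corresponds to the NaN that str.extract returns when nothing matches.

```python
import re

rows = [
    'Player[J♡, K♧] won the $5.40 main pot with a Straight',
    'Player [5, 2] won the $21.00 main pot with a flush',
]

# Pattern from the question: the class has no comma, space, or suit symbols,
# so it cannot span the text between '[' and ']'.
original = re.compile(r'\[([A-Za-z0-9_]+)\]')
# Broader pattern from this answer: those characters are added to the class.
broader = re.compile(r'\[([A-Za-z0-9_♤♡♢♧, ]+)\]')

def first_group(pattern, text):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None

print([first_group(original, row) for row in rows])  # [None, None]
print([first_group(broader, row) for row in rows])   # ['J♡, K♧', '5, 2']
```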
