Parsing a phone number and a string into new columns in a DataFrame

Posted 2024-06-16 16:07:09


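I have a list of addresses in a single column, address. How can I parse the phone number and the restaurant category into new columns? My DataFrame looks like this: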

  address
0 Arnie Morton's of Chicago 435 S. La Cienega Blvd. Los Angeles 310-246-1501 Steakhouses                                                                    
1 Art's Deli 12224 Ventura Blvd. Studio City 818-762-1221 Delis                                                                                             
2 Bel-Air Hotel 701 Stone Canyon Rd. Bel Air 310-472-1211 French Bistro 

This is the result I want:

  address | phone_number | category
0 Arnie Morton's of Chicago 435 S. La Cienega Blvd. Los Angeles | 310-246-1501 | Steakhouses                                                                    
1 Art's Deli 12224 Ventura Blvd. Studio City | 818-762-1221 | Delis                                                                                             
2 Bel-Air Hotel 701 Stone Canyon Rd. Bel Air | 310-472-1211 | French Bistro 

Does anyone have any suggestions?


2 Answers

Use str.extract and str.split:

  1. For phone_number, we extract the pattern "digits, dash, digits, dash, digits".
  2. For category, we split on a whitespace character that is preceded by three digits and keep the last piece. This uses a positive lookbehind, written (?<=...) in regex.

df['phone_number'] = df['address'].str.extract(r'(\d+-\d+-\d+)', expand=False)
df['category'] = df['address'].str.split(r'(?<=\d{3})\s').str[-1]

Output:

                                                                                  address  phone_number       category
0  Arnie Morton's of Chicago 435 S. La Cienega Blvd. Los Angeles 310-246-1501 Steakhouses  310-246-1501    Steakhouses
1                           Art's Deli 12224 Ventura Blvd. Studio City 818-762-1221 Delis  818-762-1221          Delis
2                   Bel-Air Hotel 701 Stone Canyon Rd. Bel Air 310-472-1211 French Bistro  310-472-1211  French Bistro
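The two steps above can be tried end to end on the sample rows from the question. This is a minimal sketch, not part of the original answer. The final `str.replace` line is an addition that trims the phone number and category off `address` to match the layout the question asks for. It assumes each row contains exactly one phone-like pattern.

```python
import pandas as pd

df = pd.DataFrame({"address": [
    "Arnie Morton's of Chicago 435 S. La Cienega Blvd. Los Angeles 310-246-1501 Steakhouses",
    "Art's Deli 12224 Ventura Blvd. Studio City 818-762-1221 Delis",
    "Bel-Air Hotel 701 Stone Canyon Rd. Bel Air 310-472-1211 French Bistro",
]})

# Step 1: first run of digits-dash-digits-dash-digits; expand=False returns a Series.
df["phone_number"] = df["address"].str.extract(r"(\d+-\d+-\d+)", expand=False)

# Step 2: split on whitespace preceded by three digits and keep the last piece.
df["category"] = df["address"].str.split(r"(?<=\d{3})\s").str[-1]

# Addition (assumption, not in the answer above): drop everything from the
# phone number onward so that address holds only the name and street address.
df["address"] = df["address"].str.replace(r"\s*\d+-\d+-\d+.*$", "", regex=True)

print(df)
```

Note that the split in step 2 relies on the category itself never containing three digits followed by a space. If it did, only the text after that point would be kept.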

Try using a regex with str.extract.

For example:

import pandas as pd

df = pd.DataFrame({'address':["Arnie Morton's of Chicago 435 S. La Cienega Blvd. Los Angeles 310-246-1501 Steakhouses", 
                              "Art's Deli 12224 Ventura Blvd. Studio City 818-762-1221 Delis",
                              "Bel-Air Hotel 701 Stone Canyon Rd. Bel Air 310-472-1211 French Bistro"]})
# Named groups become the column names; the captured address and category keep
# the space that sits next to the phone number.
df[["address", "phone_number", "category"]] = df["address"].str.extract(r"(?P<address>.*?)(?P<phone_number>\b\d{3}-\d{3}-\d{4}\b)(?P<category>.*$)")
print(df)

Output:

                                             address  phone_number  \
0  Arnie Morton's of Chicago 435 S. La Cienega Bl...  310-246-1501   
1        Art's Deli 12224 Ventura Blvd. Studio City   818-762-1221   
2        Bel-Air Hotel 701 Stone Canyon Rd. Bel Air   310-472-1211   

         category  
0     Steakhouses  
1           Delis  
2   French Bistro  

Note: this assumes the content of address always has the form "address phone_number category".
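To show what happens when that assumption does not hold, here is a hedged sketch of a variant pattern with `\s*` on either side of the phone number, so the captured fields carry no stray spaces. It runs on two of the sample rows plus one made-up row without a phone number. The extra row and the names `pattern`, `parsed` and `unmatched` are hypothetical and used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"address": [
    "Art's Deli 12224 Ventura Blvd. Studio City 818-762-1221 Delis",
    "Bel-Air Hotel 701 Stone Canyon Rd. Bel Air 310-472-1211 French Bistro",
    "Some Cafe 1 Main St. Springfield Coffee",  # hypothetical row, no phone number
]})

# \s* around the phone group absorbs the separating spaces.
pattern = (
    r"(?P<address>.*?)\s*"
    r"(?P<phone_number>\b\d{3}-\d{3}-\d{4}\b)\s*"
    r"(?P<category>.*)$"
)
parsed = df["address"].str.extract(pattern)

# Rows that do not fit the assumed layout come back as NaN in every column,
# so they can be flagged for manual review instead of being silently mangled.
unmatched = parsed["phone_number"].isna()

print(parsed)
print(df.loc[unmatched, "address"])
```

Keeping the result in a separate frame, rather than assigning straight back onto `df`, leaves the original `address` text available for any rows that failed to match.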
