How do I create a new column when the text in one column matches a specific string pattern?

Posted 2024-09-29 22:18:31


My current data looks like this:

+-------+----------------------------+-------------------+-----------------------+
| Index |             0              |         1         |           2           |
+-------+----------------------------+-------------------+-----------------------+
|     0 | Reference Curr             | Daybook / Voucher | Invoice Date Due Date |
|     1 | V50011 Tech Comp           | nan               | Phone:0177222222      |
|     2 | Regis Place                | nan               | Fax:017757575789      |
|     3 | Catenberry                 | nan               | nan                   |
|     4 | Manhattan, NY              | nan               | nan                   |
|     5 | V7484 Pipe                 | nan               | Phone:                |
|     6 | Japan                      | nan               | nan                   |
|     7 | nan                        | nan               | nan                   |
|     8 | 4543.34GBP (British Pound) | nan               | nan                   |
+-------+----------------------------+-------------------+-----------------------+

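I am trying to create a new column, df['Company'], that holds the contents of df[0] when df[0] starts with "V" and df[2] contains "Phone". If the conditions are not met, the value can be nan. This is what I am looking for: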

+-------+----------------------------+-------------------+-----------------------+------------+
| Index |             0              |         1         |           2           | Company    |
+-------+----------------------------+-------------------+-----------------------+------------+
|     0 | Reference Curr             | Daybook / Voucher | Invoice Date Due Date | nan        |
|     1 | V50011 Tech                | nan               | Phone:0177222222      |V50011 Tech |
|     2 | Regis Place                | nan               | Fax:017757575789      | nan        |
|     3 | Catenberry                 | nan               | nan                   | nan        |
|     4 | Manhattan, NY              | nan               | nan                   | nan        |
|     5 | V7484 Pipe                 | nan               | Phone:                | V7484 Pipe |
|     6 | Japan                      | nan               | nan                   | nan        |
|     7 | nan                        | nan               | nan                   | nan        |
|     8 | 4543.34GBP (British Pound) | nan               | nan                   | nan        |
+-------+----------------------------+-------------------+-----------------------+------------+

I am trying the script below, but I get the error ValueError: Wrong number of items passed 1420, placement implies 1

df['Company'] = pd.np.where(df[2].str.contains("Ph"), df[0].str.extract(r"(^V[A-Za-z0-9]+)"),"stop")

I put "stop" in the else branch because I do not know how to make Python use nan when the condition is not met.

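I would also like to be able to extract just part of df[0], for example only the v5001 part, without the rest of the cell's contents. I tried something similar based on AMC's answer, but got an error: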

df.loc[df[0].str.startswith('V') & df[2].str.contains('Phone'), 'Company'] = df[0].str.extract(r"(^V[A-Za-z0-9]+)")

Thanks, everyone.


3 Answers

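One potential solution is to use a list comprehension. You could probably get a speed boost from some of pandas' built-in functionality, but this will get you there: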

#!/usr/bin/env python

import numpy as np
import pandas as pd

df = pd.DataFrame({
    0:["reference", "v5001 tech comp", "catenberry", "very different"],
    1:["not", "phone", "other", "text"]
    })

# Keep x when it starts with "v" and the other column mentions "phone"; otherwise NaN.
df["new_column"] = [x if x[0].lower() == "v" and "phone" in y.lower() else np.nan
                    for x, y in df[[0, 1]].values]

print(df)

That produces:

                 0      1       new_column
0        reference    not              NaN
1  v5001 tech comp  phone  v5001 tech comp
2       catenberry  other              NaN
3   very different   text              NaN

All I did was take your two conditions, build a new list from them, and assign that list to your new column.
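For comparison, here is a rough sketch of the same logic written with pandas string methods instead of a list comprehension. It reuses the toy data from this answer and adds one row with a missing value (my addition, not in the original) to show how `na=False` keeps missing cells from breaking the mask. Treat it as one possible vectorized version, not a benchmarked replacement.

```python
import numpy as np
import pandas as pd

# Same toy data as the answer above, plus one row with a missing value.
df = pd.DataFrame({
    0: ["reference", "v5001 tech comp", "catenberry", "very different", np.nan],
    1: ["not", "phone", "other", "text", "phone"],
})

# Case-insensitive checks; na=False treats missing cells as "no match".
starts_with_v = df[0].str.lower().str.startswith("v", na=False)
has_phone = df[1].str.contains("phone", case=False, na=False)

# Series.where keeps values where the mask is True and puts NaN elsewhere.
df["new_column"] = df[0].where(starts_with_v & has_phone)

print(df)
```

Unlike the list comprehension, this version does not fail on a missing value in column 0, because the string methods never index into the cell directly.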

You haven't given us an easy way to test potential solutions, but this should do the job:

df.loc[df[0].str.startswith('V') & df[2].str.contains('Phone'), 'Company'] = df[0]
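The question also asks for just the leading code (for example V50011) and reports an error when `str.extract` is combined with this kind of assignment. A likely cause is that `str.extract` returns a DataFrame by default, even for a single capture group; passing `expand=False` returns a Series instead. The sketch below assumes a reduced copy of the question's table, and the `Code` column name is made up for illustration.

```python
import numpy as np
import pandas as pd

# A reduced copy of the question's table (columns 0 and 2 only).
df = pd.DataFrame({
    0: ["Reference Curr", "V50011 Tech Comp", "Regis Place", "V7484 Pipe", np.nan],
    2: ["Invoice Date Due Date", "Phone:0177222222", "Fax:017757575789", "Phone:", np.nan],
})

# na=False makes rows with missing values count as non-matching.
mask = df[0].str.startswith("V", na=False) & df[2].str.contains("Phone", na=False)

# expand=False returns a Series (not a DataFrame) for a single capture group.
code = df[0].str.extract(r"^(V[A-Za-z0-9]+)", expand=False)

# Series.where fills NaN wherever the mask is False, so no "else" value is needed.
df["Company"] = df[0].where(mask)
df["Code"] = code.where(mask)  # "Code" is a hypothetical column name

print(df)
```

Using `Series.where` also answers the "how do I get nan in the else branch" part: rows that fail the condition are left as NaN without a placeholder such as "stop".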

Here is another way to get the result:

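import numpy as np

# Column labels are the integers 0 and 2, as in the question's own code.
# na=False makes rows with missing values count as non-matching.
condition1 = df[0].str.startswith('V', na=False)
condition2 = df[2].str.contains('Phone', na=False)

df['Company'] = np.where(condition1 & condition2, df[0], np.nan)
# Keep only the first space-separated token, e.g. 'V50011'.
df['Company'] = df['Company'].str.split(' ').str[0]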
