大Pandas中的python vlookup应用%LIKE%

2024-05-19 10:23:36 发布

您现在位置:Python中文网/ 问答频道 /正文

我是Python新手,正在尝试连接两个CSV文件(以“;”分隔)

CSV1
Sender;Recipient
Adam;123
Alex;234
John;123
Adam;888

CSV2
Name;Phone
Winnie;123,234,456
Celeste;777,888,999

预期输出:

Sender;Recipient;RecipientName
Adam;123;Winnie
Alex;234;Winnie
John;123;Winnie
Adam;888;Celeste

CSV2中的Phone用逗号分隔。所以当我匹配的时候,我需要做一些搜索或者%LIKE%。你知道吗

我知道我可以用join来做一个vlookup类型,但是我怎样才能实现%LIKE%?你知道吗


Tags: 文件csvnamephonejohnsenderalex新手
3条回答

通过Series解决^{}

from  itertools import chain

#split values by `,` to lists
lens = df2['Phone'].str.split(',')
#if some zero list remove it
df2 = df2.dropna(subset=['Phone'])

#explode Names by length of lists, flat values by chain.from_iterable
s = pd.Series(np.repeat(df2.Name.values, lens), 
              index= list(chain.from_iterable(df2.Phone.values)))
#convert index to int for match
s.index = s.index.astype(int)
print (s)
123     Winnie
234     Winnie
456     Winnie
777    Celeste
888    Celeste
999    Celeste
dtype: object

#map values to new column
df1['RecipientName'] = df1['Recipient'].map(s)
print(df1)
  Sender  Recipient RecipientName
0   Adam        123        Winnie
1   Alex        234        Winnie
2   John        123        Winnie
3   Adam        888       Celeste

#write to csv
df.to_csv('out.csv', sep=';', header=None)

Sender;Recipient;RecipientName
Adam;123;Winnie
Alex;234;Winnie
John;123;Winnie
Adam;888;Celeste

^{}的解决方案类似:

df2['Phone'] = df2['Phone'].str.split(',')
df2 = df2.dropna(subset=['Phone'])

s = pd.Series(np.repeat(df2.Name.values, df2.Phone.str.len()), 
              index= list(chain.from_iterable(df2.Phone.values)))
s.index = s.index.astype(int)
s.name = 'RecipientName'
print (s)

df1 = df1.join(s, on='Recipient')
print(df1)
  Sender  Recipient RecipientName
0   Adam        123        Winnie
1   Alex        234        Winnie
2   John        123        Winnie
3   Adam        888       Celeste

编辑:

我的数据示例:

import pandas as pd
from pandas.compat import StringIO

temp=u"""
Sender;Recipient
Adam;123
Alex;234
John;123
Adam;888"""
#after testing replace 'StringIO(temp)' to 'filename.csv'
df1 = pd.read_csv(StringIO(temp), sep=";")
print (df1)
  Sender  Recipient
0   Adam        123
1   Alex        234
2   John        123
3   Adam        888

temp=u"""
Name;Phone
Winnie;123,234,456
Celeste;777,888,999"""
#after testing replace 'StringIO(temp)' to 'filename.csv'
df2 = pd.read_csv(StringIO(temp), sep=";")
print (df2)
      Name        Phone
0   Winnie  123,234,456
1  Celeste  777,888,999
  • 使用str.splitPhone列转换为列表
  • 使用str.len()查找每个列表的长度。我们将使用它来分解'Name'
  • 把所有的清单放在一起。确保过滤掉长度为零的列表
  • 使用repeat分解'Name'
  • 创建一个字典,其中键是电话号码,值是姓名
  • 创建一个d1的副本,在这里我们使用map和我们创建的新字典添加了新列。你知道吗

p = d2.Phone.str.split(',')
p = p[p.astype(bool)]
l = p.str.len()
p2 = np.concatenate(p.values).astype(int)
nm = d2.Name.repeat(l)
m = dict(zip(p2, nm))

df = d1.assign(RecipientName=d1.Recipient.map(m))
print(df)

  Sender  Recipient RecipientName
0   Adam        123        Winnie
1   Alex        234        Winnie
2   John        123        Winnie
3   Adam        888       Celeste

df.to_csv('out.csv', sep=';', header=None)

Sender;Recipient;RecipientName
Adam;123;Winnie
Alex;234;Winnie
John;123;Winnie
Adam;888;Celeste

下面是一些伪代码和如何做到这一点的想法。你知道吗

我将从分析CSV2文件开始。跳过第一行,然后为下面的行解析出姓名和电话号码,然后维护一个字典,其中的姓名与每个电话号码相关联。你知道吗

numbers_to_names = {}
for line in open("csv2", "r").splitlines():
    name, phone_numbers = line.split(";")
    for phone_number in phone_numbers.split(","):
        numbers_to_names[phone_number] = name

然后当再次通过CSV1时,跳过第一行,然后解析出发送者和接收者,并与以前的字典结果相结合。你知道吗

for line in open("csv1", "r").splitlines():
    sender, recipient = line.split(";")
    print "%s;%s;%s" % (sender, recipient, numbers_to_names[recipient])

相关问题 更多 >

    热门问题