Python Pandas: comparing two DataFrames to assign a country to phone numbers

phonenumber, add_info, country, order_info
34123425209, info1, Spain, 1
92654321762, info2, Pakistan, 4
12018883637, info3, USA, 2
6323450001, info4, Philippines, 3
496789521134, info5, Germany, 4

#! /usr/bin/python
import csv
import pandas

with open('longlist.csv', 'r') as lookuplist:
    with open('country_list.csv', 'r') as inputlist:
        with open('Outputfile.csv', 'w') as outputlist:
            reader = csv.reader(lookuplist, delimiter=',')
            reader2 = csv.reader(inputlist, delimiter=';')
            writer = csv.writer(outputlist, dialect='excel')
            for i in reader2:
                for xl in reader:
                    if xl[0].startswith(i[1]):
                        zeile = [xl[0], xl[1], i[0], i[1], i[2]]
                        writer.writerow(zeile)
                # rewind the lookup file so the next country scans it from the start
                lookuplist.seek(0)

import pandas as pd
import numpy as np

longlist = pd.read_csv('path/to/longlist.csv',
                       usecols=[2,3], names=['PHONENUMBER','ADD_INFO'])
country_list = pd.read_csv('path/to/country_list.csv',
                           sep=';', names=['COUNTRY','COUNTRY_CODE','ORDER_INFO'], skiprows=[0])

# remove duplicates and make phone number an index
longlist = longlist.drop_duplicates('PHONENUMBER')
longlist = longlist.set_index('PHONENUMBER')

# Sort country list, from high to low value and make country code an index
country_list = country_list.sort_values(by='COUNTRY_CODE', ascending=False)
country_list = country_list.set_index('COUNTRY_CODE')

(...)

longlist.to_csv('path/to/output.csv')

1 answer

User

#1 · Posted 2024-07-05 15:11:21

I would do it this way:

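import pandas as pd

# read country codes and phone numbers as strings so they can be sliced by prefix
cl = pd.read_csv('country_list.csv', sep=';', dtype={'country_code':str})
ll = pd.read_csv('phones.csv', skipinitialspace=True, dtype={'phonenumber':str})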

lookup = cl['country_code']
lookup.index = cl['country_code']

ll['country_code'] = (
    ll['phonenumber']
    .apply(lambda x: pd.Series([lookup.get(x[:4]), lookup.get(x[:3]),
                                lookup.get(x[:2]), lookup.get(x[:1])]))
    .apply(lambda x: x.get(x.first_valid_index()), axis=1)
)

# remove `how='left'` parameter if you don't need "unmatched" phone-numbers    
result = ll.merge(cl, on='country_code', how='left')
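
To illustrate the comment about `how='left'`, here is a minimal sketch using two tiny, made-up frames (the names `phones` and `countries` and their values are assumptions for illustration, not the original files). It compares a left merge with the default inner merge and uses `indicator=True` to flag the rows that found no country.

```python
import pandas as pd

# Hypothetical miniature data: country_code is assumed to be assigned already.
phones = pd.DataFrame({
    "phonenumber": ["34123425209", "12018883637", "00000000000"],
    "country_code": ["34", "1", None],  # None = no matching prefix
})
countries = pd.DataFrame({
    "country": ["Spain", "USA"],
    "country_code": ["34", "1"],
})

# Left merge keeps every phone number; unmatched rows get NaN for country.
left = phones.merge(countries, on="country_code", how="left", indicator=True)

# Default (inner) merge drops phone numbers without a matching country code.
inner = phones.merge(countries, on="country_code")

print(left)
print(inner)
```

The `_merge` column added by `indicator=True` reads `left_only` for the unmatched number, which makes it easy to filter such rows out or inspect them later.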

Explanation:

In [216]: (ll['phonenumber']
   .....:   .apply(lambda x: pd.Series([lookup.get(x[:4]), lookup.get(x[:3]),
   .....:                               lookup.get(x[:2]), lookup.get(x[:1])]))
   .....: )
Out[216]:
      0     1     2     3
0  None  None    34  None
1  None  None    92  None
2  None  None  None     1
3  1242  None  None     1
4  None  None    63  None
5  None  None    49  None
6  None  None  None  None
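
The table above has one column per prefix length (4, 3, 2, 1), and the second `apply` picks the first non-empty entry in each row, which is the longest matching code. The same idea can be sketched in plain Python. The helper name `longest_prefix` and the `max_len` parameter are hypothetical, and the set of codes is simply the values visible in the table.

```python
def longest_prefix(number, codes, max_len=4):
    """Return the longest prefix of `number` found in `codes`, or None."""
    # Try the longest candidate first so that '1242' wins over '1'.
    for length in range(max_len, 0, -1):
        prefix = number[:length]
        if prefix in codes:
            return prefix
    return None

# Country codes that appear in the lookup table above.
codes = {"34", "92", "1", "1242", "63", "49"}

for number in ["34123425209", "12018883637", "12428883637", "00000000000"]:
    print(number, longest_prefix(number, codes))
```

Checking longer prefixes first is what lets the Bahamas number resolve to `1242` instead of the shorter code `1`.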

In phones.csv I intentionally added a Bahamas number (1242...) and an invalid number (00000000000):

phonenumber, add_info
34123425209, info1
92654321762, info2
12018883637, info3
12428883637, info31
6323450001, info4
496789521134, info5
00000000000, BAD
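
A short sketch of why the answer reads this file with `dtype={'phonenumber': str}` and `skipinitialspace=True`. It loads the sample data above from an in-memory string (an assumption made so the example runs without a file on disk) and compares the result with the default type inference.

```python
import io
import pandas as pd

# The sample file contents from above, held in memory instead of phones.csv.
raw = """phonenumber, add_info
34123425209, info1
92654321762, info2
12018883637, info3
12428883637, info31
6323450001, info4
496789521134, info5
00000000000, BAD
"""

# As in the answer: strip the space after each comma, keep numbers as text.
as_str = pd.read_csv(io.StringIO(raw), skipinitialspace=True,
                     dtype={"phonenumber": str})

# Default inference parses the column as integers and loses leading zeros.
as_default = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

print(as_str["phonenumber"].iloc[-1])      # kept as text
print(as_default["phonenumber"].iloc[-1])  # parsed as a number
```

Keeping the column as text also allows slicing such as `x[:4]`, which would not work on integers.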
