How can I join 2 CSV files using only Python, without Scala or creating a temp table?

Posted 2024-10-03 09:07:08


I'm fairly new to Python superfunnel and web properties, and I'm trying to figure out how to solve a problem. I have 2 CSV files (urls and visits):

url.csv
short_code
full_url
time_created
user_id
premium_user
country

visits.csv
short_code
visit_time
browser_type
version
platform
ipaddress
country
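Both files share the short_code column, so one way to "join" them in plain Python, without Scala or a temp table, is an in-memory hash join: index url.csv by short_code in a dict once, then look up each visit. A minimal sketch, assuming the rows are the dicts that csv.DictReader produces, that the keys match the column names listed above, and that short_code is unique in url.csv. The function name join_on_short_code and the url_ prefix (used so the two country columns don't overwrite each other) are hypothetical choices for illustration.

```python
def join_on_short_code(visits, urls):
    """Inner-join visits to urls on short_code using a dict lookup (hash join)."""
    # Assumption: short_code is unique in urls, so one dict entry per URL row.
    url_by_code = {url['short_code']: url for url in urls}
    joined = []
    for visit in visits:
        url = url_by_code.get(visit['short_code'])
        if url is None:
            continue  # visit for an unknown short code: dropped (inner join)
        row = dict(visit)
        # Prefix URL columns so url.csv's 'country' doesn't clobber the visit's.
        row.update({'url_' + key: value for key, value in url.items()})
        joined.append(row)
    return joined


urls = [{'short_code': 'GTq6Bl', 'country': 'CA'}]
visits = [{'short_code': 'GTq6Bl', 'country': 'IT'},
          {'short_code': 'zzzzzz', 'country': 'US'}]  # hypothetical unmatched code
print(join_on_short_code(visits, urls))
# [{'short_code': 'GTq6Bl', 'country': 'IT', 'url_short_code': 'GTq6Bl', 'url_country': 'CA'}]
```

Building the dict once keeps the join linear in the number of rows, instead of scanning url.csv for every visit.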

I'm writing Python code to get the following:

1. Return urls which only have visitors from the same country as the url was created from

2. Get the URL with the shortest time between when the URL was created and when the first visit was recorded

3. Gets a count of visits to each short code by each unique visitor
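For task 1, one possible approach is to collect the set of visitor countries per short code and keep a URL only when that set is exactly its creation country. This is a sketch, assuming the short_code and country keys from the column lists above; URLs with no recorded visits are excluded, which is an assumption about the intended meaning of "only have visitors from".

```python
from collections import defaultdict


def same_country_only(visits, urls):
    """Return urls which only have visitors from the same country as the url was created from"""
    # short_code -> set of countries its visitors came from
    visit_countries = defaultdict(set)
    for visit in visits:
        visit_countries[visit['short_code']].add(visit['country'])
    # .get() does not insert into a defaultdict, so unvisited URLs yield None
    # and fail the comparison (i.e. they are excluded).
    return [url for url in urls
            if visit_countries.get(url['short_code']) == {url['country']}]


# GTq6Bl was created in CA and (per the sample) also visited from IT and JP;
# code_b is a hypothetical URL visited only from its creation country.
urls = [{'short_code': 'GTq6Bl', 'country': 'CA'},
        {'short_code': 'code_b', 'country': 'GB'}]
visits = [{'short_code': 'GTq6Bl', 'country': 'IT'},
          {'short_code': 'GTq6Bl', 'country': 'JP'},
          {'short_code': 'GTq6Bl', 'country': 'CA'},
          {'short_code': 'code_b', 'country': 'GB'}]
print([u['short_code'] for u in same_country_only(visits, urls)])  # ['code_b']
```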

Below is my code; so far it only imports the data from my cloud storage.

Links to my files: https://www.dropbox.com/s/u193iv6ybeges92/url.csv?dl=0 and https://www.dropbox.com/s/u3vmdra41p3qjgv/visits.csv?dl=0
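One thing to check when fetching these links: as far as I know, a Dropbox share link ending in ?dl=0 serves an HTML preview page, while ?dl=1 serves the raw file, so csv.DictReader could end up parsing HTML if the dl=0 links are passed to requests.get as-is. A small standard-library helper that rewrites the query parameter; the name dropbox_direct_url is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def dropbox_direct_url(share_url):
    """Rewrite a Dropbox share link to force a direct download (dl=1)."""
    parts = urlsplit(share_url)
    query = dict(parse_qsl(parts.query))
    query['dl'] = '1'  # assumption: dl=1 makes Dropbox return the file itself
    return urlunsplit(parts._replace(query=urlencode(query)))


print(dropbox_direct_url('https://www.dropbox.com/s/u193iv6ybeges92/url.csv?dl=0'))
# https://www.dropbox.com/s/u193iv6ybeges92/url.csv?dl=1

# Usage (not executed here, needs network access):
# urls_response = requests.get(dropbox_direct_url(share_link)).text
```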

import csv
import requests

from pprint import pprint


def same_country_only(visits, urls):
    """Return urls which only have visitors from the same country as the url was created from"""
    pass


def shortest_first_visit(visits, urls):
    """Get the URL with the shortest time between when the URL was created and when the first visit was recorded"""
    pass


def unique_visitors(visits, urls):
    """Gets a count of visits to each short code by each unique visitor"""
    pass


if __name__ == '__main__':
    urls_response = requests.get('<<my_url>>').text
    urls_dr = csv.DictReader(urls_response.splitlines(), delimiter=',')
    urls = [dict(url) for url in urls_dr]
    pprint(urls[0]) # example format

    print('\n' + '*' * 60 + '\n')

    visits_response = requests.get('<<my_url>>').text
    visits_dr = csv.DictReader(visits_response.splitlines(), delimiter=',')
    visits = [dict(visit) for visit in visits_dr]
    pprint(visits[0]) # example format

    print('\n' + '*' * 60 + '\n')

    pprint(same_country_only(visits, urls))
    pprint(shortest_first_visit(visits, urls))
    pprint(unique_visitors(visits, urls))
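For task 2, a sketch of shortest_first_visit: find the earliest visit per short code, then pick the URL whose first visit is closest to its creation time. It assumes time_created and visit_time are ISO 8601 strings that datetime.fromisoformat can parse (the sample values are truncated, so the real format is unknown; use datetime.strptime with the actual format if it differs). Returning a (url, timedelta) pair is a design choice so the gap itself is visible; the rows in the usage part are hypothetical.

```python
from datetime import datetime


def shortest_first_visit(visits, urls):
    """Get the URL with the shortest time between when the URL was created and when the first visit was recorded"""
    # Earliest visit time per short code.
    # Assumption: timestamps look like '2018-07-20 10:00:00' (ISO 8601).
    first_visit = {}
    for visit in visits:
        code = visit['short_code']
        t = datetime.fromisoformat(visit['visit_time'])
        if code not in first_visit or t < first_visit[code]:
            first_visit[code] = t

    best_url, best_delta = None, None
    for url in urls:
        code = url['short_code']
        if code not in first_visit:
            continue  # never visited, so there is no first visit to measure
        delta = first_visit[code] - datetime.fromisoformat(url['time_created'])
        if best_delta is None or delta < best_delta:
            best_url, best_delta = url, delta
    return best_url, best_delta


# Hypothetical rows for illustration (not taken from the real files)
urls = [{'short_code': 'code_a', 'time_created': '2018-07-20 10:00:00'},
        {'short_code': 'code_b', 'time_created': '2018-07-20 12:00:00'}]
visits = [{'short_code': 'code_a', 'visit_time': '2018-07-20 11:30:00'},
          {'short_code': 'code_a', 'visit_time': '2018-07-20 10:45:00'},
          {'short_code': 'code_b', 'visit_time': '2018-07-20 12:10:00'}]
url, delta = shortest_first_visit(visits, urls)
print(url['short_code'], delta)  # code_b 0:10:00
```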

Thanks for your help.

Sample CSVs (the first row is the header):

url.csv

id  short_cod   long_url    created_ti  creator_id  premium country
1   GTq6Bl  https://w   2018-07-2   78  FALSE   CA
2   EmazTI  https://as  2018-07-2   124 FALSE   GB
3   tT54Bl  https://bi  2018-07-2   97  FALSE   GB
4   6ZTSle  https://gi  2018-07-2   98  FALSE   US
5   3akWjJ  https://e   2018-07-2   11  FALSE   JP
6   m7NoUy  https://bl  2018-07-2   34  TRUE    JP
7   lszSBy  https://m   2018-07-2   90  FALSE   US
8   PnTavE  https://ha  2018-07-2   1   FALSE   GB
9   QkXxbV  https://d   2018-07-2   109 FALSE   CN

visits.csv

browser_t   visit_time  short_cod   country platform    ip_address
Chrome  2018-07-2   GTq6Bl  IT  Windows 78.110.51.215
Firefox 2018-07-2   GTq6Bl  IT  Linux   27.243.245.232
Chrome  2018-07-2   GTq6Bl  JP  Mac OS  97.155.155.73
Chrome  2018-07-2   GTq6Bl  RU  Linux   85.201.130.148
Chrome  2018-07-2   GTq6Bl  GB  Linux   26.90.189.168
Chrome  2018-07-2   GTq6Bl  CN  Android 58.203.242.175
Edge    2018-07-2   GTq6Bl  KR  Windows 84.11.120.228
Safari  2018-07-2   GTq6Bl  KR  iOS 46.72.81.132
Firefox 2018-07-2   GTq6Bl  IT  Linux   30.47.125.89
Safari  2018-07-2   GTq6Bl  CA  iOS 85.245.10.160
Firefox 2018-07-2   GTq6Bl  RU  Windows 43.13.144.48
Chrome  2018-07-2   GTq6Bl  IT  Android 65.74.182.22
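For task 3, a sketch of unique_visitors that counts visits per (short_code, visitor) pair with collections.Counter and reshapes the result into a nested dict. It assumes the visitor's IP address identifies a "unique visitor", which is only an approximation (IPs can be shared or change). The column list above says ipaddress while the sample header says ip_address, so the key is a parameter (visitor_key, hypothetical). The urls argument is unused and kept only to match the skeleton's signature.

```python
from collections import Counter


def unique_visitors(visits, urls, visitor_key='ipaddress'):
    """Gets a count of visits to each short code by each unique visitor"""
    # Count occurrences of each (short_code, visitor) pair.
    counts = Counter((visit['short_code'], visit[visitor_key]) for visit in visits)
    # Reshape into {short_code: {visitor: visit_count}}.
    result = {}
    for (code, visitor), n in counts.items():
        result.setdefault(code, {})[visitor] = n
    return result


# First two rows follow the sample; the third is a hypothetical repeat visit.
visits = [{'short_code': 'GTq6Bl', 'ip_address': '78.110.51.215'},
          {'short_code': 'GTq6Bl', 'ip_address': '27.243.245.232'},
          {'short_code': 'GTq6Bl', 'ip_address': '78.110.51.215'}]
print(unique_visitors(visits, [], visitor_key='ip_address'))
# {'GTq6Bl': {'78.110.51.215': 2, '27.243.245.232': 1}}
```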
