How to count all occurrences in a list and print the maximum value in the list

Published 2024-09-28 19:28:14


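Hello, I have a question about web scraping. How can I print the maximum, minimum, and average values from the scraped data? I also don't know how to tie them to the number of times each title occurs. The final printout should look like this: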

BMW - number of offers: ..., max price: ..., min price: ..., average price: ...

I built a list from this data, but I don't know how to count the occurrences of each title and compute values such as the max from it. This is my code:


    for car in carList:
      

        title = car.find('a', class_='offer-title__link').text.strip()

        price = car.find('span', class_='offer-price__number').text.strip()


        lista = [title, price,]


        carFile.write(title + ',')
        carFile.write(price + ',')

        carFile.write('\n')

        print( lista)
        print(lista.count(title))

carFile.close()

Right now the count I get is only 1.


1 answer
User
#1 · Posted 2024-09-28 19:28:14

If you want to analyze the data, it is best to keep all of it in a pandas.DataFrame.

First append [title, int(price)] to an outer list named data:

data = []

for page in range(1, last_page+1):

    # ... code ...

    for car in car_list:

        # ... code ...

        data.append([title, int(price)])
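Note that int(price) only works once the scraped text has been reduced to digits. A minimal sketch of a cleaning helper follows; the name parse_price is hypothetical, and it assumes the site shows whole-number prices with spaces and a currency suffix around them:

```python
import re


def parse_price(text):
    """Convert scraped price text to an int.

    Assumes whole-number prices: every non-digit character
    (spaces, line breaks, the currency label) is dropped.
    """
    digits = re.sub(r"\D", "", text)
    if not digits:
        raise ValueError(f"no digits found in price text: {text!r}")
    return int(digits)


# Sample text shaped like a price with a space separator and a currency suffix.
print(parse_price("4 500\nPLN"))
```

With this helper, the append step could become data.append([title, parse_price(price)]).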

Then convert it to a DataFrame:

df = pd.DataFrame(data, columns=['title', 'price'])

Then you can analyze it:

    cars = df[ df['title'].str.contains("BMW") ]

    print('count:', len(cars))
    print('price min    :', cars['price'].min())
    print('price average:', cars['price'].mean())
    print('price max    :', cars['price'].max())   
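To get the single-line format asked for in the question, the same filtering can be wrapped in a small function. This is only a sketch: the function name summarize and the sample rows are made up for illustration, and the case=False and regex=False arguments are my own choice so that the name is matched as plain text regardless of letter case.

```python
import pandas as pd

# Made-up sample rows standing in for the scraped data.
data = [
    ["BMW 320", 4500],
    ["BMW 520", 3900],
    ["Audi A4", 4900],
    ["Opel Vectra", 3300],
]
df = pd.DataFrame(data, columns=["title", "price"])


def summarize(df, name):
    """Return a one-line summary of offers whose title contains `name`."""
    # regex=False treats `name` as plain text; case=False ignores letter case.
    cars = df[df["title"].str.contains(name, case=False, regex=False)]
    if cars.empty:
        # min/max/mean of an empty selection would be NaN, so report zero offers.
        return f"{name} - number of offers: 0"
    prices = cars["price"]
    return (
        f"{name} - number of offers: {len(cars)}, "
        f"max price: {prices.max()}, "
        f"min price: {prices.min()}, "
        f"average price: {prices.mean():.2f}"
    )


print(summarize(df, "BMW"))
```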

You can even run it in a for loop to cover more makes:

for name in ['BMW', 'Audi', 'Opel', 'Mercedes']:

    print(' -', name, ' -')

    cars = df[ df['title'].str.contains(name) ]

    print('count:', len(cars))
    print('price min    :', cars['price'].min())
    print('price average:', cars['price'].mean())
    print('price max    :', cars['price'].max()) 
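If you don't want to list the makes by hand, groupby can compute the same numbers for every make in one pass. The sketch below assumes the make is the first word of the title, which will not hold for every listing (for example, "Mercedes-Benz" and "Mercedes" would end up in separate groups); the sample rows and the brand column name are made up.

```python
import pandas as pd

# Made-up sample rows standing in for the scraped data.
data = [
    ["BMW 320", 4500],
    ["BMW 520", 3900],
    ["Audi A4", 4900],
    ["Opel Vectra", 3300],
]
df = pd.DataFrame(data, columns=["title", "price"])

# Assumption: the make is the first whitespace-separated word of the title.
df["brand"] = df["title"].str.split().str[0]

# One row per make, with the offer count and the price statistics as columns.
stats = df.groupby("brand")["price"].agg(["count", "min", "mean", "max"])
print(stats)
```

The for loop with str.contains is more forgiving when the make appears somewhere other than the start of the title, so which one fits better depends on how the titles look.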

You can even plot a histogram of the prices to see which prices are most common.


You can also easily save the data to CSV or Excel:

df.to_csv('carData.csv')
df.to_excel('carData.xlsx')
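Two details worth knowing here: by default to_csv also writes the DataFrame index as an extra first column, and to_excel needs an Excel writer package such as openpyxl installed. A small sketch of saving without the index and reading the file back, using made-up rows and a temporary directory so nothing is left on disk:

```python
import os
import tempfile

import pandas as pd

# Made-up sample rows standing in for the scraped data.
df = pd.DataFrame(
    [["BMW 320", 4500], ["Audi A4", 4900]],
    columns=["title", "price"],
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "carData.csv")
    # index=False leaves out the row-number column.
    df.to_csv(path, index=False)
    restored = pd.read_csv(path)

print(restored)
```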

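Minimal working code based on the previous code

It displays a histogram for each make; you have to close the histogram window to see the data for the next one.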

import requests
import bs4
import pandas as pd
import matplotlib.pyplot as plt

url = 'https://www.otomoto.pl/osobowe/seg-sedan/?search%5Bfilter_float_price%3Afrom%5D=3000&search%5Bfilter_float_price%3Ato%5D=5000&search%5Bfilter_float_engine_capacity%3Afrom%5D=2000&search%5Border%5D=created_at%3Adesc&search%5Bbrand_program_id%5D%5B0%5D=&search%5Bcountry%5D='

response = requests.get(url)
response.raise_for_status()

# check how many pages are there
soup = bs4.BeautifulSoup(response.text, "lxml")
last_page = int(soup.select('.page')[-1].text)

print('last_page:', last_page)

data = []

for page in range(1, last_page+1):

    print(' - page:', page, ' -')

    response = requests.get(url + '&page=' + str(page))
    response.raise_for_status()
    
    soup = bs4.BeautifulSoup(response.text, 'lxml')
    all_offers = soup.select('article.offer-item')

    for offer in all_offers:
        # get the interesting data and write to file

        title = offer.find('a', class_='offer-title__link').text.strip()
        price = offer.find('span', class_='offer-price__number').text.strip().replace(' ', '').replace('\nPLN', '')

        item = [title, int(price)]
        data.append(item)
        print(item)

#  - work with data  -

df = pd.DataFrame(data, columns=['title', 'price'])
df.to_csv('carData.csv')
#df.to_excel('carData.xlsx')

for name in ['BMW', 'Audi', 'Opel', 'Mercedes']:
    print(' -', name, ' -')
    cars = df[ df['title'].str.contains(name) ]
    print('count:', len(cars))
    print('price min    :', cars['price'].min())
    print('price average:', cars['price'].mean())
    print('price max    :', cars['price'].max())        
    
    cars.plot.hist(title=name)
    plt.show()

Result:

 - BMW  -
count: 3
price min    : 4500
price average: 4500.0
price max    : 4500
 - Audi  -
count: 12
price min    : 3900
price average: 4500.0
price max    : 4900
 - Opel  -
count: 12
price min    : 3300
price average: 4049.5
price max    : 4999
 - Mercedes  -
count: 27
price min    : 3000
price average: 4366.555555555556
price max    : 5000
