How to total all occurrences in a list and print the maximum value in the list

1 answer

User

#1 · posted 2024-09-28 19:28:14

If you want to analyze the data, it is best to keep all of it in a pandas.DataFrame.

First, append [title, int(price)] to an outer list named data:

data = []

for page in range(1, last_page+1):
    # ... code ...
    for car in car_list:
        # ... code ...
        data.append([title, int(price)])

Then convert it to a DataFrame:

df = pd.DataFrame(data, columns=['title', 'price'])

Then you can analyze it:

    cars = df[ df['title'].str.contains("BMW") ]

    print('count:', len(cars))
    print('price min    :', cars['price'].min())
    print('price average:', cars['price'].mean())
    print('price max    :', cars['price'].max())
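
A small sketch of the same filter on made-up rows (the titles and prices here are hypothetical, not scraped data). It assumes you may want case-insensitive, plain-text matching, which `str.contains` supports through `case=False` and `regex=False`. It also shows that an empty selection yields NaN for `max()` rather than raising an error.

```python
import pandas as pd

# Hypothetical sample rows standing in for the scraped data
df = pd.DataFrame(
    [["BMW 320d", 4500], ["bmw E46", 4200], ["Audi A4", 3900]],
    columns=["title", "price"],
)

# case=False also matches "bmw"; regex=False treats the name as plain text
cars = df[df["title"].str.contains("BMW", case=False, regex=False)]
print("count:", len(cars))
print("price max:", cars["price"].max())

# No title contains "Opel", so the selection is empty and max() is NaN
opel = df[df["title"].str.contains("Opel", case=False, regex=False)]
print("count:", len(opel), "price max:", opel["price"].max())
```

Checking `len(cars)` before printing statistics avoids reporting NaN for brands that do not appear on the site.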

You can even run the same analysis for more cars in a for loop:

for name in ['BMW', 'Audi', 'Opel', 'Mercedes']:

    print(' -', name, ' -')

    cars = df[ df['title'].str.contains(name) ]

    print('count:', len(cars))
    print('price min    :', cars['price'].min())
    print('price average:', cars['price'].mean())
    print('price max    :', cars['price'].max())
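
As an alternative to the loop, the same per-brand numbers can probably be computed in one step with `groupby`. This is only a sketch on hypothetical sample rows; the `brand` column and the `pattern` variable are names introduced here for illustration, and it assumes each title contains at most one of the listed brand names.

```python
import pandas as pd

# Hypothetical sample rows standing in for the scraped data
data = [
    ["BMW 320", 4500],
    ["Audi A4", 3900],
    ["Audi A6", 4900],
    ["Opel Vectra", 3300],
]
df = pd.DataFrame(data, columns=["title", "price"])

brands = ["BMW", "Audi", "Opel", "Mercedes"]
pattern = "(" + "|".join(brands) + ")"

# Extract the first matching brand name from each title (NaN if none matches)
df["brand"] = df["title"].str.extract(pattern, expand=False)

# One row per brand that actually occurs, with count, min, mean and max
summary = df.groupby("brand")["price"].agg(["count", "min", "mean", "max"])
print(summary)
```

Brands with no matching offers (Mercedes in this sample) do not appear in `summary`, because `groupby` only produces groups that exist in the data.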

You can even plot a histogram of the prices to see which prices are most common.
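
A minimal sketch of such a histogram, using hypothetical prices. It selects the non-interactive "Agg" backend and saves the plot to a file named `price_hist.png` (a name chosen here for illustration); remove the `matplotlib.use` line and call `plt.show()` instead if you want a window.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, no window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows standing in for the scraped data
df = pd.DataFrame(
    [["BMW 320", 4500], ["Audi A4", 3900], ["Audi A6", 4900],
     ["Opel Vectra", 3300], ["Opel Astra", 4400]],
    columns=["title", "price"],
)

# Split the price range into 5 equal-width bins and count offers per bin
ax = df["price"].plot.hist(bins=5, title="Price distribution")
ax.set_xlabel("price (PLN)")

plt.savefig("price_hist.png")
```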

You can also simply save the data as CSV or Excel:

df.to_csv('carData.csv')
df.to_excel('carData.xlsx')
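
By default `to_csv` also writes the row index as an unnamed first column. If you do not need it, `index=False` leaves it out. The sketch below uses hypothetical rows and writes to an in-memory buffer instead of a file, so that the output can be read straight back for a check.

```python
import io
import pandas as pd

# Hypothetical sample rows standing in for the scraped data
df = pd.DataFrame(
    [["BMW 320", 4500], ["Audi A4", 3900]],
    columns=["title", "price"],
)

buffer = io.StringIO()
df.to_csv(buffer, index=False)  # only the title and price columns are written
print(buffer.getvalue())

# Read the CSV text back to confirm that the data survives the round trip
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```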

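Minimal working code based on the previous code.

It shows a histogram for each brand; you have to close the histogram window to see the data for the next one.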

import requests
import bs4
import pandas as pd
import matplotlib.pyplot as plt

url = 'https://www.otomoto.pl/osobowe/seg-sedan/?search%5Bfilter_float_price%3Afrom%5D=3000&search%5Bfilter_float_price%3Ato%5D=5000&search%5Bfilter_float_engine_capacity%3Afrom%5D=2000&search%5Border%5D=created_at%3Adesc&search%5Bbrand_program_id%5D%5B0%5D=&search%5Bcountry%5D='

response = requests.get(url)
response.raise_for_status()

# check how many pages there are
soup = bs4.BeautifulSoup(response.text, "lxml")
last_page = int(soup.select('.page')[-1].text)

print('last_page:', last_page)

data = []

for page in range(1, last_page+1):

    print(' - page:', page, ' -')

    response = requests.get(url + '&page=' + str(page))
    response.raise_for_status()
    
    soup = bs4.BeautifulSoup(response.text, 'lxml')
    all_offers = soup.select('article.offer-item')

    for offer in all_offers:
        # get the interesting data and collect it in the data list

        title = offer.find('a', class_='offer-title__link').text.strip()
        price = offer.find('span', class_='offer-price__number').text.strip().replace(' ', '').replace('\nPLN', '')

        item = [title, int(price)]
        data.append(item)
        print(item)

#  - work with data  -

df = pd.DataFrame(data, columns=['title', 'price'])
df.to_csv('carData.csv')
#df.to_excel('carData.xlsx')

for name in ['BMW', 'Audi', 'Opel', 'Mercedes']:
    print(' -', name, ' -')
    cars = df[ df['title'].str.contains(name) ]
    print('count:', len(cars))
    print('price min    :', cars['price'].min())
    print('price average:', cars['price'].mean())
    print('price max    :', cars['price'].max())        
    
    cars.plot.hist(title=name)
    plt.show()

Result:

 - BMW  -
count: 3
price min    : 4500
price average: 4500.0
price max    : 4500
 - Audi  -
count: 12
price min    : 3900
price average: 4500.0
price max    : 4900
 - Opel  -
count: 12
price min    : 3300
price average: 4049.5
price max    : 4999
 - Mercedes  -
count: 27
price min    : 3000
price average: 4366.555555555556
price max    : 5000
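
The `int(price)` conversion in the code above relies on the price text containing only digits once the spaces and the trailing `PLN` are removed. If the page uses a different separator (a non-breaking space, for example), `int()` raises a ValueError. One possible, more tolerant approach is sketched below. `parse_price` is a hypothetical helper, not part of the original code, and it assumes prices are whole numbers with no decimal part.

```python
import re

def parse_price(text):
    """Return the digits in text as an int, or None if there are no digits."""
    digits = re.sub(r"\D", "", text)  # drop everything that is not a digit
    return int(digits) if digits else None

print(parse_price("4 500\nPLN"))
print(parse_price("4\xa0999 PLN"))
print(parse_price("ask for price"))
```

A price with a decimal part, such as "4 500,50", would be read incorrectly by this helper, so it only fits listings like the ones shown here.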
