Extracting data for a user-specified column from a CSV file

Posted 2024-09-30 22:22:12


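I need help with code that asks the user for a specific column of a large CSV file I have. After typing the column name, the user must also enter an integer, which sets how many of that column's least frequent values to report. For example, if they type hospital_name and 5, the program should show the 5 hospitals with the lowest counts (the column contains at least 50 different hospital names). Here is an example of the input and output: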

Type the column you want: hospital_name
Enter how many lowest results you want: 3

The output might look like this:

                      400 births are tied to Gains Hospital                                                                            
                      347 births are tied to Petri Hospital 
                      200 births are tied to Brit Hospital 

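The whole CSV is a report about births, so you have to count how many times each item appears in a column and report the counts (mostly the lowest ones).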

I have already read my CSV file using "with".

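I'm having trouble writing the loop that ties all of this together. I know the user input itself will be input() and int(input()), but that doesn't connect me back to the CSV file.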


1 answer
User
#1 · Posted 2024-09-30 22:22:12

Code

import csv

column_name = input('Which column: ').upper()
number_lowest = int(input('How many lowest: '))

# Calculate births by specified column name
with open("data.csv", "r") as f:
  reader = csv.DictReader(f, skipinitialspace=True, delimiter=",")
  births_count = {}
  for d in reader:
    # Use column_name as key
    # accumulate births for this key
    if d[column_name] not in births_count:
      births_count[d[column_name]] = 0
    births_count[d[column_name]] += 1 # since each row is a different birth

# Find number_lowest lowest births
lowest_births = {}
for i in range(number_lowest):
  # By looping number_lowest times, 
  # we find this many lowest values
  if len(births_count) > 0:
    # find lowest births
    lowest_val = float("inf")  # larger than any real count

    lowest_name = ""
    for name, value in births_count.items():
      if value < lowest_val:
        lowest_val = value
        lowest_name = name

    # Add to lowest births
    lowest_births[lowest_name] = lowest_val

    # remove from births_count
    # this reduces count of items in dictionary
    del births_count[lowest_name]
  else:
    break  # births_count is empty

# Output results
for name, births in lowest_births.items():
  print(f"{births} births are tied to {name} {column_name.title()}")
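
One caveat with the code above: if the typed column is not in the file, d[column_name] raises a KeyError on the first row. A minimal sketch of checking the input against the header first, assuming a hypothetical helper named resolve_column (not part of the original answer) and using the DictReader's fieldnames attribute:

```python
import csv
import io

def resolve_column(fieldnames, typed_name):
    """Return the header that matches typed_name case-insensitively, or None."""
    wanted = typed_name.strip().upper()
    for field in fieldnames:
        if field.upper() == wanted:
            return field
    return None

# Header row taken from the test file; io.StringIO stands in for open("data.csv")
header = io.StringIO("HOSPITAL_NAME,BIRTH_DAY, BIRTH_YEAR, BIRTH_WEIGHT\n")
reader = csv.DictReader(header, skipinitialspace=True)

print(resolve_column(reader.fieldnames, "birth_year"))  # BIRTH_YEAR
print(resolve_column(reader.fieldnames, "doctor"))      # None
```

When the helper returns None, the program could print reader.fieldnames and ask again instead of crashing.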

Test

The test data is comma-separated CSV with four columns: hospital name, birth day, birth year, and birth weight.

File: data.csv

HOSPITAL_NAME,BIRTH_DAY, BIRTH_YEAR, BIRTH_WEIGHT
Gains,1/14,2015,8.5 lbs
Mayo Clinic,2/11,2018,6.5 lbs
Gains,1/15,2016,8.9 lbs
Stanford Health Care,2/15,2016,7.4 lbs
Mayo Clinic,11/10,2018,7.3 lbs
Gains,1/09,2011,7.5 lbs
John Hopkins,12/23,2012,6.9 lbs
Massachusetts General,9/14,2001,8.3 lbs
Stanford Health Care,8/17,2005,7.6 lbs
Massachusetts General,7/18,2016,8.7 lbs
John Hopkins,3/11,2017,7.2 lbs
Massachusetts General,4/16,2014,7.4 lbs
Northwestern Memorial,10/12,2012,8.3 lbs
UCLA Medical Center,9/19,2011,8.1 lbs
Petri,11/21,2003,7.5 lbs
UCSF Medical Center,2/15,2004,7.9 lbs

Sample run:

Which column: hospital_name
How many lowest: 5
1 births are tied to Northwestern Memorial Hospital_Name
1 births are tied to UCLA Medical Center Hospital_Name
1 births are tied to Petri Hospital_Name
1 births are tied to UCSF Medical Center Hospital_Name
2 births are tied to Mayo Clinic Hospital_Name
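
For comparison, the counting and lowest-N steps can be sketched more compactly with the standard library's collections.Counter and sorted(). This is only an alternative sketch, not the method used in the answer; the names lowest_counts and SAMPLE are made up for illustration, and SAMPLE is a shortened copy of the test data.

```python
import csv
import io
from collections import Counter

def lowest_counts(csv_file, column_name, number_lowest):
    """Return up to number_lowest (value, count) pairs with the smallest counts."""
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    # Each row is one birth, so counting rows per value gives births per value
    counts = Counter(row[column_name] for row in reader)
    # sorted() is stable, so values with equal counts keep first-seen order
    return sorted(counts.items(), key=lambda item: item[1])[:number_lowest]

# Shortened copy of the test data; io.StringIO stands in for open("data.csv")
SAMPLE = (
    "HOSPITAL_NAME,BIRTH_DAY, BIRTH_YEAR, BIRTH_WEIGHT\n"
    "Gains,1/14,2015,8.5 lbs\n"
    "Mayo Clinic,2/11,2018,6.5 lbs\n"
    "Gains,1/15,2016,8.9 lbs\n"
    "Petri,11/21,2003,7.5 lbs\n"
)

for name, births in lowest_counts(io.StringIO(SAMPLE), "HOSPITAL_NAME", 2):
    print(f"{births} births are tied to {name}")
```

Sorting every distinct value costs more than a single minimum search, but for a few dozen hospital names the difference should not matter.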

Update: finding the maximums using insertion sort

import csv

# Source: https://www.geeksforgeeks.org/python-program-for-insertion-sort/
def insertionSort(arr):
  """Sort arr in place, in ascending order."""
  # Traverse through 1 to len(arr)
  for i in range(1, len(arr)):
    key = arr[i]
    # Move elements of arr[0..i-1], that are
    # greater than key, to one position ahead
    # of their current position
    j = i - 1
    while j >= 0 and key < arr[j]:
      arr[j + 1] = arr[j]
      j -= 1
    arr[j + 1] = key

def find_maxs_by_sort(data, number):
  """Return the `number` entries of data with the largest values,
     as a dict ordered from largest to smallest.
  """

  # Get list of key, value pairs as tuples of (value, key)
  tuple_list = []
  for k, v in data.items():
    tuple_list.append((v, k))

  # insertionSort sorts in place, in ascending order.
  # Tuples compare element by element, so the list is ordered
  # by v first, with ties broken by k.
  insertionSort(tuple_list)

  # Place sorted tuples back as a dictionary
  # tuples are sorted by [(v1, k1), (v2, k2), ...]
  # We start at the end and work backwards since sort is
  # in ascending order
  n = len(tuple_list)
  results = {}
  # Stop at index 0 even if number exceeds the number of entries
  for i in range(n - 1, max(n - number, 0) - 1, -1):
    v, k = tuple_list[i]
    results[k] = v

  return results

for i in range(3):
  # To do this 3 times
  column_name = input('Which column: ').upper()
  number = int(input('How many maxs: '))

  with open("data.csv", "r") as f:
    reader = csv.DictReader(f, skipinitialspace=True, delimiter=",")
    births_count = {}
    for d in reader:
      # Use column_name as key
      # accumulate births for this key
      if d[column_name] not in births_count:
        births_count[d[column_name]] = 0
      births_count[d[column_name]] += 1 # since each row is a different birth

  # find max
  max_births = find_maxs_by_sort(births_count, number)

  # Output results
  for name, births in max_births.items():
    print(f"\t{births} births are tied to {name} {column_name.title()}")
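
If a hand-written sort is not a requirement, the same top-N result can probably be had from collections.Counter.most_common. A minimal sketch, assuming a hypothetical helper top_counts and a short made-up list of hospital names in place of the CSV column:

```python
from collections import Counter

def top_counts(values, number):
    """Return up to `number` (value, count) pairs, largest count first."""
    # most_common orders equal counts by first appearance
    return Counter(values).most_common(number)

# Made-up column values for illustration
hospitals = ["Gains", "Mayo Clinic", "Gains", "Petri", "Gains", "Mayo Clinic"]

for name, births in top_counts(hospitals, 2):
    print(f"\t{births} births are tied to {name}")
```

Note one difference in tie handling: the insertion sort above orders (count, name) tuples, so equal counts come out by name, whereas most_common keeps first-seen order.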
