Help averaging tabular data in Python

3 answers

User

#1 · edited 2024-09-29 19:35:18

Here is a functional-style solution:

text = """Joe Sam Bob
1   2   3
2   1   3
NA 2 3
3 5 NA"""

def avg( lst ):
    """ returns the average of a list """
    return 1. * sum(lst)/len(lst)

# split that text
parts = [line.split() for line in text.splitlines()]
#remove the headers
names = parts.pop(0)
# zip(*m) does something like transpose a matrix :-)
columns = zip(*parts)
# convert to numbers and leave out the NA
numbers = [[int(x) for x in column if x != 'NA' ] for column in columns]
# all left is averaging
averages = [avg(col) for col in numbers]
# and printing
for name, x in zip(names, averages):
    print(name, x)

I wrote a lot of list comprehensions here so that you can print the intermediate steps, but of course these could all be generators.
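A rough sketch of what the generator version could look like, assuming Python 3. The reworked `avg` is my own assumption: `len()` does not work on a generator, so it counts as it goes, and it returns `None` for a column with no numbers.

```python
text = """Joe Sam Bob
1   2   3
2   1   3
NA 2 3
3 5 NA"""

def avg(values):
    """Return the average of an iterable, consuming it only once."""
    total = 0.0
    count = 0
    for value in values:
        total += value
        count += 1
    # Assumption: a column with no numeric entries averages to None.
    return total / count if count else None

# Lazily split each line into fields.
rows = (line.split() for line in text.splitlines())
# The first row holds the column names.
names = next(rows)
# zip(*rows) transposes rows into columns (this step does read every row).
columns = zip(*rows)
# One generator of ints per column, skipping the 'NA' markers.
numbers = ((int(x) for x in column if x != 'NA') for column in columns)
averages = {name: avg(col) for name, col in zip(names, numbers)}
print(averages)  # {'Joe': 2.0, 'Sam': 2.5, 'Bob': 3.0}
```

The trade-off is that generators can only be consumed once, so you lose the ability to print the intermediate steps.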

User

#2 · edited 2024-09-29 19:35:18

[Edited for clarity]

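When you read items from a text file, they come in as strings, not numbers. This means that if the text file contains the number 3 and you read it into Python, you need to convert the string to a number before doing any arithmetic with it.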

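Now, you have a text file with columns. Each column has a header and a set of items. Each item is either a number or not. If it is a number, the function float will convert it correctly; if it is not a valid number (that is, if no conversion exists), the conversion raises an exception called ValueError.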
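A minimal sketch of that behaviour, assuming Python 3. The helper name `to_float` is made up for illustration:

```python
def to_float(item):
    """Return item as a float, or None if it is not a valid number."""
    try:
        return float(item)
    except ValueError:
        return None

print(to_float("3") + 1)   # 4.0
print(to_float("2.5"))     # 2.5
print(to_float("NA"))      # None
```

One thing to keep in mind: float also accepts strings such as "nan" and "inf", so if those can appear in your file you may want to filter them separately.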

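So you loop over your lines and items, as several other answers have already explained correctly. If an item can be converted to a float, accumulate the statistic. If not, ignore the entry and move on.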

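If you want to know more about "duck typing" (a paradigm that can be summed up as "it is better to ask forgiveness than permission"), see the Wikipedia article. If you are getting into Python, you will hear this term a lot.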
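To make the "ask forgiveness" idea concrete, here is a small comparison, assuming Python 3. Both function names are hypothetical, and the "look before you leap" check is deliberately naive to show why checking up front is hard to get right:

```python
def is_number_lbyl(item):
    """'Look before you leap': inspect the string before converting."""
    # Naive check: only recognises unsigned integers such as "42".
    return item.isdigit()

def is_number_eafp(item):
    """'Ask forgiveness': just try the conversion and catch the failure."""
    try:
        float(item)
        return True
    except ValueError:
        return False

for item in ["42", "-1.5", "1e3", "NA"]:
    print(item, is_number_lbyl(item), is_number_eafp(item))
# 42 True True
# -1.5 False True
# 1e3 False True
# NA False False
```

The up-front check misses negative numbers, decimals and exponents, while simply trying the conversion accepts exactly what float accepts.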

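Below is a class that accumulates the statistic you are interested in (the mean). You can use one instance of this class for each column of the table.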

class Accumulator(object):
    """
    Used to accumulate the arithmetic mean of a stream of
    numbers. This implementation does not allow removing items
    already accumulated, but it could easily be modified to do
    so. Other statistics could be accumulated as well.
    """
    def __init__(self):
        # upon initialization, the number of items currently
        # accumulated (_n) and the total sum of the items accumulated
        # (_sum) are set to zero because nothing has been accumulated
        # yet.
        self._n = 0
        self._sum = 0.0

    def add(self, item):
        # 'add' is used to add an item to this accumulator
        try:
            # try to convert the item to a float. If you are
            # successful, add the float to the current sum and
            # increase the number of accumulated items
            self._sum += float(item)
            self._n += 1
        except ValueError:
            # if you fail to convert the item to a float, simply
            # ignore the exception (pass on it and do nothing)
            pass

    @property
    def mean(self):
        # the property 'mean' returns the current mean accumulated in
        # the object
        if self._n > 0:
            # if you have more than zero items accumulated, then return
            # their arithmetic average
            return self._sum / self._n
        else:
            # if you have no items accumulated, return None (you could
            # also raise an exception)
            return None

# using the object:

# Create an instance of the class "Accumulator"
my_accumulator = Accumulator()
print(my_accumulator.mean)
# prints None because there are no items accumulated

# add one (a number)
my_accumulator.add(1)
print(my_accumulator.mean)
# prints 1.0

# add two (a string - it will be converted to a float)
my_accumulator.add('2')
print(my_accumulator.mean)
# prints 1.5

# add a 'NA' (will be ignored because it cannot be converted to float)
my_accumulator.add('NA')
print(my_accumulator.mean)
# prints 1.5 (notice that it ignored the 'NA')
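To tie this back to the table, here is one possible way to use one accumulator per column, assuming Python 3. The class is restated in compact form only so the snippet runs on its own, and the sample data is borrowed from the first answer:

```python
class Accumulator:
    """Compact restatement of the class above so this sketch runs on its own."""
    def __init__(self):
        self._n = 0
        self._sum = 0.0

    def add(self, item):
        try:
            self._sum += float(item)
            self._n += 1
        except ValueError:
            pass  # not a number (e.g. 'NA'): ignore it

    @property
    def mean(self):
        return self._sum / self._n if self._n else None

# Sample table borrowed from the first answer.
text = """Joe Sam Bob
1   2   3
2   1   3
NA 2 3
3 5 NA"""

lines = text.splitlines()
headers = lines[0].split()
# One accumulator per column, matched to the headers by position.
accumulators = [Accumulator() for _ in headers]

for line in lines[1:]:
    # zip stops at the shorter sequence, so extra fields are silently dropped.
    for acc, item in zip(accumulators, line.split()):
        acc.add(item)

means = {name: acc.mean for name, acc in zip(headers, accumulators)}
print(means)  # {'Joe': 2.0, 'Sam': 2.5, 'Bob': 3.0}
```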

Cheers.

User

#3 · edited 2024-09-29 19:35:18

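The code below correctly handles a different count for each column and also detects extra data... in other words, it is fairly robust. It could be improved with explicit messages (1) if the file is empty and (2) if the header line is empty. Another possibility would be to test explicitly for "NA" and emit an error message if a field is neither "NA" nor convertible to float.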

>>> import sys, io
>>>
>>> data = """\
... Jim Joe Billy Bob
... 1   2   3     x
... 2   x   x     x  666
...
... 3   4   5     x
... """
>>>
>>> def get_averages(f):
...     headers = f.readline().split()
...     ncols = len(headers)
...     sumx0 = [0] * ncols      # count of numeric values per column
...     sumx1 = [0.0] * ncols    # sum of numeric values per column
...     lino = 1
...     for line in f:
...         lino += 1
...         values = line.split()
...         for colindex, x in enumerate(values):
...             if colindex >= ncols:
...                 print("Extra data %r in row %d, column %d"
...                       % (x, lino, colindex + 1), file=sys.stderr)
...                 continue
...             try:
...                 value = float(x)
...             except ValueError:
...                 continue
...             sumx0[colindex] += 1
...             sumx1[colindex] += value
...     print(headers)
...     print(sumx1)
...     print(sumx0)
...     averages = [
...         total / count if count else None
...         for total, count in zip(sumx1, sumx0)
...         ]
...     print(averages)
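The improvements suggested at the top of this answer could be sketched roughly as follows, assuming Python 3. The helper names `parse_field` and `read_headers` and the wording of the messages are my own assumptions, not part of the code above:

```python
import io

def parse_field(x, lino, colno):
    """Return a float, or None for 'NA'; raise ValueError for anything else."""
    if x == "NA":
        return None
    try:
        return float(x)
    except ValueError:
        raise ValueError("Bad field %r in row %d, column %d" % (x, lino, colno))

def read_headers(f):
    """Read the header row, with explicit messages for the two edge cases."""
    first = f.readline()
    if not first:
        # readline() returns an empty string only at end of file
        raise ValueError("File is empty")
    headers = first.split()
    if not headers:
        raise ValueError("Header line is empty")
    return headers

print(read_headers(io.StringIO("Jim Joe\n1 2\n")))  # ['Jim', 'Joe']
print(parse_field("NA", 2, 1))                      # None
print(parse_field("4", 2, 2))                       # 4.0
```

With this stricter parsing, a stray "x" in a data row would raise an error instead of being skipped silently, so whether you want it depends on how clean your files are.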

Edit: added here:

^{pr2}$

Edit

Normal usage:

with open('myfile.text') as mf:
   hdrs, avgs = get_averages(mf)
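Note that this usage unpacks two return values, while the function as listed above only prints its results. Assuming the version intended here returns the headers and the averages, a self-contained Python 3 sketch could look like this. The names `counts` and `totals` are mine, and `io.StringIO` stands in for a real file:

```python
import io
import sys

def get_averages(f):
    """Variant of the function above that returns (headers, averages)."""
    headers = f.readline().split()
    ncols = len(headers)
    counts = [0] * ncols     # numeric values seen per column
    totals = [0.0] * ncols   # sum of numeric values per column
    # The header is row 1, so the data rows start at 2.
    for lino, line in enumerate(f, start=2):
        for colindex, x in enumerate(line.split()):
            if colindex >= ncols:
                print("Extra data %r in row %d, column %d"
                      % (x, lino, colindex + 1), file=sys.stderr)
                continue
            try:
                value = float(x)
            except ValueError:
                continue  # not a number: skip it
            counts[colindex] += 1
            totals[colindex] += value
    averages = [t / c if c else None for t, c in zip(totals, counts)]
    return headers, averages

data = """\
Jim Joe Billy Bob
1   2   3     x
2   x   x     x  666

3   4   5     x
"""
hdrs, avgs = get_averages(io.StringIO(data))
print(hdrs)  # ['Jim', 'Joe', 'Billy', 'Bob']
print(avgs)  # [2.0, 3.0, 4.0, None]
```

The stray 666 is reported on stderr as extra data in row 3, column 5, and the blank line is skipped because it splits into no fields.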
