Matching corresponding floats in two lists

Posted 2024-10-03 09:18:26


I have two CSV files.
The first looks like this:

('Rubus idaeus', '10.0', '56.0')
('Neckera crispa', '9.8785', '56.803')
('Dicranum polysetum', '9.1919', '56.0456')
('Sphagnum subnitens', '9.1826', '56.6367')
('Taxus baccata', '9.61778', '55.68833')
('Sphagnum papillosum', '9.1879', '56.0442')

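The columns are "species", "longitude" and "latitude". They are observations made in the field.
The other file is also a CSV file. It is a test file that resembles the real data. It looks like this: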

{'y': '58.1', 'x': '22.1', 'temp': '14'}
{'y': '58.2', 'x': '22.2', 'temp': '10'}
{'y': '58.3', 'x': '22.3', 'temp': '1'}
{'y': '58.4', 'x': '22.4', 'temp': '12'}
{'y': '58.5', 'x': '22.5', 'temp': '1'}
{'y': '58.6', 'x': '22.6', 'temp': '6'}
{'y': '58.7', 'x': '22.7', 'temp': '0'}
{'y': '58.8', 'x': '22.8', 'temp': '13'}
{'y': '58.9', 'x': '22.9', 'temp': '7'}

Both files are very long.

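I have the observations, and now I want to find the closest lower number in the file containing the climate data, then append that row to the observation row, so that the output becomes: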

('Dicranum polysetum', '9.1919', '56.0456', 'y': '9.1', 'x': '56.0', 'temp': '7')

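I tried building nested loops by iterating over the CSV files with DictReader, but the nesting gets deep very quickly, and going through everything would take an enormous number of loop iterations.
Does anyone know a method for this?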

My current code is poor, but I have tried several ways of looping, and I think there is something fundamentally wrong with my whole approach.

import csv
fil = csv.DictReader(open("TestData.csv"), delimiter=';')
navn = "nyDK_OVER_50M.csv"
occu = csv.DictReader(open(navn), delimiter='\t')

for row in fil:
    print 'x=',row['x']
    for line in occu:
        print round(float(line['decimalLongitude']),1)
        if round(float(line['decimalLongitude']),1) == row['x']:
            print 'You did it, found one dam match'

Here are links to my two files, so you don't have to make up any data in case you know something that could move me forward.

https://www.dropbox.com/s/lmstnkq8jl71vcc/nyDK_OVER_50M.csv?dl=0
https://www.dropbox.com/s/v22j61vi9b43j78/TestData.csv?dl=0

Best regards, Mathias


2 Answers

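Here is a solution that uses numpy to compute the Euclidean distance from each data item to every x,y point, and joins the item with the x,y data tuple at the smallest distance.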

import numpy
import operator

# read the data into numpy arrays
testdata = numpy.genfromtxt('TestData.csv', delimiter=';', names=True)
nyDK     = numpy.genfromtxt('nyDK_OVER_50M.csv', names=True, delimiter='\t',\
                            dtype=[('species','|S64'),\
                                   ('decimalLongitude','float32'),\
                                   ('decimalLatitude','float32')])

# extract the x,y tuples into a numpy array of [(x, y), ...]
xy        = numpy.array(map(operator.itemgetter('x', 'y'), testdata))
# this is a function which returns a function which computes the distance
# from an arbitrary point to an origin
distance  = lambda origin: lambda point: numpy.linalg.norm(point-origin)

# methods to extract the (lat, lon) from a nyDK entry
latlon    = operator.itemgetter('decimalLatitude', 'decimalLongitude')
getlatlon = lambda item: numpy.array(latlon(item))

# this will transform a single element of the nyDK array into
# a union of it with its closest climate data
def transform(item):
    # compute distance from each x,y point to this item's location
    # and find the position of the minimum
    idx = numpy.argmin( map(distance(getlatlon(item)), xy) )
    # return the union of the item and the closest climate data
    return tuple(list(item)+list(testdata[idx]))

# transform all the entries in the input data set
result = map(transform, nyDK)

print result[0:3]

Output:

[('Rubus idaeus', 10.0, 56.0, 15.0, 51.0, 14.0),
 ('Neckera crispa', 9.8785, 56.803001, 15.300000000000001, 51.299999999999997, 2.0),
 ('Dicranum polysetum', 9.1919003, 56.045601, 14.6, 50.600000000000001, 10.0)]

Note: the distances are not very small, but that may be because the .csv file does not contain a complete grid of x,y points.
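
The same nearest-point search can probably be done without calling a Python function per pair, by letting numpy broadcast the subtraction over all observations and all grid points at once. The following is only a minimal sketch under assumptions: `nearest_rows` is a hypothetical helper name, the sample coordinates are made up rather than read from the linked files, and both inputs are assumed to be in the same (x, y) order.

```python
import numpy as np

def nearest_rows(points, grid):
    """For each row of points (n, 2), return the index of the closest row of grid (m, 2)."""
    points = np.asarray(points, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # diff[i, j] is the vector from grid point j to observation i; shape (n, m, 2)
    diff = points[:, None, :] - grid[None, :, :]
    # squared Euclidean distance, shape (n, m); skipping the sqrt does not change the argmin
    dist2 = (diff ** 2).sum(axis=2)
    return dist2.argmin(axis=1)

# made-up sample values in (x, y) order, not taken from the linked files
grid = [(9.1, 56.0), (9.2, 56.0), (9.9, 56.8)]
obs = [(9.1919, 56.0456), (9.8785, 56.803)]
print(nearest_rows(obs, grid).tolist())  # [1, 2]
```

The intermediate array holds n * m * 2 floats, so for two very long files it may be necessary to process the observations in chunks.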

Since you say there are no missing temperature data points, the problem becomes much easier to solve:

import csv

# temperatures
fil = csv.DictReader(open("TestData.csv"), delimiter=';')
# species
navn = "nyDK_OVER_50M.csv"
occu = csv.DictReader(open(navn), delimiter='\t')

d = {}
for row in fil:
    x = '{:.1f}'.format(float(row['x']))
    y = '{:.1f}'.format(float(row['y']))
    try:
        d[x][y] = row['temp']
    except KeyError:
        d[x] = {y:row['temp']}

for line in occu:
    x = '{:.1f}'.format(round(float(line['decimalLongitude']),1))
    y = '{:.1f}'.format(round(float(line['decimalLatitude']),1))
    temp = d[x][y]
    line['temp'] = temp
    line['x'] = x
    line['y'] = y
    print(line)
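
The code above rounds to the nearest grid value, whereas the question asks for the closest lower number. A variant that rounds down instead might look like the sketch below. It assumes a grid spacing of 0.1 and that `x` is the longitude and `y` the latitude (as the comparison in the question's own code suggests); `floor_to_grid` is a hypothetical helper, and the sample rows are made up rather than read from the linked files. `Decimal` is used so that the coordinate strings are rounded down exactly, without binary floating-point surprises.

```python
from decimal import Decimal, ROUND_FLOOR

STEP = Decimal("0.1")  # assumed grid spacing of the climate file

def floor_to_grid(value):
    """Round a coordinate string down to the nearest multiple of 0.1."""
    return str(Decimal(value).quantize(STEP, rounding=ROUND_FLOOR))

# made-up rows in the shape the question shows; not taken from the linked files
climate_rows = [
    {"x": "9.1", "y": "56.0", "temp": "7"},
    {"x": "9.8", "y": "56.8", "temp": "3"},
]
observations = [
    ("Dicranum polysetum", "9.1919", "56.0456"),
    ("Neckera crispa", "9.8785", "56.803"),
    ("Taxus baccata", "9.61778", "55.68833"),  # no matching cell in the sample grid
]

# one pass over the climate data: (x, y) -> temp
temps = {(floor_to_grid(r["x"]), floor_to_grid(r["y"])): r["temp"]
         for r in climate_rows}

merged = []
for species, lon, lat in observations:
    key = (floor_to_grid(lon), floor_to_grid(lat))
    # dict.get returns None instead of raising KeyError when a cell is missing
    merged.append((species, lon, lat, key[0], key[1], temps.get(key)))

for row in merged:
    print(row)
```

Each file is read only once, so the work grows with the sum of the two file lengths rather than their product, which is what made the nested loops slow.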
