Answers to the question: Apply a function to every row in a Pandas DataFrame

Apply a function to every row in a Pandas DataFrame


I'm new to Python and want to recreate this <a href="https://anaconda.org/jbednar/nyc_taxi/notebook" rel="nofollow noreferrer">example</a>. I have longitude and latitude data for New York City taxi pickups and dropoffs, but I need to convert it to Web Mercator (the example above doesn't show how). I found code, from <a href="http://www.neercartography.com/latitudelongitude-tofrom-web-mercator/" rel="nofollow noreferrer">here</a>, that takes a longitude/latitude pair and converts it to Web Mercator (it also includes the inverse, <code>toWGS84</code>):

<pre><code>import math


def toWGS84(xLon, yLat):
    # Check if coordinate out of range for Latitude/Longitude
    if (abs(xLon) < 180) and (abs(yLat) > 90):
        return

    # Check if coordinate out of range for Web Mercator
    # 20037508.3427892 is full extent of Web Mercator
    if (abs(xLon) > 20037508.3427892) or (abs(yLat) > 20037508.3427892):
        return

    semimajorAxis = 6378137.0  # WGS84 spheroid semimajor axis

    latitude = (1.5707963267948966 - (2.0 * math.atan(math.exp((-1.0 * yLat) / semimajorAxis)))) * (180 / math.pi)
    longitude = ((xLon / semimajorAxis) * 57.295779513082323) - ((math.floor((((xLon / semimajorAxis) * 57.295779513082323) + 180.0) / 360.0)) * 360.0)

    return [longitude, latitude]


def toWebMercator(xLon, yLat):
    # Out of range if either coordinate is outside the valid Latitude/Longitude range
    if (abs(xLon) > 180) or (abs(yLat) > 90):
        return

    semimajorAxis = 6378137.0  # WGS84 spheroid semimajor axis
    east = xLon * 0.017453292519943295   # degrees -> radians
    north = yLat * 0.017453292519943295

    northing = 3189068.5 * math.log((1.0 + math.sin(north)) / (1.0 - math.sin(north)))
    easting = semimajorAxis * east

    return [easting, northing]


def main():
    print(toWebMercator(-105.816001, 40.067633))
    print(toWGS84(-11779383.349100526, 4875775.395628653))


if __name__ == '__main__':
    main()
</code></pre>

How can I apply this to every longitude/latitude pair in a pandas DataFrame and save the output in the same DataFrame?
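For reference, here is a minimal sketch of the literal row-by-row approach the question asks about, using <code>DataFrame.apply</code> with <code>axis=1</code>. It assumes a DataFrame with <code>pickup_longitude</code>/<code>pickup_latitude</code> columns (the two sample rows are copied from the taxi data shown in the answer); the output column names <code>pickup_easting</code>/<code>pickup_northing</code> are illustrative. It works, but it calls a Python function once per row, which is what the answer advises against for millions of rows.

```python
import math

import pandas as pd


def toWebMercator(xLon, yLat):
    # Scalar conversion from the question; None if either coordinate is out of range
    if abs(xLon) > 180 or abs(yLat) > 90:
        return None
    semimajorAxis = 6378137.0  # WGS84 spheroid semimajor axis, in metres
    east = xLon * 0.017453292519943295  # degrees -> radians
    north = yLat * 0.017453292519943295
    northing = 3189068.5 * math.log((1.0 + math.sin(north)) / (1.0 - math.sin(north)))
    easting = semimajorAxis * east
    return [easting, northing]


# Two pickup coordinates copied from the answer's df.head() output
df = pd.DataFrame({
    "pickup_longitude": [-73.983360, -73.981720],
    "pickup_latitude": [40.760937, 40.736668],
})

# axis=1 calls the lambda once per row (each row is passed as a Series);
# result_type="expand" turns each returned [easting, northing] list into
# two columns labelled 0 and 1.
projected = df.apply(
    lambda row: toWebMercator(row["pickup_longitude"], row["pickup_latitude"]),
    axis=1,
    result_type="expand",
)
df["pickup_easting"] = projected[0]
df["pickup_northing"] = projected[1]
print(df)
```

Assigning the two expanded columns explicitly by label keeps the result in the same DataFrame, as the question asks, without relying on how pandas aligns a multi-column assignment.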


1 Answer

Anonymous, 1 day ago


With a dataset this large, the most helpful thing is to understand how to do things the <code>pandas</code> way. Iterating over rows performs poorly compared with the built-in vectorized methods.

<pre><code>import pandas as pd
import numpy as np

df = pd.read_csv('/yellow_tripdata_2016-06.csv')
df.head(5)

   VendorID  tpep_pickup_datetime  tpep_dropoff_datetime  passenger_count  trip_distance  pickup_longitude  pickup_latitude
0         2   2016-06-09 21:06:36    2016-06-09 21:13:08                2           0.79        -73.983360        40.760937
1         2   2016-06-09 21:06:36    2016-06-09 21:35:11                1           5.22        -73.981720        40.736668
2         2   2016-06-09 21:06:36    2016-06-09 21:13:10                1           1.26        -73.994316        40.751072
3         2   2016-06-09 21:06:36    2016-06-09 21:36:10                1           7.39        -73.982361        40.773891
4         2   2016-06-09 21:06:36    2016-06-09 21:23:23                1           3.10        -73.987106        40.733173

   RatecodeID  store_and_fwd_flag  dropoff_longitude  dropoff_latitude  payment_type  fare_amount
0           1                   N         -73.977463         40.753979             2          6.0
1           1                   N         -73.981636         40.670242             1         22.0
2           1                   N         -74.004234         40.742168             1          6.5
3           1                   N         -73.929466         40.851540             1         26.0
4           1                   N         -73.985909         40.766445             1         13.5

   extra  mta_tax  tip_amount  tolls_amount  improvement_surcharge  total_amount
0    0.5      0.5        0.00           0.0                    0.3          7.30
1    0.5      0.5        4.00           0.0                    0.3         27.30
2    0.5      0.5        1.56           0.0                    0.3          9.36
3    0.5      0.5        1.00           0.0                    0.3         28.30
4    0.5      0.5        2.96           0.0                    0.3         17.76
</code></pre>

This dataset has 11,135,470 rows. That isn't "big data", but it isn't small either. Rather than writing a function and applying it to every row, you get much better performance by carrying out each step of the function on whole columns. I would turn the <code>toWebMercator</code> function from the question into this:

<pre><code>SEMIMAJORAXIS = 6378137.0  # typed in all caps since this is a static value

# takes all pickup longitude values, multiplies them by pi/180 (degrees -> radians),
# then saves the result as a new column named pickup_east
df['pickup_east'] = df['pickup_longitude'] * 0.017453292519943295
df['pickup_north'] = df['pickup_latitude'] * 0.017453292519943295

# numpy functions allow you to calculate an entire column's worth of values by simply passing in the column
df['pickup_northing'] = 3189068.5 * np.log((1.0 + np.sin(df['pickup_north'])) / (1.0 - np.sin(df['pickup_north'])))
df['pickup_easting'] = SEMIMAJORAXIS * df['pickup_east']
</code></pre>

The <code>pickup_easting</code> and <code>pickup_northing</code> columns then contain the computed values.

On my laptop, this takes:

<pre><code>CPU times: user 1.01 s, sys: 286 ms, total: 1.3 s
Wall time: 763 ms
</code></pre>

That covers all 11 million rows, so it is a matter of seconds rather than 15 minutes.

I left out the value-range checks; you could do something like this instead:

<pre><code>df = df[(df['pickup_longitude'].abs() <= 180) & (df['pickup_latitude'].abs() <= 90)]
</code></pre>

This uses boolean indexing, which is likewise orders of magnitude faster than looping.
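The same column-wise idea can be wrapped in a small helper so it also covers the dropoff columns. This is a minimal sketch under stated assumptions: the helper name <code>add_web_mercator</code> and its <code>prefix</code> parameter are hypothetical, it assumes the <code>{prefix}_longitude</code>/<code>{prefix}_latitude</code> column naming of the taxi CSV, and instead of dropping out-of-range rows it sets their projected values to NaN with <code>Series.where</code>. <code>np.deg2rad</code> stands in for the hard-coded 0.017453292519943295 (which is π/180), and 3189068.5 is written as half the semi-major axis.

```python
import numpy as np
import pandas as pd

SEMIMAJORAXIS = 6378137.0  # WGS84 semi-major axis, in metres


def add_web_mercator(df, prefix):
    """Add {prefix}_easting / {prefix}_northing columns, computed column-wise.

    Hypothetical helper: assumes df has {prefix}_longitude and
    {prefix}_latitude columns in degrees. Out-of-range rows get NaN.
    """
    lon = df[f"{prefix}_longitude"]
    lat = df[f"{prefix}_latitude"]
    # |lat| == 90 would make (1 - sin) zero, so require strictly less than 90
    valid = (lon.abs() <= 180) & (lat.abs() < 90)
    east = np.deg2rad(lon.where(valid))  # NaN where the row is not valid
    north = np.deg2rad(lat.where(valid))
    df[f"{prefix}_easting"] = SEMIMAJORAXIS * east
    # 3189068.5 in the original code equals SEMIMAJORAXIS / 2
    df[f"{prefix}_northing"] = (SEMIMAJORAXIS / 2) * np.log(
        (1.0 + np.sin(north)) / (1.0 - np.sin(north))
    )
    return df


# Pickup rows 0-1 and all dropoff rows are copied from the df.head() output;
# pickup rows 2-3 are made-up edge cases: (0, 0) and an out-of-range longitude.
df = pd.DataFrame({
    "pickup_longitude": [-73.983360, -73.981720, 0.0, 200.0],
    "pickup_latitude": [40.760937, 40.736668, 0.0, 40.0],
    "dropoff_longitude": [-73.977463, -73.981636, -74.004234, -73.929466],
    "dropoff_latitude": [40.753979, 40.670242, 40.742168, 40.851540],
})
for prefix in ("pickup", "dropoff"):
    add_web_mercator(df, prefix)
print(df[["pickup_easting", "pickup_northing", "dropoff_easting", "dropoff_northing"]])
```

Unlike the <code>df = df[...]</code> filter, this keeps every row, so a trip with a bad pickup coordinate still keeps its projected dropoff; which behaviour is preferable depends on the later analysis.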
