Size and order change after converting a pandas dataframe to an xarray Dataset

Posted 2024-10-03 17:21:13

Male | A programmer who enjoys writing Python code.

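I am trying to export a dataframe to a netCDF file. As far as I know, I can do this with the xarray.Dataset.to_netcdf function, so I first have to convert the dataframe to an xarray Dataset. Here is what I am doing: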

import os
import pandas as pd

ypredicted_df = pd.DataFrame(ypredicted, index=ytest.index, columns=ytest.columns.values)
ypredicted_ds = ypredicted_df.to_xarray()  # convert to an xarray Dataset
ypredicted_ds.to_netcdf(os.path.join(output_path, 'ypredicted_wholescene_highres_' + str(max_features) + '.nc'))

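ypredicted holds the predicted values. When I print ypredicted_df and ypredicted_ds.to_dataframe() to check whether anything changed, I see that the size and part of the order have changed: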

print(ypredicted_df)

                       ST_B10
lat       lon
50.684918 13.282882 -0.213598
          13.283247  0.521064
          13.283613  0.162646
          13.283978  0.090892
          13.284343 -0.060037
...                       ...
51.397346 13.671611  4.871557
          13.671977  4.168761
          13.672342  1.421363
          13.672708  1.761741
          13.673073  2.938208

[5979909 rows x 1 columns]



print(ypredicted_ds.to_dataframe())

                       ST_B10
lat       lon
50.684918 13.282882 -0.213598
          13.283247  0.521064
          13.283613  0.162646
          13.283978  0.090892
          13.284343 -0.060037
...                       ...
51.397346 12.281465  3.387909
          12.281099  3.021199
          12.280734  2.889664
          12.280369  3.197318
          12.280003  2.702418

[7441114 rows x 1 columns]

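The two dataframes are not the same size.
The order of the last rows differs (lon is descending there, whereas it is ascending in the first rows).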

I have already checked whether any NaNs were included, but when I dropped the NaNs the size did not change.

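Can anyone explain what is happening here? Why does the dataframe differ after converting from a pandas dataframe to an xarray Dataset? Is there another way to do this so that the dataframe stays unchanged? Or can I export the dataframe to netCDF directly?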

Thanks for your help :)
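
Note on the size difference: the following is a minimal sketch on made-up coordinates, assuming the (lat, lon) pairs in the dataframe do not cover every combination of the unique lat and lon values. to_xarray() turns each index level into a dimension, so the dataset stores one cell per lat/lon combination and fills the combinations that are missing from the dataframe with NaN. The toy values below are hypothetical and only illustrate the row counts (xarray must be installed for to_xarray() to work):

```python
import pandas as pd

# Toy data (hypothetical): 3 unique lats and 3 unique lons,
# but only 5 of the 9 possible (lat, lon) pairs are present.
index = pd.MultiIndex.from_tuples(
    [(50.0, 13.0), (50.0, 13.1), (50.1, 13.1), (50.1, 13.2), (50.2, 13.0)],
    names=["lat", "lon"],
)
df = pd.DataFrame({"ST_B10": [0.1, 0.2, 0.3, 0.4, 0.5]}, index=index)

ds = df.to_xarray()            # dims: lat (3) x lon (3)
roundtrip = ds.to_dataframe()  # one row per grid cell, NaN where no data existed

n_lat = df.index.get_level_values("lat").nunique()
n_lon = df.index.get_level_values("lon").nunique()
n_missing = int(roundtrip["ST_B10"].isna().sum())

print(len(df), len(roundtrip), n_lat * n_lon, n_missing)
```

If the same thing happens with the real data, 7441114 would be the number of unique lat values times the number of unique lon values, and the extra rows would all be NaN.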

Update:

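I tried dropping the NaNs again, and now the sizes are the same, but the order is still wrong. I do not know whether this has any effect when I plot the data.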

print(ypredicted_ds.to_dataframe().dropna(how='any'))

                       ST_B10
lat       lon
50.684918 13.282882 -0.213598
          13.283247  0.521064
          13.283613  0.162646
          13.283978  0.090892
          13.284343 -0.060037
...                       ...
51.397346 12.281465  3.387909
          12.281099  3.021199
          12.280734  2.889664
          12.280369  3.197318
          12.280003  2.702418

[5979909 rows x 1 columns]
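
To check whether the round trip changed anything besides the row order, one option is to sort both dataframes by their index before comparing them. This is a minimal sketch with hypothetical toy values, deliberately stored in an unsorted order; it is not the original data:

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical), with rows intentionally out of order
index = pd.MultiIndex.from_tuples(
    [(50.1, 13.2), (50.0, 13.0), (50.2, 13.0), (50.0, 13.1), (50.1, 13.1)],
    names=["lat", "lon"],
)
df = pd.DataFrame({"ST_B10": [0.4, 0.1, 0.5, 0.2, 0.3]}, index=index)

# Round trip through xarray, then drop the NaN-filled grid cells
roundtrip = df.to_xarray().to_dataframe().dropna(how="any")

# Compare after sorting, so that row order no longer matters
a = df.sort_index()
b = roundtrip.sort_index()
same_index = list(a.index) == list(b.index)
same_values = np.allclose(a["ST_B10"].to_numpy(), b["ST_B10"].to_numpy())

print(same_index, same_values)
```

If both checks pass on the real data, the values are attached to the same coordinates as before and only the row order of the printed dataframe differs.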

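However, for plotting I still need a Dataset, because I have not found a way to plot the dataframe. So I still need to remove the NaNs from the dataset. I found xarray.Dataset.dropna, but I have not got it to work yet: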

The first thing I tried was:

ypredicted_ds.dropna(how='any')

Error message:

Traceback (most recent call last):
  File "script_randomforest_dem.py", line 174, in <module>
    output_path_identifier, 3)
  File "/lustre/scratch2/ws/1/stwa779b-master/04_workspace/randomforest/randomforest.py", line 104, in randomforest
    print(dif_ds.dropna(how='any'))
TypeError: dropna() missing 1 required positional argument: 'dim'

Then I tried:

ypredicted_ds.dropna('lon', how='any').to_dataframe()

Output:

Empty DataFrame
Columns: [ST_B10]
Index: []
(5979909, 1)

ypredicted_ds.dropna('lat', how='any').to_dataframe()

Output:

Empty DataFrame
Columns: [ST_B10]
Index: []
(5979909, 1)

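Neither of them worked. When dropping NaNs along lon or lat, I can imagine that at least one NaN occurs in every lon or lat, so the dataset ends up empty. Does anyone know how to use xarray's ds.dropna() here?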
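
The guess above can be reproduced on a small grid. Dataset.dropna removes whole labels along the given dimension, so with how='any' a single NaN is enough to remove an entire lon column, while how='all' removes only labels that contain no data at all. A minimal sketch with hypothetical toy values; individual NaN cells cannot be removed this way without giving up the 2D lat/lon shape:

```python
import numpy as np
import xarray as xr

# Toy grid (hypothetical): every lon column contains at least one NaN,
# and the last lat row contains no data at all.
data = np.array([
    [1.0, np.nan, 3.0],
    [np.nan, 5.0, 6.0],
    [7.0, 8.0, np.nan],
    [np.nan, np.nan, np.nan],
])
ds = xr.Dataset(
    {"ST_B10": (("lat", "lon"), data)},
    coords={"lat": [50.0, 50.1, 50.2, 50.3], "lon": [13.0, 13.1, 13.2]},
)

# how="any": drop every lon label that has at least one NaN -> nothing is left
any_dropped = ds.dropna("lon", how="any")

# how="all": drop only the lat labels whose values are all NaN
all_dropped = ds.dropna("lat", how="all")

print(dict(any_dropped.sizes))
print(dict(all_dropped.sizes))
```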

How do I plot it? For completeness, this is how I plot the dataset:

import matplotlib.pyplot as plt

cmap = 'viridis'  # assumed placeholder; cmap is not defined in the original snippet
fig, ax = plt.subplots(figsize=(10, 7))
ax.imshow(ypredicted_ds['ST_B10'], cmap=cmap)
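
For plotting, the NaN cells may not need to be removed at all, because imshow masks non-finite values and leaves those pixels blank. The sketch below assumes a regularly spaced grid; the toy values and the 'viridis' colormap are placeholders. Sorting the coordinates and passing origin='lower' together with extent puts the smallest lat at the bottom and labels the axes in degrees:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# Toy grid (hypothetical) with two missing cells
data = np.array([[1.0, np.nan, 3.0], [np.nan, 5.0, 6.0]])
ds = xr.Dataset(
    {"ST_B10": (("lat", "lon"), data)},
    coords={"lat": [50.0, 50.1], "lon": [13.0, 13.1, 13.2]},
)

# Make sure both coordinates are ascending before handing the array to imshow
grid = ds["ST_B10"].sortby("lat").sortby("lon")

# Approximate extent from the coordinate ranges (cell centers, not cell edges)
extent = [float(grid.lon.min()), float(grid.lon.max()),
          float(grid.lat.min()), float(grid.lat.max())]

fig, ax = plt.subplots(figsize=(10, 7))
image = ax.imshow(grid.values, cmap="viridis", origin="lower", extent=extent)
fig.colorbar(image, ax=ax, label="ST_B10")
ax.set_xlabel("lon")
ax.set_ylabel("lat")
```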

0 answers

There are no answers yet.