Here is the DataFrame:
   CNSSSBDVSN     CNSSSBDVS1                CNMCRGNNM  \
0     5941833      Kluskus 1                  Cariboo
1     5949832        Iskut 6  North Coast / Cote-nord
2     5941016      Cariboo H                  Cariboo
3     5955040  Peace River B     Northeast / Nord-est
4     5941801  Alkali Lake 1                  Cariboo

                         CNSSSBDVS3  instagram_posts  airports  \
0                    Indian Reserve                0         0
1                    Indian Reserve                0         0
2  Regional District Electoral Area                0         0
3  Regional District Electoral Area                1        17
4                    Indian Reserve                0         0

   railway_stations  accommodations  visitor_centers  festivals  \
0                 0               0                0          0
1                 0               0                0          0
2                 0               5                0          0
3                11               0                0          0
4                 0               0                0          0

   ports_and_ferry_terminals  attractions
0                          0            0
1                          0            0
2                          0            0
3                          0            0
4                          0            0
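Here is the code. Before you read it, two notes: 1. I believe something is wrong with the residuals or the index. 2. CNSSSBDVSN can be used as the index if needed.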
# -*- coding: utf-8 -*-
import pandas as pd
import statsmodels.formula.api as sm
import matplotlib.pyplot as plt
import scipy.stats as stats
from tabulate import tabulate

if __name__ == "__main__":
    # Read data
    census_subdivision_without_lower_mainland_and_van_island = pd.read_csv('../data/augmented/census_subdivision_without_lower_mainland_and_van_island.csv')
    # Select data
    cities = census_subdivision_without_lower_mainland_and_van_island[census_subdivision_without_lower_mainland_and_van_island['CNSSSBDVS3'] == 'City']
    non_cities = census_subdivision_without_lower_mainland_and_van_island[census_subdivision_without_lower_mainland_and_van_island['CNSSSBDVS3'] != 'City']
    # Fit
    fit_cities = sm.ols(formula="instagram_posts ~ airports + railway_stations + ports_and_ferry_terminals + accommodations + visitor_centers + festivals + attractions", data=cities).fit()
    fit_non_cities = sm.ols(formula="instagram_posts ~ airports + railway_stations + ports_and_ferry_terminals + accommodations + visitor_centers + festivals + attractions", data=non_cities).fit()
    print(fit_cities.summary())
    print(fit_non_cities.summary())
    # Residual
    cities['residual'] = fit_cities.resid
    non_cities['residual'] = fit_non_cities.resid
Running it gives the following warnings:
/Users/Chu/Documents/dssg/done/linear_model_cities.py:27: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
cities['residual'] = fit_cities.resid
/Users/Chu/Documents/dssg/done/linear_model_cities.py:28: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
non_cities['residual'] = fit_non_cities.resid
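Your problem is that cities is a slice of census_subdivision_without_lower_mainland_and_van_island, so pandas cannot tell whether a column added to it is meant to reach the original DataFrame as well. If you want to work with cities as its own DataFrame from here on, create a copy by calling .copy() on the selection.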
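A minimal sketch of the copy approach, assuming a small made-up table in place of the census CSV. The short name df and the toy values are hypothetical, and np.polyfit stands in for the statsmodels fit so that the snippet needs only pandas and NumPy; in the real script the residual series would be fit_cities.resid.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the census table (values are made up for illustration).
df = pd.DataFrame({
    "CNSSSBDVS3": ["City", "Indian Reserve", "City", "City", "Indian Reserve"],
    "airports": [1, 0, 2, 3, 0],
    "instagram_posts": [12, 0, 19, 33, 1],
})

# .copy() makes `cities` an independent DataFrame, so adding a column to it
# does not raise SettingWithCopyWarning and never touches `df`.
cities = df[df["CNSSSBDVS3"] == "City"].copy()

# Stand-in for fit_cities.resid: residuals of a one-variable least-squares line.
slope, intercept = np.polyfit(cities["airports"], cities["instagram_posts"], 1)
resid = cities["instagram_posts"] - (slope * cities["airports"] + intercept)

# The residual Series keeps the row labels of `cities`, so it aligns row by row.
cities["residual"] = resid
```

With this variant the residuals live only in cities; the original table is left unchanged.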
Alternatively, if you want to modify the original DataFrame, use .loc, as the warning suggests, to insert the results there.
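A sketch of the .loc approach under the same assumptions: the table, the name df, and the mask name is_city are hypothetical, and np.polyfit again stands in for the statsmodels fit (the real script would assign fit_cities.resid instead).

```python
import numpy as np
import pandas as pd

# Toy stand-in for the census table (values are made up for illustration).
df = pd.DataFrame({
    "CNSSSBDVS3": ["City", "Indian Reserve", "City", "City", "Indian Reserve"],
    "airports": [1, 0, 2, 3, 0],
    "instagram_posts": [12, 0, 19, 33, 1],
})

is_city = df["CNSSSBDVS3"] == "City"
cities = df[is_city]  # used only for fitting, never assigned to

# Stand-in for fit_cities.resid: residuals of a one-variable least-squares line.
slope, intercept = np.polyfit(cities["airports"], cities["instagram_posts"], 1)
resid = cities["instagram_posts"] - (slope * cities["airports"] + intercept)

# Write the residuals into the original frame. Rows are matched on index
# labels, and rows outside the mask are left as NaN.
df.loc[is_city, "residual"] = resid
```

Because the assignment aligns on index labels, the default integer index works as it is; setting CNSSSBDVSN as the index is optional.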
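The same applies to non_cities. As an aside, I would use shorter DataFrame names to keep the code readable and within the recommended Python line length.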