Answers to the question: Stratified split by a column (object dtype)


1 answer

Anonymous, 1 day ago

<h2>Method 1</h2>
<pre><code>import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    'Country': ['AB', 'CD', 'EF', 'FG'] * 20,
    'ColumnA': [1] * 20 * 4,
    'ColumnB': [10] * 20 * 4,
    'Label': [1, 0, 1, 0] * 20,
})

# Encode the country strings as integer codes in a new column
df['Country_Code'] = df['Country'].astype('category').cat.codes

# Features: every column except the target and the raw string column
X = df.loc[:, df.columns.drop(['Label', 'Country'])]
y = df['Label']

# Stratify on the country code so each country keeps its share in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=df.Country_Code)

lm = LinearRegression()
lm.fit(X_train, y_train)
lm_predictions = lm.predict(X_test)
</code></pre>
<ul>
<li>Convert the string values in <code>Country</code> to numeric codes and store them in a new column.</li>
<li>When building <code>X</code>, drop the <code>Label</code> column (<code>y</code>) and the string <code>Country</code> column.</li>
</ul>
<h2>Method 2</h2>
<p>If the test data you want to predict on arrives later, you need a way to convert its <code>Country</code> values to codes before predicting. In that case the recommended approach is <code>LabelEncoder</code>: use <code>fit</code> (or <code>fit_transform</code>) to learn the string-to-label mapping on the training data, then use <code>transform</code> to encode the countries in the test data with the same mapping.</p>
<pre><code>import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import preprocessing

df = pd.DataFrame({
    'Country': ['AB', 'CD', 'EF', 'FG'] * 20,
    'ColumnA': [1] * 20 * 4,
    'ColumnB': [10] * 20 * 4,
    'Label': [1, 0, 1, 0] * 20,
})

# Train-Validation
le = preprocessing.LabelEncoder()
df['Country_Code'] = le.fit_transform(df['Country'])

X = df.loc[:, df.columns.drop(['Label', 'Country'])]
y = df['Label']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=df.Country_Code)

lm = LinearRegression()
lm.fit(X_train, y_train)

# Test: reuse the fitted encoder so 'AB' maps to the same code as in training
test_df = pd.DataFrame({'Country': ['AB'], 'ColumnA': [1], 'ColumnB': [10]})
test_df['Country_Code'] = le.transform(test_df['Country'])
print(lm.predict(test_df.loc[:, test_df.columns.drop(['Country'])]))
</code></pre>
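<p>As a quick check of what <code>stratify</code> does, the minimal sketch below splits the same toy DataFrame and compares each country's share in the two splits. It assumes, for illustration only, that the string column is passed to <code>stratify</code> directly and that the whole DataFrame is split at once. The names <code>train_df</code>, <code>test_df</code>, <code>train_share</code> and <code>test_share</code> are made up for this example and are not part of the answer.</p>

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Same toy data as in the answer: four countries, 20 rows each
df = pd.DataFrame({
    'Country': ['AB', 'CD', 'EF', 'FG'] * 20,
    'ColumnA': [1] * 80,
    'ColumnB': [10] * 80,
    'Label': [1, 0, 1, 0] * 20,
})

# Assumption for this sketch: stratify on the raw string column and
# split the whole DataFrame, only to inspect the resulting proportions
train_df, test_df = train_test_split(
    df, test_size=0.3, random_state=0, stratify=df['Country']
)

# Share of each country in the training and test splits
train_share = train_df['Country'].value_counts(normalize=True).sort_index()
test_share = test_df['Country'].value_counts(normalize=True).sort_index()
print(pd.DataFrame({'train': train_share, 'test': test_share}))
```

<p>With four equally frequent countries, both splits should show the same share for every country. The numeric encoding from the answer is still needed for the model's feature matrix, since <code>LinearRegression</code> cannot fit on string columns.</p>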
