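<p>You can collect the filtered rows and combine them into a single result DataFrame.</p>
<p>To filter the data, iterate over the rows of <code>df2</code> and, for each row, build a mask that selects the rows of <code>df1</code> with the same <code>state</code> whose date lies between <code>specificDate</code> and <code>specificDate</code> + 14 days (both ends inclusive).</p>
<p>I created two small DataFrames, <code>df1</code> and <code>df2</code>, with a few sample values and tested the procedure above:</p>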
<pre><code>import pandas as pd
import datetime

data1 = {
    "state": ["Alabama", "Alabama", "Alabama"],
    "date": ["3/12/20", "3/13/20", "3/14/20"],
    "number": [0, 5, 7]
}
data2 = {
    "state": ["Alabama", "Alaska"],
    "specificDate": ["03.13.2020", "03.11.2020"]
}
df1 = pd.DataFrame(data1)
# Explicit formats avoid any month/day ambiguity when parsing
df1['date'] = pd.to_datetime(df1['date'], format="%m/%d/%y")
df2 = pd.DataFrame(data2)
df2['specificDate'] = pd.to_datetime(df2['specificDate'], format="%m.%d.%Y")

# Collect the matches in a list; DataFrame.append was removed in pandas 2.0
frames = []
for index, row in df2.iterrows():
    begin_date = row["specificDate"]
    end_date = begin_date + datetime.timedelta(days=14)
    # Same state, and date inside [begin_date, end_date] (inclusive)
    mask = (df1['date'] >= begin_date) & (df1['date'] <= end_date) & (df1['state'] == row['state'])
    filtered_data = df1.loc[mask]
    if not filtered_data.empty:
        frames.append(filtered_data)

# pd.concat raises on an empty list, so fall back to an empty frame
final_df = pd.concat(frames, ignore_index=True) if frames else df1.iloc[0:0]
print(final_df)
</code></pre>
<p>Output:</p>
<pre><code>     state       date  number
0  Alabama 2020-03-13       5
1  Alabama 2020-03-14       7
</code></pre>
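<p>For larger inputs, the row-by-row loop can become slow. The following is a minimal sketch of a loop-free alternative, not part of the original approach: it assumes each state appears at most once in <code>df2</code>, joins the two frames on <code>state</code>, and then keeps the rows whose date falls inside the inclusive 14-day window. The name <code>in_window</code> is only illustrative.</p>

```python
import pandas as pd

# Same sample data as above, parsed with explicit date formats.
df1 = pd.DataFrame({
    "state": ["Alabama", "Alabama", "Alabama"],
    "date": pd.to_datetime(["3/12/20", "3/13/20", "3/14/20"], format="%m/%d/%y"),
    "number": [0, 5, 7],
})
df2 = pd.DataFrame({
    "state": ["Alabama", "Alaska"],
    "specificDate": pd.to_datetime(["03.13.2020", "03.11.2020"], format="%m.%d.%Y"),
})

# Attach each state's specificDate to its df1 rows. The inner join drops
# states that are present in only one of the two frames (here: Alaska).
merged = df1.merge(df2, on="state", how="inner")

# Keep rows inside the inclusive window [specificDate, specificDate + 14 days].
in_window = merged["date"].between(
    merged["specificDate"],
    merged["specificDate"] + pd.Timedelta(days=14),
)

# Restore the original df1 columns and a clean 0..n-1 index.
final_df = merged.loc[in_window, df1.columns].reset_index(drop=True)
print(final_df)
```

<p>If a state can occur several times in <code>df2</code>, the join produces one candidate row per (row of <code>df1</code>, row of <code>df2</code>) pair, so a row of <code>df1</code> that falls into overlapping windows would appear more than once, just as it would in the loop version.</p>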
<p><strong>Updated answer</strong>:</p>
<p>To show only the rows of <code>df1</code> that fall exactly on the specific date or on the date 14 days after it, change the <code>mask</code> in the snippet above:</p>
<pre><code>import pandas as pd
import datetime

data1 = {
    "state": ["Alabama", "Alabama", "Alabama", "Alabama", "Alabama"],
    "date": ["3/12/20", "3/13/20", "3/14/20", "3/27/20", "3/28/20"],
    "number": [0, 5, 7, 9, 3]
}
data2 = {
    "state": ["Alabama", "Alaska"],
    "specificDate": ["03.13.2020", "03.11.2020"]
}
df1 = pd.DataFrame(data1)
# Explicit formats avoid any month/day ambiguity when parsing
df1['date'] = pd.to_datetime(df1['date'], format="%m/%d/%y")
df2 = pd.DataFrame(data2)
df2['specificDate'] = pd.to_datetime(df2['specificDate'], format="%m.%d.%Y")

# Collect the matches in a list; DataFrame.append was removed in pandas 2.0
frames = []
for index, row in df2.iterrows():
    first_date = row["specificDate"]
    last_date = first_date + datetime.timedelta(days=14)
    # Same state, and date equal to either endpoint of the window
    mask = ((df1['date'] == first_date) | (df1['date'] == last_date)) & (df1['state'] == row['state'])
    filtered_data = df1.loc[mask]
    if not filtered_data.empty:
        frames.append(filtered_data)

# pd.concat raises on an empty list, so fall back to an empty frame
final_df = pd.concat(frames, ignore_index=True) if frames else df1.iloc[0:0]
print(final_df)
</code></pre>
<p>Output:</p>
<pre><code>     state       date  number
0  Alabama 2020-03-13       5
1  Alabama 2020-03-27       9
</code></pre>
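<p>The exact-date variant can also be written without a loop. The sketch below is one possible approach rather than the method used above: it builds a helper table of the wanted (state, date) pairs and inner-joins <code>df1</code> against it. The names <code>offsets</code> and <code>targets</code> are hypothetical helpers introduced for this illustration.</p>

```python
import pandas as pd

# Same sample data as in the updated answer, parsed with explicit formats.
df1 = pd.DataFrame({
    "state": ["Alabama"] * 5,
    "date": pd.to_datetime(
        ["3/12/20", "3/13/20", "3/14/20", "3/27/20", "3/28/20"], format="%m/%d/%y"
    ),
    "number": [0, 5, 7, 9, 3],
})
df2 = pd.DataFrame({
    "state": ["Alabama", "Alaska"],
    "specificDate": pd.to_datetime(["03.13.2020", "03.11.2020"], format="%m.%d.%Y"),
})

# The two dates of interest, relative to specificDate: day 0 and day 14.
offsets = [pd.Timedelta(days=0), pd.Timedelta(days=14)]

# One (state, date) row per state and offset.
targets = pd.concat(
    [df2.assign(date=df2["specificDate"] + off)[["state", "date"]] for off in offsets],
    ignore_index=True,
)

# Keep only the df1 rows whose (state, date) pair is one of the targets.
final_df = (
    df1.merge(targets, on=["state", "date"], how="inner")
    .sort_values(["state", "date"])
    .reset_index(drop=True)
)
print(final_df)
```

<p>Adding further entries to <code>offsets</code> (for example 7 days) would extend the selection to more dates without changing the rest of the code.</p>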