<p>如源代码所示,R的<a href="https://github.com/ggrothendieck/sqldf/blob/master/R/sqldf.R" rel="nofollow noreferrer">sqldf</a>和Python的<a href="https://github.com/yhat/pandasql/blob/master/pandasql/sqldf.py" rel="nofollow noreferrer">pandasql</a>实际上都运行在内存中的SQLite实例中(R的默认情况下是SQLite)。因此,实际上,您可以通过与<a href="http://docs.sqlalchemy.org/en/latest/dialects/sqlite.html" rel="nofollow noreferrer">SQLAlchemy</a>接口,用pandas的SQL方法(<a href="http://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.read_sql.html" rel="nofollow noreferrer">read_sql</a>和<a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html" rel="nofollow noreferrer">to_sql</a>)复制功能,而pandasql实际上是在幕后操作的!具体来说,请考虑以下示例:</p>
<ol>
<li><p>将所有需要的环境数据帧导入内存中的SQLite数据库:</p>
<pre><code>import numpy as np
import pandas as pd
from sqlalchemy import create_engine
# IN-MEMORY DATABASE (NO PATH SPECIFIED)
engine = create_engine('sqlite://')
dates = pd.date_range('2018-01-01', '2018-06-22', freq='D')
df1 = pd.DataFrame({'current_date': np.random.choice(dates, 50),
'analysis_tool': 'pandas',
'num_value': np.random.randint(100, size=50)*1000
}, columns=['current_date', 'analysis_tool', 'num_value'])
df2 = pd.DataFrame({'current_date': np.random.choice(dates, 50),
'analysis_tool': 'r',
'num_value': np.random.randint(100, size=50)*1000
}, columns=['current_date', 'analysis_tool', 'num_value'])
df3 = pd.DataFrame({'current_date': np.random.choice(dates, 50),
'analysis_tool': 'sas',
'num_value': np.random.randint(100, size=50)*1000
}, columns=['current_date', 'analysis_tool', 'num_value'])
df1.to_sql("df1", con=engine, if_exists='replace')
df2.to_sql("df2", con=engine, if_exists='replace')
df3.to_sql("df3", con=engine, if_exists='replace')
</code></pre></li>
<li><p>运行所需的SQL查询以更新和操作数据:</p>
<pre><code># QUERIES RUN IN A TRANSACTION
with engine.begin() as cn:
cn.execute("UPDATE df1 SET analysis_tool = 'python pandas'")
cn.execute("INSERT INTO df3 (analytic_tool, current_date, num_value) VALUES (?, ?, ?)",
('sas', '2018-06-23', 51000))
</code></pre></li>
<li><p>导入为熊猫数据帧:</p>
<pre><code>strSQL = """SELECT * FROM df1
UNION ALL
SELECT * FROM df2
UNION ALL
SELECT * FROM df3;"""
df_all = pd.read_sql(strSQL, engine)
engine.dispose() # IN-MEMORY DATABASE DESTROYED
</code></pre></li>
</ol>