AttributeError: 'generator' object has no attribute 'to_sql' when creating a dataframe with a generator

Posted 2024-06-26 18:01:12


I am trying to create a dataframe from a fixed-width file and load it into a PostgreSQL database. My input file is very large (~16GB, about 20 million records), so building the dataframe in one go consumes most of the available RAM and takes a very long time to finish. I therefore tried the chunksize option (with a Python generator) and committed the records to the table chunk by chunk, but it fails with AttributeError: 'generator' object has no attribute 'to_sql'.

Inspired by this answer: https://stackoverflow.com/a/47257676/2799214

Input file: test_file.txt

XOXOXOXOXOXO9
AOAOAOAOAOAO8
BOBOBOBOBOBO7
COCOCOCOCOCO6
DODODODODODO5
EOEOEOEOEOEO4
FOFOFOFOFOFO3
GOGOGOGOGOGO2
HOHOHOHOHOHO1

sample.py

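A minimal sketch of the kind of script that raises this error, reconstructed from the traceback below and the answers (the name chunck_generator, the column specs, and the connection string are assumptions):

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical reconstruction: wrap the chunked fixed-width reader in a generator
def chunck_generator(filename, header=False, chunk_size=10 ** 5):
    for chunk in pd.read_fwf(filename, colspecs=[[0, 12], [12, 13]], index_col=False,
                             header=None, iterator=True, chunksize=chunk_size):
        yield chunk

if __name__ == "__main__":
    filename = r'test_file.txt'
    engine = create_engine('postgresql://USER:PWD@IP:PORT/DB')
    chunk = chunck_generator(filename)
    # Fails here: chunk is the generator object itself, not a DataFrame
    chunk.to_sql('sample_table', engine, if_exists='replace', schema='sample_schema', index=False)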

Error:

    chunk.to_sql('sample_table', engine, if_exists='replace', schema='sample_schema', index=False)
AttributeError: 'generator' object has no attribute 'to_sql'

My main goal is to improve performance. Please help me resolve this issue or suggest a better approach. Thanks in advance.


3 answers

I would suggest doing it like this:

def _generator(engine, filename, ...):
    for chunk in pd.read_fwf(filename, ...):
        yield chunk.to_sql('sample_table', engine, ...)  # not sure about this since row was not defined

for row in _generator(engine=engine, filename=filename):
    print(row)

chunck_generator returns a 'generator' object, not the actual chunk elements. You need to iterate over the object to get the chunks out of it:

>>> def my_generator(x):
...     for y in range(x):
...         yield y
...
>>> g = my_generator(10)
>>> print(g.__class__)
<class 'generator'>
>>> ele = next(g, None)
>>> print(ele)
0
>>> ele = next(g, None)
>>> print(ele)
1

So, to fix your code, simply loop over the generator:

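A minimal sketch of that loop, assuming the chunck_generator and engine from the question:

# Iterate over the generator; each element is a DataFrame chunk.
# Use if_exists='append' so later chunks do not overwrite earlier ones.
for chunk in chunck_generator(filename):
    chunk.to_sql('sample_table', engine, if_exists='append', schema='sample_schema', index=False)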

But that seems convoluted. I would simply do this:

import pandas.io.sql as psql
import pandas as pd
from sqlalchemy import create_engine

def sql_generator(engine, filename, header=False,chunk_size = 10 ** 5):
    frame = pd.read_fwf(
        filename, 
        colspecs=[[0,12],[12,13]],
        index_col=False,
        header=None, 
        iterator=True, 
        chunksize=chunk_size
    )

    for chunk in frame:
        yield chunk.to_sql(
            'sample_table', 
            engine, 
            if_exists='replace', 
            schema='sample_schema', 
            index=False
        )


if __name__ == "__main__":
    filename = r'test_file.txt'
    engine = create_engine('postgresql://USER:PWD@IP:PORT/DB')
    for sql in sql_generator(engine, filename):
        print(sql)

Conclusion: the to_sql method is not efficient for loading large files, so I used the copy_from method from the psycopg2 package along with the chunksize option when creating the dataframe. It loaded 9.8 million records (~17GB), with 98 columns each, in 30 minutes.

I have removed the original references to my actual file (I am using the sample file from the original post).

import pandas as pd
import psycopg2
import io

def sql_generator(cur, con, filename, boundries, col_names, header=False, chunk_size=2000000):
    # Read the fixed-width file lazily in chunks so the whole file never sits in memory at once
    frame = pd.read_fwf(filename, colspecs=boundries, index_col=False, header=None,
                        iterator=True, chunksize=chunk_size, names=col_names)
    for chunk in frame:
        # Write each chunk to an in-memory, pipe-delimited buffer and bulk-load it via COPY
        output = io.StringIO()
        chunk.to_csv(output, sep='|', quoting=3, escapechar='\\', index=False, header=False, encoding='utf-8')
        output.seek(0)
        cur.copy_from(output, 'sample_schema.sample_table', null="", sep="|")
        yield con.commit()

if __name__ == "__main__":
    boundries = [[0,12],[12,13]]
    col_names = ['col1','col2']
    filename = r'test_file.txt'  #Refer to sample file in the original post
    con = psycopg2.connect(database='database',user='username', password='pwd', host='ip', port='port')
    cur = con.cursor()
    for sql in sql_generator(cur,con, filename, boundries, col_names):
        print(sql)
    con.close()
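For context on the speed difference: to_sql sends each chunk to the database as INSERT statements through SQLAlchemy, whereas copy_from streams the buffer through PostgreSQL's COPY bulk-load path, so every 2-million-row chunk becomes a single COPY operation instead of millions of individual row inserts.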
