Cannot read SQL table correctly in Python: varchar columns imported as comma-separated characters/tuples

jar = r'<path to ojdbc8.jar>'
jvm_path = r'<path to jvm.dll>'
args = '-Djava.class.path=%s' % jar
jpype.startJVM(jvm_path, args)
con = jaydebeapi.connect("oracle.jdbc.driver.OracleDriver", url, [user, password], jar)

+---+-----------------+-----------------+-----------------+
|   | (C,O,L,U,M,N,1) | (C,O,L,U,M,N,2) | (C,O,L,U,M,N,3) |
+---+-----------------+-----------------+-----------------+
| 1 | (t,e,s,t)       | (t,e,s,t,2)     | 1               |
+---+-----------------+-----------------+-----------------+
| 2 | (f,o,o)         | (b,a,r)         | 100             |
+---+-----------------+-----------------+-----------------+

1 answer

User

Answer #1 · posted 2024-09-28 19:33:53

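This appears to be an issue when using jaydebeapi together with jpype. When connecting to an Oracle DB, I can reproduce it exactly as you describe (Oracle 11gR2 in my case, but since you are using ojdbc8.jar, I assume it happens with other versions too).

There are different ways to solve this:

Change your connection

Since the error only seems to occur with a specific combination of packages, the most sensible approach is to avoid that combination, and with it the whole error.

Alternative 1: use jaydebeapi without jpype:

As mentioned, I only observe this when using jaydebeapi together with jpype. In my case, however, jpype isn't needed at all. I have the .jar file locally, and my connection works fine without it: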

import os

import jaydebeapi as jdba
import pandas as pd

db_host = 'db.host.com'
db_port = 1521
db_sid = 'YOURSID'

# The driver jar sits in the current working directory
jar = os.path.join(os.getcwd(), 'ojdbc6.jar')

# No explicit jpype.startJVM() call: jaydebeapi starts the JVM itself when needed
conn = jdba.connect('oracle.jdbc.driver.OracleDriver',
                    'jdbc:oracle:thin:@' + db_host + ':' + str(db_port) + ':' + db_sid,
                    {'user': 'USERNAME', 'password': 'PASSWORD'},
                    jar)

df_jay = pd.read_sql('SELECT * FROM YOURSID.table1', conn)

conn.close()

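In my case, this works fine and creates the dataframe.

Alternative 2: use cx_Oracle instead:

If I connect to the Oracle database with cx_Oracle, the problem does not occur either: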

import cx_Oracle
import pandas as pd

db_host = 'db.host.com'
db_port = 1521
db_sid = 'YOURSID'

# Build the connect descriptor from host, port and SID
dsn_tns = cx_Oracle.makedsn(db_host, db_port, db_sid)
cx_conn = cx_Oracle.connect('USERNAME', 'PASSWORD', dsn_tns)

df_cxo = pd.read_sql('SELECT * FROM YOURSID.table1', con=cx_conn)

cx_conn.close()

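Note: for cx_Oracle to work, you must install the Oracle Instant Client and set it up correctly (see e.g. the cx_Oracle documentation for Ubuntu).

Fix the dataframe after the fact

If, for some reason, the connection alternatives above are not an option, you can also convert the dataframe afterwards.

Alternative 3: join the tuple entries:

You can use ''.join() to convert tuples to strings. You need to do this for both the entries and the column names: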

# for all entries that are not None, join the tuples
for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].apply(lambda x: ''.join(x) if x is not None else x)

# also rename the column headings in the same way
df.rename(columns=lambda x: ''.join(x) if x is not None else x, inplace=True)
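A slightly more defensive variant of the loop above, as a sketch: it joins only cells and headings that actually are tuples or lists, so numbers, plain strings and None inside an object column pass through instead of making ''.join() raise. The helper names `_join` and `join_char_tuples` and the sample frame (real Python tuples standing in for whatever the driver actually returns, mirroring the table in the question) are made up for illustration.

```python
import pandas as pd


def _join(value):
    """Join a tuple/list of characters into one string; leave anything else unchanged."""
    if isinstance(value, (tuple, list)):
        return ''.join(str(ch) for ch in value)
    return value


def join_char_tuples(df: pd.DataFrame) -> pd.DataFrame:
    """Return a new DataFrame with character tuples joined in object columns and headings."""
    cleaned = {}
    # items() yields (label, Series) pairs, so tuple labels never need a key lookup
    for label, series in df.items():
        if series.dtype == object:
            series = series.map(_join)
        cleaned[_join(label)] = series
    return pd.DataFrame(cleaned, index=df.index)


# Hypothetical frame mirroring the question's table
raw = pd.DataFrame({
    'a': pd.Series([tuple('test'), tuple('foo')], dtype=object),
    'b': pd.Series([tuple('test2'), tuple('bar')], dtype=object),
    'c': pd.Series([1, 100]),
})
# tupleize_cols=False keeps the tuple headings as plain labels instead of a MultiIndex
raw.columns = pd.Index([tuple('COLUMN1'), tuple('COLUMN2'), tuple('COLUMN3')],
                       tupleize_cols=False)

clean = join_char_tuples(raw)
print(clean)
```

Building a new frame from a dict, rather than assigning back column by column, avoids indexing with tuple keys, which pandas can otherwise interpret as MultiIndex lookups.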

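Alternative 4: change the dtype of the columns:
By changing the dtype of the affected columns from object to string, all entries are converted as well. Note that this can have unwanted side effects, e.g. None values become pd.NA, displayed as <NA>. In addition, you still have to rename the column headings separately, as shown above: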
```
for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].astype('string')

# again, rename headings
df.rename(columns=lambda x: ''.join(x) if x is not None else x, inplace=True)
```
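A small sketch of the side effect mentioned above, using a hypothetical two-element column (the `string` dtype needs pandas 1.0 or later): the None ends up as pd.NA. It also shows a caveat that follows from how the conversion works: values that are not already strings are passed through str(), so this approach relies on the cells being string-like objects. If they really are Python tuples, you get the tuple's repr instead of the joined text, and Alternative 3 is the safer choice.

```python
import pandas as pd

# Hypothetical object column: a proper string and a missing value
col = pd.Series(['test', None], dtype=object)
as_string = col.astype('string')
print(as_string.dtype)     # the pandas "string" extension dtype
print(as_string.tolist())  # the None is now pd.NA, displayed as <NA>

# If the cells are genuine Python tuples, astype('string') stores str(tuple)
as_repr = pd.Series([tuple('test')], dtype=object).astype('string')
print(as_repr[0])
```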

All of these should ultimately produce roughly the same df (apart from the dtypes and the possible replacement of None values):

+---+---------+---------+---------+
|   | COLUMN1 | COLUMN2 | COLUMN3 |
+---+---------+---------+---------+
| 1 | test    | test2   | 1       |
+---+---------+---------+---------+
| 2 | foo     | bar     | 100     |
+---+---------+---------+---------+
