Debugging a hang in mysql-python

def create_AMatrix():
    """Create the adjacency table of the retweet network from rt_table to create an adjacency matrix"""
    con = mdb.connect(host="localhost", user="root", passwd="", db="twitter")
    cur = con.cursor(mdb.cursors.DictCursor)

    #get vertex set of users in retweet network
    cur.execute("select user_id from users")
    rows = cur.fetchall()
    vSet = list()
    for uID in rows:
        vSet.append(uID)

    #populate adjacency table
    cur.execute("select * from rt_table")
    rows = cur.fetchall()
    for row in rows:
        sourceUserID = row["source_user_id"]
        sourceUserName = row["source_user_name"]
        rtUserID = row["rt_user_id"]
        rtUserName = row["rt_user_name"]
        try:
            curRow = vSet.index(sourceUserID)
            curCol = vSet.index(rtUserID)
        except ValueError:
            continue
        cur.execute("select COUNT(*) from adjacency where r = %s and c = %s", (curRow, curCol))
        if cur.fetchone()['COUNT(*)'] == 0:
            try:
                cur.execute("insert into adjacency (r, c, val, source_user_id, source_user_name, rt_user_id, rt_user_name) values (%d, %d, %d, %d, %s, %d, %s"), (curRow, curCol, 1, sourceUserID, sourceUserName, rtUserID, rtUserName)
                con.commit()
            except:
                con.rollback()
        else:
            try:
                cur.execute("update adjacency set val = val+1 where r = %d and c = %d"), (curRow, curCol)
                con.commit()
            except:
                con.rollback()
    cur.close()
    con.close()

1 Answer

User

#1 · Posted on 2024-09-30 05:30:14

One potential problem I see is this snippet:

try:
    curRow = vSet.index(sourceUserID)
    curCol = vSet.index(rtUserID)
except ValueError:
    continue

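list.index() searches the list in O(N) time. You also call it O(N) times, so the overall cost is O(N^2). With N=250000, that is on the order of tens of billions of comparisons in the worst case, which is a serious inefficiency. I didn't see any obvious bugs in your code, so my guess is that it doesn't return because it needs hours to finish, and you haven't waited that long.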

You could try replacing vSet with a dict. From the code, the only use of vSet seems to be looking up the index of various user IDs, so try replacing this:

vSet = list()
for uID in rows:
    vSet.append(uID)

with this:

vSet = dict()
for index, row in enumerate(rows):
    vSet[row['user_id']] = index

Looking something up in a dict is an O(1) operation on average, so this should bring the total run time down to O(N).
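To make the difference concrete, here is a minimal sketch of the dict-based index. The helper name `build_index` and the sample IDs are made up for illustration, and it assumes user IDs are unique (with duplicates, the dict keeps the last position while list.index() returns the first).

```python
def build_index(user_ids):
    """Map each user ID to its position so a lookup does not scan a list."""
    return {user_id: position for position, user_id in enumerate(user_ids)}


# Made-up IDs standing in for the values of the user_id column.
user_ids = [1001, 1002, 1003, 1004]
index_of = build_index(user_ids)

# Both approaches agree on the position; only the lookup cost differs.
same_position = index_of[1003] == user_ids.index(1003)
```

The dict is built once in O(N), and every later lookup is a hash lookup instead of a linear scan.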

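Also note that I did not put uID into the lookup dict (that would store a whole row). I stored the actual user_id value instead, because later you look up user IDs, not rows. I haven't run your code to test it, but I suspect that if it ran to completion you would find no output rows: an int never equals a DB cursor row, so the code that sets curRow and curCol would never succeed.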
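The row-versus-value mismatch can be reproduced without a database. This is a sketch that assumes a DictCursor yields dict-like rows such as {"user_id": 42}; the helper `find_position` and the sample rows are hypothetical.

```python
def find_position(values, target):
    """Return the list position of target, or None if it is not present."""
    try:
        return values.index(target)
    except ValueError:
        return None


# Stand-in for what cur.fetchall() is assumed to return with a DictCursor.
rows = [{"user_id": 42}, {"user_id": 43}]

# Storing whole rows: the int 42 is never equal to the dict {"user_id": 42}.
missing = find_position(list(rows), 42)

# Storing the extracted values: the lookup succeeds.
found = find_position([row["user_id"] for row in rows], 42)
```

With whole rows in the list, every lookup falls into the ValueError branch, so the loop would skip every retweet.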

Of course, you will then need to change the curRow and curCol snippet to:

try:
    curRow = vSet[sourceUserID]
    curCol = vSet[rtUserID]
except KeyError:
    continue
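One detail on the exception type: a missing key in a dict raises KeyError, while IndexError is raised for an out-of-range position in a sequence. A small sketch of the lookup as a function, where the name `lookup_pair` is hypothetical:

```python
def lookup_pair(index_of, source_user_id, rt_user_id):
    """Return (row, col) for a retweet, or None if either user is unknown."""
    try:
        return index_of[source_user_id], index_of[rt_user_id]
    except KeyError:
        # One of the IDs is not in the users table; skip this retweet.
        return None


index_of = {1001: 0, 1002: 1}
known = lookup_pair(index_of, 1001, 1002)
unknown = lookup_pair(index_of, 1001, 9999)
```

Inside the original loop, a None result would correspond to the `continue` branch.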

Try making these changes and see whether the code works better.

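Also, the suggestion to sprinkle print statements through the code is a good one. I usually try that before reaching for a debugger, and most of the time it tells me enough about what the code is doing that I don't need to bring out the big guns. If you do need a Python debugger, though, search for pdb and read up on how to use it. You can run it from the command line or integrate it into whatever IDE you use, depending on how you like to work.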
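For a long-running loop like this one, a periodic progress print is often enough to tell "hung" apart from "slow". The sketch below uses a hypothetical function `process_rows` and an arbitrary reporting interval, and is written for Python 3.

```python
def process_rows(rows, report_every=1000):
    """Walk the rows and print a progress line every report_every rows."""
    processed = 0
    for row in rows:
        # ... the real per-row work would go here ...
        processed += 1
        if processed % report_every == 0:
            # flush=True makes the line appear immediately, even when piped.
            print(f"processed {processed} rows", flush=True)
    return processed


total = process_rows(range(2500), report_every=1000)

# If prints are not enough, pdb from the standard library can be started with
#   python -m pdb your_script.py
# or by placing `import pdb; pdb.set_trace()` at the line of interest.
```

If the progress lines keep appearing, the code is slow rather than stuck, which points back to the lookup cost rather than the database connection.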
