Embarrassingly parallel database updates using Python (PostGIS/PostgreSQL)

2 answers

User

Answer 1 · edited 2024-06-26 17:40:16

In plain SQL, you can do the following:

UPDATE city ci
SET gid_fkey = co.gid 
FROM country co 
WHERE ST_Within(ci.the_geom, co.the_geom)
AND ci.city_id = _some_parameter_;

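Problems could arise if a city falls within more than one country (which would cause multiple updates to the same target row), but that is probably not the case in your data.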
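If you want to check that caveat before running the update, here is a minimal sketch. It assumes the same city and country tables as the query above; the names AMBIGUOUS_SQL and find_ambiguous_cities are made up for illustration, and the cursor can be any DB-API cursor (for example one from psycopg2).

```python
# Hypothetical helper: list cities that lie within more than one country.
AMBIGUOUS_SQL = """
SELECT ci.city_id, count(*) AS n_countries
FROM city ci
JOIN country co ON ST_Within(ci.the_geom, co.the_geom)
GROUP BY ci.city_id
HAVING count(*) > 1
"""


def find_ambiguous_cities(cursor):
    """Return (city_id, n_countries) rows for cities matched by several countries.

    `cursor` is assumed to be a DB-API cursor connected to the database
    that holds the city and country tables.
    """
    cursor.execute(AMBIGUOUS_SQL)
    return cursor.fetchall()
```

An empty result would suggest that each city matches at most one country, so the UPDATE should touch each target row only once.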

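User

Answer 2 · edited 2024-06-26 17:40:16

OK, this is an answer to my own post. Well done me =D

It gave roughly a 150% speed increase on my system, going from a single-core thread to quad-core multiprocessing.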

import multiprocessing
import psycopg2


class Consumer(multiprocessing.Process):
    """Worker process: runs tasks from the queue until it receives None."""

    def __init__(self, task_queue, result_queue):
        multiprocessing.Process.__init__(self)
        self.task_queue = task_queue
        self.result_queue = result_queue

    def run(self):
        while True:
            next_task = self.task_queue.get()
            if next_task is None:
                # None is the shutdown sentinel: acknowledge it and exit.
                print('Tasks Complete')
                self.task_queue.task_done()
                break
            answer = next_task()
            self.task_queue.task_done()
            self.result_queue.put(answer)


class Task(object):
    """One unit of work: assign a country to a single city."""

    def __init__(self, a):
        self.a = a  # city_id of the city to update

    def __call__(self):
        # Every task opens its own connection; isolation level 0 is autocommit.
        pyConn = psycopg2.connect("dbname='geobase_1' host = 'localhost'")
        pyConn.set_isolation_level(0)
        pyCursor1 = pyConn.cursor()

        # Bind city_id as a query parameter instead of formatting it into the SQL.
        procQuery = ('UPDATE city SET gid_fkey = country.gid FROM country '
                     'WHERE ST_Within((SELECT the_geom FROM city WHERE city_id = %s), '
                     'country.the_geom) AND city_id = %s')
        pyCursor1.execute(procQuery, (self.a, self.a))
        pyConn.close()

        return self.a

    def __str__(self):
        return 'ARC'

if __name__ == '__main__':
    tasks = multiprocessing.JoinableQueue()
    results = multiprocessing.Queue()

    # Start two worker processes per CPU core.
    num_consumers = multiprocessing.cpu_count() * 2
    consumers = [Consumer(tasks, results) for i in range(num_consumers)]
    for w in consumers:
        w.start()

    pyConnX = psycopg2.connect("dbname='geobase_1' host = 'localhost'")
    pyConnX.set_isolation_level(0)
    pyCursorX = pyConnX.cursor()

    # Fetch the id of every city that has no country assigned yet.
    pyCursorX.execute('SELECT city_id FROM city WHERE gid_fkey IS NULL')
    cityIdList = [row[0] for row in pyCursorX.fetchall()]
    num_jobs = len(cityIdList)
    pyConnX.close()

    # Queue one task per city.
    for city_id in cityIdList:
        tasks.put(Task(city_id))

    # Queue one None sentinel per worker so that every worker shuts down.
    for i in range(num_consumers):
        tasks.put(None)

    # Collect one result (the city_id) per finished task.
    while num_jobs:
        result = results.get()
        print(result)
        num_jobs -= 1

Now I have another question, which I have posted here:

Create DB connection and maintain on multiple processes (multiprocessing)

Hopefully we can get rid of some of the overhead and make this baby even faster.
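One possible way to cut that overhead, as a rough sketch rather than a tested solution: open a single connection per worker and reuse it for every task, instead of connecting once per city. The names worker_loop, connect and UPDATE_SQL are hypothetical and not part of the code above; connect is assumed to be a zero-argument callable that returns a DB-API connection, such as a small function wrapping psycopg2.connect.

```python
# Same update as in the answer above, with city_id as a bound parameter.
UPDATE_SQL = (
    "UPDATE city SET gid_fkey = country.gid FROM country "
    "WHERE ST_Within(city.the_geom, country.the_geom) "
    "AND city.city_id = %s"
)


def worker_loop(task_queue, result_queue, connect):
    """Process city ids from task_queue over one long-lived connection.

    `connect` is a hypothetical factory returning a DB-API connection.
    A None item on the queue tells the worker to stop.
    """
    conn = connect()  # opened once per worker, not once per task
    cursor = conn.cursor()
    try:
        while True:
            city_id = task_queue.get()
            if city_id is None:
                task_queue.task_done()
                break
            cursor.execute(UPDATE_SQL, (city_id,))
            conn.commit()  # commit each update so progress is not lost
            task_queue.task_done()
            result_queue.put(city_id)
    finally:
        conn.close()
```

A function like this could be passed as the target of a multiprocessing.Process together with the two queues and a module-level connect function. Whether it helps in practice depends on how much of the runtime goes to connection setup compared with the spatial query itself.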
