Python multiprocessing queue throws Windows error: The system cannot find the file specified

import argparse
import multiprocessing
import os

import pandas as pd

# function_which_returns_dataframe and identifier are defined elsewhere (not shown)


def work(cmd, q):
    df_local = function_which_returns_dataframe(cmd)
    if not df_local.empty:
        q.put(df_local)
    else:
        print("Empty:", cmd)


def listener(file, q):
    # single writer process: appends every DataFrame it receives to the file
    while True:
        line = q.get()
        if isinstance(line, pd.DataFrame):
            line.to_csv(file, mode='a', header=False)
        elif line == 'kill':
            return


def main(args):
    cpus = multiprocessing.cpu_count()
    patient_dirs = [os.path.join(args.input_dir, x) for x in os.listdir(args.input_dir)]
    threads = []
    file = os.path.join(args.output_dir, 'concepts_all_%s.csv' % identifier)

    # set up manager with write access to file
    manager = multiprocessing.Manager()
    q = manager.Queue()
    header_df = pd.DataFrame(columns=['patient_id', 'lookup_id', 'begin_inx', 'end_inx',
                                      'mention_type', 'codingScheme', 'code',
                                      'preferredText', 'word_phrase'])
    header_df.loc[len(header_df)] = ['patient_id', 'lookup_id', 'begin_inx', 'end_inx',
                                     'mention_type', 'codingScheme', 'code',
                                     'preferredText', 'word_phrase']
    q.put(header_df)

    # start write process
    writer_process = multiprocessing.Process(target=listener, args=(file, q))
    writer_process.start()

    # now spawn processes from each patient dir
    while threads or patient_dirs:
        if (len(threads) < cpus) and patient_dirs:
            p = multiprocessing.Process(target=work, args=[patient_dirs.pop(), q])
            p.start()
            threads.append(p)
        else:
            # iterate over a copy so that removing finished processes is safe
            for thread in threads[:]:
                if not thread.is_alive():
                    threads.remove(thread)

    # finish write
    q.put('kill')
    writer_process.join()


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('input_dir', type=str)
    parser.add_argument('output_dir', type=str)
    args = parser.parse_args()
    main(args)

1 answer

User

#1 · Posted on 2024-09-29 02:24:31

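Use dask.dataframe instead. Its syntax is similar to pandas, except that it is parallel (essentially it is made up of many pandas DataFrames processed in parallel) and lazy (which helps you avoid RAM limits).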

If the files you want to read are CSV files, do the following:

from dask.distributed import Client
import dask.dataframe as dd

client = Client()  # starts a local cluster so the work runs in multiple processes

ddf = dd.read_csv(r'sub\**\*.csv')  # reads all the CSV files inside the subdirectories under sub

If the files are of a different type that pandas can read, dask can most likely read them too.

For XML it would look something like this:

^{pr2}$

I recommend reading up on this, and passing a meta argument to from_delayed.

If you want to convert the ddf into a pandas.DataFrame, just do the following:

df = ddf.compute()
