How can I optimize the following code? (Is Python suitable for this, or should I use a different tool?)
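This is the craziest question I've asked so far, but I'll give it a try. I'm hoping for some advice on whether I'm using the right tools and methods to process large amounts of data efficiently. I'm not necessarily looking for help optimizing my code, unless I've completely missed something. Essentially, I just want to know whether I should be using a different framework altogether instead of Python. I'm fairly new to Python and not entirely sure whether large amounts of data can be processed and stored in a DB more efficiently.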
The following implementation reads the log files stored inside every .zip archive in the current directory:
Code:
import glob
import json
import zipfile
from datetime import datetime

import pandas as pd

triggerZipFiles = glob.glob('*.zip')
for triggerFiles in triggerZipFiles:
    with zipfile.ZipFile(triggerFiles, 'r') as myzip:
        for logfile in myzip.namelist():
            datacc = []
            zipcc = []
            csvout = '{}_US.csv'.format(logfile[:-4])
            with myzip.open(logfile) as f:
                for line in f:
                    try:
                        # line[:-2] drops the last two characters (presumably a trailing comma and newline)
                        parsed = json.loads(line[:-2])
                        if "CC" in parsed['data']['weatherType'] and "US" in parsed['zipcodes']:
                            datacc.append(parsed['data'])
                            zipcc.append(parsed['zipcodes'])
                    except (ValueError, KeyError, TypeError):
                        # skip malformed lines and records missing the expected keys
                        pass
            if datacc:
                df = pd.concat([pd.DataFrame(zipcc), pd.DataFrame(datacc)], axis=1)
                # Emit one row per US zip code v: v goes in the first column and all other
                # columns are copied from row (assumes the first column is 'US').
                df = pd.concat((pd.Series((v,) + tuple(row[c] for c in df.columns[1:]), df.columns)
                                for _, row in df.iterrows() for v in row['US']), axis=1).T
                df.to_csv(csvout, header=False, index=False)
            print(datetime.now().strftime('%Y/%m/%d %H:%M:%S') + ": Finished: {}".format(logfile))
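The iterrows()/pd.concat step above builds one pd.Series per output row so that each US zip code gets its own row. That step can likely be replaced by pandas' DataFrame.explode, which is available since pandas 0.25. The following is a minimal sketch. It assumes the 'US' column holds a list of zip codes and that every other column should be repeated for each code. The function name one_row_per_us_zip is hypothetical.

```python
import pandas as pd


def one_row_per_us_zip(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per element of the list stored in the 'US' column.

    Assumes 'US' holds a list of zip codes per row and that every other column
    should be repeated unchanged for each zip code.
    """
    # DataFrame.explode (pandas >= 0.25) expands each list element into its own
    # row without creating a pd.Series per output row.
    # reset_index(drop=True) renumbers the rows 0..n-1.
    return df.explode("US").reset_index(drop=True)
```

There is one behavioral difference from the nested loop. explode turns an empty list into a single row with NaN in 'US', whereas the loop emits no row for it. Filter those rows out afterwards if that matters.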
First, lines aren't a particularly useful unit of measure for JSON!
Second, you have the right idea: you definitely want to work in chunks, reading, cleaning, and dumping each piece separately.
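Here is a minimal sketch of read/clean/dump in fixed-size batches of lines, so that only one batch is held in memory at a time. The batch size, the function names (iter_chunks, clean, process) and the 'CC' filter are hypothetical illustrations. A chunk could just as well be a whole log file, as in the question's loop.

```python
import json
from itertools import islice


def iter_chunks(lines, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def clean(chunk):
    """Parse a batch of lines and keep records whose weatherType contains 'CC'.

    Hypothetical filter mirroring the question; invalid JSON lines are skipped.
    """
    kept = []
    for line in chunk:
        try:
            record = json.loads(line)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            continue
        weather = record.get("data", {}).get("weatherType") or ""
        if "CC" in weather:
            kept.append(record)
    return kept


def process(lines, write_record, chunk_size=10_000):
    """Read, clean and dump one batch at a time via the write_record callback."""
    for chunk in iter_chunks(lines, chunk_size):
        for record in clean(chunk):
            write_record(record)
```

write_record could append to a CSV writer or insert into the database. Either way, memory use is bounded by chunk_size rather than by the size of the file.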
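I recommend using pandas' read_json function. It is more efficient at building DataFrames because it doesn't create temporary Python dicts; see the reading in json section of the docs.* *It isn't clear what the actual format of your files is, but it usually doesn't take much work to turn them into valid JSON.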
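Here is a minimal sketch of read_json in chunked mode. It assumes the files are, or have been converted to, JSON Lines, meaning one complete JSON object per line. It also assumes the nested data and zipcodes objects from the question. lines=True and chunksize are real read_json parameters. The function name read_us_cc_chunks and the filter are illustrative.

```python
import io

import pandas as pd


def read_us_cc_chunks(source, chunksize=10_000):
    """Yield one filtered DataFrame per chunk of a JSON Lines source.

    Assumes every line is a JSON object with nested 'data' and 'zipcodes' objects.
    """
    for chunk in pd.read_json(source, lines=True, chunksize=chunksize):
        # Nested objects arrive as dicts in object columns, so these checks run per row.
        is_cc = chunk["data"].map(lambda d: "CC" in d.get("weatherType", ""))
        has_us = chunk["zipcodes"].map(lambda z: "US" in z)
        yield chunk[is_cc & has_us]


# Example: an in-memory JSON Lines source containing one matching record.
SAMPLE = io.StringIO(
    '{"data": {"weatherType": "CC"}, "zipcodes": {"US": ["10001"]}}\n'
    '{"data": {"weatherType": "RA"}, "zipcodes": {"US": ["94105"]}}\n'
)
matches = pd.concat(read_us_cc_chunks(SAMPLE, chunksize=1))
```

Each yielded chunk can be written to CSV or to the database before the next chunk is read.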
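An extra Python tip: if you're more than a few indentation levels deep, consider splitting the code into more functions. (The most obvious choice here is f1(logfile) and …, but with descriptive names.)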
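Below is a sketch of that split applied to the question's loop. The function names (parse_record, records_to_frame, frame_for_logfile, convert_archives) are hypothetical. The sketch assumes each line is one JSON object, possibly followed by a trailing comma, since the question strips two characters with line[:-2]. It also swaps the hand-written iterrows() loop for DataFrame.explode; this is a substitution, not the asker's code.

```python
import glob
import json
import zipfile

import pandas as pd


def parse_record(line):
    """Return the parsed record if it is a 'CC' weather event with US zip codes, else None.

    Assumption: each line is one JSON object, possibly followed by a trailing comma.
    """
    try:
        record = json.loads(line.strip().rstrip(b","))
        if "CC" in record["data"]["weatherType"] and "US" in record["zipcodes"]:
            return record
    except (ValueError, KeyError, TypeError):
        pass  # malformed JSON or missing keys: skip the line
    return None


def records_to_frame(records):
    """Put zipcodes and data side by side, then emit one row per US zip code."""
    df = pd.concat([pd.DataFrame([r["zipcodes"] for r in records]),
                    pd.DataFrame([r["data"] for r in records])], axis=1)
    return df.explode("US").reset_index(drop=True)


def frame_for_logfile(archive, name):
    """Parse one log file inside an open ZipFile; return None if nothing matched."""
    with archive.open(name) as f:  # iterating yields bytes lines
        records = [r for r in map(parse_record, f) if r is not None]
    return records_to_frame(records) if records else None


def convert_archives(pattern="*.zip"):
    """Write <logfile>_US.csv for every log file with matches in every archive."""
    for zip_path in glob.glob(pattern):
        with zipfile.ZipFile(zip_path) as archive:
            for name in archive.namelist():
                df = frame_for_logfile(archive, name)
                if df is not None:
                    df.to_csv("{}_US.csv".format(name[:-4]), header=False, index=False)
```

Running convert_archives() in the directory that holds the .zip files writes one CSV per log file with matches. Each step can now be tested or timed on its own.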