我想将下面的文本文件转换成一个以列为标题，以行为对应数据的文本文件问题的回答

我想将下面的文本文件转换成一个以列为标题，以行为对应数据的文本文件

回答此问题可获得 20 贡献值，回答如果被采纳可获得 50 分。

下载后，我所有的数据都在文本文件（.txt）中。我有多个这样的文件（至少70000个），我想我会先用cmd中的copy.txt将它们合并成一个mastertext文件。然后，我想分析mastertext文件，以便理想的输出是.csv文件 为了便于分析，需要将数据转换为类似表的结构。每个文件（在我拥有的70000.txt文件中）都以“[”开头，以“]”结尾，每一行新数据都以“{”开头，以“}”结尾，我有多行格式相同的数据。我希望转换后的数据具有诸如“活动ID”之类的标题，以便我可以使用数据透视分析列（就像在excel中所做的那样）。我是编程新手，不确定Python中的哪些库可以帮助我实现我的目标 这是我所拥有的数据的一个例子。（注意——一个文件中只有两行，我有大约70000个这样的文件） <pre><code>{ "campaignId": "all", "startDate": "2020-06-11", "endDate": "2020-06-11", "device": "Computers", "network": "Display Network", "channel": "all", "accLevelQS": -1.0, "impressions": 389, "clicks": 3, "ctr": 0.0, "avgCPC": 0.0, "convValuePerClick": 0.0, "convValuePerCost": 0.0, "costConv1PerClick": 0.0, "convRate1PerClick": 0.0, "cost": 0.142884, "conv1PerClick": 0.0, "totalConvValue": 0.00, "allConversions": 0.0, "allConversionValue": 0.00, "avgPosition": 0.0, "intr": 3, "searchImprShare": 0.0, "contImprShare": 5.0, "impressionShare": 5.0 }, { "campaignId": "all", "startDate": "2020-06-11", "endDate": "2020-06-11", "device": "Mobile devices with full browsers", "network": "Display Network", "channel": "all", "accLevelQS": -1.0, "impressions": 6101, "clicks": 90, "ctr": 0.0, "avgCPC": 0.0, "convValuePerClick": 0.0, "convValuePerCost": 0.0, "costConv1PerClick": 0.0, "convRate1PerClick": 0.0, "cost": 4.342799, "conv1PerClick": 0.0, "totalConvValue": 0.00, "allConversions": 0.0, "allConversionValue": 0.00, "avgPosition": 0.0, "intr": 90, "searchImprShare": 0.0, "contImprShare": 5.0077566465021217, "impressionShare": 5.0077566465021217 } </code></pre>

0 条评论
分类：Python问答

默认排序时间排序

1 个回答

匿名 1天前

　擅长：python、mysql、java

<blockquote> I have made some assumptions as the data you have provided seems like a list of JSON data. This is because there is a "," in between your entries. </blockquote> 这是我的<code>input.json</code>文件。请注意，我在顶部和底部添加了[and]，因为这将为您的数据提供适当的JSON结构 <pre><code>[ { "campaignId": "all", "startDate": "2020-06-11", "endDate": "2020-06-11", "device": "Computers", "network": "Display Network", "channel": "all", "accLevelQS": -1.0, "impressions": 389, "clicks": 3, "ctr": 0.0, "avgCPC": 0.0, "convValuePerClick": 0.0, "convValuePerCost": 0.0, "costConv1PerClick": 0.0, "convRate1PerClick": 0.0, "cost": 0.142884, "conv1PerClick": 0.0, "totalConvValue": 0.00, "allConversions": 0.0, "allConversionValue": 0.00, "avgPosition": 0.0, "intr": 3, "searchImprShare": 0.0, "contImprShare": 5.0, "impressionShare": 5.0 }, { "campaignId": "all", "startDate": "2020-06-11", "endDate": "2020-06-11", "device": "Mobile devices with full browsers", "network": "Display Network", "channel": "all", "accLevelQS": -1.0, "impressions": 6101, "clicks": 90, "ctr": 0.0, "avgCPC": 0.0, "convValuePerClick": 0.0, "convValuePerCost": 0.0, "costConv1PerClick": 0.0, "convRate1PerClick": 0.0, "cost": 4.342799, "conv1PerClick": 0.0, "totalConvValue": 0.00, "allConversions": 0.0, "allConversionValue": 0.00, "avgPosition": 0.0, "intr": 90, "searchImprShare": 0.0, "contImprShare": 5.0077566465021217, "impressionShare": 5.0077566465021217 } ] </code></pre> 下面的代码使用<code>pandas</code>库将数据处理为数据帧，然后将其写入CSV文件 <pre><code>import json # Available by default, no install required. import glob # Available by default, no install required. import pandas as pd # Requires installation via pip. # Initialise a list to store our results. combined_json = [] # Set a glob pattern to *.txt since your files are txt files. # You can also write the full path e.g. /home/user/textfiles/*.txt text_files = glob.glob("*.txt") # Loop through all the text files and combine them into a single JSON list. # As for 70,000 files, I am unsure how the performance will turn out. for json_text in text_files: with open(json_text, 'r') as text_file: combined_json.extend(json.load(text_file)) # Write all the files to a JSON file. For your future usage. # You also can read directly from the combed_json variable. with open('input.json', 'w') as json_file: json.dump(combined_json, json_file, indent=2) # Convert the JSON data into a dataframe, using the combined_json variable. json_df = pd.json_normalize(combined_json) # Write the data from the dataframe to the CSV file. # Mode "w" will always overwrite the CSV file, use mode "a" to append text instead of overwriting. json_df.to_csv("dataframe.csv", mode="w") </code></pre> 有关<code>pd.json_normalize</code>如何工作的更多信息，您可以参考<a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html" rel="nofollow noreferrer">here</a> 要开始使用<code>pandas</code>库，可以参考<a href="https://pandas.pydata.org/pandas-docs/stable/getting_started/index.html" rel="nofollow noreferrer">here</a> 如果希望从字符串而不是文件加载JSON，可以引用<a href="https://www.w3schools.com/python/python_json.asp" rel="nofollow noreferrer">here</a> 要了解有关<code>glob</code>的更多信息，您可以参考<a href="https://www.w3schools.com/php/func_filesystem_glob.asp" rel="nofollow noreferrer">here</a>

我想将下面的文本文件转换成一个以列为标题，以行为对应数据的文本文件

1 个回答

相关Python问题