Performing calculations on a text file that contains multiple tables

Posted 2024-09-28 19:02:00


In my Python code, I read the following text file:

text = """
Well name:   Well-001
MD in    MD out  SubZones
7556.48523572    7558.27620486   0.000000
7558.27620486    7560.06788752   0.000000
7560.06788752    7561.86037056   0.000000
7561.86037056    7562.69600233   1.000000
7562.69600233    7563.53180593   1.000000
7563.53180593    7564.91478892   2.000000
7564.91478892    7566.29755224   2.000000
7566.29755224    7567.67931936   2.000000
7567.67931936    7568.33927889   3.000000
7568.33927889    7568.99876550   3.000000
7568.99876550    7570.17596547   3.000000
7570.17596547    7571.35355479   2.000000
7571.35355479    7572.53053558   2.000000
7572.53053558    7572.87713383   2.000000
7572.87713383    7573.70951451   3.000000
7573.70951451    7574.35566647   3.000000
7574.35566647    7575.00189268   3.000000
7575.00189268    7576.84445358   3.000000
7576.84445358    7578.68636542   4.000000
7578.68636542    7580.52806605   5.000000
7580.52806605    7582.36868123   6.000000
.
.
.
Well name:   Well-100
MD in    MD out  SubZones
7559.06191603    7560.68722084   0.000000
7560.68722084    7562.31275257   0.000000
7562.31275257    7563.93823559   0.000000
7563.93823559    7564.50391095   1.000000
7564.50391095    7565.06952612   1.000000
7565.06952612    7566.34649406   2.000000
7566.34649406    7567.62333168   2.000000
7567.62333168    7568.90017951   2.000000
7568.90017951    7569.48662623   3.000000
7569.48662623    7570.07350278   3.000000
7570.07350278    7571.72238328   2.000000
7571.72238328    7573.37101460   2.000000
7573.37101460    7575.01990457   3.000000
7575.01990457    7576.66870007   3.000000
7576.66870007    7577.37429322   4.000000
7577.37429322    7578.08009961   5.000000
7578.08009961    7579.61354471   6.000000
7579.61354471    7581.14822056   6.000000
7581.14822056    7582.68365796   6.000000
7582.68365796    7584.21865885   6.000000
.
.
.
"""

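For each well, I want to compute the footage = MD out - MD in, but only for rows where SubZones equals 3. The output should be a table whose first column is the well name (for example Well-001) and whose second column is the footage in SubZone 3. Note that a well may enter SubZone 3 more than once; the total footage should be the sum over all of that well's intervals in SubZone 3. What is the best way to write this in Python?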


3 answers

I managed to arrive at the following solution:

import os
import pandas as pd

comp_dir = '/content/sample_data/Completion/'
columns = ['MD in', 'MD out', 'SubZones']

# List the input files, skipping the Jupyter checkpoint folder if present.
comp_files_list = [f for f in os.listdir(comp_dir) if f != '.ipynb_checkpoints']

df_dict = dict()
for comp_file in comp_files_list:
    # Process each listed file in turn.
    with open(os.path.join(comp_dir, comp_file), 'r') as comp_well_file:
        for val in comp_well_file.read().split('\n'):
            if val.startswith('Well'):
                # A "Well name:" line starts a new table; the name is its last token.
                df_name = val.split()[-1]
                values_list = []
                continue
            values = val.split()
            if val.startswith('MD') or len(values) != 3:
                # Skip the header row, blank lines, and anything that is not a data row.
                continue
            values_list.append(values)
            df_dict[df_name] = pd.DataFrame(values_list, columns=columns)

footage_list = []
for key, df in df_dict.items():
    total_footage = 0
    for row in df.index:
        # Values are still strings here, so compare against the text '3.000000'.
        if df['SubZones'][row] == '3.000000':
            total_footage += float(df['MD out'][row]) - float(df['MD in'][row])
    footage_list.append('{}, {} \n'.format(key, total_footage))

for item in footage_list:
    print(item)

The output looks like this:

Well-001, 933.7765628900015

Well-002, 2058.36714124

. . .
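The same per-well sum can also be sketched without pandas. The following is only an illustration under assumptions: the rows in `sample` are a shortened excerpt in the question's layout, and `footage_by_well` is a hypothetical helper name, not something from the answer above.

```python
from collections import defaultdict

# Shortened excerpt in the same layout as the question's file (assumed format).
sample = """\
Well name:   Well-001
MD in    MD out  SubZones
7566.29755224    7567.67931936   2.000000
7567.67931936    7568.33927889   3.000000
7568.33927889    7568.99876550   3.000000
7570.17596547    7571.35355479   2.000000
7572.87713383    7573.70951451   3.000000
Well name:   Well-100
MD in    MD out  SubZones
7568.90017951    7569.48662623   3.000000
7570.07350278    7571.72238328   2.000000
"""


def footage_by_well(text, zone=3.0):
    """Return {well name: sum of (MD out - MD in) over rows in the given sub-zone}."""
    totals = defaultdict(float)
    well = None
    for line in text.splitlines():
        parts = line.split()
        if line.startswith("Well name:"):
            well = parts[-1]
            totals[well] += 0.0  # wells with no matching rows still appear, with 0.0
        elif well is not None and len(parts) == 3:
            try:
                md_in, md_out, sub_zone = map(float, parts)
            except ValueError:
                continue  # three tokens, but not a numeric data row
            if sub_zone == zone:
                totals[well] += md_out - md_in
    return dict(totals)


totals = footage_by_well(sample)
for well, footage in totals.items():
    print(f"{well}\t{footage:.6f}")
```

Because every matching row is added to the well's running total, it does not matter how many separate times the well enters SubZone 3.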

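I'm not sure how to create a separate variable for each DataFrame, but you can build a dictionary whose keys are the DataFrame names and whose values are the DataFrames. This code is quite specific to the example you gave, but it works: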

import pandas as pd

df_dict = dict()
values_list = []
for val in text.split('\n'):
    if len(val) == 0:
        continue
    if val.startswith('Well'):
        # The well name is the last token on the line, e.g. "Well-001".
        df_name = val.split()[-1]
        continue
    if val.startswith('MD'):
        # Header row: start collecting rows for a new table.
        values_list = []
        continue
    values = val.split()
    if len(values) != 3:
        # Skip the "." placeholder lines in the sample.
        continue
    values_list.append(values)
    # Rebuild the frame on each row; the last rebuild holds the complete table.
    df_dict[df_name] = pd.DataFrame(values_list, columns=['MD in', 'MD out', 'SubZones'])

df_dict is a dictionary of DataFrames, keyed by well name.
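From a dictionary like this, the table asked for in the question can be produced roughly as follows. This is a sketch under assumptions: the small `df_dict` below is a made-up stand-in with round numbers (string-valued cells, as a text parser would produce), and the column name "Footage in SubZone 3" is only illustrative.

```python
import pandas as pd

# Hypothetical stand-in for df_dict: string-valued frames keyed by well name.
cols = ["MD in", "MD out", "SubZones"]
df_dict = {
    "Well-001": pd.DataFrame(
        [["100.0", "101.5", "2.000000"],
         ["101.5", "103.0", "3.000000"],
         ["103.0", "104.0", "2.000000"],
         ["104.0", "106.0", "3.000000"]], columns=cols),
    "Well-100": pd.DataFrame(
        [["200.0", "201.0", "3.000000"],
         ["201.0", "202.0", "4.000000"]], columns=cols),
}

# Stack all wells into one frame; the dict keys become the "Well" index level.
all_wells = pd.concat(df_dict, names=["Well", "row"]).astype(float)

# Keep the sub-zone 3 rows, compute each interval's length, and sum per well.
zone3 = all_wells[all_wells["SubZones"] == 3]
footage = (
    (zone3["MD out"] - zone3["MD in"])
    .groupby(level="Well")
    .sum()
    .rename("Footage in SubZone 3")
    .reset_index()
)
print(footage)
```

One design consequence: a well that never enters SubZone 3 has no rows left after the filter, so it is absent from the result rather than listed with zero footage.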

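First split text into lines and group them into lists, using the lines that contain only a period as separators. Then split each line on double spaces and strip the surrounding whitespace from each piece. Build the list: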

from itertools import groupby
import pandas as pd

data = [[[i.strip() for i in x.split("  ") if i.strip()] for x in g] for k, g in groupby(text.splitlines()[1:], lambda x: x != ".") if k]

This outputs something like the following (truncated here):

[[['Well name:', 'Well-001'],
  ['MD in', 'MD out', 'SubZones'],
  ['7556.48523572', '7558.27620486', '1.000000'],
  ['7558.27620486', '7560.06788752', '2.000000']],
 [['Well name:', 'Well-100'],
  ['MD in', 'MD out', 'SubZones'],
  ['7556.48523572', '7558.27620486', '1.000000'],

You can then create a dict with the well names as keys and the DataFrames as values:

dataframes = {}
for item in data:
  dataframes[item[0][1]] = pd.DataFrame(item[2:], columns=item[1])

You can now access each DataFrame through the dict by its well name. dataframes['Well-001'].head() outputs:

|    |   MD in |   MD out |   SubZones |
|---:|--------:|---------:|-----------:|
|  0 | 7556.49 |  7558.28 |          1 |
|  1 | 7558.28 |  7560.07 |          2 |
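To get from these DataFrames to the footage table, the parsed cells first have to be converted from strings to numbers. A minimal end-to-end sketch, assuming a shortened sample in the question's layout with "." lines between tables; the `result` column names are illustrative:

```python
from itertools import groupby

import pandas as pd

# Shortened sample in the question's layout; "." lines separate the tables.
text = """
Well name:   Well-001
MD in    MD out  SubZones
7567.67931936    7568.33927889   3.000000
7570.17596547    7571.35355479   2.000000
7572.87713383    7573.70951451   3.000000
.
Well name:   Well-100
MD in    MD out  SubZones
7568.90017951    7569.48662623   3.000000
7570.07350278    7571.72238328   2.000000
"""

# Group the lines between "." separators, then split each line on double spaces.
data = [
    [[i.strip() for i in x.split("  ") if i.strip()] for x in g]
    for k, g in groupby(text.splitlines()[1:], lambda x: x != ".")
    if k
]
dataframes = {item[0][1]: pd.DataFrame(item[2:], columns=item[1]) for item in data}

rows = []
for well, df in dataframes.items():
    # The parsed cells are strings, so convert every column before doing arithmetic.
    num = df.apply(pd.to_numeric)
    in_zone = num["SubZones"] == 3
    total = (num.loc[in_zone, "MD out"] - num.loc[in_zone, "MD in"]).sum()
    rows.append((well, total))

result = pd.DataFrame(rows, columns=["Well", "Footage in SubZone 3"])
print(result)
```

Looping over the dict keeps every well in the result, so a well with no SubZone 3 rows shows a total of zero.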
