Match three variables from a text file to a CSV, and write variables into the CSV on the matching row

Posted 2024-10-03 21:34:25


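I'm looking for some help looping over each group in my text file and matching three variables against my CSV. If the match succeeds, several new variables should be written into the CSV file: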

Line 1 in the text file is matched against CSV element 1.
Line 2 in the text file is matched against CSV element 0.

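Each student line is split into three parts; for example, 3 Tommy 144512/23332 splits into 3, Tommy and 144512/23332. Parts 1 and 3 are written to elements 12 and 13 respectively. Part 2 is used for the third match, against CSV element 8. This determines which row to write to.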

"Data" is written to element 14 (column 15).
"misc2" is written to element 15 (column 16).
"bla3" is written to element 16 (column 17).

Annotated text file:

     Textfile Item 1 (Will loop/cycle/run 4 times, because there are 4 students)
           |
           v

MData (N/A)                <-- Match Line 1 (matches to csv element 1)
DMATCH1                    <-- Match Line 2 (matches to csv element 0)
3 Tommy 144512/23332       <-- Match Line 3 (matches to csv element 8) (Loop 1)                 
1 Jim 90000/222311         <-- Match Line 3 (matches to csv element 8) (Loop 2)
1 Elz M 90000/222311       <-- Match Line 3 (matches to csv element 8) (Loop 3)
1 Ben 90000/222311         <-- Match Line 3 (matches to csv element 8) (Loop 4)
Data $50.90                <-- If "Data" Exists then filewrite to csv element 14 (Loop 1)   
misc2 $10.40               <-- If "misc2" Exists then filewrite to csv element 15 (Loop 1)
bla3 $20.20               <-- If "bla3" Exists then filewrite to csv element 16 (Loop 1)


     Textfile Item 2 (Will loop/cycle/run 3 times, because there are 3 students)
           |
           v

MData (B/B)                <-- Match Line 1 (matches to csv element 1)
DMATCH2                    <-- Match Line 2 (matches to csv element 0)
4 James Smith 2333/114441  <-- Match Line 3 (matches to csv element 8) (Loop 1)
4 Mike 90000/222311        <-- Match Line 3 (matches to csv element 8) (Loop 2)
4 Jessica Long 2333/114441 <-- Match Line 3 (matches to csv element 8) (Loop 3)
Data $50.90                <-- If "Data" Exists then filewrite to csv element 14 (Loop 1)   
bla3 $5.44                <-- If "bla3" Exists then filewrite to csv element 16 (Loop 1)


     Textfile Item 3 (Will loop/cycle/run 2 times, because there are 2 students)
           |
           v

Mdata                      <-- Match Line 1 (matches to csv element 1)
DMATCH3                    <-- Match Line 2 (matches to csv element 0)
5 Joe Reane 0/0            <-- Match Line 3 (matches to csv element 8) (Loop 1)
5 Peter Jones 90000/222311 <-- Match Line 3 (matches to csv element 8) (Loop 2)
misc2 $420.00              <-- If "misc2" Exists then filewrite to csv element 15 (Loop 1)
bla3 $210.00               <-- If "bla3" Exists then filewrite to csv element 16 (Loop 1)

Actual text file, without annotations:

MData (N/A)
DMATCH1
3 Tommy 144512/23332
1 Jim 90000/222311
1 Elz M 90000/222311
1 Ben 90000/222311
Data $50.90
misc2 $10.40
bla3 $20.20


MData (B/B) 
DMATCH2
4 James Smith 2333/114441
4 Mike 90000/222311
4 Jessica Long 2333/114441
Data $50.90
bla3 $5.44


Mdata
DMATCH3
5 Joe Reane 0/0
5 Peter Jones 90000/222311
Data $10.91
misc2 $420.00
bla3 $210.00

CSV before:

MATCH1,MATCH2,TITLE,TITLE,TITLE,TITLE,TITLE,TITLE,MATCH3,DATA,TITLE,TITLE
DMATCH1,MData (N/A),data,data,data,data,data,data,Tommy,55,data,data
DMATCH1,MData (N/A),data,data,data,data,data,data,Ben,54,data,data
DMATCH1,MData (N/A),data,data,data,data,data,data,Jim,52,data,data
DMATCH1,MData (N/A),data,data,data,data,data,data,Elz M,22,data,data
DMATCH2,MData (B/B),data,data,data,data,data,data,James Smith,15,data,data
DMATCH2,MData (B/B),data,data,data,data,data,data,Jessica Long,224,data,data
DMATCH2,MData (B/B),data,data,data,data,data,data,Mike,62,data,data
DMATCH3,Mdata,data,data,data,data,data,data,Joe Reane,66,data,data
DMATCH3,Mdata,data,data,data,data,data,data,Peter Jones,256,data,data
DMATCH3,Mdata,data,data,data,data,data,data,Lesley Lope,5226,data,data

CSV after:

MATCH1,MATCH2,TITLE,TITLE,TITLE,TITLE,TITLE,TITLE,MATCH3,DATA,TITLE,TITLE,,,,,
DMATCH1,MData (N/A),data,data,data,data,data,data,Tommy,55,data,data,3,144512/23332,Data $50.90,misc2 $10.40,bla3 $20.20
DMATCH1,MData (N/A),data,data,data,data,data,data,Ben,54,data,data,1,90000/222311,,,
DMATCH1,MData (N/A),data,data,data,data,data,data,Jim,52,data,data,1,90000/222311,,,
DMATCH1,MData (N/A),data,data,data,data,data,data,Elz M,22,data,data,1,90000/222311,,,
DMATCH2,MData (B/B),data,data,data,data,data,data,James Smith,15,data,data,4,2333/114441,Data $50.90,,bla3 $5.44
DMATCH2,MData (B/B),data,data,data,data,data,data,Jessica Long,224,data,data,4,2333/114441,,,
DMATCH2,MData (B/B),data,data,data,data,data,data,Mike,62,data,data,4,90000/222311,,,
DMATCH3,Mdata,data,data,data,data,data,data,Joe Reane,66,data,data,5,0/0,,misc2 $420.00,bla3 $210.00
DMATCH3,Mdata,data,data,data,data,data,data,Peter Jones,256,data,data,5,90000/222311,,,
DMATCH3,Mdata,data,data,data,data,data,data,Lesley Lope,5226,data,data,,,,,

Does anyone know how to do this?

Any help would be greatly appreciated!


1 answer
Anonymous user
#1 · Posted 2024-10-03 21:34:25

This question actually consists of several sub-problems. First, we have to read the oddly formatted text file:

Reading the matcher text file

# Each block in the text file becomes one element of this list
matchers = [[]]
with open('test.txt') as infile:
    for line in infile:
        line = line.strip()
        if not line:
            # Blocks are separated by one or more blank lines;
            # start a new block only if the current one has content
            if matchers[-1]:
                matchers.append([])
            continue
        matchers[-1].append(line)
# Drop the empty block left behind by blank lines at the end of the file
if not matchers[-1]:
    matchers.pop()

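At this point we have a list of lists: one element per block and, within each block, one element per line. Next we have to convert this into something more table-like.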
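To make the shape of that structure concrete, here is a minimal self-contained sketch that applies the same blank-line splitting to an in-memory sample instead of test.txt. The helper name read_blocks and the shortened sample string are assumptions made for illustration; they are not part of the original answer.

```python
def read_blocks(lines):
    """Group stripped, non-empty lines into blocks separated by blank lines."""
    blocks = [[]]
    for line in lines:
        line = line.strip()
        if not line:
            # One or more blank lines end the current block
            if blocks[-1]:
                blocks.append([])
            continue
        blocks[-1].append(line)
    # Remove the empty block left behind by trailing blank lines
    if not blocks[-1]:
        blocks.pop()
    return blocks


# Shortened sample in the same layout as the question's text file
sample = """MData (N/A)
DMATCH1
3 Tommy 144512/23332
Data $50.90


Mdata
DMATCH3
5 Joe Reane 0/0
bla3 $210.00
"""

matchers = read_blocks(sample.splitlines())
print(len(matchers))   # number of blocks found
print(matchers[0])     # the lines of the first block
```

Because a new block is only started when the current one already has content, this version does not depend on exactly two blank lines separating the items.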

Converting to a table-like format

import re

# This regular expression matches the variable number of student lines in each block
studentlike = re.compile(r'(\d+) (.+) (\d+/\d+)')
# Fixed column order for the data items, so every row has the same width
# even when a block lacks one of them
datanames = ['Data', 'misc2', 'bla3']
# We will build a table containing a list of elements for each student
table = []
for matcher in matchers:
    if len(matcher) < 2:
        # Skip empty or incomplete blocks
        continue
    # The first two lines are the match values
    m1, m2 = matcher[0], matcher[1]
    students = []
    dataitems = {}
    for line in matcher[2:]:
        m = studentlike.match(line)
        if m:
            # A student line: number, name, code
            students.append(list(m.groups()))
        else:
            # A data line such as "Data $50.90": name first, then the value
            name, value = line.split(None, 1)
            dataitems[name] = value
    # Finally we construct the table
    for student in students:
        # dict.get() returns a blank for data items missing from this block
        table.append([m1, m2] + student + [dataitems.get(d, '') for d in datanames])
print(table)
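The student pattern is worth checking in isolation, because names such as "Elz M" and "James Smith" contain spaces. The short sketch below runs the same regular expression over a few lines taken from the question; the variable name parsed is introduced here only for illustration.

```python
import re

# Same pattern as in the answer: a number, a name, then a code of the form digits/digits
studentlike = re.compile(r'(\d+) (.+) (\d+/\d+)')

for line in ['3 Tommy 144512/23332', '1 Elz M 90000/222311',
             '4 James Smith 2333/114441', 'Data $50.90']:
    m = studentlike.match(line)
    # m is None for lines that are not student lines
    print(line, '->', m.groups() if m else None)

# The greedy (.+) keeps the whole name, spaces included
parsed = studentlike.match('1 Elz M 90000/222311').groups()
```

The middle group is greedy, so it takes everything up to the last space that is followed by a digits/digits code. Data lines start with a letter rather than a digit and therefore do not match at all.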

Joining with pandas

Now we can merge the data. I used pandas here because it is well suited to this kind of join:

import pandas
csvdata = pandas.read_csv('test.csv')
textdata = pandas.DataFrame(table, columns=['MATCH2', 'MATCH1', 'TITLE01', 'MATCH3', 'TITLE02', 'Data', 'misc2', 'bla3'])
mergeddata = pandas.merge(csvdata, textdata, how='left', on=['MATCH1', 'MATCH2', 'MATCH3'], sort=False)
mergeddata.to_csv('output.csv', index=False)
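If pandas is not available, a similar left join can be sketched with the standard csv module alone. This is only an illustration under stated assumptions: the helper name left_join is hypothetical, the rows are read from in-memory strings rather than test.csv, the key positions (CSV elements 0, 1 and 8) are taken from the question, and when the text file contains duplicate keys the last one wins, which differs from pandas.merge.

```python
import csv
import io


def left_join(csv_rows, table, width=5):
    """Append the text-file fields to each CSV row that has a matching key.

    csv_rows: rows from the CSV, header excluded
    table:    rows shaped like [match2, match1, number, name, code, Data, misc2, bla3]
    width:    number of columns appended to every row
    """
    # Key each text-file row by (MATCH1, MATCH2, MATCH3);
    # the appended fields are number, code, Data, misc2, bla3
    lookup = {(row[1], row[0], row[3]): [row[2]] + row[4:] for row in table}
    blank = [''] * width
    # Rows without a match keep their data and get blank extra columns
    return [row + lookup.get((row[0], row[1], row[8]), blank) for row in csv_rows]


csv_text = """MATCH1,MATCH2,TITLE,TITLE,TITLE,TITLE,TITLE,TITLE,MATCH3,DATA,TITLE,TITLE
DMATCH1,MData (N/A),data,data,data,data,data,data,Tommy,55,data,data
DMATCH3,Mdata,data,data,data,data,data,data,Lesley Lope,5226,data,data
"""
table = [['MData (N/A)', 'DMATCH1', '3', 'Tommy', '144512/23332',
          '$50.90', '$10.40', '$20.20']]

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)
joined = left_join(list(reader), table)

out = io.StringIO()
writer = csv.writer(out, lineterminator='\n')
writer.writerow(header + [''] * 5)   # unnamed extra columns, as in the question
writer.writerows(joined)
print(out.getvalue())
```

Here the new fields land in elements 12 to 16, the positions the question asks for, and a row with no counterpart in the text file (Lesley Lope) is kept with empty extra columns.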
