How can I automatically associate two kinds of data that follow a pattern in a CSV file?

Posted 2024-10-03 13:28:56


I generated this CSV text file:

{http://www.omg.org/XMI}id,begin,end,Expressor
joy
13108,15,33,Physical sensations
sadness
13123,252,258,Voice
trust
11647,1564,1570,Looking behaviour
joy
11647,1564,1570,Looking behaviour
11625,1524,1557,Facial expression
trust
joy
11625,1524,1557,Facial expression
joy
11743,1657,1670,Facial expression
joy
13175,1921,1935,Facial expression
anger
11879,2023,2041,Looking behaviour
disgust
11948,2490,2496,Body movements
disgust
11940,2469,2482,Facial expression
trust
12024,2641,2676,Facial expression
joy
12024,2641,2676,Facial expression
12134,2728,2757,Looking behaviour

It contains the emotion-character relations extracted from the .xmi files of an annotated corpus.

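First, I want to count the occurrences of each type of "Expressor" (there are 8 fixed types) and print the result as a list, without the "None" entry, which makes the final total for all Expressors wrong (it adds 72 to sum()):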

    0 - None, 72
    1 - Physical sensations, 1
    2 - Voice, 1
    3 - Looking behaviour, 4
    4 - Facial expression, 7
    ...

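In addition, I would like to link each emotion to its expressors. The file shows an emotion, and the expressors listed below it belong to that emotion.

An emotion can have one or more expressors. For example: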

joy
11647,1564,1570,Looking behaviour
11625,1524,1557,Facial expression

An expressor can be linked to more than one emotion. For example:

trust
joy
11625,1524,1557,Facial expression

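So the pattern is:

emotion

its expressor(s)

  • An association therefore ends when a new emotion appears on a following line, and the pattern above repeats for the new association.

My goal is to associate them and count every emotion-expressor association. For example: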

Joy: 'Looking behaviour': 4, 'Physical sensations': 2, 'Facial expression': 7..... #etc.
Fear: 'Looking behaviour': 9, 'Physical sensations': 3, 'Facial expression': 5.....#etc

The code that produces the CSV text file is:

import collections
import csv

# root, ns and my_id are not defined in this snippet; they come from the rest of the script.
def emex_count():
    with open('emex_count.txt', 'w') as f:
        cf = csv.DictWriter(f, ['{http://www.omg.org/XMI}id', 'begin', 'end',
                                'Expressor'], extrasaction='ignore')
        cf.writeheader()
        for rel_node in root.findall("emospan:CharacterRelation", ns):
            if rel_node.attrib['Relation'] == "Expressor":
                source = rel_node.attrib['Governor']
                target = rel_node.attrib['Dependent']
                for span_node in root.findall("emospan:CharacterEmotion", ns):
                    # The emotion is written as a bare line, not as a CSV row.
                    if span_node.attrib[my_id] == source:
                        print(span_node.attrib['Emotion'])
                        print(span_node.attrib['Emotion'], file=f)
                    # The expressor is written as a regular CSV row.
                    if span_node.attrib[my_id] == target:
                        print(span_node.attrib)
                        cf.writerow(span_node.attrib)
    # Note: this reads 'expericount.txt', not the 'emex_count.txt' written above.
    with open('expericount.txt') as f:
        cf = csv.DictReader(f)
        val = collections.Counter(d['Expressor'] for d in cf)
        print(sum(val.values()))
        for n, (ex, number) in enumerate(val.items()):
            print('{} - {}, {}'.format(n, ex, number))

Is there a way to handle the association between emotions and their expressors automatically?

I hope my question is understandable.


1 answer
Answer 1 · posted 2024-10-03 13:28:56

pandas is a good library for this kind of work. Here is the first part:

import pandas as pd

# with open('myfile.txt', 'r') as f:
#     a = f.read()


a="""joy
13108,15,33,Physical sensations
sadness
13123,252,258,Voice
trust
11647,1564,1570,Looking behaviour
joy
11647,1564,1570,Looking behaviour
11625,1524,1557,Facial expression
trust
joy
11625,1524,1557,Facial expression
joy
11743,1657,1670,Facial expression
joy
13175,1921,1935,Facial expression
anger
11879,2023,2041,Looking behaviour
disgust
11948,2490,2496,Body movements
disgust
11940,2469,2482,Facial expression
trust
12024,2641,2676,Facial expression
joy
12024,2641,2676,Facial expression
12134,2728,2757,Looking behaviour"""

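# Keep only the expressor rows; emotion lines are purely alphabetic.
exprsrs = [i for i in a.splitlines() if not i.isalpha()]
df = pd.DataFrame(exprsrs, columns=['expressor'])
# The expressor type is the last comma-separated field.
df['expressor'] = df.expressor.str.split(',').str[-1]
print(df.expressor.value_counts())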

Output:

Facial expression      7
Looking behaviour      4
Voice                  1
Physical sensations    1
Body movements         1
Name: expressor, dtype: int64
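
As for where the "None" entry in the question comes from: csv.DictReader fills missing fields with None, so every bare emotion line becomes a row whose 'Expressor' is None. If you would rather stay with the standard library, a minimal sketch could skip those rows while counting. The SAMPLE string and the count_expressors name are made up for illustration; with the real file you would pass the open file object instead.

```python
import csv
import io
from collections import Counter

# Shortened copy of the file from the question, header line included.
SAMPLE = """{http://www.omg.org/XMI}id,begin,end,Expressor
joy
13108,15,33,Physical sensations
sadness
13123,252,258,Voice
trust
joy
11625,1524,1557,Facial expression
joy
11743,1657,1670,Facial expression
"""

def count_expressors(lines):
    """Count Expressor values, skipping rows that have no Expressor field."""
    reader = csv.DictReader(lines)
    # Bare emotion lines have a single field, so DictReader sets
    # their 'Expressor' to None; those rows are left out here.
    return Counter(
        row['Expressor'] for row in reader if row['Expressor'] is not None
    )

counts = count_expressors(io.StringIO(SAMPLE))
print(sum(counts.values()))
for n, (ex, number) in enumerate(counts.most_common()):
    print('{} - {}, {}'.format(n, ex, number))
```

The total printed first then only covers real expressor rows, which is what the question was after.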

Second part:

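final = []    # "emotion,id,begin,end,expressor" strings
running = []  # emotions seen since the last expressor row
for line in a.splitlines():
    if line.isalpha():
        running.append(line.strip())
    else:
        # Pair this expressor row with every pending emotion.
        for r in running:
            final.append(f'{r},{line}')
        running = []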

df = pd.DataFrame(final, columns=['whole'])

df['expression'] = df.whole.str.split(',').str[0]
df['expressor'] = df.whole.str.split(',').str[-1]

df.groupby('expression')['expressor'].value_counts()

Output:

expression  expressor          
anger       Looking behaviour      1
disgust     Body movements         1
            Facial expression      1
joy         Facial expression      4
            Looking behaviour      1
            Physical sensations    1
sadness     Voice                  1
trust       Facial expression      2
            Looking behaviour      1
Name: expressor, dtype: int64

If you want it as a dictionary, you can do this:

from collections import defaultdict

d = df.groupby('expression')['expressor'].value_counts().to_dict()
new_dict = defaultdict(dict)
for k, v in d.items():
    new_dict[k[0]][k[1]] = v

final_dict = dict(new_dict)

Output:

{'anger': {'Looking behaviour': 1},
 'disgust': {'Body movements': 1, 'Facial expression': 1},
 'joy': {'Facial expression': 4,
         'Looking behaviour': 1,
         'Physical sensations': 1},
 'sadness': {'Voice': 1},
 'trust': {'Facial expression': 2, 'Looking behaviour': 1}}
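
One caveat about the loop in the second part: it clears running after the first expressor line, so a second expressor listed directly under the same emotion is not paired with anything. In the sample data this skips two lines, both under joy. If the pattern described in the question is what you want (an emotion keeps all expressor lines until the next emotion line), a possible variant without pandas is sketched below. It assumes expressor rows start with a numeric id and emotion lines contain no comma; SAMPLE and the associate name are illustrative only.

```python
from collections import Counter, defaultdict

# A few lines in the same layout as the file from the question.
SAMPLE = """joy
11647,1564,1570,Looking behaviour
11625,1524,1557,Facial expression
trust
joy
11625,1524,1557,Facial expression
anger
11879,2023,2041,Looking behaviour
"""

def associate(text):
    """Return {emotion: {expressor: count}} for the emotion/expressor pattern."""
    counts = defaultdict(Counter)
    emotions = []                   # emotions the next expressor rows belong to
    previous_was_expressor = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split(',')
        if len(fields) == 1:
            # Emotion line: start a new group if the previous group
            # already received expressors, otherwise extend the group.
            if previous_was_expressor:
                emotions = []
            emotions.append(line)
            previous_was_expressor = False
        elif fields[0].isdigit():
            # Expressor row: counts once for every emotion in the group.
            for emotion in emotions:
                counts[emotion][fields[-1]] += 1
            previous_was_expressor = True
        # Anything else (such as the header line) is ignored.
    return {emotion: dict(c) for emotion, c in counts.items()}

result = associate(SAMPLE)
print(result)
```

This builds the nested dictionary directly, so no conversion step is needed afterwards.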
