Looking for a function to help me avoid duplicates in a text file

Posted 2024-05-18 06:53:52


I'm facing a small problem that I haven't managed to solve.

I have a text file in which a few entries are repeated, and I don't want the duplicates in it.

Here is the data in the text file:

<inputting a country>
ENU D:\ART-Project\build-python\testing\audio_category\1_2021_01_28_14_46_57_inputting a country.wav    navigation_destination_country

<talk to me about navigation>
ENU D:\ART-Project\build-python\testing\audio_category\2_2021_01_28_14_46_57_talk to me about navigation.wav    system_navigation_sdsmenu

<enter POI please>
ENU D:\ART-Project\build-python\testing\audio_category\3_2021_01_28_14_46_57_enter POI please.wav   navigation_destination_poi

<bring me to a charging station please>
ENU D:\ART-Project\build-python\testing\audio_category\4_2021_01_28_14_46_57_bring me to a charging station please.wav  navigation_destination_poi

<Search nearest charging station at destination>
ENU D:\ART-Project\build-python\testing\audio_category\5_2021_01_28_14_46_57_Search nearest charging station at destination.wav navigation_destination_poi

<Search charging station along the route>
ENU D:\ART-Project\build-python\testing\audio_category\6_2021_01_28_14_46_57_Search charging station along the route.wav    navigation_destination_poi

<Search charging station>
ENU D:\ART-Project\build-python\testing\audio_category\7_2021_01_28_14_46_57_Search charging station.wav    navigation_destination_poi

<please show me my last destinations>
ENU D:\ART-Project\build-python\testing\audio_category\8_2021_01_28_14_46_57_please show me my last destinations.wav    navigation_last_destinations

<please turn on the navigation voice guidance>
ENU D:\ART-Project\build-python\testing\audio_category\9_2021_01_28_14_46_57_please turn on the navigation voice guidance.wav   system_navigation_sdsmenu

<United Kingdom>
ENU D:\ART-Project\build-python\testing\audio_category\10_2021_01_28_14_46_57_United Kingdom.wav    navigation_destination_country

<charging station>
ENU D:\ART-Project\build-python\testing\audio_category\11_2021_01_28_14_46_57_charging station.wav  navigation_destination_poi_slot_only

<line 5>
ENU D:\ART-Project\build-python\testing\audio_category\12_2021_01_28_14_46_57_line 5.wav    system_line_number

<line 4>
ENU D:\ART-Project\build-python\testing\audio_category\13_2021_01_28_14_46_57_line 4.wav    system_line_number

<line 2>
ENU D:\ART-Project\build-python\testing\audio_category\14_2021_01_28_14_46_57_line 2.wav    system_line_number

<line 2>
ENU D:\ART-Project\build-python\testing\audio_category\15_2021_01_28_14_46_57_line 2.wav    system_line_number

<london Court road Tottenham 9>
ENU D:\ART-Project\build-python\testing\audio_category\16_2021_01_28_14_46_57_london Court road Tottenham 9.wav navigation_destination_address

<line 2>
ENU D:\ART-Project\build-python\testing\audio_category\17_2021_01_28_14_46_57_line 2.wav    system_line_number

<line 1>
ENU D:\ART-Project\build-python\testing\audio_category\18_2021_01_28_14_46_57_line 1.wav    system_line_number

In the example above, <line 2> is repeated; I want to drop each repeated <line 2> along with the line below it.

The output should look like this:

<inputting a country>
ENU D:\ART-Project\build-python\testing\audio_category\1_2021_01_28_14_46_57_inputting a country.wav    navigation_destination_country

<talk to me about navigation>
ENU D:\ART-Project\build-python\testing\audio_category\2_2021_01_28_14_46_57_talk to me about navigation.wav    system_navigation_sdsmenu

<enter POI please>
ENU D:\ART-Project\build-python\testing\audio_category\3_2021_01_28_14_46_57_enter POI please.wav   navigation_destination_poi

<bring me to a charging station please>
ENU D:\ART-Project\build-python\testing\audio_category\4_2021_01_28_14_46_57_bring me to a charging station please.wav  navigation_destination_poi

<Search nearest charging station at destination>
ENU D:\ART-Project\build-python\testing\audio_category\5_2021_01_28_14_46_57_Search nearest charging station at destination.wav navigation_destination_poi

<Search charging station along the route>
ENU D:\ART-Project\build-python\testing\audio_category\6_2021_01_28_14_46_57_Search charging station along the route.wav    navigation_destination_poi

<Search charging station>
ENU D:\ART-Project\build-python\testing\audio_category\7_2021_01_28_14_46_57_Search charging station.wav    navigation_destination_poi

<please show me my last destinations>
ENU D:\ART-Project\build-python\testing\audio_category\8_2021_01_28_14_46_57_please show me my last destinations.wav    navigation_last_destinations

<please turn on the navigation voice guidance>
ENU D:\ART-Project\build-python\testing\audio_category\9_2021_01_28_14_46_57_please turn on the navigation voice guidance.wav   system_navigation_sdsmenu

<United Kingdom>
ENU D:\ART-Project\build-python\testing\audio_category\10_2021_01_28_14_46_57_United Kingdom.wav    navigation_destination_country

<charging station>
ENU D:\ART-Project\build-python\testing\audio_category\11_2021_01_28_14_46_57_charging station.wav  navigation_destination_poi_slot_only

<line 5>
ENU D:\ART-Project\build-python\testing\audio_category\12_2021_01_28_14_46_57_line 5.wav    system_line_number

<line 4>
ENU D:\ART-Project\build-python\testing\audio_category\13_2021_01_28_14_46_57_line 4.wav    system_line_number

<line 2>
ENU D:\ART-Project\build-python\testing\audio_category\15_2021_01_28_14_46_57_line 2.wav    system_line_number

<london Court road Tottenham 9>
ENU D:\ART-Project\build-python\testing\audio_category\16_2021_01_28_14_46_57_london Court road Tottenham 9.wav navigation_destination_address

<line 1>
ENU D:\ART-Project\build-python\testing\audio_category\18_2021_01_28_14_46_57_line 1.wav    system_line_number

That is, <line 2> should appear only once.


3 answers

Here is a general-purpose extractor that filters the duplicates out of the file using a Python regular expression.

Code:

import re

# Matches tags such as "<line 2>": one word, one whitespace character, one digit.
TAG_PATTERN = re.compile(r"<\w+\s\d>")

def extractor(path):
    result = []        # output lines, in order
    seen_tags = set()  # tags that have already been kept
    skip_next = False  # True when the next line belongs to a duplicate tag
    with open(path) as file:
        for raw in file:
            line = raw.rstrip('\n')
            if skip_next:
                # Drop the path line that follows a duplicate tag.
                skip_next = False
                continue
            match = TAG_PATTERN.search(line)
            if match is None:
                result.append(line)
            elif match.group(0) not in seen_tags:
                # First time this tag is seen: keep it (its path line follows).
                seen_tags.add(match.group(0))
                result.append(match.group(0))
            else:
                # Duplicate tag: skip it and its path line.
                skip_next = True
    return result


##   - main Execution   - ##
for line in extractor('read_text_extraction3.txt'):
    print(line)

Output

<inputting a country>
ENU D:\ART-Project\build-python\testing\audio_category\1_2021_01_28_14_46_57_inputting a country.wav    navigation_destination_country

<talk to me about navigation>
ENU D:\ART-Project\build-python\testing\audio_category\2_2021_01_28_14_46_57_talk to me about navigation.wav    system_navigation_sdsmenu

<enter POI please>
ENU D:\ART-Project\build-python\testing\audio_category\3_2021_01_28_14_46_57_enter POI please.wav   navigation_destination_poi

<bring me to a charging station please>
ENU D:\ART-Project\build-python\testing\audio_category\4_2021_01_28_14_46_57_bring me to a charging station please.wav  navigation_destination_poi

<Search nearest charging station at destination>
ENU D:\ART-Project\build-python\testing\audio_category\5_2021_01_28_14_46_57_Search nearest charging station at destination.wav navigation_destination_poi

<Search charging station along the route>
ENU D:\ART-Project\build-python\testing\audio_category\6_2021_01_28_14_46_57_Search charging station along the route.wav    navigation_destination_poi

<Search charging station>
ENU D:\ART-Project\build-python\testing\audio_category\7_2021_01_28_14_46_57_Search charging station.wav    navigation_destination_poi

<please show me my last destinations>
ENU D:\ART-Project\build-python\testing\audio_category\8_2021_01_28_14_46_57_please show me my last destinations.wav    navigation_last_destinations

<please turn on the navigation voice guidance>
ENU D:\ART-Project\build-python\testing\audio_category\9_2021_01_28_14_46_57_please turn on the navigation voice guidance.wav   system_navigation_sdsmenu

<United Kingdom>
ENU D:\ART-Project\build-python\testing\audio_category\10_2021_01_28_14_46_57_United Kingdom.wav    navigation_destination_country

<charging station>
ENU D:\ART-Project\build-python\testing\audio_category\11_2021_01_28_14_46_57_charging station.wav  navigation_destination_poi_slot_only

<line 5>
ENU D:\ART-Project\build-python\testing\audio_category\12_2021_01_28_14_46_57_line 5.wav    system_line_number

<line 4>
ENU D:\ART-Project\build-python\testing\audio_category\13_2021_01_28_14_46_57_line 4.wav    system_line_number

<line 2>
ENU D:\ART-Project\build-python\testing\audio_category\14_2021_01_28_14_46_57_line 2.wav    system_line_number


<london Court road Tottenham 9>
ENU D:\ART-Project\build-python\testing\audio_category\16_2021_01_28_14_46_57_london Court road Tottenham 9.wav navigation_destination_address


<line 1>
ENU D:\ART-Project\build-python\testing\audio_category\18_2021_01_28_14_46_57_line 1.wav    system_line_number


[Program finished]
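
Note that the pattern `<\w+\s\d>` used above only matches tags made of one word, one whitespace character, and a single digit, such as `<line 2>`; a repeated tag like `<charging station>` would pass through unnoticed. A minimal sketch of the difference, assuming every tag sits alone on its own line (the names `NARROW_TAG` and `ANY_TAG` and the broader pattern are illustrative assumptions, not part of the answer above):

```python
import re

NARROW_TAG = re.compile(r"<\w+\s\d>")   # pattern from the answer above
ANY_TAG = re.compile(r"^<[^<>]+>$")     # hypothetical broader pattern: any "<...>" line

samples = [
    "<line 2>",
    "<charging station>",
    "<london Court road Tottenham 9>",
    r"ENU D:\ART-Project\build-python\testing\audio_category\12_2021_01_28_14_46_57_line 5.wav    system_line_number",
]

# Print, for each sample, whether the narrow and the broad pattern match it.
for text in samples:
    narrow = NARROW_TAG.search(text) is not None
    broad = ANY_TAG.match(text) is not None
    print(f"{narrow!s:5} {broad!s:5} {text[:40]}")
```

Swapping in the broader pattern would make the extractor treat every tag line as a candidate duplicate, which may or may not be what you want for your data.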

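You can read the file two lines at a time and store each pair in a list. Then remove the duplicates from the list, and finally write the new list to a file.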

line_list = []
with open('file.txt', 'r') as f:  # replace file.txt with your text file name
    while True:
        line1 = f.readline()
        line2 = f.readline()
        if not line1:  # end of file reached
            break
        line_list.append(line1 + line2)
# dict.fromkeys keeps the first occurrence of each pair and preserves order
new_list = list(dict.fromkeys(line_list))
print("".join(new_list))
# Here you need to write new_list into another file

Output

<line 4>
some sentences... 
<line 2>
some sentences...
<line 1>
some sentences...
<line 3>
some sentences...
<line 5>
some sentences...
<line 7>
some sentences...
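
The last comment in the snippet leaves the file write open. One way to do it, as a sketch: the function name `write_unique_pairs` and both paths are hypothetical, and, like the snippet above, it treats two pairs as duplicates only when both lines are identical. Records whose tag repeats but whose second line differs would all be kept.

```python
def write_unique_pairs(src_path, dst_path):
    """Copy src_path to dst_path, keeping only the first copy of each two-line pair."""
    pairs = []
    with open(src_path, "r") as src:
        while True:
            line1 = src.readline()
            line2 = src.readline()
            if not line1:          # end of file
                break
            pairs.append(line1 + line2)
    # dict.fromkeys drops later duplicates while preserving the original order.
    unique_pairs = list(dict.fromkeys(pairs))
    with open(dst_path, "w") as dst:
        dst.writelines(unique_pairs)
    return len(pairs) - len(unique_pairs)   # number of pairs dropped
```

A call such as `write_unique_pairs('file.txt', 'file_unique.txt')` writes the result to a separate file, so the original stays untouched if something goes wrong.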

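Read the lines into a list, then step through that list one record at a time (the code uses a step of 3) and add a value to the unique list only if it is not already there.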

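with open('scores.txt') as infile:
    lines = [line.strip() for line in infile]
unique = []
# A step of 3 assumes each record is a tag line, a path line, and a blank line.
for i in range(0, len(lines), 3):
    if lines[i] not in unique:
        unique.append(lines[i])
        if i + 1 < len(lines):  # guard against a tag on the very last line
            unique.append(lines[i + 1])

print(unique)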
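
A fixed step breaks as soon as a record has a missing or extra line. A sketch that instead splits the text on blank lines and uses the first line of each record (the `<...>` tag) as the key: `dedupe_records` is a hypothetical name, the `path NN` strings are placeholders for the real ENU lines, and keeping the first occurrence is an assumption, since the question does not say which copy should survive.

```python
def dedupe_records(text):
    """Return text with only the first record kept for each tag line."""
    seen = set()
    kept = []
    # Records are assumed to be separated by one or more blank lines.
    for block in text.split("\n\n"):
        record = block.strip("\n")
        if not record:
            continue
        tag = record.split("\n", 1)[0].strip()   # first line of the record
        if tag in seen:
            continue                             # duplicate tag: drop the whole record
        seen.add(tag)
        kept.append(record)
    return "\n\n".join(kept) + ("\n" if kept else "")


sample = (
    "<line 4>\npath 13\n\n"
    "<line 2>\npath 14\n\n"
    "<line 2>\npath 15\n\n"
    "<line 1>\npath 18\n"
)
print(dedupe_records(sample))
```

Because whole records are dropped, no stray blank lines are left behind where a duplicate used to be.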
