Looping through each line in a text file to extract a unique list

Posted 2024-10-05 19:35:29


I have been trying to extract a unique list of account names from a text file, but I can't seem to manage it because I know nothing about regex.

Say we have the following example:

[Friday 17/10/2014 @ 07:30:55] The user user01 | account01 | namename1 has been granted access.
[Friday 17/10/2014 @ 07:30:57] The user user two | account_two | name2 has been granted access.
[Friday 17/10/2014 @ 07:30:59] The user user_three | account_ | name3 here3 has been granted access.
[Friday 17/10/2014 @ 07:31:41] The user user01 | account01 | namename1 has been granted access.

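Basically, I want it to find the account information between the two pipes (|), strip off the pipes and whitespace, and, after going through the file and removing any duplicates, write a plain list to a text file containing the following: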

account01
account_two
account_

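One check it must perform is to make sure it only takes the account information when the line contains the phrase "has been granted access.", because the data may look like this: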

[Friday 17/10/2014 @ 07:30:55] The user user01 | account01 | namename1 has been granted access.
[Friday 17/10/2014 @ 07:30:57] The user user two | account_two | name2 has been granted access.
[Friday 17/10/2014 @ 07:30:59] Details Granted | user two | access number 01239
[Friday 17/10/2014 @ 07:30:59] The user user_three | account_ | name3 here3 has been granted access.
[Friday 17/10/2014 @ 07:31:41] The user user01 | account01 | namename1 has been granted access.

In that example, I don't want it to pick up the account information "user two" from line 3.

Can anyone help me with some example code? It would be greatly appreciated.


3 answers
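def get_granted_accounts(filename):
    """Return the set of account names from lines that grant access."""
    with open(filename) as f:
        return {
            line.split('|')[1].strip()  # field between the first and second pipe
            for line in f  # iterate lazily instead of readlines()
            if "has been granted access" in line
        }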

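Things to watch out for with this code:

  • Pipes must not appear in the first or second field (no quoted or escaped pipes)
  • The phrase "has been granted access" should appear only where expected (for example, not as part of an account name)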
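Since the question mentions regex, here is a minimal regex-based sketch that tightens the second caveat: it only matches lines of the assumed form "[timestamp] The user <user> | <account> | <name> has been granted access." and requires the phrase at the very end of the line, so a "Details Granted" line whose middle field happens to contain the phrase is still skipped. Like the split version, it cannot handle pipes inside the fields. The pattern and the name granted_accounts_regex are illustrative assumptions based only on the sample lines in the question.

```python
import re

# Assumed layout: "[timestamp] The user <user> | <account> | <name> has been granted access."
# Group 1 captures the account field between the two pipes, minus surrounding whitespace.
GRANTED = re.compile(
    r"^\[[^\]]*\] The user [^|]*\|\s*([^|]*?)\s*\|[^|]*has been granted access\.\s*$"
)


def granted_accounts_regex(lines):
    """Return unique account names in first-seen order (hypothetical helper)."""
    seen = {}
    for line in lines:
        match = GRANTED.match(line)
        if match:
            # A dict keeps insertion order (Python 3.7+), so duplicates are dropped in order
            seen.setdefault(match.group(1), None)
    return list(seen)


if __name__ == "__main__":
    sample = [
        "[Friday 17/10/2014 @ 07:30:55] The user user01 | account01 | namename1 has been granted access.",
        "[Friday 17/10/2014 @ 07:30:57] The user user two | account_two | name2 has been granted access.",
        "[Friday 17/10/2014 @ 07:30:59] Details Granted | user two | access number 01239",
        "[Friday 17/10/2014 @ 07:30:59] The user user_three | account_ | name3 here3 has been granted access.",
        "[Friday 17/10/2014 @ 07:31:41] The user user01 | account01 | namename1 has been granted access.",
    ]
    print(granted_accounts_regex(sample))  # ['account01', 'account_two', 'account_']
```

To use it on a file, pass the open file object instead of the list, e.g. granted_accounts_regex(open('file.txt')); the trailing newline on each line is absorbed by the \s*$ at the end of the pattern.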
>>> granted_accounts = [line.split('|')[1].strip() for line in open('file.txt') if 'has been granted access' in line]
>>> print(granted_accounts)
['account01', 'account_two', 'account_', 'account01']

If you want to run it from the command line, just put those two lines in a .py file (search.py) with a shebang line, like this:

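#!/usr/bin/env python
granted_accounts = [line.split('|')[1].strip() for line in open('file.txt') if 'has been granted access' in line]
print(granted_accounts)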

Then run it like this:

$ python search.py

Or:

$ chmod +x search.py
$ ./search.py

If you have a lot of accounts, you may want to print each account only once, each on its own line:

>>> granted_accounts = [line.split('|')[1].strip() for line in open('file.txt') if 'has been granted access' in line]
>>> print('\n'.join(sorted(set(granted_accounts))))
account01
account_
account_two

I completely overlooked split... but here is a fully working version based on split:

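It splits each line on |, takes the second part, and strips the surrounding spaces. It then builds accountlist by appending an account only if it is not already in the list, which removes duplicates.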

Last but not least, it writes all the accounts to output.txt, one per line.

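accountlist = []
with open('mydatafile.txt', 'r') as infile:
    for line in infile:
        # Only consider lines that actually grant access
        if "has been granted access." in line:
            # The account name sits between the first and second pipe
            account = line.strip().split('|')[1].strip(" ")
            if account not in accountlist:  # skip duplicates, keep first-seen order
                accountlist.append(account)
print(accountlist)  # Python 3 print function

# Write one account per line to output.txt
with open('output.txt', 'w') as outfile:
    for account in accountlist:
        outfile.write("%s\n" % account)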
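The "not in accountlist" check scans the whole list for every matching line, so it slows down as the number of distinct accounts grows. A minimal alternative sketch, assuming Python 3.7 or later (where dicts keep insertion order), gets the same first-seen ordering with dict.fromkeys; the helper names unique_accounts and write_accounts are hypothetical.

```python
def unique_accounts(lines):
    """Return account names from access-granted lines, first-seen order, no duplicates."""
    accounts = (
        line.split('|')[1].strip()  # field between the first and second pipe
        for line in lines
        if "has been granted access." in line
    )
    # dict.fromkeys keeps only the first occurrence of each key, in order (Python 3.7+)
    return list(dict.fromkeys(accounts))


def write_accounts(src='mydatafile.txt', dst='output.txt'):
    """Read src, write one unique account per line to dst, and return the list."""
    with open(src) as infile:
        accountlist = unique_accounts(infile)
    with open(dst, 'w') as outfile:
        outfile.writelines(account + "\n" for account in accountlist)
    return accountlist
```

Calling write_accounts() with its default arguments reads mydatafile.txt and writes output.txt, like the answer above, while each duplicate check becomes a hash lookup instead of a list scan.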
