Getting the required data from a table based on a key

Posted 2024-10-02 08:26:17


I have a dataset in a file, made up of three columns (IP address, port, domain name), like this:

172.56.146.16 61981 r5---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.13 64576 r2---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.46 56483 ssl.gstatic.com
172.56.146.14 57054 r3---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.14 58157 r3---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.18 62666 ssl.gstatic.com
172.56.146.15 55682 r4---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.16 52234 r5---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.59 57106 ssl.gstatic.com
172.56.146.18 58897 ssl.gstatic.com
172.56.146.16 52258 r5---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.15 55694 r4---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.32 64281 ssl.gstatic.com
172.56.146.39 60581 ssl.gstatic.com
172.56.146.13 57137 r2---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.14 64763 r3---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.13 57135 r2---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.15 51318 r4---sn-uhvcpax0n5-x5ue.googlevideo.com

I also have a key set in a second file, containing only the IP address and port:

172.56.146.15 49333
172.56.146.16 52233
172.56.146.46 56483
172.56.146.14 58928
172.56.146.16 61981
172.56.146.13 64576
172.56.146.14 58157
172.56.146.18 62666
172.56.146.15 55682
172.56.146.14 57054

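Now I want to take each line of the key set in turn, use it as input against the dataset, and get back the domain name from the dataset for that key (the IP address and port taken from the key set).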

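For example, for 172.56.146.15 49333 I should get the result "domain not found", for 172.56.146.46 56483 I should get ssl.gstatic.com, and so on. Can someone tell me how to do this with shell commands or a script? The output should look like this (one line per key, in the same order as the key set):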

domain not found
ssl.gstatic.com
r5---sn-uhvcpax0n5-x5ue.googlevideo.com

3 answers

With GNU bash:

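#!/bin/bash

while read -r ip port _; do
  # -F matches the key literally (dots are not wildcards); the trailing
  # space keeps a port such as 5648 from matching 56483.
  grep -F -- "$ip $port " dataset || echo "$ip $port domain not found"
done < keys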

Output:

172.56.146.15 49333 domain not found
172.56.146.16 52233 domain not found
172.56.146.46 56483 ssl.gstatic.com
172.56.146.14 58928 domain not found
172.56.146.16 61981 r5---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.13 64576 r2---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.14 58157 r3---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.18 62666 ssl.gstatic.com
172.56.146.15 55682 r4---sn-uhvcpax0n5-x5ue.googlevideo.com
172.56.146.14 57054 r3---sn-uhvcpax0n5-x5ue.googlevideo.com
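
The loop above prints the whole matching line, and grep matches substrings rather than whole fields. If only the domain is wanted, as in the question, one possible sketch compares the first two fields exactly with awk. The function name `lookup_domain` and its argument order are assumptions made for illustration, not part of the answer.

```shell
#!/bin/bash
# Hypothetical helper (name and arguments are assumptions):
#   $1 = IP address, $2 = port, $3 = dataset file
# Prints the domain of the first row whose IP and port both match exactly,
# or "domain not found" if no row matches.
lookup_domain() {
    awk -v ip="$1" -v port="$2" '
        # Exact field comparison, so 72.56.146.46 does not match 172.56.146.46
        $1 == ip && $2 == port { print $3; found = 1; exit }
        # END still runs after exit; only report a miss if nothing was printed
        END { if (!found) print "domain not found" }
    ' "$3"
}
```

It could then be called once per key, for example `while read -r ip port; do lookup_domain "$ip" "$port" dataset; done < keys`. This still rescans the dataset for every key, so it is only reasonable for small files.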

Use this:

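#!/bin/bash
# Arrays and [[ ]] are bash features, so the shebang must be bash, not sh.

while IFS='' read -r line || [[ -n "$line" ]]; do
    # -F matches the key literally; the trailing space keeps port 5648
    # from matching 56483.
    if grep -q -s -F -- "$line " table.txt; then
        # Split the matching row into words; index 2 is the domain.
        result=($(grep -s -F -- "$line " table.txt))
        echo "${result[2]}"
    else
        echo "domain not found"
    fi
done < "$1"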

Run:

./myscript.sh key.txt

Result:

domain not found
domain not found
ssl.gstatic.com
domain not found
r5---sn-uhvcpax0n5-x5ue.googlevideo.com
r2---sn-uhvcpax0n5-x5ue.googlevideo.com
r3---sn-uhvcpax0n5-x5ue.googlevideo.com
ssl.gstatic.com
r4---sn-uhvcpax0n5-x5ue.googlevideo.com
r3---sn-uhvcpax0n5-x5ue.googlevideo.com
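
The script above runs grep twice for every key that is found. A minimal sketch of a variant that runs it once and captures the output is shown here. The function name `domain_for_key` is hypothetical, and the sketch assumes the key line has no trailing whitespace and that the domain is the last space-separated field of the row.

```shell
#!/bin/bash
# Hypothetical helper (name and arguments are assumptions):
#   $1 = key line in the form "ip port", $2 = table file
domain_for_key() {
    local key=$1 table=$2 match
    # -F: fixed-string match; -m 1: stop after the first matching row;
    # -s: stay quiet about unreadable files. The trailing space prevents a
    # port such as 5648 from matching 56483.
    if match=$(grep -s -F -m 1 -- "$key " "$table"); then
        # Remove everything up to the last space, leaving the domain.
        printf '%s\n' "${match##* }"
    else
        printf '%s\n' "domain not found"
    fi
}
```

A loop such as `while IFS= read -r line; do domain_for_key "$line" table.txt; done < key.txt` would print one result per key, in key order.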

Here are two solutions. Both read the data file into an array, then look up the array value for each line of the key file.

  1. "Pure" Bash (builtins only):

    #!/bin/bash
    
    # Declare associative array
    declare -A datafile
    
    # Read data file into associative array
    while read -r ip_addr port domain; do
        datafile["$ip_addr $port"]="$domain"
    done < "$1"
    
    # Look up value for each key from key file in array
    while IFS= read -r key; do
        # Use parameter expansion to print "not found" if key is not in array
        printf "%s\n" "${datafile[$key]:-domain not found}"
    done < "$2"
    

    It is called like this:

    ./SO.sh data keys
    

    where SO.sh is the name of the script file, data is the data file, and keys is the file with the keys.

  2. Awk:

    #!/usr/bin/awk -f
    
    # Process first file, read into array
    NR == FNR {
        datafile[$1, $2] = $3
        next
    }
    
    # Look up value for key
    {
        if (datafile[$1, $2] == "")
            print "domain not found"
        else
            print datafile[$1, $2]
    }
    

    Assuming it is stored in SO.awk, call it like this:

    ./SO.awk data keys
    

For large files, the awk solution will be orders of magnitude faster.
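
As a possible refinement of the awk approach, the membership test can use awk's `in` operator instead of comparing against the empty string. That way a lookup does not create an empty array entry for a missing key, and a row whose domain column happens to be empty is still treated as found. The sketch below wraps the awk program in a shell function; the name `lookup_all` is an assumption, and like the script above it relies on the first file being non-empty for the `NR == FNR` test to work.

```shell
#!/bin/bash
# Hypothetical wrapper (name and arguments are assumptions):
#   $1 = data file (ip port domain), $2 = key file (ip port)
lookup_all() {
    awk '
        # First file: remember the domain for each (ip, port) pair.
        NR == FNR {
            domain[$1, $2] = $3
            next
        }
        # Second file: "in" checks membership without creating an entry.
        {
            if (($1, $2) in domain)
                print domain[$1, $2]
            else
                print "domain not found"
        }
    ' "$1" "$2"
}
```

Called as `lookup_all data keys`, it prints one line per key in the order of the key file.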
