Append a column from one TSV file to another TSV file (Unix)

2 Answers

User

Answer 1 · edited 2024-06-26 14:17:31

Use join and perform a full outer join:

>cat test.txt test2.txt
SampleID  RawReads
1         18
2         15
5         21
7         7
SampleID     ReadsPost
1            yes
3            no
4            yes
5            yes

> join -a1 -a2 test.txt test2.txt
SampleID RawReads ReadsPost
1 18 yes
2 15
3 no
4 yes
5 21 yes
7 7

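Note that the -a option prints the unpaired lines from the given file. For a full outer join, print the unpaired lines from both files (-a1 -a2), as in the example. Keep in mind that join expects both inputs to be sorted on the join field.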
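To make the join logic above concrete, here is a minimal Python sketch of the same full outer join. It assumes whitespace-separated rows, a header line first, and unique keys in the first column. The function name `full_outer_join` and the `NA` placeholder are illustrative choices and not part of join itself; the placeholder only makes it visible which column is missing on an unpaired row.

```python
def full_outer_join(rows1, rows2, missing="NA"):
    """Join two tables (lists of rows) on column 0, keeping unpaired rows.

    Assumes each table starts with a header row and has unique keys.
    """
    header1, *body1 = rows1
    header2, *body2 = rows2
    width1 = len(header1) - 1  # number of non-key columns in table 1
    width2 = len(header2) - 1  # number of non-key columns in table 2
    left = {row[0]: row[1:] for row in body1}
    right = {row[0]: row[1:] for row in body2}

    joined = [header1 + header2[1:]]
    # Walk the union of keys in text order; fill absent sides with the placeholder
    for key in sorted(left.keys() | right.keys()):
        joined.append(
            [key]
            + left.get(key, [missing] * width1)
            + right.get(key, [missing] * width2)
        )
    return joined


# Same data as test.txt and test2.txt above
test = [line.split() for line in
        "SampleID RawReads\n1 18\n2 15\n5 21\n7 7".splitlines()]
test2 = [line.split() for line in
         "SampleID ReadsPost\n1 yes\n3 no\n4 yes\n5 yes".splitlines()]

result = full_outer_join(test, test2)
for row in result:
    print("\t".join(row))
```

Unlike the plain join output, a row such as sample 3 comes out with an explicit placeholder in the RawReads column, so the columns stay aligned.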

User

Answer 2 · edited 2024-06-26 14:17:31

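When loading the data with pd.read_csv, you may need to set sep='\t' for tab-separated files. Once both dataframes are loaded, you can use pd.merge or pd.concat. For a good reference, see "Merge, join, and concatenate" in the pandas documentation.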

Suppose your two TSV files look like this:

File 1:

SampleID     RawReads
1            18
2            15      
5            21    
7            7

File 2:

SampleID     ReadsPost
1            yes
3            no
4            yes
5            yes

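Using merge

Merge performs database-style joins on two dataframes. In this example, the SampleID columns of the two dataframes do not contain the same values. To make sure we keep all the data from both frames, we use an outer join. If we only need the rows of one of them, we can use a left or right join, depending on which frame we want to keep. Here is an example that keeps everything.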

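import pandas as pd

# file1 and file2 are the paths to the two TSV files
df1 = pd.read_csv(file1, sep='\t')
df2 = pd.read_csv(file2, sep='\t')
# Full outer join on SampleID: keep rows from both frames
merge_df = pd.merge(df1, df2, how='outer', on='SampleID')
print(merge_df)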
   SampleID  RawReads ReadsPost
0         1      18.0       yes
1         2      15.0       NaN
2         5      21.0       yes
3         7       7.0       NaN
4         3       NaN        no
5         4       NaN       yes
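The left and right joins mentioned above can be sketched the same way. This is a minimal example, assuming the two example tables are supplied inline through io.StringIO (a stand-in for real file paths, not something from the original answer).

```python
import io

import pandas as pd

# Inline copies of the two example files, so the sketch runs without files on disk
file1 = io.StringIO("SampleID\tRawReads\n1\t18\n2\t15\n5\t21\n7\t7\n")
file2 = io.StringIO("SampleID\tReadsPost\n1\tyes\n3\tno\n4\tyes\n5\tyes\n")

df1 = pd.read_csv(file1, sep="\t")
df2 = pd.read_csv(file2, sep="\t")

# Left join: keep every SampleID from file 1, add ReadsPost where available
left_df = pd.merge(df1, df2, how="left", on="SampleID")

# Right join: keep every SampleID from file 2, add RawReads where available
right_df = pd.merge(df1, df2, how="right", on="SampleID")

print(left_df)
print(right_df)
```

The left join is probably the closest match to "append a column to file 1", since it keeps exactly the rows of file 1 and leaves ReadsPost empty (NaN) for samples that file 2 does not list.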

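Using concat

Concat extends dataframes along either the row axis or the column axis. Suppose SampleID is your index and you just want to attach the values from file2 to file1 along the column axis. For example: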

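import pandas as pd

# Use SampleID as the index so concat aligns rows on it
df1 = pd.read_csv(file1, sep='\t', index_col='SampleID')
df2 = pd.read_csv(file2, sep='\t', index_col='SampleID')
# axis=1 places the columns of df2 next to those of df1
concat_df = pd.concat([df1, df2], axis=1)
print(concat_df)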
          RawReads ReadsPost
SampleID
1             18.0       yes
2             15.0       NaN
3              NaN        no
4              NaN       yes
5             21.0       yes
7              7.0       NaN
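Since the goal is a TSV file with the extra column, the combined frame still has to be written back out. The following is a rough sketch under a few assumptions: the inputs are given inline, the output goes to an in-memory buffer standing in for a real path, and `NA` is an arbitrary marker for missing cells. The explicit sort and the nullable integer conversion are optional additions, not part of the original answer.

```python
import io

import pandas as pd

# Inline copies of the two example files (stand-ins for real paths)
file1 = io.StringIO("SampleID\tRawReads\n1\t18\n2\t15\n5\t21\n7\t7\n")
file2 = io.StringIO("SampleID\tReadsPost\n1\tyes\n3\tno\n4\tyes\n5\tyes\n")

df1 = pd.read_csv(file1, sep="\t", index_col="SampleID")
df2 = pd.read_csv(file2, sep="\t", index_col="SampleID")

# Sort explicitly so the row order does not depend on the pandas version
concat_df = pd.concat([df1, df2], axis=1).sort_index()

# Nullable integer dtype writes 18 instead of 18.0 despite the missing values
concat_df["RawReads"] = concat_df["RawReads"].astype("Int64")

# Write tab-separated output; replace the buffer with a file path in real use
out = io.StringIO()
concat_df.to_csv(out, sep="\t", na_rep="NA")
tsv_text = out.getvalue()
print(tsv_text)
```

The same to_csv call works on the merged frame from the previous section; pass index=False there, because SampleID is an ordinary column in that frame rather than the index.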

As I said, take a look at the pandas documentation. It is a powerful library and a great entry point into working with data in Python.
