Finding the intersection in a list whose elements are strings

Posted 2024-06-26 14:23:18

I have the list below, in which each inner list contains two items, both stored as strings.

neighbor_list = [['Mo0',
  '[PeriodicSite: S (1.5952, -0.9210, 37.6032) [0.3333, -0.3333, 0.9458], PeriodicSite: S (0.0000, 1.8419, 37.6032) [0.3333, 0.6667, 0.9458], PeriodicSite: S (3.1903, 1.8419, 37.6032) [1.3333, 0.6667, 0.9458], PeriodicSite: S (1.5952, -0.9210, 34.4734) [0.3333, -0.3333, 0.8671], PeriodicSite: S (0.0000, 1.8419, 34.4734) [0.3333, 0.6667, 0.8671], PeriodicSite: S (3.1903, 1.8419, 34.4734) [1.3333, 0.6667, 0.8671]]'],
 ['Mo1',
  '[PeriodicSite: S (1.5952, -0.9210, 12.7242) [0.3333, -0.3333, 0.3200], PeriodicSite: S (0.0000, 1.8419, 12.7242) [0.3333, 0.6667, 0.3200], PeriodicSite: S (3.1903, 1.8419, 12.7242) [1.3333, 0.6667, 0.3200], PeriodicSite: S (1.5952, -0.9210, 9.5944) [0.3333, -0.3333, 0.2413], PeriodicSite: S (0.0000, 1.8419, 9.5944) [0.3333, 0.6667, 0.2413], PeriodicSite: S (3.1903, 1.8419, 9.5944) [1.3333, 0.6667, 0.2413]]'],
 ['Mo2',
  '[PeriodicSite: S (-1.5952, 0.9210, 30.1636) [-0.3333, 0.3333, 0.7587], PeriodicSite: S (1.5952, 0.9210, 30.1636) [0.6667, 0.3333, 0.7587], PeriodicSite: S (0.0000, 3.6839, 30.1636) [0.6667, 1.3333, 0.7587], PeriodicSite: S (-1.5952, 0.9210, 27.0339) [-0.3333, 0.3333, 0.6800], PeriodicSite: S (1.5952, 0.9210, 27.0339) [0.6667, 0.3333, 0.6800], PeriodicSite: S (0.0000, 3.6839, 27.0339) [0.6667, 1.3333, 0.6800]]'],
 ['Mo3',
  '[PeriodicSite: S (-1.5952, 0.9210, 5.2846) [-0.3333, 0.3333, 0.1329], PeriodicSite: S (1.5952, 0.9210, 5.2846) [0.6667, 0.3333, 0.1329], PeriodicSite: S (0.0000, 3.6839, 5.2846) [0.6667, 1.3333, 0.1329], PeriodicSite: S (-1.5952, 0.9210, 2.1548) [-0.3333, 0.3333, 0.0542], PeriodicSite: S (1.5952, 0.9210, 2.1548) [0.6667, 0.3333, 0.0542], PeriodicSite: S (0.0000, 3.6839, 2.1548) [0.6667, 1.3333, 0.0542]]']]

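The first item of each inner list (e.g. Mo0) is the center, and all the S entries in the second item are its surroundings. First, I want to print each central atom combined with its surroundings, e.g. Mo0S6, Mo1S6, Mo2S6, and so on. Then I want to use the coordinates to find out whether Mo0, Mo1, Mo2, and Mo3 have any S in common. For example, the coordinates of the S atoms near Mo0 are: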

S (1.5952, -0.9210, 37.6032) 
S (1.5952, -0.9210, 12.7242) 

and so on.

I can get the center and its surroundings like this:

for i in range(len(neighbor_list)):
    center = neighbor_list[i][0]
    surroundings = neighbor_list[i][1] 

How can I count the surrounding atoms for each central atom and find the intersections between the environments?

The end goal is a matrix in the following format:

      Mo0S6  Mo1S6  Mo2S6  Mo3S6
Mo0S6    0.0    0.0    0.0    0.0
Mo1S6    0.0    0.0    0.0    0.0
Mo2S6    0.0    0.0    0.0    0.0
Mo3S6    0.0    0.0    0.0    0.0

All elements of the DataFrame are 0 because the environments in this list have no elements in common.

Can anyone help me? Thanks in advance.


2 answers

You can clean up the data with ast.literal_eval and a regex:

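import ast
import re

import pandas as pd

# Pull each parenthesized "(x, y, z)" out of the string and turn it into a tuple of floats.
surrounding = [[ast.literal_eval(m) for m in re.findall(r'\([ ,.\d-]+\)', item[1])] for item in neighbor_list]
# Label each center with its number of neighbors, e.g. 'Mo0S6'.
centers = ['{0}S{1}'.format(item[0], len(s)) for item, s in zip(neighbor_list, surrounding)]

data = dict(zip(centers, surrounding))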

This gives:

{'Mo0S6': [(1.5952, -0.921, 37.6032), (0.0, 1.8419, 37.6032), (3.1903, 1.8419, 37.6032), (1.5952, -0.921, 34.4734), (0.0, 1.8419, 34.4734), (3.1903, 1.8419, 34.4734)],
'Mo1S6': [(1.5952, -0.921, 12.7242), (0.0, 1.8419, 12.7242), (3.1903, 1.8419, 12.7242), (1.5952, -0.921, 9.5944), (0.0, 1.8419, 9.5944), (3.1903, 1.8419, 9.5944)],
'Mo2S6': [(-1.5952, 0.921, 30.1636), (1.5952, 0.921, 30.1636), (0.0, 3.6839, 30.1636), (-1.5952, 0.921, 27.0339), (1.5952, 0.921, 27.0339), (0.0, 3.6839, 27.0339)],
'Mo3S6': [(-1.5952, 0.921, 5.2846), (1.5952, 0.921, 5.2846), (0.0, 3.6839, 5.2846), (-1.5952, 0.921, 2.1548), (1.5952, 0.921, 2.1548), (0.0, 3.6839, 2.1548)]}
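To see why the pattern keeps only the Cartesian coordinates, here is a minimal sketch on a shortened sample string (two of the six sites of Mo0 from the question; the variable names `sample`, `matches`, and `coords` are only illustrative). The regex matches parenthesized groups, so the fractional coordinates in square brackets are skipped, and ast.literal_eval turns each matched text into a tuple of floats.

```python
import ast
import re

# Shortened sample: the first two S sites of Mo0 from the question.
sample = ('[PeriodicSite: S (1.5952, -0.9210, 37.6032) [0.3333, -0.3333, 0.9458], '
          'PeriodicSite: S (0.0000, 1.8419, 37.6032) [0.3333, 0.6667, 0.9458]]')

# Only "(...)" groups match; the "[...]" fractional coordinates do not.
matches = re.findall(r'\([ ,.\d-]+\)', sample)

# Each match is a valid Python tuple literal, so literal_eval can parse it safely.
coords = [ast.literal_eval(m) for m in matches]

print(matches)
print(coords)
```

Because the values are parsed as floats, trailing zeros such as in -0.9210 disappear in the result.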

You can then build the DataFrame directly with df = pd.DataFrame(data):

                       Mo0S6                      Mo1S6  \
0  (1.5952, -0.921, 37.6032)  (1.5952, -0.921, 12.7242)   
1     (0.0, 1.8419, 37.6032)     (0.0, 1.8419, 12.7242)   
2  (3.1903, 1.8419, 37.6032)  (3.1903, 1.8419, 12.7242)   
3  (1.5952, -0.921, 34.4734)   (1.5952, -0.921, 9.5944)   
4     (0.0, 1.8419, 34.4734)      (0.0, 1.8419, 9.5944)   
5  (3.1903, 1.8419, 34.4734)   (3.1903, 1.8419, 9.5944)   

                       Mo2S6                     Mo3S6  
0  (-1.5952, 0.921, 30.1636)  (-1.5952, 0.921, 5.2846)  
1   (1.5952, 0.921, 30.1636)   (1.5952, 0.921, 5.2846)  
2     (0.0, 3.6839, 30.1636)     (0.0, 3.6839, 5.2846)  
3  (-1.5952, 0.921, 27.0339)  (-1.5952, 0.921, 2.1548)  
4   (1.5952, 0.921, 27.0339)   (1.5952, 0.921, 2.1548)  
5     (0.0, 3.6839, 27.0339)     (0.0, 3.6839, 2.1548) 

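To find duplicates, we can simply use stack() and duplicated(keep=False), where keep=False ensures that every occurrence of a duplicate is returned together with its center: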

df.stack()[df.stack().duplicated(keep=False)]

This yields:

Series([], dtype: object)

You can confirm that this approach works by deliberately creating a duplicate in the sample data.
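For the pairwise matrix the question asks for, one possible sketch counts shared coordinates with set intersections. This is not part of the answer above: the function name `shared_site_matrix` and the toy dictionary are hypothetical, one coordinate in the toy data is deliberately duplicated to produce a non-zero entry, and the diagonal is left at 0.0 on the assumption that it should mirror the matrix shown in the question.

```python
from itertools import combinations


def shared_site_matrix(data):
    """Count coordinates shared by each pair of centers.

    `data` maps a label such as 'Mo0S6' to a list of (x, y, z) tuples.
    The diagonal stays at 0.0 (assumption, to mirror the question's matrix).
    """
    labels = list(data)
    matrix = {a: {b: 0.0 for b in labels} for a in labels}
    for a, b in combinations(labels, 2):
        # Exact tuple equality: fine for values parsed from identical text.
        shared = set(data[a]) & set(data[b])
        matrix[a][b] = matrix[b][a] = float(len(shared))
    return matrix


# Toy data: the second site of 'Mo1S2' is deliberately copied from 'Mo0S2'.
toy = {
    'Mo0S2': [(1.5952, -0.921, 37.6032), (0.0, 1.8419, 37.6032)],
    'Mo1S2': [(1.5952, -0.921, 12.7242), (0.0, 1.8419, 37.6032)],
    'Mo2S2': [(-1.5952, 0.921, 30.1636), (1.5952, 0.921, 30.1636)],
}

matrix = shared_site_matrix(toy)
for label, row in matrix.items():
    print(label, row)
```

Passing the nested dictionary to pd.DataFrame(matrix) should give the labeled table, since the DataFrame constructor accepts a dict of dicts. The comparison is exact, so coordinates that differ only by rounding noise would need to be rounded first.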

You can also simply parse the strings without importing anything:

for item in neighbor_list:
    center = item[0]
    surroundings = item[1].split("PeriodicSite: S ")

    # drop the leading "[" chunk and the trailing "]" of the last chunk
    surroundings = surroundings[1:]
    surroundings[-1] = surroundings[-1][0:-1]

    print("%sS%d" % (center, len(surroundings)))

    # replace brackets and commas with spaces, then split into number strings
    surroundings = [x.replace("(", " ").replace(")", " ").replace("[", " ").replace("]", " ").replace(",", " ") for x in surroundings]
    surroundings = [x.split() for x in surroundings]

    for S in surroundings:
        print("S (%s,%s,%s)" % (S[0], S[1], S[2]))

This gives:

Mo0S6
S (1.5952,-0.9210,37.6032)
S (0.0000,1.8419,37.6032)
S (3.1903,1.8419,37.6032)
S (1.5952,-0.9210,34.4734)
S (0.0000,1.8419,34.4734)
S (3.1903,1.8419,34.4734)
Mo1S6
S (1.5952,-0.9210,12.7242)
S (0.0000,1.8419,12.7242)
S (3.1903,1.8419,12.7242)
S (1.5952,-0.9210,9.5944)
S (0.0000,1.8419,9.5944)
S (3.1903,1.8419,9.5944)
Mo2S6
S (-1.5952,0.9210,30.1636)
S (1.5952,0.9210,30.1636)
S (0.0000,3.6839,30.1636)
S (-1.5952,0.9210,27.0339)
S (1.5952,0.9210,27.0339)
S (0.0000,3.6839,27.0339)
Mo3S6
S (-1.5952,0.9210,5.2846)
S (1.5952,0.9210,5.2846)
S (0.0000,3.6839,5.2846)
S (-1.5952,0.9210,2.1548)
S (1.5952,0.9210,2.1548)
S (0.0000,3.6839,2.1548)
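The loop above only prints the coordinates. To compare environments they have to be stored, for instance as sets of float tuples. The following is a minimal import-free sketch under that assumption; the dictionary name `environments` is hypothetical, and the sample keeps only two of the six S sites per center from the question, so the labels read Mo0S2 and Mo1S2.

```python
# Shortened sample: two of the six S sites for each center from the question.
neighbor_list = [
    ['Mo0', '[PeriodicSite: S (1.5952, -0.9210, 37.6032) [0.3333, -0.3333, 0.9458], '
            'PeriodicSite: S (0.0000, 1.8419, 37.6032) [0.3333, 0.6667, 0.9458]]'],
    ['Mo1', '[PeriodicSite: S (1.5952, -0.9210, 12.7242) [0.3333, -0.3333, 0.3200], '
            'PeriodicSite: S (0.0000, 1.8419, 12.7242) [0.3333, 0.6667, 0.3200]]'],
]

environments = {}  # hypothetical name: label -> set of (x, y, z) tuples
for center, text in neighbor_list:
    coords = set()
    # Every chunk after the first starts with "(x, y, z) [fractional coords]".
    for chunk in text.split("PeriodicSite: S ")[1:]:
        inner = chunk[chunk.index("(") + 1:chunk.index(")")]
        coords.add(tuple(float(v) for v in inner.split(",")))
    environments["%sS%d" % (center, len(coords))] = coords

# Set intersection gives the S sites that two centers share.
common = environments["Mo0S2"] & environments["Mo1S2"]
print(sorted(environments), common)
```

With sets, the sites shared by any two centers come from a single `&` operation, and an empty set means the two environments have nothing in common.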
