Duplicating the rows of a DataFrame, but with different IDs

Posted 2024-06-02 16:35:25


I have a pandas DataFrame that looks like this:

id    c1    c2    c3
100    2    7     4
100    3    4     1 
100    4    0     10
105    2    3     4
105    3    6     8
105    4    9     2
115    2    1     0
115    3    7     14
115    4    0     20

Now I want to repeat the rows of this DataFrame with new_id = id + 10. If that new_id already exists in the original DataFrame, then new_id = new_id (the repeated one) + 10.

Expected output:

id    c1    c2    c3
100    2    7     4
100    3    4     1 
100    4    0     10
105    2    3     4
105    3    6     8
105    4    9     2    
115    2    1     0
115    3    7     14
115    4    0     20
## Repeated data
110    2    7     4
110    3    4     1 
110    4    0     10
##Since 115 already exists it shall now be 125, if 125 exists it shall be 135
125    2    3     4
125    3    6     8
125    4    9     2 
.
.
.   

2 answers

You can first add 10 to the id column, then add another 10 wherever the new id already exists in the original column.

(
    df.assign(id=df.id.add(10).add(df.id.add(10).isin(df.id).mul(10)))
    .pipe(lambda x: pd.concat([df, x]))
)

    id  c1  c2  c3
0   100 2   7   4
1   100 3   4   1
2   100 4   0   10
3   105 2   3   4
4   105 3   6   8
5   105 4   9   2
6   115 2   1   0
7   115 3   7   14
8   115 4   0   20
0   110 2   7   4
1   110 3   4   1
2   110 4   0   10
3   125 2   3   4
4   125 3   6   8
5   125 4   9   2
6   125 2   1   0
7   125 3   7   14
8   125 4   0   20
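
The one-liner above can be unpacked into named steps, which also makes its limit visible. This is only a sketch, assuming the sample data from the question; the names `shifted`, `clash`, and `new_id` are illustrative and not part of the original answer. The existence check runs once, so an id that has been pushed a second time is not checked again, and 105 and 115 both end up at 125, as in the output above.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [100, 100, 100, 105, 105, 105, 115, 115, 115],
    "c1": [2, 3, 4, 2, 3, 4, 2, 3, 4],
    "c2": [7, 4, 0, 3, 6, 9, 1, 7, 0],
    "c3": [4, 1, 10, 4, 8, 2, 0, 14, 20],
})

shifted = df["id"].add(10)             # candidate new ids
clash = shifted.isin(df["id"])         # True where the candidate is an original id
new_id = shifted.add(clash.mul(10))    # add a further 10 only where there is a clash

# Original rows first, then the copies carrying the new ids
result = pd.concat([df, df.assign(id=new_id)])
```

If 125 were itself an original id, this version would still assign 125, because the check is not repeated.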

If I understood your question correctly, take a look at this.

import pandas as pd

d = {'id': [100, 100, 100, 105, 105, 105, 115, 115, 115],
     'c1': [2, 3, 4, 2, 3, 4, 2, 3, 4],
     'c2': [7, 4, 0, 3, 6, 9, 1, 7, 0],
     'c3': [4, 1, 10, 4, 8, 2, 0, 14, 20]}

df = pd.DataFrame(data=d)


def IDcheck(uniqueID, ID):
    # Increase the ID by 10, then keep adding 10 while the
    # candidate is still one of the original IDs
    ID += 10
    while ID in uniqueID:
        ID += 10
    return ID


def updateRow(df):
    # Unique values from the 'id' column of the original frame
    uniqueID = df['id'].unique().tolist()

    for ID in uniqueID:
        # Select all rows with the same 'id'; copy so the assignment
        # below does not trigger SettingWithCopyWarning
        temp = df.loc[df['id'] == ID].copy()

        # Get the new ID value
        new_id = IDcheck(uniqueID, ID)

        # Update the IDs in temp to the new_id value
        temp['id'] = new_id

        # Add the temporary frame to the original
        # (DataFrame.append was removed in pandas 2.0, so use pd.concat)
        df = pd.concat([df, temp], ignore_index=True)

    # Unsorted
    return df

    # Sorted
    # return df.sort_values(by=['id'])


updateRow(df)
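
The same idea can also be written without growing the frame inside a loop: build an old-to-new id mapping first, then apply it once with `Series.map`. This is a minimal sketch, assuming the sample data from the question. `next_free_id` and `duplicate_with_new_ids` are hypothetical names, and the `reserve_new` flag is an added assumption covering the case where a freshly generated id should also count as taken, since the question is ambiguous on that point.

```python
import pandas as pd


def next_free_id(taken, start, step=10):
    """Return the first of start+step, start+2*step, ... that is not in `taken`."""
    candidate = start + step
    while candidate in taken:
        candidate += step
    return candidate


def duplicate_with_new_ids(df, step=10, reserve_new=False):
    """Append a copy of df whose ids are shifted to values not already taken."""
    taken = set(df["id"])              # ids present in the original frame
    mapping = {}
    for old in df["id"].unique().tolist():
        new = next_free_id(taken, old, step)
        mapping[old] = new
        if reserve_new:
            taken.add(new)             # assumption: generated ids block later ones
    shifted = df.assign(id=df["id"].map(mapping))
    return pd.concat([df, shifted], ignore_index=True)


# Sample data copied from the question
df = pd.DataFrame({
    "id": [100, 100, 100, 105, 105, 105, 115, 115, 115],
    "c1": [2, 3, 4, 2, 3, 4, 2, 3, 4],
    "c2": [7, 4, 0, 3, 6, 9, 1, 7, 0],
    "c3": [4, 1, 10, 4, 8, 2, 0, 14, 20],
})

out = duplicate_with_new_ids(df)                       # checks original ids only
strict = duplicate_with_new_ids(df, reserve_new=True)  # also avoids generated ids
```

With the default, 105 and 115 both map to 125. With `reserve_new=True`, the second group to reach 125 is pushed on to 135, so every copied group gets a distinct id.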
