This is my DataFrame:
repository,sha1,url,refactorings
repo1,1,url1,"[{'type': 'Add Parameter', 'description': 'Add Parameter id : String in method public IssueFilter(repository Repository, id String) from class com.github.pockethub.android.core.issue.IssueFilter', 'leftSideLocations': [{'filePath': 'path2'}]]
repo2,2,url2,"[{'type': 'Add Parameter', 'description': 'Add Parameter id : String in method public IssueFilter(repository Repository, id String) from class com.github.pockethub.android.core.issue.IssueFilter', 'leftSideLocations': [{'filePath': 'path2'}]]
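From the refactorings column I want to extract the value of type (Add Parameter) and the class name that follows the word "class" (com.github.pockethub.android.core.issue.IssueFilter), put them in new columns, and then delete the refactorings column.
The desired frame is: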
repository,sha1,url,refac, class
repo1,1,url1,Add Parameter, com.github.pockethub.android.core.issue.IssueFilter
repo2,2,url2,Add Parameter, com.github.pockethub.android.core.issue.IssueFilter
This is my code:
df= pd.read_csv('data.csv', sep=',')
df1 = df[['sha1','url','refactorings']]
df1['refac']=df.refactorings.str.extract(r'[C|c]lass\s*([^ ]*)')
df1['class']=df.refactorings.str.extract(r"type':'\s*([^ ]*)")
del df1['refactorings']
a=df1.loc[~df1.sha1.duplicated(keep='last')]
list=[]
for elm in a['sha1']:
    list.append(elm)
dicts = {key: d for key, d in df.groupby('sha1')}
lenght=len(list)
for i in range(lenght):
    output1="output"+str(i)+".csv"
    a=dicts[list[i]]
    m=pd.DataFrame.from_dict(a)
    m.to_csv(output1, index=False, na_rep='NaN')
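It does not extract the refactoring type and the class correctly: for the refactoring it returns 'Add (with a leading quote and only the first word), and for the class it returns com.github.pockethub.android.core.issue.IssueFilter', (with a trailing quote and comma).
It also does not create any new columns, and it does not delete the refactorings column.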
Use a regexp with str.extract().
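A minimal sketch of that approach, assuming the refactorings strings look exactly like the two sample rows in the question. The inline DataFrame is a hypothetical stand-in for data.csv, and the two patterns are only one possible choice: the first captures the text between the quotes after 'type':, the second captures the dotted name after "from class" and stops before the closing quote.

```python
import pandas as pd

# Hypothetical stand-in for data.csv: the refactorings string is copied
# from the sample rows in the question.
refactoring = (
    "[{'type': 'Add Parameter', 'description': 'Add Parameter id : String in "
    "method public IssueFilter(repository Repository, id String) from class "
    "com.github.pockethub.android.core.issue.IssueFilter', "
    "'leftSideLocations': [{'filePath': 'path2'}]]"
)
df = pd.DataFrame({
    "repository": ["repo1", "repo2"],
    "sha1": [1, 2],
    "url": ["url1", "url2"],
    "refactorings": [refactoring, refactoring],
})

# Capture everything between the quotes that follow 'type':,
# so multi-word values such as "Add Parameter" are kept whole.
df["refac"] = df["refactorings"].str.extract(r"'type':\s*'([^']*)'", expand=False)

# Capture the dotted class name after "from class"; the character class
# excludes quotes and commas, so the match ends before the closing quote.
df["class"] = df["refactorings"].str.extract(r"from class\s+([\w.$]+)", expand=False)

# Drop the raw column once both values have been extracted.
df = df.drop(columns="refactorings")

print(df)
```

In the code from the question, the groupby that feeds the CSV files is called on df rather than on df1, which would explain why the output files contain neither new column and still contain refactorings. Grouping the processed frame instead should carry the new columns into the files.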