How to extract elements from file names and move them into different columns?

Posted 2024-09-26 22:50:33


I have file names that I converted into a list. The list contains the following elements:

list = ['15253_Variation.JPG',
 '15253_Variation_Tainted.JPG',
 '15253_Variation_O2_Saxophone.PNG',
 '15253_Variation_O2_Saxophone.jpg',
 '15253_Variation_O2_Saxophone_reference.png',
 '15253_Variation_Side1.JPG',
 '15253_Variation_Side2.JPG']

My goal is to extract the elements of each file name in this list and fill a DataFrame that should look like this:


Link to the Google Sheet showing the expected result: https://docs.google.com/spreadsheets/d/1kuX3M4RFCNWtNoE7Hm1ejxWMwF-Cs4p8SsjA3JzdidA/edit?usp=sharing

So far, this is the code I have:

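import pandas as pd

# one row per file name
Obj = pd.DataFrame(data=list, index=None, columns=['file'])
new_list = []
for i in Obj['file']:
    # splits on every underscore, so the resulting lists have different lengths
    new_list.append(i.split('_'))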

However, this doesn't leave blank cells for the parts a file name is missing, so it doesn't do what I need.
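To see why a plain split('_') can't produce aligned columns, here is a small sketch (the file list is copied from above and renamed files so it doesn't shadow the built-in list). The token lists come out with different lengths, and the same position holds different kinds of token: index 2 is Tainted in one row and Side1 in another.

```python
files = ['15253_Variation.JPG',
         '15253_Variation_Tainted.JPG',
         '15253_Variation_O2_Saxophone.PNG',
         '15253_Variation_O2_Saxophone.jpg',
         '15253_Variation_O2_Saxophone_reference.png',
         '15253_Variation_Side1.JPG',
         '15253_Variation_Side2.JPG']

# Drop the extension, then split the rest on underscores
tokens = [f.rsplit(".", 1)[0].split("_") for f in files]

print([len(t) for t in tokens])    # [2, 3, 4, 4, 5, 3, 3]
print(tokens[1][2], tokens[5][2])  # Tainted Side1
```

So splitting by position alone puts an issue and a side location into the same column; each token has to be classified by what it is, not where it sits.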

Many thanks for your help.


1 answer
User
Answer #1 · posted 2024-09-26 22:50:33

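Going by the comments: this is painful because the tokens in the file names don't follow a fully fixed format, so it takes quite a lot of conditional logic.

  1. Defined lists of known tokens: issues, plus two extra lists, mi (musical instruments) and oxygen (whatever that is)
  2. The first pass builds dicts, which is a standard input format for pandas
  3. Then, once there is a base DataFrame, work through the conditional logic
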
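The same three steps can also be written as a plain loop. This is a sketch of an alternative, not the code used in this answer: KNOWN, COLUMNS and parse are hypothetical names, the vocabulary covers only the tokens seen in the sample, and, mirroring the logic below, a leftover token starting with Side goes to Location2 and any other leftover token goes to Ref. Building the frame from a list of dicts means any column a file name doesn't mention simply becomes NaN.

```python
import pandas as pd

files = ['15253_Variation.JPG',
         '15253_Variation_Tainted.JPG',
         '15253_Variation_O2_Saxophone.PNG',
         '15253_Variation_O2_Saxophone.jpg',
         '15253_Variation_O2_Saxophone_reference.png',
         '15253_Variation_Side1.JPG',
         '15253_Variation_Side2.JPG']

# Hypothetical vocabulary: known token -> column it belongs in
KNOWN = {
    "Tainted": "Issues", "Perfect": "Issues",
    "O2": "Oxygen",
    "Saxophone": "Musical Instrument",
}
COLUMNS = ["Number", "Name", "Location2", "Issues", "Oxygen",
           "Location", "Musical Instrument", "Ref", "Extension"]


def parse(filename):
    """Map one file name to a dict of column -> token."""
    stem, ext = filename.rsplit(".", 1)
    number, name, *rest = stem.split("_")
    row = {"Number": number, "Name": name, "Extension": ext}
    for token in rest:
        if token in KNOWN:
            row[KNOWN[token]] = token
        elif token.startswith("Side"):
            row["Location2"] = token
        else:
            row["Ref"] = token  # anything unrecognised is treated as Ref
    return row


# Keys missing from a row's dict become NaN; "Location" is never filled, so it stays all-NaN
df = pd.DataFrame([parse(f) for f in files], columns=COLUMNS)
print(df.to_string(index=False))
```

Adding a new kind of token then means adding one entry to KNOWN rather than another conditional column.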
import numpy as np
import pandas as pd

# don't name it `list`: that shadows Python's built-in list()!
l = ['15253_Variation.JPG',
 '15253_Variation_Tainted.JPG',
 '15253_Variation_O2_Saxophone.PNG',
 '15253_Variation_O2_Saxophone.jpg',
 '15253_Variation_O2_Saxophone_reference.png',
 '15253_Variation_Side1.JPG',
 '15253_Variation_Side2.JPG']

issues = ["Tainted","Perfect"]
mi = ["Saxophone"]
oxygen = ["O2"]

# first pass using dict/list comprehensions
df = pd.DataFrame({"filename":{i:f.split(".")[0] for i,f in enumerate(l)},
             "Number":{i:f.split("_")[0] for i,f in enumerate(l)}, 
              "Name":{i:f.split("_")[1].split(".")[0] for i,f in enumerate(l)}, 
              "Location2":{},
              "Issues":{}, "Oxygen":{},"Location":{}, "Musical Instrument":{},
             "Ref":{}, 
              "Extension":{i:f.split(".")[1] for i,f in enumerate(l)}})

df = df.assign(**{
    # list of tokens for checking fixed lists against
    "Tokens":lambda dfa: dfa.apply(lambda s: s["filename"].split("_")[2:], axis=1),
    "Issues":lambda dfa: dfa["Tokens"].apply(lambda s: s[np.where(np.isin(s, issues))[0][0]] 
                                           if np.isin(s, issues).any() else np.nan),
    "Musical Instrument":lambda dfa: dfa["Tokens"].apply(lambda s: s[np.where(np.isin(s, mi))[0][0]] 
                                           if np.isin(s, mi).any() else np.nan),
    "Oxygen":lambda dfa: dfa["Tokens"].apply(lambda s: s[np.where(np.isin(s, oxygen))[0][0]] 
                                           if np.isin(s, oxygen).any() else np.nan),
}).assign(**{
    # let's do tokens again minus ones already placed
    "Tokens":lambda dfa: dfa.apply(lambda s: [t for t in s["filename"].split("_")[2:] 
                                             if not(t==s["Issues"] 
                                                    or t==s["Musical Instrument"]
                                                   or t==s["Oxygen"])], axis=1),
    "Location2":lambda dfa: dfa.apply(lambda s: s["Tokens"][0] if len(s["Tokens"])>0 
                                      and "Side" in s["Tokens"][0] else np.nan, axis=1),
    "Ref":lambda dfa: dfa.apply(lambda s: s["Tokens"][0] if len(s["Tokens"])>0 
                                      and "Side" not in s["Tokens"][0] else np.nan, axis=1)

}).drop(columns=["Tokens","filename"])

print(df.to_string(index=False))

Output

Number       Name Location2   Issues Oxygen  Location Musical Instrument        Ref Extension
 15253  Variation       NaN      NaN    NaN       NaN                NaN        NaN       JPG
 15253  Variation       NaN  Tainted    NaN       NaN                NaN        NaN       JPG
 15253  Variation       NaN      NaN     O2       NaN          Saxophone        NaN       PNG
 15253  Variation       NaN      NaN     O2       NaN          Saxophone        NaN       jpg
 15253  Variation       NaN      NaN     O2       NaN          Saxophone  reference       png
 15253  Variation     Side1      NaN    NaN       NaN                NaN        NaN       JPG
 15253  Variation     Side2      NaN    NaN       NaN                NaN        NaN       JPG
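If the optional parts always appear in the same order, a single regular expression with named groups plus pandas' Series.str.extract is another option. This is a sketch under that assumption: the order issue, oxygen, instrument, side, reference is inferred from the sample names only, and the alternatives inside each group (Tainted|Perfect, O2, Saxophone) would need extending for other data.

```python
import pandas as pd

files = ['15253_Variation.JPG',
         '15253_Variation_Tainted.JPG',
         '15253_Variation_O2_Saxophone.PNG',
         '15253_Variation_O2_Saxophone.jpg',
         '15253_Variation_O2_Saxophone_reference.png',
         '15253_Variation_Side1.JPG',
         '15253_Variation_Side2.JPG']

# Assumption: optional tokens, when present, appear in this fixed order
pattern = (
    r"^(?P<Number>\d+)"
    r"_(?P<Name>[^_.]+)"
    r"(?:_(?P<Issues>Tainted|Perfect))?"
    r"(?:_(?P<Oxygen>O2))?"
    r"(?:_(?P<Musical_Instrument>Saxophone))?"
    r"(?:_(?P<Location2>Side\d+))?"
    r"(?:_(?P<Ref>reference))?"
    r"\.(?P<Extension>[^.]+)$"
)

# Each named group becomes a column; groups that don't participate give NaN
df = pd.Series(files, name="file").str.extract(pattern)
# Group names can't contain spaces, so rename afterwards
df = df.rename(columns={"Musical_Instrument": "Musical Instrument"})
print(df.to_string(index=False))
```

A file name that doesn't fit the pattern comes back as an all-NaN row instead of raising an error, which makes unexpected names easy to spot.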
