Scan columns for a keyword and extract all values in that column (if the keyword is found in it)

```
import re
from os.path import join

from openpyxl import load_workbook

wb1 = load_workbook(join(dict_folder, file), data_only=True)
ws = wb1.active
for rowofcellobj in ws["C":"D"]:
    for cellobj in rowofcellobj:
        if cellobj.value == "Abbreviation":
            # extract all words in that column, but I don't know how to do this step
            # or whether my steps above are correct
            if cellobj.value is not None:
                data = re.findall(r"\b\w+_.*?\w+|[A-Z]*$\b", str(cellobj.value))
                # filtering out blank rows here:
                if data != []:
                    if data != [' ']:
                        # extracting words from square brackets in list:
                        fields = data[0]
                        print(fields)
```
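The step the loop above is missing (collecting the rest of the column once the keyword is found) could be sketched as follows. This assumes openpyxl 2.6 or later (for `values_only=True`) and treats `None` and whitespace-only cells as blank rows; the helper name `column_values_after_keyword` and the sample sheet are hypothetical. It keeps the question's column-wise scan of C:D but works on plain values instead of cell objects:

```python
from openpyxl import Workbook


def column_values_after_keyword(ws, keyword, min_col=3, max_col=4):
    """Scan columns min_col..max_col (C:D by default) for `keyword` and return
    the non-blank values below it in the first column that contains it."""
    for column in ws.iter_cols(min_col=min_col, max_col=max_col, values_only=True):
        if keyword in column:
            start = column.index(keyword) + 1  # first position below the keyword
            # Assumption: None and whitespace-only strings count as blank rows
            return [v for v in column[start:] if v is not None and str(v).strip()]
    return []  # keyword not found in any scanned column


# Hypothetical in-memory sheet; columns A and B are left empty
wb = Workbook()
ws = wb.active
for c_value, d_value in [("Header", "Other"), ("Abbreviation", "y"),
                         ("abc", "z"), (None, "w"), ("de_f", "v"), ("   ", "u")]:
    ws.append([None, None, c_value, d_value])

print(column_values_after_keyword(ws, "Abbreviation"))  # ['abc', 'de_f']
```

With a real file, `ws` would come from `load_workbook(...).active` as in the question.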

2 Answers

User

Answer 1 · edited 2024-10-02 12:28:29

A pandas solution, inspired by (link)

Sample file (`tst.xlsx`):

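```
import numpy as np
import pandas as pd

df = pd.read_excel('tst.xlsx', usecols="C:D")
df = df.fillna('')

# Find the first cell in columns C:D that holds the keyword
row_start = col_required = None
for row in range(df.shape[0]):
    for col in range(df.shape[1]):
        if df.iat[row, col] == 'Abbreviation':
            row_start, col_required = row, col
            break
    if row_start is not None:
        break  # also leave the outer loop once the keyword is found

if row_start is None:
    raise ValueError("'Abbreviation' not found in columns C:D")

# Keep the cells below the keyword in that column (positional indexing)
df = df.iloc[row_start + 1:, col_required]
# Empty or whitespace-only cells become NaN and are dropped
# ('\s+' would also have dropped values that merely contain a space)
df = df.replace(r'^\s*$', np.nan, regex=True).dropna()

print(df)
```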

Result:

```
9      sfsdfd
10    fgfg_ff
12        dfs
13        ddd
15      dd_hh
```
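The nested loops can also be replaced by a vectorized lookup. A sketch, assuming the data is already in a DataFrame (built in memory here instead of read from `tst.xlsx`); the helper name `series_below_keyword` and the sample data are hypothetical, and `r'^\s*$'` is an assumed definition of a blank cell:

```python
import numpy as np
import pandas as pd


def series_below_keyword(df, keyword):
    """Return the non-blank values below the first cell equal to `keyword`."""
    # Object dtype makes == compare cell by cell, even in numeric columns
    hits = np.argwhere(df.astype(object).to_numpy() == keyword)
    if len(hits) == 0:
        return pd.Series(dtype=object)  # keyword not found
    row, col = hits[0]  # argwhere is row-major, like the nested loops
    below = df.iloc[row + 1:, col]
    return below.replace(r'^\s*$', np.nan, regex=True).dropna()


df = pd.DataFrame({
    'C': ['x', 'Abbreviation', 'abc', '', 'de_f', '   '],
    'D': [1, 2, 3, 4, 5, 6],
})
print(series_below_keyword(df, 'Abbreviation').tolist())  # ['abc', 'de_f']
```

Because `np.argwhere` lists matches in row-major order, `hits[0]` is the same cell the nested loops stop at.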

User

Answer 2 · edited 2024-10-02 12:28:29

Question: Scan columns for a keyword and extract all values in that column

Define the starting row (here, 1) and the column index, which stays `None` until the keyword is found:
```
min_row = 1
min_col = None
```

Loop over all rows starting at `min_row`, incrementing `min_row` on each pass:

```
for row in ws.iter_rows(min_row=min_row, values_only=True):
    min_row += 1
```

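`try` to find the keyword in `row`; if it is found, `break`.
Because `index` is 0-based, add 1 to get the 1-based column index that openpyxl uses.
```
    try:
        min_col = row.index('Abbreviation') + 1
        break
    except ValueError:
        # 'Abbreviation' is not in this row
        continue
```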

If the keyword was found, loop over all following rows up to the end of the sheet.

Note: You have not defined an end condition!
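One possible end condition, assuming the list ends at the first empty cell (an assumption; the question does not say), is to wrap the values in `itertools.takewhile`. The helper name `until_first_blank` is hypothetical:

```python
from itertools import takewhile


def until_first_blank(values):
    """Return an iterator over values up to (not including) the first blank cell.

    Assumption: a cell is blank if it is None or contains only whitespace.
    """
    return takewhile(lambda v: v is not None and str(v).strip() != "", values)


column = ["sfsdfd", "fgfg_ff", None, "dfs", "ddd"]
print(list(until_first_blank(column)))  # ['sfsdfd', 'fgfg_ff']
```

Wrapping the `map(...)` expression of the final loop in `until_first_blank(...)` would stop it at the first empty cell; whether that is the right rule depends on how the sheet is laid out.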

```
if min_col is not None:
    # min_row now points at the row just below the keyword
    for value in map(lambda x: x[0],
                     ws.iter_rows(min_row=min_row,
                                  min_col=min_col,
                                  max_col=min_col,
                                  values_only=True)):
        print(value)
```
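Put together, the steps of this answer form one runnable function. A sketch assuming openpyxl 2.6 or later (for `values_only=True`); the function name `values_below_keyword` and the sample rows are hypothetical, and, like the answer, it does not skip blank cells:

```python
from openpyxl import Workbook


def values_below_keyword(ws, keyword, min_row=1):
    """Return the cells below the first `keyword` found at or after min_row,
    taken from the keyword's own column ([] if it is not found)."""
    min_col = None
    for row in ws.iter_rows(min_row=min_row, values_only=True):
        min_row += 1  # after the break, min_row is the row below the keyword
        try:
            min_col = row.index(keyword) + 1  # tuple.index is 0-based
            break
        except ValueError:  # keyword not in this row
            continue
    if min_col is None:
        return []
    rows = ws.iter_rows(min_row=min_row, min_col=min_col,
                        max_col=min_col, values_only=True)
    return [cells[0] for cells in rows]


# Hypothetical in-memory sheet standing in for the real workbook
wb = Workbook()
ws = wb.active
ws.append(["title", None, None])
ws.append([None, "x", "Abbreviation"])
ws.append([None, "y", "sfsdfd"])
ws.append([None, "z", "fgfg_ff"])
print(values_below_keyword(ws, "Abbreviation"))  # ['sfsdfd', 'fgfg_ff']
```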
