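I am trying to filter a DataFrame by two string names, but the strings can appear in any column (Series) of the DataFrame, and the number of columns varies. How can I filter every column of the DataFrame and then combine the results into a single DataFrame?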
import pandas as pd
import os

# Directories of Statements:
cdir = "Current Directory"
odir = "Output Directory"

# Find all CSVs in cdir:
excels = [filename for filename in os.listdir(cdir) if filename.endswith(".csv")]

# Define concat_csv Function:
def concat_csv(csv_file):
    df_csv = pd.read_csv(os.path.join(cdir, csv_file), header=None, index_col=None)  # Load CSV into dataframe
    df_final = pd.DataFrame()  # Create empty dataframe
    for col in df_csv:  # For all columns in the dataframe filter rows by string 1 or 2 then create new dataframe
        df_i = df_csv[(df_csv[col].str.contains("string1")==True) or (df_csv[col].str.contains("string2")==True)]  # Use row if string equals string 1 or 2
        df_final = df_final.concat(df_i, axis=1)  # Concat all rows that contain string 1 or 2 to a new dataframe
    # Send final dataframe to CSV in output directory:
    df_final.to_csv(os.path.join(odir, os.path.splitext(os.path.basename(csv_file))[0] + ".csv"), encoding='utf-8')

# Apply concat_csv to all CSVs in cdir:
for f in excels:
    concat_csv(os.path.join(cdir, f))

Here is the final code I used after Scott Boston's recommendation:
...
# Define concat_csv Function:
def concat_csv(csv_file):
    df_csv = pd.read_csv(os.path.join(cdir, csv_file), header=None, index_col=None)  # Load CSV into data frame
    df = df_csv[df_csv.isin(["string 1", "string2"]).any(axis=1)]  # Keep rows where any cell equals one of the strings (UGL data)
    df2 = df.dropna(axis=1, how="all")  # Drop columns with all empty cells
    try:
        df_final = df2.set_index([0])  # Set index to the first column (label 0)
    except KeyError:  # Column 0 was dropped above because it was entirely empty
        df_final = df2
    # Send final dataframe to CSV in output directory:
    df_final.to_csv(os.path.join(odir, os.path.splitext(os.path.basename(csv_file))[0] + ".csv"), encoding='utf-8')

# Apply concat_csv to all CSVs in cdir:
for f in excels:
    concat_csv(os.path.join(cdir, f))
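
Note that isin() matches whole cell values exactly, whereas the first attempt used str.contains(), which looks for substrings. If substring matching is what is needed, a minimal sketch along the following lines might work. The sample DataFrame, the pattern and the variable names are assumptions for illustration only, not part of the original code:

```python
import pandas as pd

# Hypothetical data with mixed types and missing values, using integer
# column labels as read_csv(header=None) would produce.
df = pd.DataFrame({
    0: ["id1", "id2", "id3"],
    1: ["has string1 inside", 5, None],
    2: [None, "string2", "other"],
})

# str.contains treats the pattern as a regular expression by default,
# so "a|b" means "a or b".
pattern = "string1|string2"

# Cast each column to str so .str.contains also works on numeric columns,
# then keep a row if any of its cells contains either substring.
mask = df.apply(lambda col: col.astype(str).str.contains(pattern)).any(axis=1)
filtered = df[mask]

print(filtered)
```

Because every column is tested in one pass and the per-column results are reduced with any(axis=1), there is no need to loop over columns or concatenate partial results.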
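
IIUC:
You have a DataFrame with N columns, and you want to check whether either of two strings appears in any of those columns, then build a new DataFrame from the matching rows.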
Generate generic data:
Find all records where "G" or "F" appears in any column:
Output:
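
The two steps above can be sketched roughly as follows. This is a minimal illustration, assuming a small hypothetical DataFrame of single-letter strings; the column names and values are made up and are not the answerer's data:

```python
import pandas as pd

# Hypothetical sample data: three columns of single-letter strings.
df = pd.DataFrame({
    "col0": ["A", "B", "G", "D"],
    "col1": ["E", "F", "C", "H"],
    "col2": ["I", "J", "K", "L"],
})

# isin() marks each cell that exactly equals "G" or "F";
# any(axis=1) keeps a row if at least one of its cells matched.
mask = df.isin(["G", "F"]).any(axis=1)
matches = df[mask]

print(matches)
```

Here the rows with index 1 (an "F" in col1) and index 2 (a "G" in col0) are kept, regardless of how many columns the DataFrame has.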