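A framework for data analysis and machine learning experiments
Detailed description of the exp-runner Python project
Experiment Runner (exp-runner)
exp-runner is a simple and extensible framework for data analysis and machine learning experiments in Python.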
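Structure
The framework consists of the following steps:
- Data loading
- Data transformation
- Model training and testing
- Performance evaluation
- Saving results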
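The five steps above can be pictured as a simple pipeline. The following is a minimal sketch under stated assumptions, not exp-runner's actual implementation: `run_experiment` and the toy stand-ins passed to it are hypothetical names introduced only for illustration, and a stage left unset is simply skipped.

```python
from statistics import mean
from typing import Callable, List, Optional, Sequence


def run_experiment(
    load: Callable[[], List[float]],
    transforms: Sequence[Callable[[List[float]], List[float]]] = (),
    model: Optional[Callable[[List[float]], float]] = None,
    metric: Optional[Callable[[float, List[float]], float]] = None,
    save: Optional[Callable[[dict], None]] = None,
) -> dict:
    """Run the five stages in order; a stage left empty or None is skipped."""
    data = load()                               # 1. data loading
    for transform in transforms:                # 2. data transformation
        data = transform(data)
    report = {"n_samples": len(data)}
    if model is not None:                       # 3. model training and testing
        report["prediction"] = model(data)
        if metric is not None:                  # 4. performance evaluation
            report["score"] = metric(report["prediction"], data)
    if save is not None:                        # 5. saving results
        save(report)
    return report


# Toy stand-ins for each stage (hypothetical, for illustration only).
saved = []
report = run_experiment(
    load=lambda: [1.0, 2.0, 3.0, 4.0],
    transforms=[lambda xs: [x - mean(xs) for x in xs]],      # centre the data
    model=mean,                                              # "model" predicts the mean
    metric=lambda pred, xs: max(abs(x - pred) for x in xs),  # largest deviation
    save=saved.append,
)
print(report)
```

Calling `run_experiment(load=...)` with nothing else set runs only the loading stage, which mirrors the idea that steps can be included or left out freely.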
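Key features
- Generality: supports a wide variety of models and methods, so it can be used for many tasks
(such as preprocessing, dimensionality reduction, classification,
regression, clustering, statistical testing, etc.)
- Flexibility: steps can easily be skipped and/or included
- Dynamic loading: modules are imported automatically at runtime, with no extra import lines required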
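Dynamic loading of this kind is commonly built on `importlib`. The sketch below shows one way a dotted class path plus an argument dictionary could be turned into an object at runtime. It is an assumption about the general technique, not a description of exp-runner's internals; `instantiate` is a hypothetical helper, and standard-library classes stand in for user-defined or scikit-learn classes.

```python
import importlib
from typing import Any, Dict


def instantiate(spec: Dict[str, Any]) -> Any:
    """Build an object from {"class": "pkg.module.ClassName", "args": {...}}."""
    module_path, _, class_name = spec["class"].rpartition(".")
    module = importlib.import_module(module_path)   # imported only when needed
    cls = getattr(module, class_name)
    return cls(**spec.get("args", {}))


# Standard-library classes keep the example self-contained.
counter = instantiate({"class": "collections.Counter", "args": {"a": 2, "b": 1}})
fraction = instantiate({"class": "fractions.Fraction",
                        "args": {"numerator": 3, "denominator": 6}})
print(counter.most_common(1))
print(fraction)
```

With this approach, the calling script never needs an explicit `import` line for the classes named in a configuration file.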
Installation
pip install exp-runner
Usage
Suppose your project has the following structure:
MyAwesomeProject/
    main.py
    my_custom_module.py
    data/
        data_00.npy
        data_01.npy
        ...
        data_NN.npy
    protocols/
        experiment_config.json
    results/
Show me the code! All you need to do is describe your experiment in a JSON configuration file:
experiment_config.json
{
  "Setup": {
    "description": "You can add detailed description of the experiment",
    "random_seed": 42
  },
  "Dataset": {
    "class": "my_custom_module.MyAwesomeDataLoader",
    "args": {"path_to_data": "data/*.npy"}
  },
  "Transforms": [
    {
      "class": "sklearn.decomposition.PCA",
      "args": {"n_components": 3, "whiten": true}
    }
  ],
  "Model": {
    "class": "sklearn.cluster.KMeans",
    "args": {"n_clusters": 3, "n_jobs": -1, "verbose": 0}
  },
  "Metric": {
    "class": "my_custom_module.SklearnMetricWrapper",
    "args": {"metric": "normalized_mutual_info_score"}
  },
  "Saver": {
    "class": "my_custom_module.CSVReport",
    "args": {"path_to_output": "results/evaluation_results.csv", "sep": ";"}
  }
}
Note that the n_jobs argument of sklearn.cluster.KMeans was removed in scikit-learn 1.0; drop it from "args" when running against a newer release.
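To see how the sections of the configuration map onto the pipeline stages, the file can be inspected with the standard `json` module. This is a small sketch, assuming only the section layout shown above: `list_components` is a hypothetical helper, and the embedded configuration is a trimmed copy used so that the example runs on its own.

```python
import json

CONFIG_TEXT = """
{
  "Setup": {"description": "demo", "random_seed": 42},
  "Dataset": {"class": "my_custom_module.MyAwesomeDataLoader",
              "args": {"path_to_data": "data/*.npy"}},
  "Transforms": [{"class": "sklearn.decomposition.PCA",
                  "args": {"n_components": 3, "whiten": true}}],
  "Model": {"class": "sklearn.cluster.KMeans", "args": {"n_clusters": 3}}
}
"""


def list_components(config: dict) -> list:
    """Return (section, dotted class path) pairs in the order they appear."""
    components = []
    for section, body in config.items():
        # "Transforms" is a list of entries; the other sections are single entries.
        entries = body if isinstance(body, list) else [body]
        for entry in entries:
            if "class" in entry:      # "Setup" names no class, so it is skipped
                components.append((section, entry["class"]))
    return components


config = json.loads(CONFIG_TEXT)
for section, class_path in list_components(config):
    print(f"{section:<10} -> {class_path}")
```

Each section other than "Setup" follows the same pattern: a "class" key holding a dotted path and an "args" key holding the constructor arguments.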
The classes referenced in the configuration are defined as follows:
my_custom_module.py
import glob
import os
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Union

import numpy as np
import sklearn.metrics
from exp_runner import Dataset, Metric, Saver
from sklearn.model_selection import StratifiedShuffleSplit


class MyAwesomeDataLoader(Dataset):
    """Loads every .npy file matching a glob pattern and splits it into train and test parts."""

    def __init__(self, path_to_data: str, test_size: float = 0.1, training: bool = True):
        super(MyAwesomeDataLoader, self).__init__()
        self._samples = dict()
        self._labels = dict()
        self._splits = defaultdict(dict)
        paths_to_data = glob.glob(path_to_data)
        for path in paths_to_data:
            fname = os.path.basename(path)
            data = np.load(path)
            # The last column holds the labels; all other columns are features.
            X = data[:, :-1]
            y = data[:, -1]
            # Use the first stratified train/test split produced by the splitter.
            indices_train, indices_test = next(
                StratifiedShuffleSplit(test_size=test_size).split(X, y)
            )
            self._samples[fname] = X
            self._labels[fname] = y
            self._splits[fname]['train'] = indices_train
            self._splits[fname]['test'] = indices_test
        self._indices = list(self._samples.keys())
        self._training = training

    def __getitem__(self, index: int) -> Dict[str, Union[str, Dict[str, Union[str, np.ndarray]]]]:
        if not (0 <= index < len(self._indices)):
            raise IndexError
        fname = self._indices[index]
        # Pick the train or test indices depending on the current mode.
        split = self._splits[fname]['train' if self.training else 'test']
        item = {
            'X': self._samples[fname][split],
            'y': self._labels[fname][split],
        }
        item['desc'] = 'it is possible to add description for each data sample'
        return {'filename': fname, 'item': item}

    def __len__(self) -> int:
        return len(self._indices)

    @property
    def training(self):
        return self._training


class SklearnMetricWrapper(Metric):
    """Looks up a metric function in sklearn.metrics by name."""

    def __init__(self, metric: str):
        super(SklearnMetricWrapper, self).__init__()
        metric = getattr(sklearn.metrics, metric)
        self._metric: Callable[
            [Iterable[Union[float, int]], Iterable[Union[float, int]]], float
        ] = metric

    def __call__(self, y_true: Iterable[Union[float, int]],
                 y_pred: Iterable[Union[float, int]]) -> float:
        return self._metric(y_true, y_pred)


class CSVReport(Saver):
    """Writes one separator-delimited line per report entry."""

    def __init__(self, path_to_output: str, sep: str = ';', append: bool = True):
        super(CSVReport, self).__init__()
        self.path_to_output = path_to_output
        self.sep = sep
        self.mode = 'a+' if append else 'w+'

    def save(self, report: List[Dict[str, Any]]) -> None:
        with open(self.path_to_output, self.mode) as output:
            for entry in report:
                # Convert every field to str so that numeric scores can be joined.
                fields = [entry['filename'], entry['desc'], entry['perf']]
                line = self.sep.join(str(field) for field in fields) + '\n'
                output.write(line)
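Because MyAwesomeDataLoader slices each array as `data[:, :-1]` and `data[:, -1]`, every `.npy` file is expected to be a 2-D array whose last column holds the label. The sketch below generates one such file with synthetic data for a dry run. `make_data_file` and its parameters are hypothetical, and the file is written to a temporary directory rather than to `data/`.

```python
import os
import tempfile

import numpy as np


def make_data_file(path: str, n_per_class: int = 20, n_features: int = 5,
                   n_classes: int = 3, seed: int = 0) -> None:
    """Save a 2-D array whose last column holds the integer class label."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(n_classes), n_per_class)
    # Shift each class by its label so that the clusters are separable.
    features = rng.normal(size=(labels.size, n_features)) + labels[:, None]
    np.save(path, np.column_stack([features, labels]))


data_dir = tempfile.mkdtemp()
path = os.path.join(data_dir, "data_00.npy")
make_data_file(path)

data = np.load(path)
X, y = data[:, :-1], data[:, -1]    # same slicing as MyAwesomeDataLoader
print(X.shape, y.shape, np.unique(y))
```

Twenty samples per class are used here so that a stratified split with `test_size=0.1` still leaves at least one test sample for each of the three classes.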
Finally, to run the experiment, type the following in a terminal:
cd /path/to/MyAwesomeProject
python main.py --config protocols/experiment_config.json
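The contents of main.py are not shown here. As a rough sketch of the part that handles the `--config` flag used in the command above, the script could parse the argument and read the JSON file with the standard library. `parse_args` and `load_config` are hypothetical names, and the call that hands the configuration to exp-runner is intentionally omitted because the source does not show it.

```python
import argparse
import json
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the command line; only the --config flag from the command above is assumed."""
    parser = argparse.ArgumentParser(
        description="Run an experiment described by a JSON configuration file."
    )
    parser.add_argument("--config", required=True,
                        help="path to the JSON experiment configuration")
    return parser.parse_args(argv)


def load_config(path: str) -> dict:
    """Read the experiment configuration from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Passing an explicit argument list makes the example runnable without a shell.
args = parse_args(["--config", "protocols/experiment_config.json"])
print(args.config)
# load_config(args.config) would then return the dictionary to pass on to exp-runner.
```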