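Data Story Pattern Analysis for LOSD

Detailed description of the datastories Python project


Data Story Patterns Library

The Data Story Patterns Library is a repository of pattern analyses designed for Linked Open Statistical Data (LOSD). The story patterns were retrieved from literature research in the field of data journalism.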

Installation

    pip install datastories

Requirements are installed automatically together with the package.

### Import/Usage

    import datastories.analytical as patterns
    patterns.DataStoryPattern(sparqlendpointurl, jsonmetadata)

The created object allows querying the SPARQL endpoint based on the provided JSON metadata.

JSON template

    {
      "cube_key": {
        "title": "title of cube",
        "dataset_structure": "URI for cube structure",
        "dimensions": {
          "dimension_key": {
            "dimension_title": "Title of dimension",
            "dimension_url": "URI for dimension",
            "dimension_prefix": "URI for dimension's values"
          },
          "dimension_key": {
            "dimension_title": "Title of dimension",
            "dimension_url": "URI for dimension",
            "dimension_prefix": "URI for dimension's values"
          }
        },
        "hierarchical_dimensions": {
          "dimension_key": {
            "dimension_title": "Title of dimension",
            "dimension_url": "URI for dimension",
            "dimension_prefix": "URI for dimension's values",
            "dimension_levels": {
              "level_key": "integer(granularity level)",
              "level_key": "integer(granularity level)"
            }
          }
        },
        "measures": {
          "measure_key": {
            "measure_title": "Title of measure",
            "measure_url": "URI for measure"
          }
        }
      }
    }
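
To make the template more concrete, the sketch below fills it in for a single made-up cube and serialises it to JSON text. Every key name and URI here (`population`, `region`, the `example.org` addresses) is a placeholder invented for illustration, and the small `missing_cube_keys` helper is hypothetical: it only checks that the top-level entries shown in the template are present, which is an assumption about what a complete entry looks like rather than a rule stated by the library.

```python
import json

# Hypothetical metadata for one cube. Only the nesting follows the template;
# all keys and URIs are placeholders chosen for this example.
metadata = {
    "population": {
        "title": "Population by region and sex",
        "dataset_structure": "http://example.org/structure/population",
        "dimensions": {
            "sex": {
                "dimension_title": "Sex",
                "dimension_url": "http://example.org/dimension/sex",
                "dimension_prefix": "http://example.org/code/sex/",
            },
        },
        "hierarchical_dimensions": {
            "region": {
                "dimension_title": "Region",
                "dimension_url": "http://example.org/dimension/region",
                "dimension_prefix": "http://example.org/code/region/",
                # level_key -> integer granularity level
                "dimension_levels": {"country": 1, "county": 2},
            },
        },
        "measures": {
            "count": {
                "measure_title": "Number of persons",
                "measure_url": "http://example.org/measure/count",
            },
        },
    }
}

# Top-level entries that the template shows for every cube.
TEMPLATE_CUBE_KEYS = {
    "title", "dataset_structure", "dimensions",
    "hierarchical_dimensions", "measures",
}


def missing_cube_keys(meta):
    """Return {cube_key: sorted list of template entries the cube lacks}."""
    report = {}
    for cube, body in meta.items():
        missing = sorted(TEMPLATE_CUBE_KEYS - set(body))
        if missing:
            report[cube] = missing
    return report


# Serialise the structure to JSON text.
jsonmetadata = json.dumps(metadata, indent=2)
```

Running `missing_cube_keys(metadata)` on the example returns an empty dictionary, since every entry from the template is present.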

Pattern descriptions

MCounting

Measurement and Counting - arithmetical operators applied to the whole dataset; basic information about the data

Attributes

    def MCounting(self, cube="", dims=[], meas=[], hierdims=[], count_type="raw", df=pd.DataFrame())

| Parameter | Type | Description |
| --- | --- | --- |
| cube | | Cube whose dimensions and measures will be investigated |
| dims | | List of dimensions (from the cube) to include in the investigation |
| meas | | List of measures (from the cube) to include in the investigation |
| hierdims | | Hierarchical dimension with the selected hierarchy level to include in the investigation |
| count_type | | Type of count to perform |
| df | | DataFrame object, if the data has already been retrieved from the endpoint |

Output

Based on the count_type value

| count_type | Description |
| --- | --- |
| raw | data without any analysis performed |
| sum | sum across all numeric columns |
| mean | mean across all numeric columns |
| min | minimum values from all numeric columns |
| max | maximum values from all numeric columns |
| count | number of records |
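
As a rough illustration of what each count_type computes, the sketch below mimics the behaviour on a plain list of dictionaries. It is not the library's implementation: the function name `m_counting`, the sample rows and the column names are invented for this example, and plain Python stands in for the DataFrame the library works with.

```python
from statistics import mean


def m_counting(rows, count_type="raw"):
    """Summarise the numeric columns of `rows` (a list of dicts)."""
    if count_type == "raw":
        return rows              # data without any analysis
    if count_type == "count":
        return len(rows)         # number of records
    ops = {"sum": sum, "mean": mean, "min": min, "max": max}
    if count_type not in ops:
        raise ValueError(f"unknown count_type: {count_type}")
    if not rows:
        return {}
    # Treat a column as numeric if its value in the first row is a number.
    numeric = [k for k, v in rows[0].items() if isinstance(v, (int, float))]
    return {col: ops[count_type]([r[col] for r in rows]) for col in numeric}


rows = [
    {"region": "North", "population": 120, "area": 30.0},
    {"region": "South", "population": 80, "area": 50.0},
    {"region": "West", "population": 100, "area": 40.0},
]

totals = m_counting(rows, "sum")    # one value per numeric column
records = m_counting(rows, "count")
```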

LTable

League Table - sorting and extracting a specific number of records

Attributes

    def LTable(self, cube=[], dims=[], meas=[], hierdims=[], columns_to_order="", order_type="asc", number_of_records=20, df=pd.DataFrame())

| Parameter | Type | Description |
| --- | --- | --- |
| cube | | Cube whose dimensions and measures will be investigated |
| dims | | List of dimensions (from the cube) to include in the investigation |
| meas | | List of measures (from the cube) to include in the investigation |
| hierdims | | Hierarchical dimension with the selected hierarchy level to include in the investigation |
| columns_to_order | | Set of columns to order by |
| order_type | | Type of order (asc/desc) |
| number_of_records | | Number of records to retrieve |
| df | | DataFrame object, if the data has already been retrieved from the endpoint |

Output

Based on the order_type value

| order_type | Description |
| --- | --- |
| asc | ascending order based on the columns provided in columns_to_order |
| desc | descending order based on the columns provided in columns_to_order |
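
The league table idea (order the records, then keep the first few) can be sketched in plain Python as follows. The function `league_table` and the sample data are hypothetical and only mirror the parameter names used above; the library's own code may differ.

```python
def league_table(rows, columns_to_order, order_type="asc", number_of_records=20):
    """Sort `rows` by the given columns and keep the first N records."""
    if order_type not in ("asc", "desc"):
        raise ValueError(f"unknown order_type: {order_type}")
    ordered = sorted(
        rows,
        key=lambda r: tuple(r[c] for c in columns_to_order),
        reverse=(order_type == "desc"),
    )
    return ordered[:number_of_records]


teams = [
    {"team": "A", "points": 12},
    {"team": "B", "points": 30},
    {"team": "C", "points": 21},
    {"team": "D", "points": 7},
]

# The two highest-scoring teams.
top_two = league_table(teams, ["points"], order_type="desc", number_of_records=2)
```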

InternalComparison

InternalComparison - comparison of numeric values related to textual values within one column

Attributes

    def InternalComparison(self, cube="", dims=[], meas=[], hierdims=[], df=pd.DataFrame(), dim_to_compare="", meas_to_compare="", comp_type="")

| Parameter | Type | Description |
| --- | --- | --- |
| cube | | Cube whose dimensions and measures will be investigated |
| dims | | List of dimensions (from the cube) to include in the investigation |
| meas | | List of measures (from the cube) to include in the investigation |
| hierdims | | Hierarchical dimension with the selected hierarchy level to include in the investigation |
| df | | DataFrame object, if the data has already been retrieved from the endpoint |
| dim_to_compare | | Dimension whose values will be investigated |
| meas_to_compare | | Measure whose numeric values related to dim_to_compare will be processed |
| comp_type | | Type of comparison to perform |

Output

Regardless of the selected comp_type, the output data will have additional columns in which the numeric column meas_to_compare is processed in a specific way.

Available comparison types (comp_type)

| comp_type | Description |
| --- | --- |
| diffmax | difference from the max value related to a specific textual value |
| diffmean | difference from the arithmetic mean related to specific textual values |
| diffmin | difference from the minimum value related to a specific textual value |
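
One possible reading of these comparison types is sketched below: rows are grouped by the value of dim_to_compare, a reference value (max, mean or min of meas_to_compare) is computed per group, and each row gets an extra column holding its value minus that reference. The sign convention, the name of the added column and the helper `internal_comparison` itself are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def internal_comparison(rows, dim_to_compare, meas_to_compare, comp_type):
    """Add a column with the difference from the group's max, mean or min."""
    refs = {"diffmax": max, "diffmean": mean, "diffmin": min}
    if comp_type not in refs:
        raise ValueError(f"unknown comp_type: {comp_type}")
    # Collect the measure values for every textual value of the dimension.
    groups = defaultdict(list)
    for r in rows:
        groups[r[dim_to_compare]].append(r[meas_to_compare])
    reference = {key: refs[comp_type](vals) for key, vals in groups.items()}
    # The new column is named after comp_type (an arbitrary choice).
    return [
        {**r, comp_type: r[meas_to_compare] - reference[r[dim_to_compare]]}
        for r in rows
    ]


sales = [
    {"city": "A", "value": 10},
    {"city": "A", "value": 30},
    {"city": "B", "value": 5},
    {"city": "B", "value": 15},
]

compared = internal_comparison(sales, "city", "value", "diffmax")
```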

ProfileOutliers

ProfileOutliers - detection of outliers (unusual values) in the data

Attributes

    def ProfileOutliers(self, cube=[], dims=[], meas=[], hierdims=[], df=pd.DataFrame(), displayType="outliers_only")

| Parameter | Type | Description |
| --- | --- | --- |
| cube | | Cube whose dimensions and measures will be investigated |
| dims | | List of dimensions (from the cube) to include in the investigation |
| meas | | List of measures (from the cube) to include in the investigation |
| hierdims | | Hierarchical dimension with the selected hierarchy level to include in the investigation |
| df | | DataFrame object, if the data has already been retrieved from the endpoint |
| display_type | | Which information to display (with or without anomalies) |

Output

The pattern analysis uses the Python scipy library to perform a quick exploration of the data in search of unusual values.

Depending on the display_type parameter, the data is displayed with or without the detected outliers.

Available display types (display_type)

| display_type | Description |
| --- | --- |
| outliers_only | returns the rows of the dataset where unusual values were detected |
| without_outliers | returns the dataset with the rows where unusual values were detected excluded |
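
The text does not say which statistical test is applied, so the sketch below uses a simple z-score rule as a stand-in: a value counts as unusual when it lies more than `threshold` standard deviations from the mean. The rule, the `threshold` parameter, the function name `profile_outliers` and the use of the standard library instead of scipy are all assumptions made for this example.

```python
from statistics import mean, pstdev


def profile_outliers(rows, measure, display_type="outliers_only", threshold=2.0):
    """Split rows into outliers and non-outliers with a z-score rule."""
    if display_type not in ("outliers_only", "without_outliers"):
        raise ValueError(f"unknown display_type: {display_type}")
    values = [r[measure] for r in rows]
    mu, sigma = mean(values), pstdev(values)

    def is_outlier(value):
        # With zero spread nothing can be called unusual.
        return sigma > 0 and abs(value - mu) / sigma > threshold

    if display_type == "outliers_only":
        return [r for r in rows if is_outlier(r[measure])]
    return [r for r in rows if not is_outlier(r[measure])]


readings = [{"id": i, "value": v}
            for i, v in enumerate([10, 11, 9, 10, 12, 10, 11, 9, 10, 100])]

unusual = profile_outliers(readings, "value", "outliers_only")
cleaned = profile_outliers(readings, "value", "without_outliers")
```

In the sample data only the reading with value 100 is far enough from the mean to be flagged; the remaining nine rows are returned by the `without_outliers` variant.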

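DissectFactors

DissectFactors - decomposition of the data based on the values in dim_to_dissect

Attributes

    def DissectFactors(self, cube="", dims=[], meas=[], hierdims=[], df=pd.DataFrame(), dim_to_dissect="")

| Parameter | Type | Description |
| --- | --- | --- |
| cube | | Cube whose dimensions and measures will be investigated |
| dims | | List of dimensions (from the cube) to include in the investigation |
| meas | | List of measures (from the cube) to include in the investigation |
| hierdims | | Hierarchical dimension with the selected hierarchy level to include in the investigation |
| df | | DataFrame object, if the data has already been retrieved from the endpoint |
| dim_to_dissect | | Dimension on which the data should be decomposed |

Output

As output, the data is decomposed into a dictionary of subsets, where each subset contains only the values related to one specific value. The dictionary of sub-datasets is constructed as a series of pairs: the key of each subset is a value from dim_to_dissect, and the value is the data in which this key value occurred.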
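
The shape of that dictionary can be illustrated with a few lines of plain Python. The helper `dissect_factors` and the sample rows are hypothetical; they only show the key/subset pairing described above.

```python
from collections import defaultdict


def dissect_factors(rows, dim_to_dissect):
    """Return {dimension value: rows in which that value occurs}."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[dim_to_dissect]].append(row)
    return dict(subsets)


observations = [
    {"sex": "female", "year": 2015, "count": 40},
    {"sex": "male", "year": 2015, "count": 38},
    {"sex": "female", "year": 2016, "count": 42},
]

by_sex = dissect_factors(observations, "sex")
```

Here `by_sex` has the keys `"female"` and `"male"`, each mapped to the rows in which that value occurs.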

HighlightContrast

HighlightContrast - partial difference between values related to one textual column

Attributes

    def HighlightContrast(self, cube="", dims=[], meas=[], hierdims=[], df=pd.DataFrame(), dim_to_contrast="", contrast_type="", meas_to_contrast="")

| Parameter | Type | Description |
| --- | --- | --- |
| cube | | Cube whose dimensions and measures will be investigated |
| dims | | List of dimensions (from the cube) to include in the investigation |
| meas | | List of measures (from the cube) to include in the investigation |
| hierdims | | Hierarchical dimension with the selected hierarchy level to include in the investigation |
| df | | DataFrame object, if the data has already been retrieved from the endpoint |
| dim_to_contrast | | Textual column whose values will be contrasted |
| meas_to_contrast | | Numerical column whose values are contrasted |
| contrast_type | | Type of contrast to present |

Output

Regardless of the selected contrast_type, the output data will have additional columns in which the numeric column meas_to_contrast is processed in a specific way.

Available contrast types (contrast_type)

| contrast_type | Description |
| --- | --- |
| partofwhole | difference with max value related to specific textual value |
| partofmax | difference with arithmetic mean related to specific textual values |
| partofmin | difference with minimum value related to specific textual value |
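
The names of the contrast types suggest ratios: each value as a part of the total, of the maximum, or of the minimum. The sketch below follows that literal reading of the names, which is an assumption and is not confirmed by the descriptions in the table. The function `highlight_contrast`, the added column name and the sample data are likewise invented for illustration.

```python
def highlight_contrast(rows, meas_to_contrast, contrast_type):
    """Add a column with each value divided by the total, max or min."""
    bases = {"partofwhole": sum, "partofmax": max, "partofmin": min}
    if contrast_type not in bases:
        raise ValueError(f"unknown contrast_type: {contrast_type}")
    base = bases[contrast_type]([r[meas_to_contrast] for r in rows])
    if base == 0:
        raise ValueError("reference value is zero, ratio is undefined")
    # The new column is named after contrast_type (an arbitrary choice).
    return [{**r, contrast_type: r[meas_to_contrast] / base} for r in rows]


budget = [
    {"department": "health", "spend": 20},
    {"department": "education", "spend": 30},
    {"department": "transport", "spend": 50},
]

shares = highlight_contrast(budget, "spend", "partofwhole")
```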

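StartBigDrillDown

StartBigDrillDown - retrieval of data from multiple hierarchical levels.

This pattern can only be applied to data that has not yet been stored in a DataFrame.

Attributes

    def StartBigDrillDown(self, cube="", dims=[], meas=[], hierdim_drill_down=[])

| Parameter | Type | Description |
| --- | --- | --- |
| cube | | Cube whose dimensions and measures will be investigated |
| dims | | List of dimensions (from the cube) to include in the investigation |
| meas | | List of measures (from the cube) to include in the investigation |
| hierdim_drill_down | | Hierarchical dimension with the list of hierarchy levels to inspect |

Output

As output, the data is retrieved as a dictionary, where each dataset is retrieved from a different hierarchy level. The list of levels is provided in hierdim_drill_down. The hierarchy levels provided in the parameter are automatically sorted from the most general to the most detailed, based on the provided metadata.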
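
The automatic ordering of levels can be pictured with the `dimension_levels` entry of the JSON metadata, which maps each level key to an integer granularity level. The sketch below assumes that a smaller integer means a more general level; that convention, the helper `drill_down_order` and the level names are assumptions made for this example only.

```python
def drill_down_order(dimension_levels, requested_levels):
    """Order the requested level keys from most general to most detailed."""
    unknown = [lvl for lvl in requested_levels if lvl not in dimension_levels]
    if unknown:
        raise KeyError(f"levels missing from metadata: {unknown}")
    # Assumption: lower granularity integer = more general level.
    return sorted(requested_levels, key=lambda lvl: dimension_levels[lvl])


# Hypothetical "dimension_levels" entry taken from the metadata.
levels = {"country": 1, "region": 2, "county": 3}

order = drill_down_order(levels, ["county", "country", "region"])
```

With these sample levels, `order` is `["country", "region", "county"]` whatever order the caller supplied them in.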

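StartSmallZoomOut

StartSmallZoomOut - retrieval of data from multiple hierarchical levels.

This pattern can only be applied to data that has not yet been stored in a DataFrame.

Attributes

    def StartSmallZoomOut(self, cube="", dims=[], meas=[], hierdim_zoom_out=[])

| Parameter | Type | Description |
| --- | --- | --- |
| cube | | Cube whose dimensions and measures will be investigated |
| dims | | List of dimensions (from the cube) to include in the investigation |
| meas | | List of measures (from the cube) to include in the investigation |
| hierdim_zoom_out | | Hierarchical dimension with the list of hierarchy levels to inspect |

Output

As output, the data is retrieved as a dictionary, where each dataset is retrieved from a different hierarchy level. The list of levels is provided in hierdim_zoom_out. The hierarchy levels provided in the parameter are automatically sorted from the most detailed to the most general, based on the provided metadata.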

AnalysisByCategory

AnalysisByCategory - groups the data based on the values in dim_for_category and performs an analysis on each subset

Attributes

    def AnalysisByCategory(self, cube="", dims=[], meas=[], hierdims=[], df=pd.DataFrame(), dim_for_category="", meas_to_analyse="", analysis_type="min"):

| Parameter | Type | Description |
| --- | --- | --- |
| cube | | Cube whose dimensions and measures will be investigated |
| dims | | List of dimensions (from the cube) to include in the investigation |
| meas | | List of measures (from the cube) to include in the investigation |
| hierdims | | Hierarchical dimension with the selected hierarchy level to include in the investigation |
| df | | DataFrame object, if the data has already been retrieved from the endpoint |
| dim_for_category | | Dimension based on which the input data will be categorised |
| meas_to_analyse | | Measure that will be analysed |
| analysis_type | | Type of analysis to perform |

Output

As output, the data is decomposed into a dictionary of subsets, where each subset contains only the values related to one specific value. Each subset is then analysed according to the analysis_type parameter.

Available analysis types (analysis_type)

| analysis_type | Description |
| --- | --- |
| min | Minimum for each category |
| max | Maximum for each category |
| mean | Arithmetic mean for each category |
| sum | Total value for each category |
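
The group-then-aggregate step behind this pattern can be sketched in plain Python as below. The helper `analysis_by_category` and the sample data are hypothetical and reuse the parameter names only for readability; the library's own code may structure its output differently.

```python
from collections import defaultdict
from statistics import mean


def analysis_by_category(rows, dim_for_category, meas_to_analyse,
                         analysis_type="min"):
    """Aggregate one measure separately for every category value."""
    ops = {"min": min, "max": max, "mean": mean, "sum": sum}
    if analysis_type not in ops:
        raise ValueError(f"unknown analysis_type: {analysis_type}")
    groups = defaultdict(list)
    for row in rows:
        groups[row[dim_for_category]].append(row[meas_to_analyse])
    return {cat: ops[analysis_type](vals) for cat, vals in groups.items()}


figures = [
    {"continent": "Europe", "value": 10},
    {"continent": "Europe", "value": 30},
    {"continent": "Asia", "value": 20},
    {"continent": "Asia", "value": 40},
    {"continent": "Asia", "value": 60},
]

totals_by_continent = analysis_by_category(figures, "continent", "value", "sum")
```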

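ExploreIntersection

Attributes

    def ExploreIntersection(self, dim_to_explore=""):

| Parameter | Type | Description |
| --- | --- | --- |
| dim_to_explore | | Dimension whose existence within the endpoint will be investigated |

Output

The pattern returns a series of datasets, where each dataset represents an occurrence of dim_to_explore within one cube.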

NarrChangeOT

Attributes

    def NarrChangeOT(self, cube="", dims=[], meas=[], hierdims=[], df=pd.DataFrame(), meas_to_narrate="", narr_type="")

| Parameter | Type | Description |
| --- | --- | --- |
| cube | | Cube whose dimensions and measures will be investigated |
| dims | | List of dimensions (from the cube) to include in the investigation |
| meas | | List of measures (from the cube) to include in the investigation |
| hierdims | | Hierarchical dimension with the selected hierarchy level to include in the investigation |
| df | | DataFrame object, if the data has already been retrieved from the endpoint |
| meas_to_narrate | | Set of 2 measures whose change will be narrated |
| narr_type | | Type of narration to perform |

Output

Regardless of the selected narr_type, the output data will have additional columns in which the numeric values are processed in a specific way.

Available analysis types (narr_type)

| narr_type | Description |
| --- | --- |
| percchange | Percentage change between the first and second property |
| diffchange | Quantitative change between the first and second property |
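
The two narration types can be sketched as simple arithmetic on a pair of measures. The direction of the change (second minus first, with the first measure as the base for the percentage), the name of the added column, the handling of a zero base and the helper `narr_change_ot` are assumptions made for this example.

```python
def narr_change_ot(rows, meas_to_narrate, narr_type):
    """Add a column describing the change from the first to the second measure."""
    if narr_type not in ("percchange", "diffchange"):
        raise ValueError(f"unknown narr_type: {narr_type}")
    first, second = meas_to_narrate
    result = []
    for row in rows:
        diff = row[second] - row[first]
        if narr_type == "diffchange":
            change = diff
        elif row[first] == 0:
            change = None  # percentage change is undefined for a zero base
        else:
            change = diff / row[first] * 100
        result.append({**row, narr_type: change})
    return result


population = [
    {"town": "X", "y2010": 200, "y2020": 250},
    {"town": "Y", "y2010": 80, "y2020": 60},
]

narrated = narr_change_ot(population, ("y2010", "y2020"), "percchange")
```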
