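LOSD Data Story Patterns Analysis
Detailed description of the datastories Python project
Data Story Pattern Library
The Data Story Pattern Library is a repository of pattern analyses designed for Linked Open Statistical Data. The story patterns were retrieved from literature research in the area of data journalism.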
Installation
`pip install datastories`
Requirements are installed automatically together with the package.
### Import/Usage
`import datastories.analytical as patterns`
`patterns.DataStoryPattern(sparqlendpointurl, jsonmetadata)`
The created object allows querying the SPARQL endpoint based on the provided JSON metadata.
JSON template

    {
      "cube_key": {
        "title": "Title of cube",
        "dataset_structure": "URI for cube structure",
        "dimensions": {
          "dimension_key": {
            "dimension_title": "Title of dimension",
            "dimension_url": "URI for dimension",
            "dimension_prefix": "URI for dimension's values"
          },
          "dimension_key": {
            "dimension_title": "Title of dimension",
            "dimension_url": "URI for dimension",
            "dimension_prefix": "URI for dimension's values"
          }
        },
        "hierarchical_dimensions": {
          "dimension_key": {
            "dimension_title": "Title of dimension",
            "dimension_url": "URI for dimension",
            "dimension_prefix": "URI for dimension's values",
            "dimension_levels": {
              "level_key": "integer(granularity level)",
              "level_key": "integer(granularity level)"
            }
          }
        },
        "measures": {
          "measure_key": {
            "measure_title": "Title of measure",
            "measure_url": "URI for measure"
          }
        }
      }
    }
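
As a rough illustration of the template, the sketch below builds a metadata dictionary for one made-up cube and checks that every cube entry carries the top-level keys the template lists. The cube, its titles and URIs, the integer level values, and the `check_metadata` helper are all invented for this example and are not part of the library.

```python
import json

# Hypothetical metadata for a single cube. Key names follow the template;
# every title, URI and level value is a placeholder invented for this sketch.
metadata = {
    "population": {
        "title": "Population by region",
        "dataset_structure": "http://example.org/structure/population",
        "dimensions": {
            "sex": {
                "dimension_title": "Sex",
                "dimension_url": "http://example.org/dimension/sex",
                "dimension_prefix": "http://example.org/code/sex/",
            }
        },
        "hierarchical_dimensions": {
            "region": {
                "dimension_title": "Region",
                "dimension_url": "http://example.org/dimension/region",
                "dimension_prefix": "http://example.org/code/region/",
                "dimension_levels": {"country": 0, "county": 1},
            }
        },
        "measures": {
            "count": {
                "measure_title": "Number of people",
                "measure_url": "http://example.org/measure/count",
            }
        },
    }
}

REQUIRED_CUBE_KEYS = {"title", "dataset_structure", "dimensions",
                      "hierarchical_dimensions", "measures"}


def check_metadata(meta):
    """Return the cube keys whose entries lack a key named in the template."""
    return [cube for cube, body in meta.items()
            if not REQUIRED_CUBE_KEYS.issubset(body)]


# Serialise the dictionary to JSON text, e.g. for saving it to a file.
json_metadata = json.dumps(metadata, indent=2)
```

A check like this only validates the shape of the metadata; whether the library expects the JSON as text, as a dictionary, or as a file path is not stated in this description.
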
Pattern descriptions
- Measurement and Counting
- League Table
- Internal Comparison
- Profile Outliers
- Dissect Factors
- Highlight Contrast
- Start Big Drill Down
- Start Small Zoom Out
- Analysis By Category
- Explore Intersection
- Narrating Change Over Time
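MCounting
Measurement and Counting: arithmetical operators applied to the whole dataset, giving basic information about the data.
Attributes
`def MCounting(self, cube="", dims=[], meas=[], hierdims=[], count_type="raw", df=pd.DataFrame())`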
Parameter | Description |
---|---|
cube | Cube whose dimensions and measures will be investigated |
dims | List of dimensions (from the cube) to investigate |
meas | List of measures (from the cube) to investigate |
hierdims | Hierarchical dimension with the selected hierarchy level to investigate |
count_type | Type of count to perform |
df | DataFrame object, if the data has already been retrieved from the endpoint |
Output
Based on the `count_type` value:
Count_type | Description |
---|---|
raw | data without any analysis performed |
sum | sum across all numeric columns |
mean | mean across all numeric columns |
min | minimum values from all numeric columns |
max | maximum values from all numeric columns |
count | number of records |
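
The count types in the table can be sketched in plain Python. The example below is not the library's implementation: a list of dictionaries stands in for the DataFrame, and the `m_counting` helper and the sample rows are invented for illustration.

```python
from statistics import mean

# Hypothetical rows standing in for data retrieved from an endpoint.
rows = [
    {"region": "North", "births": 120, "deaths": 80},
    {"region": "South", "births": 200, "deaths": 150},
    {"region": "East", "births": 100, "deaths": 70},
]


def m_counting(rows, count_type="raw"):
    """Apply one of the listed count types to every numeric column."""
    if count_type == "raw":
        return rows              # data without any analysis performed
    if count_type == "count":
        return len(rows)         # number of records
    reducers = {"sum": sum, "mean": mean, "min": min, "max": max}
    reducer = reducers[count_type]
    # Treat a column as numeric when the first row holds a number in it.
    numeric = [k for k, v in rows[0].items() if isinstance(v, (int, float))]
    return {col: reducer([r[col] for r in rows]) for col in numeric}


totals = m_counting(rows, "sum")
```
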
LTable
League Table: sorting and extraction of a specific number of records.
Attributes
`def LTable(self, cube=[], dims=[], meas=[], hierdims=[], columns_to_order="", order_type="asc", number_of_records=20, df=pd.DataFrame())`
Parameter | Description |
---|---|
cube | Cube whose dimensions and measures will be investigated |
dims | List of dimensions (from the cube) to investigate |
meas | List of measures (from the cube) to investigate |
hierdims | Hierarchical dimension with the selected hierarchy level to investigate |
columns_to_order | Set of columns to order by |
order_type | Type of order (asc/desc) |
number_of_records | Number of records to retrieve |
df | DataFrame object, if the data has already been retrieved from the endpoint |
Output
Based on the `order_type` value:
Order_type | Description |
---|---|
asc | ascending order based on the columns provided in `columns_to_order` |
desc | descending order based on the columns provided in `columns_to_order` |
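
A league table amounts to sorting and then truncating. The sketch below shows the idea on a list of dictionaries. It assumes `columns_to_order` is given as a list of column names; the `league_table` helper and the sample rows are invented and do not come from the library.

```python
def league_table(rows, columns_to_order, order_type="asc", number_of_records=20):
    """Sort rows by the given columns and keep the first number_of_records."""
    ordered = sorted(
        rows,
        key=lambda r: tuple(r[c] for c in columns_to_order),
        reverse=(order_type == "desc"),
    )
    return ordered[:number_of_records]


# Hypothetical rows standing in for a retrieved dataset.
rows = [
    {"region": "North", "births": 120},
    {"region": "South", "births": 200},
    {"region": "East", "births": 100},
]

top2 = league_table(rows, ["births"], order_type="desc", number_of_records=2)
```
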
InternalComparison
InternalComparison: comparison of numeric values related to the textual values within one column.
Attributes
`def InternalComparison(self, cube="", dims=[], meas=[], hierdims=[], df=pd.DataFrame(), dim_to_compare="", meas_to_compare="", comp_type="")`
Parameter | Description |
---|---|
cube | Cube whose dimensions and measures will be investigated |
dims | List of dimensions (from the cube) to investigate |
meas | List of measures (from the cube) to investigate |
hierdims | Hierarchical dimension with the selected hierarchy level to investigate |
df | DataFrame object, if the data has already been retrieved from the endpoint |
dim_to_compare | Dimension whose values will be investigated |
meas_to_compare | Measure whose numeric values, related to `dim_to_compare`, will be compared |
comp_type | Type of comparison to perform |
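Output
Regardless of the selected `comp_type`, the output data will have an additional column in which the numeric column `meas_to_compare` is processed in a specific way.
Available comparison types (`comp_type`):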
Comp_type | Description |
---|---|
diffmax | difference with max value related to specific textual value |
diffmean | difference with arithmetic mean related to specific textual values |
diffmin | difference with minimum value related to specific textual value |
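
One way to read the comparison types is as a per-group reference value that is subtracted from every row. The sketch below follows that reading on a list of dictionaries. The direction of the difference (row value minus reference), the name of the added column, and the `internal_comparison` helper are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def internal_comparison(rows, dim_to_compare, meas_to_compare, comp_type):
    """Add a column holding each row's difference from its group's max, mean or min."""
    reducers = {"diffmax": max, "diffmean": mean, "diffmin": min}
    # Collect the measure values belonging to each textual value.
    groups = defaultdict(list)
    for r in rows:
        groups[r[dim_to_compare]].append(r[meas_to_compare])
    reference = {key: reducers[comp_type](vals) for key, vals in groups.items()}
    # Assumed convention: difference = row value - group reference value.
    return [
        dict(r, **{comp_type: r[meas_to_compare] - reference[r[dim_to_compare]]})
        for r in rows
    ]


# Hypothetical rows: two observations per region.
rows = [
    {"region": "North", "year": 2019, "births": 100},
    {"region": "North", "year": 2020, "births": 120},
    {"region": "South", "year": 2019, "births": 200},
    {"region": "South", "year": 2020, "births": 180},
]

compared = internal_comparison(rows, "region", "births", "diffmax")
```
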
ProfileOutliers
ProfileOutliers: detection of outliers (anomalies) within the data.
Attributes
`def ProfileOutliers(self, cube=[], dims=[], meas=[], hierdims=[], df=pd.DataFrame(), displayType="outliers_only")`
Parameter | Description |
---|---|
cube | Cube whose dimensions and measures will be investigated |
dims | List of dimensions (from the cube) to investigate |
meas | List of measures (from the cube) to investigate |
hierdims | Hierarchical dimension with the selected hierarchy level to investigate |
df | DataFrame object, if the data has already been retrieved from the endpoint |
display_type | Which information to display (with or without anomalies) |
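Output
Pattern analysis using the Python `scipy` library performs a quick exploration of the data in search of outliers.
Based on the `display_type` parameter, the data is displayed with or without the detected outliers.
Available display types (`display_type`):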
display_type | Description |
---|---|
outliers_only | returns rows from dataset where unusual values were detected |
without_outliers | returns dataset with excluded rows where unusual values were detected |
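
The two display types can be illustrated with a simple z-score filter. This is only a sketch: the description says the library relies on `scipy`, whereas the example below swaps in the standard-library `statistics` module, and the `threshold` parameter, the `profile_outliers` helper and the sample rows are invented for illustration.

```python
from statistics import mean, stdev


def profile_outliers(rows, measure, display_type="outliers_only", threshold=2.0):
    """Split rows by whether the measure lies more than `threshold` sample
    standard deviations from the mean (a plain z-score rule)."""
    values = [r[measure] for r in rows]
    mu, sigma = mean(values), stdev(values)

    def is_outlier(row):
        return sigma > 0 and abs(row[measure] - mu) / sigma > threshold

    if display_type == "outliers_only":
        return [r for r in rows if is_outlier(r)]
    # "without_outliers": keep only the rows that look usual.
    return [r for r in rows if not is_outlier(r)]


# Hypothetical measurements with one obviously unusual value.
rows = [{"id": i, "value": v}
        for i, v in enumerate([10, 11, 9, 10, 12, 11, 10, 50])]

outliers = profile_outliers(rows, "value", "outliers_only")
cleaned = profile_outliers(rows, "value", "without_outliers")
```
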
DissectFactors
DissectFactors: decomposition of data based on the values in `dim_to_dissect`.
Attributes
`def DissectFactors(self, cube="", dims=[], meas=[], hierdims=[], df=pd.DataFrame(), dim_to_dissect="")`
Parameter | Description |
---|---|
cube | Cube whose dimensions and measures will be investigated |
dims | List of dimensions (from the cube) to investigate |
meas | List of measures (from the cube) to investigate |
hierdims | Hierarchical dimension with the selected hierarchy level to investigate |
df | DataFrame object, if the data has already been retrieved from the endpoint |
dim_to_dissect | Dimension on which the data should be decomposed |
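Output
As output, the data is decomposed into a dictionary of sub-datasets, where each subset holds the values related to one specific value only.
The dictionary of sub-datasets is constructed as a series of pairs: each subset's key is a value from `dim_to_dissect`, and the corresponding value is the data in which this key value occurred.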
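
The key/value structure described above can be sketched as follows. A list of dictionaries stands in for the DataFrame, and the `dissect_factors` helper and sample rows are invented for this example.

```python
from collections import defaultdict


def dissect_factors(rows, dim_to_dissect):
    """Return {value of dim_to_dissect: rows in which that value occurs}."""
    subsets = defaultdict(list)
    for r in rows:
        subsets[r[dim_to_dissect]].append(r)
    return dict(subsets)


# Hypothetical rows standing in for a retrieved dataset.
rows = [
    {"region": "North", "year": 2019, "births": 100},
    {"region": "South", "year": 2019, "births": 200},
    {"region": "North", "year": 2020, "births": 120},
]

by_region = dissect_factors(rows, "region")
```
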
HighlightContrast
HighlightContrast: partial difference between values related to one textual column.
Attributes
`def HighlightContrast(self, cube="", dims=[], meas=[], hierdims=[], df=pd.DataFrame(), dim_to_contrast="", contrast_type="", meas_to_contrast="")`
Parameter | Description |
---|---|
cube | Cube whose dimensions and measures will be investigated |
dims | List of dimensions (from the cube) to investigate |
meas | List of measures (from the cube) to investigate |
hierdims | Hierarchical dimension with the selected hierarchy level to investigate |
df | DataFrame object, if the data has already been retrieved from the endpoint |
dim_to_contrast | Textual column whose values will be contrasted |
meas_to_contrast | Numerical column whose values are contrasted |
contrast_type | Type of contrast to present |
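Output
Regardless of the selected `contrast_type`, the output data will have an additional column in which the numeric column `meas_to_contrast` is processed in a specific way.
Available contrast types (`contrast_type`):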
Contrast_type | Description |
---|---|
partofwhole | value as a part of the whole related to specific textual value |
partofmax | value as a part of the max value related to specific textual value |
partofmin | value as a part of the minimum value related to specific textual value |
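StartBigDrillDown
StartBigDrillDown: data retrieval from multiple hierarchical levels.
This pattern can only be applied to data that is not yet stored in a DataFrame.
Attributes
`def StartBigDrillDown(self, cube="", dims=[], meas=[], hierdim_drill_down=[])`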
Parameter | Description |
---|---|
cube | Cube whose dimensions and measures will be investigated |
dims | List of dimensions (from the cube) to investigate |
meas | List of measures (from the cube) to investigate |
hierdim_drill_down | Hierarchical dimension with the list of hierarchy levels to inspect |
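Output
As output, the data is retrieved as a dictionary of datasets, where each dataset is retrieved from a different hierarchy level. The list of levels is provided in `hierdim_drill_down`. The hierarchy levels provided in the parameter are automatically sorted from the most general to the most detailed, based on the provided metadata.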
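StartSmallZoomOut
StartSmallZoomOut: data retrieval from multiple hierarchical levels.
This pattern can only be applied to data that is not yet stored in a DataFrame.
Attributes
`def StartSmallZoomOut(self, cube="", dims=[], meas=[], hierdim_zoom_out=[])`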
Parameter | Description |
---|---|
cube | Cube whose dimensions and measures will be investigated |
dims | List of dimensions (from the cube) to investigate |
meas | List of measures (from the cube) to investigate |
hierdim_zoom_out | Hierarchical dimension with the list of hierarchy levels to inspect |
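Output
As output, the data is retrieved as a dictionary of datasets, where each dataset is retrieved from a different hierarchy level. The list of levels is provided in `hierdim_zoom_out`. The hierarchy levels provided in the parameter are automatically sorted from the most detailed to the most general, based on the provided metadata.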
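
The ordering of hierarchy levels in the drill-down and zoom-out patterns can be sketched as a sort on the granularity integers from the metadata's `dimension_levels` entry. The example assumes that a smaller integer means a more general level, which this description does not state; the `order_levels` helper and the level names are invented.

```python
def order_levels(dimension_levels, selected, most_general_first=True):
    """Order the selected level keys by their granularity value.

    Assumption: a lower granularity integer denotes a more general level.
    """
    return sorted(
        selected,
        key=lambda level: dimension_levels[level],
        reverse=not most_general_first,
    )


# Hypothetical "dimension_levels" entry from the metadata.
dimension_levels = {"country": 0, "province": 1, "county": 2}
requested = ["county", "country", "province"]

drill_down = order_levels(dimension_levels, requested)                        # general to detailed
zoom_out = order_levels(dimension_levels, requested, most_general_first=False)  # detailed to general
```
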
AnalysisByCategory
AnalysisByCategory: grouping of data based on the values in `dim_for_category`, with an analysis performed on each subset.
Attributes
`def AnalysisByCategory(self, cube="", dims=[], meas=[], hierdims=[], df=pd.DataFrame(), dim_for_category="", meas_to_analyse="", analysis_type="min"):`
Parameter | Description |
---|---|
cube | Cube whose dimensions and measures will be investigated |
dims | List of dimensions (from the cube) to investigate |
meas | List of measures (from the cube) to investigate |
hierdims | Hierarchical dimension with the selected hierarchy level to investigate |
df | DataFrame object, if the data has already been retrieved from the endpoint |
dim_for_category | Dimension on which the input data will be categorised |
meas_to_analyse | Measure that will be analysed |
analysis_type | Type of analysis to perform |
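Output
As output, the data is decomposed into a dictionary, where each subset holds the values related to one specific value only. Each subset is analysed according to the `analysis_type` parameter.
Available analysis types (`analysis_type`):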
Analysis_type | Description |
---|---|
min | Minimum per each category |
max | Maximum per each category |
mean | Arithmetical mean per each category |
sum | Total value from each category |
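
Grouping followed by a per-category aggregate can be sketched in a few lines. The example below returns one aggregated number per category, which is one plausible reading of the output described above; the `analysis_by_category` helper and sample rows are invented and do not come from the library.

```python
from collections import defaultdict
from statistics import mean


def analysis_by_category(rows, dim_for_category, meas_to_analyse, analysis_type="min"):
    """Group the measure by category and reduce each group with analysis_type."""
    reducers = {"min": min, "max": max, "mean": mean, "sum": sum}
    groups = defaultdict(list)
    for r in rows:
        groups[r[dim_for_category]].append(r[meas_to_analyse])
    return {cat: reducers[analysis_type](vals) for cat, vals in groups.items()}


# Hypothetical rows: two observations per region.
rows = [
    {"region": "North", "births": 100},
    {"region": "North", "births": 120},
    {"region": "South", "births": 200},
    {"region": "South", "births": 180},
]

totals = analysis_by_category(rows, "region", "births", "sum")
```
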
ExploreIntersection
Attributes
`def ExploreIntersection(self, dim_to_explore=""):`
Parameter | Description |
---|---|
dim_to_explore | Dimension whose existence within the endpoint is going to be investigated |
Output
The pattern returns a series of datasets, where each dataset represents the occurrence of `dim_to_explore` in one cube.
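
The pattern itself queries the endpoint, which cannot be reproduced here. As a loose, offline analogue, the sketch below only looks through a metadata dictionary shaped like the JSON template and lists the cubes that declare a given dimension key. The `cubes_with_dimension` helper and the sample metadata are invented for illustration.

```python
def cubes_with_dimension(metadata, dim_to_explore):
    """List the cube keys whose plain or hierarchical dimensions include the key."""
    found = []
    for cube_key, cube in metadata.items():
        declared = {
            **cube.get("dimensions", {}),
            **cube.get("hierarchical_dimensions", {}),
        }
        if dim_to_explore in declared:
            found.append(cube_key)
    return found


# Hypothetical, heavily trimmed metadata: only the dimension keys matter here.
metadata = {
    "population": {
        "dimensions": {"sex": {}},
        "hierarchical_dimensions": {"region": {}},
    },
    "employment": {
        "dimensions": {"sex": {}, "sector": {}},
        "hierarchical_dimensions": {},
    },
}

shared = cubes_with_dimension(metadata, "sex")
```
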
NarrChangeOT
Attributes
`def NarrChangeOT(self, cube="", dims=[], meas=[], hierdims=[], df=pd.DataFrame(), meas_to_narrate="", narr_type="")`
Parameter | Description |
---|---|
cube | Cube whose dimensions and measures will be investigated |
dims | List of dimensions (from the cube) to investigate |
meas | List of measures (from the cube) to investigate |
hierdims | Hierarchical dimension with the selected hierarchy level to investigate |
df | DataFrame object, if the data has already been retrieved from the endpoint |
meas_to_narrate | Set of 2 measures whose change will be narrated |
narr_type | Type of narration to perform |
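Output
Regardless of the selected `narr_type`, the output data will have an additional column in which the numeric values are processed in a specific way.
Available narration types (`narr_type`):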
Narr_type | Description |
---|---|
percchange | Percentage change between first and second property |
diffchange | Quantitative change between first and second property |
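
The two narration types reduce to simple arithmetic on a pair of measures. The sketch below assumes the change is computed from the first measure to the second and that the new column is named after `narr_type`; the `narr_change` helper and sample rows are invented for this example.

```python
def narr_change(rows, meas_to_narrate, narr_type):
    """Add a column with the change from the first measure to the second."""
    first, second = meas_to_narrate
    result = []
    for r in rows:
        diff = r[second] - r[first]
        if narr_type == "percchange":
            # Percentage change relative to the first measure
            # (undefined when the first measure is zero).
            change = diff * 100 / r[first]
        else:  # "diffchange": plain quantitative difference
            change = diff
        result.append(dict(r, **{narr_type: change}))
    return result


# Hypothetical rows holding one measure per year.
rows = [
    {"region": "North", "births_2019": 100, "births_2020": 120},
    {"region": "South", "births_2019": 200, "births_2020": 150},
]

narrated = narr_change(rows, ("births_2019", "births_2020"), "percchange")
```
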