Alibaba Cloud Tech Community [Yunqi]

Network Analysis__User Guide (new)_Machine Learning - Alibaba Cloud

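All algorithm components in the Network Analysis category require runtime parameters, described as follows:
Number of workers (parameter key workerNum): the number of nodes that execute the job in parallel. A larger value increases parallelism but also increases the framework's communication overhead.
Worker memory (parameter key workerMem): the maximum amount of memory a single worker can use. Each worker is allocated 4096 MB by default; if actual usage exceeds this value, an OutOfMemory exception is thrown.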

k-Core

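Overview

The k-core of a graph is the subgraph that remains after repeatedly removing every node whose degree is less than or equal to k. If a node belongs to the k-core but is removed in the (k+1)-core, its coreness is k. Consequently, every node of degree 1 has coreness 0. The maximum coreness over all nodes is called the coreness of the graph.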
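The peeling procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PAI's implementation, and the function name `kcore_edges` is hypothetical. It follows this page's convention of deleting nodes whose degree is less than or equal to k (many textbooks delete nodes with degree less than k instead, which shifts k by one), treats the edges as undirected, and returns every surviving edge in both directions, the same shape as the k = 2 sample result in this section.

```python
from collections import defaultdict

# Edges of the KCore_func_test_edge sample table: (flow_out_id, flow_in_id).
EDGES = [("1", "2"), ("1", "3"), ("1", "4"), ("2", "3"), ("2", "4"),
         ("3", "4"), ("3", "5"), ("3", "6"), ("5", "6")]


def kcore_edges(edges, k):
    """Repeatedly delete nodes of degree <= k; return surviving edges both ways."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops, treat edges as undirected
            adj[u].add(v)
            adj[v].add(u)

    queue = [n for n in adj if len(adj[n]) <= k]
    removed = set()
    while queue:
        n = queue.pop()
        if n in removed:
            continue
        removed.add(n)
        for m in adj[n]:
            adj[m].discard(n)
            # Losing n may push a neighbour down to the threshold as well.
            if m not in removed and len(adj[m]) <= k:
                queue.append(m)
        adj[n].clear()

    return sorted((u, v) for u in adj if u not in removed for v in adj[u])


print(kcore_edges(EDGES, 2))  # the 12 node1/node2 rows among nodes 1-4
```

With k = 2, nodes 5 and 6 (degree 2) are peeled off first; nodes 1 to 4 then all have degree 3 and survive.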

Parameter settings

k: the core number; required; default 3.

Example

Test data

SQL to create the test data:

drop table if exists KCore_func_test_edge;
create table KCore_func_test_edge as
select * from
(
  select '1' as flow_out_id,'2' as flow_in_id from dual
  union all
  select '1' as flow_out_id,'3' as flow_in_id from dual
  union all
  select '1' as flow_out_id,'4' as flow_in_id from dual
  union all
  select '2' as flow_out_id,'3' as flow_in_id from dual
  union all
  select '2' as flow_out_id,'4' as flow_in_id from dual
  union all
  select '3' as flow_out_id,'4' as flow_in_id from dual
  union all
  select '3' as flow_out_id,'5' as flow_in_id from dual
  union all
  select '3' as flow_out_id,'6' as flow_in_id from dual
  union all
  select '5' as flow_out_id,'6' as flow_in_id from dual
)tmp;

The graph structure of the data is shown in the figure below:

Results

With k = 2, the results are as follows:

+-------+-------+
| node1 | node2 |
+-------+-------+
| 1     | 2     |
| 1     | 3     |
| 1     | 4     |
| 2     | 1     |
| 2     | 3     |
| 2     | 4     |
| 3     | 1     |
| 3     | 2     |
| 3     | 4     |
| 4     | 1     |
| 4     | 2     |
| 4     | 3     |
+-------+-------+

PAI command example

pai -name KCore
    -project algo_public
    -DinputEdgeTableName=KCore_func_test_edge
    -DfromVertexCol=flow_out_id
    -DtoVertexCol=flow_in_id
    -DoutputTableName=KCore_func_test_result
    -Dk=2;

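Algorithm parameters

Parameter key	Description	Required/Optional	Default
inputEdgeTableName	Name of the input edge table	Required	-
inputEdgeTablePartitions	Partitions of the input edge table	Optional	Entire table is read
fromVertexCol	Start-vertex column of the input edge table	Required	-
toVertexCol	End-vertex column of the input edge table	Required	-
outputTableName	Name of the output table	Required	-
outputTablePartitions	Partitions of the output table	Optional	-
lifecycle	Lifecycle of the output table	Optional	-
workerNum	Number of worker processes	Optional	Not set
workerMem	Memory per worker process	Optional	4096
splitSize	Data split size	Optional	64
k	Core number	Required	3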

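Single-Source Shortest Path

Overview

Single-source shortest path is based on Dijkstra's algorithm: given a start node, the algorithm outputs the shortest path from that node to every other node.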
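A minimal Python sketch of the idea, assuming edges are directed from `flow_out_id` to `flow_in_id` and all weights are positive (Dijkstra's algorithm requires non-negative weights). The function name `sssp` is hypothetical. Besides the distance, it counts how many shortest paths reach each node; on the sample data this reproduces the `distance_cnt` column of the result in this section (a to d has three paths of length 4.0: a-d, a-b-c-d and a-b-e-d). Reporting the start node as (0.0, 0) simply mirrors the sample output.

```python
import heapq
from collections import defaultdict

# SSSP_func_test_edge sample rows: (flow_out_id, flow_in_id, edge_weight).
EDGES = [("a", "b", 1.0), ("b", "c", 2.0), ("c", "d", 1.0), ("b", "e", 2.0),
         ("e", "d", 1.0), ("c", "e", 1.0), ("f", "g", 3.0), ("a", "d", 4.0)]


def sssp(edges, start):
    """Return {node: (distance, shortest_path_count)} for nodes reachable from start."""
    graph = defaultdict(list)
    for u, v, w in edges:
        graph[u].append((v, w))

    dist = {start: 0.0}
    count = {start: 1}  # the empty path, used while counting
    settled = set()
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)  # dist[u] and count[u] are final from here on
        for v, w in graph[u]:
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v], count[v] = nd, count[u]
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v]:
                count[v] += count[u]  # another path of equal length

    result = {n: (dist[n], count[n]) for n in dist}
    result[start] = (0.0, 0)  # mirror the sample output for the start node
    return result


for node, (distance, paths) in sorted(sssp(EDGES, "a").items()):
    print("a", node, distance, paths)
```

Nodes f and g are not reachable from a, so they do not appear in the result, just as in the sample output.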

Parameter settings

Start node ID: the node from which shortest paths are computed; required.

Example

Test data

SQL to create the test data:

drop table if exists SSSP_func_test_edge;
create table SSSP_func_test_edge as
select
    flow_out_id,flow_in_id,edge_weight
from
(
    select "a" as flow_out_id,"b" as flow_in_id,1.0 as edge_weight from dual
    union all
    select "b" as flow_out_id,"c" as flow_in_id,2.0 as edge_weight from dual
    union all
    select "c" as flow_out_id,"d" as flow_in_id,1.0 as edge_weight from dual
    union all
    select "b" as flow_out_id,"e" as flow_in_id,2.0 as edge_weight from dual
    union all
    select "e" as flow_out_id,"d" as flow_in_id,1.0 as edge_weight from dual
    union all
    select "c" as flow_out_id,"e" as flow_in_id,1.0 as edge_weight from dual
    union all
    select "f" as flow_out_id,"g" as flow_in_id,3.0 as edge_weight from dual
    union all
    select "a" as flow_out_id,"d" as flow_in_id,4.0 as edge_weight from dual
) tmp
;

Graph structure of the data:

Results

The results are as follows:
+------------+------------+------------+--------------+
| start_node | dest_node  | distance   | distance_cnt | 
+------------+------------+------------+--------------+
| a          | b          | 1.0        | 1            |
| a          | c          | 3.0        | 1            |
| a          | d          | 4.0        | 3            |
| a          | a          | 0.0        | 0            |
| a          | e          | 3.0        | 1            |
+------------+------------+------------+--------------+

PAI command example

pai -name SSSP
    -project algo_public
    -DinputEdgeTableName=SSSP_func_test_edge
    -DfromVertexCol=flow_out_id
    -DtoVertexCol=flow_in_id
    -DoutputTableName=SSSP_func_test_result
    -DhasEdgeWeight=true
    -DedgeWeightCol=edge_weight
    -DstartVertex=a;

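Algorithm parameters

Parameter key	Description	Required/Optional	Default
inputEdgeTableName	Name of the input edge table	Required	-
inputEdgeTablePartitions	Partitions of the input edge table	Optional	Entire table is read
fromVertexCol	Start-vertex column of the input edge table	Required	-
toVertexCol	End-vertex column of the input edge table	Required	-
outputTableName	Name of the output table	Required	-
outputTablePartitions	Partitions of the output table	Optional	-
lifecycle	Lifecycle of the output table	Optional	-
workerNum	Number of worker processes	Optional	Not set
workerMem	Memory per worker process	Optional	4096
splitSize	Data split size	Optional	64
startVertex	ID of the start node	Required	-
hasEdgeWeight	Whether edges in the input edge table are weighted	Optional	false
edgeWeightCol	Edge-weight column of the input edge table	Optional	-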

PageRank

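Overview

PageRank originated in web search ranking: Google used the link structure of the web to compute a rank for every page. The basic idea is that a page linked to by many other pages is likely to be important or of high quality. Besides the number of inbound links, the algorithm also considers the weight of each linking page and how many outbound links that page has.

In a social network of users, edge weights matter in addition to each user's own influence. For example, a Sina Weibo user influences close followers such as family, classmates, and colleagues more easily than strangers connected only by weak ties. In a social network, the edge weight therefore represents the strength of the user-to-user relationship. In the weighted PageRank formula, w(i) is the weight of node i, c(A,i) is the link weight, and d is the damping factor; once the iteration stabilizes, the weight W of each node is that user's influence index.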
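The iteration can be sketched as below. The original formula is not reproduced on this page, so the update rule here is an assumption: W(A) = (1 - d)/N + d * sum of W(i) * c(A,i) over the in-links i -> A, with d = 0.85 and c(A,i) taken as the raw edge weight. These choices happen to reproduce the sample result in this section (for example, node a has no in-links, so W(a) = 0.15/4 = 0.0375). The textbook weighted PageRank additionally divides c(A,i) by the total out-weight of node i; without that normalization the iteration can diverge on graphs with cycles. The function name `weighted_pagerank` is hypothetical.

```python
from collections import defaultdict

# PageRankWithWeight_func_test_edge sample rows: (out, in, weight).
EDGES = [("a", "b", 1.0), ("a", "c", 1.0), ("b", "c", 1.0),
         ("b", "d", 1.0), ("c", "d", 1.0)]


def weighted_pagerank(edges, d=0.85, max_iter=30, tol=1e-10):
    nodes = sorted({n for u, v, _ in edges for n in (u, v)})
    n = len(nodes)
    in_links = defaultdict(list)  # A -> [(i, c(A, i)), ...]
    for u, v, w in edges:
        in_links[v].append((u, w))

    weight = {a: 1.0 / n for a in nodes}
    for _ in range(max_iter):
        # Assumed update: raw link weights, no out-weight normalisation.
        new = {a: (1 - d) / n + d * sum(weight[i] * c for i, c in in_links[a])
               for a in nodes}
        delta = max(abs(new[a] - weight[a]) for a in nodes)
        weight = new
        if delta < tol:  # converged before max_iter
            break
    return weight


for node, w in sorted(weighted_pagerank(EDGES).items()):
    print(node, round(w, 5))
```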

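Parameter settings

Maximum number of iterations: optional, default 30. The algorithm stops on its own once it converges.

Example

Test data

SQL to create the test data: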

drop table if exists PageRankWithWeight_func_test_edge;
create table PageRankWithWeight_func_test_edge as
select * from
(
    select 'a' as flow_out_id,'b' as flow_in_id,1.0 as weight from dual
    union all
    select 'a' as flow_out_id,'c' as flow_in_id,1.0 as weight from dual
    union all
    select 'b' as flow_out_id,'c' as flow_in_id,1.0 as weight from dual
    union all
    select 'b' as flow_out_id,'d' as flow_in_id,1.0 as weight from dual
    union all
    select 'c' as flow_out_id,'d' as flow_in_id,1.0 as weight from dual
)tmp
;

Graph structure of the data:

Results

The results are as follows:
+------+------------+
| node | weight     |
+------+------------+
| a    | 0.0375     |
| b    | 0.06938    |
| c    | 0.12834    |
| d    | 0.20556    |
+------+------------+

PAI command example

pai -name PageRankWithWeight
    -project algo_public
    -DinputEdgeTableName=PageRankWithWeight_func_test_edge
    -DfromVertexCol=flow_out_id
    -DtoVertexCol=flow_in_id
    -DoutputTableName=PageRankWithWeight_func_test_result
    -DhasEdgeWeight=true
    -DedgeWeightCol=weight
    -DmaxIter=100;

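Algorithm parameters

Parameter key	Description	Required/Optional	Default
inputEdgeTableName	Name of the input edge table	Required	-
inputEdgeTablePartitions	Partitions of the input edge table	Optional	Entire table is read
fromVertexCol	Start-vertex column of the input edge table	Required	-
toVertexCol	End-vertex column of the input edge table	Required	-
outputTableName	Name of the output table	Required	-
outputTablePartitions	Partitions of the output table	Optional	-
lifecycle	Lifecycle of the output table	Optional	-
workerNum	Number of worker processes	Optional	Not set
workerMem	Memory per worker process	Optional	4096
splitSize	Data split size	Optional	64
hasEdgeWeight	Whether edges in the input edge table are weighted	Optional	false
edgeWeightCol	Edge-weight column of the input edge table	Optional	-
maxIter	Maximum number of iterations	Optional	30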

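Label Propagation Clustering

Overview

Graph clustering partitions a graph into subgraphs based on its topology, so that nodes within a subgraph are densely connected while connections between subgraphs are sparse. The Label Propagation Algorithm (LPA) is a graph-based semi-supervised learning method. Its basic idea is that a node's label (community) depends on the labels of its neighbors, with the degree of influence determined by node similarity; labels are updated iteratively by propagation until they stabilize.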
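A minimal asynchronous label-propagation sketch in Python. The scoring rule is an assumption, not PAI's documented behavior: edges are treated as undirected, every node starts with its own ID as its label, each neighbor adds edge weight × neighbor vertex weight to the score of its current label, and ties go to the smallest label instead of being chosen at random. The function name `label_propagation` is hypothetical. On the sample data it finds the same two communities, {1, 2, 3, 4} and {5, 6, 7, 8}, as the result in this section, although the label values it ends up with ("2" and "7") differ from the group IDs PAI outputs.

```python
from collections import defaultdict

# LabelPropagationClustering sample data: (out, in, edge_weight) and node_weight.
EDGES = [("1", "2", 0.7), ("1", "3", 0.7), ("1", "4", 0.6), ("2", "3", 0.7),
         ("2", "4", 0.6), ("3", "4", 0.6), ("4", "6", 0.3), ("5", "6", 0.6),
         ("5", "7", 0.7), ("5", "8", 0.7), ("6", "7", 0.6), ("6", "8", 0.6),
         ("7", "8", 0.7)]
NODE_WEIGHT = {"1": 0.7, "2": 0.7, "3": 0.7, "4": 0.5,
               "5": 0.7, "6": 0.5, "7": 0.7, "8": 0.7}


def label_propagation(edges, vertex_weight, max_iter=30):
    adj = defaultdict(list)
    for u, v, w in edges:  # treat every edge as undirected
        adj[u].append((v, w))
        adj[v].append((u, w))

    nodes = sorted(adj)
    label = {n: n for n in nodes}  # each node starts in its own community
    for _ in range(max_iter):
        changed = False
        for n in nodes:  # asynchronous: later nodes see updated labels
            score = defaultdict(float)
            for m, w in adj[n]:
                score[label[m]] += w * vertex_weight.get(m, 1.0)
            top = max(score.values())
            best = min(lab for lab, s in score.items() if abs(s - top) < 1e-12)
            if best != label[n]:
                label[n] = best
                changed = True
        if not changed:  # converged before max_iter
            break
    return label


LABELS = label_propagation(EDGES, NODE_WEIGHT)
print(LABELS)
```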

Parameter settings

Maximum number of iterations: optional, default 30.

Example

Test data

SQL to generate the test data:

drop table if exists LabelPropagationClustering_func_test_edge;
create table LabelPropagationClustering_func_test_edge as
select * from
(
    select '1' as flow_out_id,'2' as flow_in_id,0.7 as edge_weight from dual
    union all
    select '1' as flow_out_id,'3' as flow_in_id,0.7 as edge_weight from dual
    union all
    select '1' as flow_out_id,'4' as flow_in_id,0.6 as edge_weight from dual
    union all
    select '2' as flow_out_id,'3' as flow_in_id,0.7 as edge_weight from dual
    union all
    select '2' as flow_out_id,'4' as flow_in_id,0.6 as edge_weight from dual
    union all
    select '3' as flow_out_id,'4' as flow_in_id,0.6 as edge_weight from dual
    union all
    select '4' as flow_out_id,'6' as flow_in_id,0.3 as edge_weight from dual
    union all
    select '5' as flow_out_id,'6' as flow_in_id,0.6 as edge_weight from dual
    union all
    select '5' as flow_out_id,'7' as flow_in_id,0.7 as edge_weight from dual
    union all
    select '5' as flow_out_id,'8' as flow_in_id,0.7 as edge_weight from dual
    union all
    select '6' as flow_out_id,'7' as flow_in_id,0.6 as edge_weight from dual
    union all
    select '6' as flow_out_id,'8' as flow_in_id,0.6 as edge_weight from dual
    union all
    select '7' as flow_out_id,'8' as flow_in_id,0.7 as edge_weight from dual
)tmp
;
drop table if exists LabelPropagationClustering_func_test_node;
create table LabelPropagationClustering_func_test_node as
select * from
(
    select '1' as node,0.7 as node_weight from dual
    union all
    select '2' as node,0.7 as node_weight from dual
    union all
    select '3' as node,0.7 as node_weight from dual
    union all
    select '4' as node,0.5 as node_weight from dual
    union all
    select '5' as node,0.7 as node_weight from dual
    union all
    select '6' as node,0.5 as node_weight from dual
    union all
    select '7' as node,0.7 as node_weight from dual
    union all
    select '8' as node,0.7 as node_weight from dual
)tmp
;

Group structure of the data:


Results
The results are as follows:
+------+------------+
| node | group_id   |
+------+------------+
| 1    | 1          |
| 2    | 1          |
| 3    | 1          |
| 4    | 1          |
| 5    | 5          |
| 6    | 5          |
| 7    | 5          |
| 8    | 5          |
+------+------------+
PAI command example
pai -name LabelPropagationClustering
    -project algo_public
    -DinputEdgeTableName=LabelPropagationClustering_func_test_edge
    -DfromVertexCol=flow_out_id
    -DtoVertexCol=flow_in_id
    -DinputVertexTableName=LabelPropagationClustering_func_test_node
    -DvertexCol=node
    -DoutputTableName=LabelPropagationClustering_func_test_result
    -DhasEdgeWeight=true
    -DedgeWeightCol=edge_weight
    -DhasVertexWeight=true
    -DvertexWeightCol=node_weight
    -DrandSelect=true
    -DmaxIter=100;
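Algorithm parameters
Parameter key	Description	Required/Optional	Default
inputEdgeTableName	Name of the input edge table	Required	-
inputEdgeTablePartitions	Partitions of the input edge table	Optional	Entire table is read
fromVertexCol	Start-vertex column of the input edge table	Required	-
toVertexCol	End-vertex column of the input edge table	Required	-
inputVertexTableName	Name of the input vertex table	Required	-
inputVertexTablePartitions	Partitions of the input vertex table	Optional	Entire table is read
vertexCol	Vertex column of the input vertex table	Required	-
outputTableName	Name of the output table	Required	-
outputTablePartitions	Partitions of the output table	Optional	-
lifecycle	Lifecycle of the output table	Optional	-
workerNum	Number of worker processes	Optional	Not set
workerMem	Memory per worker process	Optional	4096
splitSize	Data split size	Optional	64
hasEdgeWeight	Whether edges in the input edge table are weighted	Optional	false
edgeWeightCol	Edge-weight column of the input edge table	Optional	-
hasVertexWeight	Whether vertices in the input vertex table are weighted	Optional	false
vertexWeightCol	Vertex-weight column of the input vertex table	Optional	-
randSelect	Whether to select the maximum label randomly	Optional	false
maxIter	Maximum number of iterations	Optional	30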
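Label Propagation Classification
Overview
This is a semi-supervised classification algorithm: it uses the labels of labeled nodes to predict the labels of unlabeled nodes.
During execution, each node's label propagates to adjacent nodes according to similarity. At each propagation step, every node updates its label based on the labels of its neighbors: the more similar a neighbor is to the node, the greater its influence on the node's label. Labels of similar nodes tend to converge, which in turn makes those labels propagate more easily. Throughout propagation, the labels of labeled nodes stay fixed, so they act as sources that push labels toward the unlabeled nodes.
When the iteration finishes, similar nodes end up with similar probability distributions and can be assigned to the same class, which completes the label propagation process.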
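The propagation can be sketched as follows, under assumptions inferred from the sample data rather than taken from PAI: labels flow along the edge direction (flow_out_id to flow_in_id), an unlabeled node's label distribution is the edge-weighted average of its in-neighbors' distributions, and seed nodes keep their labels. The damping factor alpha is not modeled, and the function name `propagate_labels` is hypothetical. With these rules the sketch reproduces the sample result in this section: b receives 0.2 from a (label X) and 1.0 from d (label Y), giving X = 0.2/1.2 ≈ 0.16667 and Y ≈ 0.83333.

```python
from collections import defaultdict

# LabelPropagationClassification sample data.
EDGES = [("a", "b", 0.2), ("a", "c", 0.8), ("b", "c", 1.0), ("d", "b", 1.0)]
SEEDS = {"a": "X", "d": "Y"}  # node -> known label (label_weight 1.0)


def propagate_labels(edges, seeds, epsilon=1e-6, max_iter=30):
    in_links = defaultdict(list)
    nodes = set(seeds)
    for u, v, w in edges:
        in_links[v].append((u, w))
        nodes.update((u, v))

    # Seeds start with full weight on their label; other nodes start empty.
    dist = {n: ({seeds[n]: 1.0} if n in seeds else {}) for n in nodes}
    for _ in range(max_iter):
        new = {}
        for n in nodes:
            if n in seeds:  # labelled nodes act as fixed sources
                new[n] = dist[n]
                continue
            acc = defaultdict(float)
            for u, w in in_links[n]:
                for lab, p in dist[u].items():
                    acc[lab] += w * p
            total = sum(acc.values())
            new[n] = {lab: p / total for lab, p in acc.items()} if total else {}
        # Largest change of any (node, label) probability in this round.
        delta = max((abs(new[n].get(lab, 0.0) - dist[n].get(lab, 0.0))
                     for n in nodes for lab in set(new[n]) | set(dist[n])),
                    default=0.0)
        dist = new
        if delta < epsilon:
            break
    return dist


RESULT = propagate_labels(EDGES, SEEDS)
for node in sorted(RESULT):
    print(node, RESULT[node])
```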
Parameter settings

Damping factor: default 0.8
Convergence coefficient: default 0.000001

Example
Test data
SQL to generate the test data:
drop table if exists LabelPropagationClassification_func_test_edge;
create table LabelPropagationClassification_func_test_edge as
select * from
(
    select 'a' as flow_out_id, 'b' as flow_in_id, 0.2 as edge_weight from dual
    union all
    select 'a' as flow_out_id, 'c' as flow_in_id, 0.8 as edge_weight from dual
    union all
    select 'b' as flow_out_id, 'c' as flow_in_id, 1.0 as edge_weight from dual
    union all
    select 'd' as flow_out_id, 'b' as flow_in_id, 1.0 as edge_weight from dual
)tmp
;
drop table if exists LabelPropagationClassification_func_test_node;
create table LabelPropagationClassification_func_test_node as
select * from
(
    select 'a' as node,'X' as label, 1.0 as label_weight from dual
    union all
    select 'd' as node,'Y' as label, 1.0 as label_weight from dual
)tmp
;
Graph structure of the data:

Results
The results are as follows:
+------+-----+------------+
| node | tag | weight     |
+------+-----+------------+
| a    | X   | 1.0        |
| b    | X   | 0.16667    |
| b    | Y   | 0.83333    |
| c    | X   | 0.53704    |
| c    | Y   | 0.46296    |
| d    | Y   | 1.0        |
+------+-----+------------+
PAI command example
pai -name LabelPropagationClassification
    -project algo_public
    -DinputEdgeTableName=LabelPropagationClassification_func_test_edge
    -DfromVertexCol=flow_out_id
    -DtoVertexCol=flow_in_id
    -DinputVertexTableName=LabelPropagationClassification_func_test_node
    -DvertexCol=node
    -DvertexLabelCol=label
    -DoutputTableName=LabelPropagationClassification_func_test_result
    -DhasEdgeWeight=true
    -DedgeWeightCol=edge_weight
    -DhasVertexWeight=true
    -DvertexWeightCol=label_weight
    -Dalpha=0.8
    -Depsilon=0.000001;
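Algorithm parameters
Parameter key	Description	Required/Optional	Default
inputEdgeTableName	Name of the input edge table	Required	-
inputEdgeTablePartitions	Partitions of the input edge table	Optional	Entire table is read
fromVertexCol	Start-vertex column of the input edge table	Required	-
toVertexCol	End-vertex column of the input edge table	Required	-
inputVertexTableName	Name of the input vertex table	Required	-
inputVertexTablePartitions	Partitions of the input vertex table	Optional	Entire table is read
vertexCol	Vertex column of the input vertex table	Required	-
vertexLabelCol	Label column of the input vertex table	Required	-
outputTableName	Name of the output table	Required	-
outputTablePartitions	Partitions of the output table	Optional	-
lifecycle	Lifecycle of the output table	Optional	-
workerNum	Number of worker processes	Optional	Not set
workerMem	Memory per worker process	Optional	4096
splitSize	Data split size	Optional	64
hasEdgeWeight	Whether edges in the input edge table are weighted	Optional	false
edgeWeightCol	Edge-weight column of the input edge table	Optional	-
hasVertexWeight	Whether vertices in the input vertex table are weighted	Optional	false
vertexWeightCol	Vertex-weight column of the input vertex table	Optional	-
alpha	Damping factor	Optional	0.8
epsilon	Convergence coefficient	Optional	0.000001
maxIter	Maximum number of iterations	Optional	30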
Modularity
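Overview
Modularity is a metric for evaluating the community structure of a network: it measures how tightly connected the communities partitioned out of the network are. A value above 0.3 usually indicates a clear community structure.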
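For reference, the standard (Newman) modularity of an undirected, unweighted graph is Q = sum over communities c of [L_c/m - (D_c/2m)^2], where m is the number of edges, L_c the number of edges inside community c, and D_c the total degree of c's nodes. Whether PAI uses exactly this form is an assumption, but the sketch below, with the hypothetical function name `modularity`, gives 12/13 - 0.5 ≈ 0.4230769 on the Label Propagation Clustering edges grouped by that section's result, which matches the value reported in this section.

```python
# Label Propagation Clustering edges (weights ignored) and its group_id result.
EDGES = [("1", "2"), ("1", "3"), ("1", "4"), ("2", "3"), ("2", "4"), ("3", "4"),
         ("4", "6"), ("5", "6"), ("5", "7"), ("5", "8"), ("6", "7"), ("6", "8"),
         ("7", "8")]
GROUP = {"1": "1", "2": "1", "3": "1", "4": "1",
         "5": "5", "6": "5", "7": "5", "8": "5"}


def modularity(edges, group):
    m = len(edges)
    inside = {}  # community -> number of edges with both ends inside it
    degree = {}  # community -> sum of its nodes' degrees
    for u, v in edges:
        gu, gv = group[u], group[v]
        degree[gu] = degree.get(gu, 0) + 1
        degree[gv] = degree.get(gv, 0) + 1
        if gu == gv:
            inside[gu] = inside.get(gu, 0) + 1
    return sum(inside.get(g, 0) / m - (degree[g] / (2 * m)) ** 2 for g in degree)


print(modularity(EDGES, GROUP))
```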
Example
Test data
Omitted (same as the data for Label Propagation Clustering).
Results
The results are as follows:
+--------------+
| val          |
+--------------+
| 0.4230769    |
+--------------+
PAI command example
pai -name Modularity
    -project algo_public
    -DinputEdgeTableName=Modularity_func_test_edge
    -DfromVertexCol=flow_out_id
    -DfromGroupCol=group_out_id
    -DtoVertexCol=flow_in_id
    -DtoGroupCol=group_in_id
    -DoutputTableName=Modularity_func_test_result;
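Algorithm parameters
Parameter key	Description	Required/Optional	Default
inputEdgeTableName	Name of the input edge table	Required	-
inputEdgeTablePartitions	Partitions of the input edge table	Optional	Entire table is read
fromVertexCol	Start-vertex column of the input edge table	Required	-
fromGroupCol	Group column of the start vertex in the input edge table	Required	-
toVertexCol	End-vertex column of the input edge table	Required	-
toGroupCol	Group column of the end vertex in the input edge table	Required	-
outputTableName	Name of the output table	Required	-
outputTablePartitions	Partitions of the output table	Optional	-
lifecycle	Lifecycle of the output table	Optional	-
workerNum	Number of worker processes	Optional	Not set
workerMem	Memory per worker process	Optional	4096
splitSize	Data split size	Optional	64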
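Maximal Connected Components
Overview
In an undirected graph G, vertices A and B are connected if there is a path between them. If G contains subgraphs in which every pair of vertices is connected, while no vertex in one subgraph is connected to any vertex in another, these subgraphs are called the maximal connected components of G.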
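A union-find sketch in Python (the function name `connected_components` is hypothetical). Labeling each component by its largest node ID is an assumption chosen because it matches the grp_id values in the sample result in this section (4 and c); PAI's actual rule for choosing a component ID is not documented here.

```python
# MaximalConnectedComponent_func_test_edge sample rows.
EDGES = [("1", "2"), ("2", "3"), ("3", "4"), ("1", "4"), ("a", "b"), ("b", "c")]


def connected_components(edges):
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:  # edge direction is irrelevant for connectivity
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    members = {}
    for node in list(parent):
        members.setdefault(find(node), []).append(node)
    # Label every component by its largest node ID (an assumed convention).
    return {node: max(group) for group in members.values() for node in group}


for node, grp in sorted(connected_components(EDGES).items()):
    print(node, grp)
```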
Parameter settings

None

Example
Test data
SQL to generate the test data:
drop table if exists MaximalConnectedComponent_func_test_edge;
create table MaximalConnectedComponent_func_test_edge as
select * from
(
  select '1' as flow_out_id,'2' as flow_in_id from dual
  union all
  select '2' as flow_out_id,'3' as flow_in_id from dual
  union all
  select '3' as flow_out_id,'4' as flow_in_id from dual
  union all
  select '1' as flow_out_id,'4' as flow_in_id from dual
  union all
  select 'a' as flow_out_id,'b' as flow_in_id from dual
  union all
  select 'b' as flow_out_id,'c' as flow_in_id from dual
)tmp;
drop table if exists MaximalConnectedComponent_func_test_result;
create table MaximalConnectedComponent_func_test_result
(
  node string,
  grp_id string
);
Graph structure of the data:

Results
The results are as follows:
+-------+-------+
| node  | grp_id|
+-------+-------+
| 1     | 4     |
| 2     | 4     |
| 3     | 4     |
| 4     | 4     |
| a     | c     |
| b     | c     |
| c     | c     |
+-------+-------+
PAI command example
pai -name MaximalConnectedComponent 
    -project algo_public 
    -DinputEdgeTableName=MaximalConnectedComponent_func_test_edge 
    -DfromVertexCol=flow_out_id
    -DtoVertexCol=flow_in_id 
    -DoutputTableName=MaximalConnectedComponent_func_test_result;
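Algorithm parameters
Parameter key	Description	Required/Optional	Default
inputEdgeTableName	Name of the input edge table	Required	-
inputEdgeTablePartitions	Partitions of the input edge table	Optional	Entire table is read
fromVertexCol	Start-vertex column of the input edge table	Required	-
toVertexCol	End-vertex column of the input edge table	Required	-
outputTableName	Name of the output table	Required	-
outputTablePartitions	Partitions of the output table	Optional	-
lifecycle	Lifecycle of the output table	Optional	-
workerNum	Number of worker processes	Optional	Not set
workerMem	Memory per worker process	Optional	4096
splitSize	Data split size	Optional	64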
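Node Clustering Coefficient
Overview
In an undirected graph G, computes the density around each node. The density is 0 for the center of a star network and 1 for a fully connected network.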
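The density can be sketched as the local clustering coefficient: the number of edges among a node's neighbors divided by the k(k-1)/2 possible neighbor pairs, where k is the number of neighbors. This reproduces the node_cnt, edge_cnt, and density columns of the sample result in this section (node 1 has 5 neighbors with 4 edges among them, so 4/10 = 0.4). The sketch does not implement maxEdgeCnt sampling or the log_density column, whose formula is not given on this page, and the function name `node_density` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# NodeDensity_func_test_edge sample rows.
EDGES = [("1", "2"), ("1", "3"), ("1", "4"), ("1", "5"), ("1", "6"), ("2", "3"),
         ("3", "4"), ("4", "5"), ("5", "6"), ("5", "7"), ("6", "7")]


def node_density(edges):
    adj = defaultdict(set)
    for u, v in edges:  # undirected, self-loops ignored
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    stats = {}
    for n in sorted(adj):
        k = len(adj[n])
        # Count neighbour pairs that are themselves connected.
        links = sum(1 for a, b in combinations(sorted(adj[n]), 2) if b in adj[a])
        density = links / (k * (k - 1) / 2) if k > 1 else 0.0
        stats[n] = (k, links, density)  # (node_cnt, edge_cnt, density)
    return stats


for node, row in node_density(EDGES).items():
    print(node, row)
```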
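Parameter settings

maxEdgeCnt: if a node's degree exceeds this value, sampling is performed; optional, default 500.

Example
Test data
SQL to generate the test data: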
drop table if exists NodeDensity_func_test_edge;
create table NodeDensity_func_test_edge as
select * from
(
  select '1' as flow_out_id, '2' as flow_in_id from dual
  union all
  select '1' as flow_out_id, '3' as flow_in_id from dual
  union all
  select '1' as flow_out_id, '4' as flow_in_id from dual
  union all
  select '1' as flow_out_id, '5' as flow_in_id from dual
  union all
  select '1' as flow_out_id, '6' as flow_in_id from dual
  union all
  select '2' as flow_out_id, '3' as flow_in_id from dual
  union all
  select '3' as flow_out_id, '4' as flow_in_id from dual
  union all
  select '4' as flow_out_id, '5' as flow_in_id from dual
  union all
  select '5' as flow_out_id, '6' as flow_in_id from dual
  union all
  select '5' as flow_out_id, '7' as flow_in_id from dual
  union all
  select '6' as flow_out_id, '7' as flow_in_id from dual
)tmp;
drop table if exists NodeDensity_func_test_result;
create table NodeDensity_func_test_result
(
  node string,
  node_cnt bigint,
  edge_cnt bigint,
  density double,
  log_density double
);
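Graph structure of the data:

Results
The results are as follows:
+------+----------+----------+---------+-------------+
| node | node_cnt | edge_cnt | density | log_density |
+------+----------+----------+---------+-------------+
| 1    | 5        | 4        | 0.4     | 1.45657     |
| 2    | 2        | 1        | 1.0     | 1.24696     |
| 3    | 3        | 2        | 0.66667 | 1.35204     |
| 4    | 3        | 2        | 0.66667 | 1.35204     |
| 5    | 4        | 3        | 0.5     | 1.41189     |
| 6    | 3        | 2        | 0.66667 | 1.35204     |
| 7    | 2        | 1        | 1.0     | 1.24696     |
+------+----------+----------+---------+-------------+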
PAI command example
pai -name NodeDensity
    -project algo_public
    -DinputEdgeTableName=NodeDensity_func_test_edge
    -DfromVertexCol=flow_out_id
    -DtoVertexCol=flow_in_id
    -DoutputTableName=NodeDensity_func_test_result
    -DmaxEdgeCnt=500;
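Algorithm parameters
Parameter key	Description	Required/Optional	Default
inputEdgeTableName	Name of the input edge table	Required	-
inputEdgeTablePartitions	Partitions of the input edge table	Optional	Entire table is read
fromVertexCol	Start-vertex column of the input edge table	Required	-
toVertexCol	End-vertex column of the input edge table	Required	-
outputTableName	Name of the output table	Required	-
outputTablePartitions	Partitions of the output table	Optional	-
lifecycle	Lifecycle of the output table	Optional	-
maxEdgeCnt	If a node's degree exceeds this value, sampling is performed	Optional	500
workerNum	Number of worker processes	Optional	Not set
workerMem	Memory per worker process	Optional	4096
splitSize	Data split size	Optional	64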
Edge Clustering Coefficient
Overview
In an undirected graph G, computes the density around each edge.
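One definition that reproduces every row of the sample result in this section is density = triangle_cnt / min(node1_edge_cnt, node2_edge_cnt), where the two edge counts are the degrees of the endpoints and triangle_cnt is the number of their common neighbors (for edge 5-6: 2 triangles, degrees 7 and 3, so 2/3 ≈ 0.66667). This formula is inferred from the sample numbers and is not documented by PAI; the Python sketch below uses it, and the function name `edge_density` is hypothetical.

```python
from collections import defaultdict

# EdgeDensity_func_test_edge sample rows.
EDGES = [("1", "2"), ("1", "3"), ("1", "5"), ("1", "7"), ("2", "5"), ("2", "4"),
         ("2", "3"), ("3", "5"), ("3", "4"), ("4", "5"), ("4", "8"), ("5", "6"),
         ("5", "7"), ("5", "8"), ("7", "6"), ("6", "8")]


def edge_density(edges):
    adj = defaultdict(set)
    for u, v in edges:  # undirected, self-loops ignored
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    stats = {}
    for u, v in edges:
        if u == v:
            continue
        du, dv = len(adj[u]), len(adj[v])
        tri = len(adj[u] & adj[v])  # each common neighbour closes one triangle
        # (node1_edge_cnt, node2_edge_cnt, triangle_cnt, density)
        stats[(u, v)] = (du, dv, tri, tri / min(du, dv))
    return stats


for edge, row in edge_density(EDGES).items():
    print(edge, row)
```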
Parameter settings

None

Example
Test data
SQL to generate the test data:
drop table if exists EdgeDensity_func_test_edge;
create table EdgeDensity_func_test_edge as
select * from
(
  select '1' as flow_out_id,'2' as flow_in_id from dual
  union all
  select '1' as flow_out_id,'3' as flow_in_id from dual
  union all
  select '1' as flow_out_id,'5' as flow_in_id from dual
  union all
  select '1' as flow_out_id,'7' as flow_in_id from dual
  union all
  select '2' as flow_out_id,'5' as flow_in_id from dual
  union all
  select '2' as flow_out_id,'4' as flow_in_id from dual
  union all
  select '2' as flow_out_id,'3' as flow_in_id from dual
  union all
  select '3' as flow_out_id,'5' as flow_in_id from dual
  union all
  select '3' as flow_out_id,'4' as flow_in_id from dual
  union all
  select '4' as flow_out_id,'5' as flow_in_id from dual
  union all
  select '4' as flow_out_id,'8' as flow_in_id from dual
  union all
  select '5' as flow_out_id,'6' as flow_in_id from dual
  union all
  select '5' as flow_out_id,'7' as flow_in_id from dual
  union all
  select '5' as flow_out_id,'8' as flow_in_id from dual
  union all
  select '7' as flow_out_id,'6' as flow_in_id from dual
  union all
  select '6' as flow_out_id,'8' as flow_in_id from dual
)tmp;
drop table if exists EdgeDensity_func_test_result;
create table EdgeDensity_func_test_result
(
  node1 string,
  node2 string,
  node1_edge_cnt bigint,
  node2_edge_cnt bigint,
  triangle_cnt bigint,
  density double
);
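Graph structure of the data:

Results
The results are as follows:
+-------+-------+----------------+----------------+--------------+---------+
| node1 | node2 | node1_edge_cnt | node2_edge_cnt | triangle_cnt | density |
+-------+-------+----------------+----------------+--------------+---------+
| 1     | 2     | 4              | 4              | 2            | 0.5     |
| 2     | 3     | 4              | 4              | 3            | 0.75    |
| 2     | 5     | 4              | 7              | 3            | 0.75    |
| 3     | 1     | 4              | 4              | 2            | 0.5     |
| 3     | 4     | 4              | 4              | 2            | 0.5     |
| 4     | 2     | 4              | 4              | 2            | 0.5     |
| 4     | 5     | 4              | 7              | 3            | 0.75    |
| 5     | 1     | 7              | 4              | 3            | 0.75    |
| 5     | 3     | 7              | 4              | 3            | 0.75    |
| 5     | 6     | 7              | 3              | 2            | 0.66667 |
| 5     | 8     | 7              | 3              | 2            | 0.66667 |
| 6     | 7     | 3              | 3              | 1            | 0.33333 |
| 7     | 1     | 3              | 4              | 1            | 0.33333 |
| 7     | 5     | 3              | 7              | 2            | 0.66667 |
| 8     | 4     | 3              | 4              | 1            | 0.33333 |
| 8     | 6     | 3              | 3              | 1            | 0.33333 |
+-------+-------+----------------+----------------+--------------+---------+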
PAI command example
pai -name EdgeDensity
    -project algo_public
    -DinputEdgeTableName=EdgeDensity_func_test_edge
    -DfromVertexCol=flow_out_id
    -DtoVertexCol=flow_in_id
    -DoutputTableName=EdgeDensity_func_test_result;
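Algorithm parameters
Parameter key	Description	Required/Optional	Default
inputEdgeTableName	Name of the input edge table	Required	-
inputEdgeTablePartitions	Partitions of the input edge table	Optional	Entire table is read
fromVertexCol	Start-vertex column of the input edge table	Required	-
toVertexCol	End-vertex column of the input edge table	Required	-
outputTableName	Name of the output table	Required	-
outputTablePartitions	Partitions of the output table	Optional	-
lifecycle	Lifecycle of the output table	Optional	-
workerNum	Number of worker processes	Optional	Not set
workerMem	Memory per worker process	Optional	4096
splitSize	Data split size	Optional	64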
Triangle Count
Overview
Outputs all triangles in an undirected graph G.
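A straightforward Python sketch that lists each triangle once, as (node1, node2, node3) in ascending order, by intersecting neighbor sets. It does not implement maxEdgeCnt sampling, and the function name `triangles` is hypothetical. On the sample data it returns the same five triangles as the result in this section.

```python
from collections import defaultdict

# TriangleCount_func_test_edge sample rows.
EDGES = [("1", "2"), ("1", "3"), ("1", "4"), ("1", "5"), ("1", "6"), ("2", "3"),
         ("3", "4"), ("4", "5"), ("5", "6"), ("5", "7"), ("6", "7")]


def triangles(edges):
    adj = defaultdict(set)
    for u, v in edges:  # undirected, self-loops ignored
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    found = []
    for u in sorted(adj):
        for v in sorted(n for n in adj[u] if n > u):
            # Requiring u < v < w emits every triangle exactly once.
            for w in sorted(n for n in adj[u] & adj[v] if n > v):
                found.append((u, v, w))
    return found


for tri in triangles(EDGES):
    print(",".join(tri))
```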
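Parameter settings

maxEdgeCnt: if a node's degree exceeds this value, sampling is performed; optional, default 500.

Example
Test data
SQL to generate the test data: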
drop table if exists TriangleCount_func_test_edge;
create table TriangleCount_func_test_edge as
select * from
(
  select '1' as flow_out_id,'2' as flow_in_id from dual
  union all
  select '1' as flow_out_id,'3' as flow_in_id from dual
  union all
  select '1' as flow_out_id,'4' as flow_in_id from dual
  union all
  select '1' as flow_out_id,'5' as flow_in_id from dual
  union all
  select '1' as flow_out_id,'6' as flow_in_id from dual
  union all
  select '2' as flow_out_id,'3' as flow_in_id from dual
  union all
  select '3' as flow_out_id,'4' as flow_in_id from dual
  union all
  select '4' as flow_out_id,'5' as flow_in_id from dual
  union all
  select '5' as flow_out_id,'6' as flow_in_id from dual
  union all
  select '5' as flow_out_id,'7' as flow_in_id from dual
  union all
  select '6' as flow_out_id,'7' as flow_in_id from dual
)tmp;
drop table if exists TriangleCount_func_test_result;
create table TriangleCount_func_test_result
(
  node1 string,
  node2 string,
  node3 string
);
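Graph structure of the data:

Results
The results are as follows:
+-------+-------+-------+
| node1 | node2 | node3 |
+-------+-------+-------+
| 1     | 2     | 3     |
| 1     | 3     | 4     |
| 1     | 4     | 5     |
| 1     | 5     | 6     |
| 5     | 6     | 7     |
+-------+-------+-------+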
PAI command example
pai -name TriangleCount
    -project algo_public
    -DinputEdgeTableName=TriangleCount_func_test_edge
    -DfromVertexCol=flow_out_id
    -DtoVertexCol=flow_in_id
    -DoutputTableName=TriangleCount_func_test_result;
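Algorithm parameters
Parameter key	Description	Required/Optional	Default
inputEdgeTableName	Name of the input edge table	Required	-
inputEdgeTablePartitions	Partitions of the input edge table	Optional	Entire table is read
fromVertexCol	Start-vertex column of the input edge table	Required	-
toVertexCol	End-vertex column of the input edge table	Required	-
outputTableName	Name of the output table	Required	-
outputTablePartitions	Partitions of the output table	Optional	-
lifecycle	Lifecycle of the output table	Optional	-
maxEdgeCnt	If a node's degree exceeds this value, sampling is performed	Optional	500
workerNum	Number of worker processes	Optional	Not set
workerMem	Memory per worker process	Optional	4096
splitSize	Data split size	Optional	64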
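Tree Depth
Overview
For a collection of tree-structured networks, outputs the depth of each node and the ID of the tree it belongs to.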
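A breadth-first sketch in Python, assuming edges point from parent (flow_out_id) to child (flow_in_id) and that every node without an incoming edge is the root of its tree and serves as the tree ID. The function name `tree_depth` is hypothetical. If a node is reachable along several paths (node 4 has parents 1 and 2 in the sample data), the sketch keeps the first depth that BFS finds, which is the smallest; this matches the sample result in this section but is an assumption about PAI's behavior.

```python
from collections import defaultdict, deque

# TreeDepth_func_test_edge sample rows: (parent, child).
EDGES = [("0", "1"), ("0", "2"), ("1", "3"), ("1", "4"), ("2", "4"), ("2", "5"),
         ("4", "6"), ("a", "b"), ("a", "c"), ("c", "d"), ("c", "e")]


def tree_depth(edges):
    children = defaultdict(list)
    has_parent, nodes = set(), set()
    for parent, child in edges:
        children[parent].append(child)
        has_parent.add(child)
        nodes.update((parent, child))

    result = {}  # node -> (root, depth)
    for root in sorted(nodes - has_parent):
        result[root] = (root, 0)
        queue = deque([root])
        while queue:  # breadth-first, so the first depth found is the smallest
            node = queue.popleft()
            depth = result[node][1]
            for child in children[node]:
                if child not in result:
                    result[child] = (root, depth + 1)
                    queue.append(child)
    return result


for node, (root, depth) in sorted(tree_depth(EDGES).items()):
    print(node, root, depth)
```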
Parameter settings

None

Example
Test data
SQL to generate the test data:
drop table if exists TreeDepth_func_test_edge;
create table TreeDepth_func_test_edge as
select * from
(
    select '0' as flow_out_id, '1' as flow_in_id from dual
    union all
    select '0' as flow_out_id, '2' as flow_in_id from dual
    union all
    select '1' as flow_out_id, '3' as flow_in_id from dual
    union all
    select '1' as flow_out_id, '4' as flow_in_id from dual
    union all
    select '2' as flow_out_id, '4' as flow_in_id from dual
    union all
    select '2' as flow_out_id, '5' as flow_in_id from dual
    union all
    select '4' as flow_out_id, '6' as flow_in_id from dual
    union all
    select 'a' as flow_out_id, 'b' as flow_in_id from dual
    union all
    select 'a' as flow_out_id, 'c' as flow_in_id from dual
    union all
    select 'c' as flow_out_id, 'd' as flow_in_id from dual
    union all
    select 'c' as flow_out_id, 'e' as flow_in_id from dual
)tmp;
drop table if exists TreeDepth_func_test_result;
create table TreeDepth_func_test_result
(
  node string,
  root string,
  depth bigint
);
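Graph structure of the data:

Results
The results are as follows:
+------+------+-------+
| node | root | depth |
+------+------+-------+
| 0    | 0    | 0     |
| 1    | 0    | 1     |
| 2    | 0    | 1     |
| 3    | 0    | 2     |
| 4    | 0    | 2     |
| 5    | 0    | 2     |
| 6    | 0    | 3     |
| a    | a    | 0     |
| b    | a    | 1     |
| c    | a    | 1     |
| d    | a    | 2     |
| e    | a    | 2     |
+------+------+-------+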
PAI command example
pai -name TreeDepth
    -project algo_public
    -DinputEdgeTableName=TreeDepth_func_test_edge
    -DfromVertexCol=flow_out_id
    -DtoVertexCol=flow_in_id
    -DoutputTableName=TreeDepth_func_test_result;
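Algorithm parameters
Parameter key	Description	Required/Optional	Default
inputEdgeTableName	Name of the input edge table	Required	-
inputEdgeTablePartitions	Partitions of the input edge table	Optional	Entire table is read
fromVertexCol	Start-vertex column of the input edge table	Required	-
toVertexCol	End-vertex column of the input edge table	Required	-
outputTableName	Name of the output table	Required	-
outputTablePartitions	Partitions of the output table	Optional	-
lifecycle	Lifecycle of the output table	Optional	-
workerNum	Number of worker processes	Optional	Not set
workerMem	Memory per worker process	Optional	4096
splitSize	Data split size	Optional	64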
Last updated: 2016-11-23 16:04:15