Data mining dataset download ([Web Big Data Mining] A concrete implementation of the PageRank algorithm)

Project description: Compute the PageRank scores on the Wikipedia dataset

Dataset: WikiData.txt

Each line in the file has the following format: FromNodeID ToNodeID

In this project, you need to report the top 100 NodeIDs with their PageRank scores. You can choose different parameters, such as the teleport parameter, to compare different results. One result you must report is the one obtained when the teleport parameter is set to 0.85. In addition to the basic PageRank algorithm, you need to implement the Block-Stripe Update algorithm.

Implementation requirements:
Language: C.
Handle dead ends and spider trap nodes.
Optimize the sparse matrix.
Implement block-wise computation.
The program must iterate until convergence.
Do not call a ready-made interface directly; for example, do not implement PageRank by calling Python's networkx package.
Result format (.txt file): [NodeID] [Score]

Implementation details
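The input and output formats described above can be sketched as follows. This is only an illustrative sketch in Python, chosen for brevity; the function names `read_edges` and `format_top` and the ten-decimal score formatting are assumptions, not part of the original project.

```python
def read_edges(lines):
    """Parse 'FromNodeID ToNodeID' lines into a list of (src, dst) int pairs."""
    edges = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines (assumed policy)
        edges.append((int(parts[0]), int(parts[1])))
    return edges


def format_top(scores, k=100):
    """Return the k highest-scoring nodes as 'NodeID Score' lines.

    Ties are broken by the smaller NodeID (an assumption).
    """
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{node} {score:.10f}" for node, score in ranked[:k]]
```

In practice `read_edges` would be given an open file handle for WikiData.txt, and the lines returned by `format_top` would be written to the result .txt file.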

2.1 Total number of lines in the data: 103689

2.2 Total number of nodes: 7115

2.3 Maximum node ID: 8297

2.4 Where the data is loaded from

1. Algorithm overview: omitted

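3.1 Dead ends and spider trap nodes: To handle these two kinds of nodes, the PageRank algorithm introduces the random surfer model. A damping factor α is defined as the probability that a user reaches a page by following a link; it is usually fixed at 0.85, so 1 - α = 0.15 is the probability that the user visits a page without following a link.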
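The random surfer model above can be sketched as a power iteration. This is a minimal sketch, not the author's implementation: the name `pagerank`, the L1 convergence test with tolerance `tol`, and the choice to spread the score of dead ends uniformly over all nodes are assumptions made for illustration.

```python
def pagerank(edges, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power iteration with teleport; edges is a list of (src, dst) pairs."""
    nodes = sorted({n for e in edges for n in e})
    n = len(nodes)
    out_deg = {u: 0 for u in nodes}
    for src, _ in edges:
        out_deg[src] += 1

    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Score held by dead ends (no out-links) is spread uniformly (assumption).
        dead_mass = sum(rank[u] for u in nodes if out_deg[u] == 0)
        # Every node receives the teleport share plus its share of dead-end mass.
        base = (1.0 - alpha) / n + alpha * dead_mass / n
        new_rank = {u: base for u in nodes}
        # With probability alpha the surfer follows one of the out-links.
        for src, dst in edges:
            new_rank[dst] += alpha * rank[src] / out_deg[src]
        diff = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if diff < tol:  # repeat until convergence
            break
    return rank
```

Because the teleport term gives every node a nonzero chance of being visited, a spider trap cannot absorb all of the score, and the dead-end redistribution keeps the scores summing to 1.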

3.2 Sparse matrix optimization
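The source gives no details for this step, so the following is only one plausible sketch. A dense 7115 x 7115 transition matrix would hold roughly 50 million entries, while the file has only 103689 lines, so storing just the existing links is far cheaper. Since the maximum node ID (8297) exceeds the node count (7115), the sketch also remaps IDs to a contiguous range. The names `compact_ids` and `build_sparse` are hypothetical.

```python
def compact_ids(edges):
    """Remap arbitrary node IDs to 0..n-1.

    Returns (nodes, compact_edges), where nodes[i] is the original ID of index i.
    """
    nodes = sorted({n for e in edges for n in e})
    index = {node: i for i, node in enumerate(nodes)}
    return nodes, [(index[s], index[d]) for s, d in edges]


def build_sparse(edges):
    """Store only existing links: source -> sorted list of distinct destinations."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, set()).add(dst)  # duplicate links are dropped
    return {src: sorted(dsts) for src, dsts in adj.items()}
```

With this layout the out-degree of a source node is simply the length of its destination list, so no separate matrix of transition probabilities needs to be stored.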

3.3 Block-wise computation
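The source does not show how the blocks are computed, so the following is a hedged sketch of a block-stripe style update under stated assumptions: node IDs are already contiguous (0..n-1), the new rank vector is split into `num_blocks` equal-sized blocks, and each stripe keeps, per source node, only the destinations that fall into its block. The names `build_stripes` and `pagerank_block_stripe` are hypothetical, and dead-end score is spread uniformly as an assumption.

```python
def build_stripes(edges, n, num_blocks):
    """Split links into stripes by destination block; out-degrees stay global."""
    block_size = (n + num_blocks - 1) // num_blocks  # ceiling division
    out_deg = [0] * n
    for s, _ in edges:
        out_deg[s] += 1
    stripes = [dict() for _ in range(num_blocks)]
    for s, d in edges:
        stripes[d // block_size].setdefault(s, []).append(d)
    return stripes, out_deg, block_size


def pagerank_block_stripe(edges, n, num_blocks=2, alpha=0.85,
                          tol=1e-10, max_iter=1000):
    stripes, out_deg, block_size = build_stripes(edges, n, num_blocks)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        dead_mass = sum(rank[u] for u in range(n) if out_deg[u] == 0)
        base = (1.0 - alpha) / n + alpha * dead_mass / n
        new_rank = [0.0] * n
        for b, stripe in enumerate(stripes):
            lo = b * block_size
            hi = min(lo + block_size, n)
            # Only this block of the new vector is updated while scanning stripe b.
            block = [base] * (hi - lo)
            for s, dests in stripe.items():
                share = alpha * rank[s] / out_deg[s]
                for d in dests:
                    block[d - lo] += share
            new_rank[lo:hi] = block
        diff = sum(abs(a - b_) for a, b_ in zip(new_rank, rank))
        rank = new_rank
        if diff < tol:
            break
    return rank
```

The number of blocks should not change the scores beyond floating-point rounding, only how much of the new vector has to be held in memory at once; in a disk-based version each stripe and block would be read and written separately.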

2. WikiData dataset description

4.1 α = 0.75, computed with 1 block


6.1 Same value of α, different numbers of blocks

More…

If you need the project code and documentation, please contact the author.
