
How to Compile a Summary of Text Classification Datasets

Date: 2023-05-03 09:47:32 Views: 278513 Author: 2047

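I collected the key facts about the text classification datasets I was able to download and summarized them in the table below (as of July 1, 2020):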

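| Dataset | Classes | Type | Samples | Best Method | Performance |
|---|---|---|---|---|---|
| AG News | 4 | Topic | Train: 120000 Test: 7600 | XLNet | Error: 4.45 |
| Dbpedia | 14 | Topic | Train: 560000 Test: 70000 | XLNet | Error: 0.6 |
| TREC-6 | 6 | Question | Train: 5452 Test: 500 | USE_T+CNN | Error: 1.93 |
| TREC-50 | 50 | Question | Train: 5452 Test: 500 | Rules | Error: 2.8 |
| 20NEWS | 20 | Topic | 20,000 | SGC | Acc: 88.5 |
| IMDb | 2 | Sentiment | Train: 25,000 Test: 25,000 | XLNet | Acc: 96.8 |
| Yahoo! Answers | 10 | Question | Train: 1,400,000 Test: 60,000 | BERT-ITPT-FiT | Acc: 77.62 |
| R8 | 8 | Topic | Train: 5,485 Test: 2,189 | NABoE-full | Acc: 97.9 |
| Ohsumed | 23 | Disease classification | 50,216 | SGCN | Acc: 68.5 |
| Sogou News | 5 | Topic | Train: 450,000 Test: 60,000 | BERT-ITPT-FiT | Acc: 98.07 |
| Amazon-2 | 2 | Rating (1-2: negative, 4-5: positive) | per class Train: 1,800,000 Test: 200,000 | XLNet | Error: 2.11 |
| Amazon-5 | 5 | User rating 1-5 | per class Train: 600,000 Test: 130,000 | XLNet | Error: 31.67 |
| Yelp-2 | 2 | 1-2: negative, 4-5: positive | per class Train: 130,000 Test: 10,000 | XLNet | Acc: 98.63 |
| Yelp-5 | 5 | User rating 1-5 | per class Train: 130,000 Test: 10,000 | HANNN | Acc: 73.28 |
| Reuters-21578 | 90 | Topic | Train: 7769 Test: 3019 | MPAD-path | Acc: 97.44 |
| Cora | 7 | Paper classification (e.g., genetic algorithms) | 2708 | ACNet | Acc: 83.5 |
| BBCSports | 5 | Topic | 737 | MPAD-path | Acc: 99.59 |
| WOS-11967 | 35 (7 parent classes) | Paper category (e.g., CS->computer graphics) | 11967 | RMDL | Acc: 91.59 |
| WOS-46985 | 134 (7 parent classes) | Paper category (e.g., CS->computer graphics) | 46985 | RMDL | Acc: 82.42 |
| WOS-5736 | 11 (3 parent classes) | Paper category (e.g., CS->computer graphics) | 5736 | RMDL | Acc: 93.57 |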
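The Performance column mixes two metrics, `Error` and `Acc`, which makes rows hard to compare at a glance. The sketch below puts a few of the entries above on a single accuracy scale. It assumes both metrics are percentages and that accuracy equals 100 minus the error rate; the helper name `to_accuracy` is hypothetical and is not part of any library.

```python
def to_accuracy(performance: str) -> float:
    """Convert a cell such as 'Error: 4.45' or 'Acc: 96.8' to accuracy in percent.

    Assumption: both metrics are percentages and accuracy = 100 - error.
    """
    metric, _, value = performance.partition(":")
    metric = metric.strip().lower()
    score = float(value.strip())
    if metric == "error":
        return round(100.0 - score, 2)
    if metric == "acc":
        return score
    raise ValueError(f"unknown metric in {performance!r}")


# A few Performance cells copied from the table above.
rows = {
    "AG News": "Error: 4.45",
    "Dbpedia": "Error: 0.6",
    "IMDb": "Acc: 96.8",
    "Amazon-5": "Error: 31.67",
    "Yelp-2": "Acc:98.63",
}

for name, cell in rows.items():
    print(f"{name}: {to_accuracy(cell):.2f}")
```

With this conversion, for example, the Amazon-5 entry (Error: 31.67) corresponds to an accuracy of 68.33, which is easier to set against the Yelp-5 entry (Acc: 73.28).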

Datasets I was unable to download: DODF Data, MVICTOR(type), RCV1, TRAC2-Benghali Task 2, TRAC2-English Task 2, AffCon 2020 Emotion Detection, IMDb-M, AAPD, Yelp-14, Reuters En-De, Reuters De-En, MPQA, HoC

Reference links:
Text Classification
Document Classification

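Since some readers have asked for the data, I am sharing the download link for free. If you take the files, please leave a like. Many thanks!
Link: https://pan.baidu.com/s/10jFP1CfE-HyCVVCYY9XZWw
Extraction code: 0617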
