WISE 2012 Challenge

The follow-up information about the challenge can be found here.

Winners:

Championship on T1: Throughput and Latency
Ze Tang, Heng Lin, Kaiwei Li, and Wentao Han
Department of Computer Science and Technology, Tsinghua University, China

Championship on T1: Scalability
Edans F. O. Sandes, Li Weigang, and Alba C. M. A. de Melo
University of Brasilia, Brasilia, Brazil

Championship on T2
Sayan Unankard, Ling Chen, Peng Li, Sen Wang, Zi Huang, Mohamed Sharaf, and Xue Li
School of Information Technology and Electrical Engineering, The University of Queensland, Australia

Runner-Up on T2
Zhilin Luo, Yue Wang, and Xintao Wu
University of North Carolina at Charlotte, USA

====================================================================

Rankings:

T1: Performance Track

Throughput
1: Ze Tang, Heng Lin, Kaiwei Li, and Wentao Han (Department of Computer Science and Technology, Tsinghua University, China)
2: Edans F. O. Sandes, Li Weigang, and Alba C. M. A. de Melo (University of Brasilia, Brasilia, Brazil)
3: Lizhou Zheng, Xiaofeng Zhou, Zhenwen Lin, and Peiquan Jin (School of Computer Science and Technology, University of Science and Technology of China, China)
4: Feng Zhu, Jie Liu, and Lijie Xu (Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, China)

Latency
1: Ze Tang, Heng Lin, Kaiwei Li, and Wentao Han (Department of Computer Science and Technology, Tsinghua University, China)
2: Lizhou Zheng, Xiaofeng Zhou, Zhenwen Lin, and Peiquan Jin (School of Computer Science and Technology, University of Science and Technology of China, China)
3: Edans F. O. Sandes, Li Weigang, and Alba C. M. A. de Melo (University of Brasilia, Brasilia, Brazil)
4: Feng Zhu, Jie Liu, and Lijie Xu (Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, China)

Scalability
1: Edans F. O. Sandes, Li Weigang, and Alba C. M. A. de Melo (University of Brasilia, Brasilia, Brazil)
2: Ze Tang, Heng Lin, Kaiwei Li, and Wentao Han (Department of Computer Science and Technology, Tsinghua University, China)
3: Lizhou Zheng, Xiaofeng Zhou, Zhenwen Lin, and Peiquan Jin (School of Computer Science and Technology, University of Science and Technology of China, China)
4: Feng Zhu, Jie Liu, and Lijie Xu (Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, China)

T2: Mining Track
1: Sayan Unankard, Ling Chen, Peng Li, Sen Wang, Zi Huang, Mohamed Sharaf, and Xue Li (School of Information Technology and Electrical Engineering, The University of Queensland, Australia)
2: Zhilin Luo, Yue Wang, and Xintao Wu (University of North Carolina at Charlotte, USA)
3: Han Li, Kuang Chong, and Zhiyuan Liu (Tsinghua University, China)
4: Lianshuai Zhang, Zequn Zhang, and Peiquan Jin (School of Computer Science and Technology, University of Science and Technology of China, China)
5: Juarez Paulino, Lucas A. Almeida, Felipe M. Modesto, Thiago F. Neves, and Li Weigang (Department of Computer Science, University of Brasilia, Brasilia, Brazil)
6: Hongbo Zhang, Qun Zhao, Hongyan Liu, Ke Xiao, Jun He, and Xiaoyong Du (School of Information, Renmin University of China, China; Management Science and Engineering, Tsinghua University, China)
7: FENG Song, ZHANG Chuang, LIU Yuxuan, and LI Tai (School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, China)

===========================================================

Notice:

1. There is an error in the BSMA Performance Testing Tool (BSMA.zip) related to Q8. Attendees need to apply the patch by using the following command:
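The exact command was not preserved on this page. A typical invocation of the GNU patch tool, assuming the patch is applied from the directory containing the extracted BSMA tool (the -p0 strip level is an assumption about how the patch paths were generated), would be:

```shell
# Apply the Q8 fix to the extracted BSMA tool (hypothetical invocation;
# the strip level -p0 assumes paths in the patch are relative to this directory).
patch -p0 < bsma20120321.patch
```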
Download links for the bsma20120321.patch file:
Mirror 1: http://d.yun.io/fgq0Vl
Mirror 2: http://115.com/file/e7ijpv4y#
Mirror 3: https://content.wuala.com/contents/imc_ecnu/wise_challenge/bsma20120321.patch?dl=1

2. Papers or reports on the challenge should be submitted to the Challenge Track via the paper submission system.

3. There is an error in Query 18. The correct query and description in A2.pdf should be:

=====================
18. Find the top-x (x may be 10, 50, or 100) users among A's followers: get all of A's followers and order them by the number of their micro-blogs mentioning A, where the time of the micro-blogs lies in a time range starting from a given timestamp.

SELECT microblog.uid
FROM microblog, mention
WHERE microblog.mid = mention.mid
  AND mention.uid = "A"
  AND microblog.uid IN (SELECT uid FROM friendList WHERE friendID = "A")
  AND microblog.time BETWEEN TO_DAYS('YYYY-MM-DD HH:MM:SS')
      AND DATE_ADD('YYYY-MM-DD HH:MM:SS', INTERVAL 1 HOUR)
GROUP BY microblog.uid
ORDER BY COUNT(*) DESC
LIMIT 10;
====================

4. There are tweets with duplicated MIDs that have different values in other fields. All of these records were returned by the Sina Weibo API, and there is no clue as to which record is correct. Attendees should handle these duplicated MIDs themselves.

5. There are missing events, of two types:
1) Our auto-annotation system could not identify any corresponding tweets for:
- Chinese pro-democracy protests
- Jiang Zemin disappearance and death rumor
2) Some events are labeled with different names in events.txt:
- "Yao Ming retirement" and "Yao Ming retire" are actually the same event.
- "Motorola was purchased by Google" and "Motorola was acquisitions by Google" are actually the same event.
- "iphone4S release" and "iphone4s release" are actually the same event.

6. There are event labels in tweets that do not appear in events.txt. These are events for which no Wikipedia link is provided. Attendees may omit them.
There are also keyword labels that are not listed in events.txt. They are keywords related to the above events, without Wikipedia links.

7. Event names and keywords are case-insensitive.

===========================================================

1. Introduction

The WISE 2012 Challenge is based on a dataset collected from one of the most popular micro-blogging services (http://weibo.com). The challenge has two tracks: 1) the performance track, and 2) the mining track. Attendees may enter one or both tracks. Selected reports will be published in the conference proceedings after review.

Important dates:

2. Submission guideline

Attendees may enter one or both tracks. If both tracks are entered, two separate submissions should be sent. Each submission should contain two parts:
1) Results: results should be submitted to wise2012challenge@gmail.com by 18th May, 2012, following the specification provided in the task description.
2) Report: the report should be submitted via the WISE 2012 submission system. Attendees should register their submission by 11th May, 2012, and submit the final report by 18th May, 2012. The report should follow the WISE 2012 research paper format requirements. It should describe in detail how the attendees completed the challenge tasks and summarize the results.

3. The dataset

The original data was crawled from Sina Weibo (http://weibo.com), a popular micro-blogging service in China, via the API it provides. The dataset distributed in the WISE 2012 Challenge is preprocessed as follows:
1) User IDs and message IDs are anonymized.
2) The content of tweets is removed, in accordance with Sina Weibo's Terms of Service.
3) Some tweets are annotated with events. For each event, the terms used to identify the event and a link to the Wikipedia (http://wikipedia.org) page describing the event are given. The event information is provided in the file events.txt.
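Notices 4-7 above leave duplicated-MID handling and label normalization to each team. A minimal Python sketch of one possible policy is shown below: keep the first record seen per MID, compare event names case-insensitively, and fold the known alias pairs. The record structure (mid, uid, event) is an assumption for illustration; the actual file layout is specified in Appendix 1: Data format.

```python
# Known alias pairs from notice 5, folded to one canonical (lowercased) name.
EVENT_ALIASES = {
    "yao ming retire": "yao ming retirement",
    "motorola was acquisitions by google": "motorola was purchased by google",
}

def canonical_event(name):
    """Compare case-insensitively (notice 7) and fold aliases (notice 5)."""
    key = name.lower()
    return EVENT_ALIASES.get(key, key)

def dedup_by_mid(records):
    """Keep the first record seen for each MID (notice 4 leaves the
    choice of which duplicate to keep to each team)."""
    seen = set()
    out = []
    for rec in records:
        if rec["mid"] not in seen:
            seen.add(rec["mid"])
            out.append(rec)
    return out

# Hypothetical records, only for demonstration.
records = [
    {"mid": "m1", "uid": "u1", "event": "Yao Ming retirement"},
    {"mid": "m1", "uid": "u2", "event": "Yao Ming retire"},   # duplicated MID
    {"mid": "m2", "uid": "u3", "event": "iphone4S release"},
]
clean = dedup_by_mid(records)
print(len(clean))                          # 2
print(canonical_event("Yao Ming retire"))  # yao ming retirement
```

Any other deterministic keep-one policy (e.g. keep the last record) would also satisfy notice 4; the point is only that duplicates must be resolved consistently before counting.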
The dataset to be used in both tracks contains two sets of files:
1) Tweets: basic information about tweets (time, user ID, message ID, etc.), mentions (user IDs appearing in tweets), re-tweet paths, and whether each tweet contains links.
2) Followship network: the following network of users (based on user IDs).

In addition, a small testing dataset to be used in the mining track is provided. It contains one file, which shares the same format as the tweets files introduced above. A small part of the re-tweeting activities of thirty-three tweets from six events is given in the testing file.

It should be noted that the dataset is not complete; it is only a sample of the whole data in the micro-blogging service. The details of the dataset format are given in Appendix 1: Data format.

4. The performance track (T1)

Attendees are required to build a system for evaluating queries over the dataset. Nineteen typical queries should be covered, and the corresponding interfaces in the BSMA performance testing tool should be implemented. The target is to achieve low response time and high throughput, as reported by the BSMA performance testing tool.

Result submission specification:
1) Results should be submitted via email to wise2012challenge@gmail.com.
2) The email title should be: [T1] xxx Part:y/z, where 'xxx' denotes the paper ID assigned by the paper submission system at registration, 'z' is the total number of emails in the submission, and 'y' denotes the sequential number of this email.
3) All results should be submitted as attachments of the email. Each email should contain exactly one attachment, whose size should be no more than 20MB.
4) The attachment should be in tar.gz or zip format.
5) The attachment should contain all 1344 result files generated by the performance testing tool, without any modifications (including the file names), in the root directory of the compressed package.

The typical queries are introduced in Appendix 2: T1: Queries.
The BSMA performance testing tool manual is given in Appendix 3: T1: BSMA performance testing tool manual.

5. The mining track (T2)

In T2, attendees are required to predict the re-tweeting activities of thirty-three tweets from six events. For each of these six events, only the tweets (and re-tweets) published before a given timestamp are included in the Tweets files. The thirty-three tweets are given in the Tests file; for each of them, the event that it belongs to is given. As in Tweets, only re-tweeting information before the timestamp is provided. Attendees are required to predict two measurements at 30 days after the time the original tweet is published. These two measurements are:
1) M1: the number of times the original tweet is re-tweeted. If a user re-tweets (also called re-posts or forwards) a tweet twice at different timestamps, it is counted twice.
2) M2: the number of possible views of the original tweet. The number of possible views of one re-tweet action is defined as the number of followers of the user who performs the re-tweet. The number of possible views of a tweet is defined as the sum of the possible-view numbers of all its re-tweet actions. It should be noted that all re-tweeting actions in a re-tweeting chain are counted toward the root of the chain.

Result submission specification:
1) Results should be submitted via email to wise2012challenge@gmail.com.
2) The email title should be: [T2] xxx Part:y/z, where 'xxx' denotes the paper ID assigned by the paper submission system at registration, 'z' is the total number of emails in the submission, and 'y' denotes the sequential number of this email.
3) All results should be submitted as attachments of the email. Each email should contain exactly one attachment, whose size should be no more than 20MB.
4) The attachment should be a plain text file with thirty-three rows, in which each row contains three fields: the message ID of the original tweet, the predicted M1 value, and the predicted M2 value.

6. Downloads

The dataset and documents can be downloaded from various sites. The datasets on the different sites are identical, though they may be compressed in different forms to fit the requirements of the different storage services.

Site A: Wuala.com
Appendix 1: Data format
Appendix 2: T1: Queries
Appendix 3: T1: BSMA performance testing tool manual
Tweets: in twelve compressed files (Please note that these files are quite large, and may take quite a long time to download.):
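Given ground-truth re-tweet records, the two measurements defined in the mining track above can be computed directly. The following is a minimal Python sketch, not the official evaluation code: the record fields, follower counts, and timestamps are invented for illustration, and every action in a re-tweet chain is attributed to the root tweet, as the task requires.

```python
from datetime import datetime, timedelta

# Hypothetical re-tweet actions for one original tweet: (user, timestamp).
retweets = [
    ("u1", datetime(2011, 8, 1, 10, 0)),
    ("u2", datetime(2011, 8, 2, 9, 30)),
    ("u1", datetime(2011, 8, 20, 8, 0)),  # same user again: counted again (M1 rule)
    ("u3", datetime(2011, 9, 15, 0, 0)),  # falls outside the 30-day window
]
followers = {"u1": 100, "u2": 5, "u3": 7}  # follower counts (assumed known)

published = datetime(2011, 8, 1, 9, 0)     # time the original tweet was posted
cutoff = published + timedelta(days=30)    # prediction horizon from the task

window = [(u, t) for (u, t) in retweets if t <= cutoff]
m1 = len(window)                             # M1: number of re-tweet actions
m2 = sum(followers[u] for (u, _) in window)  # M2: sum of followers per action

print(m1, m2)  # 3 205
```

In the actual task these values are unknown at the cutoff and must be predicted; the sketch only pins down what the two target numbers mean.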
Followships: in three compressed zip files (Please note that the files are quite large and may take quite a long time to download):
Events: events.txt
Testing: eventForTest.zip
BSMA performance testing tool: BSMA.zip

Site B: The University of Queensland
http://itee.uq.edu.au/~dke/WISE2012.htm
The files are exactly the same as those at http://www.wuala.com/imc_ecnu/wise_challenge/

Site C: 115 (suggested for users with IPs in China)
Appendix 1: Data format
Appendix 2: T1: Queries
Appendix 3: T1: BSMA performance testing tool manual
Tweets: in twelve compressed files (Please note that these files are quite large, and may take quite a long time to download.):
Followships: in three compressed zip files (Please note that the files are quite large and may take quite a long time to download):
Events: events.txt http://115.com/file/beem15q0
Testing: eventForTest.zip http://115.com/file/ans4nu5c
BSMA performance testing tool: BSMA.zip http://115.com/file/dppk9ydp
Site D: yun.io (Please note that the tweets and following-network files on this server are compressed in different forms from those on the other sites.)
Appendix 1: Data format
Appendix 2: T1: Queries
Appendix 3: T1: BSMA performance testing tool manual
Tweets: in seven compressed files (Please note that these files are quite large, and may take quite a long time to download.):
Followships: in one compressed zip file (Please note that the file is quite large and may take quite a long time to download):
Events: events.txt http://d.yun.io/qEomuu
Testing: eventForTest.zip http://d.yun.io/HtmeAu
BSMA performance testing tool: BSMA.zip http://d.yun.io/nuo6za
7. Contact
wise2012challenge@gmail.com
Sponsors