

WISE 2012 Challenge

The follow-up information about the challenge can be found here


Winners:
Championship on T1: Throughput and Latency
Ze Tang, Heng Lin, Kaiwei Li, and Wentao Han
Department of Computer Science and Technology, Tsinghua University, China


Championship on T1: Scalability
Edans F.O. Sandes, Li Weigang, and Alba C. M. A. de Melo
University of Brasilia, Brasilia, Brazil


Championship on T2
Sayan Unankard, Ling Chen, Peng Li, Sen Wang, Zi Huang, Mohamed Sharaf, and Xue Li
School of Information Technology and Electrical Engineering, The University of Queensland, Australia


Runner-Up on T2
Zhilin Luo, Yue Wang, and Xintao Wu
University of North Carolina at Charlotte, USA

====================================================================

Rankings:

T1: Performance Track

Throughput

1: Ze Tang, Heng Lin, Kaiwei Li, and Wentao Han
Department of Computer Science and Technology, Tsinghua University, China

2: Edans F.O. Sandes, Li Weigang, and Alba C. M. A. de Melo
University of Brasilia, Brasilia, Brazil

3: Lizhou Zheng, Xiaofeng Zhou, Zhenwen Lin, Peiquan Jin
School of Computer Science and Technology, University of Science and Technology of China, China

4: Feng Zhu, Jie Liu, and Lijie Xu
Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, China

Latency

1: Ze Tang, Heng Lin, Kaiwei Li, and Wentao Han
Department of Computer Science and Technology, Tsinghua University

2: Lizhou Zheng, Xiaofeng Zhou, Zhenwen Lin, Peiquan Jin
School of Computer Science and Technology, University of Science and Technology of China, China

3: Edans F.O. Sandes, Li Weigang, and Alba C. M. A. de Melo
University of Brasilia, Brasilia, Brazil

4: Feng Zhu, Jie Liu, and Lijie Xu
Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, China


Scalability

1: Edans F.O. Sandes, Li Weigang, and Alba C. M. A. de Melo
University of Brasilia, Brasilia, Brazil

2: Ze Tang, Heng Lin, Kaiwei Li, and Wentao Han
Department of Computer Science and Technology, Tsinghua University

3: Lizhou Zheng, Xiaofeng Zhou, Zhenwen Lin, Peiquan Jin
School of Computer Science and Technology, University of Science and Technology of China, China

4: Feng Zhu, Jie Liu, and Lijie Xu
Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, China


T2: Mining Track

1: Sayan Unankard, Ling Chen, Peng Li, Sen Wang, Zi Huang, Mohamed Sharaf, and Xue Li
School of Information Technology and Electrical Engineering, The University of Queensland, Australia

2: Zhilin Luo, Yue Wang, and Xintao Wu
University of North Carolina at Charlotte, USA

3: Han Li, Kuang Chong, Zhiyuan Liu
Tsinghua University, China

4: Lianshuai Zhang, Zequn Zhang, and Peiquan Jin
School of Computer Science and Technology, University of Science and Technology of China, China

5: Juarez Paulino, Lucas A. Almeida, Felipe M. Modesto, Thiago F. Neves, and Li Weigang,
Department of Computer Science, University of Brasilia, Brasilia, Brazil

6: Hongbo Zhang, Qun Zhao, Hongyan Liu, Ke Xiao, Jun He, Xiaoyong Du
School of Information, Renmin University of China, China
Management Science and Engineering, Tsinghua University, China

7: Feng Song, Zhang Chuang, Liu Yuxuan, and Li Tai
School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, China




===========================================================

Notice:

1. There is an error in the BSMA Performance Testing Tool (BSMA.zip) related to Q8. Attendees need to apply the patch by running the following command in the directory where BSMA is located:

patch -p1 < bsma20120321.patch

Download links of the bsma20120321.patch file:

Mirror1: http://d.yun.io/fgq0Vl
Mirror2: http://115.com/file/e7ijpv4y#
Mirror3: https://content.wuala.com/contents/imc_ecnu/wise_challenge/bsma20120321.patch?dl=1

2. Papers or reports on the challenge should be submitted to the Challenge Track via the paper submission system.

3. There is an error in Query 18. The correct query and descriptions in A2.pdf should be:

=====================

18. Find the top-x users (x may be 10, 50, or 100): the users that are A's followers. Get all of A's followers and order them by the number of their micro-blogs mentioning A, where the time of the micro-blogs falls within a time range starting from a given timestamp.

SELECT microblog.uid
FROM microblog, mention
WHERE microblog.mid = mention.mid AND
      mention.uid = "A" AND
      microblog.uid IN
        (SELECT uid
         FROM friendList
         WHERE friendID = "A") AND
      microblog.time BETWEEN 'YYYY-MM-DD HH:MM:SS'
                         AND DATE_ADD('YYYY-MM-DD HH:MM:SS', INTERVAL 1 HOUR)
GROUP BY microblog.uid
ORDER BY COUNT(*) DESC
LIMIT 10;

====================

4. There are tweets with duplicated MIDs that have different values in other fields. All of these records were returned by the Sina Weibo API, and there is no clue as to which record is correct. Attendees should handle these duplicated MIDs themselves.
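One practical way to handle such records — an illustration, not a rule prescribed by the organizers — is to keep the first record seen for each MID and drop the rest. A minimal Python sketch (the dict-based record layout is an assumption for illustration, not the official dataset schema):

```python
def dedup_by_mid(records):
    """Keep only the first record seen for each message ID (MID).

    `records` is an iterable of dicts that contain at least a 'mid' key;
    the field layout here is illustrative, not the official schema.
    """
    seen = set()
    unique = []
    for rec in records:
        if rec["mid"] not in seen:
            seen.add(rec["mid"])
            unique.append(rec)
    return unique

# Example: two records share MID 1; only the first is kept.
rows = [{"mid": 1, "uid": 10}, {"mid": 1, "uid": 11}, {"mid": 2, "uid": 12}]
print(dedup_by_mid(rows))
```

Any deterministic policy (first seen, last seen, majority vote over fields) is acceptable as long as it is applied consistently and described in the report.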

5. There are missing events, of two types:

1) Events for which our auto-annotation system could not identify any corresponding tweets:
-Chinese pro-democracy protests
-Jiang Zemin disappearance and death rumor
2) Events that are labeled with different names in events.txt:
-Yao Ming retirement
+Yao Ming retire
are actually the same event.
-Motorola was purchased by Google
+Motorola was acquisitions by Google
are actually the same event.
-iphone4S release
+iphone4s release
are actually the same event.

6. Some event labels in tweets do not appear in events.txt. These are events for which no Wikipedia links are provided; attendees may ignore them. There are also keyword labels that are not listed in events.txt; they are keywords related to the above events, also without Wikipedia links.

7. Event names and keywords are case-insensitive.
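Since labels are case-insensitive (notice 7) and some events appear under variant names (notice 5), one practical approach — an assumption, not an official requirement — is to lower-case every label and map known variants to a canonical name:

```python
# Alias table built from the variant names listed in notice 5;
# keys are lower-cased because labels are case-insensitive (notice 7).
ALIASES = {
    "yao ming retire": "yao ming retirement",
    "motorola was acquisitions by google": "motorola was purchased by google",
}

def canonical_event(label):
    """Lower-case a label and map known variant names to one canonical form."""
    key = label.strip().lower()
    return ALIASES.get(key, key)

print(canonical_event("Yao Ming retire"))    # yao ming retirement
print(canonical_event("iPhone4S release"))   # iphone4s release
```

Lower-casing alone already merges the "iphone4S release" / "iphone4s release" pair; the alias table handles the remaining two.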

===========================================================



1. Introduction

The WISE 2012 Challenge is based on a dataset collected from one of the most popular micro-blogging services (http://weibo.com). The challenge has two tracks: 1) the performance track and 2) the mining track. Attendees may enter one or both tracks. Selected reports will be published in the conference proceedings after review.


Important dates:

  • Attendance registration deadline: 11th May, extended to 1st June 2012
  • Result/report submission deadline: 18th May, extended to 22nd June 2012
  • Winners notified: 13th July, extended to 3rd August 2012
  • Report camera-ready due: 27th July, extended to 31st August 2012


2. Submission guideline

Attendees may attend one or both tracks. Two separate submissions should be sent if both tracks are attended. Each submission should contain two parts:

1) Results:

Results should be submitted to wise2012challenge@gmail.com by 18th May, 2012 following the specification provided in task description.

2) Report:

Reports should be submitted via the WISE 2012 submission system. Attendees should register their submission by 11th May, 2012, and submit the final report by 18th May, 2012. The report should follow the WISE 2012 research paper format requirements, describe in detail how the attendees completed the challenge tasks, and summarize the results.



3. The dataset

The original data was crawled from Sina Weibo (http://weibo.com), a popular micro-blogging service in China, via the API provided. The dataset distributed in WISE 2012 Challenge is preprocessed as follows:

1) User IDs and message IDs are anonymized.

2) The content of tweets is removed, in accordance with Sina Weibo's Terms of Service.

3) Some tweets are annotated with events. For each event, the terms used to identify it and a link to a Wikipedia (http://wikipedia.org) page describing it are given. The event information is provided in the file events.txt.


The dataset to be used in both tracks contains two sets of files:

1) Tweets: basic information about tweets (time, user ID, message ID, etc.), mentions (user IDs appearing in tweets), re-tweet paths, and whether each tweet contains links.

2) Followship network: It includes the following network of users (based on user IDs).

In addition, a small testing dataset to be used in the mining track is provided. It contains one file, which shares the same format as the tweets file introduced above. It gives a small part of the re-tweeting activities of thirty-three tweets belonging to six events.

It should be noted that the dataset is not complete; it is only a sample of the whole data in the micro-blogging service.

The details of dataset format are given in Appendix 1: Data format.



4. The performance track (T1)

Attendees are required to build a system for evaluating queries over the dataset. Nineteen typical queries should be covered, and the corresponding interfaces in the BSMA performance testing tool should be implemented. The goal is to achieve low response time and high throughput as reported by the BSMA performance testing tool.

Result submission specification:

1) Results should be submitted via email to wise2012challenge@gmail.com

2) The email title should be: [T1] xxx Part:y/z, where 'xxx' is the paper ID assigned by the paper submission system at registration, 'z' is the total number of emails in the submission, and 'y' is the sequential number of this email.

3) All results should be submitted as attachments to the email. Each email should contain exactly one attachment, no larger than 20MB.

4) The attachment should be in tar.gz or zip format.

5) The attachment should contain, in the root directory of the compressed package, all 1344 result files generated by the performance testing tool, unmodified (including the file names).

The typical queries are introduced in Appendix 2: T1: Queries.

The BSMA performance testing tool manual is given in Appendix 3: T1: BSMA performance testing tool manual.



5. The mining track (T2)

In T2, attendees are required to predict the re-tweeting activities of thirty-three tweets belonging to six events. For each of these six events, only tweets (and re-tweets) published before a given timestamp are included in the Tweets file. The thirty-three tweets are given in the Tests file; for each of them, the event it belongs to is given. As in Tweets, only re-tweeting information before the timestamp is provided. Attendees are required to predict two measurements at the time 30 days after the original tweet is published. These two measurements are:

1) M1: The number of times the original tweet is re-tweeted. If a user re-tweets (also called re-posts or forwards) a tweet twice at different timestamps, it is counted twice.

2) M2: The number of possible views of the original tweet. The number of possible views of one re-tweet action is defined as the number of followers of the user who performs the re-tweet. The number of possible views of a tweet is defined as the sum of the possible-view numbers of all of its re-tweet actions.

It should be noted that all re-tweeting actions in a re-tweeting chain are counted toward the root of the chain.
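Under the definitions above, M1 and M2 for a root tweet can be computed directly from its full re-tweet chain. A minimal Python sketch, assuming re-tweet actions are given as (user ID, timestamp) pairs and follower counts as a dict (both layouts are illustrative, not the official data format):

```python
def m1_m2(retweet_actions, follower_count):
    """Compute M1 and M2 for one original tweet.

    retweet_actions: list of (uid, timestamp) pairs, one per re-tweet action
        anywhere in the re-tweet chain (all actions count toward the root).
        A user re-tweeting twice contributes two actions, per the task rules.
    follower_count: dict mapping uid -> number of followers of that user.
    """
    m1 = len(retweet_actions)  # M1: total number of re-tweet actions
    m2 = sum(follower_count.get(uid, 0) for uid, _ in retweet_actions)  # M2: possible views
    return m1, m2

# User 7 re-tweets twice; each action counts toward M1, and each
# contributes that user's follower count (50) to M2.
actions = [(7, 100), (7, 200), (9, 150)]
followers = {7: 50, 9: 30}
print(m1_m2(actions, followers))  # (3, 130)
```

The prediction task, of course, is to estimate these two values 30 days out from only the pre-timestamp data; this sketch only fixes what the ground-truth measurements mean.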

Result submission specification:

1) Results should be submitted via email to wise2012challenge@gmail.com

2) The email title should be: [T2] xxx Part:y/z, where 'xxx' is the paper ID assigned by the paper submission system at registration, 'z' is the total number of emails in the submission, and 'y' is the sequential number of this email.

3) All results should be submitted as attachments to the email. Each email should contain exactly one attachment, no larger than 20MB.

4) The attachment should be a plain text file with thirty-three rows, each containing three fields: the message ID of the original tweet, the predicted M1 value, and the predicted M2 value.
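The thirty-three prediction rows can be emitted with a few lines of Python. The single-space separator below is an assumption, since the notice does not fix a delimiter:

```python
def write_t2_results(path, predictions):
    """Write one row per original tweet: message ID, predicted M1, predicted M2.

    predictions: list of (mid, m1, m2) tuples; fields are separated by a
    single space (delimiter assumed, not specified by the organizers).
    """
    with open(path, "w") as f:
        for mid, m1, m2 in predictions:
            f.write(f"{mid} {m1} {m2}\n")

# Hypothetical example values, for illustration only.
write_t2_results("t2_results.txt", [("abc123", 42, 1700)])
```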



6. Downloads

The dataset and documents can be downloaded from several sites. The datasets on the different sites are identical, though they may be compressed in different forms to fit the requirements of the different storage services.


Site A: Wuala.com

Appendix 1: Data format
A1.txt

Appendix 2: T1: Queries
A2.pdf

Appendix 3: T1: BSMA performance testing tool manual
A3.pdf


Tweets: in twelve compressed files (Please note that these files are quite large, and may take quite a long time to download.):

file name size (bytes) md5 checksum
finalmicroblogs.zip.001 1038090240 92E7D35F90EA8B2D2C142B0F7C214C09
finalmicroblogs.zip.002 1038090240 35C688228B0929A961D4DB510936ABAB
finalmicroblogs.zip.003 1038090240 033A8E30E8B05CB086679F64B3B43B00
finalmicroblogs.zip.004 1038090240 FE153B0786341A8059D3DCE2601CA2E1
finalmicroblogs.zip.005 1038090240 F823EE2C2B9C0FF2375E613B177A583D
finalmicroblogs.zip.006 1038090240 8826C942344E468F2997E467624D407D
finalmicroblogs.zip.007 1038090240 41DB57B998230435931BFA315F54E711
finalmicroblogs.zip.008 1038090240 396995C04412EFC8DD3B0469045F8C58
finalmicroblogs.zip.009 1038090240 6BDAB3F60C99349E355C4A6D62AD6D83
finalmicroblogs.zip.010 1038090240 DF6B13AB8F3A6E0BC372AEA104F587AE
finalmicroblogs.zip.011 1038090240 80836DD1636B5D12C53EC803CE8E2C25
finalmicroblogs.zip.012 1026428116 34A90C8B4FD796CDFD35862E278BD090
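After downloading, each part can be checked against the checksums in the table above before joining. A minimal verification sketch in Python:

```python
import hashlib

def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading in chunks so a
    ~1 GB part is never loaded into memory at once."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest().upper()

# Compare against the published checksum for a downloaded part, e.g.:
# assert md5_of("finalmicroblogs.zip.002") == "35C688228B0929A961D4DB510936ABAB"
```

Once all parts verify, the `.zip.001` ... `.zip.012` pieces can presumably be concatenated in order into a single archive before extraction (e.g. `cat finalmicroblogs.zip.0* > finalmicroblogs.zip`); the exact join method is an assumption about how the archive was split.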



Followships: in three compressed zip files (Please note that the files are quite large and may take quite a long time to download):

file name size (bytes) md5 checksum
socialnetwork.zip.001 1038090240 789A5C4D182766ED42241B569AFD60FD
socialnetwork.zip.002 1038090240 149399C4CC17A4A9E2866183D93B24CC
socialnetwork.zip.003 1024559892 D427F3BB268AA6552BDF34918FEEBA19


Events:

events.txt


Testing:

eventForTest.zip


BSMA performance testing tool

BSMA.zip




Site B: The University of Queensland http://itee.uq.edu.au/~dke/WISE2012.htm

The files are exactly the same as those in http://www.wuala.com/imc_ecnu/wise_challenge/




Site C: 115 (Suggested for users with IP in China)

Appendix 1: Data format
A1.txt http://115.com/file/anncpz2p

Appendix 2: T1: Queries
A2.pdf http://115.com/file/dppkahwp

Appendix 3: T1: BSMA performance testing tool manual
A3.pdf http://115.com/file/e77vk2oy


Tweets: in twelve compressed files (Please note that these files are quite large, and may take quite a long time to download.):

file name size (bytes) md5 checksum link
finalmicroblogs.zip.001 1038090240 92E7D35F90EA8B2D2C142B0F7C214C09 http://115.com/file/dpngbci8#
finalmicroblogs.zip.002 1038090240 35C688228B0929A961D4DB510936ABAB http://115.com/file/c2l7k11e#
finalmicroblogs.zip.003 1038090240 033A8E30E8B05CB086679F64B3B43B00 http://115.com/file/anq6h72n#
finalmicroblogs.zip.004 1038090240 FE153B0786341A8059D3DCE2601CA2E1 http://115.com/file/behq8xax#
finalmicroblogs.zip.005 1038090240 F823EE2C2B9C0FF2375E613B177A583D http://115.com/file/behux6o0#
finalmicroblogs.zip.006 1038090240 8826C942344E468F2997E467624D407D http://115.com/file/e76mzkso#
finalmicroblogs.zip.007 1038090240 41DB57B998230435931BFA315F54E711 http://115.com/file/behqrnpz#
finalmicroblogs.zip.008 1038090240 396995C04412EFC8DD3B0469045F8C58 http://115.com/file/behuxt9k#
finalmicroblogs.zip.009 1038090240 6BDAB3F60C99349E355C4A6D62AD6D83 http://115.com/file/behusjh0#
finalmicroblogs.zip.010 1038090240 DF6B13AB8F3A6E0BC372AEA104F587AE http://115.com/file/e76mlnoy#
finalmicroblogs.zip.011 1038090240 80836DD1636B5D12C53EC803CE8E2C25 http://115.com/file/e76dkt67#
finalmicroblogs.zip.012 1026428116 34A90C8B4FD796CDFD35862E278BD090 http://115.com/file/e76dkp1c#



Followships: in three compressed zip files (Please note that the files are quite large and may take quite a long time to download):

file name size (bytes) md5 checksum link
socialnetwork.zip.001 1038090240 789A5C4D182766ED42241B569AFD60FD http://115.com/file/dp599vra#
socialnetwork.zip.002 1038090240 149399C4CC17A4A9E2866183D93B24CC http://115.com/file/e7lrxrkc#
socialnetwork.zip.003 1024559892 D427F3BB268AA6552BDF34918FEEBA19 http://115.com/file/bes0vi9x#


Events:

events.txt http://115.com/file/beem15q0


Testing:

eventForTest.zip http://115.com/file/ans4nu5c


BSMA performance testing tool

BSMA.zip http://115.com/file/dppk9ydp


Site D: yun.io

(Please note that the tweets and following-network files on this server are compressed in different forms from those on the other sites.)

Appendix 1: Data format
A1.txt http://d.yun.io/pECCiy

Appendix 2: T1: Queries
A2.pdf http://d.yun.io/0DKzSp

Appendix 3: T1: BSMA performance testing tool manual
A3.pdf http://d.yun.io/gO71Nh


Tweets: in seven compressed files (Please note that these files are quite large, and may take quite a long time to download.):

file name size (bytes) md5 checksum link
finalmicroblogs.z01 2147483648 5CB2A0FFB857CD5A5F6AFBDA63EFE496 http://d.yun.io/DEmikA
finalmicroblogs.z02 2147483648 B529EC8B46A2BC18ABAB5C2791A65631 http://d.yun.io/SpYhYz
finalmicroblogs.z03 2147483648 B6CEBD96A0F61C691DDB1CFFCC37F37E http://d.yun.io/u8FWbs
finalmicroblogs.z04 2147483648 5BA1D9CEB36402F8A95BD6E04BE18185 http://d.yun.io/QPtaMt
finalmicroblogs.z05 2147483648 9DEDB0CFD01D81B972966FDC765A962A http://d.yun.io/dILlWf
finalmicroblogs.z06 2147483648 9E210BDD83AC18911016D95E178391FE http://d.yun.io/3DSTZB
finalmicroblogs.zip 803261976 203C4765A7F390DE57888EE1C76E69B7 http://d.yun.io/K3bIpo



Followships: in one compressed zip file (Please note that the file is quite large and may take quite a long time to download):

file name size (bytes) md5 checksum link
socialnetwork.zip 3433604665 0EFA7F06628DF275F347570FD17BD131 http://d.yun.io/LVfK9x


Events:

events.txt http://d.yun.io/qEomuu


Testing:

eventForTest.zip http://d.yun.io/HtmeAu


BSMA performance testing tool

BSMA.zip http://d.yun.io/nuo6za

7. Contact

wise2012challenge@gmail.com