

WISE 2012 Challenge

The follow-up information about the challenge can be found here


Winners:
Championship on T1: Throughput and Latency
Ze Tang, Heng Lin, Kaiwei Li, and Wentao Han
Department of Computer Science and Technology, Tsinghua University, China


Championship on T1: Scalability
Edans F.O. Sandes, Li Weigang, and Alba C. M. A. de Melo
University of Brasilia, Brasilia, Brazil


Championship on T2
Sayan Unankard, Ling Chen, Peng Li, Sen Wang, Zi Huang, Mohamed Sharaf, and Xue Li
School of Information Technology and Electrical Engineering, The University of Queensland, Australia


Runner-Up on T2
Zhilin Luo, Yue Wang, and Xintao Wu
University of North Carolina at Charlotte, USA

====================================================================

Rankings:

T1: Performance Track

Throughput

1: Ze Tang, Heng Lin, Kaiwei Li, and Wentao Han
Department of Computer Science and Technology, Tsinghua University, China

2: Edans F.O. Sandes, Li Weigang, and Alba C. M. A. de Melo
University of Brasilia, Brasilia, Brazil

3: Lizhou Zheng, Xiaofeng Zhou, Zhenwen Lin, Peiquan Jin
School of Computer Science and Technology, University of Science and Technology of China, China

4: Feng Zhu, Jie Liu, and Lijie Xu
Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, China

Latency

1: Ze Tang, Heng Lin, Kaiwei Li, and Wentao Han
Department of Computer Science and Technology, Tsinghua University

2: Lizhou Zheng, Xiaofeng Zhou, Zhenwen Lin, Peiquan Jin
School of Computer Science and Technology, University of Science and Technology of China, China

3: Edans F.O. Sandes, Li Weigang, and Alba C. M. A. de Melo
University of Brasilia, Brasilia, Brazil

4: Feng Zhu, Jie Liu, and Lijie Xu
Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, China


Scalability

1: Edans F.O. Sandes, Li Weigang, and Alba C. M. A. de Melo
University of Brasilia, Brasilia, Brazil

2: Ze Tang, Heng Lin, Kaiwei Li, and Wentao Han
Department of Computer Science and Technology, Tsinghua University

3: Lizhou Zheng, Xiaofeng Zhou, Zhenwen Lin, Peiquan Jin
School of Computer Science and Technology, University of Science and Technology of China, China

4: Feng Zhu, Jie Liu, and Lijie Xu
Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, China


T2: Mining Track

1: Sayan Unankard, Ling Chen, Peng Li, Sen Wang, Zi Huang, Mohamed Sharaf, and Xue Li
School of Information Technology and Electrical Engineering, The University of Queensland, Australia

2: Zhilin Luo, Yue Wang, and Xintao Wu
University of North Carolina at Charlotte, USA

3: Han Li, Kuang Chong, Zhiyuan Liu
Tsinghua University, China

4: Lianshuai Zhang, Zequn Zhang, and Peiquan Jin
School of Computer Science and Technology, University of Science and Technology of China, China

5: Juarez Paulino, Lucas A. Almeida, Felipe M. Modesto, Thiago F. Neves, and Li Weigang,
Department of Computer Science, University of Brasilia, Brasilia, Brazil

6: Hongbo Zhang, Qun Zhao, Hongyan Liu, Ke Xiao, Jun He, Xiaoyong Du
School of Information, Renmin University of China, China
Management Science and Engineering, Tsinghua University, China

7: Feng Song, Zhang Chuang, Liu Yuxuan, and Li Tai
School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, China




===========================================================

Notice:

1. There is an error in the BSMA Performance Testing Tool (BSMA.zip) related to Q8. Attendees need to apply the patch by running the following command in the directory where BSMA is located:

patch -p1 < bsma20120321.patch

Download links of the bsma20120321.patch file:

Mirror1: http://d.yun.io/fgq0Vl
Mirror2: http://115.com/file/e7ijpv4y#
Mirror3: https://content.wuala.com/contents/imc_ecnu/wise_challenge/bsma20120321.patch?dl=1

2. Papers or reports on the challenge should be submitted to the Challenge Track via the paper submission system.

3. There is an error in Query 18. The correct query and descriptions in A2.pdf should be:

=====================

18. Find the top-x users (x may be 10, 50, or 100): the users that are A's followers. Get all of A's followers and order them by the number of their micro-blogs mentioning A, where the time of the micro-blogs falls within a time range starting from a given timestamp.

SELECT microblog.uid
FROM microblog, mention
WHERE microblog.mid = mention.mid AND
      mention.uid = "A" AND
      microblog.uid IN
        (SELECT uid
         FROM friendList
         WHERE friendID = "A") AND
      microblog.time BETWEEN 'YYYY-MM-DD HH:MM:SS'
                         AND DATE_ADD('YYYY-MM-DD HH:MM:SS', INTERVAL 1 HOUR)
GROUP BY microblog.uid
ORDER BY COUNT(*) DESC
LIMIT 10;

====================

4. There are tweets with duplicated MIDs that have different values in other fields. All of these records were returned by the Sina Weibo API, and there is no clue as to which record is correct. Attendees should handle these duplicated MIDs themselves.
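One practical way to handle such records — an illustration, not a rule prescribed by the organizers — is to keep the first record seen for each MID and drop the rest. A minimal Python sketch (the dict-based record layout is an assumption for illustration, not the official dataset schema):

```python
def dedup_by_mid(records):
    """Keep only the first record seen for each message ID (MID).

    `records` is an iterable of dicts that contain at least a 'mid' key;
    the field layout here is illustrative, not the official schema.
    """
    seen = set()
    unique = []
    for rec in records:
        if rec["mid"] not in seen:
            seen.add(rec["mid"])
            unique.append(rec)
    return unique

# Example: two records share MID 1; only the first is kept.
rows = [{"mid": 1, "uid": 10}, {"mid": 1, "uid": 11}, {"mid": 2, "uid": 12}]
print(dedup_by_mid(rows))
```

Any deterministic policy (first seen, last seen, majority vote over fields) is acceptable as long as it is applied consistently and described in the report.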

5. There are missing events, of two types:

1) Events for which our auto-annotation system could not identify any corresponding tweets:
-Chinese pro-democracy protests
-Jiang Zemin disappearance and death rumor
2) Events that are labeled with different names in events.txt:
-Yao Ming retirement
+Yao Ming retire
are actually the same event.
-Motorola was purchased by Google
+Motorola was acquisitions by Google
are actually the same event.
-iphone4S release
+iphone4s release
are actually the same event.

6. Some event labels in tweets do not appear in events.txt. These are events for which no Wikipedia links are provided; attendees may ignore them. There are also keyword labels that are not listed in events.txt; they are keywords related to the above events, also without Wikipedia links.

7. Event names and keywords are case-insensitive.
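Since labels are case-insensitive (notice 7) and some events appear under variant names (notice 5), one practical approach — an assumption, not an official requirement — is to lower-case every label and map known variants to a canonical name:

```python
# Alias table built from the variant names listed in notice 5;
# keys are lower-cased because labels are case-insensitive (notice 7).
ALIASES = {
    "yao ming retire": "yao ming retirement",
    "motorola was acquisitions by google": "motorola was purchased by google",
}

def canonical_event(label):
    """Lower-case a label and map known variant names to one canonical form."""
    key = label.strip().lower()
    return ALIASES.get(key, key)

print(canonical_event("Yao Ming retire"))    # yao ming retirement
print(canonical_event("iPhone4S release"))   # iphone4s release
```

Lower-casing alone already merges the "iphone4S release" / "iphone4s release" pair; the alias table handles the remaining two.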

===========================================================



1. Introduction

The WISE 2012 Challenge is based on a dataset collected from one of the most popular micro-blogging services (http://weibo.com). The challenge has two tracks: 1) the performance track and 2) the mining track. Attendees may enter one or both tracks. Selected reports will be published in the conference proceedings after review.


Important dates:

  • Attendance registration deadline: 11th May, extended to 1st June 2012
  • Result/report submission deadline: 18th May, extended to 22nd June 2012
  • Winners notified: 13th July, extended to 3rd August 2012
  • Report camera-ready due: 27th July, extended to 31st August 2012


2. Submission guideline

Attendees may attend one or both tracks. Two separate submissions should be sent if both tracks are attended. Each submission should contain two parts:

1) Results:

Results should be submitted to wise2012challenge@gmail.com by 18th May, 2012 following the specification provided in task description.

2) Report:

Reports should be submitted via the WISE 2012 submission system. Attendees should register their submission by 11th May, 2012, and submit the final report by 18th May, 2012. The report should follow the WISE 2012 research paper format requirements, describe in detail how the attendees completed the challenge tasks, and summarize the results.



3. The dataset

The original data was crawled from Sina Weibo (http://weibo.com), a popular micro-blogging service in China, via the API provided. The dataset distributed in WISE 2012 Challenge is preprocessed as follows:

1) User IDs and message IDs are anonymized.

2) The content of tweets is removed, in accordance with Sina Weibo's Terms of Service.

3) Some tweets are annotated with events. For each event, the terms used to identify it and a link to a Wikipedia (http://wikipedia.org) page describing it are given. The event information is provided in the file events.txt.


The dataset to be used in both tracks contains two sets of files:

1) Tweets: basic information about tweets (time, user ID, message ID, etc.), mentions (user IDs appearing in tweets), re-tweet paths, and whether each tweet contains links.

2) Followship network: It includes the following network of users (based on user IDs).

In addition, a small testing dataset to be used in the mining track is provided. It contains one file, which shares the same format as the tweets file introduced above. It gives a small part of the re-tweeting activities of thirty-three tweets belonging to six events.

It should be noted that the dataset is not complete; it is only a sample of the whole data in the micro-blogging service.

The details of dataset format are given in Appendix 1: Data format.



4. The performance track (T1)

Attendees are required to build a system for evaluating queries over the dataset. Nineteen typical queries should be covered, and the corresponding interfaces in the BSMA performance testing tool should be implemented. The goal is to achieve low response time and high throughput as reported by the BSMA performance testing tool.

Result submission specification:

1) Results should be submitted via email to wise2012challenge@gmail.com

2) The email title should be: [T1] xxx Part:y/z, where 'xxx' is the paper ID assigned by the paper submission system at registration, 'z' is the total number of emails in the submission, and 'y' is the sequential number of this email.

3) All results should be submitted as attachments to the email. Each email should contain exactly one attachment, no larger than 20MB.

4) The attachment should be in tar.gz or zip format.

5) The attachment should contain, in the root directory of the compressed package, all 1344 result files generated by the performance testing tool, unmodified (including the file names).

The typical queries are introduced in Appendix 2: T1: Queries.

The BSMA performance testing tool manual is given in Appendix 3: T1: BSMA performance testing tool manual.



5. The mining track (T2)

In T2, attendees are required to predict the re-tweeting activities of thirty-three tweets belonging to six events. For each of these six events, only tweets (and re-tweets) published before a given timestamp are included in the Tweets file. The thirty-three tweets are given in the Tests file; for each of them, the event it belongs to is given. As in Tweets, only re-tweeting information before the timestamp is provided. Attendees are required to predict two measurements at the time 30 days after the original tweet is published. These two measurements are:

1) M1: The number of times the original tweet is re-tweeted. If a user re-tweets (also called re-posts or forwards) a tweet twice at different timestamps, it is counted twice.

2) M2: The number of possible views of the original tweet. The number of possible views of one re-tweet action is defined as the number of followers of the user who performs the re-tweet. The number of possible views of a tweet is defined as the sum of the possible-view numbers of all of its re-tweet actions.

It should be noted that all re-tweeting actions in a re-tweeting chain are counted toward the root of the chain.
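Under the definitions above, M1 and M2 for a root tweet can be computed directly from its full re-tweet chain. A minimal Python sketch, assuming re-tweet actions are given as (user ID, timestamp) pairs and follower counts as a dict (both layouts are illustrative, not the official data format):

```python
def m1_m2(retweet_actions, follower_count):
    """Compute M1 and M2 for one original tweet.

    retweet_actions: list of (uid, timestamp) pairs, one per re-tweet action
        anywhere in the re-tweet chain (all actions count toward the root).
        A user re-tweeting twice contributes two actions, per the task rules.
    follower_count: dict mapping uid -> number of followers of that user.
    """
    m1 = len(retweet_actions)  # M1: total number of re-tweet actions
    m2 = sum(follower_count.get(uid, 0) for uid, _ in retweet_actions)  # M2: possible views
    return m1, m2

# User 7 re-tweets twice; each action counts toward M1, and each
# contributes that user's follower count (50) to M2.
actions = [(7, 100), (7, 200), (9, 150)]
followers = {7: 50, 9: 30}
print(m1_m2(actions, followers))  # (3, 130)
```

The prediction task, of course, is to estimate these two values 30 days out from only the pre-timestamp data; this sketch only fixes what the ground-truth measurements mean.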

Result submission specification:

1) Results should be submitted via email to wise2012challenge@gmail.com

2) The email title should be: [T2] xxx Part:y/z, where 'xxx' is the paper ID assigned by the paper submission system at registration, 'z' is the total number of emails in the submission, and 'y' is the sequential number of this email.

3) All results should be submitted as attachments to the email. Each email should contain exactly one attachment, no larger than 20MB.

4) The attachment should be a plain text file with thirty-three rows, each containing three fields: the message ID of the original tweet, the predicted M1 value, and the predicted M2 value.
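The thirty-three prediction rows can be emitted with a few lines of Python. The single-space separator below is an assumption, since the notice does not fix a delimiter:

```python
def write_t2_results(path, predictions):
    """Write one row per original tweet: message ID, predicted M1, predicted M2.

    predictions: list of (mid, m1, m2) tuples; fields are separated by a
    single space (delimiter assumed, not specified by the organizers).
    """
    with open(path, "w") as f:
        for mid, m1, m2 in predictions:
            f.write(f"{mid} {m1} {m2}\n")

# Hypothetical example values, for illustration only.
write_t2_results("t2_results.txt", [("abc123", 42, 1700)])
```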



6. Downloads

The dataset and documents can be downloaded from several sites. The datasets on the different sites are identical, though they may be compressed in different forms to fit the requirements of the different storage services.


Site A: Wuala.com

Appendix 1: Data format
A1.txt

Appendix 2: T1: Queries
A2.pdf

Appendix 3: T1: BSMA performance testing tool manual
A3.pdf


Tweets: in twelve compressed files (Please note that these files are quite large, and may take quite a long time to download.):

file name size (bytes) md5 checksum
finalmicroblogs.zip.001 1038090240 92E7D35F90EA8B2D2C142B0F7C214C09
finalmicroblogs.zip.002 1038090240 35C688228B0929A961D4DB510936ABAB
finalmicroblogs.zip.003 1038090240 033A8E30E8B05CB086679F64B3B43B00
finalmicroblogs.zip.004 1038090240 FE153B0786341A8059D3DCE2601CA2E1
finalmicroblogs.zip.005 1038090240 F823EE2C2B9C0FF2375E613B177A583D
finalmicroblogs.zip.006 1038090240 8826C942344E468F2997E467624D407D
finalmicroblogs.zip.007 1038090240 41DB57B998230435931BFA315F54E711
finalmicroblogs.zip.008 1038090240 396995C04412EFC8DD3B0469045F8C58
finalmicroblogs.zip.009 1038090240 6BDAB3F60C99349E355C4A6D62AD6D83
finalmicroblogs.zip.010 1038090240 DF6B13AB8F3A6E0BC372AEA104F587AE
finalmicroblogs.zip.011 1038090240 80836DD1636B5D12C53EC803CE8E2C25
finalmicroblogs.zip.012 1026428116 34A90C8B4FD796CDFD35862E278BD090
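After downloading, each part can be checked against the checksums in the table above before joining. A minimal verification sketch in Python:

```python
import hashlib

def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading in chunks so a
    ~1 GB part is never loaded into memory at once."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest().upper()

# Compare against the published checksum for a downloaded part, e.g.:
# assert md5_of("finalmicroblogs.zip.002") == "35C688228B0929A961D4DB510936ABAB"
```

Once all parts verify, the `.zip.001` ... `.zip.012` pieces can presumably be concatenated in order into a single archive before extraction (e.g. `cat finalmicroblogs.zip.0* > finalmicroblogs.zip`); the exact join method is an assumption about how the archive was split.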



Followships: in three compressed zip files (Please note that the files are quite large and may take quite a long time to download):

file name size (bytes) md5 checksum
socialnetwork.zip.001 1038090240 789A5C4D182766ED42241B569AFD60FD
socialnetwork.zip.002 1038090240 149399C4CC17A4A9E2866183D93B24CC
socialnetwork.zip.003 1024559892 D427F3BB268AA6552BDF34918FEEBA19


Events:

events.txt


Testing:

eventForTest.zip


BSMA performance testing tool

BSMA.zip




Site B: The University of Queensland http://itee.uq.edu.au/~dke/WISE2012.htm

The files are exactly the same as those in http://www.wuala.com/imc_ecnu/wise_challenge/




Site C: 115 (Suggested for users with IP in China)

Appendix 1: Data format
A1.txt http://115.com/file/anncpz2p

Appendix 2: T1: Queries
A2.pdf http://115.com/file/dppkahwp

Appendix 3: T1: BSMA performance testing tool manual
A3.pdf http://115.com/file/e77vk2oy


Tweets: in twelve compressed files (Please note that these files are quite large, and may take quite a long time to download.):

file name size (bytes) md5 checksum link
finalmicroblogs.zip.001 1038090240 92E7D35F90EA8B2D2C142B0F7C214C09 http://115.com/file/dpngbci8#
finalmicroblogs.zip.002 1038090240 35C688228B0929A961D4DB510936ABAB http://115.com/file/c2l7k11e#
finalmicroblogs.zip.003 1038090240 033A8E30E8B05CB086679F64B3B43B00 http://115.com/file/anq6h72n#
finalmicroblogs.zip.004 1038090240 FE153B0786341A8059D3DCE2601CA2E1 http://115.com/file/behq8xax#
finalmicroblogs.zip.005 1038090240 F823EE2C2B9C0FF2375E613B177A583D http://115.com/file/behux6o0#
finalmicroblogs.zip.006 1038090240 8826C942344E468F2997E467624D407D http://115.com/file/e76mzkso#
finalmicroblogs.zip.007 1038090240 41DB57B998230435931BFA315F54E711 http://115.com/file/behqrnpz#
finalmicroblogs.zip.008 1038090240 396995C04412EFC8DD3B0469045F8C58 http://115.com/file/behuxt9k#
finalmicroblogs.zip.009 1038090240 6BDAB3F60C99349E355C4A6D62AD6D83 http://115.com/file/behusjh0#
finalmicroblogs.zip.010 1038090240 DF6B13AB8F3A6E0BC372AEA104F587AE http://115.com/file/e76mlnoy#
finalmicroblogs.zip.011 1038090240 80836DD1636B5D12C53EC803CE8E2C25 http://115.com/file/e76dkt67#
finalmicroblogs.zip.012 1026428116 34A90C8B4FD796CDFD35862E278BD090 http://115.com/file/e76dkp1c#



Followships: in three compressed zip files (Please note that the files are quite large and may take quite a long time to download):

file name size (bytes) md5 checksum link
socialnetwork.zip.001 1038090240 789A5C4D182766ED42241B569AFD60FD http://115.com/file/dp599vra#
socialnetwork.zip.002 1038090240 149399C4CC17A4A9E2866183D93B24CC http://115.com/file/e7lrxrkc#
socialnetwork.zip.003 1024559892 D427F3BB268AA6552BDF34918FEEBA19 http://115.com/file/bes0vi9x#


Events:

events.txt http://115.com/file/beem15q0


Testing:

eventForTest.zip http://115.com/file/ans4nu5c


BSMA performance testing tool

BSMA.zip http://115.com/file/dppk9ydp


Site D: yun.io

(Please note that the tweets and following-network files on this server are compressed in different forms from those on the other sites.)

Appendix 1: Data format
A1.txt http://d.yun.io/pECCiy

Appendix 2: T1: Queries
A2.pdf http://d.yun.io/0DKzSp

Appendix 3: T1: BSMA performance testing tool manual
A3.pdf http://d.yun.io/gO71Nh


Tweets: in seven compressed files (Please note that these files are quite large, and may take quite a long time to download.):

file name size (bytes) md5 checksum link
finalmicroblogs.z01 2147483648 5CB2A0FFB857CD5A5F6AFBDA63EFE496 http://d.yun.io/DEmikA
finalmicroblogs.z02 2147483648 B529EC8B46A2BC18ABAB5C2791A65631 http://d.yun.io/SpYhYz
finalmicroblogs.z03 2147483648 B6CEBD96A0F61C691DDB1CFFCC37F37E http://d.yun.io/u8FWbs
finalmicroblogs.z04 2147483648 5BA1D9CEB36402F8A95BD6E04BE18185 http://d.yun.io/QPtaMt
finalmicroblogs.z05 2147483648 9DEDB0CFD01D81B972966FDC765A962A http://d.yun.io/dILlWf
finalmicroblogs.z06 2147483648 9E210BDD83AC18911016D95E178391FE http://d.yun.io/3DSTZB
finalmicroblogs.zip 803261976 203C4765A7F390DE57888EE1C76E69B7 http://d.yun.io/K3bIpo



Followships: in one compressed zip file (Please note that the file is quite large and may take quite a long time to download):

file name size (bytes) md5 checksum link
socialnetwork.zip 3433604665 0EFA7F06628DF275F347570FD17BD131 http://d.yun.io/LVfK9x


Events:

events.txt http://d.yun.io/qEomuu


Testing:

eventForTest.zip http://d.yun.io/HtmeAu


BSMA performance testing tool

BSMA.zip http://d.yun.io/nuo6za

7. Contact

wise2012challenge@gmail.com