Università degli Studi dell'Insubria

Dipartimento di Scienze Teoriche e Applicate - DiSTA

Twitter DIFF dataset. Friends of users in 2009 and 2013


                        2009          2013
Friendships       26 397 373    78 219 353
Unique friends     4 246 174    15 587 662


We are sharing a dataset of two Twitter snapshots for a set of 160 489 users. The dataset contains two data files: graph2009.txt and graph2013.txt. The graph2009 data is extracted from a dataset by Kwak et al.; the graph2013 data was crawled by us in February 2013. Both graph files contain friend information for the same set of users.

Both graphs contain only users who had fewer than 5000 friends in 2013. We used this threshold to focus on personal accounts and, as far as possible, to filter out bot and spam accounts that follow a large number of users. Before filtering, 7.5% of the Twitter accounts in our dataset had more than 5000 friends.

Each row contains a user x and a list of his/her friends (i.e., the Twitter users followed by x). The row structure is userId\tfrId1,frId2,frId3,… (the user ID, a tab character, then the comma-separated friend IDs).


An example:

6018932    78569316,214680621,17805281,35758259,8161232,20015311,20978574


In this row, user 6018932 follows 78569316, 214680621, …, and 20978574.
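As an illustration of the row format, here is a minimal Python sketch of a row parser. The function name parse_row is hypothetical and not part of the dataset. The sketch assumes that the user ID is separated from the friend list by whitespace (a tab or spaces) and that the friend IDs are separated by commas, as in the example row above.

```python
def parse_row(line):
    """Split one row into (user_id, [friend_ids]).

    Assumption: the user ID and the friend list are separated by
    whitespace (tab or spaces); friend IDs are comma-separated.
    """
    # Split only on the first run of whitespace.
    parts = line.strip().split(None, 1)
    user_id = int(parts[0])
    if len(parts) == 1:
        # Row with a user ID but no friend list.
        return user_id, []
    friends = [int(f) for f in parts[1].split(",") if f.strip()]
    return user_id, friends


# The example row from this document.
row = "6018932    78569316,214680621,17805281,35758259,8161232,20015311,20978574"
user, friends = parse_row(row)
print(user, len(friends))
```

Splitting on the first run of whitespace, rather than on a literal tab, keeps the sketch working whether the files use a tab or spaces as the separator.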


IMPORTANT: If a friend from the 2009 list does not exist in the 2013 list, the user might have stopped following the friend, or the friend might have deleted his/her Twitter account.
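The comparison described in this note can be sketched as a set difference between a user's two friend lists. This is only an illustration: the function name friend_diff is hypothetical, the 2009 list is the example row from this document, and the 2013 list is invented for the example.

```python
def friend_diff(friends_2009, friends_2013):
    """Compare one user's friend lists from the two snapshots."""
    old, new = set(friends_2009), set(friends_2013)
    return {
        "kept": old & new,      # followed in both snapshots
        "missing": old - new,   # unfollowed, or the friend's account is gone
        "added": new - old,     # followed only in 2013
    }


# 2009 list: the example row from this document.
friends_2009 = [78569316, 214680621, 17805281, 35758259,
                8161232, 20015311, 20978574]
# 2013 list: invented for illustration only, not real data.
friends_2013 = [78569316, 17805281, 8161232, 20978574, 111111111]

diff = friend_diff(friends_2009, friends_2013)
print(sorted(diff["missing"]))
```

Note that the "missing" set cannot distinguish between the two cases above: from the data alone, an unfollowed friend and a deleted account look the same.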




The dataset was used in the “Detecting Anomalies in Social Network Data Consumption” article (under submission) by Cuneyt Gurcan Akcora, Barbara Carminati, Elena Ferrari.

Given Twitter’s new policy, we are no longer allowed to share this dataset.