Who actually has 600 buddies?

Cheers to Julia for pointing this paper out to me: Planetary-Scale Views on Instant-Messenger Network. It’s nothing exotic, but what a data set: they had access to the complete MS IM dataset, modulo individual identifiers. They think they got half of all the conversations during the study: 30 billion conversations and 240 million distinct users, unless you assume a significant number of multiple accounts and robots, which I don’t think was addressed.

They get lots of broken power laws (yawn), and estimate the average degree of separation is 6.6 (double yawn), or that people in their 20s and 30s are overrepresented with respect to the world population (ZZzzz…). There is cool stuff though, like inter-gender conversations lasting longer on average than single-gender ones, and there are some weird off-diagonal nodes in the reported age correlations of participants. There’s a cool map of users per capita of the world, and you see a significant asymmetry in the US, with more users per capita in the western half than in the east, with about the same density as in Australia. This is a clear anticorrelation with population density, but it looks cool. People in Arabic nations seem to have significantly long conversations on IM, why this is is not immediately obvious to me.

There are some weird extremities in the dataset: the tail of the AddBuddy events distribution shows that some people actually have 600 contacts (the maximum) on IM. Wild. This smells like robots to me.
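
For anyone who wants to poke at this kind of tail themselves, here is a minimal sketch of how one might count the accounts pinned at the contact cap. The counts below are made-up toy numbers, not data from the paper, and the helper names (`degree_histogram`, `fraction_at_cap`) are my own invention for illustration. Only the cap of 600 comes from the discussion above.

```python
from collections import Counter

MAX_CONTACTS = 600  # the contact cap mentioned in the post


def degree_histogram(contact_counts):
    """Map each contact-list size to the number of accounts with that size."""
    return Counter(contact_counts)


def fraction_at_cap(contact_counts, cap=MAX_CONTACTS):
    """Fraction of accounts whose contact list sits at (or above) the cap."""
    if not contact_counts:
        return 0.0
    at_cap = sum(1 for c in contact_counts if c >= cap)
    return at_cap / len(contact_counts)


# Toy, invented contact counts (NOT from the paper), one entry per account.
toy_counts = [3, 12, 12, 40, 75, 600, 600, 8, 150, 21]

hist = degree_histogram(toy_counts)
share = fraction_at_cap(toy_counts)
print("accounts at the cap:", hist[MAX_CONTACTS])
print("share of accounts at the cap:", share)
```

A pile-up at exactly the cap, well above what a smooth tail would suggest, is the sort of thing that would make me suspect robots. On its own it proves nothing, of course.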


7 responses to “Who actually has 600 buddies?”

  1. Re: Arabic countries.

    According to Marketplace last week, young Arabs living in countries such as Saudi Arabia, where young people are largely segregated from people of the opposite gender for religious reasons, use internet messaging as a way to “date”. That’d be my guess for the longer conversations there: they don’t really have any other outlet for talking to members of the opposite sex.

  2. People in Arabic nations seem to have significantly long conversations on IM, why this is is not immediately obvious to me

    I recently heard an NPR story on Saudi Arabia that claimed phone texting, IM, etc. are a substitute for dating there, since unmarried women are not permitted in the company of men…

  3. Hi Lee and Simplicio. I oversimplified the statement a bit. Really, fig 14b says that conversations between people in different Arabic nations are significantly longer. It’s not clear to me that this says as much about culture as it does about immigration. Look at the bigger connections for frequency and duration in fig 14a,b, i.e. Germany/Turkey, Russia/Azerbaijan, Morocco/France. This to me just says there are people living in one country with family in the other. They talk often and long, no surprise. I’m not sure how common immigration is among Arabic nations, but looking at all conversations, Saudi Arabia and its surrounding nations don’t make it into the top 10 by number of messages a day per user. So there’s nothing particularly long about their internal IM conversations. I think this weakens the religion argument a bit.

  4. It would be a hoot to use this information to get the posterior probability that the person on the other end is e.g. actually the gender they claim to be. I sense a privacy-infringing niche market…

    But really maybe my imagination is failing me, because I can’t see what good this level of data is apart from pricing/network bandwidth optimization. On the sociological level there were a lot of “yawn”s as you pointed out.

    Multiple accounts are mostly easy to filter out by IP duplication, right?

    The west/east asymmetry in the US is very strange to me. At the least it’s worth noting that southern California (?) and the Pacific NW are 1) more tech-savvy regions; 2) significantly LESS prone to use Microsoft’s IM (i.e. “east-like”). This is enough to give serious pause to making much inference from this data… Agh, I wish the image at least were higher quality.

    Regarding 600 buddies, note that a paper in medicine with 976 authors has been published (winning the 1993 IgNobel Prize in Literature): http://www.vortex.com/air/m-jir.93-01 . The concept of an edge between individuals is fairly weak today…

  5. Hey Tim,

    You’re probably right, I just meant I couldn’t find a mention of treating multiple accounts in the paper. In hindsight, people with multiple accounts probably don’t use them equally, i.e. lost passwds, evading net-stalkers, etc. This is probably not a principal source of contamination.

    You’re definitely right that in the sociological sense this data isn’t great. You’re basically cutting on a minimum income, enhancing effects for populations with many first generation immigrants, and there are clearly a lot of regional/cultural effects that need to be handled separately. I disagree with both your assumptions about the east coast. They’re just more densely populated than the non-coastal western states. If there were more types of client protocol data, I’m sure you’d see a drop in Internet communication in general in the coastal US.

    976 Authors: Yo, I’m from HEP, and here that’s normal. My author list is longer than the text in my last paper, but they sure ain’t all on my f-list. :^)

  6. So you’re saying that coastal and populated areas have more immigrants (population), who can’t afford computers & connectivity (note the coloring is per capita)? I guess maybe, although you can’t throw a rock in nyc without it passing through an unWEP’d wifi signal; also older laptops are pretty darned cheap, and immigrants are not nearly universally impoverished.

    Although I admit that my hypothesis, that the more urbane someone is the less they are likely to use Microsoft services, is rather self-centered and silly.

  7. No No No, absolutely no. I meant there are several distinct effects. When I said “enhancing effects”, I was implying the opposite. I’m saying recent immigrants probably use IM more than non-immigrants, in order to maintain communication with family and friends in other countries. I am not assuming any economic-nationality connection.

    Dense populations: less IM (easier to see people personally)
    Immigrant Communities: More IM (Family abroad)
    Low Income: Less IM (No internet at home)

    I think by far the dominant effect here seems to be population density, but a sociological analysis would have to address the other two effects.
