Dan Barker | BuzzFeed is Watching You
When you visit BuzzFeed, they record lots of
information about you.
Most websites record some information. BuzzFeed
record a whole ton. I’ll start with the fairly mundane stuff,
and then move on to one example of some slightly more scary
stuff.
First: The Mundane Bits
Here’s a snapshot of what BuzzFeed records when you land on a
page. They actually record much more than this, but this
is just the info they pass to Google (stored
within Google Analytics):
Here’s a description of what’s going on there:
The first line there is how many times in total I’ve visited
the site (above this, which I’ve skipped for brevity, it also
records the time I first visited, and a timestamp of my current
visit).
Below that, the ‘Custom Var’ block is made up of elements BuzzFeed have actively decided “we need to record
this in addition to what Google Analytics gives us out of the
box”. Against these, you can see ‘scope’. A scope of ‘1’ means
it’s something recorded about the user, ‘2’ means it’s recorded
about the current visit, ‘page’ means it’s just a piece of
information about the page itself.
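To make the scope idea concrete, here is a minimal Python sketch of how such custom variables could be modelled. It is only an illustration: the variable names and values are invented, and treating ‘page’ as scope 3 is an assumption based on how classic Google Analytics numbers its custom variable scopes. None of this is BuzzFeed’s actual code.

```python
from dataclasses import dataclass

# Scope labels as described in the text. Mapping "page" to scope 3 is an
# assumption based on classic Google Analytics custom variables.
SCOPE_MEANING = {
    1: "user",
    2: "visit",
    3: "page",
}


@dataclass(frozen=True)
class CustomVar:
    name: str
    value: str
    scope: int

    def describe(self) -> str:
        # Spell out what the scope number means for this variable.
        return f"{self.name}={self.value} (recorded about the {SCOPE_MEANING[self.scope]})"


# Hypothetical examples; these names and values are made up for illustration.
example_vars = [
    CustomVar("logged_in", "yes", 2),
    CustomVar("country", "gb", 1),
    CustomVar("page_type", "quiz", 3),
]

for var in example_vars:
    print(var.describe())
```

The point of the scope is persistence: a user-scoped value follows you across visits, while a page-scoped value describes only the page you are looking at.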
There you can see other info they’re tracking, including:
- Have you connected Facebook with BuzzFeed?
- Do you have email updates enabled?
- Do they know your gender & age?
- How many times have you shared their content directly to
Facebook & Twitter & via Email?
- Are you logged in?
- Which country are you in?
- Are you a BuzzFeed editor?
- …and about 25 other pieces of information.
Within this you can also see it records ‘username’. I
think that’s recording my user status, and an encoded
version of my username. If I log in using 2 different browsers
right now, it assigns me that same username string, but I’m
going to caveat that I’m not 100% sure they’re recording that
it is ‘me’ browsing the site (i.e. that they’re able to link the
data they’re recording in Google Analytics about my activity on
the site back to my email address and other personally
identifiable information). Either way, everything we’ve covered
so far is quite mundane.
The Scary Bit
The scary bit occurs when you think about certain
types of BuzzFeed content; most specifically: quizzes. Most
quizzes are extremely benign – the stereotypical “Which
[currently popular fictional TV show] Character Are You?” for
example. But some of their quizzes are very specific, and very
personal.
Here, for example, is a set of questions from a “How Privileged
are You?” quiz, which has had 2,057,419 views
at the time I write this. I’ve picked some of the questions
that may cause you to think “actually, I wouldn’t necessarily
want anyone recording my answers here”.
When you click any of those quiz answers, BuzzFeed record all
of the mundane information we looked at
earlier, plus they also record this:
Here’s what they’re recording there:
- ‘event’ simply means something happened that BuzzFeed chose
to record in Google Analytics.
- ‘Buzz:content’ is how they’ve categorised the type of
event.
- ‘clickab:quiz-answer’ means that the event was a quiz
answer.
- ‘ad_unit_design3:desktopcontrol’ seems to be their
definition of the design of the quiz answer that was clicked.
- ‘ol:1218987’ is the quiz ID. In other words, if they wish,
they could say “show me all the data for quiz 1218987” knowing
that’s the ‘Check Your Privilege’ quiz.
- ‘1219024’ is the actual answer I checked. Each quiz answer
on BuzzFeed has a unique ID like this. I.e. if you click “I have
never had an eating disorder” they record that click.
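As a rough sketch, the fields listed above can be labelled programmatically. The flat tuple below is an assumed representation: the text does not show how the values are actually packed into the Google Analytics request. `label_event` is a hypothetical helper written for this example.

```python
# The six fields listed above, in the order they appear in the text.
# Storing them as a flat tuple is an assumption made for illustration.
RAW_EVENT = (
    "event",
    "Buzz:content",
    "clickab:quiz-answer",
    "ad_unit_design3:desktopcontrol",
    "ol:1218987",
    "1219024",
)


def label_event(fields):
    """Attach a readable name to each field of a quiz-answer event."""
    hit_type, category, action, design, quiz_ref, answer_id = fields
    return {
        "hit_type": hit_type,
        "category": category,
        "action": action,
        "design": design,
        # "ol:1218987" -> "1218987": keep only the part after the colon.
        "quiz_id": quiz_ref.split(":", 1)[1],
        "answer_id": answer_id,
    }


labelled = label_event(RAW_EVENT)
print(labelled["quiz_id"], labelled["answer_id"])
```

Once each click is reduced to a quiz ID and an answer ID like this, it becomes straightforward to filter and count.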
In other words, if I had access to the BuzzFeed Google
Analytics data, I could query data for people who got to the
end of the quiz & indicated – by not checking that
particular answer – that they have had an eating disorder. Or
that they have tried to change their gender. Or I could run a
query along the following lines if I wished:
- Show me all the data for anyone who answered the “Check
Your Privilege” quiz but did not check “I have
never taken medication for my mental health”.
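A query like that is not hard to write. Here is a minimal Python sketch over a hypothetical export of click records. The quiz ID and the first answer ID are the ones quoted earlier; the visitor labels, the second answer ID and the second quiz ID are invented for illustration, and the record layout is an assumption, not the real Google Analytics export format.

```python
QUIZ_ID = "1218987"    # quiz ID quoted in the text
ANSWER_ID = "1219024"  # answer ID quoted in the text

# Hypothetical export: one row per recorded quiz-answer click.
# Visitor labels, "1219030" and "9999999" are made up for this example.
clicks = [
    {"visitor": "A", "quiz_id": "1218987", "answer_id": "1219024"},
    {"visitor": "A", "quiz_id": "1218987", "answer_id": "1219030"},
    {"visitor": "B", "quiz_id": "1218987", "answer_id": "1219030"},
    {"visitor": "C", "quiz_id": "9999999", "answer_id": "1219024"},
]


def visitors_who_skipped(rows, quiz_id, answer_id):
    """Visitors who clicked something in the quiz but never this answer."""
    took_quiz = {r["visitor"] for r in rows if r["quiz_id"] == quiz_id}
    checked = {
        r["visitor"]
        for r in rows
        if r["quiz_id"] == quiz_id and r["answer_id"] == answer_id
    }
    return took_quiz - checked


skipped = visitors_who_skipped(clicks, QUIZ_ID, ANSWER_ID)
print(sorted(skipped))
```

This sketch does not check whether a visitor reached the end of the quiz, so a real query would need an extra condition for that. The set difference is the important part: what you leave unchecked is as visible as what you check.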
In BuzzFeed’s defense, I’m sure when they set up the tracking
in the first place they didn’t foresee that they’d be recording
data from quizzes of this personal depth. This is
just a single example, but I suspect this particular quiz would
have had fewer than 2 million views if everyone completing it
realised every click was being recorded & could potentially
be reported on later – whether that data is fully identifiable
back to individual users, or pseudonymous, or even totally
anonymous.
What do you think?