the study of human identity
using large datasets
and computational methods

Estimation of Black Lives Matter prevalence in US users' Twitter bios at daily resolution.


I study personally expressed identity. Personally expressed identity is who or what an individual themselves says they are. It is personal – the individual is describing themselves. It is expressed – these are words the individual emits, where others might see them. And it describes identity – the explicit purpose of the text is description of the author.

It is especially informative to examine temporal trends in personally expressed identity. There are two ways to do this. The first is cross-sectional. This tells you how populations (large groups of people) are changing over time. The second is longitudinal. This tells you how individuals are amending their personally expressed identity.

Here I focus on a cross-sectional analysis of United States users of Twitter. This data can tell us how the users tweeting on each day described themselves. I wrote computer scripts to examine the 1% random sample of all tweets. The scripts did not read the text of the tweets, however. Instead, they read the profile bio of the user who posted each tweet. Within the profile bio, the scripts looked for the presence of particular words and counted how many users included each word. I converted that count into a prevalence: the number of users whose bio included the word, out of all users observed that day.
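For readers who think in code, the counting step described above can be sketched roughly as follows. This is a minimal illustration, not the scripts I actually ran: the `(day, user_id, bio)` record shape, the function name `daily_prevalence`, the term list, the case-insensitive substring matching, and the choice to count each user once per day are all assumptions made for the example.

```python
from collections import defaultdict


def daily_prevalence(records, terms):
    """Return {day: share of active users whose bio contains any term}.

    records: iterable of (day, user_id, bio) tuples, one per sampled tweet
             (hypothetical shape, chosen for this sketch).
    terms:   lowercase phrases to search for in the bio.
    """
    active = defaultdict(set)   # users seen tweeting on each day
    matched = defaultdict(set)  # users whose bio contained a term that day

    for day, user_id, bio in records:
        # Count each user once per day, however many tweets were sampled.
        active[day].add(user_id)
        text = (bio or "").lower()  # a missing bio is treated as empty
        if any(term in text for term in terms):
            matched[day].add(user_id)

    # Prevalence = matching users / all users active that day.
    return {day: len(matched[day]) / len(active[day]) for day in active}


# Toy records, invented purely to show the call.
sample = [
    ("2020-06-01", "u1", "Black Lives Matter | she/her"),
    ("2020-06-01", "u1", "Black Lives Matter | she/her"),  # same user, second tweet
    ("2020-06-01", "u2", "coffee and books"),
    ("2020-06-02", "u3", None),
    ("2020-06-02", "u4", "#BLM"),
    ("2020-06-02", "u5", "dog person"),
]
prevalence = daily_prevalence(sample, ["black lives matter", "#blm"])
```

Multiplying the resulting fractions by a constant such as 10,000 would express the same quantity as "users per 10,000", which can be easier to read on a graph.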

In the graph below, you can see the prevalence of US users with certain words in their bio out of all US users active on a particular day.

The graph above tells a clear story (in my opinion). There were two moments when use of Black Lives Matter words in users' bios dramatically increased in prevalence. These moments appear to closely follow the killings of Philando Castile on July 6, 2016 and George Floyd on May 25, 2020. After these events, prevalence steadied or declined.

I cannot make a strong claim of causality, because this data is observational, and I agree with many of my fellow scientists who believe strong causal claims require randomized, controlled experiments. Nor do I make any normative claims (i.e., whether any of the patterns I show here are "good" or "bad").

I think temporal trends in personally expressed identity are interesting, and I show these data here in case others do, too.

What about Blue Lives Matter? Or All Lives Matter? Below I show the prevalence of Blue Lives Matter and All Lives Matter terms. To my eyes, these series respond strongly to the same events.

Ipseology - Read and explore more

When I show people data like that above, they sometimes ask, "What other words are changing?" The best answer to that question is a long list of links, so here it is: