About me and this site
Hi. I'm Jason Jeffrey Jones - a computational social scientist - and I study how individuals describe themselves online.
Wait, that's what I'm doing right now. The describing myself part, not the studying part. Actually, while you're reading this I probably am doing the studying part. Let's start over.
You have stumbled upon Jason Jeffrey Jones Identity Trends V2. It is an online tool anyone can use to explore the language Americans choose to describe themselves. For example, vegan has become more popular, while vegetarian has become less popular. Political words like conservative and liberal have increased in prevalence.
Yes, there was a Jason Jeffrey Jones Identity Trends V1. V2 is like that, but even better.
The data comes from Twitter. Specifically, a 1% random sample of all tweets from 2012 through 2023. Each year, I observe about 10 million unique accounts. These are active (tweeting) accounts with a US location. Then, for every word, phrase and emoji, I calculate how many users-per-10,000 include it in their profile bio.
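To make the users-per-10,000 idea concrete, here is a minimal sketch in Python. It is not the actual pipeline behind this site: the function names, the lowercasing, and the whitespace tokenization are all assumptions made for illustration (the research article describes the real method). The one point it is meant to show is that each user counts at most once per word or phrase, no matter how often they repeat it in their bio.

```python
from collections import Counter


def ngrams(tokens, max_n):
    """Yield every run of 1..max_n consecutive tokens as a space-joined string."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def prevalence_per_10k(bios, max_n=2):
    """Estimate users-per-10,000 for each ngram.

    bios: a list with one bio string per user (hypothetical input format).
    Tokenization is a plain lowercase whitespace split, which is an
    assumption for this sketch and not the site's real tokenizer.
    """
    user_counts = Counter()
    for bio in bios:
        tokens = bio.lower().split()
        # set() ensures a user is counted once per ngram, even if repeated
        user_counts.update(set(ngrams(tokens, max_n)))
    n_users = len(bios)
    return {gram: 10_000 * count / n_users for gram, count in user_counts.items()}


# Four made-up bios, for illustration only
toy_bios = ["vegan runner", "Vegan vegan dad", "dad", "proud dad"]
result = prevalence_per_10k(toy_bios)
print(result["vegan"], result["dad"], result["proud dad"])
```

With these four toy bios, two of four users mention vegan, so the estimate is 5,000 per 10,000. Real prevalences are far smaller because the denominator is millions of users.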
Because I measure word prevalence consistently, persistently and precisely, it is easy to observe changing trends in the content of bios. Check out how the red heart gave way to other heart emojis over time. Check out flag emojis. Remember myspace?
I call the study of human identity using large datasets and computational methods ipseology. Sometimes I publish peer-reviewed research articles on ipseology. A couple of times I wrote op-eds. Other times I blog about ipseology.
In general, I love to talk and think about why people do the silly things they do. Ask me a question on the socials, and I might answer. For now, explore and have fun with Jason Jeffrey Jones Identity Trends V2.
Data and Methods
You are welcome to use the data I compiled for this site.
Find methodological details in the research article: A dataset for the study of identity at scale: Annual Prevalence of American Twitter Users with specified Token in their Profile Bio 2015–2020.
Download the data as jjjitv2.csv from https://osf.io/download/z7b8j.
The file jjjitv2.csv contains signifier prevalence data from United States users for the years 2012-2023. The prevalence estimates are based on cross-sectional, annual samples of tweeting users.
Here is a brief explanation of each column:
- ngram - A signifier consisting of one to five linguistic tokens (e.g. words, emojis, abbreviations) that was observed in many Twitter users' profile bios. Only ngrams that rise above a threshold of 1 per 10,000 users are included.
- obsYear - The year over which we have observed profiles. Profiles were observed from tweeting users. One profile bio (selected at random) was retained per user per year.
- prevalence - The whole number of users, per 10,000 unique US users, that one would expect to include ngram in their bio.
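
If you want to pull one signifier's trend out of the file, something like the following Python sketch should work. It assumes jjjitv2.csv is a comma-delimited file with a header row naming the three columns above; the helper name trend is hypothetical, and the sample rows are made up so the example runs without the download.

```python
import csv
import io


def trend(csv_file, ngram):
    """Return {obsYear: prevalence} for one ngram, ordered by year.

    csv_file: an open text file (or file-like object) whose header row
    contains the columns ngram, obsYear and prevalence (assumed layout).
    """
    reader = csv.DictReader(csv_file)
    by_year = {
        # int(float(...)) tolerates values written as "5" or "5.0"
        int(row["obsYear"]): int(float(row["prevalence"]))
        for row in reader
        if row["ngram"] == ngram
    }
    return dict(sorted(by_year.items()))


# Made-up rows for illustration only; these are NOT real values from jjjitv2.csv
sample = io.StringIO(
    "ngram,obsYear,prevalence\n"
    "vegan,2013,5\n"
    "vegan,2012,4\n"
    "dad,2012,30\n"
)
print(trend(sample, "vegan"))
```

For the real data, replace the sample with the downloaded file, e.g. `with open("jjjitv2.csv", newline="", encoding="utf-8") as f: print(trend(f, "vegan"))`.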