Introducing Jason Jeffrey Jones Identity Trends V2

Tables and examples describing the data behind the interface.

2023-04-06

Which words do Americans choose to describe themselves? The latest version of Jason Jeffrey Jones Identity Trends is ready to tell you. There are several feature improvements over V1:

Search phrases up to 5-grams!
- Here for a good time
Search emojis!
- ❤️, 💙, 💚, 💜, 🧡, 💛, 🤎
Longer temporal span. Now 2012-2023!
- Decreasing follows
- Increasing preferred pronouns

Of course, you can still search for good old-fashioned single tokens: marvel at the trajectories of vegan, vegetarian and carnivore or mom, dad, mother and father.

If you're craving details, check out the tables below or read the original peer-reviewed open-access research article. If you're ready to query your own words/phrases/emojis, start here.

How many years, users, words and phrases are in the data?

Year	Unique user count	Unique unigram count	Unique bigram count	Unique 3-gram count	Unique 4-gram count	Unique 5-gram count
2012	9,947,225	14,310	30,517	15,175	5,070	1,916
2013	11,395,106	13,279	30,134	15,178	4,812	1,728
2014	8,891,764	12,987	28,268	11,999	3,368	1,098
2015	8,564,955	13,200	27,696	11,096	3,014	990
2016	10,227,688	12,891	25,712	9,927	2,636	896
2017	10,638,679	13,012	24,682	9,335	2,484	893
2018	10,310,854	13,016	24,087	9,087	2,379	860
2019	9,817,008	13,038	23,785	8,723	2,284	810
2020	10,181,678	13,095	23,779	8,954	2,661	1,211
2021	8,170,309	13,702	24,917	8,931	2,436	912
2022	7,605,856	13,843	25,287	8,958	2,393	856
2023	3,000,501	14,312	26,968	9,458	2,314	674

Wondering what a prevalence is? Use this reference.

A prevalence is a whole number that tells you how many users per 10,000 include a word, phrase or emoji within their bio. In ipseology, it is the preferred measure because it allows for easy comparison across time and place.
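As a rough sketch of the arithmetic, the Python below computes a prevalence from a count of matching users and a total user count. The function name `prevalence`, the choice to round to the nearest whole number, and the example count of 6,085 matching users are assumptions made for illustration; only the 7,605,856 total comes from the 2022 row of the table above.

```python
def prevalence(users_with_term: int, total_users: int, per: int = 10_000) -> int:
    """Users per `per` whose bio includes the term, as a whole number.

    Rounding to the nearest integer is an assumption; the post only says
    that a prevalence is a whole number.
    """
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    return round(users_with_term * per / total_users)


# 7,605,856 is the 2022 unique user count from the table.
# 6,085 is a hypothetical number of users whose bio includes some term.
example = prevalence(6_085, 7_605_856)
print(example)
```

Because the result is scaled by the number of users observed in each year, values from a small year and a large year can be compared directly.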

The prevalence distribution of JJJITV2 has a large head, while most of what you are probably interested in is in the long tail. I say the head of the distribution is large, because more than 50% of the words, phrases and emojis that make it into the data just barely surpass the 1 per 10,000 minimum criterion. Within the 2022 data, the 1st quartile and median prevalence values are 1. The mean prevalence is 3.9, while the third quartile value still only reaches 2!
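To see how a distribution can have a mean above its third quartile, here is a small sketch using Python's standard `statistics` module. The ten values in `toy_prevalences` are invented for illustration and are not drawn from JJJITV2; the `inclusive` quantile method is also just one reasonable choice.

```python
import statistics

# Invented toy values: mostly 1s, with a couple of large values in the tail.
toy_prevalences = [1, 1, 1, 1, 1, 1, 2, 2, 8, 30]

# quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
q1, median, q3 = statistics.quantiles(toy_prevalences, n=4, method="inclusive")
mean = statistics.mean(toy_prevalences)

print(f"Q1={q1}, median={median}, Q3={q3}, mean={mean}")
```

A handful of high-prevalence terms pulls the mean up, while the quartiles stay pinned near the minimum.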

Terms in the tail have more variance. The table below shows a few examples from 2022 starting at the 81st percentile.

Ngram examples	Prevalence	Percentile
mirror, my children, nft artist, overwatch, usaf vet, you need to know, ♏, 🇺🇸🇺🇸🇺🇸, 🌊🌊🌊	2	81st
cavs, climate change, freelance writer, milwaukee, scifi, school teacher, student athlete, truth seeker, views are mine, 📌, 🙏🏼	3	86th
baptist, cowboy, millennial, nft enthusiast, pro - life, tattoos, taylor swift, traditional, your dreams, 🦄, 🌵, 💀	4	90th
aquarius, believer in, latina, librarian, proud father, punk, usmc, yoga, 😘, ⚽️, 🌴	8	95th
beer, dogs, jesus, lover of, nerd, photographer, she / they, trump, vet, woman, !!!, 🏳️‍🌈, 💕	30 or more	99th

But I just want the data.

No problem, download all of the ngram prevalence data for US Twitter users 2012-2023 to serve your own analyses and visualizations.
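If you want to load the data in Python, something like the sketch below could be a starting point. The file layout is an assumption: the post does not describe the download format, so the tab-separated columns `year`, `ngram` and `prevalence` are hypothetical and will likely need adjusting. The four sample rows reuse 2022 values from the examples table above.

```python
import csv
import io

# Hypothetical layout: tab-separated with year, ngram and prevalence columns.
# The rows reuse 2022 values from the examples table in this post.
SAMPLE = (
    "year\tngram\tprevalence\n"
    "2022\tmirror\t2\n"
    "2022\tscifi\t3\n"
    "2022\tbaptist\t4\n"
    "2022\tyoga\t8\n"
)


def load_prevalence(handle):
    """Build a {(year, ngram): prevalence} lookup from a tab-separated file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        (int(row["year"]), row["ngram"]): int(row["prevalence"])
        for row in reader
    }


# Swap io.StringIO(SAMPLE) for open(path, encoding="utf-8", newline="")
# once you have the real file and know its actual column names.
lookup = load_prevalence(io.StringIO(SAMPLE))
print(lookup[(2022, "yoga")])
```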

Ipseology - Read and explore more

If this post intrigued you, check out more ipseology:

Read the ipseology white paper.
Explore the ipseology glossary.