Tables and examples describing the data behind the interface.
2023-04-06
Which words do Americans choose to describe themselves? The latest version of Jason Jeffrey Jones Identity Trends is ready to tell you. There are several feature improvements over V1:
Of course, you can still search for good old-fashioned single tokens: marvel at the trajectories of vegan, vegetarian and carnivore, or mom, dad, mother and father.
If you're craving details, check out the tables below or read the original peer-reviewed open-access research article. If you're ready to query your own words/phrases/emojis, start here.
Year | Unique user count | Unique unigram count | Unique bigram count | Unique 3-gram count | Unique 4-gram count | Unique 5-gram count |
---|---|---|---|---|---|---|
2012 | 9,947,225 | 14,310 | 30,517 | 15,175 | 5,070 | 1,916 |
2013 | 11,395,106 | 13,279 | 30,134 | 15,178 | 4,812 | 1,728 |
2014 | 8,891,764 | 12,987 | 28,268 | 11,999 | 3,368 | 1,098 |
2015 | 8,564,955 | 13,200 | 27,696 | 11,096 | 3,014 | 990 |
2016 | 10,227,688 | 12,891 | 25,712 | 9,927 | 2,636 | 896 |
2017 | 10,638,679 | 13,012 | 24,682 | 9,335 | 2,484 | 893 |
2018 | 10,310,854 | 13,016 | 24,087 | 9,087 | 2,379 | 860 |
2019 | 9,817,008 | 13,038 | 23,785 | 8,723 | 2,284 | 810 |
2020 | 10,181,678 | 13,095 | 23,779 | 8,954 | 2,661 | 1,211 |
2021 | 8,170,309 | 13,702 | 24,917 | 8,931 | 2,436 | 912 |
2022 | 7,605,856 | 13,843 | 25,287 | 8,958 | 2,393 | 856 |
2023 | 3,000,501 | 14,312 | 26,968 | 9,458 | 2,314 | 674 |
A prevalence is a whole number that tells you how many users per 10,000 include a word, phrase or emoji within their bio. In ipseology it is the preferred measure, because it allows for easy comparison across time and place.
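To make the definition concrete, here is a minimal sketch of how a prevalence could be computed from raw counts. The function name `prevalence`, the term count of 22,818 users, and the choice to round to the nearest whole number are all assumptions made for illustration; the text above only says that a prevalence is a whole number of users per 10,000. The yearly totals come from the table above.

```python
def prevalence(users_with_term: int, total_users: int) -> int:
    """Users per 10,000 whose bio contains the term, as a whole number."""
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    # Assumption: round to the nearest integer. The source only says
    # "whole number", so the real pipeline may round differently.
    return round(10_000 * users_with_term / total_users)


# Unique user counts taken from the table above.
TOTAL_2012 = 9_947_225
TOTAL_2022 = 7_605_856

# Hypothetical raw count: the same number of users mention a term in both years.
HYPOTHETICAL_USERS = 22_818

p_2012 = prevalence(HYPOTHETICAL_USERS, TOTAL_2012)
p_2022 = prevalence(HYPOTHETICAL_USERS, TOTAL_2022)
print(p_2012, p_2022)  # prints: 23 30
```

The same raw count yields different prevalences in different years because the number of observed users changes, which is why a per-10,000 rate is easier to compare across time than a raw count.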
The prevalence distribution of JJJITV2 (Jason Jeffrey Jones Identity Trends V2) has a large head, while most of what you are probably interested in is in the long tail. I say the head of the distribution is large because more than 50% of the words, phrases and emojis that make it into the data just barely surpass the minimum criterion of 1 per 10,000. Within the 2022 data, the first quartile and median prevalence values are both 1. The mean prevalence is 3.9, while the third quartile value still only reaches 2!
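A small sketch can show how a distribution ends up with a median of 1 and a noticeably larger mean. The 100 values below are made up for illustration and are not the real 2022 data; they only mimic the shape described here, with most terms sitting at the minimum and a few high-prevalence terms pulling the mean upward.

```python
import statistics

# Hypothetical toy data: 100 invented prevalence values, NOT the real dataset.
toy_prevalences = (
    [1] * 55 + [2] * 26 + [3] * 5 + [4] * 4 + [8] * 5 + [30] * 4 + [100]
)

# Quartile cut points; "inclusive" treats the data as the full population.
q1, median, q3 = statistics.quantiles(toy_prevalences, n=4, method="inclusive")
mean = statistics.mean(toy_prevalences)

print(q1, median, q3, mean)  # prints: 1.0 1.0 2.0 3.98
```

In this toy data, three quarters of the terms have a prevalence of 2 or less, yet the handful of terms at 30 and above is enough to lift the mean to nearly 4.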
Terms in the tail have more variance. The table below shows a few examples from 2022 starting at the 81st percentile.
Ngram examples | Prevalence | Percentile |
---|---|---|
mirror, my children, nft artist, overwatch, usaf vet, you need to know, 🇺🇸🇺🇸🇺🇸 | 2 | 81st |
cavs, climate change, freelance writer, milwaukee, scifi, school teacher, student athlete, truth seeker, views are mine | 3 | 86th |
baptist, cowboy, millennial, nft enthusiast, pro - life, tattoos, taylor swift, traditional, your dreams | 4 | 90th |
aquarius, believer in, latina, librarian, proud father, punk, usmc, yoga, ⚽️ | 8 | 95th |
beer, dogs, jesus, lover of, nerd, photographer, she / they, trump, vet, woman, !!!, 🏳️‍🌈 | 30 or more | 99th |
To serve your own analyses and visualizations, download all of the ngram prevalence data for US Twitter users, 2012-2023.