I study identity. Some might prefer the term self-concept. Regardless, to study how individuals express their identity or think about their selves, one needs data. The best data is language, and it comes in the form of responses to the prompt Who am I?
The best known of the Who am I? instruments is Kuhn & McPartland's Twenty Statements Test (TST). In their seminal 1954 paper, Kuhn and McPartland say they are measuring "self-attitudes." The TST prompt is as plain and straightforward as one might hope:
I argue that the Twitter profile bio is the modern-day equivalent of the Who-am-I instrument. The Musk takeover has lessened the utility of this data, but it does not change what happened before: for the period 2012-2022, millions of individuals in countries around the world publicly expressed and revised their identities. In my Ipseology white paper, I implore researchers to take advantage of this unprecedented decade.
Start exploring the relative popularity of words, phrases and emojis within Americans' profile bios by using Jason Jeffrey Jones Identity Trends V2. I deliberately patterned this tool after Google Search Trends and Google Ngrams so that anyone can compare a decade of data for up to 10 keywords. Read more and try some of my favorite searches.
You want .csv files? You can have .csv files. In this table, I link to the most up-to-date data files I have compiled. I have made these freely and publicly available under terms of the CC BY 4.0 License.
| Dataset | Description | Files | Citation |
|---|---|---|---|
| Annual Prevalence of American Twitter Users with specified Token in their Profile Bio | Incidence (raw count) and prevalence (normalized proportion) of unique US Twitter user accounts that contain each token. Tokens are mostly words, but also contain abbreviations, emojis and more. TokensAnnualCross.csv is the final, updated version covering 2012 through 2023. | README for TokensAnnualCross.csv; Download TokensAnnualCross.csv | Jones, Jason Jeffrey (2021). A dataset for the study of identity at scale: Annual Prevalence of American Twitter Users with specified Token in their Profile Bio 2015–2020. PLOS ONE, 16(11), e0260185. |
| Longitudinal 2015-2022 US Annual Prevalence Subsample | Subsample of 680,509 unique US accounts that were observed each and every year 2015 through 2022. Incidence (raw count) and prevalence (normalized proportion) of accounts that contain each token. | README for TokensAnnualLongi.csv; Download TokensAnnualLongi.csv | Jones, Jason Jeffrey (2021). A dataset for the study of identity at scale: Annual Prevalence of American Twitter Users with specified Token in their Profile Bio 2015–2020. PLOS ONE, 16(11), e0260185. |
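If you would rather script against the files than read them in a spreadsheet, here is a minimal Python sketch of how incidence relates to prevalence for a single token. The column names (`token`, `year`, `incidence`, `total_accounts`) and the sample rows are placeholders I made up for illustration. The real column layout is described in each file's README, so adjust the names before pointing this at a downloaded file.

```python
import csv
import io

# Made-up rows for illustration only. The column names are assumptions,
# not the documented layout of TokensAnnualCross.csv; check the README.
SAMPLE_CSV = """token,year,incidence,total_accounts
mother,2015,1200,100000
mother,2020,1500,120000
father,2015,900,100000
father,2020,1300,120000
"""


def prevalence_by_year(csv_text, token):
    """Return {year: prevalence} for one token.

    Prevalence is computed here as incidence divided by the number of
    accounts observed that year (an assumed normalization).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    result = {}
    for row in reader:
        if row["token"] == token:
            year = int(row["year"])
            result[year] = int(row["incidence"]) / int(row["total_accounts"])
    return result


trend = prevalence_by_year(SAMPLE_CSV, "mother")
for year in sorted(trend):
    print(year, f"{trend[year]:.4f}")
```

To try this on a downloaded file, read it with `open(path, newline="", encoding="utf-8").read()` and pass the text in. If the file already ships a prevalence column, you can read that value directly instead of recomputing it.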
If this post intrigued you, check out more ipseology: