Pointers to freely, publicly available data for the study of identity

Who-am-I Data Freely and Publicly Available

Created: 2023-09-18 Last modified: 2024-12-04

I study identity. Some might prefer the term self-concept. Regardless, to study how individuals express their identity or think about their selves, one needs data. The best data is language, and it comes in the form of responses to the prompt Who am I?

The best known of the Who am I? instruments is Kuhn & McPartland's Twenty Statements Test (TST). In their seminal 1954 work, Kuhn and McPartland say they are measuring "self-attitudes." The TST prompt is as plain and straightforward as one might hope:

"There are twenty numbered blanks on the page below. Please write twenty answers to the simple question 'Who am I?' in the blanks. Just give twenty different answers to this question. Answer as if you were giving the answers to yourself, not to somebody else. Write the answers in the order that they occur to you. Don't worry about logic or 'importance.' Go along fairly fast, for time is limited."

Modern Who Am I Texts

I argue that the Twitter profile bio is the modern-day equivalent of the Who-am-I instrument. The Musk takeover has lessened the utility of this data, but it does not change what happened before: for the period 2012-2022, millions of individuals in countries around the world publicly expressed and revised their identities. In my Ipseology white paper, I implore researchers to take advantage of this unprecedented decade.

Web Tools to Explore Identity Trends

Start exploring the relative popularity of words, phrases and emojis within Americans' profile bios by using Jason Jeffrey Jones Identity Trends V2. I deliberately patterned this tool after Google Search Trends and Google Ngrams, and I built it so anyone could compare a decade of data for up to 10 identity signifiers. Read more and try some of my favorite searches.

Explore multinational data using HINENI: Human Identities across Nations of the Earth, Ngram Investigator. Compare signifier prevalence over more than a decade across the nations of your choice.

Just give me some data!

You want .csv files? You can have .csv files. In this table, I link to the most up-to-date data files I have compiled. I have made these freely and publicly available under the terms of the CC BY 4.0 License.

Dataset	Description	Download	Reference
Annual Prevalence of American Twitter Users with specified Token in their Profile Bio	Incidence (raw count) and prevalence (normalized proportion) of unique US Twitter user accounts that contain each token. Tokens are mostly words, but also contain abbreviations, emojis and more. TokensAnnualCross.csv is the final, updated version covering 2012 through 2023. README for TokensAnnualCross.csv	Download TokensAnnualCross.csv	Jones, Jason Jeffrey (2021). A dataset for the study of identity at scale: Annual Prevalence of American Twitter Users with specified Token in their Profile Bio 2015–2020. PLOS ONE, 16(11), e0260185. PDF
Longitudinal 2015-2022 US Annual Prevalence Subsample	Subsample of 680,509 unique US accounts that were observed each and every year 2015 through 2022. Incidence (raw count) and prevalence (normalized proportion) of accounts that contain each token. README for TokensAnnualLongi.csv	Download TokensAnnualLongi.csv	Jones, Jason Jeffrey (2021). A dataset for the study of identity at scale: Annual Prevalence of American Twitter Users with specified Token in their Profile Bio 2015–2020. PLOS ONE, 16(11), e0260185. PDF
Free, Open, Representative Sample Self-Description Data: the Jason Jeffrey Jones Productions Who am I 2024 Dataset	From a demographically-representative sample of 611 American adults in 2024, I make available their full responses to a Who-am-I prompt plus demographics. Project webpage	jjj-pro-wai-2024.csv	Jones, Jason Jeffrey (2024). Free, Open, Representative Sample Self-Description Data: the Jason Jeffrey Jones Productions Who am I 2024 Dataset. (Preprint). PDF
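
For readers who want to go straight from a download to a comparison, here is a minimal Python sketch of how one might read a token prevalence file and compare two signifiers over time. The column names (token, year, incidence, prevalence) and the sample rows are assumptions made up for illustration; they are not real values from the dataset. Check the README for TokensAnnualCross.csv for the actual layout before adapting this.

```python
import csv
import io
from collections import defaultdict

# Hypothetical layout: the real column names are documented in the README
# for TokensAnnualCross.csv and may differ from the ones assumed here.
# The numbers below are invented placeholders, not real prevalence values.
SAMPLE_CSV = """token,year,incidence,prevalence
mother,2015,120,0.0120
mother,2016,150,0.0125
father,2015,80,0.0080
father,2016,90,0.0075
"""


def load_prevalence(handle, tokens):
    """Return {token: {year: prevalence}} for the requested tokens."""
    wanted = set(tokens)
    series = defaultdict(dict)
    for row in csv.DictReader(handle):
        if row["token"] in wanted:
            series[row["token"]][int(row["year"])] = float(row["prevalence"])
    return dict(series)


def relative_change(by_year):
    """Fractional change in prevalence from the first to the last observed year."""
    first, last = min(by_year), max(by_year)
    return (by_year[last] - by_year[first]) / by_year[first]


series = load_prevalence(io.StringIO(SAMPLE_CSV), ["mother", "father"])
for token, by_year in sorted(series.items()):
    print(token, by_year, round(relative_change(by_year), 4))
```

To run this against a downloaded file, one would replace io.StringIO(SAMPLE_CSV) with something like open("TokensAnnualCross.csv", newline="", encoding="utf-8") and adjust the column names to match the README. I use prevalence rather than incidence for the comparison because prevalence is already normalized, so changes in the number of accounts observed each year do not distort the trend.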

Ipseology - Read and explore more

If this post intrigued you, check out more ipseology:

Read a peer-reviewed, open-access research article about political words: Using Twitter Bios to Measure Changes in Self-Identity: Are Americans Defining Themselves More Politically Over Time?
Read a peer-reviewed, open-access research article about pronoun use: Pronoun Lists in Profile Bios Display Increased Prevalence, Systematic Co-Presence with Other Keywords and Network Tie Clustering among US Twitter Users 2015-2022
Quickly get familiar with ipseology terminology: Ipseology Glossary