the study of human identity
using large datasets
and computational methods


A deliberately brief post summarizing the founding ideas.

Created: 2022-12-22

Ipseology is the study of human identity using large datasets and computational social science methods. It is a new way to investigate ipseity at scale. Ipseity refers to personal identity, selfhood and the essential elements of identity. At scale means with millions of observations covering substantial temporal periods and geographical areas.

One measures ipseity with language. Specifically, the data sought is personally expressed identity text. This text must satisfy all three of the following conditions. It should be personal - the authors are describing themselves. It should be expressed - the authors' text is "published," i.e. the words are available where others might see them. It should describe identity - the explicit purpose of the text is description of the author.

Currently, personally expressed identity text data is best sourced from social media profiles. Specifically, Twitter profile biographies (bios) are the best source because of the scale at which the platform is used and the open availability of public profile data. Bios may be observed at the scale of millions per day through the 1% sample stream or the users lookup endpoint.

Yes, today Twitter carries substantial platform risk. However, it was widely used in the period 2012-2022; that provides over a decade of data across multiple nations at staggered rates of growth and usage. This is useful variation for observational studies. If future events require, ipseology methods could be applied to Mastodon, Reddit or other profiles. Ideally, funding would provide the opportunity to collect data on an ongoing basis from representative samples.

Ipseology is based on the assumption that language use reveals much about the minds of authors and the collective consciousness of the societies in which they are embedded. Many have adopted this idea and applied it to the language of published books. The surprisingly broad successes of word embeddings and large language models hint that analysis of text alone (at scale) can provide deep understanding.

Ipseology reduces complexity by focusing on signifiers. Signifiers are linguistic tokens that represent something the author wishes to convey about their identity. Signifiers are frequently words, but could also be emoji, hashtags, n-grams, proper names, lemmas or other components of text. Over a dictionary of signifiers (e.g. all possible words) individuals represent themselves as subsets of that dictionary. Equivalently, one can imagine each individual as a binary vector over signifiers where most are FALSE/ABSENT and a select few are TRUE/PRESENT. This level of simplicity makes quantification straightforward. How similar are two bios? Calculate the Jaccard index or simply count signifiers in common. How popular is a signifier? Count its incidence or prevalence and see where it falls in the distribution.
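
The bag-of-signifiers idea above can be sketched in a few lines of Python. This is a minimal illustration, not the method used to produce any published ipseology results: the function names and the tokenizer are assumptions made here, and the tokenizer only picks up lowercase word tokens and hashtags (emoji, n-grams and lemmas would need more work).

```python
import re
from collections import Counter

# Hypothetical, deliberately crude tokenizer: word tokens and hashtags only.
TOKEN_RE = re.compile(r"#?\w+")

def signifiers(bio: str) -> frozenset:
    """Return the set of signifiers PRESENT in a bio; all others are ABSENT."""
    return frozenset(TOKEN_RE.findall(bio.lower()))

def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard index: size of intersection over size of union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def prevalence(bios: list) -> dict:
    """Share of bios in which each signifier appears at least once."""
    counts = Counter(tok for bio in bios for tok in signifiers(bio))
    return {tok: n / len(bios) for tok, n in counts.items()}

if __name__ == "__main__":
    a = signifiers("Mother, wife, runner")
    b = signifiers("Runner and father")
    print(sorted(a & b))   # signifiers in common
    print(jaccard(a, b))   # similarity of the two bios
    print(prevalence(["mother runner", "runner", "father"]))
```

Because each bio is reduced to a set, repeated words within one bio count once, which matches the TRUE/PRESENT versus FALSE/ABSENT framing.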

Signifiers are the "elements of identity" from the definition of ipseity above. Natural language processing progressed quite a bit by drastically reducing the complexity of language to "bag of words" representations. Ipseology should begin in the same way: individuals are "bags of signifiers" until and unless a more complex representation is necessary.

Identity changes over time. An ipseological approach foregrounds this fact. Prevalence of a signifier should not be estimated once. Instead, it should be calculated as a daily or annual time series. Replication is fantastic, but there is no need to reinvent the wheel or use bespoke data or methods. Token-level annual prevalence is already calculated for US users for all tokens. Explore identity trends online or download the CSV file.
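
As a rough sketch of what a prevalence time series means in practice, the snippet below computes, for each year, the fraction of observed bios that contain a given signifier. The input layout (pairs of year and signifier set) and the function name are assumptions made for illustration; they do not describe the format of the published CSV file.

```python
from collections import defaultdict

def annual_prevalence(observations, signifier):
    """Fraction of bios containing `signifier`, per year.

    observations: iterable of (year, set_of_signifiers) pairs, one per
    observed bio. This layout is hypothetical.
    """
    totals = defaultdict(int)  # bios observed per year
    hits = defaultdict(int)    # bios containing the signifier per year
    for year, sigs in observations:
        totals[year] += 1
        if signifier in sigs:
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}

if __name__ == "__main__":
    toy = [
        (2015, {"mother"}),
        (2015, {"runner"}),
        (2016, {"mother"}),
        (2016, {"mother", "wife"}),
    ]
    print(annual_prevalence(toy, "mother"))
```

The same function gives a daily series if the first element of each pair is a date instead of a year.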

There are two kinds of interesting changes in ipseity over time: longitudinal change within individuals and cross-sectional change over populations. Ipseology encourages analysis of both.
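
To make the longitudinal case concrete, here is a minimal sketch that compares two snapshots of the same user's bio and then tallies how many users adopted or dropped a signifier. The function names and the (before, after) pairing are hypothetical; cross-sectional change, by contrast, needs only the prevalence in each period and no linking of users.

```python
def within_user_change(before: set, after: set) -> dict:
    """Signifiers a single user added, dropped, or kept between two snapshots."""
    return {
        "added": after - before,
        "dropped": before - after,
        "kept": before & after,
    }

def adoption_counts(pairs, signifier) -> dict:
    """Count users who adopted or dropped `signifier`.

    pairs: iterable of (before, after) signifier sets for the same user.
    """
    pairs = list(pairs)  # allow generators; we iterate twice
    adopted = sum(1 for b, a in pairs if signifier not in b and signifier in a)
    dropped = sum(1 for b, a in pairs if signifier in b and signifier not in a)
    return {"adopted": adopted, "dropped": dropped}

if __name__ == "__main__":
    print(within_user_change({"student", "runner"}, {"engineer", "runner"}))
    users = [
        ({"student"}, {"engineer"}),
        ({"runner"}, {"runner", "engineer"}),
        ({"engineer"}, {"retired"}),
    ]
    print(adoption_counts(users, "engineer"))
```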

Identity varies over geography. In development is a tool to compare daily signifier prevalence across 32 countries. Emojis will be especially useful in this endeavor: their nominal content is invariant across languages and nations, while their interpretations and norms of usage likely depend on nation and language.

Ipseology is the new science of the self. The study of human identity currently rests uneasily on a fulcrum. On one side: the small-batch, traditional, one-shot methods and data-poor theorizing of the past. On the other: at-scale, high-resolution, consistent and persistent estimates that provide easy, precise answers to simple questions and point the way to reality-constrained, formalized and testable theories of ipseity. Astronomers gazed at the night sky and gleaned what knowledge they could until the telescope made unknown, unanticipated, heretical ideas plainly visible. The identity telescope is here; ipseology will tip the study of human selves away from the old and toward the new.

Ipseology - Read and explore more

If this post intrigued you, check out more ipseology: