4  Let’s take ipseology global!

4.1 A hesitant peek outside the United States.

So far, we have analyzed data only from users in the United States. The Twitter data, however, was sampled worldwide. Why not extend our analyses to more nations?

We can begin with similar places. Australia, Canada and the United Kingdom share English as their primary language with the US. I would expect most trends observed in the US to appear in these countries as well, because these cultures are enmeshed in a strong web of mutual influence. Let’s recreate Figure 3.2 for each of these nations.

Code
library(tidyverse)

# Download a csv file containing multinational data for tokens at annual resolution.
# Read about the data in the text file at https://osf.io/mdp7k
# HINENI stands for Human Identity across Nations of the Earth Ngram Investigator.
hineni <- read_csv("https://osf.io/download/k7bwj/")

# Become familiar with the data file.
str(hineni) # Structure of the data.
hineni %>% slice_sample(n = 5)  # Example rows.

# How many users include the word vegan in their Twitter profile?
hineni %>% filter(ngram == "vegan" & obsYear == 2012) # in 2012
hineni %>% filter(ngram == "vegan" & obsYear == 2020) # in 2020
# Note that the numerator column contains the incidence.
Code
library(tidyverse)

hineni %>% 
  filter(ngram %in% c("vegan", "vegetarian", "carnivore") ) %>% 
  filter(nation %in% c("AU", "CA", "GB", "US") ) %>% 
  mutate(Nation = factor(nation, levels = c("AU", "CA", "GB", "US"), labels = c("Australia", "Canada", "United Kingdom", "United States")) ) %>%
  mutate(finePrevalence = 10000 * numerator / denominator ) %>%
  mutate(Signifier = factor(ngram, levels = c("vegan", "vegetarian", "carnivore")) ) %>%
ggplot(aes(x = obsYear, y = finePrevalence, color = Signifier, shape = Signifier)) +
  geom_path(linetype = "longdash", linewidth = 1) +
  geom_point(size=4) +
  scale_x_continuous(breaks = seq(2012, 2023, 2)) +
  scale_color_manual(values = c("#5c8326", "#B61500", "#a16868") ) +
  ggtitle("Estimated prevalence of food-related signifiers", "within AU, CA, UK and US Twitter users' profile bios 2012-2023") +
  xlab("Year") + ylab("Prevalence\n(per 10,000 accounts)") +
  facet_wrap(vars(Nation), nrow = 2) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1) ) +
  theme(text = element_text(size=16)) +
  theme(legend.background = element_rect(fill = "white", color = "black")) +
  labs(caption = "Source: Ipseology - a new science of the self\n \uA9 Jason Jeffrey Jones.") +
  theme(plot.caption = element_text(size=10, color = "#666666"))

Figure 4.1: Prevalence of vegan, vegetarian and carnivore over time in Australia, Canada, the United Kingdom and the United States.

We see the same pattern in each country: vegan growth and vegetarian decline. Compared to the United States, the rise of vegan was more vigorous in Australia, Canada and especially the United Kingdom.
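The y-axis in these figures comes from a single line of arithmetic in the plotting code. A minimal sketch of that step follows, using a toy data frame with the same column names as the HINENI file. The counts are invented for illustration, and I am assuming that denominator holds the number of sampled accounts for a nation and year.

```r
# Toy rows with the same column names as the HINENI file.
# The counts are invented for illustration only.
toy <- data.frame(
  ngram       = c("vegan", "vegan"),
  nation      = c("US", "GB"),
  obsYear     = c(2020, 2020),
  numerator   = c(30, 12),
  denominator = c(100000, 20000)
)

# Prevalence per 10,000 accounts, the same formula used in the plotting code.
toy$finePrevalence <- 10000 * toy$numerator / toy$denominator

print(toy)
```

Scaling to a common base of 10,000 accounts is what lets us compare nations whose samples differ greatly in size.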

4.2 Use previous analysis as a template.

We begin to see the utility of the ipseological approach. When we measure consistently, persistently and precisely at scale, comparisons across time and space become accessible.

Let’s use our previous analysis as a template for more. There are several primarily Spanish-speaking nations in the data. Why not try a simple extension? Google Translate tells me there are feminine and masculine forms of vegan (vegana, vegano) and similarly for vegetarian (vegetariana, vegetariano). How popular was each of these across years and nations?

Code
hineni %>% 
  filter(ngram %in% c("vegano", "vegana", "vegetariano", "vegetariana") ) %>% 
  filter(nation %in% c("AR", "CL", "CO", "MX", "PE", "ES", "VE") ) %>% 
  mutate(Gender = if_else(ngram %in% c("vegana", "vegetariana"), "Fem", "Masc") ) %>%
  mutate(Nation = factor(nation, levels = c("AR", "CL", "CO", "MX", "PE", "ES", "VE"), labels = c("Argentina", "Chile", "Colombia", "Mexico", "Peru", "Spain", "Venezuela")) ) %>%
  mutate(finePrevalence = 10000 * numerator / denominator ) %>%
  mutate(Signifier = factor(ngram, levels = c("vegano", "vegana", "vegetariano", "vegetariana")) ) %>%
  # Get rid of Chile.
  #filter(Nation != "Chile" ) %>% 
ggplot(aes(x = obsYear, y = finePrevalence, color = Signifier, shape = Signifier)) +
  geom_path(linetype = "dashed", linewidth = 0.75) +
  geom_point(size=2) +
  scale_x_continuous(breaks = seq(2012, 2023, 2)) +
  #scale_y_continuous(limits = c(0,6), breaks = seq(0, 6, 2), expand =  expansion(add=c(0.5, 2)) ) +
  scale_color_manual(values = c("#5c8326", "#5c8326", "#B61500", "#B61500") ) +
  ggtitle("Estimated prevalence of food-related signifiers", "within some Spanish-speaking nations' Twitter users' profile bios 2012-2023") +
  xlab("Year") + ylab("Prevalence\n(per 10,000 accounts)") +
  facet_grid(rows = vars(Nation), cols = vars(Gender) ) +
  # , scales = "free_y"
  theme(axis.text.x = element_text(angle = 45, hjust = 1) ) +
  #theme(text = element_text(size=16)) +
  theme(strip.text.y = element_text(angle = 0) ) +
  #theme(panel.spacing.y = unit(2, "lines") ) +
  theme(legend.background = element_rect(fill = "white", color = "black")) +
  labs(caption = "Source: Ipseology - a new science of the self\n \uA9 Jason Jeffrey Jones.") +
  theme(plot.caption = element_text(size=10, color = "#666666"))

Figure 4.2: Prevalence of feminine and masculine forms of vegan and vegetarian over time in some Spanish-speaking nations.
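The pipelines in this chapter differ mainly in which ngrams and nations they keep, so the shared steps could be wrapped in a small helper. The sketch below is one way to do that in base R. The function name signifier_prevalence and its arguments are hypothetical, and the toy counts are invented; only the column names come from the HINENI file.

```r
# Hypothetical helper: the name and arguments are illustrative, not part of HINENI.
# Keeps the requested ngrams and nations, then adds prevalence per `per` accounts.
signifier_prevalence <- function(data, ngrams, nations, per = 10000) {
  keep <- data$ngram %in% ngrams & data$nation %in% nations
  out <- data[keep, , drop = FALSE]
  out$finePrevalence <- per * out$numerator / out$denominator
  # Sort so that each nation's series is contiguous and ordered by year.
  out[order(out$nation, out$ngram, out$obsYear), , drop = FALSE]
}

# Toy rows with invented counts, using the HINENI column names.
toy <- data.frame(
  ngram       = c("vegana", "vegano", "vegan", "vegana"),
  nation      = c("MX", "MX", "US", "ES"),
  obsYear     = c(2015, 2015, 2015, 2015),
  numerator   = c(4, 2, 50, 3),
  denominator = c(20000, 20000, 100000, 10000)
)

result <- signifier_prevalence(toy, c("vegana", "vegano"), c("MX", "ES"))
print(result)
```

With the real data loaded, the same call could take hineni as its first argument, and the result could be passed to ggplot as in the code above.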

This is not the most beautiful figure, but it tells us quite a bit. Our chosen signifiers (vegano, vegana, vegetariano, vegetariana) saw infrequent use. Prevalence was in the low single digits, with the exception of vegetariana in Chile in the early years.

Were the feminine forms used more often than the masculine? Let’s zoom in on Mexico and Spain to investigate.

Code
hineni %>% 
  filter(ngram %in% c("vegano", "vegana", "vegetariano", "vegetariana") ) %>% 
  filter(nation %in% c("MX", "ES") ) %>% 
  mutate(Gender = if_else(ngram %in% c("vegana", "vegetariana"), "Fem", "Masc") ) %>%
  mutate(Nation = factor(nation, levels = c("MX", "ES"), labels = c("Mexico", "Spain")) ) %>%
  mutate(finePrevalence = 10000 * numerator / denominator ) %>%
  mutate(Signifier = factor(ngram, levels = c("vegano", "vegana", "vegetariano", "vegetariana")) ) %>%
ggplot(aes(x = obsYear, y = finePrevalence, color = Signifier, shape = Signifier)) +
  geom_path(linetype = "dashed", linewidth = 0.75) +
  geom_point(size=2) +
  scale_x_continuous(breaks = seq(2012, 2023, 2)) +
  scale_y_continuous(limits = c(0,3), breaks = seq(0, 3, 1) ) +
  scale_color_manual(values = c("#5c8326", "#5c8326", "#B61500", "#B61500") ) +
  ggtitle("Estimated prevalence of food-related signifiers", "within Mexico and Spain profile bios 2012-2023") +
  xlab("Year") + ylab("Prevalence\n(per 10,000 accounts)") +
  facet_grid(rows = vars(Nation), cols = vars(Gender) ) +
  # , scales = "free_y"
  theme(axis.text.x = element_text(angle = 45, hjust = 1) ) +
  #theme(text = element_text(size=16)) +
  theme(strip.text.y = element_text(angle = 0) ) +
  #theme(panel.spacing.y = unit(2, "lines") ) +
  theme(legend.background = element_rect(fill = "white", color = "black")) +
  labs(caption = "Source: Ipseology - a new science of the self\n \uA9 Jason Jeffrey Jones.") +
  theme(plot.caption = element_text(size=10, color = "#666666"))

Figure 4.3: Prevalence of feminine and masculine forms of vegan and vegetarian over time in Mexico and Spain.

The eye sees a subtle pattern that statistical analysis could test. Feminine forms outnumbered masculine in early years, but the series converged toward equal levels later.
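One simple way such a test might look is an exact binomial test on the feminine share within a single year. This is only a sketch, not an analysis of the HINENI data: the counts below are invented, and the test assumes that each account uses at most one of the two forms and that accounts are independent.

```r
# Invented counts for illustration only; not taken from the HINENI data.
counts <- data.frame(
  obsYear   = c(2013, 2022),
  feminine  = c(70, 52),
  masculine = c(30, 48)
)

# Share of feminine forms among accounts using either form.
counts$fem_share <- counts$feminine / (counts$feminine + counts$masculine)

# For each year, test whether the feminine share differs from one half.
counts$p_value <- mapply(
  function(fem, masc) binom.test(fem, fem + masc, p = 0.5)$p.value,
  counts$feminine, counts$masculine
)

print(counts)
```

With real counts, a small p-value in early years and a large one in later years would be consistent with the convergence the figure suggests. A model of the trend across all years would be a natural next step.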

A more obvious conclusion is that vegan overtook vegetarian later in these nations than in the English-speaking nations, and the shift was less pronounced.

4.3 Emojis are language-independent but not context-independent.

Sometimes language differences are exactly what we are interested in; sometimes they just get in the way. Thankfully, there are emojis.

Emojis are language-independent but not context-independent. They are meant to represent the same thing no matter the language or nation they are embedded in. But certainly, some emojis are more personally meaningful to individuals in some contexts than in others.

TODO hearts and sparkles are nearly universally popular

Code
#TODO get emojis from file and plot

TODO in every nation, the most popular flag is the home nation. FR, DE, IT, UK

TODO other flags vary in rates of use - EU, rainbow, pirate. FR, DE, IT, UK

TODO sports emojis analysis

4.4 Explore with HINENI.

HINENI stands for Human Identities across Nations of the Earth Ngram Investigator. HINENI comprises a dataset and tools allowing anyone to explore the popularity of signifiers within profile bios. It covers 32 nations and the years 2012 through 2023.

To explore in a web browser with no coding or download necessary, use the interface Human Identities across Nations of the Earth Ngram Investigator.

Read the research article for a detailed introduction. TODO link

4.5 Data and code.

Download data for this chapter from https://osf.io/download/k7bwj/

R code to produce the numbers and figures within this chapter is embedded in the code chunks above.