---
title: "Overview"
output: rmarkdown::html_vignette
author:
  - Chung-hong Chan ^[GESIS]
vignette: >
  %\VignetteIndexEntry{Overview}
  %\VignetteEngine{knitr::rmarkdown}
  %\usepackage[utf8]{inputenc}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
  )
set.seed(46709394)
```
The validation test is called "oolong test" (for reading tea leaves). This package provides several functions for generating different types of oolong test.

| function | purpose                                                                                                                           |
|---------:|:----------------------------------------------------------------------------------------------------------------------------------|
|   `wi()` | validating a topic model with [word intrusion test](#word-intrusion-test) (Chang et al., 2008)                                    |
|   `ti()` | validating a topic model with [topic intrusion test](#topic-intrusion-test) (Chang et al., 2008; aka "T8WSI" in Ying et al. 2021) |
| `witi()` | validating a topic model with [word intrusion test](#word-intrusion-test) and [topic intrusion test](#topic-intrusion-test)       |
|  `wsi()` | validating a topic model with [word set intrusion test](#word-set-intrusion-test) (Ying et al. 2021)                              |
|   `gs()` | oolong test for [creating gold standard](#creating-gold-standard) (see Song et al., 2020)                                         |

All of these tests can also be generated with the function [`create_oolong`](#backward-compatibility). As of version 0.3.20, it is no longer recommended.

## Installation

Because the package is constantly changing, we suggest using the development version from [GitHub](https://github.com/):

``` r
# install.packages("devtools")
devtools::install_github("chainsawriot/oolong")
```

You can also install the "stable" (but slightly older) version from CRAN:

```r
install.packages("oolong")
```

## Validating Topic Models

#### Word intrusion test

`abstracts_seededlda` is an example topic model trained with the data `abstracts` using the `seededlda` package. Currently, this package supports structural topic models / correlated topic models from `stm`, Warp LDA models from `text2vec` , LDA/CTM models from `topicmodels`,  Biterm Topic Models from `BTM`, Keyword Assisted Topic Models from `keyATM`, and seeded LDA models from `seededlda`. Although not strictly a topic model, Naive Bayes models from `quanteda.textmodels` are also supported. See the section on [Naive Bayes](#about-naive-bayes) for more information.

```{r}
library(oolong)
library(seededlda)
library(quanteda)
library(dplyr)
```

```{r example}
abstracts_seededlda
```

To create an oolong test with word intrusion test, use the function `wi`. It is recommended to provide a user id of coder who are going to be doing the test.

```{r createtest}
oolong_test <- wi(abstracts_seededlda, userid = "Hadley")
oolong_test
```

As instructed, use the method `$do_word_intrusion_test()` to start coding.

```{r, eval = FALSE}
oolong_test$do_word_intrusion_test()
```

You can pause the test by clicking the "Exit" button. Your progress will be recorded in the object. If you want to save your progress, just save the object (e.g. `saveRDS(oolong_test, "oolong_test.RDS")`). To resume the test, launch the test again.

After the coding (all items are coded), you need to press the "Exit" button to quit the coding interface and then lock the test. Then, you can look at the model precision by printing the oolong test.

```{r, include = FALSE}
### Mock this process
oolong_test$.__enclos_env__$private$test_content$wi$answer <- oolong_test$.__enclos_env__$private$test_content$wi$intruder
oolong_test$.__enclos_env__$private$test_content$wi$answer[1] <- "wronganswer"
```

```{r lock}
oolong_test$lock()
oolong_test
```

#### Word set intrusion test

Word set intrusion test is a variant of word intrusion test (Ying et al., 2021), in which multiple word sets generated from top terms of one topic are juxtaposed with one intruder word set generated similarly from another topic. In Ying et al., this test is called "R4WSI" because 4 word sets are displayed. By default, oolong generates also R4WSI. However, it is also possible to generate R(N)WSI by setting the parameter `n_correct_ws` to N - 1.

```{r wsi1}
oolong_test <- wsi(abstracts_seededlda, userid = "Garrett")
oolong_test
```

Use the method `$do_word_set_intrusion_test()` to start coding.

```{r wsi2, eval = FALSE}
oolong_test$do_word_set_intrusion_test()
```

```{r, include = FALSE}
### Mock this process
oolong_test$.__enclos_env__$private$test_content$wsi$answer <- oolong_test$.__enclos_env__$private$test_content$wsi$intruder
oolong_test$.__enclos_env__$private$test_content$wsi$answer[1] <- "wronganswer"
```

```{r wsi3}
oolong_test$lock()
oolong_test
```

#### Topic intrusion test

For example, `abstracts_seededlda` was generated with the corpus `abstracts$text`

```{r newgroup5}
library(tibble)
abstracts
```

Creating the oolong test object with the corpus used for training the topic model will generate topic intrusion test cases.

```{r createtest2}
oolong_test <- ti(abstracts_seededlda, abstracts$text, userid = "Julia")
oolong_test
```

Similarly, use the `$do_topic_intrusion_test` to code the test cases, lock the test with `$lock()` and then you can look at the TLO (topic log odds) value by printing the oolong test.

```{r, eval = FALSE}
oolong_test$do_topic_intrusion_test()
oolong_test$lock()
```

```{r, include = FALSE}
genius_topic <- function(obj1) {
    obj1$.__enclos_env__$private$test_content$ti$answer <- obj1$.__enclos_env__$private$test_content$ti$intruder
    return(obj1)
}
genius_word <- function(obj1) {
    obj1$.__enclos_env__$private$test_content$wi$answer <- obj1$.__enclos_env__$private$test_content$wi$intruder
    return(obj1)
}
oolong_test <- genius_word(genius_topic(oolong_test))
oolong_test$.__enclos_env__$private$test_content$ti$answer[2] <- sample(oolong_test$.__enclos_env__$private$test_content$ti$candidates[[2]], 1)
oolong_test$lock()
```

```{r topic_res}
oolong_test
```

### Suggested workflow

The test makes more sense if more than one coder is involved. A suggested workflow is to create the test, then clone the oolong object. Ask multiple coders to do the test(s) and then summarize the results.

Preprocess and create a document-feature matrix

```{r, eval = FALSE}
tokens(abstracts$text, remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE, remove_url = TRUE, spilit_hyphens = TRUE) %>% tokens_wordstem %>% tokens_remove(stopwords("en")) %>% dfm(tolower = TRUE) %>% dfm_trim(min_docfreq = 3, max_docfreq = 500) %>% dfm_select(min_nchar = 3, pattern = "^[a-zA-Z]+$", valuetype = "regex") -> abstracts_dfm
```

Train a topic model.

```{r step0, eval = FALSE}
require(seededlda)
abstracts_seededlda <- textmodel_seededlda(x = abstracts_dfm, dictionary = dictionary(abstracts_dictionary), seeds = 46709394, verbose = TRUE)
```

Create a new oolong object.

```{r step1}
oolong_test_rater1 <- witi(abstracts_seededlda, abstracts$text, userid = "Yihui")
```

Clone the oolong object to be used by other raters.

```{r step2}
oolong_test_rater2 <- clone_oolong(oolong_test_rater1, userid = "Jenny")
```

Ask different coders to code each object and then lock the object.

```{r, eval = FALSE}
## Let Yihui do the test.
oolong_test_rater1$do_word_intrusion_test()
oolong_test_rater1$do_topic_intrusion_test()
oolong_test_rater1$lock()

## Let Jenny do the test.
oolong_test_rater2$do_word_intrusion_test()
oolong_test_rater2$do_topic_intrusion_test()
oolong_test_rater2$lock()
```

```{r, include = FALSE}
### Mock this process
set.seed(46709394)
oolong_test_rater1 <- oolong:::.monkey_test(oolong_test_rater1, intelligent = 0.3)
oolong_test_rater2 <- oolong:::.monkey_test(oolong_test_rater2, intelligent = 0)
oolong_test_rater1$lock()
oolong_test_rater2$lock()
```

Get a summary of the two objects.

```{r, step3}
summarize_oolong(oolong_test_rater1, oolong_test_rater2)
```

### About the p-values

The test for model precision (MP) is based on an one-tailed, one-sample binomial test for each rater. In a multiple-rater situation, the p-values from all raters are combined using the Fisher's method (a.k.a. Fisher's omnibus test).

H0: MP is not better than 1/ (n\_top\_terms + 1)

H1: MP is better than 1/ (n\_top\_terms + 1)


The test for the median of TLO is based on a permutation test.

H0: Median TLO is not better than random guess.

H1: Median TLO is better than random guess.

One must notice that the two statistical tests are testing the bear minimum. A significant test only indicates the topic model can make the rater(s) perform better than random guess. It is not an indication of good topic interpretability. Also, one should use a very conservative significant level, e.g. $\alpha < 0.001$.

## About Biterm Topic Model

Please refer to the vignette about BTM.

## About Naive Bayes

Naive Bayes model is a supervised machine learning model. This package supports Naive Bayes models trained using `quanteda.textmodels`.

Suppose `newsgroup_nb` is a Naive Bayes model trained on a subset of the classic [20 newsgroups] dataset.

```r
tokens(newsgroup5$text, remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE, remove_url = TRUE, spilit_hyphens = TRUE) %>% tokens_wordstem %>% tokens_remove(stopwords("en")) %>% dfm(tolower = TRUE) %>% dfm_trim(min_termfreq = 3, max_docfreq = 0.06, docfreq_type = "prop") -> newsgroup_dfm
docvars(newsgroup_dfm, "group") <- newsgroup5$title
newsgroup_nb <- textmodel_nb(newsgroup_dfm, docvars(newsgroup_dfm, "group"), distribution = "Bernoulli")
```

You can still generate word intrusion and word set intrusion tests.

```{r}
wi(newsgroup_nb)
```

```{r}
wsi(newsgroup_nb)
```

## Validating Dictionary-based Methods

### Creating gold standard

`trump2k` is a dataset of 2,000 tweets from \@realdonaldtrump.

```{r trump2k}
tibble(text = trump2k)
```

For example, you are interested in studying the sentiment of these tweets. One can use tools such as AFINN to automatically extract sentiment in these tweets. However, oolong recommends to generate gold standard by human coding first using a subset. By default, oolong selects 1% of the origin corpus as test cases. The parameter `construct` should be an adjective, e.g. positive, liberal, populistic, etc.

```{r goldstandard}
oolong_test <- gs(input_corpus = trump2k, construct = "positive", userid = "Joe")
oolong_test
```

As instructed, use the method `$do_gold_standard_test()` to start coding.

```{r, eval = FALSE}
oolong_test$do_gold_standard_test()
```

After the coding, you need to first lock the test and then the `$turn_gold()` method is available.

```{r, include = FALSE}
oolong_test$.__enclos_env__$private$test_content$gs <-
structure(list(case = 1:20, text = c("Thank you Eau Claire, Wisconsin. \n#VoteTrump on Tuesday, April 5th!\nMAKE AMERICA GREAT AGAIN! https://t.co/JI5JqwHnMC",
"\"@bobby990r_1: @realDonaldTrump would lead polls the second he announces candidacy! America is waiting for him to LEAD us out of this mess!",
"\"@KdanielsK: @misstcassidy @AllAboutTheTea_ @realDonaldTrump My money is on Kenya getting fired first.\"",
"Thank you for a great afternoon Birmingham, Alabama! #Trump2016 #MakeAmericaGreatAgain https://t.co/FrOkqCzBoD",
"\"@THETAINTEDT: @foxandfriends @realDonaldTrump Trump 2016 http://t.co/UlQWGKUrCJ\"",
"People believe CNN these days almost as little as they believe Hillary....that's really saying something!",
"It was great being in Michigan. Remember, I am the only presidential candidate who will bring jobs back to the U.S.and protect car industry!",
"\"@DomineekSmith: @realDonaldTrump is the best Republican presidential candidate of all time.\"  Thank you.",
"Word is that little Morty Zuckerman’s @NYDailyNews loses more than $50 million per year---can that be possible?",
"\"@Chevy_Mama: @realDonaldTrump I'm obsessed with @celebrityapprenticeNBC. Honestly,  Mr Trump, you are very inspiring\"",
"President Obama said \"ISIL continues to shrink\" in an interview just hours before the horrible attack in Paris. He is just so bad! CHANGE.",
".@HillaryClinton loves to lie. America has had enough of the CLINTON'S! It is time to #DrainTheSwamp! Debates https://t.co/3Mz4T7qTTR",
"\"@jerrimoore: @realDonaldTrump awesome. A treat to get to see the brilliant Joan Rivers once more #icon\"",
"\"@shoegoddesss: @realDonaldTrump Will definitely vote for you. Breath of fresh air. America needs you!\"",
"Ted is the ultimate hypocrite. Says one thing for money, does another for votes. \nhttps://t.co/hxdfy0mjVw",
"\"@Lisa_Milicaj: Truth be told, I  never heard of The National Review until they \"tried\" to declare war on you. No worries, you got my vote!\"",
"THANK YOU Daytona Beach, Florida!\n#MakeAmericaGreatAgain https://t.co/IAcLfXe463",
"People rarely say that many conservatives didn’t vote for Mitt Romney. If I can get them to vote for me, we win in a landslide.",
"Trump National Golf Club, Washington, D.C. is on 600 beautiful acres fronting the Potomac River. A fantastic setting! http://t.co/pYtkbyKwt5",
"\"@DRUDGE_REPORT: REUTERS 5-DAY ROLLING POLL: TRUMP 34%, CARSON 19.6%, RUBIO 9.7%, CRUZ 7.7%...\" Thank you - a great honor!"
), answer = c(4L, 4L, 2L, 5L, 3L, 2L, 4L, 5L, 2L, 4L, 1L, 1L,
4L, 4L, 2L, 4L, 4L, 4L, 4L, 4L), target_value = c(NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA)), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"
))
```

```{r gs_locking}
oolong_test$lock()
oolong_test
```

### Example: Validating AFINN using the gold standard

A locked oolong test can be converted into a quanteda-compatible corpus for further analysis. The corpus contains two `docvars`, 'answer'.

```{r}
oolong_test$turn_gold()
```

In this example, we calculate the AFINN score for each tweet using quanteda. The dictionary `afinn` is bundle with this package.

```{r}
gold_standard <- oolong_test$turn_gold()
gold_standard %>% tokens(remove_punct = TRUE) %>% dfm() %>% dfm_lookup(afinn) %>%
    quanteda::convert(to = "data.frame") %>%
    mutate(matching_word_valence = (neg5 * -5) + (neg4 * -4) + (neg3 * -3) + (neg2 * -2) + (neg1 * -1)
           + (zero * 0) + (pos1 * 1) + (pos2 * 2) + (pos3 * 3) + (pos4 * 4) + (pos5 * 5),
           base = ntoken(gold_standard, remove_punct = TRUE), afinn_score = matching_word_valence / base) %>%
    pull(afinn_score) -> all_afinn_score
all_afinn_score
```

Put back the vector of AFINN score into the respective `docvars` and study the correlation between the gold standard and AFINN.

```{r}
summarize_oolong(oolong_test, target_value = all_afinn_score)
```

### Suggested workflow

Create an oolong object, clone it for another coder. According to Song et al. (2020), you should at least draw 1% of your data.

```{r}
trump <- gs(input_corpus = trump2k, exact_n = 40, userid = "JJ")
trump2 <- clone_oolong(trump, userid = "Winston")
```

Instruct two coders to code the tweets and lock the objects.

```{r, eval = FALSE}
trump$do_gold_standard_test()
trump2$do_gold_standard_test()
trump$lock()
trump2$lock()
```

```{r, include = FALSE}
trump$.__enclos_env__$private$test_content$gs <-
structure(list(case = 1:20, text = c("Thank you Eau Claire, Wisconsin. \n#VoteTrump on Tuesday, April 5th!\nMAKE AMERICA GREAT AGAIN! https://t.co/JI5JqwHnMC",
"\"@bobby990r_1: @realDonaldTrump would lead polls the second he announces candidacy! America is waiting for him to LEAD us out of this mess!",
"\"@KdanielsK: @misstcassidy @AllAboutTheTea_ @realDonaldTrump My money is on Kenya getting fired first.\"",
"Thank you for a great afternoon Birmingham, Alabama! #Trump2016 #MakeAmericaGreatAgain https://t.co/FrOkqCzBoD",
"\"@THETAINTEDT: @foxandfriends @realDonaldTrump Trump 2016 http://t.co/UlQWGKUrCJ\"",
"People believe CNN these days almost as little as they believe Hillary....that's really saying something!",
"It was great being in Michigan. Remember, I am the only presidential candidate who will bring jobs back to the U.S.and protect car industry!",
"\"@DomineekSmith: @realDonaldTrump is the best Republican presidential candidate of all time.\"  Thank you.",
"Word is that little Morty Zuckerman’s @NYDailyNews loses more than $50 million per year---can that be possible?",
"\"@Chevy_Mama: @realDonaldTrump I'm obsessed with @celebrityapprenticeNBC. Honestly,  Mr Trump, you are very inspiring\"",
"President Obama said \"ISIL continues to shrink\" in an interview just hours before the horrible attack in Paris. He is just so bad! CHANGE.",
".@HillaryClinton loves to lie. America has had enough of the CLINTON'S! It is time to #DrainTheSwamp! Debates https://t.co/3Mz4T7qTTR",
"\"@jerrimoore: @realDonaldTrump awesome. A treat to get to see the brilliant Joan Rivers once more #icon\"",
"\"@shoegoddesss: @realDonaldTrump Will definitely vote for you. Breath of fresh air. America needs you!\"",
"Ted is the ultimate hypocrite. Says one thing for money, does another for votes. \nhttps://t.co/hxdfy0mjVw",
"\"@Lisa_Milicaj: Truth be told, I  never heard of The National Review until they \"tried\" to declare war on you. No worries, you got my vote!\"",
"THANK YOU Daytona Beach, Florida!\n#MakeAmericaGreatAgain https://t.co/IAcLfXe463",
"People rarely say that many conservatives didn’t vote for Mitt Romney. If I can get them to vote for me, we win in a landslide.",
"Trump National Golf Club, Washington, D.C. is on 600 beautiful acres fronting the Potomac River. A fantastic setting! http://t.co/pYtkbyKwt5",
"\"@DRUDGE_REPORT: REUTERS 5-DAY ROLLING POLL: TRUMP 34%, CARSON 19.6%, RUBIO 9.7%, CRUZ 7.7%...\" Thank you - a great honor!"
), answer = c(4L, 4L, 2L, 5L, 3L, 2L, 4L, 5L, 2L, 4L, 1L, 1L,
4L, 4L, 2L, 4L, 4L, 4L, 4L, 4L), target_value = c(NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA)), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"
                                         ))

trump2$.__enclos_env__$private$test_content$gs <-
structure(list(case = 1:20, text = c("Thank you Eau Claire, Wisconsin. \n#VoteTrump on Tuesday, April 5th!\nMAKE AMERICA GREAT AGAIN! https://t.co/JI5JqwHnMC",
"\"@bobby990r_1: @realDonaldTrump would lead polls the second he announces candidacy! America is waiting for him to LEAD us out of this mess!",
"\"@KdanielsK: @misstcassidy @AllAboutTheTea_ @realDonaldTrump My money is on Kenya getting fired first.\"",
"Thank you for a great afternoon Birmingham, Alabama! #Trump2016 #MakeAmericaGreatAgain https://t.co/FrOkqCzBoD",
"\"@THETAINTEDT: @foxandfriends @realDonaldTrump Trump 2016 http://t.co/UlQWGKUrCJ\"",
"People believe CNN these days almost as little as they believe Hillary....that's really saying something!",
"It was great being in Michigan. Remember, I am the only presidential candidate who will bring jobs back to the U.S.and protect car industry!",
"\"@DomineekSmith: @realDonaldTrump is the best Republican presidential candidate of all time.\"  Thank you.",
"Word is that little Morty Zuckerman’s @NYDailyNews loses more than $50 million per year---can that be possible?",
"\"@Chevy_Mama: @realDonaldTrump I'm obsessed with @celebrityapprenticeNBC. Honestly,  Mr Trump, you are very inspiring\"",
"President Obama said \"ISIL continues to shrink\" in an interview just hours before the horrible attack in Paris. He is just so bad! CHANGE.",
".@HillaryClinton loves to lie. America has had enough of the CLINTON'S! It is time to #DrainTheSwamp! Debates https://t.co/3Mz4T7qTTR",
"\"@jerrimoore: @realDonaldTrump awesome. A treat to get to see the brilliant Joan Rivers once more #icon\"",
"\"@shoegoddesss: @realDonaldTrump Will definitely vote for you. Breath of fresh air. America needs you!\"",
"Ted is the ultimate hypocrite. Says one thing for money, does another for votes. \nhttps://t.co/hxdfy0mjVw",
"\"@Lisa_Milicaj: Truth be told, I  never heard of The National Review until they \"tried\" to declare war on you. No worries, you got my vote!\"",
"THANK YOU Daytona Beach, Florida!\n#MakeAmericaGreatAgain https://t.co/IAcLfXe463",
"People rarely say that many conservatives didn’t vote for Mitt Romney. If I can get them to vote for me, we win in a landslide.",
"Trump National Golf Club, Washington, D.C. is on 600 beautiful acres fronting the Potomac River. A fantastic setting! http://t.co/pYtkbyKwt5",
"\"@DRUDGE_REPORT: REUTERS 5-DAY ROLLING POLL: TRUMP 34%, CARSON 19.6%, RUBIO 9.7%, CRUZ 7.7%...\" Thank you - a great honor!"
), answer = c(5L, 3L, 2L, 5L, 3L, 1L, 4L, 5L, 2L, 4L, 1L, 1L,
4L, 4L, 2L, 4L, 4L, 4L, 4L, 4L), target_value = c(NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA)), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"
                                         ))
trump$lock()
trump2$lock()
```

Calculate the target value (in this case, the AFINN score) by turning one object into a corpus.

```{r}
gold_standard <- trump$turn_gold()
gold_standard %>% tokens(remove_punct = TRUE) %>% dfm() %>%
    dfm_lookup(afinn) %>% quanteda::convert(to = "data.frame") %>%
    mutate(matching_word_valence = (neg5 * -5) + (neg4 * -4) + (neg3 * -3) + (neg2 * -2) + (neg1 * -1)
           + (zero * 0) + (pos1 * 1) + (pos2 * 2) + (pos3 * 3) + (pos4 * 4) + (pos5 * 5),
           base = ntoken(gold_standard, remove_punct = TRUE), afinn_score = matching_word_valence / base) %>%
    pull(afinn_score) -> target_value
```

Summarize all oolong objects with the target value.

```{r}
res <- summarize_oolong(trump, trump2, target_value = target_value)
```

Read the results. The diagnostic plot consists of 4 subplots. It is a good idea to read Bland & Altman (1986) on the difference between correlation and agreement.

* Subplot (top left): Raw correlation between human judgement and target value. One should want to have a good correlation between the two.
* Subplot (top right): Bland-Altman plot. One should want to have no correlation. Also, the dots should be randomly scattering around the mean value. If it is so, the two measurements (human judgement and target value) are in good agreement.
* Subplot (bottom left): Raw correlation between target value and content length. One should want to have no correlation, as an indication of good reliability against the influence of content length. (See Chan et al. 2020)
* Subplot (bottom right): Cook's distance of all data point. One should want to have no dot (or at least very few dots) above the threshold. It is an indication of how the raw correlation between human judgement and target value can or cannot be influenced by extreme values in your data.

The textual output contains the Krippendorff's alpha of the codings by your raters. In order to claim validity of your target value, you must first establish the reliability of your gold standard. Song et al. (2020) suggest Krippendorff's Alpha > 0.7 as an acceptable cut-off.

```{r}
res
```

```{r diagnosis}
plot(res)
```

## Backward compatibility

Historically, oolong test objects could only be generated with only one function: `create_oolong`. It is no longer the case and no longer recommended anymore. It is still retained for backward compatibility purposes. If you still need to use  `create_oolong()`, the most important parameters are `input_model` and `input_corpus`. Setting each of them to `NULL` generates different tests.

| input\_model | input\_corpus | output                                                                                                                                      |
|--------------|:-------------:|---------------------------------------------------------------------------------------------------------------------------------------------|
| Not NULL     | NULL          | oolong test for validating a topic model with [word intrusion test](#word-intrusion-test)                                                   |
| Not NULL     | Not NULL      | oolong test for validating a topic model with [word intrusion test](#word-intrusion-test) and [topic intrusion test](#topic-intrusion-test) |
| NULL         | Not NULL      | oolong test for [creating gold standard](#creating-gold-standard)                                                                           |
| NULL         | NULL          | error                                                                                                                                       |


## References

1. Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J. L., & Blei, D. M. (2009). Reading tea leaves: How humans interpret topic models. In Advances in neural information processing systems (pp. 288-296). [link](https://papers.nips.cc/paper/3700-reading-tea-leaves-how-humans-interpret-topic-models)
2. Ying, L., Montgomery, J. M., & Stewart, B. M. (2021). Inferring concepts from topics: Towards procedures for validating topics as measures. Political Analysis. [link](https://doi.org/10.1017/pan.2021.33)
3. Song et al. (2020) In validations we trust? The impact of imperfect human annotations as a gold standard on the quality of validation of automated content analysis. Political Communication. [link](https://doi.org/10.1080/10584609.2020.1723752)
4. Bland, J. M., & Altman, D. (1986). Statistical methods for assessing agreement between two methods of clinical measurement. The lancet, 327(8476), 307-310.
5. Chan et al. (2020) Four best practices for measuring news sentiment using ‘off-the-shelf’ dictionaries: a large-scale p-hacking experiment. Computational Communication Research. [link](https://osf.io/preprints/socarxiv/np5wa/)
6. Nielsen, F. Å. (2011). A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. arXiv preprint arXiv:1103.2903. [link](https://arxiv.org/abs/1103.2903)

---