CogText
A joint embedding of cognitive tests and concepts
Researchers develop concepts to describe cognitive control, and tests to measure it. But the concepts and the tests are not always aligned, or even clearly related to one another.
Before generative AI, theory (as in concepts) and practice (as in tests) were linked through extensive literature reviews by human domain experts. That approach could not keep pace with the ever-growing literature; it was also biased and produced redundancies and confusion.
To organize the science of cognitive control, I ran an automated text analysis on a large collection of scientific texts (~half a million articles, covering every abstract published on cognitive control over the past century).
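As a rough sketch of the data-collection step, here is how such a corpus could be pulled, assuming PubMed as the source and Biopython's Entrez client; the query term, retmax, and email address are illustrative placeholders, not the actual pipeline:

```python
from Bio import Entrez  # pip install biopython

# NCBI asks for a contact address on every request.
Entrez.email = "you@example.com"  # placeholder

# Hypothetical query; the real corpus covered many control-related terms.
query = '"cognitive control"[Title/Abstract]'
handle = Entrez.esearch(db="pubmed", term=query, retmax=200)
ids = Entrez.read(handle)["IdList"]

# Fetch the matching abstracts as plain text.
handle = Entrez.efetch(db="pubmed", id=",".join(ids),
                       rettype="abstract", retmode="text")
abstracts = handle.read()
print(abstracts[:500])
```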
I then created a knowledge map to relate cognitive concepts and tests within a single cohesive framework.
The map confirmed the complex nature of cognitive control and showed that human-like control cannot be assessed with only a few tests. Instead, it should be measured by a richer battery of tests or, as I'm starting to believe, by resting-state measures (recorded while the cognitive system is not performing any overt behavioral task).
Technically, I used several pre-LLM language models to map text documents into an embedding space. The document embeddings were then used to build a concept-test graph that grounds concepts in tests, exploiting constrained random walks on heterogeneous graphs (see weighted-metapath2vec).
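To illustrate the idea, here is a toy sketch of metapath-constrained, weight-proportional random walks feeding a skip-gram model; the nodes, edges, and weights below are invented, and the actual weighted-metapath2vec implementation differs in scale and detail:

```python
import random
from collections import defaultdict
from gensim.models import Word2Vec

# Toy heterogeneous graph: nodes are concepts or tests; weights stand in
# for co-occurrence strengths derived from document embeddings.
edges = {
    ("inhibition", "stroop"): 0.9,
    ("inhibition", "stop_signal"): 0.8,
    ("working_memory", "n_back"): 0.9,
    ("working_memory", "stroop"): 0.2,
}
node_type = {
    "inhibition": "concept", "working_memory": "concept",
    "stroop": "test", "stop_signal": "test", "n_back": "test",
}

adj = defaultdict(list)  # symmetric weighted adjacency
for (u, v), w in edges.items():
    adj[u].append((v, w))
    adj[v].append((u, w))

def metapath_walk(start, metapath, length):
    """Walk the graph while alternating node types along a metapath
    (e.g. concept -> test -> concept), picking each neighbor with
    probability proportional to edge weight."""
    walk = [start]
    for i in range(length - 1):
        wanted = metapath[(i + 1) % len(metapath)]
        candidates = [(v, w) for v, w in adj[walk[-1]] if node_type[v] == wanted]
        if not candidates:
            break
        nodes, weights = zip(*candidates)
        walk.append(random.choices(nodes, weights=weights, k=1)[0])
    return walk

# Generate walks from every concept and train skip-gram on them, which
# yields joint embeddings for concepts and tests.
walks = [metapath_walk(n, ["concept", "test"], 10)
         for n in node_type if node_type[n] == "concept"
         for _ in range(20)]
model = Word2Vec(walks, vector_size=32, window=3, min_count=1, sg=1)
print(model.wv.most_similar("inhibition"))
```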
This joint concept-test graph and its embeddings represented concepts and tests in a shared semantic space. The graph could be queried for various applications: generating test batteries that target specific constructs, revealing knowledge gaps in the literature, or inspiring new tests and novel hypotheses.
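To make the battery-generation query concrete, here is a self-contained toy that ranks tests by cosine similarity to a construct in the joint space; the vectors are invented placeholders, not outputs of the actual model:

```python
import numpy as np

# Hypothetical joint embeddings (in practice, learned as sketched above).
emb = {
    "inhibition":  np.array([0.9, 0.1, 0.0]),
    "stroop":      np.array([0.8, 0.2, 0.1]),
    "stop_signal": np.array([0.7, 0.3, 0.0]),
    "n_back":      np.array([0.1, 0.9, 0.2]),
}
tests = ["stroop", "stop_signal", "n_back"]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def battery(construct, k=2):
    """Return the k tests closest to a target construct."""
    return sorted(tests, key=lambda t: -cosine(emb[construct], emb[t]))[:k]

print(battery("inhibition"))  # -> ['stroop', 'stop_signal']
```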
I also tried the more recent GPT-2 and GPT-3 embeddings as soon as they were released. But soon after I wrote up the initial results, the new generation of LLMs made this work obsolete. I still think the approach is valid, but the implementation is no longer worth the effort.