Cognitive natural language processing


Cognitive natural language processing (cNLP) is a subfield of cognitive linguistics, computer science, information engineering, and artificial intelligence concerned with the interpretation of cognitive states encoded in human (natural) languages by computer algorithms, in particular the measurement of psychological and emotional variations in a collection of text. Cognitive linguistics is a modern school of linguistics that investigates the relationship between language, thinking, and socio-physical experience.[1] The field as such is still at a broadly descriptive theoretical stage, encompassing empirical data and theoretical frameworks.


History[edit]

The history of cognitive natural language processing (cNLP) can be traced to the connectionism vs. computationalism debate of the 1980s. Computationalism is a specific form of cognitivism that argues that mental activity is computational: that the mind operates by performing purely formal operations on symbols, like a Turing machine. Traditional natural language processing (NLP), which generally started in the 1950s, is based on a computationalist approach. Connectionism, which is at the root of cognitive natural language processing (cNLP), stems from the cognitive model of associationism, the idea that mental processes operate by the association of one mental state with its successor states. It holds that all mental processes are made up of discrete psychological elements and their combinations, which are believed to be made up of sensations or simple feelings. The authoritative groundwork for resolving this debate is laid out in the 1999 book by George Lakoff and Mark Johnson, Philosophy in the Flesh: The Embodied Mind and Its Challenge to Western Thought.[2] In the appendix titled The Neural Theory of Language Paradigm, the authors present three computational approaches to developing algorithms that are congruent with the field of cognitive linguistics.

Models of Language[edit]

The earliest generally accepted recorded attempts at a systematic study of language date to around the 6th century BCE and are attributed to the Indian scholar Pāṇini, who is generally regarded as the "Father of Linguistics".

Development of modern linguistics can be traced to the 18th century work centering around Indo-European studies and leading to a systematic reconstruction of the Proto-Indo-European language.

Language as a structure

The first half of the 20th century was dominated by the structuralist school, based in Europe on the work of Ferdinand de Saussure, whose Course in General Linguistics[3] was published posthumously in 1916, and in the United States on the work of Edward Sapir and Leonard Bloomfield. The methods of structural linguistics involve collecting a corpus of utterances and then attempting to classify all of the elements of the corpus at their different linguistic levels: the phonemes, morphemes, lexical categories, noun phrases, verb phrases, and sentence types. This was a beginning that provided a framework for organizing the structure of language as it existed at a given time.

Language as a process

The static nature of the structural approaches opened up the field to develop other types of linguistics that would begin to point towards Language as a process such as Noam Chomsky's generative grammar, William Labov's sociolinguistics, Michael Halliday's functional linguistics and also modern psycholinguistics.

Language as a phenomenon

The publication of Science and Sanity[4] by Alfred Korzybski in 1933 laid the initial foundation for treating language as a phenomenon to be studied using the approaches and disciplines of phenomenology, the philosophical study of experience and consciousness. Korzybski argued that "human knowledge of the world is limited both by the human nervous system and the languages humans have developed", i.e., it is acquired through communication with other humans and by assimilating what they have experienced through their nervous systems.

The profound implication of this argument is that no one has direct access to reality, given that the most we can know is that which is filtered through the brain's responses to reality and what other people say about their brains' responses to reality through language. The "brain's response to reality" is shorthand for "the human nervous system's response to the environment which the human organism moves through and interacts with". The physiological mechanisms of the human nervous system's response were outlined in the work of Sir Charles Scott Sherrington. This philosophical and physiological approach brings us to the common-sense observation that we only have access to our perception and are always working with maps of our context. In 1933, Korzybski described the situation as the map–territory relation thus: "A map is not the territory[5] it represents, but, if correct, it has a similar structure to the territory, which accounts for its usefulness."

This newfound emphasis on the inevitable limits of perception, the role of language in shaping that perception, and the necessary utility of the maps formed by language laid the foundations for modeling language along the lines of phenomenology, with the study of consciousness at its center. A central aspect of the study of consciousness is developing an understanding of cognitive processes. This period marks the beginning of cognitive linguistics.


The fundamental theory behind Cognitive Natural Language Processing is that language patterns, rather than words alone, express and affect people's thinking and behavior.[6] Samuel I. Hayakawa, author of the 1941 book Language in Thought and Action: How Men Use Words and Words Use Men (an extension of Language in Action), is said to have been motivated to write it after witnessing the ruthless efficiency of the Nazi propaganda machine[7] that aided Adolf Hitler's rise to power. The book established the role of language patterns in affecting people's thinking and behavior.

Language patterns further fit into a natural hierarchy that emerges from universal human physiology and sub-cultural linguistic habits.[8] The 1969 work of Brent Berlin and Paul Kay, establishing the universality of basic color terms, proposed that the frequency-dependent resolution power of the human eye, combined with the number of color terms a culture has, is sufficient to guarantee the emergence of a hierarchy of color names[9] in that culture, such as black, brown, or red. This, along with Eleanor Rosch's study "Natural Categories",[10] established the relationship between universal human physiology, sub-cultural linguistic habits, and the basic categories of organization, or prototypical categories, of language patterns.

The source of the prototypical categories of language patterns is rooted in universal human developmental experiences. In 1987 the cognitive linguist George Lakoff published Women, Fire, and Dangerous Things: What Categories Reveal About the Mind,[11] in which he developed a model of cognition on the basis of the cross-cultural universality of cognitive metaphors, defined as mappings of cognitive structures from one domain onto another in the cognitive process. The book further established the role of cognitive metaphors in the grammar of several languages, and highlighted the limitations of classical structural linguistics and of Noam Chomsky's generative-grammar approach.

Collectively, the work of Samuel I. Hayakawa, Brent Berlin, Paul Kay, Eleanor Rosch and George Lakoff established:

  • that language patterns express and affect people's thinking and behavior,
  • that language patterns fall into a natural hierarchy that is dependent on universal human physiology and sub-cultural linguistic habits, and
  • that the sub-cultural linguistic habits themselves have a cross-cultural commonality in their source because they stem from cognitive metaphors.

These three theoretical findings enable the development of a computational model for linguistics in the form of Cognitive Natural Language Processing.


Cognitive Natural Language Processing uses a nontraditional view of grammar, namely construction grammar.[12] The need for a new type of grammar was first highlighted by Noam Chomsky in his 1957 book Syntactic Structures, using "Colorless green ideas sleep furiously" as an example of a sentence that is grammatically correct but semantically nonsensical. Construction grammar views language as made up of constructions: learned pairings of linguistic forms with functions or meanings. Cognitive Natural Language Processing uses principles of construction grammar in the following reverse Markov model formula to assign a value, the Relative Measure of Meaning (RMM), to a block of text, sentence, phrase, or word:[13]
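The formula itself does not survive in this copy of the article; only its variable definitions do. As an illustrative placeholder, one form consistent with those definitions (an assumption, not the formula from the cited patent) would aggregate a language-specific probability function over the token sequence:

```latex
% Hypothetical reconstruction; the actual formula in the cited
% patent is not reproduced in the source text.
\mathrm{RMM} \;=\; \frac{1}{n} \sum_{d=1}^{n}
  \mathrm{PF}\bigl(\mathrm{PMM}(\mathrm{Token}_d),\, d\bigr)
```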


  • RMM is the Relative Measure of Meaning
  • Token is any block of text, sentence, phrase, or word
  • n is the number of tokens being analyzed
  • d is the location of the token along the sequence of n tokens
  • PMM is the Probable Measure of Meaning based on a corpus
  • PF is the Probability Function specific to a language

Using this approach, values are assigned to tokens for two modes of thinking:[14] mode 1, which is fast and instinctive, and mode 2, which is slow and deliberate.[15] The tension between the two modes of thinking outlined by Daniel Kahneman provides the quantitative values for Cognitive Natural Language Processing algorithms to measure shifts in emotional states by analyzing text.[16]
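The source does not specify how token values are assigned to the two modes. The sketch below is a toy illustration under the assumption that tokens are matched against two small lexicons, one per mode; the lexicon contents and the function name `mode_scores` are invented for illustration and come from neither the cited patent nor Kahneman's work.

```python
# Toy illustration (not the patented method) of scoring a text against
# Kahneman's two modes of thinking. The lexicons are invented placeholders;
# a real system would derive token values from a corpus.
FAST_INSTINCTIVE = {"love", "hate", "never", "always", "amazing", "terrible"}
SLOW_DELIBERATE = {"however", "therefore", "consider", "evidence", "although"}

def mode_scores(text):
    """Fraction of tokens matching each mode's lexicon."""
    tokens = text.lower().split()
    fast = sum(t in FAST_INSTINCTIVE for t in tokens)
    slow = sum(t in SLOW_DELIBERATE for t in tokens)
    n = max(len(tokens), 1)  # avoid division by zero on empty input
    return {"mode1_fast": fast / n, "mode2_slow": slow / n}
```

Comparing these scores across successive segments of a text would then give a crude signal of shifts between instinctive and deliberate registers.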

Technical Challenges[edit]

In order for Cognitive Natural Language Processing (cNLP) algorithms to extract meaning based on context and sub-context, there needs to be a way to consistently delineate unit packets of thought within a block of text. A sentence may be part of a unit of thought or may contain many thoughts. Line breaks may be intentional or accidentally imposed by the printing or viewing medium. Punctuation may or may not be dependable, depending on the author. Any algorithm that is blind to these realities will not produce consistent results. Beeferman, Berger, and Lafferty of the School of Computer Science, Carnegie Mellon University, presented a starting point in their 1999 paper Statistical Models for Text Segmentation,[17] stating that the task of text segmentation is to divide text into segments such that each segment is topically coherent and cutoff points indicate a change of topic. Their approach required supervision and training sets. In 2001 Masao Utiyama and Hitoshi Isahara of the Communications Research Laboratory, Kyoto, Japan,[18] improved upon the work of Beeferman et al. and proposed a statistical method that finds the maximum-probability segmentation of a given text; it does not require training data because it estimates probabilities from the given text itself.
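The unsupervised approach of Utiyama and Isahara can be sketched as a dynamic program over candidate cut points. The cohesion score below (the log-likelihood of a segment under its own smoothed unigram counts, minus a per-cut penalty) is a simplification of their statistical model, and the function names are illustrative rather than from any published implementation:

```python
# Minimal dynamic-programming sketch of maximum-probability text
# segmentation in the spirit of Utiyama & Isahara (2001).
import math
from collections import Counter

def segment_score(sentences):
    """Log-likelihood of a candidate segment under its own unigram model."""
    words = [w for s in sentences for w in s.split()]
    counts = Counter(words)
    total = len(words)
    # Add-one smoothing keeps the score finite for tiny segments.
    return sum(c * math.log((c + 1) / (total + len(counts)))
               for c in counts.values())

def best_segmentation(sentences, penalty=2.0):
    """Maximize the sum of segment scores minus a per-cut penalty."""
    n = len(sentences)
    best = [0.0] + [-math.inf] * n  # best[i] = best score for sentences[:i]
    back = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(i):
            score = best[j] + segment_score(sentences[j:i]) - penalty
            if score > best[i]:
                best[i], back[i] = score, j
    # Recover cut points by walking the backpointers.
    cuts, i = [], n
    while i > 0:
        cuts.append(i)
        i = back[i]
    return sorted(cuts)  # each cut index marks the end of a segment
```

On a toy input with two topically homogeneous runs, such as `["a a a", "a a", "b b b", "b b"]`, the program places a single cut at the topic change, returning `[2, 4]`.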

A rules-based approach is proposed in the 2006 book Cognitive Linguistics: An Introduction.[19] This approach lays out rules for cut-off points between text segments based on a small set of formal grammatical patterns.
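A rules-based cut-off scheme of this kind can be sketched with a handful of surface patterns. The pattern set below (sentence-final punctuation plus a few clause-linking conjunctions) is invented for illustration and is not the rule set given in the book:

```python
# Toy rule-based delineation of candidate "units of thought": cut after
# sentence-final punctuation and before common clause-linking conjunctions.
import re

CUT_PATTERN = re.compile(
    r"(?<=[.!?;])\s+"                                  # after . ! ? ;
    r"|\s+(?=\b(?:but|because|although|however)\b)"    # before conjunctions
)

def unit_segments(text):
    """Split text into candidate units of thought using the rules above."""
    return [seg.strip() for seg in CUT_PATTERN.split(text) if seg.strip()]
```

For example, `unit_segments("It rained. We stayed home because the roads flooded.")` yields three candidate units, cutting both at the sentence boundary and before "because".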

Continued development and combination of the above approaches is a focus for improving the confidence of Cognitive Natural language processing (cNLP) algorithms.


Applications[edit]

The central application of Cognitive Natural Language Processing (cNLP) is to use computing methods to reverse engineer states of mind and emotional variances from language; in other words, to read between the lines and begin to understand the person or people behind any given collection of text.

It is established that language both sends out signals of a person's state of mind[20][21][22] and shapes the way a person thinks.[23][24] The FBI case that popularized the use of linguistics to profile a person was the 1990s hunt for the Unabomber.[25] The forensic linguistics community is keen to apply the developments made since then to see whether its principles and new technologies can help reveal the identity[26] of the anonymous senior official in the Trump administration who wrote a 2018 op-ed about the administration.[27]

While such cases grab headlines, businesses are applying cNLP to make better hiring choices[28] and to better understand their customers.[29]

Warren Buffett, in his 2013 letter to investors, recommended Rittenhouse's book Investing Between the Lines[30][31] because in it Rittenhouse, who had raised a red flag on Enron well before it collapsed,[32] offers clues for separating the facts from the fluff in annual reports and quarterly earnings calls, by looking for language that indicates to investors whether management is telling the truth or misleading them.

In the near future, we can expect to see tools that help co-workers gauge each other's emotional states through email and other digital communications, in order to communicate better.

See also[edit]

  • Cognitive linguistics
  • Psycholinguistics
  • Sociolinguistics
  • Neurolinguistics
  • Linguistic relativity
  • Primary metaphor
  • Stochastic grammar

Further reading[edit]

  • Harris, Randy Allen. The Linguistics Wars. Oxford University Press, 1995. ISBN 9780199839063.
  • Lakoff, George; Núñez, Rafael E. Where Mathematics Comes From: How the Embodied Mind Brings Mathematics into Being. Basic Books, 2000. ISBN 978-0-465-03771-1.
  • Lakoff, George. More Than Cool Reason: A Field Guide to Poetic Metaphor. University of Chicago Press, 1989. ISBN 978-0-226-46812-9.
  • Lakoff, George; Johnson, Mark. Metaphors We Live By. University of Chicago Press, 1980. ISBN 978-0-226-46801-3.
  • Busse, Beatrix; Moehlig-Falke, Ruth (eds.). Patterns in Language and Linguistics: New Perspectives on a Ubiquitous Concept. Volume 104 of Topics in English Linguistics [TiEL]. Walter de Gruyter, 2019. ISBN 978-3-11-059551-2.
  • "A Course in Cognitive Linguistics" by Prof. Martin Hilpert, Professor of English Linguistics at the University of Neuchâtel.


References[edit]

  1. Evans, Vyvyan (2019). Cognitive Linguistics: A Complete Guide. ISBN 978-1-4744-0523-2.
  2. Lakoff, George (8 October 1999). Philosophy in the Flesh: The Embodied Mind and Its Challenge to Western Thought. ISBN 978-0-465-05674-3.
  3. Course in General Linguistics. Columbia University Press. June 2011. 336 pp. ISBN 978-0-231-15726-1.
  4. Science and Sanity: An Introduction to Non-Aristotelian Systems and General Semantics. Institute of General Semantics, 5th edition. April 1, 1995. 927 pp. ISBN 0-937298-01-8.
  5. "Map–territory relation". Wikipedia.
  6. Language in Thought and Action. Harcourt. 1990. 196 pp. (5th edition paperback). ISBN 978-0-15-648240-0.
  7. "Samuel Ichiye (Sam) Hayakawa" (PDF). Gov Info.
  8. Basic Color Terms: Their Universality and Evolution. Center for the Study of Language and Information. April 1, 1995. 212 pp. ISBN 978-1-57586-162-3.
  9. "Basic Color Terms: Their Universality and Evolution".
  10. "Prototype theory". Wikipedia.
  11. Women, Fire, and Dangerous Things: What Categories Reveal About the Mind. University of Chicago Press. 15 April 1990. 632 pp. ISBN 978-0-226-46804-4.
  12. Constructions at Work: The Nature of Generalization in Language. Oxford University Press. 288 pp.
  13. US Patent 9269353B1, Manu Rehani; Warren L. Wolf, "Methods and systems for measuring semantics in communications", issued 2016-02-23, assigned to Lingo IP Holdings LLC.
  14. "Two Brains Running". The New York Times.
  15. Kahneman, Daniel. Thinking, Fast and Slow. FSG Adult. 499 pp. ISBN 978-0-374-53355-7.
  16. "What Daniel Kahneman Knows About Your Gut (Decisions)". Forbes. Retrieved March 7, 2019.
  17. Beeferman, Doug; Berger, Adam; Lafferty, John (February 1999). "Statistical Models for Text Segmentation". Machine Learning. 34: 177–210. doi:10.1023/A:1007506220214.
  18. Utiyama, Masao; Isahara, Hitoshi (July 2001). "A statistical model for domain-independent text segmentation". ACL '01: Proceedings of the 39th Annual Meeting on Association for Computational Linguistics: 499–506. doi:10.3115/1073012.1073076.
  19. Cognitive Linguistics: An Introduction (1st ed.). Routledge. March 5, 2006. 856 pp. ISBN 978-0-8058-6014-6.
  20. "How language shapes the way we think". TED.
  21. "How to spot a liar". TED.
  22. Meyer, Pamela (July 20, 2010). Liespotting: Proven Techniques to Detect Deception. St. Martin's Press. ISBN 978-0-312-60187-4.
  23. Boroditsky, Lera. 7,000 Universes: How the Language We Speak Shapes the Way We Think. 336 pp. ISBN 978-0-385-53819-0.
  24. 7,000 Universes: How the Language We Speak Shapes the Way We Think. Doubleday. 336 pp. ISBN 978-0-385-53819-0.
  25. "FBI Profiler Says Linguistic Work Was Pivotal In Capture Of Unabomber". NPR. Retrieved August 22, 2017.
  26. "Using Forensic Linguistics To Decode An Anonymous Writer's Identity". WBUR. Retrieved November 18, 2019.
  27. "I Am Part of the Resistance Inside the Trump Administration". The New York Times. Retrieved September 5, 2018.
  28. "The Telltale Sign a New Hire Isn't Fitting In". The Wall Street Journal. Retrieved January 10, 2017.
  29. Krishnamoorthy, Srikumar (May 2015). "Linguistic features for review helpfulness prediction". Expert Systems with Applications. 42 (7): 3751–3759. doi:10.1016/j.eswa.2014.12.044.
  30. "Book Review: Investing Between The Lines". Seeking Alpha.
  31. Investing Between the Lines: How to Make Smarter Decisions By Decoding CEO Communications. McGraw-Hill Education. 304 pp. ISBN 978-0-07-171407-5.

