Lemma (linguistics)

A lemma is the word that stands at the head of an entry in a dictionary. All the headwords in a dictionary are lemmas. Technically, it is "a base word and its inflections".[1]

A lexeme is a unit of meaning, and it can be more than one word. It is the set of all forms that have the same meaning, while the lemma is the particular form that is chosen by convention to represent the lexeme.

In English, for example, run, runs and running are forms of the same lexeme, but run is the lemma.
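
The relationship between forms, lexeme and lemma can be sketched as a small lookup table. This is only an illustration: the names LEXEMES, FORM_TO_LEMMA and lemma_of are made up for this example, and the word lists are toy data, not a real dictionary.

```python
# Hypothetical toy data: each lexeme is stored as its lemma plus all its forms.
LEXEMES = {
    "run": {"run", "runs", "running"},
    "mouse": {"mouse", "mice"},
}

# Reverse index: every form points back to the lemma that represents its lexeme.
FORM_TO_LEMMA = {
    form: lemma for lemma, forms in LEXEMES.items() for form in forms
}


def lemma_of(form):
    """Return the lemma for a known form, or None if the form is not listed."""
    return FORM_TO_LEMMA.get(form.lower())


print(lemma_of("running"))  # run
print(lemma_of("mice"))     # mouse
```

A form that is missing from the table, such as "walked", gives None, which shows that a lookup like this is only as good as its word list.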

Morphology

In English, the lemma of a noun is the singular: e.g., mouse rather than mice. In languages with gender, the head word of regular adjectives and nouns is usually the masculine singular. If the language also has cases, the lemma is often the masculine singular nominative.

In many languages, the citation form of a verb is the infinitive: French aller, German gehen, Spanish ir. In English, it is usually the full infinitive (to go), although it is alphabetized without "to" (go).

Difference between stem and lemma

In computational linguistics, a stem is the part of the word that never changes, even when different forms of the word are used. A lemma is the base form of the word. For example, the lemma of "produced" is "produce", but the stem is "produc-". This is because there are related words such as "production".[2] When sound (phonology) is taken into account, this definition of the unchanging part of the word is less useful, because the shared letters are not pronounced the same way in every form. Compare the sound of the words in the example: "produced" /prəˈdjuːst/ versus "production" /prəˈdʌkʃən/.
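
The difference can be shown with a short sketch. It is not the method of any particular library: the suffix list, the LEMMAS table and the function names are assumptions made for this example, and real stemmers and lemmatizers use much richer rules and dictionaries.

```python
# Toy suffix list, checked longest first. Real stemmers use far richer rules.
SUFFIXES = ("tion", "ing", "ed", "es", "e", "s")

# Toy lemma dictionary (hypothetical data, just enough for this example).
LEMMAS = {
    "produce": "produce",
    "produces": "produce",
    "produced": "produce",
    "producing": "produce",
}


def naive_stem(word, min_length=3):
    """Strip the first matching suffix, keeping at least min_length letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_length:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Look the word up; fall back to the word itself if it is unknown."""
    return LEMMAS.get(word, word)


for w in ("produced", "production"):
    print(w, "stem:", naive_stem(w), "lemma:", lemmatize(w))
# produced stem: produc lemma: produce
# production stem: produc lemma: production
```

Both words share the stem "produc", which is not a word by itself. The lemmas are real words, and they differ here because the noun "production" is not in the toy table for the verb "produce".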

Some lexemes have several stems but one lemma. For instance, "to go" (the lemma) has the stems "go" and "went". Here, the past tense comes from a different verb, "to wend". Its "-t" suffix is equivalent to "-ed".
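
Because "went" does not share any letters with "go", no suffix rule can get from one to the other. A lemmatizer therefore needs a list of exceptions next to its rules. The sketch below shows the idea; the table IRREGULAR, the suffix list and the function lemmatize_verb are hypothetical and only cover the words in this example.

```python
# Irregular forms must be listed by hand; no suffix rule can derive them.
IRREGULAR = {"went": "go"}

# Toy suffixes for regular verb forms, checked longest first.
VERB_SUFFIXES = ("ing", "ed", "es", "s")


def lemmatize_verb(form):
    """Check the exception list first, then try to strip a regular suffix."""
    if form in IRREGULAR:
        return IRREGULAR[form]
    for suffix in VERB_SUFFIXES:
        # Keep at least two letters so very short words are left alone.
        if form.endswith(suffix) and len(form) > len(suffix) + 1:
            return form[: -len(suffix)]
    return form


print(lemmatize_verb("goes"))   # go
print(lemmatize_verb("going"))  # go
print(lemmatize_verb("went"))   # go
```

Without the entry in IRREGULAR, "went" would be returned unchanged, and the link to the lemma "go" would be lost.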

References

  1. Nation, Paul & Waring, Robin 1997. Vocabulary size, text coverage and word lists. In Schmitt, Norbert & McCarthy (eds) Vocabulary: description, acquisition and pedagogy. Cambridge University Press, p. 9. ISBN 978-0-521-58551-4.
  2. "Natural Language Toolkit — NLTK 3.4 documentation". www.nltk.org.