Terminology extraction

Terminology extraction is a semi-automated process that aims to create a representative, relevant list of domain-specific or company-specific terms. The resulting glossary can serve various purposes, such as:

  • Customisation of a Machine Translation system (terminology coding)
  • Supporting the translation process (multilingual terminology management)
  • Search applications (SEO)
  • Maintaining a company thesaurus (the more classical approach)

In the first, automated step, tools extract a list of term candidates using linguistic and/or statistical routines, such as part-of-speech patterns or frequency counts.
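
As an illustration of the statistical route, the sketch that follows ranks word sequences (n-grams) by how much more often they occur in a domain text than in a general-language reference text. It is a minimal sketch only, assuming plain English input, a hypothetical toy stopword list and a simple frequency ratio in the spirit of "weirdness"-style measures. The function names and parameters (`term_candidates`, `min_freq`, `top_k`) are hypothetical, and real extraction tools typically combine such statistics with linguistic routines.

```python
import re
from collections import Counter

# Hypothetical toy stopword list (an assumption for this sketch); real tools
# use fuller, language-specific lists or part-of-speech patterns instead.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "or", "in", "on",
             "is", "are", "for", "with", "by", "be", "it", "this", "that"}


def tokenize(text):
    """Lowercase the text and split it into (possibly hyphenated) words."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def ngrams(tokens, max_n=3):
    """Yield n-grams (n = 1..max_n) that neither start nor end with a stopword."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram)


def term_candidates(domain_text, reference_text, min_freq=2, top_k=10):
    """Rank n-grams by relative frequency in the domain text divided by
    (smoothed) relative frequency in the general reference text."""
    domain = Counter(ngrams(tokenize(domain_text)))
    reference = Counter(ngrams(tokenize(reference_text)))
    d_total = sum(domain.values())
    r_total = sum(reference.values())
    scores = {}
    for gram, freq in domain.items():
        if freq < min_freq:
            continue  # rare strings are more likely to be noise than terms
        domain_rel = freq / d_total
        # Add-one smoothing: n-grams absent from the reference get a small,
        # non-zero reference frequency instead of causing a division by zero.
        ref_rel = (reference[gram] + 1) / (r_total + 1)
        scores[gram] = domain_rel / ref_rel
    # Highest ratio first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


domain = ("The translation memory stores segments. A translation memory speeds "
          "up work. Terminology in the translation memory must be consistent.")
reference = "The cat sat on the mat. The work is done. The day is long."
for gram, score in term_candidates(domain, reference):
    print(f"{gram}\t{score:.2f}")
```

The output is only a list of candidates: high-scoring n-grams can still be general vocabulary or fragments, which is why a validation step follows.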

Next, terms must be distinguished from non-terms. This happens during a validation process in which real terms are separated from invalid term candidates. After all, we are only interested in specific terminology, not in general vocabulary that can already be found in dictionaries.

Validation is also the moment to record preferences. In multilingual terminology, for instance, translators often have several synonyms to choose from; if the customer prefers a particular term, this must be made explicit in the glossary. Other words worth including are those that translators may not know because they are highly technical or rarely used.
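
The validation step itself is largely a human task, but parts of it can be supported with simple data structures. The sketch that follows is a minimal illustration under stated assumptions: `prefilter` removes candidates that already occur in a general-language word list (standing in for "general vocabulary found in dictionaries"), and `GlossaryEntry` records the customer's preferred term together with its admitted synonyms. All names and the example data are hypothetical; they are not part of any particular terminology tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GlossaryEntry:
    """One validated concept: the customer's preferred term and its synonyms."""
    preferred: str
    synonyms: List[str] = field(default_factory=list)
    note: str = ""  # e.g. a short definition for rare or highly technical terms


def prefilter(candidates: List[str], general_vocabulary: List[str]) -> List[str]:
    """Drop candidates that appear in a general-language word list.

    Whatever remains still needs human validation; this only removes
    obvious general vocabulary before a reviewer sees the list.
    """
    general = {word.lower() for word in general_vocabulary}
    return [c for c in candidates if c.lower() not in general]


def preferred_term(glossary: List[GlossaryEntry], term: str) -> Optional[str]:
    """Return the preferred form for a term or any of its synonyms, else None."""
    needle = term.lower()
    for entry in glossary:
        if needle == entry.preferred.lower() or needle in (
            s.lower() for s in entry.synonyms
        ):
            return entry.preferred
    return None


# Hypothetical example data.
candidates = ["memory", "translation memory", "segment", "work"]
to_review = prefilter(candidates, ["Memory", "work"])
glossary = [GlossaryEntry("translation memory", ["TM", "translation database"])]
print(to_review)
print(preferred_term(glossary, "TM"))
```

Keeping synonyms attached to a single entry, rather than listing them as separate terms, makes the customer's preference explicit: a translator who looks up any synonym is pointed to the same preferred form.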

