Glossary

Corpus:
A collection of documents.

Data (or information) fusion:
The seamless integration of data from disparate repositories. In MetaCarta's case, this means providing a unified view of documents in a geographically oriented interface.

Document collection:
A corpus that shares a common set of metadata within the MetaCarta application.

Gazetteer:
A dictionary of geographic placenames and associated data about those placenames. Placenames can refer to any natural or manmade object that has a known location, such as continents, oceans, countries, states, provinces, regions, counties, cities, towns, landmarks, buildings, and roads. The MetaCarta gazetteer is one of the largest collections in the world, with almost 10 million entries.
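As an illustration only, a gazetteer can be modeled as a mapping from a placename to the entries that share that name. The Python sketch below is a toy: the `GazetteerEntry` fields, the sample entries (with approximate coordinates), and the `build_gazetteer` and `lookup` helpers are assumptions made for this example and do not describe how the MetaCarta gazetteer is stored.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GazetteerEntry:
    """One hypothetical gazetteer record."""
    name: str
    feature_type: str   # e.g. "city", "country", "landmark"
    country_code: str
    lat: float
    lon: float


# Hypothetical sample data; coordinates are approximate.
ENTRIES = [
    GazetteerEntry("Cambridge", "city", "US", 42.37, -71.11),
    GazetteerEntry("Cambridge", "city", "GB", 52.21, 0.12),
    GazetteerEntry("Paris", "city", "FR", 48.86, 2.35),
]


def build_gazetteer(entries):
    """Group entries by lowercased name so one name can map to many places."""
    index = defaultdict(list)
    for entry in entries:
        index[entry.name.lower()].append(entry)
    return dict(index)


def lookup(gazetteer, name):
    """Return every entry recorded for a name, or an empty list."""
    return gazetteer.get(name.lower(), [])


gazetteer = build_gazetteer(ENTRIES)
for entry in lookup(gazetteer, "Cambridge"):
    print(entry.country_code, entry.lat, entry.lon)
```

A single name returning more than one entry is the ambiguity that geographic entity resolution has to settle.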

GeoConfidence:
An estimate of the probability that the latitude and longitude assigned by the MetaCarta software to a placename correspond to the place the author intended.

Geographic Entity Resolution:
The process of determining which place, if any, each name in a document refers to. A given name may refer to several points or regions, or to a non-geographic concept. The relevance of the document to each mentioned location must also be determined, so that the results presented satisfy the need for both correctness and relevance to a query.
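The ambiguity described above can be sketched with a toy scorer. This is not MetaCarta's algorithm: the candidate list, the prior weights, the clue words, and the `resolve` function are all hypothetical, chosen only to show one name competing between two places and a non-geographic reading.

```python
# Hypothetical candidates for one ambiguous name. "prior" is an assumed
# baseline weight; "clues" are assumed context words that favor a candidate.
CANDIDATES = {
    "cambridge": [
        {"place": "Cambridge, Massachusetts, US", "prior": 0.4,
         "clues": {"boston", "massachusetts", "mit"}},
        {"place": "Cambridge, England, GB", "prior": 0.5,
         "clues": {"england", "london", "uk"}},
    ],
}

NOT_A_PLACE = "(not a place)"


def resolve(name, context_words, non_geo_prior=0.1):
    """Return a normalized score for each reading of `name`.

    Each candidate's prior is multiplied by (1 + number of matching clue
    words found in the context), then all scores are scaled to sum to 1.
    """
    context = {word.lower() for word in context_words}
    scores = {NOT_A_PLACE: non_geo_prior}
    for candidate in CANDIDATES.get(name.lower(), []):
        overlap = len(candidate["clues"] & context)
        scores[candidate["place"]] = candidate["prior"] * (1 + overlap)
    total = sum(scores.values())
    return {reading: score / total for reading, score in scores.items()}


result = resolve("Cambridge", ["flight", "to", "Boston"])
best = max(result, key=result.get)
print(best)
```

With "Boston" in the context the Massachusetts reading scores highest; with no helpful context the candidate with the larger assumed prior wins.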

Geographic Search:
A method for finding documents using a combination of keywords and maps. More formally, the analysis and retrieval of unstructured documents using a combination of keywords and geographic extent.
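A minimal sketch of combining a keyword with a geographic extent follows. It assumes each document carries a single latitude/longitude point and that the extent is a bounding box given as (south, west, north, east). The `Doc` type, the sample documents, and `geo_search` are hypothetical, and the box test does not handle extents that cross the 180° meridian.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """A hypothetical document with one assigned location."""
    title: str
    text: str
    lat: float
    lon: float


def in_bbox(doc, south, west, north, east):
    """True if the document's point lies inside the bounding box."""
    return south <= doc.lat <= north and west <= doc.lon <= east


def geo_search(docs, keyword, bbox):
    """Return documents that contain the keyword AND fall inside bbox."""
    needle = keyword.lower()
    return [d for d in docs if needle in d.text.lower() and in_bbox(d, *bbox)]


# Hypothetical sample documents; coordinates are approximate.
DOCS = [
    Doc("A", "Pipeline inspection report", 29.76, -95.37),
    Doc("B", "Pipeline expansion plan", 51.05, -114.07),
    Doc("C", "Harbor dredging notice", 29.30, -94.80),
]

gulf_coast = (28.0, -98.0, 31.0, -93.0)  # south, west, north, east
matches = geo_search(DOCS, "pipeline", gulf_coast)
print([d.title for d in matches])
```

Document B matches the keyword but lies outside the box, and document C lies inside the box but lacks the keyword, so only document A is returned.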

Geographic Data Module (GDM):
MetaCarta Geographic Data Modules make up the core of MetaCarta products and hosted solutions. A GDM is a knowledge base used to identify and disambiguate geographic references and to assign latitude/longitude coordinates, confidence scores, and relevance rankings. Each MetaCarta GDM contains linguistic statistics, gazetteer data, and natural language processing (NLP) logic. GDMs include:

  • Base GDM
  • IHS Global Oil and Gas GDM
  • U.S. Street Address GDM
  • Spanish Language GDM
  • Arabic Language GDM

GeoParsing:
The process of using proprietary natural language processing (NLP) algorithms on an unstructured text document to identify geographic references. MetaCarta's approach to GeoParsing goes beyond simple string matching: by considering all of the text in a document, it uses contextual clues to determine more accurately which geographic reference is meant and where that place is located.

GeoTagger:
GeoTagger is a production-level geographic entity resolver that parses documents, extracts geographic references within the content, and resolves the geographic meaning intended by the author. This allows the system to assign latitude and longitude coordinates and country code tags in an XML output, which may be used as metadata and for processing by third-party systems. GeoTagger is available as an Appliance or as an OnDemand hosted service.

Ingestion API:
The set of application programming interfaces that allows remote systems to "push" documents into the document processing pipeline. This Web service allows users to develop their own applications and interfaces that submit documents to the MetaCarta appliance for processing.

MetaCarta document processing pipeline:
The complete sequence of operations applied to documents received from the content repository until the results are ready to be written to the index.

MetaCarta geographic index (CartaTrees):
An optimized text and spatial index that allows documents to be rapidly retrieved based on geographic and textual elements of interest.

MetaCarta probabilistic text model:
Natural language statistical data and tuned geographic rules that allow MetaCarta products to recognize geographic references and generate latitude, longitude and GeoConfidence metadata.

Minimum GeoConfidence:
The lowest GeoConfidence value deemed meaningful enough for a MetaCarta application to write a spatial index entry or tag.
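The thresholding idea can be sketched as a simple filter. The example assumes GeoConfidence is expressed on a 0 to 1 scale (it is described as a probability estimate) and that a tag exactly at the minimum is kept. The tag dictionaries, their values, and `filter_tags` are hypothetical.

```python
def filter_tags(tags, minimum_geoconfidence):
    """Keep only the tags whose GeoConfidence meets the minimum.

    Assumes values lie in [0, 1] and that the comparison is inclusive.
    """
    if not 0.0 <= minimum_geoconfidence <= 1.0:
        raise ValueError("minimum_geoconfidence must be between 0 and 1")
    return [t for t in tags if t["geoconfidence"] >= minimum_geoconfidence]


# Hypothetical candidate tags; coordinates and scores are made up.
tags = [
    {"name": "Paris", "lat": 48.86, "lon": 2.35, "geoconfidence": 0.92},
    {"name": "Paris", "lat": 33.66, "lon": -95.56, "geoconfidence": 0.05},
    {"name": "Reading", "lat": 51.45, "lon": -0.97, "geoconfidence": 0.30},
]

kept = filter_tags(tags, 0.25)
print([(t["name"], t["geoconfidence"]) for t in kept])
```

Raising the minimum trades coverage for precision: fewer index entries or tags are written, but those that remain are more likely to be correct.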

Natural language:
The language that humans speak and write. It is notoriously ambiguous and unstructured, and often requires the reader to draw on extensive external information to understand and interpret it.

Natural language processing (NLP):
Computer understanding, analysis, manipulation, and/or generation of human language.

Query Relevance:
A value between 0 and 1 that indicates how closely the document matches both the geographic and keyword constraints of a search.
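How the geographic and keyword parts are combined is not specified here. As one possible sketch, the example below multiplies a geographic match score by a keyword match score, each assumed to lie between 0 and 1, so the result also lies between 0 and 1 and is high only when both constraints are well matched. The function name, both inputs, and the choice of a product are assumptions.

```python
def query_relevance(geo_score, keyword_score):
    """Combine two scores in [0, 1] into a single score in [0, 1].

    Uses a plain product, which is an assumed combination rule: a weak
    match on either constraint pulls the overall relevance down.
    """
    for score in (geo_score, keyword_score):
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must be between 0 and 1")
    return geo_score * keyword_score


strong = query_relevance(0.9, 0.9)   # good match on both constraints
lopsided = query_relevance(1.0, 0.1)  # right place, weak keyword match
print(strong > lopsided)
```

Other rules, such as a weighted average, would also stay within the 0 to 1 range but would let a strong match on one constraint compensate for a weak match on the other.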

Spatial index:
A data structure used to accelerate queries based on geographic references.
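To show why such a structure speeds up queries, here is a minimal sketch of one simple kind of spatial index, a fixed-size latitude/longitude grid. It is not the MetaCarta geographic index (CartaTrees); the `GridIndex` class, its one-degree default cell size, and the sample points are assumptions for this example. A bounding-box query visits only the grid cells that overlap the box instead of scanning every document.

```python
import math
from collections import defaultdict


class GridIndex:
    """A toy spatial index that buckets points into fixed-size grid cells."""

    def __init__(self, cell_size_deg=1.0):
        self.cell_size = cell_size_deg
        self.cells = defaultdict(list)

    def _cell(self, lat, lon):
        """Map a coordinate to the (row, col) of the cell that contains it."""
        return (math.floor(lat / self.cell_size),
                math.floor(lon / self.cell_size))

    def insert(self, doc_id, lat, lon):
        self.cells[self._cell(lat, lon)].append((doc_id, lat, lon))

    def query(self, south, west, north, east):
        """Return ids of points inside the box (no 180° meridian wrap)."""
        row_min, col_min = self._cell(south, west)
        row_max, col_max = self._cell(north, east)
        hits = []
        for row in range(row_min, row_max + 1):
            for col in range(col_min, col_max + 1):
                # Cells on the box edge may hold points outside the box,
                # so each point is still checked exactly.
                for doc_id, lat, lon in self.cells.get((row, col), []):
                    if south <= lat <= north and west <= lon <= east:
                        hits.append(doc_id)
        return hits


index = GridIndex()
index.insert("doc-1", 42.37, -71.11)
index.insert("doc-2", 52.21, 0.12)
index.insert("doc-3", 42.90, -71.95)
found = index.query(42.0, -72.0, 43.0, -71.0)
print(sorted(found))
```

Smaller cells reduce the number of points checked per query but increase the number of cells visited for large boxes; tree-based indexes adapt this trade-off to the data.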