For the MetaCarta Geographic Search and Referencing Platform (GSRP) to process and index content, it must first connect to the content sources (repositories, Web sites, etc.) and bring their content into the system. This intake process has two critical pieces:

  1. Ingestion API
  2. Connector Framework

Ingestion API

The MetaCarta Platform can take in many types of text files, including:

  • plain text (ASCII)
  • rich text
  • HTML
  • Adobe Acrobat (PDF)
  • PostScript
  • XML
  • Microsoft Office files (Word, PowerPoint, Excel)

Administrators can define, edit, and delete intake jobs. An intake job describes the work to be performed, plus the times and methods for performing the intake. Jobs are typically meant to run repeatedly so that document updates and deletions are captured.

The Connector Framework uses jobs to manage synchronization with one or more repositories. Jobs can be scheduled to run regularly, on demand, or continuously.
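
As a rough illustration of the three scheduling modes, the sketch below models an intake job as a small Python record. It is not the MetaCarta job format or API: the names `IntakeJob`, `Schedule`, and `next_run`, the field names, and the example source path are all hypothetical, and it assumes a "regular" job simply repeats at a fixed interval.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Schedule(Enum):
    REGULAR = "regular"        # assumed: repeat at a fixed interval
    ON_DEMAND = "on_demand"    # run only when an administrator starts it
    CONTINUOUS = "continuous"  # start the next run as soon as one finishes


@dataclass
class IntakeJob:
    """Hypothetical job record; field names are illustrative only."""
    name: str
    source: str
    schedule: Schedule
    interval: Optional[timedelta] = None  # only meaningful for REGULAR jobs

    def next_run(self, last_finished: datetime) -> Optional[datetime]:
        """Return when the job should run next, or None if it waits for a user."""
        if self.schedule is Schedule.REGULAR:
            if self.interval is None:
                raise ValueError("a regular job needs an interval")
            return last_finished + self.interval
        if self.schedule is Schedule.CONTINUOUS:
            return last_finished  # eligible to restart immediately
        return None  # ON_DEMAND: no automatic run


# Example: a nightly job over a hypothetical file share.
nightly = IntakeJob("nightly-docs", "file:///share/docs",
                    Schedule.REGULAR, timedelta(hours=24))
```

Because such a job is meant to run many times, each run would revisit the same source so that updates and deletions are picked up, rather than treating intake as a one-time import.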

Connector Framework

The Connector Framework provides a common set of capabilities generally required to synchronize the MetaCarta Platform with content repositories. Fully supported connectors work with the Ingestion API to fetch documents from multiple sources, including repositories, the Internet, and other storage media.

Connectors give an organization’s content administrator a simple and convenient way to configure connections to repositories and to define jobs that keep the MetaCarta index synchronized with those repositories. The result is an index that can be searched by the locations and keywords found in the documents.

The Connector Framework provides a Java-based plug-in model for connectors and includes the ability to provide corresponding Java-based authority plug-ins, which look up user authorization information at search time. The Framework also provides a Web-based user interface that allows connectors to be configured and ingestion jobs to be declared and executed. It features:

  • Unified architecture for multiple connectors
  • Integrated Web UI for connection and job setup/management
  • Status reports with job execution statistics

Authority Connections
For documents to be returned from a search, the Connector must be used in conjunction with some form of authentication. Both Active Directory authentication and Basic HTTP Authentication are supported. They are used to retrieve the end user's document security information, so that result sets contain only documents that the end user is privileged to view.
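
The filtering idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MetaCarta's implementation: it assumes each indexed document carries a set of access tokens (for example, group names) and that the authority lookup returns the tokens held by the end user. `IndexedDocument`, `allowed_tokens`, and `filter_results` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class IndexedDocument:
    doc_id: str
    # Assumption: access tokens (e.g. group names) are stored with each document.
    allowed_tokens: FrozenSet[str]


def filter_results(results: Iterable[IndexedDocument],
                   user_tokens: Iterable[str]) -> List[IndexedDocument]:
    """Keep only documents that share at least one token with the user."""
    tokens = set(user_tokens)
    return [doc for doc in results if doc.allowed_tokens & tokens]


# Example: the user belongs to "staff" only, so the finance-only report is hidden.
hits = [
    IndexedDocument("handbook", frozenset({"staff", "finance"})),
    IndexedDocument("q3-report", frozenset({"finance"})),
]
visible = filter_results(hits, ["staff"])
```

In this sketch a user with no tokens sees no documents, which mirrors the statement above that some form of authentication is needed before any documents are returned.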
Automatic Document Change Detection
Incremental updates are driven by the connectors, which detect changes made to files between job runs. The Connector Framework specifically detects and properly handles file additions, file deletions, and file modifications. New files are ingested into the index, deleted files are removed from the index, and for modified files the newer record replaces the old one.
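
A minimal sketch of this kind of change detection, assuming each job run can produce a snapshot that maps a file identifier to a version marker (such as a modification time or content hash). The snapshot format and the names `ChangeSet`, `detect_changes`, and `apply_changes` are hypothetical; the actual connectors may detect changes differently.

```python
from typing import Dict, List, NamedTuple


class ChangeSet(NamedTuple):
    added: List[str]
    deleted: List[str]
    modified: List[str]


def detect_changes(previous: Dict[str, str], current: Dict[str, str]) -> ChangeSet:
    """Compare two snapshots (file id -> version marker) taken on successive runs."""
    added = sorted(current.keys() - previous.keys())
    deleted = sorted(previous.keys() - current.keys())
    modified = sorted(k for k in current.keys() & previous.keys()
                      if current[k] != previous[k])
    return ChangeSet(added, deleted, modified)


def apply_changes(index: Dict[str, str], current: Dict[str, str],
                  changes: ChangeSet) -> None:
    """Update a toy index in place: ingest additions, replace modified, drop deleted."""
    for file_id in changes.added + changes.modified:
        index[file_id] = current[file_id]  # newer record replaces the old one
    for file_id in changes.deleted:
        index.pop(file_id, None)


# Example: one file changed, one was removed, one is new.
last_run = {"a.txt": "v1", "b.pdf": "v1", "c.doc": "v1"}
this_run = {"a.txt": "v2", "c.doc": "v1", "d.html": "v1"}
changes = detect_changes(last_run, this_run)
index = dict(last_run)
apply_changes(index, this_run, changes)
```

After the update, the toy index matches the second snapshot, and unchanged files such as `c.doc` are never touched, which is what makes the update incremental.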

Please view the Connector Framework datasheet to learn more.