Data Integration and Enrichment

Endeca provides data integration and enrichment capabilities to help you efficiently combine information from any source into a single integrated view and add value on top of the raw data. Our approach to integrating and enriching source data includes the Endeca Content Acquisition System (CAS), an out-of-the-box data integration tool designed for extracting and enhancing both unstructured and structured data, as well as integration points with ETL packages such as Informatica PowerCenter.

Key Data Integration and Enrichment Features:

Administrative Tools

Data integration and enrichment steps are configurable through an intuitive graphical user interface, the CAS Console, allowing administrators to rapidly select data sources to include in their Endeca applications.

Connectivity

Out of the box, Endeca provides connectivity through ODBC/JDBC, XML, and web services standards, packaged connectors to common repositories such as ERP systems, and crawlers that extract information from approximately 500 file formats stored in file systems, websites, or CMS repositories.

ETL Integration

Endeca integrates with a number of 3rd-party Extract, Transform, Load (ETL) products, such as Informatica PowerCenter. File-based output from these tools can be loaded through packaged connectors, and the tools can also be configured to load data and configuration information directly into the MDEX Engine through its web services.

Join Support

Endeca supports joins, which combine information from different sources on any attribute the records share. Joins also let separate processing branches converge where appropriate, so that each record passes only through the steps relevant to it rather than through every possible processing step.
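To illustrate what combining records on a shared attribute means, here is a minimal hash-join sketch in Python. It is a generic illustration, not Endeca's CAS or pipeline API: representing records as dicts, joining on a single key, and the name `join_records` are all assumptions made for the example.

```python
from collections import defaultdict


def join_records(left, right, key):
    """Inner-join two lists of dict records on the shared attribute `key`.

    Illustrative sketch only (hypothetical helper, not an Endeca API).
    Records lacking `key` are skipped; if several right-hand records share
    a key value, one merged record is produced per matching pair.
    """
    # Index the right-hand source by the join attribute.
    index = defaultdict(list)
    for rec in right:
        if key in rec:
            index[rec[key]].append(rec)

    joined = []
    for rec in left:
        if key not in rec:
            continue
        # .get() avoids inserting empty lists into the defaultdict.
        for match in index.get(rec[key], []):
            merged = dict(rec)
            merged.update(match)  # right-hand values win on attribute clashes
            joined.append(merged)
    return joined


products = [{"sku": "A1", "name": "Drill"}, {"sku": "B2", "name": "Saw"}]
prices = [{"sku": "A1", "price": 99.0}, {"sku": "C3", "price": 5.0}]
print(join_records(products, prices, "sku"))
```

Indexing one side first keeps the join linear in the total number of records; a join on several shared attributes could use a tuple of those attribute values as the index key.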

Data Cleansing and Enrichment

Data pipelines are fully configurable and support a number of techniques that improve data quality or augment the metadata on records. Capabilities include rules-based data processing, entity extraction, and statistical processing that extract important values, which can then be applied to the source record as metadata.
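As a rough picture of rules-based extraction feeding record metadata, the Python sketch below applies regular-expression rules to a record's text and attaches the matched values as new metadata fields. This is a simple regex technique, not Endeca's entity extraction or statistical processing; the rule names, patterns, and the `enrich` function are hypothetical.

```python
import re

# Hypothetical rule set: each rule names a metadata field and a regex whose
# matches become values of that field.
RULES = {
    "part_number": re.compile(r"\b[A-Z]{2}-\d{4}\b"),
    "voltage": re.compile(r"\b(\d+)\s?V\b"),
}


def enrich(record, text_field="text", rules=RULES):
    """Return a copy of `record` with rule-extracted values added as metadata."""
    enriched = dict(record)  # leave the source record unchanged
    text = record.get(text_field, "")
    for field, pattern in rules.items():
        values = []
        for m in pattern.finditer(text):
            # Use the first capture group if the pattern has one, else the whole match.
            values.append(m.group(1) if pattern.groups else m.group(0))
        if values:
            enriched[field] = sorted(set(values))  # de-duplicate for stable output
    return enriched


doc = {"id": 1, "text": "Cordless drill DR-1234, 18 V battery; replaces DR-1234 and DR-0999."}
print(enrich(doc))
```

Only fields with at least one match are added, so records without relevant content are left without empty metadata.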

Taxonomy Management

Where available, existing taxonomies can be included as a data source and used to define navigation options or content enrichment terms. Endeca can also build taxonomies directly from attributes in the source data.
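One simple way to derive a navigable hierarchy from record attributes is to treat an ordered list of attribute names as taxonomy levels and count the records under each node. The Python sketch below does this; it is an assumption-based illustration rather than Endeca's taxonomy-building method, and the `build_taxonomy` name, the nested-dict node layout, and the example attributes are hypothetical.

```python
def build_taxonomy(records, levels):
    """Build a nested taxonomy from an ordered list of attribute names.

    Each node maps an attribute value to {"count": n, "children": {...}},
    where count is the number of records that reach that node. A record
    stops descending at the first level where it lacks the attribute.
    """
    root = {}
    for rec in records:
        node = root
        for attr in levels:
            value = rec.get(attr)
            if value is None:
                break
            entry = node.setdefault(value, {"count": 0, "children": {}})
            entry["count"] += 1
            node = entry["children"]
    return root


records = [
    {"category": "Tools", "type": "Drills"},
    {"category": "Tools", "type": "Saws"},
    {"category": "Tools", "type": "Drills"},
    {"category": "Garden"},
]
print(build_taxonomy(records, ["category", "type"]))
```

The per-node counts are the kind of figures a navigation interface might display next to each option.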

Extensibility

A Java-based extension API is provided for cases where direct access to proprietary systems is required. Custom data cleansing or content enrichment packages can also be inserted into the data integration pipeline to provide best-of-breed 3rd-party functionality in addition to Endeca’s capabilities.

See how Endeca is leading the industry in enterprise search technology and empowering more than 600 leading organizations such as ABN AMRO, Boeing, Cox Newspapers, the U.S. Defense Intelligence Agency, Ford Motor Company, Hyatt, IBM, the Library of Congress, Texas Instruments, and Walmart.com.