SpringML’s Data Integration for Wave Analytics

The term “data integration” for most customers means complex projects that involve countless hours spent on extracting, transforming and joining data. In this post I’d like to draw a distinction between two types of integrations.

  1. Integration between transactional systems
  2. Integration from various systems to an analytics engine

The first type involves integration between two transactional systems, for example between CRM and financial systems. The purpose of such integrations is to make business processes that span multiple systems work, e.g. creating an invoice in the financial system based on data in the CRM. Here integrations need to handle mismatches in field names (e.g. customer ID vs. account ID), data grain (line items in one system may map to the header level in another), field lengths, and several others, all of which tend to make such integrations time-consuming and complex to implement.
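To make these mismatches concrete, here is a minimal sketch of the kind of mapping logic a transactional integration ends up carrying. The field names, the sample records and the 20-character length limit are all hypothetical and chosen only for illustration; they do not describe any particular CRM or financial system.

```python
from collections import defaultdict

# Hypothetical CRM line items; names and values are illustrative only.
crm_line_items = [
    {"account_id": "A-100", "product": "Widget", "amount": 250.0},
    {"account_id": "A-100", "product": "Gadget", "amount": 100.0},
    {"account_id": "B-200", "product": "Widget with extended warranty", "amount": 75.5},
]

# Assumed field-length limit in the (hypothetical) financial system.
MAX_DESCRIPTION_LENGTH = 20


def to_invoice_headers(line_items):
    """Roll CRM line items up to one invoice header per customer."""
    totals = defaultdict(float)
    products = defaultdict(list)
    for item in line_items:
        # Field-name mismatch: CRM "account_id" becomes "customer_id".
        customer_id = item["account_id"]
        # Grain mismatch: line-item amounts are summed to header level.
        totals[customer_id] += item["amount"]
        products[customer_id].append(item["product"])
    return [
        {
            "customer_id": customer_id,
            "total_amount": totals[customer_id],
            # Field-length mismatch: truncate to the target's limit.
            "description": ", ".join(products[customer_id])[:MAX_DESCRIPTION_LENGTH],
        }
        for customer_id in sorted(totals)
    ]


invoices = to_invoice_headers(crm_line_items)
for invoice in invoices:
    print(invoice)
```

Every one of these rules has to be agreed on, implemented and maintained for each pair of systems, which is where much of the effort in this type of integration goes.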

The second type integrates data from source systems into a modern analytics engine such as Salesforce Wave, which is built on top of a NoSQL database. Such target systems have the flexibility to ingest structured and unstructured data without having to define database tables with field names, lengths and relationships in advance. This allows the integration layer to push source system data into Wave without worrying about the mismatches mentioned earlier. This is how SpringML’s data integration layer works.
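The idea of pushing data without a predefined schema can be sketched in a few lines. This is a generic illustration using JSON lines, not Wave's API and not SpringML's implementation; the record shapes and function names are hypothetical.

```python
import json

# Hypothetical records from two source systems with different shapes.
source_records = [
    {"source": "crm", "account_id": "A-100", "name": "Acme"},
    {"source": "billing", "customer_id": "A-100", "invoice_total": 350.0, "notes": "net 30"},
]


def to_json_lines(records):
    """Serialize records as-is, one JSON document per line, with no target schema."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


def discover_fields(json_lines):
    """Infer the set of field names after the data has landed."""
    fields = set()
    for line in json_lines.splitlines():
        fields.update(json.loads(line).keys())
    return sorted(fields)


payload = to_json_lines(source_records)
fields = discover_fields(payload)
print(fields)
```

Note that the integration step never reconciles account_id with customer_id. Both arrive untouched, and the decision about how to relate them is left to the analytics side.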

To be sure, data mapping and transformation still need to be done in order to enable dashboards. However, this work can be implemented within Wave. Deferring this activity to Wave has an added advantage when dashboards need to be changed: since raw data is already in Wave, development is limited to Wave and the integration layer doesn’t have to be touched.

| Key Characteristics | SpringML | Application ETL |
|---|---|---|
| Time to value | Lightweight integration layer, hence quick to implement | Typically takes longer to implement |
| Design & target system | Purpose-built for a NoSQL system like Salesforce Wave | General-purpose tools that support various target systems |
| Support for current connectors and future releases | Provided as part of subscription | Custom services engagement |
| Field transformation capabilities | Defers to Wave | Mapping functionality in ETL layer |
| Data volume | Able to handle large volumes of data | Depends on ETL tool |
| Future dashboard changes | Typically handled within Wave | Typically involves Wave and ETL layer changes |

SpringML’s data management layer uses Apache Spark, a lightning-fast cluster computing platform that includes a framework for connecting to various backend systems. We are proud to have contributed connector packages (https://spark-packages.org/?q=springml) for various systems, including Salesforce, Workday, Netsuite, Zuora and Aria, to this ecosystem. These connectors will continue to evolve with the support of the developer community.