Transmogrif.ai – Initial Observations Of A Promising Framework

The first time I heard the term Transmogrification was when my 9-year-old son “transmogrified” from Bears to Webelos in his Boy Scout troop. It took me a few tries to pronounce the word correctly back then, something my colleague Prabhu had to endure when he first talked to me about transmogrif.ai. Once we had that out of the way and dug deeper into the framework, we knew immediately that this was a serious attempt by some of the best minds in AI to simplify and automate machine learning on structured data.

There are a few libraries that do a good job on unstructured data such as images and text. It may sound strange, but in many ways image analysis is easier to automate because of the evolution of techniques such as object detection. After all, every image has the same basic shape: a matrix of RGB values. Structured data is different. Before it is ready for any machine learning task, we first have to understand it through data exploration and then perform data preparation and feature engineering. This consumes a lot of a data scientist's time.

This is exactly the area where TransmogrifAI claims to help. According to its documentation, “It has Transformers and Estimators that make use of Feature abstractions to automate feature engineering, feature validation, and model selection.”

We decided to take it for a spin. After toying with the provided Titanic dataset example, we implemented a simple pipeline on an Opportunity dataset to predict whether a sales opportunity will be won. Traditionally, this type of machine learning problem would involve understanding the various custom fields in the Salesforce CRM system and performing feature engineering to extract the features that yield the most accurate model.

Here are three takeaways and initial observations from our pilot.

  1. Extracting response and predictor features with “FeatureBuilder” and then automating feature engineering with “transmogrify” simplifies the process while still giving the user control. Powerful!
  2. The fact that it runs on top of Spark immediately makes it attractive. We use Spark heavily and have contributed several connectors, including a Salesforce Spark connector that is being used in the field.
  3. There is no Python client, so there is a bit of a learning curve for teams that don't already work in Scala. We hope a Python client will be made available soon.
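To make the first takeaway concrete, here is a minimal Scala sketch of that flow. It is modeled on the Titanic example in the TransmogrifAI README and retargeted at a hypothetical opportunity extract. The file name “opportunities.csv”, the response column “isWon” (1 = won, 0 = lost, no nulls), and the object name “OpportunityWinSketch” are our own placeholders, not part of the framework. Method names may also vary between TransmogrifAI versions, so treat this as an illustration rather than a drop-in pipeline.

```scala
// Sketch only: modeled on the Titanic example in the TransmogrifAI README.
// Hypothetical assumptions: the file "opportunities.csv" and its response
// column "isWon" (1 = won, 0 = lost, no nulls). API details may differ
// between TransmogrifAI versions.
import com.salesforce.op._
import com.salesforce.op.features._
import com.salesforce.op.features.types._
import com.salesforce.op.stages.impl.classification._
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

object OpportunityWinSketch {
  def main(args: Array[String]): Unit = {
    // OpWorkflow.train() needs an implicit SparkSession in scope.
    implicit val spark: SparkSession = SparkSession.builder
      .appName("OpportunityWinSketch")
      .master("local[*]")
      .getOrCreate()

    // Load the hypothetical extract and cast the response to a double,
    // since the response is built below as a RealNN (non-null real) feature.
    val opportunities = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("opportunities.csv")
      .withColumn("isWon", col("isWon").cast(DoubleType))

    // Extract the response and the predictor features from the schema.
    val (isWon, predictors) =
      FeatureBuilder.fromDataFrame[RealNN](opportunities, response = "isWon")

    // Automated feature engineering: vectorize every predictor by its type.
    val featureVector = predictors.transmogrify()

    // Automated feature validation: remove features flagged as bad.
    val checkedFeatures = isWon.sanityCheck(featureVector, removeBadFeatures = true)

    // Automated model selection over binary classifiers.
    val prediction = BinaryClassificationModelSelector()
      .setInput(isWon, checkedFeatures)
      .getOutput()

    // Wire the stages into a workflow and train it on the data.
    val model = new OpWorkflow()
      .setInputDataset(opportunities)
      .setResultFeatures(prediction)
      .train()

    println("Model summary:\n" + model.summaryPretty())
    spark.stop()
  }
}
```

What stands out is how little code sits between the raw DataFrame and a trained model. “FeatureBuilder” infers typed features from the schema, “transmogrify” turns them into a feature vector, and “sanityCheck” handles validation. Any of these steps can still be swapped for hand-written features when more control is needed.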

Overall, we are impressed with what we have seen and look forward to using TransmogrifAI in our projects.