Implementation of Lead Scoring using AzureML

In a previous post we discussed how lead scoring can be done by consuming data from various systems. We have implemented this as an AzureML model and published it as a web service. However, since the experiment uses a custom module (described here), we’re unable to publish it to the gallery.

[Image: azureml_leadscoring]

Here are a few things we learned while building this model using AzureML.

  1. The custom R module functionality in AzureML is awesome and is exactly what we needed to perform sentiment analysis. It looks like this is specific to R and not offered for Python. We created a module to handle sentiment analysis of Twitter data. This part of the code comprised a few different functions along with an “opinion lexicon”, which is a list of words denoting positive and negative emotions. All of this code and the supporting files were part of the custom module. The process of updating a module, however, is a bit cumbersome: it involves manually uploading a zip file even for small changes to individual files.
  2. We did run into an error when working with the custom module, though. We kept getting a “cannot open the connection” error when trying to read one of the files in the module. To debug this we looked at the output log (right click, view log, Output Log), which showed that the files were actually placed in an “src” folder. When we updated the code to use “src” as the base folder, the error went away.
  3. We use the dplyr package’s group_by and summarise functions. However, the object these functions return is a dplyr table object (tbl_df) rather than a plain data.frame. AzureML needs returned objects to be of type data.frame, so to fix the error “Error 1000: RPackage library exception: Failed to convert tb_dlf to DataSet” we converted the result using the “as.data.frame” function.
  4. Unfortunately, AzureML does not support network connectivity. I know providing network access is in AzureML’s backlog, and I hope they get to it soon, since this could be a limitation for production models that need real-time communication. In the lead scoring experiment, we wanted to connect to Twitter and Yahoo Finance to get data points. While this worked well in our RStudio environment, we ended up downloading the data offline and uploading it as datasets in order to get the models working in AzureML.
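
To make points 1 to 3 concrete, here is a minimal sketch in R of the pattern described above: read lexicon files relative to a base folder, score each tweet by counting positive and negative words, aggregate with dplyr, and convert the result with as.data.frame before returning it. This is not the actual module code. The file names, column names, function names, and sample tweets are all made up for illustration, and the demo writes its lexicon files to a temporary "src" folder so that it can run outside AzureML.

```r
library(dplyr)

# Read a lexicon file (one word per line) relative to a base folder.
# In the custom module the base folder turned out to be "src";
# the file names used below are hypothetical.
read_lexicon <- function(file_name, base_dir = "src") {
  readLines(file.path(base_dir, file_name), warn = FALSE)
}

# Score one tweet: number of positive words minus number of negative words.
score_tweet <- function(text, positive, negative) {
  cleaned <- tolower(gsub("[^[:alpha:][:space:]]", " ", text))
  words <- unlist(strsplit(cleaned, "[[:space:]]+"))
  sum(words %in% positive) - sum(words %in% negative)
}

# Aggregate scores per company and return a plain data.frame.
summarise_sentiment <- function(tweets, positive, negative) {
  tweets$score <- vapply(tweets$text, score_tweet, numeric(1),
                         positive = positive, negative = negative,
                         USE.NAMES = FALSE)
  result <- tweets %>%
    group_by(company) %>%
    summarise(mean_score = mean(score), n_tweets = n())
  # group_by/summarise return a tbl_df; convert before handing it back.
  as.data.frame(result)
}

# Demo with made-up data: stand in for the module's "src" folder.
demo_dir <- file.path(tempdir(), "src")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("great", "love"), file.path(demo_dir, "positive-words.txt"))
writeLines(c("bad", "hate"), file.path(demo_dir, "negative-words.txt"))

positive <- read_lexicon("positive-words.txt", demo_dir)
negative <- read_lexicon("negative-words.txt", demo_dir)

tweets <- data.frame(
  company = c("A", "A", "B"),
  text = c("Love this great product", "Bad support", "I hate waiting"),
  stringsAsFactors = FALSE
)

result <- summarise_sentiment(tweets, positive, negative)
print(class(result))
print(result)
```

Keeping the base folder as a function argument means the same code can point at a local folder in RStudio and at "src" inside the module, which may soften the zip-and-upload cycle a little when debugging path problems.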

Overall, though, this was a fast way to get our lead scoring model working in AzureML. We look forward to sharing it with our customers and getting their feedback.