Google Video Intelligence and Vision APIs – A Powerful & Fun Combination – Recognizing Actors in Near Real Time

Just in time for the holiday lull, here is a fun ML project that processes a popular movie clip down to each actor’s face and biography in a fully automated pipeline and in near real time! It’s all done in around 200 lines of code. That may seem like a lot, but just five years ago it would have required tens of thousands of lines and a headache.

At SpringML, we’ve built many solutions using the versatile Google Cloud suite of APIs, but some of the most fun projects have involved the Google Video Intelligence and Vision APIs in particular. Leveraging these two powerful APIs is like having an army of convolutional neural-network PhDs at your beck and call.

Video Processing Pipeline

Here is a high-level view of the action:


And high-level bullet points:

  1. Feed a clip of your favorite trailer into the Google Video Intelligence API and collect the timestamps at which it detects people
  2. Extract the frames at those timestamps with FFmpeg and feed them into the Google Vision API face-detection function to get individual faces
  3. Feed those faces into the Google Vision API web-detection function and collect names
  4. Finally, feed each name into the Wikipedia API and pull the first paragraph for each one
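
To make steps 2 and 4 a little more concrete, here is a minimal Python sketch of the two pieces that need no Google credentials: building an FFmpeg command that grabs one frame at a given timestamp, and building a Wikipedia API request for a name. This is not the code from the project. The function names, file names, and the choice of the MediaWiki `extracts` query are assumptions made for illustration, and the query returns the plain-text intro section of an article, from which the first paragraph would still need to be cut.

```python
from urllib.parse import urlencode


def ffmpeg_frame_command(video_path, timestamp_s, out_path):
    """Build an ffmpeg command that saves one frame at timestamp_s (seconds).

    Hypothetical helper: the returned list can be passed to subprocess.run().
    """
    return [
        "ffmpeg",
        "-ss", f"{timestamp_s:.3f}",  # seek to the timestamp before decoding
        "-i", video_path,             # input clip
        "-frames:v", "1",             # write exactly one video frame
        "-y",                         # overwrite the output file if it exists
        out_path,
    ]


def wikipedia_intro_url(name):
    """Build a MediaWiki API URL that requests the plain-text intro for a name.

    Hypothetical helper: assumes English Wikipedia and that the name returned
    by web detection matches (or redirects to) an article title.
    """
    params = {
        "action": "query",
        "prop": "extracts",   # TextExtracts: article text
        "exintro": 1,         # only the section before the first heading
        "explaintext": 1,     # plain text instead of HTML
        "redirects": 1,       # follow redirects to the canonical article
        "format": "json",
        "titles": name,
    }
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)


if __name__ == "__main__":
    # Example values only; "trailer.mp4" and the name are placeholders.
    print(" ".join(ffmpeg_frame_command("trailer.mp4", 12.5, "frame_0001.jpg")))
    print(wikipedia_intro_url("Tom Hanks"))
```

Keeping these as pure functions that only build a command or URL makes them easy to check on their own. Running FFmpeg and fetching the URL are left to the caller.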

That’s it! All that in around 200 lines of code, which is mind-boggling. There are plenty of ways to improve on this concept, customize the pipeline with different APIs, and extend it to your own needs. See the YouTube video link if you want to see it in action. Thanks!

Code and walkthrough on GitHub


