Process Standard Test Data Format (STDF) Files on Google Cloud

What is STDF?

STDF stands for Standard Test Data Format, a file format and de facto standard in the semiconductor industry for exchanging test data between different kinds of devices and testers. Developed by Teradyne, it focuses primarily on semiconductor chip testing, but the document structure is robust enough to be used in other domains as well.

Format Details And Extraction

STDF can be visualized like XML: it contains tags that carry device-specific data, timestamps, and more. A detailed explanation can be found here. For extracting details, we need six tags – MIR (Master Information Record), PIR (Part Information Record), BPS (Begin Program Section), PTR (Parametric Test Record), EPS (End Program Section), and MRR (Master Results Record). They are arranged in the following structure:

  • <MIR> – starting timestamp recorded here
    • <PIR> – identifies an individual device and the test record number (used to recognize a unique reading); repeated once per device
    • <BPS> – opening tag for a test phase; repeated once per test phase
      • <PTR> – a test reading for a device identified by a PIR; repeated once per test per device
    • <EPS> – closing tag for a test phase; repeated once per test phase
  • <MRR> – ending timestamp

Each STDF file is highly compressed, with millions of readings per test. To recognize a unique reading, a combination of the PIR and PTR records is used. The PySTDF library was modified to extract the results as CSV.
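As a rough sketch of that extraction step, the snippet below feeds an STDF file through pystdf's parser and attaches a custom sink that keeps only the six record types above and writes them out as CSV rows. The sink callback follows the convention of pystdf's bundled writers; the file names and CSV layout are illustrative assumptions, not the exact modifications used in this pipeline.

```python
import csv
from pystdf.IO import Parser

# Record types we care about, as named by pystdf's V4 record classes.
TARGET_RECORDS = {"Mir", "Pir", "Bps", "Ptr", "Eps", "Mrr"}

class CsvSink:
    """Receives parsed records from pystdf and writes selected ones as CSV."""

    def __init__(self, csv_writer):
        self.csv_writer = csv_writer

    def after_send(self, data_source, data):
        # pystdf hands sinks a (record_type, field_values) pair per record.
        rec_type, fields = data
        name = rec_type.__class__.__name__
        if name in TARGET_RECORDS:
            self.csv_writer.writerow([name.upper()] + [str(v) for v in fields])

# "lot42.stdf" / "lot42.csv" are hypothetical file names.
with open("lot42.stdf", "rb") as stdf, open("lot42.csv", "w", newline="") as out:
    parser = Parser(inp=stdf)
    parser.addSink(CsvSink(csv.writer(out)))
    parser.parse()
```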

Objective

In this blog, we will talk about processing STDF documents on Google Cloud Platform and how we can leverage machine learning with BQML to get insights from the data received in the STDF files. We will go over the architecture used to process these documents: how the data is extracted, ingested into BigQuery, and finally used to derive insights with BQML.

Architecture Flow

 

STDF on Google Cloud

As the diagram describes, the first Cloud Function parses the STDF file from the Cloud Storage bucket where a user has uploaded it and generates a CSV in a second bucket. The CSV landing in the second bucket triggers another Cloud Function, which creates a BigQuery load job for the file. BigQuery is used as the data warehouse for further visualization and machine learning.
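A minimal sketch of that second function, assuming a background (gen 1) Cloud Function with a Cloud Storage trigger; the project, dataset, and table names are placeholders:

```python
from google.cloud import bigquery

def load_stdf_csv(event, context):
    """Triggered by a CSV landing in the staging bucket; loads it into BigQuery."""
    uri = f"gs://{event['bucket']}/{event['name']}"
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # skip the CSV header row
        autodetect=True,      # infer the schema from the file
    )
    # "my_project.stdf.readings" is a placeholder destination table.
    load_job = client.load_table_from_uri(
        uri, "my_project.stdf.readings", job_config=job_config
    )
    load_job.result()  # wait for the load job to finish
```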

Note: As Cloud Functions have a processing time limit, larger STDF files might time out. To avoid that, replace the first Cloud Function with a Compute Engine instance that processes the files present in the upload bucket and writes the CSVs to the BigQuery staging bucket.

STDF Data in BigQuery

The processed STDF data now resides in two tables:

  • File Details: contains the timestamps and temperature details for the experiment in the file.
  • Readings: contains the individual readings for the experiments recorded in the STDF file.

The above two tables are linked using their experiment names.
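For illustration, a join between the two tables might look like the following. The dataset, table, and column names are assumptions, since the exact schema isn't listed here:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical schema: both tables carry an experiment_name column.
query = """
SELECT f.start_ts, f.temperature, r.test_number, r.reading_value
FROM `my_project.stdf.file_details` AS f
JOIN `my_project.stdf.readings` AS r
  USING (experiment_name)
"""
for row in client.query(query).result():
    print(dict(row))
```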

Machine Learning and Visualization Applied to STDF Data Using BigQuery

The data contains multiple readings for the different tests, and each test reading can pass or fail. These readings can be clustered in an unsupervised fashion, which is helpful for detecting anomalies or defects when combined with the pass/fail test data.

A BQML clustering (k-means) model with 100 clusters was created to analyze the data. It groups the readings based on their properties, which helps in finding anomalies and similarities in the data.
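In BQML, such a clustering model is a k-means CREATE MODEL statement. A sketch of what that could look like, again with placeholder project, dataset, and column names:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Train a k-means model over the readings, with 100 clusters as in the post.
# Table and column names are placeholders.
create_model = """
CREATE OR REPLACE MODEL `my_project.stdf.reading_clusters`
OPTIONS (model_type = 'kmeans', num_clusters = 100) AS
SELECT test_number, reading_value, passed
FROM `my_project.stdf.readings`
"""
client.query(create_model).result()

# Assign each reading to its nearest cluster for anomaly inspection.
predict = """
SELECT centroid_id, test_number, reading_value
FROM ML.PREDICT(
  MODEL `my_project.stdf.reading_clusters`,
  (SELECT test_number, reading_value, passed
   FROM `my_project.stdf.readings`))
"""
for row in client.query(predict).result():
    print(dict(row))
```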

Looker was used for quick visualization because it is hosted on the web and easy to access. End users can now verify their readings, extract meaningful information, and create stories based on the data. One such visualization was used to detect faults and failed tests and to summarize the whole experiment conducted.

For more information about how SpringML can support your team with data modernization & data visualization projects, feel free to contact us at info@springml.com or find out more about our services at: https://springml.com/data-analytics-and-visualization/