Creating and Maintaining Pipelines for Machine Learning Operations
Training accurate models to solve complex problems requires not only a sufficiently powerful model architecture but also a robust supporting system that supplies the data the model needs for training and, later, for making predictions in a production environment.
This supporting system is the data pipeline. Until relatively recently, data pipelines attracted less interest than the models they support, but they have drawn more attention in recent years as the benefits of investing time and work in them have become clear.
In this talk we will go over the basic principles of ETL (extract, transform, load) and how those principles are used to construct a pipeline.
Additionally, we will cover the nuances that arise when integrating NLP models into this design methodology.
We will then take a closer look at what happens at each layer of an ETL pipeline and share some notes on industry trends and best practices.
Zachary Wimpee
Data Engineering Analyst at Pieces
Zach is a data engineering analyst at Pieces working as part of their integration team, which focuses on ensuring the integrity of the data pipelines for use by downstream stakeholders on the NLP team.
His background is in physics and mathematics; he graduated from Angelo State University in December 2020.