My Transition from Software to Data Engineering - by Stefan Schenk
As a software engineer, do you have the urge to explore more and try something new? If you've done any backend development, you've probably heard buzzwords like NoSQL, big data, Spark, distributed systems, streaming, sharding, and eventual consistency. These are the data storage and processing tools and concepts that data engineers use to build applications that push the boundaries of what is possible. At heart, data engineers are software engineers who like to work on data-intensive applications.
Last year, Stefan took the journey from software engineering to data engineering with Xccelerated.
He shares his story below.
Hi, my name is Stefan, and, like a lot of other software engineers, I have worked across the stack of modern web application development. I have a keen interest in data technologies and AI. I joined Xccelerated to become a data engineer and help companies become data-driven.
There's a distinction between data and software engineering in my mind. Data engineers focus on building software where the primary challenge is the data itself, such as its volume or rate of change. Typically, data engineers are also responsible for the cloud infrastructure needed to store and process data, as well as for monitoring and data security. So, a broad software engineering skillset is imperative.
In practice, a data engineer often works with a data scientist. You might say that data scientists work in the data, and data engineers work on the data. What that means is one group focuses on the mathematical side while the other focuses on the IT side. As engineers, we prepare the data and make sure that the output of the data scientists' work is usable and supports the business. That means ensuring the data is clean, and the IT systems are robust, scalable, and well-tested.
Becoming a data engineer presented me with new challenges, and I was ready to face them with the help of Xccelerated.
My time as a data engineer @Xccelerated
My first month at Xccelerated was a full-time training bootcamp, in which I learned the essentials of data engineering. Afterward, I started at a large energy company where my first assignment was to deliver a training session on Databricks.
Incorporating best practices from software and data engineering into this training gave the customer's developers the insight to improve their code and make it more reliable and maintainable. One such practice is that all operations should be idempotent, meaning that an operation can be applied multiple times without changing the result beyond the first application.
This concept of idempotency is critical in all software layers. During the migration from Airflow to Data Factory + Databricks, I tried to incorporate it to make the pipelines and data science notebooks more consistent and stable. It also makes productionizing notebooks a lot easier. For example, instead of generating a current timestamp on the fly, notebooks now accept an execution date-time from Data Factory. Passing the timestamp in explicitly allows us to control the context in which a notebook runs, or reruns.
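To make the idea concrete, here is a minimal sketch in plain Python of what an idempotent notebook step could look like. It is not the code we ran at the customer: the function names, the "/mnt/predictions" path, and the dictionary standing in for storage are all assumptions made up for illustration. In a real Databricks notebook, the execution date-time would arrive as a notebook parameter set by the orchestrator rather than as a function argument.

```python
from datetime import datetime


def output_path(base_dir: str, execution_time: datetime) -> str:
    """Derive the output location from the supplied execution time only."""
    return f"{base_dir}/date={execution_time:%Y-%m-%d}/hour={execution_time:%H}"


def run_notebook(store: dict, execution_time_iso: str, rows: list) -> str:
    """Hypothetical notebook body: write rows for one execution window.

    `store` is a stand-in for real storage (an assumption for this sketch).
    The timestamp is passed in by the caller; the function never asks the
    clock for the current time, so a rerun targets the same location.
    """
    execution_time = datetime.fromisoformat(execution_time_iso)
    path = output_path("/mnt/predictions", execution_time)
    # Overwrite instead of append, so a rerun replaces the earlier result
    # rather than duplicating it.
    store[path] = list(rows)
    return path


# Running the same window twice leaves the store in the same state.
storage = {}
first = run_notebook(storage, "2020-03-01T06:00:00", ["a", "b"])
second = run_notebook(storage, "2020-03-01T06:00:00", ["a", "b"])
```

Two choices do the work here: the output path depends only on the execution time that was passed in, and the write overwrites instead of appending. Had the function generated its own timestamp, every rerun would have produced a new path and a duplicate set of results.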
In addition to facilitating Databricks training sessions, I was also responsible for implementing web scrapers and ingesting event data from the company's virtual assistant app. Now that the company is striving for smarter webpages, our data scientists need to provide real-time predictions.
Once I completed the bootcamp, deploying ML models became second nature to me. For example, when a data scientist handed me a pickled model, I had it containerized and deployed as a Flask app in Azure (using Docker, Azure Container Registry, and Azure Container Instances) within a few hours. I pitched my solution to the requesting team, and they were delighted because they could start making predictions right away.
Before we moved this process to the production infrastructure, we reviewed it with a solutions architect. We realized it made more sense to move it to AzureML, a more standardized option offered by Azure. If you're curious about this project, check out my blog post, "Moving to AzureML."
I am thrilled that I have chosen to broaden my skill set by using my software engineering skills to become a data engineer!