An aspiring data engineer recently reached out to me for some guidance on pivoting into the field from a software development background. The questions they asked are similar to what others have asked me in the past, so I decided to capture my responses here. I link to prior posts and other resources when possible… Continue Reading
Getting Started with Spark Structured Streaming – Current 22
I am honored to speak at Current 22. The example notebook that I walk through towards the end is available at https://github.com/datakickstart/datakickstart-databricks-workspace/blob/main/stackoverflow/stackoverflow_streaming.py.
Run SQL Server locally on Docker
I recently came across the need for a locally running SQL Server instance so that I could attach a database and deploy to Azure SQL. The windows 10 laptop I am using does not having SQL Server Developer edition installed yet, so I decided to set it up using Docker. What I like about using… Continue Reading
Intro to Azure Stream Analytics
Real-time data processing is becoming more common in companies of all sizes. The use cases range from simple stream ingestion to complex machine learning pipelines. If you need to get started with streaming in Azure, Stream Analytics gives you a simple way to get up and running. Most of my streaming projects involve Apache Kafka and Spark which can take a lot of setup (or at least involving additional vendors to simplify the experience). Those technologies are great especially for challenging streaming pipelines, but if your data platform is within Azure you should consider if Stream Analytics will meet your needs.
Learn Python – Resource List
I get asked about getting started with Python a lot since it's the language I recommend for someone wanting to break into data engineering (unless they already know Scala or Java since those are heavily used also). In this post I share some Python resources that I think will help you learn, whether you are brand new to development or a seasoned developer who just wants to pick it up as an additional language.
Azure Data Lake FAQ
Questions I have been asked around Data Lakes, Azure Databricks, Azure Synapse Analytics, and Delta Lake.
Why Apache Kafka?
As a data engineer, you should not be trying to convince your colleagues that everything can be a scheduled batch job. It's time to learn how to building streaming data pipelines. For many data engineers, Apache Kafka is the go to platform for enabling real-time data pipelines. Let's quickly cover why and how to get started.
Top Traits of a Data Engineer
Data engineer roles vary but some core traits stand out for any data engineer. If you missed it, check out my first posts in this series on What is a Data Engineer? and Data Engineer Skills for Success. Let's finish off this series with the traits I see as most critical for success as a data engineer.
Data Engineer Skills for Success
Data engineers job descriptions vary significantly as they are asked to work on many different projects. Yet, there are categories of skills that are consistently desired in a data engineer and serve as a foundation for learning new technologies. Here are the skills I see as most critical for success as a data engineer.
What is a Data Engineer?
Data Engineer is an exciting and rewarding role. However, many are not sure what a data engineer does. Based on my experience in the field and many discussions with others, I present to you how I define the role Data Engineer!