Data engineer job descriptions vary significantly because data engineers are asked to work on many different kinds of projects. Yet there are categories of skills that are consistently desired in a data engineer and that serve as a foundation for learning new technologies. Here are the skills I see as most critical for success as a data engineer.
Azure Synapse Analytics just went into Public Preview, so you can now try out its capabilities for yourself. Here is a quick introduction to what it is and why it matters.
Data Engineer is an exciting and rewarding role. However, many people are not sure what a data engineer does. Based on my experience in the field and many discussions with others, I present to you how I define the role of Data Engineer!
This is part 2 of my Journey of a Data Engineer series, which all started from the question “What’s the best path to be a great data engineer?” Check out Part 1: From College to BI Developer for the path from college through my first role as a BI consultant. In this post I’ll cover the steps…
At my last meetup someone asked the question "What's the best path to be a great data engineer?" My journey follows a more traditional path than many, but it required a lot of independent learning that anyone could have done. I would like to share a more complete account of my experience and what I learned, in the hope that it helps others figure out how to go from where they are to being a data engineer. I will cover this topic in two parts. Part 1 (this post) is about what set the stage for data engineering: my path into the industry as a Business Intelligence Consultant.
I'm a believer in remote work as a great enabler of more opportunity and greater productivity. Here are my tips for working remotely and staying effective, including how those in the office can best support their distributed team members.
Hearing a lot about Data Lakes but still not sure what that means or why anyone cares? This video gives a brief introduction to what a Data Lake is and why so many organizations are adding one to their analytics ecosystem. To show what interacting with a data lake may look like for a typical data analyst, I included a demo of how you would use Spark SQL to query the data lake from Azure Databricks.
If you are working with Azure Databricks (or many other Azure resources), you may come across the need for a Service Principal in order to configure access to different resources. The steps are fairly straightforward, but the terminology is not consistent, so this video will walk through the steps and describe where to find the values to use when you authenticate.
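To make the terminology concrete, here is a minimal sketch of where the Service Principal values typically end up. It assumes one specific scenario that the post itself does not name: reading Azure Data Lake Storage Gen2 through the ABFS driver with OAuth client credentials. The helper name `service_principal_spark_conf` and the placeholder values are hypothetical, and the configuration keys should be checked against the current Azure and Databricks documentation before you rely on them.

```python
def service_principal_spark_conf(storage_account, tenant_id, client_id, client_secret):
    """Build Spark settings for OAuth access to ADLS Gen2 (assumed scenario).

    Where the values usually appear in the Azure Portal (labels may change):
      tenant_id     -> "Directory (tenant) ID" on the app registration
      client_id     -> "Application (client) ID" on the app registration
      client_secret -> the secret "Value" under "Certificates & secrets"
    """
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


# Placeholder values for illustration only; never hard-code a real secret.
conf = service_principal_spark_conf(
    storage_account="examplestorage",
    tenant_id="00000000-0000-0000-0000-000000000000",
    client_id="11111111-1111-1111-1111-111111111111",
    client_secret="<read-from-a-secret-store>",
)
for key, value in conf.items():
    # In a Databricks notebook you would call spark.conf.set(key, value) here.
    print(key)
```

In a real notebook the secret would normally come from a secret scope or key vault rather than a string literal, which is why the sketch only builds the settings and leaves applying them to the caller.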
When getting started with Azure Databricks for data processing and analytics, you need to create at least one cluster. Check out the video for a quick overview of how to do this from the Azure Portal. I include a quick description of the options you have and an overview of what cluster…
In the world of data science we often default to processing in nightly or hourly batches, but that pattern is not enough anymore. Our customers and business leaders see that information is being created all the time and realize it should be available much sooner. While the move to stream processing adds complexity, the tools we have available make it achievable for teams of any size.
This presentation covers why we need to shift some of our workloads from batch data jobs to real-time streaming. We dive into how Spark Structured Streaming in Azure Databricks enables this, together with streaming data systems such as Kafka and Event Hubs. We will discuss the concepts, how Azure Databricks enables stream processing, and review code examples on a sample data set.
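For a feel of the core idea before looking at any Spark code, here is a small sketch in plain Python (not Spark, and not the code from the presentation). It counts events per one-minute window two ways: once as a batch job over the full data set, and once incrementally as small micro-batches arrive. The event data, the 60-second window, and the names `batch_counts` and `StreamingCounter` are all made up for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # hypothetical tumbling-window size


def window_start(ts):
    """Return the start (in seconds) of the window that contains ts."""
    return int(ts // WINDOW_SECONDS) * WINDOW_SECONDS


def batch_counts(events):
    """Batch style: scan every event and rebuild all counts from scratch."""
    counts = defaultdict(int)
    for ts, key in events:
        counts[(window_start(ts), key)] += 1
    return dict(counts)


class StreamingCounter:
    """Streaming style: keep running state and update it per micro-batch."""

    def __init__(self):
        self.counts = defaultdict(int)

    def process(self, micro_batch):
        for ts, key in micro_batch:
            self.counts[(window_start(ts), key)] += 1
        # Results are available after every micro-batch, not only at the end.
        return dict(self.counts)


# (timestamp in seconds, event key) pairs; invented sample data.
events = [(0, "a"), (10, "b"), (59, "a"), (60, "a"), (125, "b")]

counter = StreamingCounter()
latest = {}
for i in range(0, len(events), 2):
    latest = counter.process(events[i:i + 2])

# Both approaches agree once all events have been seen.
assert latest == batch_counts(events)
```

The batch function only produces an answer after the whole data set has been scanned, while the streaming counter has an up-to-date answer after each micro-batch. A streaming engine manages that running state for you, along with concerns this sketch ignores, such as late data and fault tolerance.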