As a data engineer, you should not be trying to convince your colleagues that everything can be a scheduled batch job. It's time to learn how to build streaming data pipelines. For many data engineers, Apache Kafka is the go-to platform for enabling real-time data pipelines. Let's quickly cover why and how to get started.
Data engineer roles vary, but some core traits stand out for any data engineer. If you missed it, check out my first posts in this series on What is a Data Engineer? and Data Engineer Skills for Success. Let's finish off this series with the traits I see as most critical for success as a data engineer.
Data engineer job descriptions vary significantly because data engineers are asked to work on many different projects. Yet there are categories of skills that are consistently desired in a data engineer and serve as a foundation for learning new technologies. Here are the skills I see as most critical for success as a data engineer.
This is part 2 of my Journey of a Data Engineer series, which all started from the question “What’s the best path to be a great data engineer?” Check out Part…
Hearing a lot about data lakes but still not sure what that means or why anyone cares? This video gives a brief introduction to what a data lake is and why so many organizations are adding them to their analytics ecosystem. To show what interacting with a data lake may look like for a typical data analyst, I included a demo of how you would use Spark SQL to query the data lake from Azure Databricks.
In this video we will quickly cover Apache Spark. The goal is to explain why you would use Spark and where it fits in the data ecosystem. If you want to just get…
Managing big data is critical for many organizations. Analytics can improve products and inform critical business decisions. Using data can provide distinct advantages, and it’s likely that an organization’s competitors are already leveraging their data.
Dustin Vannoy is a consultant in data analytics and engineering. His specialties are modern data pipelines, data lakes, and data warehouses. He loves to share knowledge with the data science community.
This site is a resource for you to learn about modern data technologies and practices, from kickstart tutorials to blog posts about the latest tips, tricks, and trends. If you are new to data engineering or data science, check out the Data Kickstart tutorials.