At my last meetup someone asked, “What’s the best path to be a great data engineer?” After chatting a bit, I shared some of my story and what helped me, with the warning that there are many paths and mine is not a prescription for what others should do. My journey is more traditional than many, but it required a lot of independent learning that anyone could have done. I would like to share a more complete account of my experience and what I learned, in hopes it helps others figure out how to go from where they are to being a data engineer. I will cover this topic in two parts. Part 1 (this post) is about what set the stage for data engineering: my path into the industry as a Business Intelligence Consultant.
The path to data engineer started with a bachelor’s degree in computer science with a business systems specialization (plus an international studies minor, a bit less relevant). In the first semester I was taught programming using C++ with a lot of focus on the language syntax and the core concepts of programming: classes, methods, variables, loops, if statements, etc. After that semester things really ramped up and I had much more challenging programming courses covering data structures, web development, and business information systems. Almost without warning we switched to Java for many courses, though I also had projects involving Visual Basic, PHP, Perl, HTML/CSS, and C (without the ++). The degree program was very focused on getting us to learn how to understand and use new languages, new operating systems, and new databases. In the middle of the “hardcore” computer science and information systems courses was a database course. That is where I felt like things just clicked and made sense. We learned how to develop with Microsoft SQL Server, focusing on both designing and querying databases. I did great in that course and started volunteering for the database work in some of the large projects that I worked on throughout the rest of my degree.
A key part of my education was the internship I had in the summer break before my senior year, which continued as a part-time job throughout the last year of college. As a side note, I was paid for this internship and all interns should be paid. One reason I believe this is that I would have struggled a lot had I not been earning money during that time. For once in my life I wasn’t working side jobs that summer. No waiting tables. No landscaping. No cushy work study jobs in the computer lab or front desk of my dorm. I had one primary focus for 2 months: to work 40 hours a week in an office with professionals and contribute value by writing code and designing small feature enhancements. The experience I had was instrumental in the interview process for my first full-time job out of college. Beyond getting hired, I had less fear about stepping into corporate offices after spending that time as an employee of one. You would think going from an office environment in Muncie, Indiana to one in Chicago would be a huge change, but they were oddly similar.
Only a couple of companies specifically recruited students at my university. The one I ended up starting my career with interviewed me for one of two roles: Application Development Consultant or Business Intelligence Consultant. Of course at that time none of us college kids knew what Business Intelligence (BI) meant or whether we had the skills to do it. I don’t think my Java skills impressed them, but the interview for business intelligence was mostly about databases, SQL, and very basic business concepts. The part that I recall the clearest was when my future boss handed me a receipt and asked me to draw out a data model that would support reporting on that data. I drew up a proper normalized data model as I had been taught and waited for him to return to discuss. The discussion was all around questions like “What if you want to report on the history of the product category but it has changed since the purchase?” I gave my best guess, but it led to him teaching me some basics about type-2 dimensions in data warehousing (keeping a new row for each version of a record instead of overwriting it) and hinting at denormalizing the data. I suppose I passed the test just by following what he was teaching me. I quickly accepted this denormalization concept, breaking free of E.F. Codd’s definition of a proper third normal form database. So I headed back home knowing three things. First, business intelligence is all about reporting out of databases. Second, I could become a consultant by working for this company and learning from them. Third, Chicago is expensive but more exciting than St. Louis or Muncie (not that there is anything wrong with those places). My decision was to leave behind the known, predictable job options I had and go forward to the one I didn’t quite understand in a city I didn’t yet know.
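For readers who have not run into type-2 dimensions, here is a minimal sketch of the idea behind that interview question. It is only an illustration under my own assumptions, not the model drawn in the interview: the `ProductRow` fields, the SKU, the category names, and the dates are all hypothetical. Instead of overwriting a product's category, a change closes out the current row and adds a new version, so a purchase can still be reported under the category that applied on the purchase date.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ProductRow:
    product_key: int           # surrogate key: one per version of the product
    product_id: str            # natural key from the source system
    category: str
    valid_from: date
    valid_to: Optional[date]   # None marks the current version


def apply_category_change(dim: List[ProductRow], product_id: str,
                          new_category: str, change_date: date) -> ProductRow:
    """Close the current row and append a new version (a type-2 change)."""
    current = next(r for r in dim
                   if r.product_id == product_id and r.valid_to is None)
    if current.category == new_category:
        return current  # nothing changed, keep the existing version
    current.valid_to = change_date
    new_row = ProductRow(
        product_key=max(r.product_key for r in dim) + 1,
        product_id=product_id,
        category=new_category,
        valid_from=change_date,
        valid_to=None,
    )
    dim.append(new_row)
    return new_row


def category_on(dim: List[ProductRow], product_id: str,
                day: date) -> Optional[str]:
    """Return the category that was in effect for the product on a given day."""
    for r in dim:
        starts_before = r.valid_from <= day
        still_open = r.valid_to is None or day < r.valid_to
        if r.product_id == product_id and starts_before and still_open:
            return r.category
    return None


# Hypothetical data: one product that is re-categorized in mid-2012.
dim = [ProductRow(1, "SKU-100", "Snacks", date(2010, 1, 1), None)]
apply_category_change(dim, "SKU-100", "Health Foods", date(2012, 6, 1))
```

With this in place, `category_on(dim, "SKU-100", date(2011, 3, 15))` looks up the older version of the row, while a date after the change finds the newer one. A fully normalized model that stores only the current category cannot answer the first question.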
On my first day on the job as a BI consultant I was told they needed me to focus on data warehousing and ETL. As I sat there wondering if I was supposed to know what ETL meant, they handed me a book about data warehousing by Ralph Kimball and the instructor material for the Business Objects Data Integrator software. Thankfully both of these resources explained that ETL stands for “Extract, Transform, Load”, which is the process for moving data from business systems into a database designed for reporting and analytics. I learned a lot more than that from digging into this material for the first week and getting hands-on with the ETL software (similar to Informatica or SSIS). I was also fortunate to learn the basics of the Microsoft BI suite, especially SQL Server Integration Services (SSIS), but was not using those tools every day. Thankfully I pushed to add SSIS to my repertoire since it was a required skill for the next role I found. Over my 4 years as an employee at this consulting company I had the privilege to learn many technologies, work on projects at many organizations, lead training classes, and learn that Chicago is too damn cold. The other great thing that happened in my 4 years in Chicago was meeting my wife. She had a special place in her heart for San Diego and after our first trip here I was convinced.
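If ETL is as new to you as it was to me that first day, the three steps can be sketched in a few lines. This is a toy example using only the Python standard library, not how Data Integrator or SSIS work internally. The CSV contents, the `fact_sales` table, and the cleaning rules are all assumptions made up for the illustration.

```python
import csv
import io
import sqlite3

# Hypothetical export from a business system; the third row is missing an amount.
SOURCE_CSV = """order_id,product,amount,order_date
1,widget,19.99,2011-03-01
2,gadget,5.00,2011-03-01
3,widget,,2011-03-02
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no amount, fix types, standardize names."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append((
            int(row["order_id"]),
            row["product"].upper(),
            float(row["amount"]),
            row["order_date"],
        ))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into a table designed for reporting."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales ("
        "order_id INTEGER PRIMARY KEY, product TEXT, "
        "amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))

# A reporting query against the loaded table: total sales per day.
daily_totals = conn.execute(
    "SELECT order_date, SUM(amount) FROM fact_sales "
    "GROUP BY order_date ORDER BY order_date"
).fetchall()
```

Real ETL tools add scheduling, error handling, and connectors for many source systems, but the shape is the same: pull data out, clean it up, and land it somewhere that is easy to report on.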
There are many paths to a career in BI, data engineering, and data science that are different from mine. I do hope it’s clear that any path you take requires hard work in learning technology, real-world experience, and continued learning to keep up with a fast-changing world. It also requires taking risks, one of which was moving again to a new city and trading a role with BI software I was an expert in for one focused on the Microsoft BI suite that I had only recently learned. And once in San Diego the journey from BI Developer to Data Engineer really began, which you can read about in Part 2 – Microsoft BI Developer to Data Engineer.