I was fortunate to attend and speak at my first PASS Summit this week in Seattle. If you don’t already know, PASS Summit is a conference for Microsoft data professionals to learn and connect. There was a ton of great info ranging from SQL Server to Databricks, on-premises to Azure cloud, and Power BI to advanced querying techniques. There were even some sessions on SQL Server on Kubernetes and AWS RDS.
This is a quick post-conference write-up of my top 5 takeaways, in case my perspective is helpful to others.
Microsoft data community is super welcoming
I have been to Southern California PASS community meetings before and always had good experiences there, but this was a huge conference with people from all over the world. My eyes were opened to how kind and welcoming the community is to newcomers. Multiple speakers whom people look to for technical guidance took time to encourage everyone to talk with those who were by themselves so that they could feel included. Many people went out of their way to introduce me to others they thought were good people to know, and everyone was very humble and willing to chat. As an introvert, I felt pretty comfortable after just a short time at the conference.
Not everyone is going cloud (yet)
So many people I met aren’t sure when, or if, their company will move to the cloud. My bet is that 90% of them will end up there, at least for part of their production environment. Many I met seemed very interested in learning the new skills they will need for the evolving data industry, and I’m encouraged to try to make cloud data engineering seem accessible to those who have not started the journey yet. There are a lot of technologies and a lot of service names to learn, but I could sense people really catching on to how they can translate their existing knowledge into the appropriate cloud options to explore.
Azure SQL DB Hyperscale is amazing
Scaling transactional systems horizontally is something that the industry has struggled with forever. Hyperscale is going to keep your data consistent while at the same time scaling storage and compute. This is huge! And it has implications for analytics workloads such as data warehouses.
My quick notes on Hyperscale:
It is really cool technology: they separate out the storage engine used by SQL Server and scale it horizontally. The scaled-out storage layer is made up of Page Servers rather than a single Storage Engine, and it grows by adding more page servers. Each page server stores up to 128 GB of data pages, has caching built in, and has all kinds of interesting things going on to perform well for a reasonable storage cost.
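To get a feel for what "adding more page servers" means, here is a back-of-the-envelope sketch in Python. It only uses the 128 GB per page server figure from my notes. The function name is my own invention, and the real service surely allocates page servers with more nuance (replicas, caching, partial fills), so treat this as rough arithmetic rather than how Hyperscale actually works.

```python
import math

# Per-page-server capacity quoted in my session notes (an assumption about
# the current service, not an official constant).
PAGE_SERVER_CAPACITY_GB = 128


def page_servers_needed(database_size_gb: float) -> int:
    """Rough lower bound on page servers for a database of the given size.

    Hypothetical helper for illustration only: it just divides the data
    size by the per-server capacity and rounds up.
    """
    if database_size_gb <= 0:
        return 0
    return math.ceil(database_size_gb / PAGE_SERVER_CAPACITY_GB)


if __name__ == "__main__":
    for size_gb in (100, 1024, 10 * 1024):
        print(f"{size_gb} GB -> {page_servers_needed(size_gb)} page server(s)")
```

By this estimate, a 1 TB database spans at least 8 page servers and a 10 TB database at least 80, which is why storage can keep growing without the compute node having to hold it all.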
We can scale up by adding more cores very rapidly (new compute spins up in a couple of minutes, and failover to it is nearly instantaneous). We can also scale out with read-only compute by adding replicas. It’s built on the SQL Server engine (2019, I believe), so it’s the same experience SQL Server professionals are used to, and it can scale up to 100 TB of storage (a limit that will expand).
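One way an application typically takes advantage of read-only replicas is by declaring read intent in its connection string with the `ApplicationIntent=ReadOnly` keyword. The sketch below just builds an ODBC-style connection string and does not connect to anything. The helper name, server name, and database name are placeholders I made up, I'm assuming the "ODBC Driver 17 for SQL Server" driver, and authentication settings are deliberately left out.

```python
def build_connection_string(server: str, database: str, read_only: bool = False) -> str:
    """Build an ODBC-style connection string for an Azure SQL database.

    Hypothetical helper for illustration: credentials are omitted and the
    driver name is an assumption about what is installed locally.
    """
    parts = {
        "Driver": "{ODBC Driver 17 for SQL Server}",
        "Server": f"tcp:{server},1433",
        "Database": database,
        "Encrypt": "yes",
    }
    if read_only:
        # Declares read-only intent so the session can be routed to a
        # read-only replica when one is available.
        parts["ApplicationIntent"] = "ReadOnly"
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


if __name__ == "__main__":
    # Placeholder names, not a real server or database.
    print(build_connection_string("example.database.windows.net", "salesdb"))
    print(build_connection_string("example.database.windows.net", "salesdb", read_only=True))
```

The idea is that reporting or analytics queries use the read-only string while transactional work keeps using the default one, so the two workloads land on different compute.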
Spark is part of the Microsoft Data Platform
People dig Azure Databricks (so I’m not wasting my time telling everyone why and how to use Spark in Azure), but many companies will think about it differently than how I’ve used it so far. Many have invested a lot in different types of SQL Server databases and need those to be highly integrated into their Azure Databricks workspaces. Spark is part of Big Data Cluster, is running under the covers for Azure Data Factory Mapping Data Flows, and will be a key part of Synapse Analytics. A big remaining question is where we should go to do Spark and big data analytics. We shall see, but I expect that for many organizations Azure Databricks will still be a good fit for at least some of that work. Viva Spark!
So much more to learn
I was sharing about big data concepts like data lakes and cloud data engineering, but I have a lot more to learn to really know all the options for organizations. Synapse Analytics (SQL DW Gen 2) is coming as a new space for all kinds of analytics work. Big Data Cluster is packed with features that data scientists and engineers love and it offers a way to avoid managing many open source tools separately.
BONUS: Recommended Sessions (if you have access to the recordings)
Databricks / ETL
- 10 Cool Things You Can Do With Azure Databricks – Ike, Simon, Dustin
- An Azure Data Engineer’s ETL Toolkit – Simon Whiteley
- Code Like a Snake Charmer – Introduction to Python! – Jamey Johnston
- Code Like a Snake Charmer – Advanced Data Modeling in Python! – Jamey Johnston
- Cosmic DBA – Cosmos DB for SQL Server Admins and Developers – Michael Donnelly
- CosmosDB – Designing and Troubleshooting Lessons – Neil Hambly
- Data Modeling Trends for 2019 and Beyond – Ike Ellis
- Innovative Data Modeling for Cool Data Warehouses – Jeff Renz, Leslie Weed
Data Warehouse / SQL DB
- Best, Better, Hyperscale! The Last Database You will Ever Need in the Cloud – Denzil Ribeiro
- Introducing Azure Synapse Analytics: The End-to-End Analytics Platform Built for Every Data Professional – Saveen Reddy
- Azure SQL Database: Maximizing Cloud Performance and Availability – Joe Sack, Denzil Ribeiro
- Delivering a Data Warehouse in the Cloud – Jeff Renz
- Data Warehousing: Which of the Many Cloud Products is the Right One for You? – Ginger Grant