It’s day 2 of the PASS Summit. Yesterday was a whirlwind, starting with my early-morning, post-Halloween trip to Seattle. I spent some time with my Microsoft friends in the SQL Clinic and visited several vendor receptions, followed by a quick stop at the karaoke party.
Today is the Cosmos DB keynote by Rimma Nehme. I’ve heard Dr. Nehme speak on a number of occasions, and I’ve always been delighted and energized by her keynotes. She has a way of presenting that delivers a lot of information in a short amount of time without feeling like she’s rushing through it. I took a handful of notes, but there’s quite a bit that I didn’t get down due to the pace and complexity of the material. Below are some of the highlights.
Cosmos DB
We are generating substantially more data than we did just a few years ago. It is estimated that 90% of the world’s data was created in just the last two years. Much of that data is stored on disparate platforms in dissimilar structures, so there is a strong need for easy, seamless access to it. Cosmos DB was conceived to meet that need.
Cosmos DB began life as Project Florence, way back (!!) in 2010. It is designed to be a globally distributed, scalable, multi-model database. It supports a number of data models, including document, graph, and key-value, through several different APIs.
It allows simple deployment to your chosen geographical regions, or just a single region if you prefer. It offers policy-based geo-fencing for scenarios requiring data to be stored only in a certain geographical area.
It is clear that a great deal of time has been spent working through consistency models. The traditional relational database model trades some performance for consistency, while the newer NoSQL offerings reduce latency through an “eventually consistent” model. The Cosmos DB architecture treats this as a spectrum, offering a choice of five different consistency levels. Dr. Nehme said, “Real world consistency is not a binary choice,” meaning that the consistency level should be chosen to fit the business needs of the data.
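To make the “spectrum” idea a little more concrete, here is a minimal Python sketch that orders the five levels Cosmos DB offers (Strong, Bounded Staleness, Session, Consistent Prefix, and Eventual) and picks the weakest one that still meets a given set of business requirements. The guarantee names and the `guarantees` and `weakest_level_for` helpers are hypothetical. It also assumes a simplified ladder in which each level keeps every guarantee of the weaker levels; the real semantics, especially across regions and sessions, are more nuanced.

```python
from enum import IntEnum


class Consistency(IntEnum):
    """Cosmos DB's five consistency levels, ordered weakest (0) to strongest (4)."""
    EVENTUAL = 0
    CONSISTENT_PREFIX = 1
    SESSION = 2
    BOUNDED_STALENESS = 3
    STRONG = 4


# Hypothetical, simplified ladder: each level adds one guarantee on top of
# everything the weaker levels already provide.
ADDED_GUARANTEE = {
    Consistency.CONSISTENT_PREFIX: "ordered_reads",   # never see writes out of order
    Consistency.SESSION: "read_your_writes",          # within a single client session
    Consistency.BOUNDED_STALENESS: "bounded_lag",     # lag capped by versions or time
    Consistency.STRONG: "linearizable",               # always read the latest write
}


def guarantees(level: Consistency) -> set:
    """Return every guarantee a level provides under the simplified ladder."""
    return {g for lvl, g in ADDED_GUARANTEE.items() if lvl <= level}


def weakest_level_for(required: set) -> Consistency:
    """Pick the weakest (typically lowest-latency) level that meets the requirements."""
    for level in Consistency:  # definition order: weakest to strongest
        if required <= guarantees(level):
            return level
    unknown = required - guarantees(Consistency.STRONG)
    raise ValueError(f"unknown guarantees: {unknown}")


print(weakest_level_for({"read_your_writes"}).name)
print(weakest_level_for(set()).name)
```

A shopping cart that only needs each user to see their own changes would land on Session here, while a dashboard that tolerates stale numbers could drop all the way to Eventual.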
Performance is managed through the allocation of request units, or RUs. The RU is the currency of Cosmos DB, and Dr. Nehme described it as the “Bitcoin of Cosmos DB”. RUs can be dynamically assigned based on demand; for example, a sliding allocation of RUs could follow time zone-specific spikes in activity.
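Here is a rough idea of what a time zone-driven RU schedule might look like, as a minimal Python sketch. The regions, peak windows, and RU/s figures are made-up assumptions, `RegionPeak` and `rus_for_hour` are hypothetical names, and rounding up to 100 RU/s steps assumes the usual granularity for provisioned throughput. Actually applying the number to a container (through the portal, an SDK, or a scheduled job) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionPeak:
    """Hypothetical busy window for one region, in whole UTC hours [start, end)."""
    name: str
    start_utc: int
    end_utc: int
    extra_rus: int  # additional RU/s wanted while this region is busy

    def is_busy(self, hour_utc: int) -> bool:
        if self.start_utc <= self.end_utc:
            return self.start_utc <= hour_utc < self.end_utc
        # Window wraps past midnight UTC (for example, 16:00 -> 02:00).
        return hour_utc >= self.start_utc or hour_utc < self.end_utc


def rus_for_hour(hour_utc: int, base_rus: int, peaks: list) -> int:
    """RU/s to provision for one UTC hour, rounded up to a 100 RU/s step (assumed granularity)."""
    total = base_rus + sum(p.extra_rus for p in peaks if p.is_busy(hour_utc))
    return -(-total // 100) * 100  # integer ceiling to the next multiple of 100


# Made-up regional business hours, expressed in UTC.
PEAKS = [
    RegionPeak("Europe", 7, 17, 2000),
    RegionPeak("US West", 16, 2, 3000),
    RegionPeak("East Asia", 0, 10, 1500),
]

schedule = [rus_for_hour(h, 1000, PEAKS) for h in range(24)]
print(schedule)
```

With a 1,000 RU/s baseline, 16:00 UTC gets 6,000 RU/s because the European and US West windows overlap, while midday UTC only needs 3,000.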
Cosmos DB boasts guaranteed low-millisecond latency worldwide. Reads and writes are served from the local region to reduce latency. In one of the diagrams, the system seemed to be limited only by the speed of light. That’s cool!
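The speed-of-light point is easy to check with some back-of-the-envelope arithmetic. This short Python sketch computes the theoretical minimum round trip over optical fiber, assuming light travels at about two-thirds of its vacuum speed in glass and ignoring routing, queuing, and processing time. The 7,700 km figure is an assumed, approximate Seattle-to-London great-circle distance.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458
FIBER_FACTOR = 2 / 3  # assumed: light in glass fiber travels at roughly 2/3 of c


def min_round_trip_ms(distance_km: float) -> float:
    """Theoretical minimum round-trip time over fiber, ignoring routing and processing."""
    km_per_ms = SPEED_OF_LIGHT_KM_PER_S * FIBER_FACTOR / 1000
    return 2 * distance_km / km_per_ms


print(f"Seattle to London (~7,700 km): {min_round_trip_ms(7_700):.1f} ms")
print(f"Nearby replica (~50 km): {min_round_trip_ms(50):.2f} ms")
```

That works out to roughly 77 ms for the transatlantic round trip versus about half a millisecond for a replica 50 km away, which is why serving reads and writes from the local region matters so much.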
To keep interaction with Cosmos DB agnostic of the underlying schema, the data is represented as JSON documents. The system can index those documents into B-trees for greater performance. Multiple languages can be used to access this data; even JavaScript can be used for UDFs and other business logic.
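As a rough illustration of why JSON makes schema-agnostic indexing possible, here is a toy Python inverted index that walks each document as a tree and records every (path, value) pair, so documents with different shapes can be indexed side by side without a predefined schema. This is only a toy model, not Cosmos DB’s real index structure; the path format and the `paths`, `build_index`, and `find` helpers are hypothetical.

```python
import json
from collections import defaultdict


def paths(node, prefix=""):
    """Yield (path, scalar) pairs for every leaf in a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from paths(value, f"{prefix}/{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from paths(value, f"{prefix}/{i}")
    else:
        yield prefix, node


def build_index(docs):
    """Toy inverted index mapping (path, JSON-encoded value) to a set of document ids."""
    index = defaultdict(set)
    for doc_id, raw in docs.items():
        for path, value in paths(json.loads(raw)):
            # Encode the value so that, e.g., true and 1 stay distinct keys.
            index[(path, json.dumps(value))].add(doc_id)
    return index


def find(index, path, value):
    """Return ids of documents whose value at `path` equals `value`."""
    return index.get((path, json.dumps(value)), set())


docs = {
    "1": '{"name": "Ada", "address": {"city": "Seattle"}}',
    "2": '{"name": "Bo", "address": {"city": "Portland"}, "tags": ["pass", "summit"]}',
    "3": '{"name": "Cy", "address": {"city": "Seattle", "zip": "98101"}}',
}
index = build_index(docs)
print(sorted(find(index, "/address/city", "Seattle")))
```

Note that the three documents have different shapes (one has tags, another a zip code), yet the same lookup works across all of them.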
Implications
It’s becoming clearer with every new advancement that there will soon be no such thing as “just a DBA”. I still occasionally hear that phrase used to describe what someone does. The DBA of the future won’t just be a backup/restore operator; rather, he or she will orchestrate complex data platforms such as Cosmos DB. For those who call themselves “just a DBA”, now is the time to get out in front of this.
Other Announcements
The PASS board announced that next year’s Summit will be held November 6-9 (no Halloween – yay!) in Seattle. Registration is currently open. We also learned that more than 2,000 attendees were watching live on PASStv.
Up Next
For me, it’s a day full of meetings, although I hope to catch at least one session before the end of the day. Tonight it’s dinner with friends and a couple of vendor receptions. My session, SSIS and the Cloud: Yes, They Can Get Along, is tomorrow and will be streamed live on PASStv. I hope you’ll join me for that.