February 2019

Volume 34 Number 2

[Data Points]

Exploring the Multi-Model Capability of Azure Cosmos DB Using Its API for MongoDB

By Julie Lerman

In last month’s column (msdn.com/magazine/mt848702), and quite a few before that, I discussed different ways of working with Azure Cosmos DB, the globally distributed, multi-model database service supporting various NoSQL APIs. In all of my work thus far, however, I’ve used only one particular API, the SQL API, which allows you to interact with the data using SQL as the query language. One of the other models lets you interact with a Cosmos DB database using most of the tools and APIs available for MongoDB. This model also benefits from the BSON document format, a binary serialization format that’s compact, efficient and provides concise data type encoding.

But how is it possible for a single database service to support these and the other models (Cassandra, Gremlin and Table)? The answer is found in the underlying database engine, which is based on what’s called the atom-record-sequence (ARS) data model, enabling it to natively support a number of different APIs and data models. Each of the APIs is wire protocol-compatible with a popular NoSQL engine and its data structure, such as JSON documents, through which you interact with the data.

It’s important to understand that the multi-model APIs are currently not interchangeable. Cosmos DB databases are contained in a Cosmos DB Account. You can have multiple Cosmos DB accounts in your Azure subscription, but when you create an account, you select which API you’ll be using for the databases in that account. Once selected, that’s the only API you can use for its databases. Azure documentation uses terms like “currently” when discussing this, so the expectation is that at some point there will be more flexibility along these lines.

I’ve been curious about trying out another API. Except for some exploration of Azure Table Storage a while ago (see my July 2010 article at msdn.com/magazine/ff796231), which is aligned with the Table API in Azure Cosmos DB, I haven’t used the other types of databases at all. MongoDB is one of the most commonly used APIs, so that’s the one I decided to investigate, and I’ve been having fun with my first explorations. I’ll share some of what I’ve learned here, but please keep in mind that this is not intended as a “Getting Started with MongoDB” article. I’d recommend checking out Nuri Halperin’s Pluralsight courses (bit.ly/2SI2Vxw), which include beginner and expert MongoDB content. You’ll also find links to many other resources throughout this article.

The MongoDB support is targeted more toward developers and systems that already use MongoDB and want to take advantage of the many benefits Azure provides. There’s even a guide for migrating data from existing MongoDB databases to Azure Cosmos DB (bit.ly/2FhzmPi).

The first benefit I noticed is that MongoDB lets you run a local instance of the database on your laptop during development. The Cosmos DB Emulator (bit.ly/2sHNsAn) provides a local option for the SQL and MongoDB APIs, but the emulator runs only on Windows. MongoDB itself (I’m using the Community Edition) installs on a variety of OSes, including macOS. This allows you to emulate the basic features of a MongoDB API-driven Cosmos DB database locally. I’ll start by working locally, getting a feel for MongoDB, and then switch to an Azure database.

You can find installation instructions for all supported platforms in the official MongoDB documents at bit.ly/2S96ywj. Alternatively, you can pull a Docker image for running MongoDB on macOS, Linux or Windows from hub.docker.com/_/mongo.

Once it’s installed, you start the server process with the mongod command (mongod.exe on Windows). Unless told otherwise, MongoDB expects the directory /data/db to exist and to be accessible to the process. You can create it in the default location (for example, c:\data\db on Windows, or on macOS, /data/db in the root folder where the Applications, Library and Users folders live), or specify a different location with the --dbpath parameter of the mongod command. Because I’m just testing things out on my dev machine, I used “sudo mongod” to ensure that the service had the needed permissions. The Community Edition runs by default on localhost, port 27017, that is, 127.0.0.1:27017.

There are quite a few ways to interact with MongoDB. While I’m most interested in the C# API, I find it useful to start by working as close to the metal as possible and then graduate to one of the APIs. If you’re totally new to MongoDB, you might want to start by using its shell (installed along with the service) to do some work at the command line. You can start the shell just by typing “mongo.” MongoDB installs three system databases: local, admin and config. In the shell, type “show dbs” to see them listed.

MongoDB doesn’t have an explicit command to create a new database in the shell or other APIs. Instead, you switch to a database with the use command, and if that database doesn’t exist, it’s created the first time you insert data into it. Try this out with:

use myNewDatabase

If you call “show dbs” right afterward, you won’t see myNewDatabase yet.

The shell uses a shortcut “db” to work with the current database object even if the actual database doesn’t yet exist.

Now you need to insert a document. You create documents as JSON and the API will store them into MongoDB in its binary BSON format. Here’s a sample document:

{
  firstname : "Julie",
  lastname : "Lerman"
}

But documents don’t go directly into the database; they need to be stored in a collection. MongoDB treats collections the same way it treats databases: If you refer to a collection that doesn’t yet exist, it creates it for you when you insert data into that collection. Therefore, you can reference a new collection while inserting a new document. Note that everything is case-sensitive. I’ll use the insertOne method on a new collection called MyCollection to insert the document. MongoDB returns another document acknowledging the insert and assigns my document a unique _id key using its own data type, ObjectId:

> db.MyCollection.insertOne({firstname:"Julie",lastname:"Lerman"})
{
  "acknowledged" : true,
  "insertedId" : ObjectId("5c169d4f603846f26944937f")
}

Now “show dbs” will include myNewDatabase in its list, and I can query the database with the “find” command of the collection object. I won’t pass any filtering or sorting parameters, so it will return every document (in my case, only one):

> db.MyCollection.find()
{ "_id" : ObjectId("5c169d4f603846f26944937f"), "firstname" : "Julie", 
  "lastname" : "Lerman" }

I’ve barely scratched the surface of the capabilities, but it’s time to move on. Exit the mongo shell by typing “exit” at the prompt.
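For comparison, here’s a minimal sketch of the same session written against the .NET driver that I use later in this article. It assumes the driver’s MongoDB.Bson and MongoDB.Driver namespaces are referenced and that mongod is running locally on the default port; ShellSessionSketch and InsertAndFindAll are names I made up for illustration. An untyped BsonDocument collection stands in for the shell’s schema-free documents, so no C# class is needed.

```csharp
using System;
using System.Collections.Generic;
using MongoDB.Bson;
using MongoDB.Driver;

public static class ShellSessionSketch
{
    // Mirrors the shell session: use myNewDatabase, insertOne, then find()
    public static List<BsonDocument> InsertAndFindAll(IMongoClient client)
    {
        // Like "use myNewDatabase": getting the handles doesn't touch the server
        var db = client.GetDatabase("myNewDatabase");
        var coll = db.GetCollection<BsonDocument>("MyCollection");

        // Like insertOne: the first insert creates the database and collection.
        // The driver adds an ObjectId _id to the document before sending it.
        var doc = new BsonDocument { { "firstname", "Julie" }, { "lastname", "Lerman" } };
        coll.InsertOne(doc);

        // Like find() with no arguments: an empty filter matches every document
        return coll.Find(new BsonDocument()).ToList();
    }

    public static void Main()
    {
        // With no connection string, MongoClient targets localhost:27017
        foreach (var d in InsertAndFindAll(new MongoClient()))
        {
            Console.WriteLine(d.ToJson());
        }
    }
}
```

Note that the C# version never creates the database explicitly either; the InsertOne call does that, just as insertOne did in the shell.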

Using the Visual Studio Code Cosmos DB Extension

Now that you’ve created a new database and done a little work in the shell, let’s benefit from some tooling. First, I’ll use the Azure Cosmos DB extension in Visual Studio Code. If you don’t have VS Code, install it from code.visualstudio.com. You can then install the extension from the VS Code extensions pane.

To open the local MongoDB instance in the extension, right-click on Attached Database Accounts and choose “Attach Database Account.” You’ll be prompted to choose from the Cosmos DB APIs. Select MongoDB. The next prompt is for the address where the database should be found. This will default to the MongoDB default: mongodb://127.0.0.1:27017. Once connected, you should be able to see the databases in the Cosmos DB explorer, including the one created in the shell with its collection and document, as shown in Figure 1.


Figure 1 Exploring the MongoDB Server, Databases and Data

If you edit an opened document, the extension will prompt you to update it to the server. This will happen whether you’re connected to a local or to a cloud database.

MongoDB in a .NET Core App

Now let’s step it up another notch and use the MongoDB database in a .NET Core app. One of the various drivers available is a .NET driver (bit.ly/2BvUEFl). The current version, 2.7, is compatible with .NET Framework 4.5 and 4.6, as well as .NET Core 1.0 and 2.0 (including minor versions). I’m using the latest .NET Core SDK (2.2) and will start by creating a new console project in a folder I named Mongo­Test, using the dotnet command-line interface (CLI) command:

dotnet new console

Next, I’ll use the CLI to add a reference to the .NET driver for MongoDB, called mongocsharpdriver, into the project:

dotnet add package mongocsharpdriver

I’ll create a simple app, leaning on last month’s model with some classes related to the book and TV show, “The Expanse.” I have two classes, Ship and Character:

using System;
using System.Collections.Generic;

public class Ship
{
  public Ship()
  {
    Characters = new List<Character>();
  }
  public string Name { get; set; }
  public Guid Id { get; set; }
  public List<Character> Characters { get; set; }
}
public class Character
{
  public string Name { get; set; }
  public string Bio { get; set; }
}

Notice that Ship has a Guid named Id. By convention, the driver maps that property to the _id field that MongoDB requires. Character has no Id. That’s a modeling decision: In the context of interacting with ship data, I always want to see the characters on that ship, and storing them together makes retrieval easy. Perhaps, however, you’re maintaining more details about characters elsewhere. You could instead embed only references to the characters’ Ids, for example:

public List<Guid> Crew { get; set; }

But that means having to go find them whenever you want a ship with its list of characters. A hybrid alternative would be to add an Id property to Character, enabling me to cross-reference as needed. In a subdocument, the Character’s Id would be just an ordinary property; MongoDB requires only the root document to have an _id. But I’ve decided not to worry about the Id of Character for my first explorations.

There are many decisions to be made when modeling. Most importantly, if your brain defaults to relational database concepts, where you need to do a lot of translation between the relational data and your objects, you’ll need to stop and consider document database patterns and their merits. I find the guidance on modeling document data for NoSQL databases in the Azure Cosmos DB docs very helpful: bit.ly/2kpF46A.

There are a few more points to understand about the Ids. If you don’t supply a value for the ship’s Id property, the driver generates one for you when the document is inserted, much as SQL Server or other databases generate keys. If the root document doesn’t have any property that maps to the _id of the stored document (by convention or your own mapping rules), deserializing query results that include _id will fail.
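When a class doesn’t follow the Id convention, the driver’s mapping attributes can fill the gap. The following is a sketch, not part of my sample app: Station, ShipSummary and IdMappingSketch are hypothetical types. [BsonId] maps a differently named property to _id, and [BsonIgnoreExtraElements] lets a class with no Id member read documents that contain _id. It runs without a server because it only serializes and deserializes in memory.

```csharp
using System;
using MongoDB.Bson;
using MongoDB.Bson.Serialization;
using MongoDB.Bson.Serialization.Attributes;

// Hypothetical class whose key doesn't follow the "Id" naming convention;
// [BsonId] maps StationKey to the document's _id field explicitly.
public class Station
{
    [BsonId]
    public string StationKey { get; set; }
    public string Name { get; set; }
}

// Hypothetical class with no member for _id. Stored documents always have _id,
// so without [BsonIgnoreExtraElements] deserializing them would throw.
[BsonIgnoreExtraElements]
public class ShipSummary
{
    public string Name { get; set; }
}

public static class IdMappingSketch
{
    public static BsonDocument ToDocument(Station station) => station.ToBsonDocument();

    public static ShipSummary ReadSummary(BsonDocument stored) =>
        BsonSerializer.Deserialize<ShipSummary>(stored);

    public static void Main()
    {
        var doc = ToDocument(new Station { StationKey = "tycho", Name = "Tycho Station" });
        Console.WriteLine(doc.ToJson());

        // A document shaped like one read back from the database, _id included
        var stored = new BsonDocument { { "_id", ObjectId.GenerateNewId() }, { "Name", "Rocinante" } };
        Console.WriteLine(ReadSummary(stored).Name);
    }
}
```

The same mappings can also be expressed with BsonClassMap instead of attributes, which keeps driver-specific code out of your domain classes.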

Working with the mongocsharpdriver API

The .NET driver’s API starts with a MongoClient instance and you work from there to interact with the database and collection and documents. The API reflects some of the concepts I already demonstrated with the shell. For example, a database and collection can be created on the fly by inserting data. Here’s an example of that using the new API:

private static void InsertShip () {
  // With no connection string, MongoClient targets localhost:27017
  var mongoClient = new MongoClient ();
  var db = mongoClient.GetDatabase ("ExpanseDatabase");
  var coll = db.GetCollection<Ship> ("Ships");
  var ship = new Ship { Name = "Donnager" };
  ship.Characters.Add (new Character { Name = "Bobbie Draper",
    Bio = "Fierce Marine" });
  coll.InsertOne (ship);
}

Where I used the command “use database” in the shell, I now call GetDatabase on the MongoClient object. The database object has a generic GetCollection<T> method. I’m specifying Ship as the type. The string “Ships” is the name of the collection in the database. Once that’s defined, I can InsertOne or InsertMany, just like in the shell. The .NET API also provides asynchronous counterparts, such as InsertOneAsync.
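The asynchronous counterparts follow the same shape. Here’s a minimal sketch, assuming a local server and the Ship and Character classes shown earlier (repeated so the listing compiles on its own); InsertShipsAsync and the extra ship names are hypothetical. Main blocks on the task rather than using async Main, which requires C# 7.1 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using MongoDB.Driver;

public class Character
{
    public string Name { get; set; }
    public string Bio { get; set; }
}

public class Ship
{
    public Ship() { Characters = new List<Character>(); }
    public string Name { get; set; }
    public Guid Id { get; set; }
    public List<Character> Characters { get; set; }
}

public static class AsyncInsertSketch
{
    // Hypothetical helper using the async counterparts of InsertOne and InsertMany
    public static async Task<Ship> InsertShipsAsync(IMongoCollection<Ship> coll)
    {
        var donnager = new Ship { Name = "Donnager" };
        donnager.Characters.Add(new Character { Name = "Bobbie Draper", Bio = "Fierce Marine" });
        await coll.InsertOneAsync(donnager);

        // One call for several documents
        await coll.InsertManyAsync(new[] { new Ship { Name = "Rocinante" }, new Ship { Name = "Tachi" } });

        // Id was Guid.Empty before the insert; the driver generated a value for it
        return donnager;
    }

    public static void Main()
    {
        var coll = new MongoClient().GetDatabase("ExpanseDatabase").GetCollection<Ship>("Ships");
        var ship = InsertShipsAsync(coll).GetAwaiter().GetResult();
        Console.WriteLine(ship.Id);
    }
}
```

The returned ship also illustrates the earlier point about Ids: nothing in the code assigns Id, yet it holds a generated Guid after the insert.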

The first time I ran the InsertShip method, the new database and collection were created along with the new document. If I hadn’t inserted the new document and had only referenced the database and collection, they wouldn’t have been created on the fly. As with the shell, there’s no explicit command for creating a database.

Here’s the document that was created in the database:

{
  "_id": {
    "$binary": "TbKPi3+tLUK9b68lJkGaww==",
    "$type": "3"
  },
  "Name": "Donnager",
  "Characters": [
    {
      "Name": "Bobbie Draper",
      "Bio": "Fierce Marine"
    }
  ]
}
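You can confirm that databases are created lazily from code, too. This sketch assumes a local server; LazyCreationSketch and the LazyDemoDatabase name are hypothetical. It uses the client’s ListDatabases method to check for the database before and after the first insert, then drops it.

```csharp
using System;
using System.Linq;
using MongoDB.Bson;
using MongoDB.Driver;

public static class LazyCreationSketch
{
    // Asks the server for its database list and checks for a name
    public static bool DatabaseExists(IMongoClient client, string name) =>
        client.ListDatabases().ToList().Any(d => d["name"].AsString == name);

    public static void Main()
    {
        var client = new MongoClient();
        const string dbName = "LazyDemoDatabase"; // hypothetical name

        // Getting handles is purely client-side, so nothing is created yet
        var coll = client.GetDatabase(dbName).GetCollection<BsonDocument>("Things");
        Console.WriteLine($"Before insert: {DatabaseExists(client, dbName)}");

        // The first insert creates both the database and the collection
        coll.InsertOne(new BsonDocument("name", "first"));
        Console.WriteLine($"After insert: {DatabaseExists(client, dbName)}");

        // Clean up the demo database
        client.DropDatabase(dbName);
    }
}
```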

What’s more interesting (to me), however, is the typed collection (GetCollection<Ship>). The MongoDB documentation describes a collection as “analogous to tables in relational databases” (bit.ly/2QZcOcD), which is an interesting description for a document database where you can store random, unrelated documents in a collection. Still, tying a collection to a particular type, as with the “Ships” collection, does suggest that I’m enforcing the schema of the Ship type in this collection. But the type applies only to the collection instance in your code, not to the physical collection in the database. It tells that instance how to serialize and deserialize objects, given that you can store data from any object in a single collection. As of version 3.2, MongoDB did add a feature that enforces schema validation rules, though that’s not the default.

I can use the same Ships collection for other types:

var collChar = db.GetCollection<SomeOtherTypeWithAnId> ("Ships");

However, this would create a problem when it’s time to retrieve data. You’d need a way to identify document types in the collection. If you read last month’s article about the Cosmos DB provider for EF Core (which uses the SQL API), you may recall that when EF Core inserts documents into Cosmos DB, it adds a Discriminator property so you can always be sure what type a document aligns to. You could do the same for the MongoDB API, but that would be a bit of a hack because MongoDB uses type discriminators for specifying object inheritance (bit.ly/2sbHvgA). I’ve added a new class, DecommissionedShip, that inherits from Ship:

public class DecommissionedShip : Ship {
  public DateTime Date { get; set; }
}

The API has a class called BsonClassMap for specifying custom mappings, including its SetDiscriminatorIsRequired method, which tells the driver to always store a discriminator; by default, the discriminator is the class name. Because you’ll be overriding the default mapping, you need to call the AutoMap method, as well.

I’ve added a new method, ApplyMappings, to Program.cs and am calling it as the first line of the Main method. This explicitly instructs the API to add discriminators for Ship and DecommissionedShip:

private static void ApplyMappings () {
  // BsonClassMap lives in the MongoDB.Bson.Serialization namespace
  BsonClassMap.RegisterClassMap<Ship> (cm => {
    cm.AutoMap ();
    cm.SetDiscriminatorIsRequired (true);
  });
  BsonClassMap.RegisterClassMap<DecommissionedShip> (cm => {
    cm.AutoMap ();
    cm.SetDiscriminatorIsRequired (true);
  });
}

I’ve modified the InsertShip method to additionally create a new DecommissionedShip. Because it inherits from Ship, I can use a single Ship collection instance and its InsertMany command to add both ships to the database:

var decommissionedShip=new DecommissionedShip{Name="Canterbury", 
  Date=new DateTime(2350,1,1)};
coll.InsertMany(new[]{ship,decommissionedShip});

Both documents are inserted into the Ships collection, and each has a discriminator added as the property “_t.” Here’s the DecommissionedShip:

{
  "_id": {
    "$binary": "D1my7H9MrkmGzzJGSHOZfA==",
    "$type": "3"
  },
  "_t": "DecommissionedShip",
  "Name": "Canterbury",
  "Characters": [],
  "Date": {
    "$date": "2350-01-01T05:00:00.000Z"
  }
}
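Because the discriminator is written during serialization, you can check the effect of ApplyMappings without a server by serializing objects in memory with ToBsonDocument. This sketch assumes the 2.7-era driver used in this article (newer driver versions require you to choose a GuidRepresentation before serializing Guid values); DiscriminatorSketch and ToShipDocument are hypothetical names, and the guard in ApplyMappings is my addition because registering the same class map twice throws.

```csharp
using System;
using System.Collections.Generic;
using MongoDB.Bson;
using MongoDB.Bson.Serialization;

public class Character
{
    public string Name { get; set; }
    public string Bio { get; set; }
}

public class Ship
{
    public Ship() { Characters = new List<Character>(); }
    public string Name { get; set; }
    public Guid Id { get; set; }
    public List<Character> Characters { get; set; }
}

public class DecommissionedShip : Ship
{
    public DateTime Date { get; set; }
}

public static class DiscriminatorSketch
{
    // Same registrations as ApplyMappings, guarded because registering twice throws
    public static void ApplyMappings()
    {
        if (BsonClassMap.IsClassMapRegistered(typeof(Ship))) return;
        BsonClassMap.RegisterClassMap<Ship>(cm =>
        {
            cm.AutoMap();
            cm.SetDiscriminatorIsRequired(true);
        });
        BsonClassMap.RegisterClassMap<DecommissionedShip>(cm =>
        {
            cm.AutoMap();
            cm.SetDiscriminatorIsRequired(true);
        });
    }

    // Serializes with Ship as the nominal type, as a Ship-typed collection does
    public static BsonDocument ToShipDocument(Ship ship) => ship.ToBsonDocument<Ship>();

    public static void Main()
    {
        ApplyMappings();
        Console.WriteLine(ToShipDocument(new Ship { Name = "Donnager" }).ToJson());
        Console.WriteLine(ToShipDocument(
            new DecommissionedShip { Name = "Canterbury", Date = new DateTime(2350, 1, 1) }).ToJson());
    }
}
```

Both printed documents should carry a “_t” element, with the class name as its value.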

When retrieving the data from a collection typed to Ship, as in this GetShips method:

private static List<Ship> GetShips ()
{
  // db is assumed to be an IMongoDatabase field initialized elsewhere
  var coll = db.GetCollection<Ship> ("Ships");
  return coll.AsQueryable ().ToList ();
}

the API reads the discriminators and materializes both the Ship and DecommissionedShip objects with all of their data intact, including the Date assigned to the DecommissionedShip.
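If you want only one type back, the collection’s OfType method adds the discriminator filter for you. Here’s a minimal sketch, assuming a local server and documents stored with the _t discriminators shown above; OfTypeSketch and GetDecommissionedShips are hypothetical names, and the classes are repeated so the listing stands alone.

```csharp
using System;
using System.Collections.Generic;
using MongoDB.Driver;

public class Character
{
    public string Name { get; set; }
    public string Bio { get; set; }
}

public class Ship
{
    public Ship() { Characters = new List<Character>(); }
    public string Name { get; set; }
    public Guid Id { get; set; }
    public List<Character> Characters { get; set; }
}

public class DecommissionedShip : Ship
{
    public DateTime Date { get; set; }
}

public static class OfTypeSketch
{
    // OfType narrows the Ship-typed collection to documents whose _t
    // discriminator identifies a DecommissionedShip
    public static List<DecommissionedShip> GetDecommissionedShips(IMongoDatabase db)
    {
        var ships = db.GetCollection<Ship>("Ships");
        return ships.OfType<DecommissionedShip>()
                    .Find(FilterDefinition<DecommissionedShip>.Empty)
                    .ToList();
    }

    public static void Main()
    {
        var db = new MongoClient().GetDatabase("ExpanseDatabase");
        foreach (var s in GetDecommissionedShips(db))
        {
            Console.WriteLine($"{s.Name} decommissioned {s.Date:d}");
        }
    }
}
```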

Another path for mapping is to use a BsonDocument typed collection object that isn’t dependent on a particular type. Check my blog post, “A Few Coding Patterns with the MongoDB C# API”, to see how to use BsonDocuments, as well as how to encapsulate the MongoClient, Database and Collection for more readable code.
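As a taste of the BsonDocument approach, here’s a sketch that reads the same Ships collection with no C# classes at all, assuming a local server with the data inserted earlier. BsonDocumentSketch and DescribeShips are hypothetical names. Because nothing is typed, the code has to check for fields such as _t itself.

```csharp
using System;
using System.Collections.Generic;
using MongoDB.Bson;
using MongoDB.Driver;

public static class BsonDocumentSketch
{
    // Reads the Ships collection without any C# class; fields are accessed by name
    public static List<string> DescribeShips(IMongoDatabase db)
    {
        var coll = db.GetCollection<BsonDocument>("Ships");
        // Only documents that actually have a Name field
        var filter = Builders<BsonDocument>.Filter.Exists("Name");
        var descriptions = new List<string>();
        foreach (var doc in coll.Find(filter).ToList())
        {
            // _t is present only on documents stored with a discriminator
            var kind = doc.Contains("_t") ? doc["_t"].ToString() : "(no discriminator)";
            descriptions.Add($"{doc["Name"]}: {kind}");
        }
        return descriptions;
    }

    public static void Main()
    {
        var db = new MongoClient().GetDatabase("ExpanseDatabase");
        DescribeShips(db).ForEach(Console.WriteLine);
    }
}
```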

Use LINQ for Querying

You can retrieve documents with the .NET API using API methods or LINQ. The API provides a very rich Find method (similar to the shell’s find method), which returns a cursor. You can pass in filters, project properties and return objects using one of its execution or aggregation methods, many of which look like LINQ methods. The .NET API’s Find method requires a filter, so to get any and all documents, you can pass a new, empty BsonDocument, which matches every document. For LINQ, you first need to transform a collection to an IQueryable (using AsQueryable()) and then use the familiar LINQ methods to filter, sort and execute. If you didn’t include or map the _id property in your classes, you’ll need to use projection logic to take that into account, because _id will be returned from the query. You can refer to the documentation or to other articles (such as the great series by Peter Mbanugo at bit.ly/2Lqqw2J) to learn more of these details.
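Here are both styles side by side in a minimal sketch, assuming a local server and the Ship classes from earlier (repeated so the listing compiles on its own); QuerySketch and its method names are hypothetical. AllShips uses the empty-filter approach, ShipsNamed builds a typed filter with Builders<Ship>.Filter.Eq, and ShipsNamedLinq expresses the same query with LINQ operators that the driver translates for the server.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using MongoDB.Bson;
using MongoDB.Driver;

public class Character
{
    public string Name { get; set; }
    public string Bio { get; set; }
}

public class Ship
{
    public Ship() { Characters = new List<Character>(); }
    public string Name { get; set; }
    public Guid Id { get; set; }
    public List<Character> Characters { get; set; }
}

// Defined so documents stored with _t "DecommissionedShip" can be materialized
public class DecommissionedShip : Ship
{
    public DateTime Date { get; set; }
}

public static class QuerySketch
{
    // Find requires a filter; an empty BsonDocument matches every document
    public static List<Ship> AllShips(IMongoCollection<Ship> coll) =>
        coll.Find(new BsonDocument()).ToList();

    // Find with a strongly typed filter, a sort and a limit
    public static List<Ship> ShipsNamed(IMongoCollection<Ship> coll, string name) =>
        coll.Find(Builders<Ship>.Filter.Eq(s => s.Name, name))
            .SortBy(s => s.Name)
            .Limit(10)
            .ToList();

    // The same query through LINQ: AsQueryable, then familiar operators
    public static List<Ship> ShipsNamedLinq(IMongoCollection<Ship> coll, string name) =>
        coll.AsQueryable()
            .Where(s => s.Name == name)
            .OrderBy(s => s.Name)
            .Take(10)
            .ToList();

    public static void Main()
    {
        var coll = new MongoClient().GetDatabase("ExpanseDatabase").GetCollection<Ship>("Ships");
        Console.WriteLine($"All ships: {AllShips(coll).Count}");
        foreach (var s in ShipsNamedLinq(coll, "Donnager"))
        {
            Console.WriteLine(s.Id);
        }
    }
}
```

Which style to use is largely a matter of taste; the builder style exposes more MongoDB-specific options, while LINQ keeps the query code familiar.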

Switching to Azure Cosmos DB

After you’ve worked out your persistence logic locally against the MongoDB instance, eventually you’ll want to move it to the cloud-based Cosmos DB. Visual Studio Code’s Azure Cosmos DB extension makes it easy to create a new Cosmos DB account if you don’t have one yet, although you’ll likely want to tweak its settings in the portal. If you’re using Visual Studio, the Cloud Explorer for VS2017 extension has features for browsing but not creating databases, so in that case you’ll need to use the Azure CLI or work in the portal.

Here’s how you can create a new instance with the Azure Cosmos DB API for MongoDB from scratch using the VS Code extension.

First, you’ll need VS Code to be connected to your Azure subscription. This requires the Azure Account extension (bit.ly/2k1phdp), which, once installed, will help you connect. Once you’re connected, the Cosmos DB extension will display your subscriptions and any existing databases. As with the local MongoDB connection shown in Figure 1, you can drill into your Cosmos DB accounts, databases, collections and documents (Cosmos DB calls the last two containers and items).

To create a brand-new account, click the plus sign at the top of the extension’s explorer. The workflow is similar to attaching the local MongoDB instance, as I did earlier. You’ll be prompted to enter an account name; I’ll use datapointsmongodbs. Next, choose MongoDB from the available APIs and either create a new Azure resource group or choose an existing one to hold the account. I created a new one so that I can cleanly delete the resource group and the test database as needed. After this, select the datacenter region where the account should be hosted. I live in the eastern United States, so I’ll pick the East US location. Because Cosmos DB is a globally distributed database, you can control the use of regions in the portal or other apps, but I won’t need that for my demo. At this point, you’ll need to wait a few minutes while the account is created.

Once the new account is created, it will show up in the explorer. Right-click the account and select the “Copy Connection String” option. You can use this to change the MongoDB driver to point to the Azure Cosmos DB instance instead of pointing to the default local instance, as I’ve done here:

var connString=
  "mongodb://datapointsmongodbs:****.documents.azure.com:10255/?ssl=true";
ExpanseDb=new MongoClient(connString).GetDatabase("ExpanseDatabase");
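Hardcoding the connection string is fine for a quick test, but a sketch like the following keeps the account key out of source code and lets the same code target either the local instance or Cosmos DB. It assumes a hypothetical environment variable named COSMOS_MONGO_CONNSTRING; CosmosConnectionSketch is also a hypothetical name, and the TLS 1.2 setting mirrors the pattern in Azure’s .NET samples for the API for MongoDB.

```csharp
using System;
using System.Security.Authentication;
using MongoDB.Driver;

public static class CosmosConnectionSketch
{
    public static IMongoDatabase GetExpanseDatabase()
    {
        // Hypothetical variable name; holds the string copied with "Copy Connection String"
        var connString = Environment.GetEnvironmentVariable("COSMOS_MONGO_CONNSTRING")
                         ?? "mongodb://localhost:27017"; // fall back to the local instance

        var settings = MongoClientSettings.FromUrl(new MongoUrl(connString));
        // Used only when the string enables SSL (ssl=true), as the Cosmos DB one does
        settings.SslSettings = new SslSettings { EnabledSslProtocols = SslProtocols.Tls12 };

        // No server round trip happens here; connections open on the first operation
        return new MongoClient(settings).GetDatabase("ExpanseDatabase");
    }

    public static void Main()
    {
        var db = GetExpanseDatabase();
        Console.WriteLine(db.DatabaseNamespace.DatabaseName);
    }
}
```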

I’ll run a refactored version of the method that inserts a new Ship and a new DecommissionedShip into the Ships collection of the ExpanseDatabase. After I refresh the explorer, it displays the newly created database, collection and documents in my Azure Cosmos DB account, as shown in Figure 2.

Figure 2 The Newly Created Database, Collection and Documents in the Cosmos DB Database

Not a MongoDB Expert, but a Better Understanding of Multi-Model

The availability of this API is not meant to convince users like me with decades of experience with SQL to switch to using MongoDB for my Cosmos DB databases. There would be so much for me to learn. The real goal is to enable the myriad developers and teams who already use MongoDB to have a familiar experience while gaining from the many benefits that Azure Cosmos DB has to offer. I undertook my own exploration into the Azure Cosmos DB API for MongoDB to gain a better understanding of the Cosmos DB multi-model capability, as well as to have a little fun checking out a new database. And, hopefully, my experience here will provide some high-level guidance for other developers or clients in the future.


Julie Lerman is a Microsoft Regional Director, Microsoft MVP, software team coach and consultant who lives in the hills of Vermont. You can find her presenting on data access and other topics at user groups and conferences around the world. She blogs at thedatafarm.com/blog and is the author of “Programming Entity Framework,” as well as a Code First and a DbContext edition, all from O’Reilly Media. Follow her on Twitter: @julielerman.

Thanks to the following technical expert for reviewing this article: Nuri Halperin (Plus N Consulting)

