January 2019

Volume 34 Number 1

[Data Points]

A Peek at the EF Core Cosmos DB Provider Preview

By Julie Lerman | January 2019

The EF Core Cosmos DB provider is in preview. All information is subject to change.

Regular readers of this column will know that I do a lot of work with Entity Framework (EF) and the newer EF Core, and that I’m also a big fan of Azure Cosmos DB. At first glance, you might think that an Object Relational Mapper (ORM) and a document database such as Cosmos DB would have nothing to do with one another. ORMs are designed to solve the problem of mapping objects to relational databases. With a document database or another type of NoSQL database, you can just store objects and graphs of objects as JSON and not have to worry about the constraints of a relational database. So why would I, the EF team or other developers be putting the two together?

Why Use an ORM with a NoSQL Database?

When the earliest betas of EF Core arrived in the form of EF7, they included a proof-of-concept provider to interact with Azure Table Storage, the only NoSQL database on Azure at the time. After trying that out, I realized an important advantage that EF7 provided, and that same advantage exists now with EF Core and Azure Cosmos DB. The clients for interacting with the database, such as the .NET client or the Node.js SDK, require a whole lot of setup code to identify the database, the containers and the query objects. (Note that upcoming versions of the clients will reduce this complexity.) But with EF Core, the first thing I noticed was that I didn’t have to write all of that extra code. I just provided a connection string and was good to go. In fact, while acknowledging that this data store isn’t relational, I don’t have to focus on that while doing perfunctory tasks like querying or saving data. I get to take all of my experience of interacting with data stores using EF Core and just point them to a different data store. The provider takes care of the interpretation of queries and the creation and reading of the JSON documents that get stored into the database.

While the provider can also create databases and containers using conventions, it’s still up to you to create the Cosmos DB accounts for the databases and determine their configuration, as well as tune each database to improve performance and reduce costs. These are tasks you perform with any relational database, so it’s no different when pointing EF Core to an Azure Cosmos DB database.

The new Cosmos DB provider for EF Core comes as a preview with EF Core 2.2. It’s expected to be fully released with EF Core 3.0. But I’ve been curious about using it ever since I tested out the proof of concept for Azure Table Storage more than two years ago, so I decided not to wait for the full release to check it out. And I’m sure many of you have been curious about it, as well.

Cosmos DB Client Tools

The Azure Portal has a great Data Explorer for viewing your Cosmos DB databases, containers and documents, but sometimes you don’t want to go back and forth from your IDE to a Web site. Both Visual Studio Code and Visual Studio 2017 have great extensions for viewing and editing documents in your Cosmos DB databases. VS Code has the Azure Cosmos DB extension (bit.ly/2SuTXmS) and Visual Studio has the Cloud Explorer for Visual Studio 2017 extension (bit.ly/2G3SNxj). The Azure Cosmos DB extension also lets you work with pre-existing Cosmos DB accounts to create and remove databases and containers on the fly.

Getting the Provider into Your Solution

The provider works like any other EF Core provider. You reference its package in your project and then specify it in OnConfiguring or, if you’re using ASP.NET Core, when defining the DbContext in Startup.

The provider is named Microsoft.EntityFrameworkCore.Cosmos. (The name has been shortened from earlier versions, if you were scratching your head over that.) And as with any package, there are a number of ways to add it to your project. In Visual Studio 2017, you can use the Package Manager. At the command line, you can add it with:

dotnet add package Microsoft.EntityFrameworkCore.Cosmos

Or if you want to simply open up the project file, you can add it in to an ItemGroup section with the following PackageReference:

<ItemGroup>
  <PackageReference Include="Microsoft.EntityFrameworkCore.Cosmos"/>
</ItemGroup>

Once you’ve added the package, you need to tell your DbContext to use the provider. I have a DbContext named ExpanseContext that maps my simplistic model of a TV show/book series I’m fond of: “The Expanse,” to my data store—an Azure Cosmos DB database.

As with any of the other providers, this package will give you the UseCosmos extension method on the DbContextOptionsBuilder. From there you need to provide the three key elements of an Azure Cosmos DB connection string: the AccountEndpoint value, the Key value and the name of the database.

You can copy the connection string from within the Azure portal, the Azure Cosmos DB extension for VS Code or the Cloud Explorer for Visual Studio 2017 extension. The format required by UseCosmos isn’t the same format as the connection string, but the connection string does give you a head start. You need to supply the three values as three comma-separated parameters. Figure 1 shows an example where I’m configuring the connection directly in the OnConfiguring method of the ExpanseContext class.

Figure 1 Specifying the Cosmos DB Connection in the DbContext Class

using Expanse.Classes;
using Microsoft.EntityFrameworkCore;
public class ExpanseContext : DbContext
{
  public DbSet<Consortium> Consortia { get; set; }
  public DbSet<Planet> Planets { get; set; }
  public DbSet<Character> Characters { get; set; }
  protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
  {
    // Account endpoint, account key and database name, in that order
    optionsBuilder.UseCosmos(
      "https://lermandatapoints.documents.azure.com:443",
      "theverylongaccesskeygoeshere",
      "ExpanseCosmosDemo"
    );
  }
}

The connection parameters are:

  • My account endpoint
  • A substitute for my key value and
  • The name of the database, ExpanseCosmosDemo
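Because the copied connection string and the UseCosmos parameters differ in shape, a small helper can pull the individual values out of the string. The following is a minimal sketch, not part of the provider: the class name CosmosConnectionStringHelper is hypothetical, and it assumes the connection string uses the semicolon-delimited "AccountEndpoint=...;AccountKey=...;" layout. It splits each pair on the first equals sign only, since account keys are base64 text that typically ends with "=" padding.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class CosmosConnectionStringHelper
{
    // Parses "Name=Value;Name=Value;" pairs into a case-insensitive dictionary.
    public static Dictionary<string, string> Parse(string connectionString)
    {
        var parts = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        var segments = connectionString.Split(
            new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (var segment in segments)
        {
            // Split on the first '=' only; the key value may contain '=' itself.
            var index = segment.IndexOf('=');
            if (index <= 0)
            {
                continue; // Skip anything that isn't a Name=Value pair.
            }
            var name = segment.Substring(0, index).Trim();
            var value = segment.Substring(index + 1).Trim();
            parts[name] = value;
        }
        return parts;
    }
}

// Possible use inside OnConfiguring (the database name is supplied separately):
//   var parts = CosmosConnectionStringHelper.Parse(connectionString);
//   optionsBuilder.UseCosmos(
//     parts["AccountEndpoint"], parts["AccountKey"], "ExpanseCosmosDemo");
```

In a real application, the key would normally come from configuration or a secret store rather than a string literal in the context class.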

If you’re building an ASP.NET Core app, you can configure the context in the ConfigureServices method of the Startup class as follows:

services.AddDbContext<ExpanseContext>(options =>
  options.UseCosmos(
    "https://lermandatapoints.documents.azure.com:443",
    "theverylongaccesskeygoeshere",
    "ExpanseCosmosDemo"
  ));

There are some other things you can configure with respect to the Cosmos DB database, but I’d like to show you some of the basic, default behavior first. Part two of this article will demonstrate additional configurations.

The Model Classes

You’ll need to be familiar with my model, shown in Figure 2. The Expanse tale is about two competing Consortiums, the United Nations and the Martian Congressional Republic. I’ve dumbed it down to reduce the complexity of changing allegiances or of characters who just go totally rogue and shun both consortiums. Respecting that in the model would be a Domain-Driven Design (DDD) lesson.

Figure 2 The Expanse Object Model for My Demo

using System;
using System.Collections.Generic;
public class Consortium
{
  public Consortium()
  {
    Ships = new List<Ship>();
    Stations = new List<Station>();
  }
  public Guid ConsortiumId { get; set; }
  public string Name { get; set; }
  public List<Ship> Ships { get; set; }
  // Station is one of the classes omitted from this listing
  public List<Station> Stations { get; set; }
  public Origin Origin { get; set; }
}
public class Planet
{
  public Guid PlanetId { get; set; }
  public string PlanetName { get; set; }
}
public class Ship
{
  public Guid ShipId { get; set; }
  public string ShipName { get; set; }
  public Guid PlanetId { get; set; }
  public Origin Origin { get; set; }
}
public class Origin
{
  public DateTime Date { get; set; }
  public string Location { get; set; }
}

This leaves me with a handful of simplistic classes and, again, for the sake of focusing on the behavior of the provider, I’ve left these as CRUD classes with no real business logic. I’ll skip the more interesting Space Station and character classes as I won’t be working with them in the demo.

In addition to configuring the provider, the context class specifies DbSets for both the Consortium and Ship entities. Here’s the full listing of my context class, ExpanseContext:

public class ExpanseContext : DbContext
{
  public DbSet<Consortium> Consortia{get;set;}
  public DbSet<Ship> Ships { get; set; }
  protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
  {
    optionsBuilder.UseCosmos(
      "https://lermandatapoints.documents.azure.com:443",
      "theverylongaccesskeygoeshere",
      "ExpanseCosmosDemo");
  }
}

Letting EF Core Create the Cosmos DB Database

An Azure Cosmos DB database is tied to a Cosmos DB account and each database stores its documents in various subsets called containers. You may know these as collections in Cosmos DB, but the terminology has recently changed. The documents are now referred to as items. Don’t think of containers as relational database tables, though. They are very different. The documentation refers to them as “the unit of scalability for both provisioned throughput and storage of items.” Document databases don’t predefine data structures, so it’s possible to store differently structured documents in a single container. If you’re new to document databases, check out my earlier article, “What the Heck Are Document Databases?” at msdn.com/magazine/hh547103. Because Cosmos DB is designed for storing large amounts of data, determining the balance between different containers and partitions is an important process. You should have some familiarity with it before embarking in order to manage performance and costs. You can, of course, fine-tune these aspects as you learn more about how your data is used. I recommend watching the video, “Modeling Data and Best Practices for the Azure Cosmos DB SQL API,” at bit.ly/2FZtIDs.

The EF Core Database.EnsureCreated method can create an Azure Cosmos DB database, as well as any needed containers in Azure. If you’re using the Windows-based Cosmos DB emulator (bit.ly/2sHNsAn), you can target a local version of the database during development. You’ll need to have an existing Cosmos DB account in advance. Most of the important decisions you need to make that impact performance and costs are made when creating an Azure Cosmos DB account and provisioning the throughput on containers. The fact that EF Core might be creating new databases or containers doesn’t force you to accept some unknown defaults, but there are two defaults to be aware of: first, that EF Core will store all of the documents in a single partition and, second, that it will create a container using the name of your DbContext, storing all of the documents into that container. The reason for this is that the number of containers you have in your database impacts the cost. Therefore, if you don’t already have a plan for distributing your items, you can start with everything in one container and one partition, then configure additional partitioned containers as needed to improve performance, again striving for that balance between performance and cost.

My demo application is just a console app. Its Main method triggers a call to my CreateDB method, which calls EF Core’s EnsureCreated:

private static void CreateDB()
  {
    using(var context=new ExpanseContext())
    {
      context.Database.EnsureCreated();
    }
  }

As I’m using the defaults, the first time it’s run EnsureCreated will create the new database and a single container, named ExpanseContext. If you explicitly add other containers down the road, EnsureCreated will recognize that the database and initial container already exist and just create the new container for you. I’ll look more closely at this in part two of the article.

Storing Data into a Container

Your own code’s interaction with EF Core is the same as with any other database. What’s interesting, however, is how EF Core interacts with Cosmos DB under the covers.

For example, here’s a method to create a single Consortium object, attach it to the context and call SaveChanges:

private static void AddObject () {
  var consortium = new Consortium { ConsortiumId = Guid.NewGuid (),
    Name = "Martian Congressional Republic" };
  using (var context = new ExpanseContext ()) {
    context.Consortia.Add (consortium);
    context.SaveChanges ();
  }
}

The code is the same as you’d write for a relational database provider. But in the underlying SaveChanges method, EF Core will transform that object to a JSON object and it’s the JSON data that gets stored into the database.

But before sending it to Cosmos DB, EF Core adds two special properties to the JSON object. One is a property named id whose value is a GUID. The id will ensure each item in the container has a truly unique id even if they represent different entities. The other added property is named Discriminator and contains the name of the entity type this data represents. The discriminator enables EF Core to distinguish among the entity types the items represent. EF Core is able to create these extra properties, known only to the DbContext, thanks to its shadow properties feature (bit.ly/2PjUq9k). This also means that when you query data, the shadow properties will be returned to the DbContext but ignored when materializing the entity.
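To make the role of the discriminator concrete, here is a minimal sketch of how a value like Discriminator lets differently shaped items in one container be routed back to the right type. This is not the provider's implementation. It uses System.Text.Json purely for illustration, the DiscriminatorSketch class is hypothetical, and the two classes are trimmed-down stand-ins for the model types in this article.

```csharp
using System;
using System.Text.Json;

// Trimmed-down stand-ins for the article's model classes.
public class Consortium
{
    public Guid ConsortiumId { get; set; }
    public string Name { get; set; }
}

public class Ship
{
    public Guid ShipId { get; set; }
    public string ShipName { get; set; }
}

// Hypothetical helper, for illustration only.
public static class DiscriminatorSketch
{
    // Reads the "Discriminator" value and deserializes the item to the
    // matching type. JSON properties the class doesn't declare (such as
    // id and Discriminator) are ignored by the serializer by default.
    public static object Materialize(string json)
    {
        using (var document = JsonDocument.Parse(json))
        {
            var discriminator =
                document.RootElement.GetProperty("Discriminator").GetString();

            switch (discriminator)
            {
                case "Consortium":
                    return JsonSerializer.Deserialize<Consortium>(json);
                case "Ship":
                    return JsonSerializer.Deserialize<Ship>(json);
                default:
                    throw new NotSupportedException(
                        $"Unknown discriminator: {discriminator}");
            }
        }
    }
}
```

The point of the sketch is only that the type name travels with the data. With EF Core, the DbContext handles this routing for you.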

The id shadow property doesn’t relieve the EF Core requirement that every entity needs a key property. The conventions are the same as always. EF Core still looks for a key property named Id or [entity]Id and, in my case, that’s the ConsortiumId property. And you can always override that convention with mappings.

If you work with relational databases, you may be used to relying on the fact that they can generate the key values for you. Cosmos DB is not a relational database and won’t populate ConsortiumId or the other keys. However, EF Core will supply values for any missing keys that are GUIDs. It doesn’t have a key generator for integers, though, and keeping track of incrementing integers is no fun. So whether you plan to supply the keys or let EF Core do it, I’d highly recommend using GUIDs. My AddObject method creating the ConsortiumId value does this, even though I don’t have a true use case for doing so since I’m not using the value elsewhere. It’s just for demonstration. If I didn’t supply that value when instantiating Consortium, EF Core would’ve done it for me. Keep in mind that if you’re using the HasData method for seeding data, just as with the other providers, you’re required to specify the values of primary and foreign key properties.

Figure 3 shows the item that was stored into the container as a result of the AddObject method. The first four properties came from EF Core. But what about the others? Cosmos DB always adds a number of metadata properties that it uses under the covers. However, when you query that data with EF Core, none of those metadata properties are returned to EF Core’s DbContext.

Figure 3 The Item Created for the New Consortium Displayed by the Cosmos DB Extension
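For readers without the screenshot, the general shape of such an item can be sketched as follows. This is an illustration rather than captured output: the GUIDs and the Cosmos DB metadata values are placeholders, and the property order is assumed from the items shown in Figure 4. Only the Name and Discriminator values follow directly from the AddObject method.

```json
{
  "ConsortiumId": "<GUID assigned in AddObject>",
  "Name": "Martian Congressional Republic",
  "Discriminator": "Consortium",
  "id": "<GUID generated by EF Core>",
  "_rid": "<Cosmos DB metadata>",
  "_self": "<Cosmos DB metadata>",
  "_etag": "<Cosmos DB metadata>",
  "_attachments": "attachments/",
  "_ts": 0
}
```

The first four properties are the ones supplied by EF Core. The underscore-prefixed properties are the metadata that Cosmos DB adds.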

What About Graphs of Data?

A consortium can have one or more ships. If I create a new consortium with a ship, my code might look like this:

// Name is the property defined on the Consortium class in Figure 2
var consortium = new Consortium { ConsortiumId = Guid.NewGuid(),
  Name = "United Nations" };
consortium.Ships.Add(
  new Ship { ShipId = Guid.NewGuid(), ShipName = "Canterbury" });

After adding the consortium graph to the context and calling SaveChanges, two new items will get added to the container, as shown in Figure 4.

Figure 4 The Items Created Based on a New Consortium Graph Containing a Ship

{
  "ConsortiumId": "09bf2c04-e951-41d7-b890-ea5bc27b5766",
  "ConsortiumName": "United Nations Thursay",
  "Discriminator": "Consortium",
  "id": "fa479b49-144f-47ee-9761-e4f6dfe94cb2",
  "_rid": "Q0wDAKsiftgBAAAAAAAAAA==",
  "_self": "dbs/Q0wDAA==/colls/Q0wDAKsiftg=/docs/Q0wDAKsiftgBAAAAAAAAAA==/",
  "_etag": "\"000058c7-0000-0000-0000-5bf80dcf0000\"",
  "_attachments": "attachments/",
  "_ts": 1542983119
}
{
  "ShipId": "581a5c65-8df7-4479-8626-9d8fd2b1c4c7",
  "ConsortiumId": "09bf2c04-e951-41d7-b890-ea5bc27b5766",
  "Discriminator": "Ship",
  "PlanetId": 0,
  "ShipName": "Canterbury 3rd",
  "id": "ebc2dcda-efb5-451b-a65d-f6fa0bb011a4",
  "Origin": null,
  "_rid": "Q0wDAKsiftgCAAAAAAAAAA==",
  "_self": "dbs/Q0wDAA==/colls/Q0wDAKsiftg=/docs/Q0wDAKsiftgCAAAAAAAAAA==/",
  "_etag": "\"000059c7-0000-0000-0000-5bf80dd00000\"",
  "_attachments": "attachments/",
  "_ts": 1542983120
}

Notice that even though I didn’t define a ConsortiumId foreign key property in the Ship class, EF Core knows it will be needed to keep track of the relationship, so it did what it’s always done by applying the foreign key value, using its knowledge that the ship object belongs to the consortium object. Depending on my business logic, I often use and control the foreign key properties in my related types. But in case you don’t, it’s nice to see that the provider will take care of that for you.

But what EF Core doesn’t do is create a single JSON document with a ship as a subdocument of the consortium. That’s because both types have key properties and are true entities in my entity data model as defined by the ExpanseContext, so they’ll always be represented as individual documents.

EF Core does, however, understand the concept of hierarchical JSON objects and you can see this in action when you use Owned Entities in your model. I’ll look more closely at owned entities with this provider in the second article.

Retrieving Data from Cosmos DB

I’ll cover one more topic in this column, with a quick look at data retrieval.

To retrieve data, you can just write LINQ queries as with any other provider. EF Core will use the Cosmos DB SQL API to transform the LINQ queries for getting the items.

And as EF Core does with other providers, any shadow properties defined explicitly by you or implicitly by the provider (such as the Discriminator and id properties) will always be returned to the context as part of the entry. That allows EF Core to materialize the correct object types and maintain the individual identities of each object.

You can see this by querying the ChangeTracker Entries after querying data from the database as I’ve done here:

private static void GetSomeDataBack () {
  using (var context = new ExpanseContext ()) {
    var consortia = context.Consortia.ToList ();
    var entries = context.ChangeTracker.Entries ().ToList ();
  }
}

By drilling into the properties of one of the entries, you can see there are five properties, even though Consortium only has two, ConsortiumId and Name. Notice there’s even metadata explaining that they’re shadow properties. One final shadow property is __jObject, which contains the string representation of the full JSON object of the entry.
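If you'd rather not drill through the debugger, the same information can be written to the console. The following is a minimal sketch: the EntryInspector class and its method name are hypothetical, and it relies only on the ChangeTracker.Entries method and the Properties, Metadata and CurrentValue members of the entry types. The exact list of properties it prints depends on the provider version.

```csharp
using System;
using Microsoft.EntityFrameworkCore;

// Hypothetical helper, for illustration only.
public static class EntryInspector
{
    // Writes each tracked entity's properties, including shadow
    // properties such as Discriminator and id, to the console.
    public static void DumpTrackedProperties(DbContext context)
    {
        foreach (var entry in context.ChangeTracker.Entries())
        {
            Console.WriteLine(entry.Entity.GetType().Name);
            foreach (var property in entry.Properties)
            {
                Console.WriteLine(
                    $"  {property.Metadata.Name} = {property.CurrentValue}");
            }
        }
    }
}
```

Calling EntryInspector.DumpTrackedProperties(context) after the ToList call in GetSomeDataBack would list the properties for every Consortium the query returned.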

You can load related data using eager loading with Include or projections, as well as explicit loading. I tested out the proxy-based lazy loading, which didn’t return the related data, and have been told that the dependency injection-based lazy loading isn’t functional at this time, either. 

The three methods in Figure 5 show working examples of loading related ships using Include, a projection and explicit loading, along with a little bit of filtering, as well. The code is no different than if you were querying against a relational database provider.

Figure 5 Loading Related Data Works Just the Same as with RDBMS Providers

private static void EagerLoadInclude () {
  using (var context = new ExpanseContext ()) {
    var consortia = context.Consortia.Include (c => c.Ships).ToList ();
  }
}
private static void EagerLoadProjection () {
  using (var context = new ExpanseContext ()) {
    var consortia =
      context.Consortia.Select (c => new { c, c.Ships }).ToList ();
  }
}
private static void ExplicitLoad () {
  using (var context = new ExpanseContext ()) {
    var consortium =
      context.Consortia.FirstOrDefault (c => c.Name.Contains ("United"));
    context.Entry (consortium).Collection (c => c.Ships).Load ();
  }
}

Work with Cosmos DB Using Your Existing EF Core Knowledge

Even though Cosmos DB is a completely different type of data store—a document database that stores JSON documents—and is not at all like a relational database, you can leverage your existing knowledge of EF Core to work with it to store and retrieve data. But like any database—relational or not—you’ll still need to work outside of EF Core to make sure that you’re using the database efficiently and cost-effectively.

In part two of this article, I’ll look at some more advanced features, like configuring containers and partitions, integrating owned entities into the mix and using logging to check out some of the SQL generated from the API. And perhaps by then a new version of the preview will be available to engage with additional functionality. If you want to keep an eye on the progress of the provider, there’s a great “hit list” of features to work on and consider in the EF Core GitHub repository at bit.ly/2rmUpYN.


Julie Lerman is a Microsoft Regional Director, Microsoft MVP, software team coach and consultant who lives in the hills of Vermont. You can find her presenting on data access and other topics at user groups and conferences around the world. She blogs at thedatafarm.com/blog and is the author of “Programming Entity Framework,” as well as a Code First and a DbContext edition, all from O’Reilly Media. Follow her on Twitter: @julielerman and see her Pluralsight courses at bit.ly/PS-Julie.

Thanks to the following Microsoft technical experts for reviewing this article: Andriy Svyryd, Diego Vega

