01/31/2019

April 2016

Volume 31 Number 4

[Data Points]

Handling the State of Disconnected Entities in EF

Julie Lerman

Disconnected data is an old problem that precedes Entity Framework and, for that matter, most data access tools. It’s never been an easy one to solve. The server sends data down the wire, not knowing what may happen to it in the client app that requested it, not even knowing if it will return. Then, suddenly, some data reappears in a request. But is it the same data? What was it up to in its absence? Did anything happen to it? Is it completely new data? So much to worry about!

As .NET developers, you’ve likely seen patterns for solving this problem. Remember ADO.NET DataSets? Not only did they contain your data, but they encapsulated all of the change state information for each row and each column. This wasn’t limited to “it was modified” or “it is new”; the original data was kept, as well. When we started building ASMX Web services, it was so easy to serialize a dataset and send it down the wire. If that message went to a .NET client, that client could deserialize the dataset and continue to keep track of changes. When it was time to return the data to the service, you would just serialize it again and then, on the server side, deserialize it back into a dataset with all of that lovely change-tracking information intact to be easily persisted to the database. It worked. It was so easy. But it entailed such enormous amounts of data going back and forth across the wire. Not just the data bits, but the structure of the dataset getting serialized created big fat XML.

The size of the serialized message going back and forth across the wire was only one problem. The beauty of Web services was that you could provide services to a variety of platforms, but the message itself was meaningful only to another .NET application. In 2005, Scott Hanselman wrote a great wake-up call to the problem, epically titled, “Returning DataSets from WebServices Is the Spawn of Satan and Represents All That Is Truly Evil in the World” (bit.ly/1TlqcB8).

All of that state information on the wire disappeared when Entity Framework replaced DataSets as the primary data access tool in .NET. Rather than being stored with the data, change-tracking information (original value, current value, state) was stored by EF as part of the ObjectContext. Even so, in the first iteration of EF, a serialized entity was a cumbersome message because it had to inherit from the EF EntityObject type, and the message going back and forth across the wire with the entity data had lost its understanding of state. Those of us who were used to the overloaded DataSet freaked out. Those who were already familiar with handling disconnected state were upset for another reason: the EntityObject base class requirement. Eventually that problem won the EF team’s attention (a very good turn of events) and with the next iteration, EF4, EF had evolved to Plain Old CLR Object (POCO) support. This meant that the ObjectContext could maintain the state of a simple class with no need for that class to inherit from EntityObject.

But with EF4, the disconnected state problem did not go away. EF had no clue about the state of an entity it was not able to track. People familiar with DataSets expected EF to provide the same magical solution and were unhappy about having to choose between a lightweight message and disconnected change tracking. In the meantime, developers (including me) had explored a lot of ways to inform a server about what happened to the data while it was on its walkabout. You could re-read the data from the database and let EF do a comparison to work out what had changed, if anything. You could make presumptions such as “if the identity key value is 0, it must be new.” You could troll around in the low-level APIs to write code to make discoveries about state and act upon them. I did a lot of that back in the day, but none of those solutions were satisfying.

When EF4.1 came out with its lighter-weight DbContext, it had a gift from the EF team—the ability to easily inform the context about the state of the entity. With a class that inherits from DbContext, you can write code such as:

myContext.Entry(someEntity).State = EntityState.Modified;

When someEntity is new to the context, this forces the context to begin tracking the entity and, at the same time, specifies its state. That’s enough for EF to know what type of SQL command to compose upon SaveChanges. In the preceding example, it would result in an UPDATE command. Entry().State doesn’t help with the problem of knowing the state when some data comes over the wire, but it does allow you to implement a nice pattern that’s now in wide use by developers using Entity Framework, which I’ll lay out further along in this article.

Even though the next version of Entity Framework, EF Core (the framework formerly known as EF7), will bring more consistency for working with disconnected graphs, the pattern you’ll learn in this article should still be useful in your bag of tricks.

The problem with disconnected data escalates as graphs of data get passed back and forth. One of the biggest problems is when those graphs contain objects of mixed state, with the server having no default way of detecting the varying states of entities it has received. If you use DbSet.Add, the entities will all get marked Added by default. If you use DbSet.Attach, they’ll be marked Unchanged. This is the case even if any of the data originated from the database and has a key property populated. EF follows the instructions, that is, Add or Attach. EF Core will give us an Update method, which will follow the same behavior as Add, Attach and Remove, but mark the entities as Modified. One exception to be aware of is that if the DbContext is already tracking an entity, it won’t overwrite the entity’s known state. But with a disconnected app, I wouldn’t expect the context to be tracking anything prior to connecting the data returned from a client.

Testing the Default Behavior

Let’s clarify the default behavior in order to highlight the problem. To demonstrate, I’ve got a simple model (available in the download) with a few related classes: Ninja, NinjaEquipment and Clan. A Ninja can have a collection of NinjaEquipment and be associated with a single Clan. The test that follows involves a graph with a new Ninja and a pre-existing, un-edited Clan. Keep in mind that I would normally assign a value to Ninja.ClanId to avoid confusion with reference data. In fact, setting foreign keys rather than navigation properties is a practice that can help you avoid a lot of problems due to the “magic” of EF working out state across relationships. (See my April 2013 column [bit.ly/20XVxQi], “Why Does Entity Framework Reinsert Existing Objects into My Database?” to learn more about that.) But I’m writing the code this way to demonstrate the behavior of EF. Notice that the clan object has its key property, Id, populated to indicate that it’s pre-existing data that came from the database:

[TestMethod]
public void EFDoesNotComprehendMixedStatesWhenAddingUntrackedGraph() {
  var ninja = new Ninja();
  ninja.Clan = new Clan { Id = 1 };
  using (var context = new NinjaContext()) {
    context.Ninjas.Add(ninja);
    var entries = context.ChangeTracker.Entries();
    OutputState(entries);
    Assert.IsFalse(entries.Any(e => e.State != EntityState.Added));
  }
}

My OutputState method iterates through DbEntityEntry objects where the context retains the state information for each tracked entity and prints out the type and value of its State.
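The helper itself isn't listed here, so the following is only a guess at its shape, assuming EF6 (the EntityFramework package) and output written with Debug.WriteLine to match the "Debug Trace" sections shown in the test results. The TestHelpers class name is hypothetical; the real helper is in the download and may differ.

```csharp
using System.Collections.Generic;
using System.Data.Entity.Infrastructure; // EF6: DbEntityEntry
using System.Diagnostics;

// Hypothetical reconstruction of the test helper described above.
public static class TestHelpers
{
    public static void OutputState(IEnumerable<DbEntityEntry> entries)
    {
        foreach (var entry in entries)
        {
            // Writes one line per tracked entity,
            // in the form "Namespace.TypeName:State".
            Debug.WriteLine(string.Format("{0}:{1}",
                entry.Entity.GetType().FullName, entry.State));
        }
    }
}
```

Because ChangeTracker.Entries() returns IEnumerable<DbEntityEntry>, the result of that call can be passed straight in.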

In the test, I emulate the scenario that, somewhere, I’ve created a new Ninja and associated it with the existing Clan. The clan is simply reference data and has not been edited. Then I create a new context and use the DbSet.Add method to tell EF to track this graph. I assert that none of the entities being tracked are anything but Added. When the test passes, it proves the context didn’t comprehend that the Clan was Unchanged. The test output tells me that EF thinks both entities are Added:

Result StandardOutput:
Debug Trace:
EF6WebAPI.Models.Ninja:Added
EF6WebAPI.Models.Clan:Added

As a result, calling SaveChanges will insert both the Ninja and the Clan, resulting in a duplicate of the Clan. If I had used the DbSet.Attach method instead, both entities would be marked Unchanged and SaveChanges wouldn’t insert the new Ninja into the database, causing real problems with the data persistence.

Another common scenario is retrieving a Ninja and its Equipment from the database and passing them to a Client. The Client then edits one of the pieces of equipment and adds a new one. The true state of the entities is that the Ninja is Unchanged, one piece of Equipment is Modified and another is Added. Neither DbSet.Add nor DbSet.Attach will comprehend the varying states without some help. So now it’s time to apply some help.

Informing EF of Each Entity’s State

The simple recipe for helping EF comprehend the correct state of each entity in a graph consists of a four-part solution:

Define an enum representing possible object states.
Create an interface with an ObjectState property defined by the enum.
Implement the interface in the domain entities.
Override the DbContext SaveChanges to read object state and inform EF.

EF has an EntityState enum with the enumerators Unchanged, Added, Modified and Deleted. I’ll create another enum to be used by my domain classes. This one mimics those four states, but has no ties to the Entity Framework APIs:

public enum ObjectState
{
  Unchanged,
  Added,
  Modified,
  Deleted
}

Unchanged is first so it will be the default. If you want to specify the values, be sure that Unchanged is equal to zero (0).
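A minimal sketch of why the ordering matters: a property of an enum type that is never assigned holds the member whose value is 0. The Probe and StateDefaults names are hypothetical and exist only for this illustration.

```csharp
using System;

public enum ObjectState
{
    Unchanged, // implicitly 0, so this is the default value
    Added,
    Modified,
    Deleted
}

// Hypothetical class used only to show the default.
public class Probe
{
    public ObjectState State { get; set; }
}

public static class StateDefaults
{
    public static void Main()
    {
        var probe = new Probe();
        // State was never assigned, so it holds default(ObjectState).
        Console.WriteLine(probe.State == ObjectState.Unchanged); // True
        Console.WriteLine((int)ObjectState.Unchanged);           // 0
    }
}
```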

Next, I’ll create an interface to expose a property to track the state of objects using this enum. You might prefer to create a base class or add this to a base class you’re already using:

public interface IObjectWithState
{
  ObjectState State { get; set; }
}

This State property is for in-memory use only and doesn’t need to be persisted to the database. I’ve updated the NinjaContext to ensure that the property is ignored for any objects that implement it:

protected override void OnModelCreating(DbModelBuilder modelBuilder) {
  modelBuilder.Types<IObjectWithState>().Configure(c => c.Ignore(p=>p.State));
}

With the interface defined, I can implement it in my classes, for example, in the Ninja class shown in Figure 1.

Figure 1 Ninja Class Implementing IObjectWithState

public class Ninja : IObjectWithState
{
  public Ninja() {
    EquipmentOwned = new List<NinjaEquipment>();
  }
  public int Id { get; set; }
  public string Name { get; set; }
  public bool ServedInOniwaban { get; set; }
  public Clan Clan { get; set; }
  public int ClanId { get; set; }
  public List<NinjaEquipment> EquipmentOwned { get; set; }
  public ObjectState State { get; set; }
}

With my default ObjectState enum defined as Unchanged, every Ninja will begin Unchanged and anyone coding with the Ninja class will be responsible for setting the State value as needed.

If relying on the client to set state is a problem, another approach, which is influenced by Domain-Driven Design practices, can ensure that the Ninja object is more involved in its behavior and state. Figure 2 shows a much more richly defined version of the Ninja class. Note that:

The Create factory methods both set the State to Added.
I’ve hidden the setters of the properties.
I’ve created methods to change properties where the State is set to Modified if it isn’t a new Ninja (that is, the state isn’t already set to Added).

Figure 2 A Smarter Ninja Class

public class Ninja : IObjectWithState
{
  public static Ninja CreateIndependent(string name,
    bool servedinOniwaban) {
    var ninja = new Ninja(name, servedinOniwaban);
    ninja.State = ObjectState.Added;
    return ninja;
  }
  public static Ninja CreateBoundToClan(string name,
    bool servedinOniwaban, int clanId) {
    var ninja = new Ninja(name, servedinOniwaban);
    ninja.ClanId = clanId;
    ninja.State = ObjectState.Added;
    return ninja;
  }
  public Ninja(string name, bool servedinOniwaban) {
    EquipmentOwned = new List<NinjaEquipment>();
    Name = name;
    ServedInOniwaban = servedinOniwaban;
  }
  // EF needs parameterless ctor for queries
  private Ninja(){}
  public int Id { get; private set; }
  public string Name { get; private set; }
  public bool ServedInOniwaban { get; private set; }
  public Clan Clan { get; private set; }
  public int ClanId { get; private set; }
  public List<NinjaEquipment> EquipmentOwned { get; private set; }
  public ObjectState State { get; set; }
  public void ModifyOniwabanStatus(bool served) {
    ServedInOniwaban = served;
    SetModifedIfNotAdded();
  }
  private void SetModifedIfNotAdded() {
    if (State != ObjectState.Added) {
      State = ObjectState.Modified;
    }
  }
  public void SpecifyClan(Clan clan) {
    Clan = clan;
    ClanId = clan.Id;
    SetModifedIfNotAdded();
  }
  public void SpecifyClan(int id) {
    ClanId = id;
    SetModifedIfNotAdded();
  }
  public NinjaEquipment AddNewEquipment(string equipmentName) {
    return NinjaEquipment.Create(Id, equipmentName);
  }
  public void TransferEquipmentFromAnotherNinja(NinjaEquipment equipment) {
    equipment.ChangeOwner(this.Id);
  }
  public void EquipmentNoLongerExists(NinjaEquipment equipment) {
    equipment.State = ObjectState.Deleted;
  }
}

I’ve modified the NinjaEquipment type to be richer, as well, and you can see that I benefit from that in the AddNew, Transfer and NoLongerExists equipment methods. The modification ensures that the foreign keys pointing back to the Ninja are persisted correctly or, in the case of equipment being destroyed, that it gets deleted completely from the database according to the business rules of this particular domain. Tracking relationship changes when reconnecting graphs to EF is a little trickier, so I like that I can keep tight control over the relationships at the domain level. For example, the ChangeOwner method sets the State to Modified:

public NinjaEquipment ChangeOwner(int newNinjaId) {
  NinjaId = newNinjaId;
  State = ObjectState.Modified;
  return this;
}

Now, whether the client explicitly sets the state or uses classes like this (or similarly coded classes in the language of the client) on the client side, the objects passed back into the API or service will have their state defined.
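To make the state transitions concrete, here is a small sketch that runs without EF. It uses a trimmed-down stand-in for the Figure 2 class, keeping only the members needed for the demonstration, and StateTransitionDemo is a hypothetical name. An object built through the public constructor stands in for a Ninja that came from the database, since its State is left at the default.

```csharp
using System;

public enum ObjectState { Unchanged, Added, Modified, Deleted }

public interface IObjectWithState
{
    ObjectState State { get; set; }
}

// Trimmed-down stand-in for the Figure 2 class.
public class Ninja : IObjectWithState
{
    public static Ninja CreateIndependent(string name, bool servedInOniwaban)
    {
        var ninja = new Ninja(name, servedInOniwaban);
        ninja.State = ObjectState.Added;
        return ninja;
    }

    public Ninja(string name, bool servedInOniwaban)
    {
        Name = name;
        ServedInOniwaban = servedInOniwaban;
    }

    public string Name { get; private set; }
    public bool ServedInOniwaban { get; private set; }
    public ObjectState State { get; set; }

    public void ModifyOniwabanStatus(bool served)
    {
        ServedInOniwaban = served;
        // A new object stays Added; anything else becomes Modified.
        if (State != ObjectState.Added)
        {
            State = ObjectState.Modified;
        }
    }
}

public static class StateTransitionDemo
{
    public static void Main()
    {
        // Created through the factory, then edited: still an insert.
        var fresh = Ninja.CreateIndependent("new ninja", true);
        fresh.ModifyOniwabanStatus(false);
        Console.WriteLine(fresh.State);    // Added

        // Stands in for a Ninja loaded from the database, then edited.
        var existing = new Ninja("existing ninja", false);
        existing.ModifyOniwabanStatus(true);
        Console.WriteLine(existing.State); // Modified
    }
}
```

Keeping Added "sticky" matters because an object that was never saved has nothing to update; it still needs an INSERT, however many times it is edited before it reaches the server.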

Now it’s time to leverage that client-side state in the server-side code.

Once I connect the object or object graph to the context, the context will need to read the state of each object. This ConvertState method will take an ObjectState enum and return the matching EntityState enum:

public static EntityState ConvertState(ObjectState state) {
  switch (state) {
    case ObjectState.Added:
      return EntityState.Added;
    case ObjectState.Modified:
      return EntityState.Modified;
    case ObjectState.Deleted:
      return EntityState.Deleted;
    default:
      return EntityState.Unchanged;
  }
}

Next, I need a method in the NinjaContext class to iterate through the entities—just before EF saves the data—and update the context’s understanding of each entity’s state according to the State property of the object. That method, shown here, is called FixState:

public class NinjaContext : DbContext
{
  public DbSet<Ninja> Ninjas { get; set; }
  public DbSet<Clan> Clans { get; set; }
  public void FixState() {
    foreach (var entry in ChangeTracker.Entries<IObjectWithState>()) {
      IObjectWithState stateInfo = entry.Entity;
      entry.State = DataUtilities.ConvertState(stateInfo.State);
    }
  }
}

I considered calling FixState from inside of SaveChanges so it would be totally automated, but there could be side effects in a number of scenarios. For example, if you use your IObjectWithState entities in a connected application that doesn’t bother setting the local state, FixState will always revert entities to Unchanged. It’s better to leave it as a method to be executed explicitly. In “Programming Entity Framework: DbContext,” a book I co-authored with Rowan Miller, we discuss some additional edge cases that might be of interest.

Now, I’ll create a new version of the previous test that uses these new features, including the richer versions of my classes in the test. The new test asserts that EF comprehends mixed states for a brand-new Ninja tied to an existing Clan. The test method prints out the EntityState before and after calling NinjaContext.FixState:

[TestMethod]
public void EFComprehendsMixedStatesWhenAddingUntrackedGraph() {
  var ninja = Ninja.CreateIndependent("julie", true);
  ninja.SpecifyClan(new Clan { Id = 1, ClanName = "Clan from database" });
  using (var context = new NinjaContext()) {
    context.Ninjas.Add(ninja);
    var entries = context.ChangeTracker.Entries();
    OutputState(entries);
    context.FixState();
    OutputState(entries);
    Assert.IsTrue(entries.Any(e => e.State == EntityState.Unchanged));
  }
}

The test passes and the output shows that the FixState method applied the proper state to the Clan. If I were to call SaveChanges, that Clan wouldn’t be reinserted into the database by mistake:

Debug Trace:
Before:EF6Model.RichModels.Ninja:Added
Before:EF6Model.RichModels.Clan:Added
After:EF6Model.RichModels.Ninja:Added
After:EF6Model.RichModels.Clan:Unchanged

Using this pattern also solves the problem of the Ninja graph I discussed earlier, where the Ninja might not have been edited but any number of changes (inserts, modifications or deletes) might have been made to the equipment. Figure 3 shows a test that checks to see if EF correctly identifies that one of the entries is modified.

Figure 3 Testing State of Children in a Graph

[TestMethod]
public void MixedStatesWithExistingParentAndVaryingChildrenisUnderstood() {
  // Arrange
    var ninja = Ninja.CreateIndependent("julie", true);
    var pNinja =new PrivateObject(ninja);
    pNinja.SetProperty("Id", 1);
    var originalOwnerId = 99;
    var equip = NinjaEquipment.Create(originalOwnerId, "arrow");
  // Act
    ninja.TransferEquipmentFromAnotherNinja(equip);
    using (var context = new NinjaContext()) {
      context.Ninjas.Attach(ninja);
      var entries = context.ChangeTracker.Entries();
      OutputState(entries);
      context.FixState();
      OutputState(entries);
  // Assert 
    Assert.IsTrue(entries.Any(e => e.State == EntityState.Modified));
  }
}

The test passes and the output shows that the original Attach method resulted in all objects marked Unchanged. After calling FixState, the Ninja is Unchanged (which is still correct), but the equipment object is correctly set to Modified:

Debug Trace:
Before:EF6Model.RichModels.Ninja:Unchanged
Before:EF6Model.RichModels.NinjaEquipment:Unchanged
After:EF6Model.RichModels.Ninja:Unchanged
After:EF6Model.RichModels.NinjaEquipment:Modified

What About EF Core?

Even as I move to EF Core, I’ll keep this pattern in my toolbox. Great strides have been made toward simplifying the problems of disconnected graphs, mostly along the lines of providing consistent patterns. In EF Core, setting state using the DbContext.Entry().State property will only ever set the state of the root of the graph. This will be advantageous in many scenarios. Additionally, there’s a new method called TrackGraph that will “walk the graph,” hitting every entity within, and apply a specified function to each entity. The most obvious function is one that simply sets state:

context.ChangeTracker.TrackGraph(Samurai_GK,
  e => e.Entry.State = EntityState.Added);

Imagine having that function be one that uses the aforementioned FixState method to apply the EF state based on the ObjectState set on the client side.
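One way that idea might look is sketched below, assuming the Microsoft.EntityFrameworkCore package is referenced. The GraphStateExtensions class and the TrackGraphWithClientState method are hypothetical names, and treating entities that don't implement IObjectWithState as Unchanged is my assumption rather than anything EF Core prescribes.

```csharp
using Microsoft.EntityFrameworkCore;

public enum ObjectState { Unchanged, Added, Modified, Deleted }

public interface IObjectWithState
{
    ObjectState State { get; set; }
}

// Hypothetical helper combining TrackGraph with the ObjectState pattern.
public static class GraphStateExtensions
{
    public static void TrackGraphWithClientState(
        this DbContext context, object rootEntity)
    {
        // TrackGraph calls the callback for each untracked entity it
        // reaches; the callback decides that entity's state.
        context.ChangeTracker.TrackGraph(rootEntity, node =>
        {
            var withState = node.Entry.Entity as IObjectWithState;
            node.Entry.State = withState == null
                ? EntityState.Unchanged // assumption: no client state means untouched
                : ConvertState(withState.State);
        });
    }

    public static EntityState ConvertState(ObjectState state)
    {
        switch (state)
        {
            case ObjectState.Added:
                return EntityState.Added;
            case ObjectState.Modified:
                return EntityState.Modified;
            case ObjectState.Deleted:
                return EntityState.Deleted;
            default:
                return EntityState.Unchanged;
        }
    }
}
```

With a helper like this, a call such as context.TrackGraphWithClientState(ninja) would attach the graph and apply the client-supplied states in one pass, so no separate FixState step is needed afterward.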

Rich Domain Models Simplify Controlling State in the Client

While I prefer building the richer domain classes that update the state as needed, you can achieve the same results with simple CRUD classes as long as the client using the classes explicitly sets the states. With a manual method, however, you’ll have to pay closer attention to modified relationships, ensuring you account for the foreign key modifications.

I’ve been using this pattern for years, and sharing it in books, at conferences, with clients and in my Pluralsight courses. And I know it’s being happily used in many software solutions. Whether you’re using EF5 or EF6, or gearing up for EF Core, this recipe should remove a huge layer of pain related to your disconnected data.

Self-Tracking Entities

Another feature of EF4 was a T4 template that generated “self-tracking entities,” which turned those newly freed POCOs back into weighed-down beasts. Self-tracking entities were designed specifically for scenarios where Windows Communication Foundation (WCF) services served data to .NET clients. I was never a fan of self-tracking entities and was happy when they quietly disappeared from EF. However, some developers relied on them. And there are some APIs that will give you these benefits. For example, Tony Sneed built a lighter-weight implementation called “trackable entities,” which you can find at trackableentities.github.io. IdeaBlade (ideablade.com) has a rich history of solving disconnected data problems with its flagship product, DevForce, which includes EF support. IdeaBlade took that knowledge and created the free and open source Breeze.js and Breeze# products, which provide client- and server-side state tracking, as well. I’ve written about Breeze previously in this column, in the December 2012 (bit.ly/1WpN0z3) and April 2014 (bit.ly/1Ton1Kg) issues.

Julie Lerman is a Microsoft MVP, .NET mentor and consultant who lives in the hills of Vermont. You can find her presenting on data access and other .NET topics at user groups and conferences around the world. She blogs at thedatafarm.com/blog and is the author of “Programming Entity Framework,” as well as a Code First and a DbContext edition, all from O’Reilly Media. Follow her on Twitter: @julielerman and see her Pluralsight courses at juliel.me/PS-Videos.

Thanks to the following Microsoft technical expert for reviewing this article: Rowan Miller
