June 2019

Volume 34 Number 6

[ASP.NET Core 3.0]

AI-Powered Biometric Security in ASP.NET Core

By Stefano Tempesta

This article, in two parts, introduces the policy-based authorization model in ASP.NET Core 3, which aims to decouple authorization logic from the underlying user roles. It presents a specific example of this authorization process based on biometric information, such as face or voice recognition. In this case, access to a building is restricted when an unauthorized intrusion is detected. The severity of the intrusion is assessed by an anomaly detection service built into Azure Machine Learning.

Site Access

The context is a highly secure site: think of a military area, a hospital or a datacenter. Access is restricted to authorized people, with additional limitations that I'll describe shortly. The following steps describe the security flow enforced at the door of each building to check people in:

  1. A person requesting access to a building swipes their access pass on the door’s card reader.
  2. Cameras detect motion and capture the face and body of the person; this should prevent the use of a printed photo, for example, to trick the camera with face-only recognition.
  3. The card reader and cameras are registered as Internet of Things (IoT) devices and stream recorded data to Azure IoT Hub.
  4. Microsoft Cognitive Services compares the person against a database of people authorized to access the building.
  5. An authorization flow matches the biometric information collected by the IoT devices with the identity of the person on the access pass.
  6. An Azure Machine Learning service is invoked to assess the risk level of the access request, and whether it’s an unauthorized intrusion.
  7. Authorization is granted by an ASP.NET Core Web API by checking the specific policy requirements against the profile identified in the previous steps.

If there’s a mismatch between the detected identity of the person and the access pass, access to the site is blocked immediately. Otherwise, the flow continues by checking whether any of the following anomalies have been encountered:

  • Atypical frequency of access to the building.
  • Whether the person has exited the building earlier (check out).
  • Number of accesses permitted per day.
  • Whether the person is on duty.
  • Criticality of the building (you may not want to restrict access to a canteen, but enforce a stricter policy for access to a server datacenter).
  • Whether the person is bringing someone or something else along.
  • Past occurrences of similar access typologies to the same building.
  • Risk-level changes measured in the past.
  • Number of intrusions detected in the past.

The anomaly detection service runs in Azure Machine Learning and returns a score expressing the likelihood that the access attempt is a deviation from the standard. The score falls in a range between zero and one, where zero means “no risk detected,” all good, full trust granted; and one means “red alert,” block access immediately! The risk level of each building determines the threshold between those extremes above which access is denied.
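To make the gating logic concrete, here's a minimal sketch of how a score might be evaluated against a building's threshold. The method and its parameters are hypothetical, not part of the actual solution:

// Hypothetical gate: compare the anomaly score returned by the machine
// learning service (0 = no risk, 1 = red alert) against the risk
// threshold configured for the building
public static bool IsAccessAllowed(double anomalyScore, double buildingThreshold)
{
  // Any score above the building's acceptable threshold blocks access
  return anomalyScore <= buildingThreshold;
}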

Authorization in ASP.NET Core

ASP.NET Core provides a simple, declarative role-based authorization model, as well as a richer policy-based model. Authorization is expressed in requirements, and handlers evaluate a user’s claims against those requirements. For the purpose of authorizing users to access a site, I’ll describe how to generate custom policy requirements and their authorization handler. For more information about the authorization model in ASP.NET Core, please refer to the documentation at bit.ly/2UYZaJh.

As noted, a custom policy-based authorization mechanism consists of requirements and (typically) an authorization handler. Granting access to a building consists of invoking an API that unlocks the entry door. IoT devices stream biometric information to an Azure IoT Hub, which in turn triggers the verification workflow by posting the site ID, a unique identifier of the site. If authorization is successful, the Web API POST method returns an HTTP 200 code and a JSON message with the user name and site ID. Otherwise, it returns the expected HTTP 401 Unauthorized error code.

But let’s go in order. I begin with the Startup class of the Web API, specifically the ConfigureServices method, which contains the instructions for configuring the services required to run the ASP.NET Core application. The authorization policies are added by calling the AddAuthorization method on the services object, which accepts the policies that an API action must satisfy for its execution to be authorized. I need only one policy in this case, which I call “AuthorizedUser.” This policy, however, has several requirements to meet, which reflect the biometric characteristics of the person I want to verify: face, body and voice.

The three requirements are each represented by a specific class that implements the IAuthorizationRequirement interface, as shown in Figure 1. When listing the requirements for the AuthorizedUser policy, I also specify the confidence level required for meeting each requirement. As I noted earlier, this value, between zero and one, expresses the accuracy of the identification of the respective biometric attribute. I’ll get back to this later when discussing biometric recognition with Cognitive Services.

Figure 1 Configuration of the Authorization Requirements in the Web API

public void ConfigureServices(IServiceCollection services)
{
  var authorizationRequirements = new List<IAuthorizationRequirement>
  {
    new FaceRecognitionRequirement(confidence: 0.9),
    new BodyRecognitionRequirement(confidence: 0.9),
    new VoiceRecognitionRequirement(confidence: 0.9)
  };
  services
    .AddAuthorization(options =>
    {
      options.AddPolicy("AuthorizedUser", policy => policy.Requirements =
        authorizationRequirements);
    });
}

The AuthorizedUser authorization policy contains multiple authorization requirements, and all requirements must pass in order for the policy evaluation to succeed. In other words, multiple authorization requirements added to a single authorization policy are treated on an AND basis.
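Incidentally, the same policy can be defined with the AuthorizationPolicyBuilder’s AddRequirements method, rather than assigning the Requirements property directly; the following sketch is equivalent:

services.AddAuthorization(options =>
{
  options.AddPolicy("AuthorizedUser", policy => policy.AddRequirements(
    new FaceRecognitionRequirement(confidence: 0.9),
    new BodyRecognitionRequirement(confidence: 0.9),
    new VoiceRecognitionRequirement(confidence: 0.9)));
});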

The three policy requirements that I implemented in the solution are all classes that implement the IAuthorizationRequirement interface. This interface is actually empty; that is, it doesn’t dictate the implementation of any method. I’ve implemented the three requirements consistently by specifying a public ConfidenceScore property for capturing the expected level of confidence that the recognition API should meet for considering this requirement successful. The FaceRecognitionRequirement class looks like this:

public class FaceRecognitionRequirement : IAuthorizationRequirement
{
  public double ConfidenceScore { get; }
  public FaceRecognitionRequirement(double confidence) =>
    ConfidenceScore = confidence;
}

The requirements for body and voice recognition are implemented similarly, in the BodyRecognitionRequirement and VoiceRecognitionRequirement classes, respectively.
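For completeness, the two classes follow exactly the same pattern as FaceRecognitionRequirement:

public class BodyRecognitionRequirement : IAuthorizationRequirement
{
  public double ConfidenceScore { get; }
  public BodyRecognitionRequirement(double confidence) =>
    ConfidenceScore = confidence;
}

public class VoiceRecognitionRequirement : IAuthorizationRequirement
{
  public double ConfidenceScore { get; }
  public VoiceRecognitionRequirement(double confidence) =>
    ConfidenceScore = confidence;
}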

Authorization to execute a Web API action is controlled through the Authorize attribute. At its simplest, applying AuthorizeAttribute to a controller or action limits access to that controller or action to any authenticated user. The Web API that controls access to a site exposes a single access controller, which contains only the Post action. This action is authorized if all requirements in the specified “AuthorizedUser” policy are met:

[ApiController]
[Route("api/[controller]")] // [ApiController] requires attribute routing
public class AccessController : ControllerBase
{
  [HttpPost]
  [Authorize(Policy = "AuthorizedUser")]
  public IActionResult Post([FromBody] string siteId)
  {
    var response = new
    {
      User = HttpContext.User.Identity.Name,
      SiteId = siteId
    };
    return new JsonResult(response);
  }
}
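For reference, here’s how the IoT-triggered workflow might invoke this endpoint. The host name is a placeholder, and the route assumes the attribute routing shown above:

// Hypothetical client call to the Access Web API
public async Task RequestAccessAsync()
{
  using (var client = new HttpClient())
  {
    var content = new StringContent(
      "\"building-01\"", Encoding.UTF8, "application/json");
    HttpResponseMessage response = await client.PostAsync(
      "https://<access-api-host>/api/access", content);
    // 200 OK with user name and site ID on success;
    // 401 Unauthorized if any policy requirement fails
  }
}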

Each requirement is managed by an authorization handler, like the one in Figure 2, which is responsible for the evaluation of a policy requirement. You can choose to have a single handler for all requirements, or a separate handler for each one. The latter approach is more flexible, as it lets you easily configure a gradient of authorization requirements in the Startup class. The face, body and voice requirement handlers extend the AuthorizationHandler<TRequirement> abstract class, where TRequirement is the requirement to be handled. Because I want to evaluate three requirements, I need a custom handler extending AuthorizationHandler for each of FaceRecognitionRequirement, BodyRecognitionRequirement and VoiceRecognitionRequirement.

Each handler overrides the HandleRequirementAsync method, which determines whether an authorization requirement is met. Because this method is asynchronous, it doesn’t return a value; it simply signals that the task has completed. Handling authorization consists of marking a requirement as “successful” by invoking the Succeed method on the authorization handler context. Whether the requirement is met is actually verified by a “recognizer” object, which uses the Cognitive Services API internally (more in the next section). The recognition action, performed by the Recognize method, obtains the name of the identified person and returns a score that expresses the confidence level of the identification, from zero (least accurate) to one (most accurate). An expected level was specified in the API setup; you can tune this threshold to whatever is appropriate for your solution.

Figure 2 The Custom Authorization Handler

public class FaceRequirementHandler :
  AuthorizationHandler<FaceRecognitionRequirement>
{
  protected override Task HandleRequirementAsync(
    AuthorizationHandlerContext context,
    FaceRecognitionRequirement requirement)
  {
    // With endpoint routing in ASP.NET Core 3.0, the authorization
    // resource is the current HttpContext
    string siteId =
      (context.Resource as HttpContext).Request.Query["siteId"];
    IRecognition recognizer = new FaceRecognition();
    // The requirement is met only if the recognition confidence
    // reaches the threshold configured in Startup
    if (recognizer.Recognize(siteId, out string name) >=
      requirement.ConfidenceScore)
    {
      // Attach the identified person's name as an identity claim
      context.User.AddIdentity(new ClaimsIdentity(
        new GenericIdentity(name)));
      context.Succeed(requirement);
    }
    return Task.CompletedTask;
  }
}

Besides evaluating the specific requirement, the authorization handler also adds an identity claim to the current user. When an identity is created, it may be assigned one or more claims issued by a trusted party. A claim is a name-value pair that represents what the subject is. In this case, I’m assigning the identity claim to the user in context. This claim is then retrieved in the Post action of the Access controller and returned as part of the API’s response.

The last step to perform to enable this custom authorization process is the registration of the handler within the Web API. Handlers are registered in the services collection during configuration:

services.AddSingleton<IAuthorizationHandler, FaceRequirementHandler>();
services.AddSingleton<IAuthorizationHandler, BodyRequirementHandler>();
services.AddSingleton<IAuthorizationHandler, VoiceRequirementHandler>();

This code registers each requirement handler as a singleton using the built-in dependency injection (DI) framework in ASP.NET Core. A single instance of each handler is created the first time it’s requested and reused for the lifetime of the application, with DI injecting it wherever an IAuthorizationHandler is required.
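For the policy to be evaluated at run time, the authorization middleware must also be part of the request pipeline. A minimal Configure method for ASP.NET Core 3.0 would look something like this, assuming authentication is configured elsewhere:

public void Configure(IApplicationBuilder app)
{
  app.UseRouting();
  // Authentication must run before authorization
  app.UseAuthentication();
  app.UseAuthorization();
  app.UseEndpoints(endpoints => endpoints.MapControllers());
}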

Face Identification

The solution uses the Azure Cognitive Services Vision APIs to identify a person’s face and body. For more information about Cognitive Services and details on the API, please visit bit.ly/2sxsqry.

The Vision API provides face attribute detection and face verification. Face detection refers to the ability to detect human faces in an image. The API returns the rectangle coordinates of the location of the face within the processed image and, optionally, can extract a series of face-related attributes such as head pose, gender, age, emotion, facial hair and glasses. Face verification, in contrast, performs an authentication of a detected face against a person’s pre-saved face. Practically, it evaluates whether two faces belong to the same person. This is the specific API I use in this security project. To get started, add the following NuGet package to your Visual Studio solution: Microsoft.Azure.CognitiveServices.Vision.Face 2.2.0-preview.

The .NET managed package is in preview, so make sure that you check the “Include prerelease” option when browsing NuGet, as shown in Figure 3.

Figure 3 NuGet Package for the Face API

Using the .NET package, face detection and recognition are straightforward. Broadly speaking, face recognition describes the work of comparing two different faces to determine if they’re similar or belong to the same person. The recognition operations mostly use the data structures listed in Figure 4.

Figure 4 Data Structures for the Face API

  • DetectedFace: A single face representation retrieved by the face detection operation. Its ID expires 24 hours after it’s created.
  • PersistedFace: When DetectedFace objects are added to a group (such as FaceList or Person), they become PersistedFace objects, which can be retrieved at any time and don’t expire.
  • FaceList/LargeFaceList: An assorted list of PersistedFace objects. A FaceList has a unique ID, a name string and, optionally, a user data string.
  • Person: A list of PersistedFace objects that belong to the same person. It has a unique ID, a name string and, optionally, a user data string.
  • PersonGroup/LargePersonGroup: An assorted list of Person objects. It has a unique ID, a name string and, optionally, a user data string. A PersonGroup must be trained before it can be used in recognition operations.

The verification operation takes a face ID from the list of faces detected in an image (the DetectedFace collection) and determines whether the face belongs to a known person by comparing the ID against a collection of persisted faces (PersistedFace). Persisted face images that have a unique ID and a name identify a Person. A group of persons can, optionally, be gathered in a PersonGroup in order to improve recognition performance. Basically, a person is a basic unit of identity, and the person object can have one or more known faces registered. Each person is defined within a particular PersonGroup, a collection of people, and identification is performed against a PersonGroup.

The security system would create one or more PersonGroup objects and then associate people with them. Once a group is created, it must be trained before an identification can be performed using it. Moreover, it has to be retrained after adding or removing any person, or if any person has their registered face edited. The training is done by the PersonGroup Train API. When using the client library, this is simply a call to the PersonGroup.TrainAsync method:

await faceClient.PersonGroup.TrainAsync(personGroupId);

Training is an asynchronous process. It may not be finished even after the TrainAsync method returns, so you may need to query the training status with the PersonGroup.GetTrainingStatusAsync method until it reports success, before proceeding with face identification or verification.
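Putting the enrollment pieces together, the following sketch creates a group, registers a person with one face image and trains the group. The group ID, person name and image path are placeholders, and the method names assume the 2.2.0-preview client library:

string personGroupId = "<Person Group ID>";
// Create the group of people authorized to access the site
await faceClient.PersonGroup.CreateAsync(personGroupId, "Authorized People");
// Register a person and one of their face images
Person person = await faceClient.PersonGroupPerson.CreateAsync(
  personGroupId, "<Person Name>");
using (Stream image = File.OpenRead("<Image Path>"))
{
  await faceClient.PersonGroupPerson.AddFaceFromStreamAsync(
    personGroupId, person.PersonId, image);
}
// Train the group and poll until training completes
await faceClient.PersonGroup.TrainAsync(personGroupId);
TrainingStatus status;
do
{
  await Task.Delay(1000);
  status = await faceClient.PersonGroup.GetTrainingStatusAsync(personGroupId);
} while (status.Status == TrainingStatusType.Running ||
  status.Status == TrainingStatusType.Nonstarted);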

When performing face verification, the Face API computes the similarity of a detected face among all the faces within a group, and returns the most comparable person(s) for that test face. This is done through the IdentifyAsync method of the client library. The test face needs to be detected using the aforementioned steps, and its face ID is then passed to the IdentifyAsync method, along with the ID of the PersonGroup to search. Multiple face IDs can be identified at once, and the result will contain all the identification results. By default, Identify returns only the one person that best matches each test face. If you prefer, you can specify the optional maxNumOfCandidatesReturned parameter to let Identify return more candidates. The code in Figure 5 demonstrates the process of identifying and verifying a face:

Figure 5 The Face Recognition Process

public class FaceRecognition : IRecognition
{
  public double Recognize(string siteId, out string name)
  {
    FaceClient faceClient = new FaceClient(
      new ApiKeyServiceClientCredentials("<Subscription Key>"))
    {
      Endpoint = "<API Endpoint>"
    };
    ReadImageStream(siteId, out Stream imageStream);
    // Detect faces in the image
    IList<DetectedFace> detectedFaces =
      faceClient.Face.DetectWithStreamAsync(imageStream).Result;
    // Expect exactly one face; zero or multiple detected faces are rejected
    if (detectedFaces.Count != 1)
    {
      name = string.Empty;
      return 0;
    }
    IList<Guid> faceIds = detectedFaces.Select(f => f.FaceId.Value).ToList();
    // Identify faces
    IList<IdentifyResult> identifiedFaces =
      faceClient.Face.IdentifyAsync(faceIds, "<Person Group ID>").Result;
    // No faces identified
    if (identifiedFaces.Count == 0)
    {
      name = string.Empty;
      return 0;
    }
    // Get the first candidate (candidates are ranked by confidence)
    IdentifyCandidate candidate =
      identifiedFaces.Single().Candidates.FirstOrDefault();
    // No candidate matched the detected face
    if (candidate == null)
    {
      name = string.Empty;
      return 0;
    }
    // Find the person in the same person group
    Person person = faceClient.PersonGroupPerson.GetAsync(
      "<Person Group ID>", candidate.PersonId).Result;
    name = person.Name;
    return candidate.Confidence;
  }
}

First, you need to obtain a client object for the Face API by passing your subscription key and the API endpoint. You can obtain both values from the Azure Portal, where you provisioned the Face API service. You then detect any face visible in an image, passed as a stream to the DetectWithStreamAsync method of the client’s Face object, which implements the detection and verification operations of the Face API. From the detected faces, I ensure that exactly one is present, and obtain its ID, the temporary identifier of the detected face. The IdentifyAsync method then performs the identification of the detected face within a PersonGroup, a collection of the registered faces of all people authorized to access that site, and returns a list of best matches, or candidates, sorted by confidence level. With the person ID of the first candidate, I retrieve the person’s name, which is eventually returned to the Access Web API. The face authorization requirement is met.

Voice Recognition

The Azure Cognitive Services Speaker Recognition API provides algorithms for speaker verification and speaker identification. Voices have unique characteristics that can be used to identify a person, just like a fingerprint. The security solution in this article uses voice as a signal for access control, where the subject says a pass phrase into a microphone registered as an IoT device. Just as with face recognition, voice recognition also requires a pre-enrollment of authorized people. The Speaker API calls an enrolled person a “Profile.” When enrolling a profile, the speaker’s voice is recorded saying a specific phrase, then a number of features are extracted and the chosen phrase is recognized. Together, both extracted features and the chosen phrase form a unique voice signature. During verification, an input voice and phrase are compared against the enrollment’s voice signature and phrase, in order to verify whether they’re from the same person and the phrase is correct.
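Enrollment itself happens over REST. As a rough sketch only (the endpoint paths, payloads and response fields here are assumptions based on the v1.0 Speaker Recognition API, and should be checked against the API reference), creating and enrolling a verification profile looks something like this:

// Sketch: create a verification profile, then enroll the speaker's voice.
// Paths and payloads are assumptions; verify them against the API reference.
public async Task<Guid> EnrollAsync(Stream audioStream)
{
  // Create a new verification profile for the speaker
  var create = new StringContent(
    "{ \"locale\": \"en-us\" }", Encoding.UTF8, "application/json");
  HttpResponseMessage createResponse = await _httpClient.PostAsync(
    "spid/v1.0/verificationProfiles", create);
  createResponse.EnsureSuccessStatusCode();
  dynamic profile = JsonConvert.DeserializeObject(
    await createResponse.Content.ReadAsStringAsync());
  Guid profileId = Guid.Parse((string)profile.verificationProfileId);
  // Submit the recorded pass phrase (the service requires several
  // enrollments before the profile can be used for verification)
  HttpResponseMessage enrollResponse = await _httpClient.PostAsync(
    $"spid/v1.0/verificationProfiles/{profileId}/enroll",
    new StreamContent(audioStream));
  enrollResponse.EnsureSuccessStatusCode();
  return profileId;
}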

Looking at the code implementation, the Speaker API doesn’t benefit from a managed package in NuGet like the Face API does, so the approach I’ll take is to invoke the REST API directly with an HTTP client request-and-response mechanism. The first step is to instantiate an HttpClient with the necessary parameters for authentication and data type:

private readonly HttpClient _httpClient;

public VoiceRecognition()
{
  _httpClient = new HttpClient();
  _httpClient.BaseAddress = new Uri("<API Endpoint>");
  _httpClient.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key",
    "<Subscription Key>");
  _httpClient.DefaultRequestHeaders.Accept.Add(
     new MediaTypeWithQualityHeaderValue("application/json"));
}

The Recognize method in Figure 6 proceeds in several steps. After obtaining the audio stream from the IoT device at the site, it attempts to identify that audio against the collection of enrolled profiles. Identification is coded in the IdentifyAsync method. This asynchronous method prepares a multipart request message that contains the audio stream and the identification profile IDs, and submits a POST request to a specific endpoint. If the response from the API is HTTP code 202 (Accepted), the value returned is the URI of an operation that runs in the background. The Recognize method polls that URI every 100 ms until the operation completes. When it succeeds, you’ve obtained the profile ID of the identified person.

With that ID, you can proceed with the verification of the audio stream, which is the final confirmation that the recorded voice belongs to the identified person. This is implemented in the VerifyAsync method, which works similarly to the IdentifyAsync method, except that it returns a VoiceVerificationResponse object containing the profile of the person, and thus their name. The verification response includes a confidence level, which is also returned to the Access Web API, as with the Face API.

Figure 6 Voice Recognition

public double Recognize(string siteId, out string name)
{
  ReadAudioStream(siteId, out Stream audioStream);
  Guid[] enrolledProfileIds = GetEnrolledProfilesAsync().Result;
  string operationUri =
    IdentifyAsync(audioStream, enrolledProfileIds).Result;
  // Poll the pending identification operation every 100 ms until it completes
  IdentificationOperation status = null;
  do
  {
    status = CheckIdentificationStatusAsync(operationUri).Result;
    Thread.Sleep(100);
  } while (status == null);
  Guid profileId = status.ProcessingResult.IdentifiedProfileId;
  VoiceVerificationResponse verification =
    VerifyAsync(profileId, audioStream).Result;
  if (verification == null)
  {
    name = string.Empty;
    return 0;
  }
  Profile profile = GetProfileAsync(profileId).Result;
  name = profile.Name;
  return ToConfidenceScore(verification.Confidence);
}
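As a sketch of the identification step just described (here the profile IDs travel on the query string and the audio in the request body; the exact endpoint shape is an assumption based on the v1.0 API), IdentifyAsync might be implemented like this:

private async Task<string> IdentifyAsync(Stream audioStream, Guid[] profileIds)
{
  // Candidate profile IDs are passed comma-separated on the query string
  string ids = string.Join(",", profileIds);
  HttpResponseMessage response = await _httpClient.PostAsync(
    $"spid/v1.0/identify?identificationProfileIds={ids}&shortAudio=true",
    new StreamContent(audioStream));
  // A 202 (Accepted) response carries the URI of the background
  // operation in the Operation-Location header
  if (response.StatusCode != HttpStatusCode.Accepted)
  {
    return null;
  }
  return response.Headers.GetValues("Operation-Location").FirstOrDefault();
}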

I want to add a few comments about this API, to point out how it differs from the Face API. The voice verification API returns a JSON object that contains the overall result of the verification operation (Accept or Reject), the confidence level (Low, Normal or High) and the recognized phrase:

{
  "result" : "Accept", // [Accept | Reject]
  "confidence" : "Normal", // [Low | Normal | High]
  "phrase": "recognized phrase"
}

This object is mapped to the VoiceVerificationResponse C# class for convenience of use within the VerifyAsync method, but its level of confidence is expressed as text:

public class VoiceVerificationResponse
{
  [JsonConverter(typeof(StringEnumConverter))]
  public Result Result { get; set; }
  [JsonConverter(typeof(StringEnumConverter))]
  public Confidence Confidence { get; set; }
  public string Phrase { get; set; }
}

The Access Web API, however, expects a decimal value (double data type) between zero and one, so I assigned numeric values to the Confidence enumeration:

public enum Confidence
{
  Low = 1,
  Normal = 50,
  High = 99
}

I then converted these values to double before returning to the Access Web API:

private double ToConfidenceScore(Confidence confidence)
{
  return (double)confidence / 100.0d;
}
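Note that with this mapping, a Normal verification confidence translates to 0.5, which doesn’t satisfy the 0.9 confidence score configured for the VoiceRecognitionRequirement in the Startup class; only a High confidence (0.99) passes. You can tune either the enumeration values or the requirement’s threshold to match your risk tolerance.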

Wrapping Up

That’s it for this first part, in which I discussed the overall site access security flow, and covered the implementation of the authorization mechanism in ASP.NET Core Web API using custom policies and requirements. I then illustrated face and voice recognition, using the relevant Cognitive Services API, as a mechanism to restrict access based on biometric information of pre-authorized, or enrolled, person profiles. In the second part of this article, I’ll go through the data streaming from the IoT devices as a triggering point for requesting access, and the final confirmation from the Access API to unlock (or lock!) the access door. I’ll also cover a machine learning-based anomaly detection service that will run on any access attempt to identify its risk.

The source code for this initial part of the solution is available on GitHub at bit.ly/2IXPZCo.


Stefano Tempesta is a Microsoft Regional Director, MVP on AI and Business Applications, and member of Blockchain Council. A regular speaker at international IT conferences, including Microsoft Ignite and Tech Summit, Tempesta’s interests extend to blockchain and AI-related technologies. He created Blogchain Space (blogchain.space), a blog about blockchain technologies, writes for MSDN Magazine and MS Dynamics World, and publishes machine learning experiments on the Azure AI Gallery (gallery.azure.ai).

Thanks to the following Microsoft technical expert who reviewed this article: Barry Dorrans

