.NET April 10, 2021
6 min read

Entity Framework Core 5 — Split Queries

A new feature on Entity Framework Core 5, it’s the possibility to split a query into multiple SQL queries. On EF Core 3.0 for example, a single query was generated for each LINQ query, but now on EF Core 5, there is the possibility to split the query. In this article, I will explain the difference between them.

To show the differences between the default query and the split query, I’ve created a console application using .NET Core 5.0, and I’ve installed the packages:

  • Microsoft.EntityFrameworkCore.Design 5.0.3
  • Microsoft.EntityFrameworkCore.SqlServer 5.0.3
  • Microsoft.EntityFrameworkCore.Tools 5.0.3

On the Startup class, in the method OnConfiguring, I enabled the option to Log into the console all the queries generated by EF Core:

protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    const string strConnection = "Data source=(localdb)\\mssqllocaldb; Initial Catalog=EntityFrameworkCore5Examples;Integrated Security=true;pooling=true;";
    optionsBuilder
        .UseSqlServer(strConnection)
        .EnableSensitiveDataLogging()
        .LogTo(Console.WriteLine, LogLevel.Information);
}

For demonstration purpose, let’s consider a scenario where we have two entities with a one-to-many relationship (1:N): “Student” and “Course”, where a Course can have many Students, and a Student can only have one Course.

There are already some data that will be created into the database when the application runs:

Course: Information Systems
		Student: John
		Student: Mark
Course: Computer Science
		Student: Luke

Without Split Query

In this example, we are creating a LINQ query using Include, without using the Slipt query, to return the courses and the students:

static void QueryWithoutSplitQuery()
{
    using var db = new ApplicationContext();

    var courses = db.Courses
        .Include(p => p.Students)
        .ToList();

    foreach (var course in courses)
    {
        Console.WriteLine($"Course: {course.Name}");

        foreach (var student in course.Students)
        {
            Console.WriteLine($"\tStudent: {student.Name}");
        }
    }
}

In this case, EF Core will create a single query where it will be returned the information for the Course and for the Students:

SELECT [c].[Id], [c].[Name], [s].[Id], [s].[CourseId], [s].[Name]
FROM [Courses] AS [c]
LEFT JOIN [Students] AS [s] ON [c].[Id] = [s].[CourseId]
ORDER BY [c].[Id], [s].[Id]

The result of this query will be the following:

Id  Name                  Id  CourseId   Name
1   Information Systems   1     1        John
1   Information Systems   2     1        Mark
2   Computer Science      3     2        Luke

As we can see, the property “Name” was returned twice, because there are two students registered in this course. Now think that for more complex queries with more columns and more relations, this amount of duplicated data can be a big problem.

With Split Query

Now with EF Core 5, we have the possibility to use the method “AsSplitQuery”, to split the query:

static void QueryWithSplitQuery()
{
    using var db = new ApplicationContext();

    var courses = db.Courses
        .Include(p => p.Students)
        .AsSplitQuery()
        .ToList();

    foreach (var course in courses)
    {
        Console.WriteLine($"Course: {course.Name}");

        foreach (var student in course.Students)
        {
            Console.WriteLine($"\tStudent: {student.Name}");
        }
    }
}

In this case, when using the AsSplitQuery method, EF Core will create two queries:

  • In the first query, it will be returned only the information related to the Courses
  • In the second query, it will be returned only the information related to the Students

The queries will be like this:

SELECT [c].[Id], [c].[Name]
FROM [Courses] AS [c]
ORDER BY [c].[Id]

SELECT [s].[Id], [s].[CourseId], [s].[Name], [c].[Id]
FROM [Courses] AS [c]
INNER JOIN [Students] AS [s] ON [c].[Id] = [s].[CourseId]
ORDER BY [c].[Id]

The result of this query will be the following:

// First query:
Id   Name
1    Information Systems
2    Computer Science

// Second query:
Id  CourseId   Name   Id
1     1        John   1
2     1        Mark   1
3     2        Luke   2

As we can see, with SpliQuery, the “Name” of the course was returned only once. This means that even if we have more complex queries with more columns and more relations, the data will not be duplicated, which will result in a much more performant query.

Configuring Split Query Globally

It’s also possible to enable SplitQuery globally, on the OnConfiguring method in the Startup class, using the extension method “UseQuerySplittingBehaviour”:

protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    const string strConnection = "Data source=(localdb)\\mssqllocaldb; Initial Catalog=EntityFrameworkCore5Examples;Integrated Security=true;pooling=true;";
    optionsBuilder
        .UseSqlServer(strConnection, p => p.UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery))
        .EnableSensitiveDataLogging()
        .LogTo(Console.WriteLine, LogLevel.Information);
}

When configuring it globally, it’s not necessary to use the method “AsSplitQuery” in the LINQ query anymore.

Advantages of using the Split Query

  • Split query avoids performance issues associated with JOINs
  • A problem that the Split query solves, is the problem known as “cartesian explosion”. An example of this problem is when a blog has multiple related posts, and the rows for these posts will duplicate the blog’s information. The more one-to-many relationships are loaded, the amount of duplicated data may grow and adversely affect the performance of your application.

Disadvantages of using the Split Query

  • There are no guarantees for data consistency for multiple queries, unlike when there is only a single query. When using multiple queries, if the database is updated concurrently when executing the queries, the resulting data may not be consistent.
  • Each query currently implies an additional roundtrip to the database, and multiple network roundtrips can degrade performance.
  • Some databases allow only a single query to be active at any given point, so all results from earlier queries must be buffered in your application’s memory before executing later queries, which leads to increased memory requirements.

Conclusion

Having a single SQL query for each LINQ query ensures consistency of the data that will be returned, however, this can become very slow when the query uses Include or a projection to bring back multiple related collections. Split into multiple SQL queries, can** significantly improve the performance** of the application, but* can result in inconsistency* in the returned results if the data changes between the two queries. So this is something that you always need to consider when thinking about using the split query method.

You can check the code of this project here: https://github.com/henriquesd/EntityFrameworkCore5Examples

If you like this solution, I kindly ask you to give a ⭐️ in the repository.

Thanks for reading!


References

Microsoft Docs — What’s New in EF Core 5.0 — Split Queries

Microsoft Docs — Split Queries

Desenvolvedor IO — Mastering the Entity Framework Core