Data: it's a hassle.  We need it, we need a TON of it.  We need data to ensure the data integrity is held true.  We need data to see into a users behavior.  We need data to follow cash flow.  We need data to buy a burger at McDonalds.  Data is everywhere, and it's needed everywhere.

But... data does not always need to be presented in full.  The consumption of data, manipulation of data, processing of data, and analytics tied to data are not the topic of this article.  I am writing specifically about the UI: The webpage, or the mobile app, or any other conceivable interface for a user to interact with the data.

This concept of data sculpting came to me when I was tasked with removing a blob object from a record when that blob object was not needed for presentation.  Simple: move the blob into another table and create a reference that would allow it to be directly accessed when it was needed!  Done!  That was the quickest task I have ever completed....

Wait though!  That only solves this one issue for this one field in this one table for this one project.  What about the global ability to reduce data transmission weight through an application?  Because that is really what needs to happen.

So I expaneded my view by zooming out to a lunar level view point.  As one instance that would need to be addressed, I found a table that was detrimental to an existing application that had 144 fields in it (I know!  I stated it was an existing application).  Not only were there 144 fields, but there were also almost a million records in that table, with many of the fields being varchar fields of varying length.  I found a perfect test case: Get all the records from this table through a Gateway into a RESTful service API then through the RESTful data API that is connected directly to the Database using Entitty Framework Core.

The following two code blocks were the initial implementation method.

Here is the Data API method (...api/Records):

public async Task<IActionResult> GetAsync([FromQuery] PagedModel model)
{
        var records = await _recordsRepository.GetAllAsync(model);

        return Ok(records);
}

Here is the Repository method:

public async Task<PagedList<Records>> GetAllAsync(PagedModel model)
{
        var total = ApplyFilters(_context.Records, model).Count(); // Get the total record count.
        var filtered = ApplyFilters(_context.Records, model); // Filtering applies to the IQueryable
        var sorted = ApplySort(filtered, model); // Sorting is also applied to the IQueryable
        var query = sorted.Skip((model.Page - 1) * model.Size).Take(model.Size);
        return new PagedList<dynamic>(query, total, model.Page, model.Size);
}

This code generates the following SQL:

SELECT [o].[RecordId], [o].[RecordName], [o].[FieldName], [o].[Information], [o].[Blob]
FROM [Records] AS [o]
ORDER BY [o].[RecordId]
OFFSET @__p_0 ROWS FETCH NEXT @__p_1 ROWS ONLY

This created the issue of field, record, or sets of data impacting the applications speed of operation.  While in most cases this affect would be fairly minimal, in both of my cases it became a breaking issue very quickly.

So, let's fix it.  In order to accomplish this, I did just a few things.

1 - I added a query parameter to the API of type string called fields
2 - I added an Expression method that gets passed into the Select statement of the IQueryable object
3 - I built a Data Scuplter using ExpandoObject and dynamic.
3 - I implemented the methods to call these new methods and changed the return types to be dynamic.

Here is the Expression method:

public static Expression<Func<T, T>> BuildSelect<T>(string fields)
{
        if (string.IsNullOrWhiteSpace(fields))
        {
            var baseProperties = typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance).Select(p => p.Name);
            fields = string.Join(',', baseProperties);
        }
        var parameter = Expression.Parameter(typeof(T), "o");
        var obj = Expression.New(typeof(T));
        var bindings = fields.Split(',').Select(f => f.Trim()).Select(f => {
                var prop = typeof(T).GetProperty(f);
                var orig = Expression.Property(parameter, prop);
                return Expression.Bind(prop, orig);
        });

        var init = Expression.MemberInit(obj, bindings);
        var lambda = Expression.Lambda<Func<T, T>>(init, parameter);
        return lambda;
}

Here is the implementation of this method.

public async Task<PagedList<Record>> GetAllAsync(PagedModel model)
{
        var total = ApplyFilters(_context.Records, model).Count(); // Get the total record count.
        var filtered = ApplyFilters(_context.Records, model); // Filtering applies to the IQueryable
        var sorted = ApplySort(filtered, model); // Sorting is also applied to the IQueryable
        var query = sorted.Skip((model.Page - 1) * model.Size).Take(model.Size);
        var miniquery = query.Select(LambdaBuilder.BuildSelect<Record>(model.Fields)); // This builds the select statement to only include the fields that were passed.
        var sculptedquery = miniquery.ShapeData(model.Fields);
        return new PagedList<dynamic>(sculptedquery, total, model.Page, model.Size);
}
That allows us to call the api as: api.example.com/records?fields=RecordId,RecordName

In this example the resulting SQL is:
SELECT [o].[RecordId], [o].[RecordName]
FROM [Records] AS [o]
ORDER BY [o].[RecordId]
OFFSET @__p_0 ROWS FETCH NEXT @__p_1 ROWS ONLY

 

That's it!  One of the concerns that I have was that I did not want to innundate an architecture with a ton of DTO objects or countless end points.

Hope you enjoyed this bit of implementation fun.

© A long time ago in a galaxy far, far away through 2018 Copyright Brandon Korous