Missing some aggregation options when field contains 'null' values

Answered
Gerd Tautenhahn asked on June 27, 2022

Hello

Recently I found out a weird issue with available aggregations for fields that contain some 'null' values in the data set. Let's say I have a numeric field which contains following aggregations like:

- Sum
- Average
- Count
- Min
- Max

and a few more.

But the whole list is much more restricted once the data for the field contains a even one 'null' value in a given data set (especially when it happens to be the first value in a set). In that case the pivot only leaves me with two options:

- Count
- Distinct Count

What could be the reason behind it? Is there anything I can do about it? I've done some research and tried to create some fix for it, which do work, but I'm not entirely sure how that change could affect the rest of the code - probably in a bad way. What I'm almost certain is that the reason could be behind the part which is looking at the first row in the data set and tries to evaluate lots of default things for a given data source like: field types, mentioned available aggregations etc.. (GetMembers() private method in the Controller)

It is worth to mention I'm also using a custom data server based on your example here (with a few changes here and there to fit my use-case):

https://github.com/flexmonster/api-data-source

More information about my implementation, like code and so on, you can find in my previous support ticket.

3 answers

Public
Maksym Diachenko Maksym Diachenko Flexmonster June 29, 2022

Hello, Greg!

Thank you for reaching out to us.

Our example custom API server uses the GetSchema function in CubeController.cs, which is building a JSON response for the /fields request. As you correctly mentioned, data types and available aggregations are set based on the first JSON row. In our implementation, when the column's data contains `null` in the first row, the column's type is considered as a "string". The supported aggregations for fields with the "string" type are "count" and "distinctcount". That's why you see the limited aggregation set for the numeric field with null in the first row.

Currently, there are three data types available for custom API - "string", "number" and "date". The list of available aggregations differs from data type to data type (please check the /fields request doc page for more information). 

Considering the information above, we recommend changing the logic of type resolving. A possible solution may include explicitly defining the types for specific field or even changing the parser's logic.

Please let us know if further assistance is needed.

Best Regards,
Maksym

Public
Gerd Tautenhahn July 5, 2022

Hello Maksym,

Thanks for you help, I've managed to improve the evaluation of column types by extending the search of sample field values from one (first) row only to the logic which perform a deep search and looking for the nearest 'non-null' value for each unique field key in the dataset.

Best Regards

Public
Maksym Diachenko Maksym Diachenko Flexmonster July 7, 2022

Hello, 

Thank you for the update.
We are glad to hear that you found this solution.
Feel free to contact us if more questions arise.

Best Regards,
Maksym

Please login or Register to Submit Answer