We have a 1-million-row data source (CSV) with about 10 columns; the file is about 110MB. The analysis phase of the client-side implementation was painfully slow (about 6 min), although pivoting was fast afterwards. We switched to the server side. The Node.js version loaded the data quickly, but the query phase was very slow: about 1.5 min. We then switched to the .NET Core version, which turned out to be fast: about 200 ms. This surprised us, because Node can't be that much slower. After investigating a bit, we realized that the groupBy function was the problematic part because of its use of concat: https://github.com/flexmonster/api-data-source/blob/master/server-nodejs/api/cube.js#L492. I rewrote it to use a simple push and it worked perfectly. I use Node 10.15.3.
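Roughly, the difference looks like this. It is a simplified sketch, not the actual cube.js code: the function names groupByConcat and groupByPush and the record shape are hypothetical. It only contrasts building groups with concat against building them with push:

```javascript
// Hypothetical grouping helpers; the real cube.js code and record shape differ.

// Slow pattern: concat returns a NEW array and copies every element already
// in the group, so a group of m records costs about m^2 / 2 copies in total.
function groupByConcat(records, key) {
  const groups = new Map();
  for (const rec of records) {
    const k = rec[key];
    groups.set(k, (groups.get(k) || []).concat([rec]));
  }
  return groups;
}

// Fast pattern: push appends to the existing array in place (amortized O(1)).
function groupByPush(records, key) {
  const groups = new Map();
  for (const rec of records) {
    const k = rec[key];
    let bucket = groups.get(k);
    if (bucket === undefined) {
      bucket = [];
      groups.set(k, bucket);
    }
    bucket.push(rec);
  }
  return groups;
}

// Small usage example.
const rows = [
  { country: "US", amount: 10 },
  { country: "DE", amount: 5 },
  { country: "US", amount: 7 },
];
console.log(groupByPush(rows, "country").get("US").length); // 2

// Rough timing comparison on synthetic data (numbers vary by machine).
const big = Array.from({ length: 20000 }, (_, i) => ({
  country: i % 2 === 0 ? "DE" : "US",
  amount: i,
}));
console.time("concat");
groupByConcat(big, "country");
console.timeEnd("concat");
console.time("push");
groupByPush(big, "country");
console.timeEnd("push");
```

With few distinct keys and many rows per key, the concat version is quadratic in group size, while the push version stays linear overall, which is consistent with the slowdown we saw on 1 million rows.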
I don't know how the frontend is implemented, but I suspect its code is similar to the Node.js implementation. If this could be fixed on the frontend, then we might not need a server at all.
Thanks,
Peter
Hello, Peter,
Thank you for pointing to the bottleneck in the Node.js server.
Still, we would like to explain that both the Node.js and .NET Core servers are samples intended only for demonstration. They were developed as a reference for implementing the custom data source API.
The client-side component, in turn, is not connected to the Node.js server in any way and is highly optimized. Six minutes to parse 110MB of data is far longer than we would expect.
Therefore, we suggest checking whether Flexmonster itself accounts for the whole time. We suspect that most of this time may be spent loading the CSV dataset over the network. You can use the "Network" tab of the browser's developer tools to check how fast the data is loaded. Another approach is to download the CSV file in advance and open it directly from the local file system.
Finally, note that 100-150MB is the recommended upper limit for a plain CSV data source. For bigger datasets, we suggest using Flexmonster Data Server, our implementation of the custom data source API. The Data Server is a server-side utility similar to the mentioned Node.js and .NET versions. However, unlike them, it gets regular updates and new features. Currently, it is the best way to connect to large datasets in different formats: CSV, JSON, and databases.
The Data Server keeps all the parsing and heavy computation on the server and passes the data to the client's browser in a ready-to-show format. This removes the need to transfer and store the whole dataset on the client's machine. Instead, the browser receives only the small pieces of data needed for the current slice.
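To illustrate the general idea only, here is a simplified sketch. It is not the Data Server's code or API; the function aggregateSlice and the field names are hypothetical. The server reduces raw rows to one aggregated row per member before anything is sent to the browser:

```javascript
// Hypothetical illustration only; not Flexmonster Data Server code or API.
// Sums one measure per member of one field, so the response contains one
// row per distinct member instead of every raw record.
function aggregateSlice(rows, rowField, measureField) {
  const sums = new Map();
  for (const row of rows) {
    const member = row[rowField];
    sums.set(member, (sums.get(member) || 0) + row[measureField]);
  }
  return Array.from(sums, ([member, total]) => ({
    [rowField]: member,
    [measureField]: total,
  }));
}

const sales = [
  { country: "US", amount: 10 },
  { country: "DE", amount: 5 },
  { country: "US", amount: 7 },
];
// Only this small JSON payload would travel to the browser.
console.log(JSON.stringify(aggregateSlice(sales, "country", "amount")));
// [{"country":"US","amount":17},{"country":"DE","amount":5}]
```

The payload size depends on the number of distinct members in the slice, not on the number of raw rows, which is why this approach scales to datasets that are too large to ship to the browser.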
This approach allows operating with significantly larger datasets without loading delays.
Please let us know if it helps.
Feel free to contact us in case other questions arise.
Kind regards,
Illia