Hello,
I recently implemented the Data server API using the "Flexmonster.DataServer.Core" package (version 2.9.61). Memory usage appears to increase every time the indexes are refreshed. Is there anything special I need to do to ensure that the old cached data for an index is released when the index is refreshed? I attached a picture of memory usage. You can see that memory usage increases every 5 hours, which is when the cache is refreshed. The increases also line up with the log entries "Index <name> was reloaded in <seconds> seconds".
We are dynamically adding indexes with something like this and setting the refresh:
var orgIndex = new OrganizationIndex(index, id);
if (!_dataSourceOptions.CurrentValue.Indexes.ContainsKey(orgIndex.ToString()))
{
    var query = <query string>
    var indexOption = new DatabaseIndexOptions("postgresql", _env.GetString("CONNECTION"), query.Value);
    // RefreshTime is in minutes: 300 = the 5-hour refresh described above
    indexOption.RefreshTime = 300;
    _dataSourceOptions.CurrentValue.Indexes.Add(orgIndex.ToString(), indexOption);
}
We only add ~4 indexes so the number of indexes is not increasing and these are added right after the server starts up.
Are there any settings I can adjust to make sure the stale cached data is released on refresh? We can restart the server occasionally as a short-term fix, but that wouldn't be a sustainable long-term solution.
Thanks,
Brian
Hello, Brian!
Thank you for reaching out to us.
Please note that FDS DLL releases the cache after each refresh; no additional commands should be added. The attached graph may indicate that the server's garbage collector has not yet reclaimed the released cache data. This can happen because memory usage reaches only 60 percent at peak, so the garbage collector is under little pressure to free it.
To check that the cache is cleared, you can try forcing a garbage collection in debug mode and see whether the memory is freed.
Looking forward to hearing your feedback.
Best Regards,
Maksym
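A minimal sketch of the forced-collection check suggested above, using only the standard .NET `System.GC` API. The class and method names (`GcProbe`, `MeasureForcedCollection`) are hypothetical and not part of Flexmonster Data Server; the idea is to call something like this from a debug-only code path shortly after an index reload is logged.

```csharp
using System;

public static class GcProbe
{
    // Returns the managed heap size (in bytes) before and after a forced,
    // blocking, full garbage collection. Hypothetical helper for diagnostics only.
    public static (long Before, long After) MeasureForcedCollection()
    {
        // Snapshot without triggering a collection first.
        long before = GC.GetTotalMemory(forceFullCollection: false);

        // Collect all generations, wait for finalizers, then collect again
        // so objects released by finalizers are reclaimed too.
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced, blocking: true, compacting: true);
        GC.WaitForPendingFinalizers();
        GC.Collect();

        long after = GC.GetTotalMemory(forceFullCollection: true);
        return (before, after);
    }
}
```

If the "after" figure drops back to roughly the pre-refresh level, the growth in the graph is most likely collectible garbage that the runtime had not yet reclaimed, not data that is still referenced. Forcing collections this way is meant for diagnosis, not as a production fix.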
Thanks for the reply! I did some additional testing, and it does seem in line with the GC simply not releasing memory if it doesn't need to.
Thanks,
Brian
Maksym,
I wanted to follow up on this thread with an unrelated callout. I had put a comment on this ticket a few months ago: https://www.flexmonster.com/question/flex-data-server-authorization-and-or-parameters-in-an-index/ about FlexMonster offering a configurable data server. Ultimately, we wound up needing to build a wrapper around the .NET DLL so that our non-.NET application server and web application could take advantage of the data server without having to roll our own from scratch (which would include re-implementing all the default aggregations).
Our data server implementation is thin but adds the concepts of dynamic indexes and external authorization, which other FlexMonster customers could take advantage of. Just let us know if your team would be interested in taking a look at our implementation.
Hello,
Thank you for your feedback.
Our team will conduct research regarding the possible memory leak in Flexmonster Data Server DLL. We will get back to you with the results, ETA Feb 6th.
In the meantime, could you please provide us with a code sample of your server implementation? This would greatly help our research and provide valuable insight into your use case of Flexmonster Data Server DLL.
We are looking forward to hearing from you.
Best Regards,
Maksym
Well, depending on the answer we get on this thread: https://www.flexmonster.com/question/handling-large-datasets/ , we may be pivoting our approach to handle very large data sets more efficiently.
Hello, Bill!
Thank you for your reply.
We look forward to your feedback regarding which solution you would choose for working with big data.
Best Regards,
Maksym
Hello, Bill!
Thank you for reaching out to us.
After conducting research, we could not find any memory leaks connected to refresh in FDS DLL. We tested the refresh with a 150 MB dataset containing 5 million rows. The difference in used RAM before and after the reload was 325 KB, meaning that the cache is released after refresh.
Looking forward to hearing your feedback.
Best Regards,
Maksym
Maksym,
We are loading data sets that are 10-20 GB in size. When we load them with a custom Parquet reader we implemented, we only see a nominal bump in memory, followed by a full release. When we use the FlexMonster SQLReader to load the data, we see it holding on to 5x the size of the data file. We have decompiled the DLLs and are going to inject a few fixes to see if we can make some recommendations on a more memory-efficient loading strategy.
-Bill
Hello, Bill!
Thank you for your reply.
We highly appreciate your willingness to research and find potential improvements to Flexmonster Data Server DLL.
Please feel free to reach out to us if you require further assistance.
Best Regards,
Maksym
Hi Maksym,
Finally coming back to provide an update on this ticket. Your implementation of the database parser uses 3x the memory it needs. We decompiled the Flexmonster library and identified that value types were being boxed, which turns each value into a separate reference-type object on the heap and bloats memory usage. In addition, your code was NOT interning string values from the dataset, so for any string column with high duplication, Flexmonster stored every duplicate instead of consolidating them down to a single string reference in memory. This also added to the memory bloat. We didn't have either of these issues with our custom Parquet reader, which is why its memory utilization was far lower than that of FlexMonster's SQL Reader. (P.S. I'd also love to see you guys add Parquet as a supported format, but we can make that a separate ticket/discussion if you want to see our implementation. With Parquet, you can lazy load, which is a huge benefit.)
We have changed your implementation and reduced the memory consumption for loading large data sets by two-thirds (a 30 GB dataset went from 90 GB of memory usage down to 30 GB, which is what we would have expected originally).
We are more than happy to share both the memory usage fix for your database parser, and our parquet implementation. We didn't want to just attach to this ticket directly without your prior permission. We can share directly if you would prefer that instead.
We await your response 🙂
-Bill
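For readers unfamiliar with the two effects described above, here is a generic C# sketch of boxing versus typed storage and of deduplicating repeated strings through a pool. It is only an illustration under assumed names (`ColumnStorageSketch`, `StoreBoxed`, `StoreTyped`, `StorePooled` are all hypothetical); it is not Flexmonster's code and not the actual fix discussed in this thread.

```csharp
using System;
using System.Collections.Generic;

public static class ColumnStorageSketch
{
    // Boxed storage: assigning an int to an object slot allocates a separate
    // heap object per value, plus a reference to it in the array.
    public static object[] StoreBoxed(int[] values)
    {
        var column = new object[values.Length];
        for (int i = 0; i < values.Length; i++)
            column[i] = values[i]; // boxing allocation happens here
        return column;
    }

    // Typed storage: the values live inline in a single int[] with no
    // per-value objects.
    public static int[] StoreTyped(int[] values) => (int[])values.Clone();

    // Pooled strings: equal strings are replaced by one shared instance,
    // so a highly duplicated column keeps one copy per distinct value.
    public static string[] StorePooled(IEnumerable<string> values)
    {
        var pool = new Dictionary<string, string>(StringComparer.Ordinal);
        var column = new List<string>();
        foreach (var value in values)
        {
            if (!pool.TryGetValue(value, out var shared))
            {
                shared = value;          // first occurrence becomes the shared copy
                pool.Add(value, shared);
            }
            column.Add(shared);
        }
        return column.ToArray();
    }
}
```

A per-load dictionary is used here in place of `string.Intern` so that the pooled strings can be collected once the column itself is released; whether that trade-off suits a given server is a design choice.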
Hello, Bill!
Hope this message finds you well.
We highly appreciate your willingness to share the results of your work. Our developers are very interested in examining both the memory usage fix for the database parser and your Parquet implementation. We kindly request that you provide the code via inbox.
We look forward to hearing back from you soon. Thank you again for your contributions.
Best Regards,
Maksym
Sorry, what is Inbox? Is that the name of your ticketing system, or are you saying you want us to email it over? If so, what email address should we send it to?
-Bill
Hello, Bill!
We apologize for the confusion; we wanted you to provide us with the code via email. Our representative has contacted you with a request for the implementation code.
Best Regards,
Maksym
Hello, Bill!
Thank you for sharing your implementation with us.
We have passed the code on to our developers so they can research the proposed improvements. We will get back to you with the research result on or before April 15th.
Please stay tuned for further updates.
Best Regards,
Maksym