This is a very interesting use case. Recently, my colleague @Jan Kadlec wrote an article about MindsDB, and I think this use case is somewhat similar.
If I understand your use case correctly, you have two options right now:
• Store the results of the Python statistics back in the data source and include them in the GD model, even connecting them with other datasets, so you can put them into different contexts (dimensions, filters).
• Skip our Highcharts and instead create all visualizations directly in a Python framework, re-running the statistics code every time a visualization is requested (see the sketch after this list).
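To illustrate the second option, here is a minimal sketch of computing the statistics and building the chart entirely in Python. Everything concrete in it (pandas plus Plotly, the CSV extract, the column names, and the mean as the statistic) is an assumption for illustration only:

```python
# Minimal sketch of option 2: statistics + visualization fully in Python.
# The CSV extract, column names, and the mean() statistic are all
# illustrative assumptions, not your actual data or code.
import pandas as pd
import plotly.express as px

def build_visualization(df: pd.DataFrame):
    # Re-run the statistics code on every request; fine if it is cheap.
    stats = df.groupby("region", as_index=False)["revenue"].mean()
    return px.bar(stats, x="region", y="revenue",
                  title="Average revenue per region")

df = pd.read_csv("fact_revenue.csv")  # hypothetical extract of the source data
build_visualization(df).show()
```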
The most important question here is: how often do you need to (re)run the Python statistics code, and how resource-demanding is it?
If I understand it well, you only need to do it when the underlying data change.
So when you run an ELT process and change the data, you invalidate the caches in GD and need to re-run the Python statistics code.
So it seems the Python statistics code could be part of your ELT pipeline, as a kind of "post-process stage" that happens after you invalidate the GD caches (sketched below):
1. Finish ELT
2. Invalidate GD caches
3. Run the Python script and store its results in the data source
4. Invalidate GD caches again
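Here is a minimal sketch of that pipeline. The database DSN, table names, and the statistic are assumptions, and the ELT and cache-invalidation steps are stubs to be wired to your real process and the GD API:

```python
# Minimal sketch of the four-step "post-process stage" pipeline.
# The DSN, table names, and statistic are assumptions; the ELT and
# cache-invalidation stubs stand in for your real process and the GD API.
import pandas as pd
import sqlalchemy

ENGINE = sqlalchemy.create_engine("postgresql://user:pass@host/db")  # hypothetical

def run_elt() -> None:
    ...  # 1. your existing ELT process

def invalidate_gd_caches() -> None:
    ...  # 2. and 4. notify GD that the data source changed

def run_statistics() -> None:
    # 3. recompute the statistics and store the results back in the data source
    df = pd.read_sql("SELECT region, revenue FROM fact_revenue", ENGINE)
    stats = df.groupby("region", as_index=False)["revenue"].mean()
    stats.to_sql("stats_revenue_by_region", ENGINE,
                 if_exists="replace", index=False)

if __name__ == "__main__":
    run_elt()
    invalidate_gd_caches()  # insights built on the base tables see fresh data
    run_statistics()
    invalidate_gd_caches()  # pick up the freshly written statistics table too
```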
This is obviously not optimal (it invalidates the caches twice).
In the future, we plan to enable cache invalidation per table, not per whole data source as it is now. That would optimize this scenario.
The second option is to run the Python code when creating visualizations.
If it is not resource-demanding, that could work.
If it is, you could cache it (see the sketch after this list):
1. Cache the GD insight result
2. Cache the result of the Python code
3. When the GD result changes (GD caches invalidated), invalidate the Python code cache and re-run the Python code
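Here is a minimal sketch of that caching flow, assuming an in-memory dict as the cache and a placeholder for your statistics code. Keying the cache by a hash of the GD result makes step 3 implicit: a changed result produces a new key, so the stale entry is simply never hit again.

```python
# Minimal sketch of the caching flow; compute_statistics is a placeholder
# for your expensive Python code, and the dict cache is illustrative
# (in practice you might use Redis or similar).
import hashlib
import json

_stats_cache: dict = {}

def compute_statistics(gd_result: dict) -> dict:
    ...  # placeholder: the expensive Python statistics code

def statistics_for(gd_result: dict) -> dict:
    # Key the cache by a stable hash of the GD insight result: when GD caches
    # are invalidated and the result changes, the key changes too, so the
    # statistics are recomputed exactly once per new result.
    key = hashlib.sha256(
        json.dumps(gd_result, sort_keys=True).encode()
    ).hexdigest()
    if key not in _stats_cache:
        _stats_cache[key] = compute_statistics(gd_result)
    return _stats_cache[key]
```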
Let me know your thoughts.