This is a very interesting use case. Recently, my colleague @Jan Kadlec wrote an article about MindsDB, and I think this use case is somewhat similar.
If I understand your use case correctly, you have two options right now:
• Store the results of the Python statistics back in the data source and include them in the GD model, even connecting them with other datasets, so you can put them into different contexts (dimensions, filters).
• Skip our Highcharts and instead create all visualizations directly in a Python framework, re-running the statistics code every time a visualization is requested (see the sketch after this list).
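To illustrate the second option, here is a minimal sketch of computing the statistics and building the chart entirely in Python. Everything concrete in it (pandas plus Plotly, the CSV extract, the column names, and the mean as the statistic) is an assumption for illustration only:

```python
# Minimal sketch of option 2: statistics + visualization fully in Python.
# The CSV extract, column names, and the mean() statistic are all
# illustrative assumptions, not your actual data or code.
import pandas as pd
import plotly.express as px

def build_visualization(df: pd.DataFrame):
    # Re-run the statistics code on every request; fine if it is cheap.
    stats = df.groupby("region", as_index=False)["revenue"].mean()
    return px.bar(stats, x="region", y="revenue",
                  title="Average revenue per region")

df = pd.read_csv("fact_revenue.csv")  # hypothetical extract of the source data
build_visualization(df).show()
```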
The most important question here is: how often do you need to (re)run the Python statistics code, and how resource-demanding is it?
If I understand it well, you only need to do it when the underlying data change.
So when you run an ELT process and change the data, you invalidate the caches in GD and need to re-run the Python statistics code.
So it seems the Python statistics code could be part of your ELT pipeline, as a kind of "post-process stage" that happens after you invalidate the GD caches (sketched below):
1. Finish ELT
2. Invalidate GD caches
3. Run the Python script and store its results in the data source
4. Invalidate GD caches again
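Here is a minimal sketch of that pipeline. The database DSN, table names, and the statistic are assumptions, and the ELT and cache-invalidation steps are stubs to be wired to your real process and the GD API:

```python
# Minimal sketch of the four-step "post-process stage" pipeline.
# The DSN, table names, and statistic are assumptions; the ELT and
# cache-invalidation stubs stand in for your real process and the GD API.
import pandas as pd
import sqlalchemy

ENGINE = sqlalchemy.create_engine("postgresql://user:pass@host/db")  # hypothetical

def run_elt() -> None:
    ...  # 1. your existing ELT process

def invalidate_gd_caches() -> None:
    ...  # 2. and 4. notify GD that the data source changed

def run_statistics() -> None:
    # 3. recompute the statistics and store the results back in the data source
    df = pd.read_sql("SELECT region, revenue FROM fact_revenue", ENGINE)
    stats = df.groupby("region", as_index=False)["revenue"].mean()
    stats.to_sql("stats_revenue_by_region", ENGINE,
                 if_exists="replace", index=False)

if __name__ == "__main__":
    run_elt()
    invalidate_gd_caches()  # insights built on the base tables see fresh data
    run_statistics()
    invalidate_gd_caches()  # pick up the freshly written statistics table too
```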
This is obviously not optimal (it invalidates the caches twice).
In the future, we plan to enable cache invalidation per table, not per whole data source as it is now. That would optimize this scenario.
The second option is to run the Python code when creating visualizations.
If it is not resource-demanding, that could work.
If it is, you could cache it (see the sketch after this list):
1. Cache the GD insight result
2. Cache the result of the Python code
3. When the GD result changes (GD caches invalidated), invalidate the Python code cache and re-run the Python code
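Here is a minimal sketch of that caching flow, assuming an in-memory dict as the cache and a placeholder for your statistics code. Keying the cache by a hash of the GD result makes step 3 implicit: a changed result produces a new key, so the stale entry is simply never hit again.

```python
# Minimal sketch of the caching flow; compute_statistics is a placeholder
# for your expensive Python code, and the dict cache is illustrative
# (in practice you might use Redis or similar).
import hashlib
import json

_stats_cache: dict = {}

def compute_statistics(gd_result: dict) -> dict:
    ...  # placeholder: the expensive Python statistics code

def statistics_for(gd_result: dict) -> dict:
    # Key the cache by a stable hash of the GD insight result: when GD caches
    # are invalidated and the result changes, the key changes too, so the
    # statistics are recomputed exactly once per new result.
    key = hashlib.sha256(
        json.dumps(gd_result, sort_keys=True).encode()
    ).hexdigest()
    if key not in _stats_cache:
        _stats_cache[key] = compute_statistics(gd_result)
    return _stats_cache[key]
```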
Let me know your thoughts.