# gooddata-cn
m
Hello! We have two GD.CN instances set up: a "dev" instance and a "prod" instance. These are in different environments but are otherwise the same (same LDM, same backing data, same users, same workspaces, etc.). We would like to build out some new charts and dashboards on our dev instance for review, and then use the Python SDK to script the copying of them over to our live production site, rather than having to recreate them individually by hand. In other words, given a particular dashboard's ID, we'd like to recreate all of its insights in a different environment, then recreate the dashboard itself. (It can be safely assumed that neither the insights nor the dashboard exists in the destination environment.) We're pretty sure we can figure out how to do this by digging through the Python SDK docs, but we thought we'd check first and see if anyone has code like this already written and would be willing to share? Thanks very much!
j
Hello Maia, My colleague @Jan Soubusta prepared a blueprint of an e2e data pipeline in which he describes how to safely deliver to multiple environments. https://medium.com/gooddata-developers/data-pipeline-as-code-journey-of-our-blueprint-99912b1485d2 There is an open-source repo as part of the blueprint.
šŸ‘€ 1
j
Regarding JSON parsing - that is something you do not have to do anyway. The Python SDK loads models from the API and populates them into Python data classes. Regarding your use case - "traverse it to identify the things we want to copy" - how exactly do you plan to identify these things?
Generally speaking - your use case is more than valid. Once we describe all the requirements here, we may build new function(s) in the Python SDK for you and anyone else. Btw. the Python SDK is open source, so we can even collaborate on that if you are willing to invest your time. The advantage for you would be that we will maintain it for you forever, applying changes/new features, etc. šŸ˜‰
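For anyone landing on this thread, here is a minimal sketch of what "loaded into Python data classes" looks like in practice; the host, token, and workspace ID are placeholders.

```python
# Minimal sketch: the SDK returns data classes, so there is no manual JSON parsing.
# The host, token and workspace ID below are placeholders.
from gooddata_sdk import GoodDataSdk

sdk = GoodDataSdk.create("https://dev.example.com", "your_api_token")

analytics = sdk.catalog_workspace_content.get_declarative_analytics_model(
    workspace_id="my_workspace_id"
)

# The result is a nested data-class structure, e.g. dashboards with IDs.
print([dashboard.id for dashboard in analytics.analytics.analytical_dashboards])
```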
m
Thank you, @Jan Soubusta! Regarding how to identify -- what we're imagining is something like this:
1. Find the ID of the dashboard we want to promote (e.g. by copying that ID from the URL while viewing the dashboard).
2. Retrieve that dashboard definition from DEV by its ID (e.g. using `get_entity_analytical_dashboards`).
3. Find the IDs of all of the insights on that dashboard (e.g. look through the `items` in the dashboard definition for `insight` objects and get their `id`s).
4. Get each of those insights (e.g. with `get_entity_visualization_objects`).
5. (If needed, look in the insights for metrics etc. and retrieve those, as far down as we need to go -- in our case we know we already have all the needed metrics, the LDM is the same, the IDs are the same, and so forth, so all we care about are the dashboard definitions and the insight definitions.)
6. Check whether the IDs for the dashboard and insights already exist in the PROD analytics model.
7. If not, add each of the insights to the PROD analytics model.
8. Finally, add the dashboard to the PROD analytics model.
That's all we need, but I can imagine that a more generic solution covering all of the bases could be a lot more complex. We have the advantage that we know ahead of time what will and won't already exist in PROD, so we can make assumptions (e.g. that all the needed metrics exist) that others might not be able to make. Thank you again for the info! We'll take a closer look at the Python SDK and will follow up with whatever we come up with.
j
The process you described could be quite complex. Here is an alternative: if I understand you correctly, the dashboard IDs to be moved to PROD are selected manually. So let's say you create a TXT file containing the list of dashboard IDs you want to migrate. If the file does not exist, you can bootstrap it from the state of the PROD workspace. Then you can:
ā€¢ read the TXT file
ā€¢ call `get_declarative_analytics_model` + `get_declarative_ldm` for the DEV workspace
ā€¢ call `get_declarative_analytics_model` + `get_declarative_ldm` for the PROD workspace
ā€¢ call `get_dependent_entities_graph` for the DEV workspace
Now you have all the metadata you need to generate new LDM/ADM definitions for the PROD workspace. The important piece is the dependency graph collected with `get_dependent_entities_graph` - with it you can identify all dependencies from a dashboard down to the LDM. I can see various strategies for implementing it, e.g.:
ā€¢ copy the dashboards and related objects into new PROD LDM/ADM definitions
ā€¢ delete from the DEV definitions the dashboards and related objects that are not in the TXT file
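A minimal sketch of the fetch side of the approach Jan describes, assuming `dev_sdk`/`prod_sdk` clients and workspace IDs, and a hypothetical `dashboards_to_copy.txt` with one dashboard ID per line:

```python
# Minimal sketch of the fetch side only. `dev_sdk`/`prod_sdk` and the workspace
# IDs are assumed to exist; "dashboards_to_copy.txt" is a hypothetical file with
# one dashboard ID per line.
with open("dashboards_to_copy.txt") as f:
    dashboard_ids = [line.strip() for line in f if line.strip()]

dev_analytics = dev_sdk.catalog_workspace_content.get_declarative_analytics_model(
    workspace_id=dev_workspace_id
)
dev_ldm = dev_sdk.catalog_workspace_content.get_declarative_ldm(
    workspace_id=dev_workspace_id
)
prod_analytics = prod_sdk.catalog_workspace_content.get_declarative_analytics_model(
    workspace_id=prod_workspace_id
)
prod_ldm = prod_sdk.catalog_workspace_content.get_declarative_ldm(
    workspace_id=prod_workspace_id
)

# Dependency graph for the DEV workspace: flat lists of nodes and edges that
# can be traversed to find everything a dashboard depends on.
dependency_graph = dev_sdk.catalog_workspace_content.get_dependent_entities_graph(
    workspace_id=dev_workspace_id
)
```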
m
Thank you for the tip! Regarding `get_dependent_entities_graph`, do you have any pointers to documentation or examples of its use? It's not obvious to us how to navigate it efficiently, and the documentation at https://www.gooddata.com/docs/python-sdk/latest/workspace-content/workspace-content/get_dependent_entities_graph/ doesn't go into very much detail. Maybe we're not using it correctly? It seems to return just a flat list of all of the entities in the entire workspace (as nodes) and a flat list of pairs of dependencies (as edges). I'm not sure how that makes finding the dependencies of the dashboard significantly easier? There isn't an obvious way to do quick lookups by ID within the graph's lists of nodes and edges. It seems like we would still need to loop over all of the dependencies to locate the ones where one of the `CatalogEntityIdentifier`s contains the dashboard ID we're looking for, then loop over them again to locate the ones with the ID of that dependency, and keep looping through the full list like that until we've identified all of them.
Again, for our use case, we're already certain that the LDMs and metrics match between the two environments. What we need to publish are just the dashboards and the visualizations. This would definitely be more challenging if we didn't know how deep the dependencies need to go! Fortunately we are not in that situation. So it seems easier for our use case to start with the analytical model, find the section that describes the dashboard, and retrieve the visualization IDs that are already nested underneath it. But again, we might not be using the graph correctly, so please let me know if there is better documentation or an example of its use somewhere!
We also tried using `get_dependent_entities_graph_from_entry_points` as described here, and gave it the dashboard information as its entry point, in hopes that it would limit the returned information to just those entities required by that particular dashboard. However, the result contained nothing but a single node (the dashboard we had requested) and zero edges. Thanks again for the help! I think our use case is probably specialized enough that it doesn't require the more generic solution you're proposing, but it sounds like a great future feature request for the SDK. šŸ™‚
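For reference, the repeated full-list loops described above can be collapsed into a single breadth-first pass. A sketch, assuming `edges` is the flat list of identifier pairs from `get_dependent_entities_graph` and each identifier exposes `type` and `id`:

```python
from collections import deque

# Sketch only: one breadth-first pass over the flat edge list instead of
# repeated full-list loops. Swap parent/child below if the edges turn out to
# point from dependency to dependent rather than the other way around.
def collect_dependencies(edges, start):
    # Adjacency map: (type, id) -> list of (type, id) it points to.
    adjacency = {}
    for parent, child in edges:
        adjacency.setdefault((parent.type, parent.id), []).append((child.type, child.id))

    seen = set()
    queue = deque([start])  # e.g. ("analyticalDashboard", "my_dashboard_id") -- placeholders
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(adjacency.get(node, []))
    return seen  # every (type, id) reachable from the starting node
```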
For reference, here's our first pass at this -- note that this hasn't been fully tested, but seems to basically do what we're trying to do. The big assumption is that everything other than the dashboards and insights is otherwise the same in prod as in dev -- i.e. that the ONLY new things created in dev are dashboards and insights. This assumption probably isn't true for other people, so I doubt this is generally applicable! We'd love to hear suggestions or other approaches -- we mostly did this with just trial and error and inspecting the SDK source code, so it's definitely not the right way to do it! šŸ˜„ But it seems like it will do what we need for now. Thanks again for the help!
```python
# In this example, we're copying the list of dashboard IDs that are in `dashboards_to_copy`
# from the "dev_sdk" and "source_workspace_id" into the "test_sdk" and "test_workspace_id".
# The big assumption is that the LDMs and analytics models are exactly the same between
# "dev" and "test" apart from these dashboards and their insights -- specifically, that the
# two environments contain the same users, metrics, attributes, etc. -- and also that all of the
# IDs are the same between the two environments.
#
# TODO: Publish to the prod connection / workspace instead of the test connection / workspace
# (test is just for, well, testing)
# TODO: Check to see if objects exist in the dest_analytics_model before appending them;
# if they do exist, replace them instead of appending.
# TODO: All sorts of error checking, including trapping errors where the assumption that
# everything else already exists is false.

dest_analytics_model = test_sdk.catalog_workspace_content.get_declarative_analytics_model(
    workspace_id = test_workspace_id
)
source_analytics_model = dev_sdk.catalog_workspace_content.get_declarative_analytics_model(
    workspace_id = source_workspace_id
)

source_visualizations = source_analytics_model.analytics.visualization_objects
source_dashboards = source_analytics_model.analytics.analytical_dashboards
source_filter_contexts = source_analytics_model.analytics.filter_contexts

for dashboard_id in dashboards_to_copy:
    insights_to_copy = []
    
    # Get details about the dashboard
    dashboard = dev_sdk.client.entities_api.get_entity_analytical_dashboards(
        workspace_id=source_workspace_id,
        object_id=dashboard_id
    )

    # Append the dashboard's filter context to the new analytics model
    # TODO: Make sure it doesn't already exist! If it does, update it.
    filter_context_id = dashboard.data.attributes.content['filterContextRef']['identifier']['id']
    for filter_context in source_filter_contexts:
        if filter_context.id == filter_context_id:
            dest_analytics_model.analytics.filter_contexts.append(filter_context)

    # For each section in the dashboard, find the insights and add their IDs to our list
    # of insight IDs to copy
    for section in dashboard.data.attributes.content['layout']['sections']:
        for item in section['items']:
            if item['widget']['type'] == 'insight':
                insights_to_copy.append(item['widget']['insight']['identifier']['id'])
    
    # Now that we have a list of insight IDs to copy, find their full objects in the
    # source visualizations and append to the new analytics model
    # TODO: Make sure they don't already exist! If they do, update them.
    for viz in source_visualizations:
        if viz.id in insights_to_copy:
            dest_analytics_model.analytics.visualization_objects.append(viz)

# Finally, now that we've copied over insights and filter contexts,
# append the dashboards themselves to the new analytics model
# TODO: Make sure they don't already exist! If they do, update them.
for dashboard in source_dashboards:
    if dashboard.id in dashboards_to_copy:
        dest_analytics_model.analytics.analytical_dashboards.append(dashboard)

# And publish the new replacement analytics model
test_sdk.catalog_workspace_content.put_declarative_analytics_model(
    workspace_id = test_workspace_id,
    analytics_model = dest_analytics_model
)
```
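One way to handle the "replace instead of append" TODOs in the script above is to de-duplicate by ID before appending, for example:

```python
# Sketch of the "replace instead of append" TODO: remove any destination object
# with the same ID before appending the source version, so re-running the script
# overwrites rather than duplicates. `new_objects` is whatever subset of source
# objects (insights, filter contexts, dashboards) has been collected to publish.
def upsert(destination_list, new_objects):
    new_ids = {obj.id for obj in new_objects}
    destination_list[:] = [obj for obj in destination_list if obj.id not in new_ids]
    destination_list.extend(new_objects)

# Hypothetical usage with the names from the script above, e.g.:
# upsert(dest_analytics_model.analytics.visualization_objects, selected_visualizations)
# upsert(dest_analytics_model.analytics.analytical_dashboards, selected_dashboards)
```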
j
Seems like `get_dependent_entities_graph_from_entry_points` builds the graph in the opposite direction to what you need (bottom-up). Yes, you need to "loop" through the graph, but there are libraries that can help you a lot with that, e.g. networkx. It provides a lot of functions for various graph operations. Now I understand your use case (no metrics/LDM). Your script is generally OK for the job. I am not sure you need to call `get_entity_analytical_dashboards` for each dashboard to copy - you already have the detailed definitions of the dashboards in `source_dashboards`, right? More validations are needed, but you know that and will make it more robust without me šŸ˜‰
šŸ˜Š 1
šŸ™ 1