# gooddata-platform
p
Hi! We'd like to know what the best practices are for structuring a multi-tenant, self-hosted GoodData environment. Do you advise using separate workspaces and connections for each of our tenants, or a parent workspace with a central connection and child workspaces that filter the data of the central connection? Here are some factors that may be relevant: • We will use source version control and the GD API for automation and CI/CD to template and deploy workspaces, connections, data models and dashboards. Data models (physical and logical) will be identical across all tenants; dashboards may differ per tenant, but a common base dashboard will be made for all tenants to begin with. • We query directly from BigQuery; the structure of the datasets will reflect either choice (one global dataset if we go for parent and child workspaces, one separate dataset per tenant if we use one workspace per tenant). • We are interested in whitelabelling, SSO and embedding capabilities for all of our workspaces (at least one). • We would like to prioritize a separate-workspace approach, as it allows us to create more granular access control policies and could help limit leaks if a security breach is discovered on any of the running instances of GoodData.CN. Thanks for your answer!
d
Hi Philippe! Your setup seems like a great fit for the parent-child WS hierarchy. From a functional perspective, the deciding factor is that all your tenants (child WSs) share the same LDM and base analytical layer (insights, dashboards). This lets you follow common engineering practices by keeping a single source of truth for metadata in GD.CN (the parent WS). You can also limit your tenants' access to BQ data by means of WS data filters (as you probably already know). I see that your main concerns with this setup are ACLs and flexibility in handling security breaches. Could you please elaborate on these points? Also note that we plan to greatly improve WS authorization support in the next release, 1.6.
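To make the WS data filter idea concrete, here is a minimal conceptual sketch in Python. It is not GoodData's API: the `ChildWorkspace` class, the `visible_rows` helper and the `tenant_id` column are all hypothetical names chosen for illustration. The point is only that every child workspace carries a filter value, and queries against the shared table return just the rows matching it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChildWorkspace:
    """Hypothetical stand-in for a child workspace with one data filter value."""
    workspace_id: str
    tenant_filter_value: str


def visible_rows(rows, workspace, filter_column="tenant_id"):
    """Return only the rows whose filter column matches the workspace's value.

    `filter_column` is an assumed column name in the shared (global) dataset.
    """
    return [r for r in rows if r.get(filter_column) == workspace.tenant_filter_value]


# Shared table as it might look in one global dataset (made-up sample rows).
ROWS = [
    {"tenant_id": "acme", "revenue": 100},
    {"tenant_id": "globex", "revenue": 250},
    {"tenant_id": "acme", "revenue": 40},
]

acme = ChildWorkspace(workspace_id="ws_acme", tenant_filter_value="acme")
acme_rows = visible_rows(ROWS, acme)
```

Note that the filter restricts what each tenant sees through the workspace; the connection behind the parent workspace still reads the whole shared table.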
p
Hi David! Thanks for your answer 😊 Indeed, our main reservation about the parent-child WS hierarchy is that if a single instance of GD.CN is compromised and an attacker gains access to the sole service account key that GD.CN requires to access all the data, then the data of all our tenants is compromised. So far, our plan has therefore been to use a flat, multi-WS hierarchy controlled via source version control, with one service account per tenant. That way, as we grow, we can cap the number of service accounts per machine at the expense of a higher number of containers. That said, if we are able to route requests related to certain accounts to the appropriate container, we could reduce the damage significantly, depending on the nature of the attack. I'm no security buff, so this reasoning may be faulty, but that's what I concluded. Maybe you have ways to ensure that the service account keys cannot be compromised even if the container is compromised? If so, I'd be happy to learn more about them, and I could build a parent-child WS hierarchy with confidence in the security measures in place.
r
Hello Philippe, in the case of a single GoodData.CN deployment with a parent-child WS hierarchy (which would be really convenient for your use case), a compromised BQ SA key gives access to all datasets, because this single deployment must have access to all of them. Running one GoodData.CN per tenant, with a BQ SA key limited to the dataset of that particular tenant, would be much more secure, reducing the blast radius of a leaked key to the affected dataset only. As you wrote, this comes at the expense of more containers, CPUs, memory and maintenance overhead. If you do not need to see aggregated data across your tenants, you won't lose much functionality if you decide to go this way. Alternatively, you may mix both approaches: standalone deployments for "high-security-level" customers, shared deployments for "regular" customers... You can even monetize this approach 😉
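The blast-radius comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tenant names, the `sa-...` key names and the helper functions are hypothetical, and each layout is modelled simply as a mapping from a service account key to the set of tenants whose data that key can read.

```python
TENANTS = ["acme", "globex", "initech"]  # made-up tenant names


def shared_layout(tenants):
    """One deployment, one key: the key can read every tenant's data."""
    return {"sa-shared": set(tenants)}


def per_tenant_layout(tenants):
    """One deployment and one key per tenant."""
    return {f"sa-{t}": {t} for t in tenants}


def mixed_layout(tenants, high_security):
    """Standalone keys for high-security tenants, one shared key for the rest."""
    layout = {f"sa-{t}": {t} for t in tenants if t in high_security}
    regular = {t for t in tenants if t not in high_security}
    if regular:
        layout["sa-shared"] = regular
    return layout


def blast_radius(layout, leaked_key):
    """Tenants whose data is exposed if `leaked_key` is compromised."""
    return set(layout.get(leaked_key, set()))


shared = shared_layout(TENANTS)
isolated = per_tenant_layout(TENANTS)
mixed = mixed_layout(TENANTS, high_security={"acme"})
```

In this model a leaked `sa-shared` key exposes every tenant in the shared layout, only the "regular" tenants in the mixed layout, and a leaked per-tenant key exposes exactly one tenant.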
p
Hey Robert, Thanks for your answer! I'll evaluate the complexity of the two main options and make a decision. Cheers, Philippe