# gooddata-cn
Hello, we're migrating our GD.CN deployments from GCP/BigQuery to AWS/Snowflake and would like to migrate the GD.CN databases from our Postgres clusters (dev/stage/prod) on GCP to our corresponding Postgres clusters on AWS, including the following objects:
- workspaces
- datasets
- visualization objects
- facts
- metrics
- filters
- any other objects that the above objects depend on

(Note: organizations already exist on both platforms with the same IDs.)

We'd like to transfer all properties on the above objects to AWS exactly as they exist on GCP, including object IDs, such that when we refer to an object by its ID, we get an exact copy of the object on AWS as it existed on GCP, with the following exceptions:
1) our data sources on AWS are Snowflake data sources (not BigQuery)
2) our dataset and source column names on AWS follow Snowflake naming conventions (not BigQuery conventions)

Hence, we envision the following steps:
1. Upgrade our GD.CN deployment on GCP to the same version deployed on AWS so that we have identical schemas on both platforms.
2. Export the GD.CN Postgres database on GCP to SQL.
3. In the exported SQL file, update the data source info, dataset names, source column names, and other BigQuery-specific values to their Snowflake equivalents.
4. Import the updated SQL into our RDS Postgres DB on AWS, used by GD.CN.

Would you suggest another approach? If the above approach seems reasonable, is there anything we might be missing that we should consider? Thanks for suggestions.
Instead of doing a SQL dump/restore of the database, I'd recommend performing the migration by exporting the whole organization layout, filling in the missing info (data source credentials), and updating the data source entities directly in the resulting JSON file. Then you can upload the whole JSON back to the target AWS deployment. It will recreate the organization with all of its objects, including users, groups, permissions, etc. Alternatively, you may use the other Declarative APIs to deliver particular resource types (data sources, workspaces, etc.).
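A minimal sketch of that flow, with several assumptions worth flagging: it assumes the GoodData.CN declarative layout endpoint `GET`/`PUT /api/v1/layout/organization` and the `BIGQUERY`/`SNOWFLAKE` data source `type` values; the exact field names in the data source entry may differ between GD.CN versions, and the hosts, token, JDBC URL, and credentials are all placeholders.

```python
import json
import urllib.request


def _call_layout_api(host: str, token: str, method: str, payload=None):
    """GET or PUT the whole declarative organization layout as JSON."""
    req = urllib.request.Request(
        f"{host}/api/v1/layout/organization",
        data=None if payload is None else json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method=method,
    )
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    return json.loads(body) if body else None


def get_org_layout(host: str, token: str) -> dict:
    return _call_layout_api(host, token, "GET")


def put_org_layout(host: str, token: str, layout: dict) -> None:
    _call_layout_api(host, token, "PUT", payload=layout)


def rewrite_data_source(ds: dict, snowflake_url: str,
                        username: str, password: str) -> dict:
    """Turn a BigQuery data source entry into a Snowflake one.

    Keeps the data source id unchanged so objects referencing it stay valid.
    Credentials are never included in the export, so they must be filled in
    here before the PUT (the "missing info" mentioned above).
    """
    ds = dict(ds)  # shallow copy; leave the original export untouched
    ds["type"] = "SNOWFLAKE"
    ds["url"] = snowflake_url  # e.g. jdbc:snowflake://<account>.snowflakecomputing.com
    ds["username"] = username
    ds["password"] = password
    return ds
```

Usage would be roughly: `layout = get_org_layout(gcp_host, gcp_token)`, rewrite each entry in the layout's data sources, then `put_org_layout(aws_host, aws_token, layout)`.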
@Jan Soubusta ^^^
By naming conventions in BQ vs Snowflake, do you mean only lower/upper-case differences, or anything else? I agree with Robert that dumping the DB is not a recommended approach. You can write a simple script that dumps the organization layout, updates the necessary fields, and PUTs it to the new organization. Moreover, we could help you with it by extending our Python SDK accordingly - introducing a new method for migrating an organization. Would you be interested in such a new feature?
It's case differences and unfortunately also underscore differences
Our concern with the JSON approach is that the names in the LDM have changed and would need to be updated by hand. Is there a way to update source dataset and column names via the JSON? Also, are the visualization objects included in the JSON? And will the IDs be exported with the JSON? Our use case is that we have generated PDF financial statements for our clients based on these visualization objects, and we need a statement that was generated on GCP to look identical on AWS using the same visualization objects, which are embedded in the statements by ID (api-id) - so the IDs on AWS must match the IDs on GCP.
api-ids match when using the JSON/Python SDK approach. You can GET the JSON from the API, update it in any way, and PUT it to a different organization. All object types are exported, including visualizations. api-ids are exported as well; they are mandatory and must be specified when calling PUT. My idea with the Python SDK was the following:
1. Store old/new table/column name pairs in a file.
2. GET the organization with the Python SDK.
3. Read the file from step 1.
4. Replace all old names with new names in all places in what we got in step 2.
5. PUT the result to the other organization.
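The steps above could be sketched roughly as follows. This is an illustration, not the proposed SDK feature: the two-column CSV format for the name pairs is an assumption, and the renaming works on the parsed layout as plain dicts/lists/strings, so it applies whether the layout comes from the Python SDK or a raw API GET. It replaces only whole string values that exactly match an old name (covering the case and underscore differences mentioned earlier), which avoids corrupting titles or descriptions that merely contain a name as a substring.

```python
import csv
from typing import Any


def load_name_map(path: str) -> dict:
    """Step 1/3: read old_name,new_name pairs from a two-column CSV
    (the file format here is our assumption)."""
    with open(path, newline="") as f:
        return {row[0]: row[1] for row in csv.reader(f) if len(row) >= 2}


def rename_everywhere(node: Any, name_map: dict) -> Any:
    """Step 4: walk the exported layout and replace every string value
    that exactly equals an old table/column name with the new name.

    Dict keys are left untouched - in the declarative layout, source
    table/column names appear as values, not keys.
    """
    if isinstance(node, dict):
        return {k: rename_everywhere(v, name_map) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_everywhere(v, name_map) for v in node]
    if isinstance(node, str):
        return name_map.get(node, node)
    return node
```

Steps 2 and 5 are then just the GET of the old organization and the PUT of `rename_everywhere(layout, name_map)` to the new one.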
Thank you, Jan. The api approach seems promising, looking at this now.