To use your data in GoodData, you need to physically load the data into a workspace. A logical data model (LDM) that describes the data needs to be defined in the workspace before data can be loaded. Once the model is ready, you need to define a mapping between your source data and the workspace's logical data model.
The mapping tells the platform which fields from your source data to load into which fields in the LDM. In this article, we’ll discuss two different ways to map the data.
Naming convention vs. explicit mapping
Naming convention (also called “output stage”)
The first option is auto-mapping through a naming convention. This means you prepare a special set of tables or views (the so-called “output stage”) in your data source that follows a specific naming convention, which allows the platform to automatically map those tables and columns to the datasets and fields of the LDM. For example, the LDM pictured below would auto-map to the table pictured below it via the naming conventions.
You can learn more about the output stage and naming conventions here.
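To illustrate the idea, prefix-based auto-mapping can be sketched as a small function that derives each column’s LDM role from its name. Note that the prefixes below are illustrative assumptions, not the actual convention; refer to the documentation linked above for the real output stage naming rules.

```python
# Illustrative sketch of prefix-based auto-mapping. The prefixes used here
# (a__, f__, d__, r__) are hypothetical; consult the GoodData documentation
# for the actual output-stage naming convention.

PREFIX_TO_ROLE = {
    "a__": "attribute",
    "f__": "fact",
    "d__": "date",
    "r__": "reference",
}

def auto_map_columns(columns):
    """Map output-stage column names to (LDM field name, role) pairs."""
    mapping = {}
    for col in columns:
        for prefix, role in PREFIX_TO_ROLE.items():
            if col.startswith(prefix):
                # Strip the prefix to get the LDM field name.
                mapping[col] = (col[len(prefix):], role)
                break
        else:
            raise ValueError(f"Column {col!r} does not follow the naming convention")
    return mapping

print(auto_map_columns(["a__region", "f__amount", "d__created"]))
# → {'a__region': ('region', 'attribute'), 'f__amount': ('amount', 'fact'), 'd__created': ('created', 'date')}
```

The point of the convention is exactly this: given well-named columns, the mapping needs no configuration at all, because it can be derived mechanically.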
Explicit mapping
The second option is to explicitly define which fields in the LDM map to which fields in the source data.
In practice, this means filling in a source table name and column names for each LDM dataset, as shown in the picture below:
If you created your datasets from CSV files or data warehouse tables, the mapping may already be partially prepopulated for you.
You can learn more about explicit mapping here.
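Conceptually, an explicit mapping is just a per-dataset record of the source table and a column-for-column pairing. A minimal sketch of that shape, with made-up dataset, table, and column names:

```python
# Minimal sketch of an explicit dataset-to-table mapping.
# All dataset, table, and column names here are hypothetical.
explicit_mapping = {
    "orders": {
        "source_table": "sales.orders_v2",
        "columns": {
            # LDM field -> source column
            "order_id": "id",
            "order_date": "created_at",
            "amount": "total_amount",
        },
    },
}
```

Because the mapping is explicit, the source column names can differ freely from the LDM field names; nothing about the source has to change to fit the platform.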
In both cases, each dataset can be mapped to exactly one table or file in your source data. You cannot, for example, load one column from one table and the rest from another. Similarly, no transformations, such as group-bys, can be applied in the mapping. These restrictions ensure good load performance. You can, however, map the datasets to views instead of tables if you need to include some lightweight transformations.
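The view-based approach above keeps the mapping itself transformation-free while still letting you reshape data slightly. A sketch of the idea, using an in-memory SQLite database for illustration (the table, view, and column names are hypothetical):

```python
# Sketch: pushing a lightweight transformation into a database view so the
# dataset mapping itself stays transformation-free. Uses in-memory SQLite
# purely for illustration; all names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, created_at TEXT, total_cents INTEGER)")
conn.execute("INSERT INTO orders VALUES (1, '2023-01-05', 1250)")

# The view converts cents to a decimal amount; the dataset would then be
# mapped to this view instead of the raw table.
conn.execute("""
    CREATE VIEW out_orders AS
    SELECT id, created_at, total_cents / 100.0 AS amount
    FROM orders
""")

print(conn.execute("SELECT amount FROM out_orders").fetchall())  # → [(12.5,)]
```

Heavier transformations (joins across many tables, aggregations) are better done upstream in your pipeline, since the view is evaluated on every load.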
Which one to use?
We typically recommend using the explicit mapping option. It lets you keep whatever naming conventions you may already have established in your source data, so field naming stays consistent throughout your whole data pipeline. It is also more flexible and less likely to lead to hard-to-debug mistakes.