Count is one of the aggregation functions mostly used. However, it is a MAQL function commonly misunderstood that is the same Count function used in SQL. In this article, we will explore the MAQL function and understand how to use this function on the GoodData Platform.
The count function is an aggregation function that counts the distinct value of the selected attribute. In the GoodData Platform, the count function has a different syntax to accommodate different LDM designs. Count can be used in the following syntax as mentioned in the documentation:
Count(attribute), Count(attribute, primary_key), Count(attribute) using primary_key
In his article, we will explore how we can use Count(attribute), Count(attribute, primary_key) for different designs of LDM, purposes, and situations. We will start with the first example to explain how to use Count(attribute):
We have a small sales data set of a hypothetical small retail company; the data set consists of details of transactions. Before building the MAQL query, we shall first look at the Logical Data Model (LDM) to understand the relationship among the attributes and facts in the data set. Below is Diagram 1, which shows the LDM of this hypothetical small retail company.
Diagram 1: LDM
The LDM tells you the relationship between attributes and facts about the data sets. The count function highly depends on this model; it is very useful when you are trying to find a primary key when using count(attribute, primary_key) later.
We are interested in finding out how many different types of payments during check-outs. Therefore, we can use MAQL to build a count function to answer my question. We can simply put in the custom metric page:
SELECT COUNT(Payment Method Code) - Syntax 1
Syntax 1 means return the unique value of the payment method code at the current level. If we use this metric to create insight, it will return:
The syntax allows the Platform to count distinct values of Payment Method Code among the transactions in the data set. We may get this result:
Diagram 2: Table of Distinct Payment Method Count
If we apply Syntax 1 when creating this table, it tells the Platform to count the number of distinct values of the payment method at the current level, which is the whole data set in our case. It tells you there are 6 payment methods that appear in the data set, but it is not the same idea as the number of distinct payment methods in all transactions. From a business perspective, we may be more interested in looking at how many payment methods are used in all transactions. If this is the case, Syntax 1 may not work the way we intended to because we want the GoodData Platform to count payment methods on the transaction data level. In order to tell the platform what context to look at, we need to pass a primary key in Count(attribute, primary_key). The primary key is a connection point between data sets; it tells the context of the data to count from. Before doing this, we need to check whether it is valid to do so from the LDM.
Diagram 3: Looking for Central Data Set in the LDM
Now the question is, what should I pick as the primary key to use in the Count() function. First, find the data set referenced from the data set containing the attribute you are counting (The data set has a lot of edges pointing at). Then, pick the primary key in the data set which is being referenced from. If we look at Diagram 2, we can see the data set Payment Method points at the data set Payment Transaction. The Transaction ID is the primary key of Payment Transaction, so that we will pick Transaction ID as the primary key in the Count() function. And therefore, we create Syntax 2 below.
SELECT COUNT(Payment Method Code, Transaction ID) - Syntax 2
If we declare a metric using syntax 2 (And call it Payment Method Count from Transaction ID), the platform would now be able to find the number of distinct payment methods used in all transactions.
Diagram 4: Difference Between not using primary key (Left) and using primary key (Right)
As we can see in Diagram 4, it makes a difference between using and not using the primary key as it informs the Platform how to look at the data. We have 6 payment methods in the data set; only 4 payment methods were used among all transactions. The left panel shows that there are 6 available payment method types to customers; on the right panel, it shows customers have used only 4 payment method types to pay. It means we have found that no customer has ever used two of the payment methods during check-outs.
LDM is also useful to know what attributes can be sliced from one attribute because a metric or fact can only be sliced by related attributes. If we look at Diagram 3 again, we know that Count(Payment Method type, Transaction ID) can be sliced by Category ID, Payment Method Type, and Transaction Date because the datasets they belong are referencing the Payment Transaction dataset (They both have edges pointing at Transaction ID). We called Payment Transaction a central dataset as it has many attributes referencing, while the primary key of Payment Transaction dataset is the central attribute; in our case, that would be Transaction ID. Most of the time, the central attribute is the best primary key to use.
When the LDM is complex, a primary key used in the count function becomes critical because it determines what attributes can be sliced by. There is a limitation to the MAQL count function: The metric is only allowed to be sliced by the attributes connected to what you are counting on. In the following example, consider we have a help desk, and we would like to understand how many employees involved in each ticket, and it has an LDM like this:
Diagram 5: LDM of a help desk data set
In this LDM, we can find 2 central datasets - Ticket Activity and Ticket Replies. We can tell those datasets are central datasets because they both have plenty of attributes or facts referencing. We will be counting Employee ID. If you look at the LDM in Diagram 5, we can trace the data set where Employee ID belongs, Employee, is pointing Ticket Replies, which also has a lot of edges pointing to. Since Ticket Replies have many edges pointing to, it is a good primary key to use in our count function to use Reply ID. However, when you look at the metric editor, you may find “Records of Ticket Replies” in the attributes section, like in Diagram 6:
Diagram 6: Metric Editor
When the LDM is getting more complex, you may see “Record of a certain dataset” appear in the metric editor. It is an invisible identifier to count distinct numbers in that dataset; the purpose is to give a clear identifier for the platform to look at, so that the Platform can find which attributes are legal to be sliced by. In our example, we want the platform only to focus on the dataset Ticket Replies and count the distinct number of Employee ID. Therefore, we would use the following syntax:
SELECT COUNT(Employee ID, Records of Ticket Replies) - Syntax 3
Or it will look like this in the Metric Editor:
Diagram 7: Build a Count function to count Employee ID in Metric Editor
Diagram 8: LDM of a Call Center Data Set, Attributes connected to Records of Ticket Replies
As we have discussed, you can see what Syntax 3 can be sliced by legally. If we look at the LDM, we can see that the following attributes in the data set Ticket Replies (Circled in red in Diagram 8):
- Reply ID
- Reply Body
- Ticket ID
- User ID
- Reply Timestamp
- Reply Date
While we define Syntax 3, we can slice the metric by the above attributes without a problem.
You may discover the limitation that you are not able to slice by any attribute with the data set that has no edge pointing at Ticket Replies. Let’s say you use the metric of Syntax 3 and try to slice by Ticket Priority in the Ticket Activity dataset. Once you have selected this metric on insight, the Platform updated the available attributes on the left panel, "Ticket Priority" was taken away (So do Ticket Assignee, Ticket Status, Assigned Date, and Solved Date). Since Ticket Priority does not have a direct relationship with the Ticket Replies dataset, it is not possible to slice by Ticket Priority from the metric of Syntax 3. Therefore, it is wise to think ahead on designing metrics while designing the LDM.
The best practice of using the Count function is first to know what metrics you would like to have and design an LDM based on that. Then, look at the attribute you would like to count, observe what other attribute it points at, or look for a central dataset in the LDM. Define the Count function by including an attribute and a primary key, especially if there is more than 1 central dataset.
The MAQL Count function is a powerful function that helps you to count the numbers of distinct values of an attribute while slicing by other related attributes. Before defining the metric using the Count function, the very first thing you need to do is understand the LDM and observe the relationship between the attribute you want to count and other attributes. Then, you can slice the metric by any attribute related to the attributes you are counting.