Using TOP or BOTTOM to filter a MAQL Measure in GoodData.CN

  • 22 July 2021
  • 0 replies
  • 14 views

Introduction

In GoodData.CN’s MAQL syntax, there is a ranking filter which allows you to only select a specific number (or a specific percentage) of records within the top or bottom values. The syntax looks like this: 

SELECT … WHERE TOP(…) OF … / IN … / WITHIN … 

SELECT … WHERE BOTTOM(...) OF … / IN … / WITHIN … 

 

The TOP and BOTTOM keywords effectively apply a filter that dynamically reduces the number of data slices used in the measure. Appending the OF, IN, and/or WITHIN keywords tells GoodData.CN how to slice the data (such as taking the top values within a specific groupings data). One useful thing that can come from using TOP or BOTTOM is reducing the number of data slices that show up in an insight. 

 

Below are some descriptions of how to use these keywords and some of the nuances of how it works. 

 

Creating measures to show the top values 

Using the demo data provided with the Community Edition of GoodData.CN, we can work through a few examples as follows. 

 

Let’s say we have a measure that calculates our revenue, with the ID “orderrevenue”, defined as:

SELECT SUM({fact/order_lines.price}*{fact/order_lines.quantity}) WHERE {label/order_lines.order_status} = \"Delivered\"

 

We can create a new measure that filters the results to only show the slices of data with the top 10 highest values for order revenue as follows:

SELECT {metric/orderrevenue} WHERE TOP(10) OF ({metric/orderrevenue})

This new measure will show the total revenue for only the slices of data with the 10 highest values for order revenue. 

 

If you were to create an insight using this measure, you could use Customer Name for your slice to show revenue for only your top 10 customers:

GmUP1IaU047Wg1-oF7gxfZKXzarU_NxSCdNWXZtglT0sxlhFxXN-z1YwX7y4KlhFpelKjTjEa6ez4OhwPHYlXIEEtBHH7WulVr1On3YjODd2_y2-UVeipqllOsG--ZWA7QuBaJKA

 

Or, you could slice by the customers’ state to show the total revenue for only the top 10 states:

-vYt6RfyqrldTP7LCcJ2dwAEJNfIMvvxpZJpwdgksAqJMU0NETvC9f3c2mA63gbzf9lmScDJ-GLWfquegw7w-hU-JN6NMgnVfhMs5ScTiUvMt5slt5ETBssvrqj0JuBdF2O26yHC

 

An important nuance to point out here: the measure does not take the top 10 slices but rather the top 10 values. If you have slices with equal values, then you could end up with more than 10 slices included in the measure.

 

An example to demonstrate this is below. Using the following measure to show the top 9 values for most products ordered actually shows 10 slices of data. This happens because the customer that ordered the 9th most products and the customer that ordered the 10th most ordered the same amount: 34. 

SELECT SUM({fact/order_lines.quantity}) WHERE TOP(9) OF (SUM({fact/order_lines.quantity}))

 

BSnU1-rCWk1Ysyn3XO3zvHQ3R8Zn2c1pTq3Eh-dMMUjNeP0V9BFYrLgzlZpq0qipMoAFjUTCcYiQ9vIrDThKttBUQtyC_vajl9an4G7Ykt4LgA6x2kXoK72uJFrXmckwe5zf2zED

 

Be sure to keep this in mind when designing your measures to avoid getting an undesired result. If you do only want a specific number of slices, then you can add a second metric to the OF portion in the query to rank the slices by an additional value. For example, the following measure would first use the total number of products per slice, then it would use the total revenue per slice to further rank the result:

SELECT SUM({fact/order_lines.quantity}) WHERE TOP(9) OF (SUM({fact/order_lines.quantity}), {metric/orderrevenue})

 

Creating measures to show the top values in a specific group

Referring back to our previous measure for the top 10 revenues, let’s try to display a single value showing the total revenue earned only by the top 10 customers. 

 

The previous measure will need to be adjusted since it is only filtering based on the slices used by the insight it is used in. This makes it more dynamic and useful in more situations but can give us an undesired result if we are looking for a more specific value. 

 

If we tried using it in an insight, it would give us the top ten slices, which would include everything if we are not slicing by a value, as shown in the screenshot below:

HkYLNUfAUlbtPdMkfrVf3JIj3XW6QSE0GIsDbxK4UnRFT8lyUamGVm-5CNSmwfkssiF7P2Elyfae8ouNtXCQPHeCZ-mbBY6CMM3xSpHeoHNFItrax3ZJm0GyguYqDYVWNoRHJOMh

 

To make this adjustment, you would need a measure that shows the revenue for only the top value in a particular slice, as follows:

SELECT {metric/orderrevenue} WHERE TOP(10) IN (SELECT {metric/orderrevenue} BY {label/customers.customer_name})

SHV3-O5gaSekwcSI67pAIIhr8gEQIsBmubNskeLhT8FTeUVT4PROdEU06eY8Zc7pF5CPRvx6Sp6MF41U_kVh-TAh00AC_xj7rmpkXdHJwhNaofX_qrwPex6r4TXQTTJluaHnITl_

In the above example, the measure takes the total revenue for each customer and then filters the result to only show the customers with the 10 highest values for the total revenue. To put it more generally, it takes the top 10 slices of data from the nested selection in the BY clause. This is a way to explicitly define the slices of data before taking the top or bottom slices.

 

As another example to demonstrate what MAQL is doing: you could use the TOP keyword to show the revenue for the customers that bought the most number of unique products using the following query: 

SELECT {metric/orderrevenue} WHERE TOP(10) IN (SELECT COUNT({attribute/products.product_name}) BY {label/customers.customer_name})

 

The above measure slices the data by customer and shows the revenue for the 10 customers that bought the most total number of unique products. 

 

Creating measures to show the top values within a specific sub-grouping

By including the WITHIN keyword, we can show the top values within a specific group of data. 

 

For example, the following measure shows the revenue for the top 10 customers within each state: 

SELECT {metric/orderrevenue} WHERE TOP(10) IN (SELECT {metric/orderrevenue} BY {label/customers.customer_name}) WITHIN ({label/customers.state}, ALL OTHER)

G3ufyAvG_7Erp3yIv0cmUy_bWVB1OwC-lz2SD4pKX3EB6AYJr4ix4DyDCtBII6A7K3OgnWhy36CMbbl2hRToFcusqfu89YKBHCt7wuVsdwG80C5r0PUqnt-6nmy5XE7f5P56VoJa

 

Using the IN keyword to specify a nested measure

As MAQL allows nesting measures (equivalent to what is often called sub-queries in SQL), you can specify a nested measure when using the TOP and BOTTOM keywords. For example, the following measure takes the top values that are only included in the nested measure that follows the IN keyword:

SELECT {metric/orderrevenue} WHERE TOP(10) IN (SELECT {metric/orderrevenue} WHERE {label/customers.state} = \"California\")

 

The above measure first determines which data is included in the nested measure, in this case the order revenue from customers in California, and then takes the top values from that measure. 

 

When creating measures like this, be careful where you place your parentheses. The following two measures seem similar but are in fact returning different results:

SELECT {metric/orderrevenue} WHERE TOP(10) IN (SELECT {metric/orderrevenue} WHERE {label/customers.state} = \"California\")

SELECT {metric/orderrevenue} WHERE TOP(10) IN (SELECT {metric/orderrevenue}) AND {label/customers.state} = \"California\"

 

The first measure above will include the top 10 order revenue values for customers in California, while the second measure will include the top 10 order revenue values for all customers anywhere and then filter those to only include them if they are in California. 

 

Further considerations

The examples so far have used TOP to show the highest N values. However, there is also BOTTOM to show the lowest N values. 

 

Additionally, the above examples used a specific number of top values, but you can also specify a percentage of values. For example, if you wanted to show the revenue from the top 10% of customers, you could use the following measure:
SELECT {metric/orderrevenue} WHERE TOP(10%) OF ({metric/orderrevenue})

 

When using the percentage of values, the result will always be a whole number and it will always be rounded up. So, if you want to select 20% of 11 items and each item has a distinct value, then the result would include 3 slices (20% of 11 is 2.2, and it gets rounded up to 3). This ensures that there will always be a value so long as there is at least one slice. If a measure took the top 10% of a slice of data with only 1 value, then it would return that one slice. Likewise, if that same measure was used for a slice of data with 2 values, then it would include 1 value. 

 

For more information, read our documentation on the TOP and BOTTOM keywords here

 

Example use cases 

Some examples of when you might want to use this filter might be:

  • Showing the top 10 largest opportunities won might look something like below:
    SELECT {metric/won_opportunity_revenue} WHERE TOP(10) OF ({metric/won_opportunity_revenue})

  • Showing the number of opportunities won by the 5 sales people who generated the most revenue by region:
    SELECT {metric/count_of_opportunities_won} WHERE TOP(5) IN (SELECT {metric/won_opportunity_revenue} BY {label/sales_team.sales_rep_name}) WITHIN ({label/sales_team.region}, ALL OTHER)

  • Showing the number of units sold for the 5 worst selling products:
    SELECT SUM({fact/product_sales.units_ordered}) WHERE BOTTOM(5) OF (SUM({fact/product_sales.units_ordered}))

  • Showing the number of units sold for the 5 worst selling products within each product category:
    SELECT SUM({fact/product_sales.units_ordered}) WHERE BOTTOM(5) IN (SELECT SUM({fact/product_sales.units_ordered}) BY {label/products.category}) WITHIN ({label/products.category}, ALL OTHER)

  • Showing the revenue earned by the top 20% of customers:
    SELECT {metric/revenue} WHERE TOP(20%) IN (SELECT {metric/revenue} BY {label/customers.customer_name})

  • Showing the revenue earned by the bottom 80% of customers:
    SELECT {metric/revenue} WHERE BOTTOM(80%) IN (SELECT {metric/revenue} BY {label/customers.customer_name})


0 replies

Be the first to reply!

Reply