# gooddata-cn
i
Hi Team, Hope all is well. I was hoping to get some clarification around a behavior of the GoodData SDK for python. Specifically:
```python
gd_result_table = sdk.tables.for_items(
    workspace_id=gd_workspace_id,
    items=attributes_to_query,
    filters=filters,
)
```
When my result table has more than 10,000 entries, the above raises an exception. I was curious if that 10,000 was a hard limit, or something I could configure? If it is a hard limit — is there a way I can configure the above query to just return the top N rows instead, where N <= 10,000?
j
Do you use the Community Edition (single-container deployment) or the Kubernetes deployment? The limit can be configured, but the way you do it differs between the two deployments: https://www.gooddata.com/developers/cloud-native/doc/cloud/deploy-and-install/cloud-native/execution-limits/
Additionally, you can add a TOP(X) filter to the request you send with the Python SDK. @Jan Kadlec, can you share more details about how exactly this can be achieved?
j
Hi Igor, I'd recommend trying our other Python package, gooddata-pandas, which lets you work with data as pandas data frames. With gooddata-pandas you can use the following approach. It is not exactly what you asked for, but I think it is a nice workaround: it returns the TOP(n) or BOTTOM(n) values of a metric. If your report has no metrics and you only want to list attributes, you can create a virtual metric that counts the attribute's values.
```python
from gooddata_pandas import GoodPandas
from gooddata_sdk import ObjId, RankingFilter

good_pandas = GoodPandas("http://localhost:3000", "YWRtaW46Ym9vdHN0cmFwOmFkbWluMTIz")

df_factory = good_pandas.data_frames("demo")

# RankingFilter keeps only the TOP 10 results ranked by the order_amount metric
df = df_factory.for_items(
    items=dict(
        reg="label/region",
        category="label/products.category",
        price="fact/price",
        order_amount=ObjId(type="metric", id="order_amount"),
    ),
    filter_by=RankingFilter(
        metrics=[ObjId(type="metric", id="order_amount")],
        operator="TOP",
        value=10,
        dimensionality=[],
    ),
)
```
i
Hi @Jan Soubusta, thank you for your response. I'm using the K8s deployment. @Jan Kadlec, thank you for the suggestion about gooddata-pandas. I initially chose gooddata-sdk because I figured it would have the most first-party support. I will take a look at gooddata-pandas and make sure it has the rest of the functionality I need. In the meantime, is there no “top” or “bottom” functionality in the gooddata-sdk package I'm using?
j
You should be able to use `RankingFilter` in `sdk.tables.for_items` as well 🙂
i
@Jan Kadlec, just to confirm: if I use the pandas package, would I be able to apply a “TOP” operator to a query whose result set (without the “TOP”) would contain more than 10,000 entries?
Or would I still get an error?
Also, thanks for letting me know about RankingFilter; I will take a look.
j
If you apply the filter, the error should not occur.
i
I will give it a try then! Thank you.
j
TOP(x) filters are even pushed down to your database, so the error definitely does not occur in this case.
i
Fantastic. And does that apply to RankingFilter for sdk.tables.for_items also?
j
Yes, it applies.
j
This should be transparent across the Python SDK libraries (gooddata-sdk, gooddata-pandas). For instance, if you see RankingFilter in one interface, it should work identically in every other interface where it is exposed. If you find any kind of inconsistency, please report it here and we will promptly fix it 😉