# gooddata-cn
d
Hi, I'm trying the community edition, but after I tried a larger dataset, something in the container ran out of memory and now it runs out of memory every time I start it, before I can even access the website. I have 24 GB allocated to Docker (the stats page in Docker reports the container is only using 3 GB).
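As a side note, the per-container memory figure can also be checked from the CLI. A minimal sketch, assuming the container shows up under a name like gooddata-cn-ce (take the real name from docker ps):
```
# Live CPU and memory usage for a single container;
# "gooddata-cn-ce" is a placeholder name, use the actual one from `docker ps`
docker stats gooddata-cn-ce
```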
r
Hi Dan, the community edition image is resource-constrained so that the container can run on low-memory machines. If you need to work with larger data sets, you can try running the Docker container with the environment variable:
```
JAVA_OPTS="-Xmx2g -XX:+UseStringDeduplication -XX:+ExplicitGCInvokesConcurrent -XX:+ParallelRefProcEnabled"
```
In the upcoming release, there will be a configurable limit on the number of returned data cells to prevent possible OOM events.
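A minimal sketch of passing this variable when starting the CE container; the image name/tag, published ports, and volume name below are assumptions based on a typical Community Edition setup and may differ from yours:
```
# Start GoodData.CN CE with a larger JVM heap via JAVA_OPTS.
# Image name/tag, ports, and volume are assumptions; adjust to your setup.
docker run -i -t -p 3000:3000 -p 5432:5432 \
  -v gd-volume:/data \
  -e JAVA_OPTS="-Xmx2g -XX:+UseStringDeduplication -XX:+ExplicitGCInvokesConcurrent -XX:+ParallelRefProcEnabled" \
  gooddata/gooddata-cn-ce:latest
```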
d
Thanks, I'll try it. Any idea why running out of memory once would cause another OOM every time the container restarts? And is the k8s version able to deal with an unexpected OOM? I find this slightly concerning.
Unfortunately, I'm still running into problems. When I try to load the values from a PostgreSQL table with fewer than 4 rows, it crashes with:
```
172.30.0.1 - - [10/Sep/2021:12:03:10 +0000] "GET /api/entities/workspaces/0fabe453644a413182f16a78c98778fd?metaInclude=config HTTP/1.1" 200 314 "http://localhost:3000/analyze/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Firefox/91.0"
172.30.0.1 - - [10/Sep/2021:12:03:10 +0000] "GET /api/entities/workspaces/0fabe453644a413182f16a78c98778fd/metrics?size=250&tags=&page=0 HTTP/1.1" 200 269 "http://localhost:3000/analyze/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Firefox/91.0"
172.30.0.1 - - [10/Sep/2021:12:03:10 +0000] "GET /api/entities/workspaces/0fabe453644a413182f16a78c98778fd/facts?size=250&tags=&page=0 HTTP/1.1" 200 532 "http://localhost:3000/analyze/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Firefox/91.0"
172.30.0.1 - - [10/Sep/2021:12:03:10 +0000] "GET /api/entities/workspaces/0fabe453644a413182f16a78c98778fd/attributes?size=250&include=labels&tags=&page=0 HTTP/1.1" 200 15346 "http://localhost:3000/analyze/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Firefox/91.0"
172.30.0.1 - - [10/Sep/2021:12:03:10 +0000] "GET /api/entities/workspaces/0fabe453644a413182f16a78c98778fd/attributes?size=250&include=labels%2Cdatasets&tags=&page=0 HTTP/1.1" 200 18635 "http://localhost:3000/analyze/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Firefox/91.0"
172.30.0.1 - - [10/Sep/2021:12:03:10 +0000] "GET /api/entities/workspaces/0fabe453644a413182f16a78c98778fd/attributes?size=250&include=labels%2Cdatasets&tags=&page=0 HTTP/1.1" 200 18635 "http://localhost:3000/analyze/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Firefox/91.0"
Warning: Nashorn engine is planned to be removed from a future JDK release
ts="2021-09-10 12:03:13.562" level=ERROR msg="Bad Request" logger=com.gooddata.tiger.web.exception.BaseExceptionHandling thread=DefaultDispatcher-worker-2 orgId=default spanId=7fece6104f81626d traceId=7fece6104f81626d userId=demo exc="errorType=com.gooddata.tiger.afm.tools.ResultCacheResponseError, message=An error has occurred during the listing of label elements
    at com.gooddata.tiger.afm.service.ElementsProcessor.process$suspendImpl(ElementsProcessor.kt:67)
    Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
    |_ checkpoint ⇢ Handler com.gooddata.tiger.afm.controller.ElementsController#processElementsRequest(String, String, ElementsOrder, boolean, boolean, String, int, int, float, boolean, ServerHttpRequest, Continuation) [DispatcherHandler]
Stack trace:
    at com.gooddata.tiger.afm.service.ElementsProcessor.process$suspendImpl(ElementsProcessor.kt:67)
    at com.gooddata.tiger.afm.service.ElementsProcessor$process$1.invokeSuspend(ElementsProcessor.kt)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTaskKt.resume(DispatchedTask.kt:175)
    at kotlinx.coroutines.DispatchedTaskKt.resumeUnconfined(DispatchedTask.kt:137)
    at kotlinx.coroutines.DispatchedTaskKt.dispatch(DispatchedTask.kt:108)
    at kotlinx.coroutines.CancellableContinuationImpl.dispatchResume(CancellableContinuationImpl.kt:308)
    at kotlinx.coroutines.CancellableContinuationImpl.resumeImpl(CancellableContinuationImpl.kt:318)
    at kotlinx.coroutines.CancellableContinuationImpl.resumeWith(CancellableContinuationImpl.kt:250)
    at kotlinx.coroutines.channels.AbstractChannel$ReceiveElement.resumeReceiveClosed(AbstractChannel.kt:877)
    at kotlinx.coroutines.channels.AbstractSendChannel.helpClose(AbstractChannel.kt:312)
    at kotlinx.coroutines.channels.AbstractSendChannel.close(AbstractChannel.kt:241)
    at kotlinx.coroutines.channels.SendChannel$DefaultImpls.close$default(Channel.kt:102)
    at kotlinx.coroutines.channels.ProducerCoroutine.onCompleted(Produce.kt:137)
    at kotlinx.coroutines.channels.ProducerCoroutine.onCompleted(Produce.kt:130)
    at kotlinx.coroutines.AbstractCoroutine.onCompletionInternal(AbstractCoroutine.kt:104)
    at kotlinx.coroutines.JobSupport.tryFinalizeSimpleState(JobSupport.kt:294)
    at kotlinx.coroutines.JobSupport.tryMakeCompleting(JobSupport.kt:853)
    at kotlinx.coroutines.JobSupport.makeCompletingOnce$kotlinx_coroutines_core(JobSupport.kt:825)
    at kotlinx.coroutines.AbstractCoroutine.resumeWith(AbstractCoroutine.kt:111)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:46)
    at kotlinx.coroutines.internal.ScopeCoroutine.afterResume(Scopes.kt:32)
    at kotlinx.coroutines.AbstractCoroutine.resumeWith(AbstractCoroutine.kt:113)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:46)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:56)
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:738)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665)
"
```
There are no other exceptions between "All services of GoodData.CN are ready" and this point. The logical table model was generated just by scanning my test database. The booking/value-type tables have at most 3 rows, and their labels are not loading. The product table has about 500 rows, and its labels load fine. The sales table has ~1M rows, which is 1-2 orders of magnitude less than what we usually expect it to have, and it can load labels for one of its attributes. Obviously we have no issue querying a table with 3 rows from pgAdmin, so I don't believe the problem is on the database side. Is there a way to enable more verbose logging to figure out what's going on, or anything else that can help diagnose the issue?
r
We discovered a bug in the labelElements API that will be addressed in the next release. The recurring OOM issue can only be resolved while the Docker container is stopped. Assuming you're using a Docker volume mounted to the container, as recommended in our documentation, attach this volume to an ephemeral container, for example:
```
docker run -it --rm -v gd-volume:/data busybox
```
Then remove the /data/pulsar/standalone directory in this container:
```
rm -rf /data/pulsar/standalone/
```
Exit the ephemeral container and start GoodData.CN CE again with this "fixed" volume. This is a temporary workaround, as the OOM can happen again; the next release will make it more resilient.
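For completeness, the two steps above can also be collapsed into a single non-interactive command; this is just a sketch assuming the same gd-volume volume name:
```
# One-shot variant: mount the volume, delete the stale Pulsar standalone data, and exit
docker run --rm -v gd-volume:/data busybox rm -rf /data/pulsar/standalone
```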
d
Hi, has the labelElements API fix been released yet?
r
Hi Dan, sorry I missed your question. The stability of the labelElements API has been substantially improved; we also made the cache storage more efficient and introduced platform limits. These changes improve overall performance and resilience to errors. Note that errors may still occur when calling this API (especially when it needs to return too many unique values), but they will no longer affect overall system stability. The new release, 1.4.0, containing all the mentioned fixes, is scheduled for the 11th of October (next Monday).
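Once it is out, picking up the fixes should just be a matter of pulling the new CE image and recreating the container on the existing data volume; the exact image name and tag below are assumptions about how the release will be published:
```
# Pull the 1.4.0 CE release (tag naming is an assumption) and recreate the container;
# reusing the existing gd-volume keeps previously loaded data
docker pull gooddata/gooddata-cn-ce:1.4.0
```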