# gooddata-cn
Kshirod Mohanty
Hi All, we are doing load testing with GoodData-cn, version 1.7.2, using GCP Redis (Memorystore). I see loads of the error below in the CalciqueGrpcService log. The error first happened when we tried a higher load (2x), but now it is happening even with the smaller load (x) and is very frequent. What is the reason for this error, and what actions should we take to resolve it? I don't see any issues on the Redis side, though.

```
{"exc":"org.springframework.dao.QueryTimeoutException: Redis command timed out; nested exception is io.lettuce.core.RedisCommandTimeoutException: Command timed out after 10 second(s)
    at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:70)
    at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:41)
    at org.springframework.data.redis.connection.lettuce.LettuceReactiveRedisConnection.lambda$translateException$0(LettuceReactiveRedisConnection.java:293)
    at reactor.core.publisher.Flux.lambda$onErrorMap$28(Flux.java:6911)
    at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:94)
    at reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner.onError(MonoFlatMapMany.java:255)
    at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onError(ScopePassingSpanSubscriber.java:95)
    at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onError(FluxMapFuseable.java:140)
    at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onError(FluxMapFuseable.java:140)
    at reactor.core.publisher.MonoFlatMap$FlatMapMain.secondError(MonoFlatMap.java:192)
    at reactor.core.publisher.MonoFlatMap$FlatMapInner.onError(MonoFlatMap.java:259)
    at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onError(ScopePassingSpanSubscriber.java:95)
    at reactor.core.publisher.MonoNext$NextSubscriber.onError(MonoNext.java:93)
    at reactor.core.publisher.MonoNext$NextSubscriber.onError(MonoNext.java:93)
    at io.lettuce.core.RedisPublisher$ImmediateSubscriber.onError(RedisPublisher.java:891)
    at io.lettuce.core.RedisPublisher$State.onError(RedisPublisher.java:712)
    at io.lettuce.core.RedisPublisher$RedisSubscription.onError(RedisPublisher.java:357)
    at io.lettuce.core.RedisPublisher$SubscriptionCommand.onError(RedisPublisher.java:797)
    at io.lettuce.core.RedisPublisher$SubscriptionCommand.doOnError(RedisPublisher.java:793)
    at io.lettuce.core.protocol.CommandWrapper.completeExceptionally(CommandWrapper.java:128)
    at io.lettuce.core.protocol.CommandExpiryWriter.lambda$potentiallyExpire$0(CommandExpiryWriter.java:175)
    at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
    at io.netty.util.concurrent.DefaultEventExecutor.run(DefaultEventExecutor.java:66)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.lettuce.core.RedisCommandTimeoutException: Command timed out after 10 second(s)
    at io.lettuce.core.internal.ExceptionFactory.createTimeoutException(ExceptionFactory.java:59)
    at io.lettuce.core.protocol.CommandExpiryWriter.lambda$potentiallyExpire$0(CommandExpiryWriter.java:176)
    ... 7 more",
 "level":"ERROR", "logger":"com.gooddata.tiger.calcique.service.CalciqueGrpcService", "msg":"gRPC server call", "orgId":"xxxxxxxxxx", "spanId":"f47287a4344b2f3d", "thread":"DefaultDispatcher-worker-6", "traceId":"4c3ca0481448f410", "ts":"2022-10-14 031313.869", "userId":"admin"}
```
Is there any TTL for the keys written to Redis?
This is the error in the gooddata-cn-result-cache service:

```
{"id":"878c2233f1a9fa09340d4652f5fe4e29", "level":"ERROR", "logger":"com.gooddata.tiger.cache.result.raw.service.RawCacheStore", "msg":"Store cache - unknown state of cache", "orgId":"<undefined>", "spanId":"784bd2542e41e8a1", "state":"NOT_FOUND", "thread":"DefaultDispatcher-worker-1", "traceId":"167288484439983d", "ts":"2022-10-14 055134.023", "userId":"<undefined>"}
```
j
@Ondrej Stumpf, please help.
Ondrej Stumpf
hi @Kshirod Mohanty, the fact that you see Redis exceptions in multiple services (calcique, result-cache, ...) suggests there is a problem with Redis connectivity. The `unknown state of cache` message might also suggest that Redis is overloaded and is evicting records, which can cause inconsistency issues. Can you please double-check that Redis has enough memory? There is indeed a TTL set for the caches, but the actual value differs per key type, so there is no generic answer.
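If you want to see what is actually happening on the Redis side, a quick look at key TTLs and the memory/eviction counters usually tells you whether it is memory pressure rather than pure connectivity. A minimal Lettuce sketch is below; the endpoint and the sample key name are assumptions for your environment, and `redis-cli TTL <key>` / `redis-cli INFO memory` give the same information.

```java
import io.lettuce.core.RedisClient;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.api.sync.RedisCommands;

public class RedisCacheInspection {
    public static void main(String[] args) {
        // Assumed Memorystore endpoint - adjust to your setup.
        RedisClient client = RedisClient.create("redis://10.0.0.3:6379");
        try (StatefulRedisConnection<String, String> conn = client.connect()) {
            RedisCommands<String, String> redis = conn.sync();

            // TTL of a sample cache key: -1 means no expiry, -2 means the key does not exist.
            String sampleKey = "someCacheKey"; // hypothetical key name, pick one from your instance
            System.out.println("TTL of " + sampleKey + ": " + redis.ttl(sampleKey) + "s");

            // Memory usage and eviction statistics; a growing evicted_keys counter
            // under load points to memory pressure.
            System.out.println(redis.info("memory"));
            System.out.println(redis.info("stats"));
        } finally {
            client.shutdown();
        }
    }
}
```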
r
Regarding `Redis command timed out`: Redis responses are usually sub-millisecond. If Redis didn't respond within 10 seconds, it suggests there are connectivity issues. Please check whether Redis is accessible from the cluster.
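One way to verify reachability from inside the cluster is to run a PING with a short client timeout from a pod in the same network. The sketch below uses Lettuce (the same client the services use); the host, port, and the 2-second timeout are assumptions, and `redis-cli -h <host> ping` from a debug pod works equally well.

```java
import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisURI;
import io.lettuce.core.api.StatefulRedisConnection;

import java.time.Duration;

public class RedisPingCheck {
    public static void main(String[] args) {
        // Assumed Memorystore endpoint; run this from a pod inside the cluster/VPC.
        RedisURI uri = RedisURI.builder()
                .withHost("10.0.0.3")
                .withPort(6379)
                .withTimeout(Duration.ofSeconds(2)) // fail fast instead of waiting 10 s
                .build();
        RedisClient client = RedisClient.create(uri);
        try (StatefulRedisConnection<String, String> conn = client.connect()) {
            long start = System.nanoTime();
            String pong = conn.sync().ping();
            double millis = (System.nanoTime() - start) / 1_000_000.0;
            // A healthy in-VPC round trip should be on the order of a millisecond or less.
            System.out.printf("%s in %.3f ms%n", pong, millis);
        } finally {
            client.shutdown();
        }
    }
}
```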
Kshirod Mohanty
We upgraded the GCP Memorystore (Redis) memory from 5 GB to 50 GB and flushed all the keys. We are still seeing the errors for higher loads; for smaller loads everything looks fine.
r
It seems we are missing an important piece of information in our documentation about the Redis configuration. The Redis instance should have the `maxmemory-policy=allkeys-lru` config key set. Please update your Redis configuration - with this policy, older keys get evicted automatically when memory is full. I will add this setting to the documentation.
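For reference, here is a sketch of how the effective policy can be checked and changed via a client. Note that GCP Memorystore typically restricts the `CONFIG` command, so there the policy is usually checked and changed through the instance configuration instead (e.g. `gcloud redis instances update <instance> --region=<region> --update-redis-config maxmemory-policy=allkeys-lru`); the snippet below applies mainly to self-managed Redis, and the endpoint is an assumption.

```java
import io.lettuce.core.RedisClient;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.api.sync.RedisCommands;

import java.util.Map;

public class MaxMemoryPolicyCheck {
    public static void main(String[] args) {
        // Assumed Redis endpoint - adjust to your environment.
        RedisClient client = RedisClient.create("redis://10.0.0.3:6379");
        try (StatefulRedisConnection<String, String> conn = client.connect()) {
            RedisCommands<String, String> redis = conn.sync();

            // Read the currently effective eviction policy.
            Map<String, String> current = redis.configGet("maxmemory-policy");
            System.out.println("maxmemory-policy = " + current.get("maxmemory-policy"));

            // On self-managed Redis the policy can be switched at runtime;
            // Memorystore rejects this, so use the instance config (gcloud) there.
            // redis.configSet("maxmemory-policy", "allkeys-lru");
        } finally {
            client.shutdown();
        }
    }
}
```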