# gooddata-cn
p
Hello, we're on GoodData 3.21.0. We're calling afmExecApi through the Python SDK and seeing an error in the logs that we haven't encountered before, which results in 500 errors in the SDK call:
```
errorType=com.gooddata.tiger.grpc.error.GrpcPropagatedServerException, message=UNAVAILABLE: Keepalive failed. The connection is likely gone,<no detail>
	at com.gooddata.tiger.grpc.error.ErrorPropagationKt.convertFromUnknownException(ErrorPropagation.kt:230)
	Suppressed: The stacktrace has been enhanced by Reactor, refer to additional information below: 
Error has been observed at the following site(s):
	*__checkpoint ⇢ AuthenticationWebFilter [DefaultWebFilterChain]
	*__checkpoint ⇢ OAuth2LoginAuthenticationWebFilter [DefaultWebFilterChain]
	*__checkpoint ⇢ OAuth2AuthorizationRequestRedirectWebFilter [DefaultWebFilterChain]
	*__checkpoint ⇢ OAuth2AuthorizationRequestRedirectWebFilter [DefaultWebFilterChain]
	*__checkpoint ⇢ ReactorContextWebFilter [DefaultWebFilterChain]
	*__checkpoint ⇢ CorsWebFilter [DefaultWebFilterChain]
	*__checkpoint ⇢ HttpHeaderWriterWebFilter [DefaultWebFilterChain]
	*__checkpoint ⇢ OrganizationWebFilter [DefaultWebFilterChain]
	*__checkpoint ⇢ ServerWebExchangeReactorContextWebFilter [DefaultWebFilterChain]
	*__checkpoint ⇢ org.springframework.security.web.server.WebFilterChainProxy [DefaultWebFilterChain]
	*__checkpoint ⇢ com.gooddata.tiger.httplogging.LogbookWritingFilter [DefaultWebFilterChain]
	*__checkpoint ⇢ HTTP POST "/api/v1/actions/workspaces/d8eb767b09eb2ab637e89663ec2d8d4a/execution/afm/execute" [ExceptionHandlingWebHandler]
Original Stack Trace:
		at com.gooddata.tiger.grpc.error.ErrorPropagationKt.convertFromUnknownException(ErrorPropagation.kt:230)
		at com.gooddata.tiger.grpc.error.ErrorPropagationKt.convertToTransferableException(ErrorPropagation.kt:216)
		at com.gooddata.tiger.grpc.error.ErrorPropagationKt.clientCatching(ErrorPropagation.kt:65)
		at com.gooddata.tiger.grpc.error.ErrorPropagationKt$clientCatching$1.invokeSuspend(ErrorPropagation.kt)
		at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
		at kotlinx.coroutines.internal.ScopeCoroutine.afterResume(Scopes.kt:28)
		at kotlinx.coroutines.AbstractCoroutine.resumeWith(AbstractCoroutine.kt:99)
		at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:46)
		at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:102)
		at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:111)
		at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:99)
		at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
		at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:811)
		at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:715)
		at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:702)
```
We're wondering what this means and what we can do to debug it further. What does "Keepalive failed. The connection is likely gone" mean within the service itself?
j
Hi Pete, thanks for sharing this. A few questions:
• Could you confirm whether this happens consistently or only sporadically?
• Do you notice it only during specific types of calls (e.g., large reports, heavy datasets)?
• If you retry the call, does it typically succeed?
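If a plain retry does help, a small client-side backoff is usually enough to ride out drops like this. Purely as a sketch (not something built into the SDK), this is what I mean, calling the raw execute endpoint from your trace; the host, token, payload shape, and retry policy are illustrative:

```python
import time
import requests

HOST = "https://your-gooddata-host"                # illustrative
API_TOKEN = "<api-token>"                          # illustrative
WORKSPACE_ID = "d8eb767b09eb2ab637e89663ec2d8d4a"  # workspace id from the stack trace
EXECUTE_URL = f"{HOST}/api/v1/actions/workspaces/{WORKSPACE_ID}/execution/afm/execute"


def execute_afm(exec_definition: dict, attempts: int = 3) -> dict:
    """POST an AFM execution, retrying transient 5xx responses with backoff."""
    headers = {
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    }
    for attempt in range(1, attempts + 1):
        response = requests.post(
            EXECUTE_URL, json=exec_definition, headers=headers, timeout=60
        )
        # Below 500 means success or a caller error -- retrying won't help either way.
        if response.status_code < 500:
            response.raise_for_status()
            return response.json()
        # 5xx: give up (and raise) on the last attempt, otherwise back off and retry.
        if attempt == attempts:
            response.raise_for_status()
        time.sleep(2 ** attempt)
```

If the retried call succeeds every time, that would point to a transient connection drop inside the platform rather than anything wrong with the execution definition itself.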
p
Hi Julius, thanks for getting back to us. It seems this is sporadic. We noticed it yesterday while debugging a separate issue and wanted to check whether it was causing problems in one of our applications, but it seems it was not. It would still be useful to know what causes these messages. Pete
j
Hi Pete, I'll try to dig something up on this. Is this the whole stack trace from the logs, and is the stack trace always the same? Based on this log sample, it most likely means that the gRPC keepalive ping mechanism (gRPC is used internally inside the CN application for communication between microservices) failed or didn't get a response in time while OAuth authentication was being processed; the auth checkpoints in the trace show where the connection was dropped. Could you check whether auth-service was healthy at the time, or whether there was higher network activity?
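Just for background on the message itself: keepalive here is the standard gRPC HTTP/2 ping. Each side of a connection periodically sends a ping, and if the ping isn't acknowledged within the configured timeout the connection is declared dead, so any in-flight RPC fails with UNAVAILABLE, which then surfaces as the 500 on your REST call. The snippet below only illustrates the knobs involved, using the Python grpc package and a made-up target address; CN's internal services run on the Java/Kotlin gRPC stack with their own settings:

```python
import grpc

# Standard gRPC keepalive channel options; the option names and semantics are
# part of the public gRPC API. The target address below is made up for illustration.
KEEPALIVE_OPTIONS = [
    # Send an HTTP/2 PING roughly every 30 s to verify the connection is alive.
    ("grpc.keepalive_time_ms", 30_000),
    # If a PING is not acknowledged within 10 s, the connection is treated as
    # gone and pending calls fail with UNAVAILABLE ("Keepalive failed.
    # The connection is likely gone").
    ("grpc.keepalive_timeout_ms", 10_000),
    # Allow keepalive pings even when no RPC is currently in flight.
    ("grpc.keepalive_permit_without_calls", 1),
]

channel = grpc.insecure_channel("localhost:50051", options=KEEPALIVE_OPTIONS)
```

So the message usually means one side stopped answering pings, typically because the peer pod restarted, was under heavy load, or the network between the services briefly dropped, rather than anything specific to the AFM request.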