Pete Lorenz
01/02/2024, 10:03 PM
{"ts":"2024-01-02 20:56:00.496","level":"ERROR","logger":"org.springframework.boot.SpringApplication","thread":"main","msg":"Application run failed","exc":"org.apache.pulsar.client.admin.PulsarAdminException$ServerSideErrorException: HTTP 500 Internal Server Error\n\tat org.apache.pulsar.client.admin.PulsarAdminException.wrap(PulsarAdminException.java:252)\n\tat org.apache.pulsar.client.admin.internal.BaseResource.sync(BaseResource.java:302)\n\tat org.apache.pulsar.client.admin.internal.TopicsImpl.createNonPartitionedTopic(TopicsImpl.java:340)\n\tat org.apache.pulsar.client.admin.Topics.createNonPartitionedTopic(Topics.java:482)\n\tat com.gooddata.tiger.pulsar.PulsarAutoConfiguration.producerBeanFactory$lambda-2(PulsarAutoConfiguration.kt:131)\n\tat org.springframework.context.support.PostProcessorRegistrationDelegate.invokeBeanFactoryPostProcessors(PostProcessorRegistrationDelegate.java:325)\n\tat
...
In the pulsar namespace, we're seeing the pulsar-bookie-1 pod in CrashLoopBackOff, but pulsar-bookie-0 and pulsar-bookie-2 are running and available. I've tried restarting the pulsar broker and bookie pods as well as the affected gooddata pods, but the same error occurs. I'm attaching the logs from the failing bookie pod as well as a failing zookeeper pod. Please let us know any ideas we can try to resolve this. Thanks so much!
Robert Moucha
01/03/2024, 8:23 AM
Caused by: org.rocksdb.RocksDBException: While appending to file: /pulsar/data/bookkeeper/ledgers/current/ledgers/023002.dbtmp: No space left on device
This error says that the volume bound to the PVC pulsar-bookie-ledgers-pulsar-bookie-1 is full. It usually has 5Gi of capacity, which should be sufficient. A bookie running out of space suggests that messages are not being dispatched correctly and are piling up in topics.
I recommend a deeper investigation to see which topics are causing trouble.
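One way that investigation could be scripted, as a rough sketch only: it assumes the JSON printed by `bin/pulsar-admin topics stats` for each topic has already been collected into a dict keyed by topic name, and that each document carries a top-level `backlogSize` field in bytes. The helper name `topics_with_backlog`, the `threshold_bytes` parameter, and every number in the sample are hypothetical, not taken from this cluster.

```python
import json


def topics_with_backlog(stats_by_topic, threshold_bytes=0):
    """Return (topic, backlogSize) pairs above the threshold, largest first.

    stats_by_topic maps a topic name to the raw JSON text assumed to come
    from `bin/pulsar-admin topics stats <topic>`.
    """
    offenders = []
    for topic, raw_json in stats_by_topic.items():
        stats = json.loads(raw_json)
        # Treat a missing field as an empty backlog.
        size = stats.get("backlogSize", 0)
        if size > threshold_bytes:
            offenders.append((topic, size))
    return sorted(offenders, key=lambda item: item[1], reverse=True)


# Made-up sample values, for illustration only.
sample = {
    "persistent://gooddata-cn/gooddata-cn/compute.calcique": '{"backlogSize": 0}',
    "persistent://gooddata-cn/gooddata-cn/result.xtab": '{"backlogSize": 2048}',
    "persistent://gooddata-cn/gooddata-cn/sql.select": '{"backlogSize": 1048576}',
}

for topic, size in topics_with_backlog(sample):
    print(f"{topic}: {size} bytes")
```

Sorting largest first puts the most likely cause of a full ledger volume at the top of the output.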
You can connect to one of the brokers and use the bin/pulsar-admin command to inspect the backlog size of topics.
Refer to https://pulsar.apache.org/reference/#/2.11.x/pulsar-admin/topics for information on how to use the pulsar-admin CLI tool. The most important subcommands are bin/pulsar-admin topics list <<tenant>>/<<namespace>> and bin/pulsar-admin topics stats persistent://<<tenant>>/<<namespace>>/<<topic>> (where <<tenant>>/<<namespace>> are the Pulsar tenant and namespace, typically gooddata-cn/gooddata-cn). The stats sub-command returns JSON-formatted output for the given topic; look for a backlogSize that is non-zero (or much higher than zero).
Pete Lorenz
01/03/2024, 3:42 PM
Robert Moucha
01/04/2024, 7:59 AM
Pete Lorenz
01/04/2024, 8:35 PM
bin/pulsar-admin topics stats persistent://gooddata-cn/gooddata-cn/[topic]
It appears that "backlogSize" is 0 for every topic (as of now). It also appears that the list of topics changes. Is this expected? For example, I ran topics list at first and got:
root@pulsar-broker-0:/pulsar# bin/pulsar-admin topics list gooddata-cn/gooddata-cn
"persistent://gooddata-cn/gooddata-cn/__change_events"
"persistent://gooddata-cn/gooddata-cn/metadata.model.calcique.DLQ"
"persistent://gooddata-cn/gooddata-cn/export-tabular.request.DLQ"
"persistent://gooddata-cn/gooddata-cn/cache-settings.change"
"persistent://gooddata-cn/gooddata-cn/export-visual.request"
"persistent://gooddata-cn/gooddata-cn/metadata.model.DLQ"
"persistent://gooddata-cn/gooddata-cn/metadata.model"
"persistent://gooddata-cn/gooddata-cn/export-visual.request.DLQ"
"persistent://gooddata-cn/gooddata-cn/export-tabular.request"
"persistent://gooddata-cn/gooddata-cn/cache-settings.bootstrap"
"persistent://gooddata-cn/gooddata-cn/compute.calcique.DLQ"
"persistent://gooddata-cn/gooddata-cn/result.xtab.DLQ"
"persistent://gooddata-cn/gooddata-cn/cache-settings.change.DLQ"
"persistent://gooddata-cn/gooddata-cn/metadata.cache-command"
"persistent://gooddata-cn/gooddata-cn/caches.garbage-collect.DLQ"
"persistent://gooddata-cn/gooddata-cn/data-source.change.DLQ"
"persistent://gooddata-cn/gooddata-cn/data-source.change.calcique.DLQ"
"persistent://gooddata-cn/gooddata-cn/compute.calcique"
About 20 minutes later, I ran the same command and the list of topics is different:
root@pulsar-broker-0:/pulsar# bin/pulsar-admin topics list gooddata-cn/gooddata-cn
"persistent://gooddata-cn/gooddata-cn/__change_events"
"persistent://gooddata-cn/gooddata-cn/export-tabular.request.DLQ"
"persistent://gooddata-cn/gooddata-cn/cache-settings.change"
"persistent://gooddata-cn/gooddata-cn/export-visual.request"
"persistent://gooddata-cn/gooddata-cn/export-visual.request.DLQ"
"persistent://gooddata-cn/gooddata-cn/export-tabular.request"
"persistent://gooddata-cn/gooddata-cn/cache-settings.bootstrap"
"persistent://gooddata-cn/gooddata-cn/cache-settings.change.DLQ"
"persistent://gooddata-cn/gooddata-cn/metadata.cache-command"
"persistent://gooddata-cn/gooddata-cn/sql.select.DLQ"
"persistent://gooddata-cn/gooddata-cn/caches.garbage-collect.DLQ"
"persistent://gooddata-cn/gooddata-cn/data-source.change.DLQ"
"persistent://gooddata-cn/gooddata-cn/data-source.change.calcique.DLQ"
"persistent://gooddata-cn/gooddata-cn/compute.calcique"
Note that the "metadata.model" topic is in the first list but not the second. I'm wondering if this is expected.
Pete Lorenz
01/04/2024, 8:38 PM
Pete Lorenz
01/04/2024, 9:41 PM
Robert Moucha
01/05/2024, 8:26 AM
persistent://gooddata-cn/gooddata-cn/caches.garbage-collect
persistent://gooddata-cn/gooddata-cn/compute.calcique.DLQ
persistent://gooddata-cn/gooddata-cn/data-source.change
persistent://gooddata-cn/gooddata-cn/metadata.model
persistent://gooddata-cn/gooddata-cn/result.xtab
persistent://gooddata-cn/gooddata-cn/sql.select
Topics are created dynamically by the application. To make sure the messaging stack is working correctly, please restart both pulsar brokers first and, once they are back up, perform a rolling restart of the following gooddata-cn deployments: calcique, sql-executor, result-cache.
This is the list of topics that should exist:
persistent://gooddata-cn/gooddata-cn/cache-settings.bootstrap
persistent://gooddata-cn/gooddata-cn/cache-settings.change
persistent://gooddata-cn/gooddata-cn/cache-settings.change.DLQ
persistent://gooddata-cn/gooddata-cn/caches.garbage-collect
persistent://gooddata-cn/gooddata-cn/caches.garbage-collect.DLQ
persistent://gooddata-cn/gooddata-cn/compute.calcique
persistent://gooddata-cn/gooddata-cn/compute.calcique.DLQ
persistent://gooddata-cn/gooddata-cn/data-source.change
persistent://gooddata-cn/gooddata-cn/data-source.change.calcique.DLQ
persistent://gooddata-cn/gooddata-cn/data-source.change.DLQ
persistent://gooddata-cn/gooddata-cn/export-tabular.request
persistent://gooddata-cn/gooddata-cn/export-tabular.request.DLQ
persistent://gooddata-cn/gooddata-cn/export-visual.request
persistent://gooddata-cn/gooddata-cn/export-visual.request.DLQ
persistent://gooddata-cn/gooddata-cn/metadata.cache-command
persistent://gooddata-cn/gooddata-cn/metadata.model
persistent://gooddata-cn/gooddata-cn/metadata.model.calcique.DLQ
persistent://gooddata-cn/gooddata-cn/metadata.model.DLQ
persistent://gooddata-cn/gooddata-cn/result.xtab
persistent://gooddata-cn/gooddata-cn/result.xtab.DLQ
persistent://gooddata-cn/gooddata-cn/sql.select
persistent://gooddata-cn/gooddata-cn/sql.select.DLQ
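The list above can be checked against a captured `topics list` output mechanically. The following is a minimal sketch, assuming the output has been saved as text with one quoted topic name per line; the names `PREFIX`, `EXPECTED`, `parse_topic_list` and `missing_topics` are made up for illustration. The sample input is the second listing posted earlier in this thread.

```python
PREFIX = "persistent://gooddata-cn/gooddata-cn/"

# Short names of the topics that should exist, per the list above.
EXPECTED = {
    "cache-settings.bootstrap", "cache-settings.change",
    "cache-settings.change.DLQ", "caches.garbage-collect",
    "caches.garbage-collect.DLQ", "compute.calcique", "compute.calcique.DLQ",
    "data-source.change", "data-source.change.calcique.DLQ",
    "data-source.change.DLQ", "export-tabular.request",
    "export-tabular.request.DLQ", "export-visual.request",
    "export-visual.request.DLQ", "metadata.cache-command", "metadata.model",
    "metadata.model.calcique.DLQ", "metadata.model.DLQ", "result.xtab",
    "result.xtab.DLQ", "sql.select", "sql.select.DLQ",
}


def parse_topic_list(output):
    """Return the short topic names found in saved `topics list` output."""
    names = set()
    for line in output.splitlines():
        # Drop surrounding whitespace, quotes and stray angle brackets.
        name = line.strip().strip('"<>')
        if name.startswith(PREFIX):
            names.add(name[len(PREFIX):])
    return names


def missing_topics(output, expected=EXPECTED):
    """Expected topics absent from the listing, sorted by name."""
    return sorted(expected - parse_topic_list(output))


observed_output = """\
"persistent://gooddata-cn/gooddata-cn/__change_events"
"persistent://gooddata-cn/gooddata-cn/export-tabular.request.DLQ"
"persistent://gooddata-cn/gooddata-cn/cache-settings.change"
"persistent://gooddata-cn/gooddata-cn/export-visual.request"
"persistent://gooddata-cn/gooddata-cn/export-visual.request.DLQ"
"persistent://gooddata-cn/gooddata-cn/export-tabular.request"
"persistent://gooddata-cn/gooddata-cn/cache-settings.bootstrap"
"persistent://gooddata-cn/gooddata-cn/cache-settings.change.DLQ"
"persistent://gooddata-cn/gooddata-cn/metadata.cache-command"
"persistent://gooddata-cn/gooddata-cn/sql.select.DLQ"
"persistent://gooddata-cn/gooddata-cn/caches.garbage-collect.DLQ"
"persistent://gooddata-cn/gooddata-cn/data-source.change.DLQ"
"persistent://gooddata-cn/gooddata-cn/data-source.change.calcique.DLQ"
"persistent://gooddata-cn/gooddata-cn/compute.calcique"
"""

for name in missing_topics(observed_output):
    print(name)
```

Because the comparison is a set difference from the expected side only, extra entries in the listing such as `__change_events` never show up as missing.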
(I don't mention the system topic __change_events, which is created by Pulsar itself.)
Pete Lorenz
01/05/2024, 6:49 PM