Note: This post is a continuation of a previous one, as many readers have asked what happens when asynchronous jobs run out of retries.
Handling of failed jobs
After a job fails several times (up to the configured number of retries), the engine won’t execute it again. Instead, the job is moved to a separate database table called the dead letter table (ACT_RU_DEADLETTER_JOB). The dead letter queue is a common pattern in message-passing systems.
Having all failed jobs in a separate table makes it easy to find the problematic ones with a single, simple query. This partitioning of jobs also improves overall engine performance, because the async executor’s queries no longer have to filter out failed jobs. Note that only Flowable 6 releases support this concept.
The big question then is what to do with the dead jobs. Flowable provides two ways of bringing them back to life: programmatically and manually via the UI.
Programmatically: Flowable provides an API to reanimate jobs, see org.flowable.engine.ManagementService#moveDeadLetterJobToExecutableJob(jobId, retries). Executing it moves the job back to the table containing the runnable jobs (ACT_RU_JOB) and sets its number of retries to the given parameter value.
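As a minimal sketch, the programmatic requeue boils down to a single call on the ManagementService. The job id below is a placeholder, and the retry-clamping helper is a local convention of this sketch, not a Flowable requirement:

```java
// Sketch: requeue a dead-letter job, assuming a running ProcessEngine.
public class RequeueDeadLetterJob {

    // Keep the requested retry count within a sensible range before passing
    // it on (a local safeguard for illustration, not part of the Flowable API).
    static int sanitizedRetries(int requested) {
        return Math.max(1, Math.min(requested, 10));
    }

    public static void main(String[] args) {
        String deadJobId = "someDeadLetterJobId"; // placeholder id
        int retries = sanitizedRetries(3);
        // With a ProcessEngine at hand, the actual move is one call:
        // processEngine.getManagementService()
        //     .moveDeadLetterJobToExecutableJob(deadJobId, retries);
        System.out.println("Requeue " + deadJobId + " with " + retries + " retries");
    }
}
```

After the call, the job reappears in ACT_RU_JOB and the async executor picks it up on its next acquisition cycle.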
Manually: There is a section in the Flowable Admin App that displays the current list of jobs. On the right-hand side, the Job type filter has an option to show only dead letter jobs. Once this filter is applied, the list displays the number of remaining retries (zero) and the exception that caused each job to fail.
Clicking on an entry shows additional information and, on the right-hand side, the possible actions. The first action, ‘Move job’, moves the job out of the dead letter table so it is queued again for execution. The second, ‘Delete job’, deletes it; the associated process instance will then be stuck and need to be cleaned up.
The most appropriate action depends on your use case and the nature of the exception. Flowable provides the tools and infrastructure, but the error handling strategy is decided entirely by the requirements and the development team.
Communication failures, a server being down, or an unreliable network throw easily recognizable exceptions: timeouts, network or I/O errors. In these cases, triggering the job again usually solves the issue, provided the infrastructure is working properly again. A backend component could be written that periodically checks the dead letter jobs and, based on the exception type of a job, moves it back to the regular job table.
ServiceTasks that call backend services with a buggy implementation won’t succeed no matter how many times they’re retried. In this case, the bug has to be fixed first and the job reactivated afterwards. If it was properly fixed, the process will continue its execution normally.
If the failed task is not critical (for instance, a mail notification could not be sent, but Customer Care has already called the client), the job can be deleted to avoid sending a redundant confirmation.
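The three strategies above can be sketched as a small triage rule keyed on the stored exception message and the failing activity. The exception class names, the "mailNotification" activity id, and the retry count are assumptions for illustration; the Flowable calls that each outcome would map to are noted in comments:

```java
import java.util.Set;

// Sketch: decide what to do with a dead-letter job based on its failure.
public class DeadLetterTriage {

    enum Action { REQUEUE, HOLD, DELETE }

    // Exceptions we treat as transient infrastructure hiccups (assumed list).
    private static final Set<String> TRANSIENT = Set.of(
            "java.net.SocketTimeoutException",
            "java.net.ConnectException",
            "java.io.IOException");

    // Activities whose failure we can safely ignore (assumed example id).
    private static final Set<String> NON_CRITICAL = Set.of("mailNotification");

    static Action classify(String exceptionMessage, String activityId) {
        if (activityId != null && NON_CRITICAL.contains(activityId)) {
            return Action.DELETE;  // managementService.deleteDeadLetterJob(jobId)
        }
        if (exceptionMessage != null
                && TRANSIENT.stream().anyMatch(exceptionMessage::contains)) {
            return Action.REQUEUE; // managementService.moveDeadLetterJobToExecutableJob(jobId, 3)
        }
        // Anything else is likely a bug: fix the code first, then requeue.
        return Action.HOLD;
    }
}
```

A periodic backend component as described above would iterate over managementService.createDeadLetterJobQuery().list() and apply a rule like this to each job.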