PDF workers with RabbitMQ

Initiating a CPU-intensive task from within an HTTP request is quite common; asking for a PDF report is a typical example. But where should the processing take place? Processing the task in the same thread as the HTTP request is simple and works well under light load, but it can overload the web server machine. Another solution consists of submitting the task to remote workers. This offers good decoupling between the task producer (the web controller) and the consumers (the remote workers). Let’s see how to implement such a solution with RabbitMQ and Spring AMQP.

The simple, 100%-local solution

Let’s see first the simple solution, where the producer and the worker are co-located, as illustrated in the diagram below.
[Figure: Local solution for PDF generation]
What about the code? Imagine you’re in a Java Servlet or any kind of web controller:

PdfService pdfService = new ITextPdfService();
byte [] content = pdfService.createPdf(pdfRequest);
// set the status and headers before writing the body:
// once the response is committed, header changes are ignored
response.setStatus(HttpServletResponse.SC_OK);
response.setContentType("application/pdf");
response.setHeader("Content-Disposition","attachment; filename=spring-amqp.pdf");
response.getOutputStream().write(content);

Apart from the use of a PdfService interface, the web controller and the PDF service couldn’t be more coupled: the PDF is generated in the web server’s process, in the request thread. We can do better with messaging. Let’s see a little bit of theory first.

Messaging to the rescue

Messaging is a powerful paradigm. It is based, among other things, on asynchronicity and document-based communication. Message producers and message consumers don’t share state and don’t work synchronously. Most of the time, they work in different processes, on different nodes. Messaging helps to keep components decoupled, making the whole system easier to maintain and more scalable.
Building a message-based system can be tricky, but fortunately patterns can help. Let’s see the two patterns we’ll need for our PDF generation.

Work queues

With the work queue pattern, tasks are queued, and worker processes take tasks from the queue and execute them.
[Figure: Work queues]
Work queues provide good decoupling between task submission and task processing. Processing can easily scale in case of spikes just by adding new workers. If the task processing is resource-intensive, the workers can run on powerful servers. Work queues are the stable parts of the system (they never move and are always there to queue tasks), while the workers are the dynamic parts.
Load balancing happens naturally: a worker dequeues a task only when it is available.
But what about the response? In our case, we need the generated PDF file! Work queues are often used to postpone task execution, but they also work in a request/reply scenario.
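The work queue pattern can be sketched in plain Java, without any broker. This is only an in-process illustration under stated assumptions: the class WorkQueueSketch and its methods are hypothetical names, and a BlockingQueue stands in for the queue that RabbitMQ would host between separate processes.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// In-process illustration only: with RabbitMQ the queue lives in the broker
// and the workers run in other processes.
class WorkQueueSketch {

    // Starts workerCount daemon threads that all take tasks from the same queue.
    static void startWorkers(BlockingQueue<Runnable> queue, int workerCount) {
        for (int i = 0; i < workerCount; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        queue.take().run(); // blocks until a task is available
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "worker-" + i);
            worker.setDaemon(true);
            worker.start();
        }
    }

    // Queues taskCount tasks and returns how many were processed within 5 seconds.
    static int runTasks(int taskCount, int workerCount) {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        startWorkers(queue, workerCount);
        AtomicInteger done = new AtomicInteger();
        CountDownLatch latch = new CountDownLatch(taskCount);
        for (int i = 0; i < taskCount; i++) {
            queue.add(() -> { // add never blocks on an unbounded queue
                done.incrementAndGet();
                latch.countDown();
            });
        }
        try {
            latch.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println(runTasks(10, 3) + " tasks processed");
    }
}
```

Scaling here means passing a larger workerCount; with a broker, it means starting more worker processes on the same queue.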

Request/reply (aka RPC)

A producer can submit a task and also wait for an answer. In such a case, the task is dequeued and processed, and the worker sends the response back to a queue the producer is waiting on.
[Figure: Request/reply]
The response queue can be private (as in the diagram) or shared. In the latter case, the task contains a correlation id that will identify the response (another pattern!).
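The shared reply queue with a correlation id can also be sketched in plain Java. Again, this is an in-process illustration and not how RabbitMQ or Spring AMQP implement it: RequestReplySketch, Message and call are hypothetical names, and two BlockingQueues stand in for the request and reply queues.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class RequestReplySketch {

    // A message: a correlation id plus a payload.
    static final class Message {
        final String correlationId;
        final String payload;
        Message(String correlationId, String payload) {
            this.correlationId = correlationId;
            this.payload = payload;
        }
    }

    interface Step { void run() throws InterruptedException; }

    private final BlockingQueue<Message> requests = new LinkedBlockingQueue<>();
    private final BlockingQueue<Message> replies = new LinkedBlockingQueue<>(); // shared reply queue
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    RequestReplySketch() {
        // Worker: takes a request, "processes" it, replies with the same correlation id.
        startDaemon(() -> {
            Message request = requests.take();
            replies.put(new Message(request.correlationId, "PDF for " + request.payload));
        });
        // Producer side: hands each reply to the caller waiting for that correlation id.
        startDaemon(() -> {
            Message reply = replies.take();
            CompletableFuture<String> waiting = pending.remove(reply.correlationId);
            if (waiting != null) {
                waiting.complete(reply.payload);
            }
        });
    }

    private static void startDaemon(Step step) {
        Thread thread = new Thread(() -> {
            try {
                while (true) {
                    step.run();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        thread.setDaemon(true);
        thread.start();
    }

    // Submits a request and blocks (up to 5 seconds) until the matching reply arrives.
    String call(String payload) {
        String id = UUID.randomUUID().toString();
        CompletableFuture<String> reply = new CompletableFuture<>();
        pending.put(id, reply);
        try {
            requests.put(new Message(id, payload));
            return reply.get(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the reply", e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("no reply received", e);
        } finally {
            pending.remove(id);
        }
    }

    public static void main(String[] args) {
        System.out.println(new RequestReplySketch().call("report-42")); // prints "PDF for report-42"
    }
}
```

The map of pending futures is what makes a shared reply queue workable: without the correlation id, concurrent callers could not tell their replies apart.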
OK, enough theory, let’s see the practice.

A decoupled solution with Spring AMQP

Spring AMQP provides a RabbitTemplate that directly implements the request/reply pattern. Behind the scenes, RabbitMQ handles message sending, queuing, and dispatching.
Here is how to configure the RabbitTemplate:

<rabbit:connection-factory id="connectionFactory" channel-cache-size="10" />
<bean class="org.springframework.amqp.rabbit.core.RabbitTemplate">
  <property name="connectionFactory" ref="connectionFactory" />
</bean>

Note that you can also do this configuration in Java if you don’t like XML. Using the RabbitTemplate is as simple as this:

RabbitTemplate tpl = (...)  // looked up from the Spring application context
byte [] content = (byte[]) tpl.convertSendAndReceive("pdfRequests", request);

Here pdfRequests is the routing key. Assuming the template sends to the default exchange, the request is routed to the queue of the same name, pdfRequests.
But what about the worker? You can configure a worker to listen on the pdfRequests queue this way:

<!-- the original PDF service -->
<bean id="pdfService" class="com.zenika.rabbitmq.ITextPdfService" />
<!-- making the PDF service message-driven -->
<rabbit:listener-container connection-factory="connectionFactory">
  <rabbit:listener ref="pdfService" method="createPdf" queues="pdfRequests" />
</rabbit:listener-container>

You just have to bootstrap a Spring application context on different servers and you have your army of workers. Note you can make your workers multi-threaded.
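For readers who prefer Java configuration, the worker side could look roughly like the sketch below, including multi-threading. It assumes Spring AMQP and Spring are on the classpath, that a bean named pdfService with a createPdf method is defined elsewhere, and that the pdfRequests queue already exists. The class name PdfWorkerConfig, the "localhost" host and the value of 5 consumers are illustrative assumptions, not taken from the original project.

```java
import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer;
import org.springframework.amqp.rabbit.listener.adapter.MessageListenerAdapter;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class PdfWorkerConfig {

    @Bean
    public ConnectionFactory connectionFactory() {
        // "localhost" is an assumption; point this at your broker
        CachingConnectionFactory factory = new CachingConnectionFactory("localhost");
        factory.setChannelCacheSize(10);
        return factory;
    }

    @Bean
    public SimpleMessageListenerContainer pdfWorkers(
            ConnectionFactory connectionFactory,
            @Qualifier("pdfService") Object pdfService) {
        SimpleMessageListenerContainer container =
                new SimpleMessageListenerContainer(connectionFactory);
        container.setQueueNames("pdfRequests");
        // Dispatch each incoming message to pdfService.createPdf(...)
        container.setMessageListener(new MessageListenerAdapter(pdfService, "createPdf"));
        // Assumed value: 5 consumer threads in this worker process
        container.setConcurrentConsumers(5);
        return container;
    }
}
```

Each worker process then consumes from the same queue with several threads, so capacity grows both by adding processes and by raising the consumer count.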
Our solution works and is decoupled now, but it’s a little bit intrusive because we directly use the RabbitTemplate in our code. Spring Integration can help us to make the processing less intrusive and even more decoupled.

A more decoupled solution with Spring Integration

Spring Integration’s AMQP support builds on top of Spring AMQP. Spring Integration implements tons of messaging patterns (aka “Enterprise Integration Patterns”). One of them is the outbound gateway, which fits our needs perfectly. The outbound gateway implements the request/reply pattern (it takes care of the submission, the response queue, and so on) and can behave as a PdfService, our application interface.
This means our application code is agnostic to the task submission and to the processing: local, in another thread, remote with JMS or AMQP… this is all about configuration, the application code won’t change.
Here is what we are about to do:
[Figure: PDF generation with RabbitMQ and Spring Integration]
And now the code. The worker part doesn’t change: workers still listen on the pdfRequests queue. We now configure two gateways on the producer side:

<int:gateway id="pdfService" default-request-channel="pdfRequestsChannel" service-interface="com.zenika.rabbitmq.PdfService" />
<int:channel id="pdfRequestsChannel" />
<int-amqp:outbound-gateway request-channel="pdfRequestsChannel" amqp-template="amqpTemplate" routing-key="pdfRequests" />

The first gateway is the one the application ends up using: it looks and feels like a PdfService, but it’s an entry point into the messaging world. It sends every request through the pdfRequestsChannel to the second gateway, which handles the request/reply pattern.
Our application code goes back to normal: it knows only about the PdfService:

PdfService pdfService = (...) // looked up from Spring application context
byte [] content = pdfService.createPdf(pdfRequest);
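To make the idea of an interface backed by messaging more concrete, here is a plain-Java sketch using a JDK dynamic proxy. It illustrates the principle only and is not Spring Integration’s implementation. GatewaySketch and gateway are hypothetical names, the PdfService below is a simplified stand-in (the real interface’s signature is not shown in this article), and a Function stands in for the channel.

```java
import java.lang.reflect.Proxy;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

class GatewaySketch {

    // Simplified stand-in for the application interface.
    interface PdfService {
        byte[] createPdf(String request);
    }

    // Builds a PdfService whose calls are forwarded to the given channel.
    static PdfService gateway(Function<Object, Object> channel) {
        return (PdfService) Proxy.newProxyInstance(
                PdfService.class.getClassLoader(),
                new Class<?>[] { PdfService.class },
                (proxy, method, args) -> {
                    if (method.getDeclaringClass() == Object.class) {
                        // toString/equals/hashCode: answer from the channel object
                        return method.invoke(channel, args);
                    }
                    // The method argument becomes the message payload
                    return channel.apply(args[0]);
                });
    }

    public static void main(String[] args) {
        // Local "channel": processes in the calling thread.
        // A remote one would publish to a broker and wait for the reply.
        Function<Object, Object> local =
                request -> ("PDF for " + request).getBytes(StandardCharsets.UTF_8);
        PdfService pdfService = gateway(local);
        byte[] content = pdfService.createPdf("report-42");
        System.out.println(new String(content, StandardCharsets.UTF_8)); // prints "PDF for report-42"
    }
}
```

The calling code only sees PdfService; swapping the local function for one that goes through a broker would not change it, which is the point of the gateway.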

Conclusion

Messaging is your friend when it comes to decoupling and scaling your systems. Message-based systems are built from a combination of patterns, but implementing these patterns by hand is usually cumbersome and error-prone. Frameworks like Spring Integration help you leverage these patterns and let you focus on what matters: the application code.
The source code is available on GitHub.