Jack Marchant

Principal Software Engineer @ Deputy

Exploring Async PHP

Asynchronous programming is a foundational building block for scaling web applications due to the increasing need to do more in each web request. A typical example of this is sending an email as part of a request.

In many web applications, when something is processed on the server we want to notify people by email, and it's common for this to be a separate HTTP request to a third-party service such as SendGrid or Mailchimp.

This stops being trivial when you need to send a lot of emails at once. PHP runs the code in a request sequentially, so if each email requires an HTTP call that takes 100ms, sending 100 emails adds around 10 seconds (100 × 100ms) to the request.
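The arithmetic is easy to see in a minimal sketch. Here `sendEmail()` is a hypothetical stand-in that simply sleeps for 100ms instead of calling a real email API, and five emails are used to keep the run short.

```php
<?php
// Minimal sketch: the cost of sending emails one after another in a single request.
// sendEmail() is a hypothetical stand-in that pretends each HTTP call takes 100ms.

function sendEmail(string $to): void
{
    usleep(100_000); // 100ms simulated HTTP round trip
}

$emails = ['a@example.com', 'b@example.com', 'c@example.com', 'd@example.com', 'e@example.com'];

$start = microtime(true);
foreach ($emails as $email) {
    sendEmail($email); // each call blocks until it has finished
}
$duration = microtime(true) - $start;

// Roughly count($emails) * 0.1 seconds: about 0.5s here, about 10s for 100 emails.
printf("sent %d emails in %.2fs\n", count($emails), $duration);
```

The total grows linearly with the number of emails, which is the cost every approach below tries to move out of the request.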

Of course, any good third-party email service provides a bulk endpoint to avoid this, but for the sake of the example, let's say you want to send 100 emails and each has to be processed individually.

So, we need to make a decision: how can we move the processing of the emails into a separate process so that it doesn't block the original web request? That is what we'll explore in this post, covering the different ways this can be solved in PHP, with or without new infrastructure.

Using exec()

exec() is a native PHP function that executes an external program and returns the last line of its output. In our case, the program could be a script that sends emails. The operating system spawns a completely new process for the command, one that shares no memory or state with the calling script, so anything it needs has to be passed to it, for example as command-line arguments.
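A minimal sketch of what exec() gives back, assuming a Unix-like shell with `printf` available: the return value is only the last line, while the optional by-reference arguments collect every line of output and the exit code.

```php
<?php
// Minimal sketch of exec()'s return values (assumes a Unix-like shell with printf).
// exec() returns the last line of output; the optional by-reference arguments
// collect every line of output (trailing newlines stripped) and the exit code.

$lastLine = exec('printf "first\nsecond\n"', $output, $exitCode);

var_dump($lastLine);  // string(6) "second"
var_dump($output);    // array(2) containing "first" and "second"
var_dump($exitCode);  // int(0)
```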

Let's take a look at an example.

<?php
// handle a web request

// record the start time of the web request
$start = microtime(true);
$path = __DIR__ . '/send_email.php';

// output to /dev/null & so we don't block to wait for the result
$command = 'php ' . $path . ' --email=%s > /dev/null &';
$emails = ['joe@blogs.com', 'jack@test.com'];

// for each of the emails, call exec to start a new script
foreach ($emails as $email) {
    // escape the email so user input can't inject extra shell commands
    exec(sprintf($command, escapeshellarg($email)));
}

// record the finish time of the web request
$finish = microtime(true);
$duration = round($finish - $start, 4);

// output duration of web request
echo "finished web request in $duration\n";

send_email.php

<?php

$email = explode('--email=', $argv[1])[1];
// this blocking sleep won't affect the web request duration
// (illustrative purposes only)
sleep(5);

// here we can send the email
echo "sending email to $email\n";

Output

$ php src/exec.php

finished web request in 0.0184

The above scripts show the web request still finishes in milliseconds, even though there is a blocking sleep function call in the send_email.php script.

It doesn't block because the command redirects its output to /dev/null and ends with &. exec() normally waits until it has read all of the command's output; with the output discarded and the & telling the shell to run the script in the background, the shell returns straight away, so the emails are handled in the background while the web request continues.
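To see the difference, here is a minimal sketch assuming a Unix-like system with the `sleep` command: the first exec() call waits for the command to exit, while the backgrounded one returns almost immediately. The redirect matters too, since a background process that inherited exec()'s output pipe would keep it open and exec() would still wait for it.

```php
<?php
// Minimal sketch comparing a blocking exec() call with a backgrounded one
// (assumes a Unix-like system where the `sleep` command is available).

$start = microtime(true);
exec('sleep 1'); // PHP waits for the command to exit
$blocking = microtime(true) - $start;

$start = microtime(true);
exec('sleep 1 > /dev/null &'); // the shell starts sleep in the background and returns
$background = microtime(true) - $start;

printf("blocking: %.3fs, background: %.3fs\n", $blocking, $background);
```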

In this way, the web request script is simply responsible for running the script, not for monitoring its execution and/or failure.

This is an inherent downside of this solution: monitoring falls to the spawned process itself, and if it fails, nothing will restart it. However, it is an easy way to get asynchronous behaviour into a PHP application without much effort.

Because exec runs a command on the server, you have to be careful about how that command is built, particularly if it includes user input: pass every argument through escapeshellarg() so it can't inject extra shell commands. exec can also be hard to manage as the application scales. The spawned script is likely running on the same machine that is processing external web requests, so spawning many hundreds or thousands of processes via exec could exhaust its CPU and memory.
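A minimal sketch of why escaping matters, using a hypothetical malicious "email" value and `echo` in place of the email script so the effect is visible in the output.

```php
<?php
// Minimal sketch: escaping untrusted input before it reaches exec().
// The "email" below is a hypothetical malicious value.

$email = 'joe@blogs.com; echo INJECTED';

// Unsafe: the shell treats "; echo INJECTED" as a second command.
exec('echo ' . $email, $unsafeOutput);

// Safe: escapeshellarg() quotes the value so the shell sees a single argument.
exec('echo ' . escapeshellarg($email), $safeOutput);

print_r($unsafeOutput); // two lines, "joe@blogs.com" and "INJECTED": the injected command ran
print_r($safeOutput);   // one line, "joe@blogs.com; echo INJECTED"
```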

pcntl_fork

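pcntl_fork is a low-level function that requires the PCNTL extension to be enabled. It is a powerful, yet potentially error-prone, way to write asynchronous code in PHP, and the PHP manual advises against enabling Process Control in a web server environment, so it is best used from the command line.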

pcntl_fork forks (clones) the current process: each call creates one child process that continues from the same point as the parent, so calling it in a loop creates several children. Its return value tells each copy which one it is: -1 if the fork failed, 0 in the child, and the child's Process ID (PID) in the parent. By checking it, we can run different code in the parent process and in the child process.

The parent process is responsible for spawning the child processes and waiting for them to complete before it can finish.

This gives us more control over how the processes exit, and makes it easy to write logic that retries a child process when it fails.
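A minimal sketch of such retry logic, assuming the PCNTL extension and the CLI. `runWithRetry()` and the task closure are hypothetical names for illustration. The child reports success or failure through its exit code, which the parent reads with pcntl_waitpid() and pcntl_wexitstatus() before deciding whether to fork again. The attempt number comes from the parent, because a child cannot update the parent's variables.

```php
<?php
// Minimal sketch of retrying a failed child process (requires the PCNTL extension, CLI only).
// $task is a hypothetical unit of work that returns true on success.

function runWithRetry(callable $task, int $maxAttempts): int
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $pid = pcntl_fork();

        if ($pid === -1) {
            throw new RuntimeException('Unable to fork process.');
        }

        if ($pid === 0) {
            // Child: report the result to the parent through the exit code.
            exit($task($attempt) ? 0 : 1);
        }

        // Parent: wait for this child and check how it exited.
        pcntl_waitpid($pid, $status);
        if (pcntl_wifexited($status) && pcntl_wexitstatus($status) === 0) {
            return $attempt; // succeeded on this attempt
        }
    }

    throw new RuntimeException("Task failed after $maxAttempts attempts.");
}

// Hypothetical task that fails on the first two attempts, then succeeds.
$attempts = runWithRetry(fn (int $attempt): bool => $attempt >= 3, 5);
echo "succeeded after $attempts attempts\n";
```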

Now, on to the example code for our use case to send emails in a non-blocking way.

<?php

function sendEmail($to, $subject, $message)
{
    // Code to send email (replace with your email sending logic)
    // This is just a mock implementation for demonstration purposes
    sleep(3); // Simulating sending email by sleeping for 3 seconds
    echo "Email sent to: $to\n";
}

$emails = [
    [
        'to' => 'john@example.com',
        'subject' => 'Hello John',
        'message' => 'This is a test email for John.',
    ],
    [
        'to' => 'jane@example.com',
        'subject' => 'Hello Jane',
        'message' => 'This is a test email for Jane.',
    ],
    // Add more email entries as needed
];

$children = [];

foreach ($emails as $email) {
    $pid = pcntl_fork();

    if ($pid == -1) {
        // Fork failed
        die('Error: Unable to fork process.');
    } elseif ($pid == 0) {
        // Child process
        sendEmail($email['to'], $email['subject'], $email['message']);
        exit(); // Exit the child process
    } else {
        // Parent process
        $children[] = $pid;
    }
}

echo "running some other things in parent process\n";
sleep(3);

// Parent process waits for each child process to finish
foreach ($children as $pid) {
    pcntl_waitpid($pid, $status);
    $status = pcntl_wexitstatus($status);
    echo "Child process $pid exited with status: $status\n";
}

echo 'All emails sent.';

In the above example, pcntl_fork copies the current (parent) process into new child processes, and the parent waits for them to finish. After forking the child processes to send the emails, the parent can carry on doing other work before finally making sure the children have finished.

This is a step up from exec, where we were quite limited because the spawned scripts run in completely separate contexts, so the caller has no way to monitor them.

We also gain process isolation: each child process runs in its own memory space (a copy of the parent's at the moment of the fork), so changes it makes to variables are not visible to the parent or to other children. By tracking the process IDs, the parent can monitor and manage the execution flow.
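A minimal sketch of that isolation, assuming the PCNTL extension: the child changes its copy of `$counter` (a hypothetical variable) and reports the new value through its exit code, while the parent's copy is untouched.

```php
<?php
// Minimal sketch of process isolation after pcntl_fork() (requires the PCNTL extension).
// The child gets its own copy of the parent's memory, so its changes stay private.

$counter = 0;

$pid = pcntl_fork();
if ($pid === -1) {
    throw new RuntimeException('Unable to fork process.');
}

if ($pid === 0) {
    // Child: modify its private copy, then report the new value via the exit code.
    $counter += 41;
    exit($counter);
}

// Parent: collect the child's exit code; its own $counter is unchanged.
pcntl_waitpid($pid, $status);
$childCounter = pcntl_wexitstatus($status);

echo "child saw $childCounter, parent still has $counter\n"; // child saw 41, parent still has 0
```

Because memory is not shared, results have to travel back some other way, such as exit codes here, or pipes, files or a database in real code.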

A downside of forking directly from the web request (the parent process) is that, because the parent waits for its children to finish, the request still can't respond until the slowest email has been sent, so the response time barely benefits.

Fortunately, there is a solution to this and it's to combine both exec and pcntl_fork to get the best of both worlds, which looks like this:

  1. Web request uses exec() to spawn a new PHP process
  2. The spawned process is passed a list of emails as a batch
  3. The spawned process becomes the parent as it forks to send each email individually

This can all happen in the background, rather than blocking the original request.

Let's take a look at making this work:

<?php

$start = microtime(true);
$path = __DIR__ . '/pcntl_fork_send_email.php';
$emails = implode(',', ['joe@blogs.com', 'jack@test.com']);
$command = 'php ' . $path . ' --emails=%s > /dev/null &';

// Execute the command, escaping the batch so it is passed as a single, safe argument
echo "running exec\n";
exec(sprintf($command, escapeshellarg($emails)));
$finish = microtime(true);

$duration = round($finish - $start, 4);
echo "finished web request in $duration\n";

pcntl_fork_send_email.php

<?php

$param = explode('--emails=', $argv[1])[1];
$emails = explode(',', $param);

function sendEmail($to)
{
    sleep(3); // Simulating sending email by sleeping for 3 seconds
    echo "Email sent to: $to\n";
}

$children = [];

foreach ($emails as $email) {
    $pid = pcntl_fork();

    if ($pid == -1) {
        // Fork failed
        die('Error: Unable to fork process.');
    } elseif ($pid == 0) {
        // Child process
        sendEmail($email);
        exit(); // Exit the child process
    } else {
        // Parent process
        $children[] = $pid;
    }
}

echo "running some other things in parent process\n";
sleep(3);

// Parent process waits for each child process to finish
foreach ($children as $pid) {
    pcntl_waitpid($pid, $status);
    $status = pcntl_wexitstatus($status);
    echo "Child process $pid exited with status: $status\n";
}

echo "All emails sent.\n";

The beauty of this solution, albeit a more complicated one, is that you can set up a separate process altogether whose responsibility is to run and monitor forked processes that do the work asynchronously.

AMPHP

amphp (Asynchronous Multi-tasking PHP) is a collection of libraries that allow you to build fast, concurrent applications with PHP.

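PHP 8.1, released in November 2021, added support for Fibers, which implement a lightweight cooperative concurrency model: a fiber can suspend itself and be resumed later, letting other code run in the meantime. Version 3 of amphp is built on Fibers, so asynchronous code can be written as ordinary sequential code while its event loop switches between fibers whenever one is waiting.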
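Fibers are part of PHP itself, so the cooperative model can be illustrated without amphp. This is a minimal sketch, not how amphp schedules work: `$sendEmail` is a hypothetical stand-in that calls Fiber::suspend() where a real client would wait on the network, and the while loop is a toy round-robin scheduler.

```php
<?php
// Minimal sketch: cooperative concurrency with PHP 8.1's built-in Fiber class.
// Illustrative only; amphp adds an event loop and non-blocking I/O on top of this idea.

$log = [];

// Hypothetical stand-in for sending an email: it records a step, then suspends
// (as it would while waiting on the network) so another fiber can run.
$sendEmail = function (string $to) use (&$log): void {
    $log[] = "sending to $to";
    Fiber::suspend();
    $log[] = "sent to $to";
};

$fibers = [];
foreach (['john@example.com', 'jane@example.com'] as $to) {
    $fiber = new Fiber($sendEmail);
    $fiber->start($to); // runs until the first Fiber::suspend()
    $fibers[] = $fiber;
}

// A tiny round-robin "scheduler": resume each suspended fiber until all have finished.
while ($fibers !== []) {
    foreach ($fibers as $i => $fiber) {
        if ($fiber->isTerminated()) {
            unset($fibers[$i]);
        } else {
            $fiber->resume();
        }
    }
}

print_r($log);
```

Both emails are started before either finishes. Nothing runs in parallel, though: only one fiber executes at a time, and each one decides when to hand over control.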

Now that we know a little about how amphp works and why it's exciting for the future of PHP programs, let's take a look at an example:

<?php

require __DIR__ . '/../vendor/autoload.php'; // Include the autoload file for the amphp/amp library (v3, PHP 8.1+)

use function Amp\async;
use function Amp\delay;
use function Amp\Future\await;

function sendEmail($to, $subject, $message)
{
    // Simulate sending an email: delay() takes seconds and suspends only the
    // current fiber, so other fibers keep running in the meantime
    delay(3);
    echo "Email sent to: $to\n";
}

$emails = [
    [
        'to' => 'john@example.com',
        'subject' => 'Hello John',
        'message' => 'This is a test email for John.',
    ],
    [
        'to' => 'jane@example.com',
        'subject' => 'Hello Jane',
        'message' => 'This is a test email for Jane.',
    ],
    // Add more email entries as needed
];

$futures = [];

foreach ($emails as $email) {
    // async() runs the closure in a new fiber and returns a Future
    $futures[] = async(static function () use ($email) {
        sendEmail($email['to'], $email['subject'], $email['message']);
    });
}

// Wait for every Future to complete; the emails are "sent" concurrently,
// so this takes about 3 seconds rather than 6
await($futures);

echo "All emails sent.\n";

The above script is a very simple version of running things asynchronously. async() creates a new fiber to run the given closure and returns a Future object, and Amp\Future\await() suspends the script until all of the futures have completed.

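This is much simpler than rolling your own: the library does the heavy lifting of scheduling the fibers, which is key when building an application, as you don't need to worry about how the work is queued internally. Bear in mind that the concurrency is cooperative: a blocking call such as sleep() or a synchronous HTTP client would block every fiber, so the email sending itself needs non-blocking I/O, for example via the amphp/http-client library.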

Queues and Workers

A solution to this problem also exists outside of PHP, and prior to PHP 8.1 it could be considered the gold standard because it's language independent and highly scalable.

The use of queues such as Amazon SQS, RabbitMQ or Apache Kafka has been a widely accepted solution for some time.

A queue is a piece of infrastructure that lets workers, running independently of your application, process work asynchronously. This approach has its own risks and downsides, but it has been tried and tested over time.

Let's get into an example:

The sender, in this example, is typically your existing web application.

sender.php

<?php

require 'vendor/autoload.php';

use Aws\Sqs\SqsClient;

// Initialize the SQS client
$client = new SqsClient([
    'region' => 'us-east-1',
    'version' => 'latest',
    'credentials' => [
        'key' => 'YOUR_AWS_ACCESS_KEY',
        'secret' => 'YOUR_AWS_SECRET_ACCESS_KEY',
    ],
]);

// Define the message details
$message = [
    'to' => 'john@example.com',
    'subject' => 'Hello John',
    'message' => 'This is a test email for John.',
];

// Send the message to SQS
$result = $client->sendMessage([
    'QueueUrl' => 'YOUR_SQS_QUEUE_URL',
    'MessageBody' => json_encode($message),
]);

echo "Message sent to SQS with MessageId: " . $result['MessageId'] . "\n";

Workers are a separate deployment of long-running code that receives jobs from the queue and processes them.

worker.php

<?php

require 'vendor/autoload.php';

use Aws\Sqs\SqsClient;

// Initialize the SQS client
$client = new SqsClient([
    'region' => 'us-east-1',
    'version' => 'latest',
    'credentials' => [
        'key' => 'YOUR_AWS_ACCESS_KEY',
        'secret' => 'YOUR_AWS_SECRET_ACCESS_KEY',
    ],
]);

// Receive and process messages from SQS
while (true) {
    $result = $client->receiveMessage([
        'QueueUrl' => 'YOUR_SQS_QUEUE_URL',
        'MaxNumberOfMessages' => 1,
        'WaitTimeSeconds' => 20,
    ]);

    if (!empty($result['Messages'])) {
        foreach ($result['Messages'] as $message) {
            $body = json_decode($message['Body'], true);

            // Process the message (send email in this case)
            sendEmail($body['to'], $body['subject'], $body['message']);

            // Delete the message from SQS
            $client->deleteMessage([
                'QueueUrl' => 'YOUR_SQS_QUEUE_URL',
                'ReceiptHandle' => $message['ReceiptHandle'],
            ]);
        }
    }
}

function sendEmail($to, $subject, $message)
{
    sleep(3); // Simulating sending email by sleeping for 3 seconds
    echo "Email sent to: $to\n";
}

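This solution is made up of two parts: a sender, which publishes messages to the queue, and a worker, which receives each message, processes it and only then deletes it from the queue, so a message that fails to process becomes visible again for another attempt.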

It can be scaled by increasing the number of workers in line with the volume of messages sent by any number of senders.

By using a queue, the worker is completely independent from the sender and can be written in any language as the communication between sender and worker is through JSON messages.
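Because the JSON message is the only contract between the two sides, it can help to encode and validate it in one place. This is a minimal sketch assuming the same to/subject/message fields as the SQS example; `encodeEmailMessage()` and `decodeEmailMessage()` are hypothetical helpers, and the validation rules are assumptions.

```php
<?php
// Minimal sketch of a JSON message contract shared by sender and worker.
// Field names match the SQS example; the validation rules are assumptions.

function encodeEmailMessage(string $to, string $subject, string $message): string
{
    return json_encode(
        ['to' => $to, 'subject' => $subject, 'message' => $message],
        JSON_THROW_ON_ERROR
    );
}

/** @return array{to: string, subject: string, message: string} */
function decodeEmailMessage(string $body): array
{
    // Malformed JSON throws a JsonException instead of silently returning null.
    $data = json_decode($body, true, 512, JSON_THROW_ON_ERROR);

    foreach (['to', 'subject', 'message'] as $field) {
        if (!isset($data[$field]) || !is_string($data[$field])) {
            throw new InvalidArgumentException("Missing or invalid field: $field");
        }
    }

    return $data;
}

$body = encodeEmailMessage('john@example.com', 'Hello John', 'This is a test email for John.');
$decoded = decodeEmailMessage($body);
echo $decoded['to'], "\n"; // john@example.com
```

A worker written in another language would implement the same checks against the same JSON shape.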

Which solution is best?

Out of all the solutions explored above, it's almost impossible to say which would be best for your application: although they all solve the problem of running asynchronous code with PHP, their implementations are quite different, with different benefits and drawbacks.

To summarise each option in a few points:

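exec()

Quick to add and needs no new infrastructure, but the caller can't monitor or restart the spawned process, arguments must be escaped, and every process runs on the same machine as the web server.

pcntl_fork()

Gives the parent control over its child processes, including waiting for them and retrying failures, but requires the PCNTL extension and blocks the web request if called from it directly; combining it with exec() avoids this.

AMPHP

Fiber-based concurrency on PHP 8.1+ with no new infrastructure; the library handles the scheduling, but the work itself must use non-blocking I/O.

Queues and Workers

Language independent and highly scalable by adding workers, at the cost of running and deploying additional infrastructure.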

Conclusion

The main reason I wanted to dive a bit deeper into all the different possibilities of async code in PHP is to understand how (if at all) the introduction of Fibers in PHP 8.1 changes how we can write async programs in the future.

There are many solutions available without requiring PHP 8.1 that have been battle tested, but it's interesting to see the direction the PHP language is going in to compete with the likes of Golang and Elixir, both of which support async programming and have done for years.

Ultimately, I would probably still reach for a Queue/Worker approach given its scalability and cross-platform, cross-language support. However, I think over time we might see libraries such as AMPHP become more feature-rich and make this problem easier to solve without introducing new infrastructure.

To see the code samples used in this blog post, you can find them on GitHub.
