Ready, Preload, Go

PHP 5.5 was the first version of PHP to ship with an out-of-the-box bytecode cache, and PHP 8 introduced a tracing Just-in-Time compiler (Tracing JIT).

In this article we have a closer look at preloading, a performance feature introduced in PHP 7.4.

OpCache, PHP's built-in and default solution for bytecode optimization and caching, can be used to avoid compiling PHP source code into PHP bytecode for each request, over and over again. While this already provides quite a substantial performance boost, compilation is not the only thing that the PHP interpreter repeatedly has to do before it can actually run the code.

By default, OpCache checks whether the original source file has been modified and thus requires recompilation. Although this check can be disabled (opcache.validate_timestamps=0) for production systems to reduce the number of I/O operations, there is still a cost associated with fetching the bytecode from OpCache's shared memory into the context of the current request and preparing it for execution.

This is, at least in part, because PHP compiles and caches each file independently and logically separated from other files. PHP therefore needs to re-establish the links between classes, interfaces, and traits to prepare the code for execution in the context of the current request. This includes, for instance, checking that the language's rules for inheritance and trait usage are followed.

When preloading is used, this "linking" is no longer performed redundantly and on-demand in the context of each request. It is instead performed once, and only once, during server startup. This makes preloading conceptually quite different from traditional file-based bytecode caching. Changes to preloaded source files will not have any effect until the server is restarted, for instance. In other words: the PHP interpreter will behave as if opcache.validate_timestamps=0 was configured. Furthermore, no autoload callback will be triggered for classes, interfaces, and traits that are preloaded.

Preloading requires a so-called preload script and is controlled by two php.ini configuration directives:

opcache.preload_user=www
opcache.preload=/path/to/preload_script.php

opcache.preload_user is used to configure the name of the system user under which the preload script is executed. This is important as most services, at least initially, run as the root super-user and no PHP script, not even "just" a preload script, should be run with such extensive privileges.

opcache.preload is the path to a regular PHP script that is automatically executed on server startup. While the entire power of the PHP language is available in this script, you should only (pre)load your classes, interfaces, and traits here.

The simplest thing that could possibly work as a preload script is a (potentially long) list of require statements:

require __DIR__ . '/MyClassA.php';
require __DIR__ . '/MyClassB.php';
require __DIR__ . '/MyClassC.php';
// ...

There is a catch, though, to using require as this statement not only loads and compiles a file but also executes any code in the file's global scope. This can lead to unintended side effects that can be avoided by using the opcache_compile_file() function instead:

opcache_compile_file(__DIR__ . '/MyClassA.php');
opcache_compile_file(__DIR__ . '/MyClassB.php');
opcache_compile_file(__DIR__ . '/MyClassC.php');
// ...

opcache_compile_file() only loads and compiles a file but does not execute any code in the file's global scope.

Another difference is that opcache_compile_file() can load files in any order. When you have a file MyClassA.php which declares a class named MyClassA and a file MyClassB.php which declares a class named MyClassB that extends MyClassA, MyClassA.php has to be loaded before MyClassB.php with include, include_once, require, and require_once. opcache_compile_file() does not care about the order in which MyClassA.php and MyClassB.php are loaded.
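Because opcache_compile_file() does not care about load order, a preload script does not need a hand-maintained list of files. The following is a minimal sketch, not a definitive implementation: the helper names collectPhpFiles() and preloadFiles() and the src directory are assumptions made for illustration, and the sketch assumes that every *.php file below the chosen directory only declares classes, interfaces, and traits.

```php
<?php
// Hypothetical helper: collect every *.php file below $root (sorted for a stable order).
function collectPhpFiles(string $root): array
{
    $files = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($iterator as $file) {
        if ($file->isFile() && $file->getExtension() === 'php') {
            $files[] = $file->getPathname();
        }
    }

    sort($files);

    return $files;
}

// Hypothetical helper: compile the given files without executing their global scope.
// Returns the number of files that opcache_compile_file() reported as compiled.
function preloadFiles(array $files): int
{
    if (!function_exists('opcache_compile_file')) {
        return 0; // OPcache extension is not loaded
    }

    $compiled = 0;

    foreach ($files as $file) {
        if (opcache_compile_file($file)) {
            $compiled++;
        }
    }

    return $compiled;
}
```

In the preload script itself, the call could then look like preloadFiles(collectPhpFiles(__DIR__ . '/src')); where /src is a placeholder for the directory that holds your code.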

This article is an excerpt from our eBook PHP 7 Explained.

It is important to note that only include, include_once, require, and require_once support the conditional declaration of functions, classes, interfaces, and traits like so:

if (true) {
    require __DIR__ . '/MyClassA.php';
}

When a preloaded file is loaded again later using include, include_once, require, or require_once, its code outside the declaration of functions, classes, interfaces, and traits will still be executed. Any functions, classes, interfaces, or traits will not be redefined, though. Using include_once and require_once does not prevent a preloaded file from being loaded again.

All variables, objects, and resources that may be created or opened by the preload script will be garbage-collected after server startup. They will not be available in the requests later on.

It is also important to realize that the order in which files get loaded is very likely to be relevant: to compile a unit of code, all its dependencies need to be resolvable. For that to work, a parent class, a trait, or an interface needs to be known before it can be extended, used, or implemented.

For units of code with unresolvable dependencies, PHP will still keep the bytecode of the file but will otherwise refuse to preload the unit of code itself:

NOTICE: PHP message: PHP Warning: Can't preload unlinked class Foo: Unknown interface DemoInterface in ...

At the time of writing, finding errors such as the one shown above is unfortunately a bit tricky. The problem arises during startup and not at request time, so looking at the error log of a PHP-FPM pool, for instance, does not help. Neither a configured opcache.error_log nor the general FPM error_log contains these warnings. To see them, we have to either start PHP-FPM in the foreground (php-fpm -F) and watch them as they occur, or query systemd's journal (for instance using journalctl -e -u php-fpm -g "Can't preload") in case PHP-FPM is run as a systemd service.
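As a complement to reading the startup output, you can also ask OpCache at request time which classes actually made it into the preload cache. The following is a minimal sketch under stated assumptions: it assumes that opcache_get_status() returns a preload_statistics entry with a classes list when preloading is active (as PHP 7.4 and later do), the helper name missingFromPreload() is hypothetical, and names are compared case-insensitively because the exact casing of the reported names is not something this sketch relies on.

```php
<?php
// Hypothetical helper: return the expected class names that are NOT listed
// in the preload statistics of an opcache_get_status() result.
function missingFromPreload(array $expected, array $status): array
{
    // Assumption: preloaded class names are listed under preload_statistics.classes.
    $preloaded = array_map('strtolower', $status['preload_statistics']['classes'] ?? []);
    $missing   = [];

    foreach ($expected as $class) {
        // Class names are case-insensitive; ignore a leading namespace separator.
        if (!in_array(strtolower(ltrim($class, '\\')), $preloaded, true)) {
            $missing[] = $class;
        }
    }

    return $missing;
}

// At request time: opcache_get_status() may be unavailable or return false.
$status  = function_exists('opcache_get_status') ? opcache_get_status(false) : false;
$missing = missingFromPreload(['Foo'], is_array($status) ? $status : []);
```

If $missing is not empty, the listed classes were not preloaded, and the startup output described above is the place to look for the reason.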

Depending on the code base, not all dependency problems may be resolvable. If classes have missing (third-party) dependencies, are dynamically generated at runtime, or are conditionally defined, preloading them is not possible. Another reason for preloading to fail is a method compatibility check that the engine cannot perform because not all involved units of code have been compiled yet.

To use such a unit of code at request time, the file declaring it has to be made available by either registering a conventional autoloader or by explicit require or include statements.
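A conventional autoloader that serves as such a fallback could be sketched as follows. This is only an illustration: the helper name makeFallbackAutoloader(), the src directory, and the PSR-4-style mapping from class name to file path are assumptions, not something preloading prescribes. Since no autoload callback is triggered for preloaded classes, this callback only runs for units of code that could not be preloaded.

```php
<?php
// Hypothetical helper: build an autoload callback that maps a class name
// such as Vendor\Package\Thing to <baseDir>/Vendor/Package/Thing.php.
function makeFallbackAutoloader(string $baseDir): callable
{
    return function (string $class) use ($baseDir): void {
        $file = rtrim($baseDir, '/\\') . '/' . str_replace('\\', '/', $class) . '.php';

        // Only load the file if it exists; otherwise leave the class to other autoloaders.
        if (is_file($file)) {
            require $file;
        }
    };
}

// Assumption: the application code lives in a src directory next to this file.
spl_autoload_register(makeFallbackAutoloader(__DIR__ . '/src'));
```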

It is important to note that the preload functionality operates on the instance level. When, for example, PHP-FPM is configured with multiple pools, all pools share a common preload cache. This is very convenient for many use cases, but it might lead to unexpected behavior or even cause security problems.

By the very nature of preloading, the name of a unit of code has to be unique per instance. It is not possible to have multiple versions of the same class preloaded at the same time in the same instance. That means projects served via separate pools cannot have their own version of a class preloaded individually under the same name.

Sharing an instance of PHP with preloaded code requires full trust among all sharing parties. Since no files are read at runtime, file permissions that traditionally would have protected one project from accessing files of another are no longer effective. This may have unexpected security implications.