Enterprise PHP – Caching Part 1

Elephants Never Forget and Neither Does Caching

You may have found that creating a working web application is much easier than getting it to scale well. With the LAMP stack, it’s often a love-hate relationship: most will agree that PHP’s simplicity is an incredible pro when developing, but a con when it comes to scalability. I believe there is a preconceived notion that because the LAMP stack does not, by default, provide the enterprise features you would find in the J2EE space, PHP simply cannot perform like a heavyweight Java application. Caching, bytecode caching in particular, dispels that idea. It is one of the easiest and most common ways to greatly improve the performance and scalability of PHP applications.

In web applications, you’ll find there are two typical types of caching: object and output. Object caching keeps pieces of data that are expensive to calculate or retrieve in memory, which lowers the cost of repeated requests. Output caching, either full or partial page, works by capturing the HTML that an application produces, which allows repeated requests to bypass the application server entirely. Because PHP is a scripting language, it allows for a third type: bytecode caching.

Architecture that is Worlds Apart

To better understand bytecode caching, I think it’s necessary to explain the fundamental differences between a more traditional “compiled” language such as Java and a scripting language, in this instance PHP. The lifecycle of a Java application involves the JVM loading class files of Java bytecode on demand and either interpreting or JIT-compiling them. Once a class has been loaded, it stays in memory and the file does not need to be accessed again. Java application servers handle many requests within the same process space, so once a block of code has been loaded, optimized or compiled, the result persists between requests. The price is paid once while the benefits carry forward.

PHP has no state between requests. Because requests are not handled in the same process space*, everything the interpreter has loaded and processed is discarded at the end of a request. Therefore, on every request, each PHP file involved is loaded from disk, converted into Zend bytecode and then interpreted by the Zend Engine. Let that sink in for a moment.

If you come from the enterprise space this might seem a bit crazy, but if you consider how Unix/Linux is designed, it makes a lot of sense. Creating a large, all-encompassing system is simply not how things are done there. Instead, small tools that each do one thing extremely well are pieced together to accomplish larger tasks. Because the Zend Engine, which ultimately runs your PHP application, is extremely modular, developers are able to write extensions that fill in missing or desired functionality, such as bytecode caching. Bytecode caches work by keeping a copy of the Zend bytecode outside of the PHP process space so it can be preserved between script executions, thus removing the need to load the files from disk and parse them again.
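To make the idea concrete, here is a minimal sketch of what a bytecode cache does conceptually. It is a toy model in plain PHP, not how APC is implemented: compileScript(), loadScript(), the $cache array and the file path are all hypothetical stand-ins for the Zend compiler and the cache's storage, and the modification-time comparison only loosely mirrors a stat-style freshness check.

```php
<?php
// Toy model of a bytecode cache. Every name here is hypothetical; a real
// cache hooks into the Zend Engine rather than living in userland code.

// Stand-in for the expensive "read the file and compile it" step.
function compileScript($source, &$compileCount)
{
    $compileCount++;
    return 'bytecode:' . md5($source);
}

// Returns cached "bytecode" unless the entry is missing or the file changed.
function loadScript($path, $source, $mtime, array &$cache, &$compileCount)
{
    if (isset($cache[$path]) && $cache[$path]['mtime'] === $mtime) {
        return $cache[$path]['bytecode']; // cache hit: nothing is compiled
    }
    $bytecode = compileScript($source, $compileCount);
    $cache[$path] = ['mtime' => $mtime, 'bytecode' => $bytecode];
    return $bytecode;
}

$cache = [];
$compileCount = 0;

// Three "requests" for the same unchanged file compile it only once.
for ($request = 1; $request <= 3; $request++) {
    loadScript('/var/www/index.php', '<?php echo "hi";', 1000, $cache, $compileCount);
}
echo "Compiles after 3 requests: $compileCount\n";

// A newer modification time invalidates the entry and forces a recompile.
loadScript('/var/www/index.php', '<?php echo "bye";', 2000, $cache, $compileCount);
echo "Compiles after the file changed: $compileCount\n";
```

Without the cache, the compile counter would rise on every request, which is the per-request cost described above.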

Tools of the Trade

There are a number of bytecode caching modules available, including APC (Alternative PHP Cache), XCache, eAccelerator and Zend Optimizer+. The amazing thing about bytecode caching is that it is completely transparent to your application. There are many things in this world that claim to magically “make things better” without any effort, but bytecode caching actually delivers on that claim. I am focusing on APC because it was developed by the core PHP development team and has shown itself to be extremely compatible with the changes in the 5.x branch of PHP. Assuming you already have Apache2 installed with PHP5, installing APC is extremely simple.

Debian/Ubuntu

sudo apt-get install php-apc
sudo /etc/init.d/apache2 restart

Fedora Core/RHEL

yum -y install php-pecl-apc
/etc/init.d/httpd restart

Windows

1. Download the module from http://windows.php.net/download/ and unzip it to your {php install}/ext directory.
2. Modify your php.ini file and add the line: extension=php_apc.dll
3. Restart the Apache service.

Verify that APC was installed by viewing your PHP info (the output of phpinfo()) and looking for the APC section. If ‘APC Support’ is listed as “enabled” then congratulations! You are now experiencing cached bytecode. Once it has been enabled, you should take some time to tune its configuration. The settings for APC are stored in the php.ini file, under the [apc] heading. Many of the defaults are sufficient for most applications. The most common settings to adjust are:

apc.shm_size – Amount of memory allocated to caching. Defaults to 32M.
apc.ttl – Number of seconds an entry may sit idle in a slot that another entry needs before it can be evicted. Idle time is measured from the entry’s last access.
apc.stat – Whether APC should stat a file before returning the cached version, to see if it has been updated. Set to 1 for development, 0 for production. With 0, changed files are not picked up until the cache is cleared or the web server is restarted.
apc.max_file_size – The largest file you plan on caching. Defaults to 1M.
apc.num_files_hint – An estimate on the number of files that will be cached. This helps APC optimize its memory use; this should be easy to determine.
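
As a quick way to see what your server is actually using, the following sketch reports whether the extension is loaded and prints the current value of each directive above, plus apc.enabled. The function name apcReport() is made up for this example; extension_loaded() and ini_get() are standard PHP functions. Run it through the web server rather than the command line, since the CLI can load a different php.ini.

```php
<?php
// Collects APC's load status and the directives discussed above.
// ini_get() returns false for a directive PHP does not know about,
// which is what happens when the APC extension is not loaded.
function apcReport()
{
    $names = [
        'apc.enabled',
        'apc.shm_size',
        'apc.ttl',
        'apc.stat',
        'apc.max_file_size',
        'apc.num_files_hint',
    ];

    $report = ['loaded' => extension_loaded('apc'), 'settings' => []];
    foreach ($names as $name) {
        $value = ini_get($name);
        $report['settings'][$name] = ($value === false) ? '(not available)' : $value;
    }
    return $report;
}

$report = apcReport();
echo 'APC loaded: ', $report['loaded'] ? 'yes' : 'no', "\n";
foreach ($report['settings'] as $name => $value) {
    echo str_pad($name, 20), $value, "\n";
}
```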

What you set these values to depends entirely on what you are hosting. If it is a small application, the defaults are generally acceptable, but for something like Magento, you’ll really want to crank it! Although unscientific, I recommend setting apc.shm_size as large as you can afford and running a load test on your application. Then install the APC management tool by downloading the APC source and extracting apc.php. (Note: this file should never be placed in a publicly accessible location on your server.) Once that file is deployed, you can navigate to it to see how your cache is performing.

These images show a fresh cache for a web app I am working on, and the cache after hitting a few pages:

Yikes! I have covered maybe 50% of the PHP files in the system and already I am using almost 60% of my memory. Clearly I need to increase that cache size.
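
If you would rather check memory use from a script than from apc.php, a rough sketch follows. It relies on APC's apc_sma_info() function and the num_seg, seg_size and avail_mem keys it returns; the helper name apcMemoryUsedPercent() and the fallback numbers are invented for illustration and are not measurements from my application.

```php
<?php
// Percentage of APC shared memory in use, given an array shaped like the
// one apc_sma_info() returns. Total memory = segment count * segment size.
function apcMemoryUsedPercent(array $sma)
{
    $total = $sma['num_seg'] * $sma['seg_size'];
    if ($total <= 0) {
        return 0.0;
    }
    return round(100 * ($total - $sma['avail_mem']) / $total, 1);
}

// Passing true asks for the summary only, without the per-block list.
$sma = function_exists('apc_sma_info') ? apc_sma_info(true) : false;

if ($sma === false) {
    // Made-up numbers used when APC is unavailable:
    // one 32 MB segment with 13 MB still free.
    $sma = [
        'num_seg'   => 1,
        'seg_size'  => 32 * 1048576,
        'avail_mem' => 13 * 1048576,
    ];
}

printf("Shared memory in use: %.1f%%\n", apcMemoryUsedPercent($sma));
```

If the figure stays high after a full pass through your application, that is a hint that apc.shm_size is too small.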

Benchmark Everything

Once you have your bytecode cache established and configured, you should do some benchmark testing to see how things have improved. There are many performance testing tools out there, but I use Apache JMeter, for no particular reason. Once you have metrics for your site with caching enabled, you will need to turn off APC and test it again. This can be done by setting apc.enabled=0 in your php.ini file.

I created a very simple test plan with 100 users requesting a page from WordPress running in an Ubuntu VM on my machine. Normally you should run JMeter on a different machine, but I wasn’t too concerned as this wasn’t a terribly scientific test. Even so, the results were quite startling. Without caching, the site would peak at roughly 300 page views a minute before becoming unstable and requiring an Apache restart. With APC on, I was able to average over 1,300 page views a minute without breaking my stack. Amazing!
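
To put those two figures side by side, here is the arithmetic as a tiny script. The speedup() helper is just a name chosen for this example, and the inputs are the rough numbers from my informal test, so treat the result as a ballpark.

```php
<?php
// Ratio between two throughput measurements taken under the same load.
function speedup($before, $after)
{
    return $after / $before;
}

// Page views per minute from the informal test described above.
$withoutApc = 300;
$withApc = 1300;

printf("Throughput with APC: %.1fx the uncached figure\n", speedup($withoutApc, $withApc));
```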

Bytecode caching is an easy way to dramatically increase the performance of your PHP applications. I can’t imagine a reason why you shouldn’t use this amazing technology. In my next two posts, I will be covering output caching and object caching techniques. Stay tuned!

* The modern PHP interpreter can be run in a multi-process mode in Apache which allows one process to handle multiple requests internally; however, the same performance considerations still apply, as resources are not shared between requests. This feature does allow for better connection pooling if the scripts take advantage of those APIs.