Enterprise PHP - Caching Part 2

Posted by Content Maestro

Aug 29, 2011 11:00:00 AM


Work Smarter, Not Harder - Or Not At All

In my previous post, I explained how you can use bytecode caching to increase the performance and stability of your PHP server. Now I’d like to delve into output caching. Often the goal of PHP is to produce an HTML page, such as a blog post or a shopping cart, that will be rendered in a browser. For example, when you opened this blog post, PHP loaded a number of source files and executed them; that code connected to a database and extracted the data needed to build the page. The data and code were then smashed together, and the result was a bland article on caching.

Instead, imagine it were an awesome article on kittens! The article is so interesting that I send it to my friend Nicole, who immediately loads it. So the code is executed again, the data is extracted again, and she gets a completely identical page. She then forwards the link to the awesome kitten page to her mom, who decides to post it on www.reddit.com. Pretty soon thousands upon thousands of people are loading it, and every time, the code executes, the data loads and the HTML is rendered.

If you read my bytecode caching post, this will sound familiar, because fundamentally it is the same issue: work is being done more than once for the same result, which from an efficiency perspective is completely unacceptable. The end result in the example above is an overworked server and people left without kittens! A world without kittens is no place I want to live. If you’re thinking anyone concerned with efficiency shouldn’t be using PHP, then you are a jerk. If you’re thinking that we should save the output to avoid all the re-computation, give yourself a cookie! The following snippets make up a simple blogging engine written in PHP. Some portions of code, such as error handling, have been omitted for simplicity.

File: index.php

[php]
<?php
// Fake Blogging Engine
include('lib/blog.php');
include('template/header.php');

// Missing parameters default to 0 (homepage / all categories)
$post_id = isset($_GET['post_id']) ? intval($_GET['post_id']) : 0;
$cat_id = isset($_GET['cat_id']) ? intval($_GET['cat_id']) : 0;
if ($post_id) {
    load_post($post_id);
} else {
    list_posts($cat_id);
}

include('template/footer.php');
[/php]

File: blog.php

[php]
<?php
function load_post($post_id) {
    $db = new mysqli('localhost', 'username', 'password', 'blog');
    /* ... snip error checking ... */
    $query = $db->prepare('select content, title, post_date, author from posts where id=?');
    $query->bind_param('i', $post_id);
    $query->execute();
    $query->bind_result($content, $title, $post_date, $author);
    if (!$query->fetch()) {
        /* ... snip error handling ... */
    }
    $query->close();
    $db->close();
?>
<h1><?php echo $title; ?></h1>
<h2>Written by <?php echo $author; ?> on <?php echo $post_date; ?></h2>
<div class="blog-post"><?php echo $content; ?></div>
<?php
}

function list_posts($cat_id = null) {
    $db = new mysqli('localhost', 'username', 'password', 'blog');
    /* ... snip error checking ... */
    if (!$cat_id) {
        $query = $db->prepare('select id, title, post_date, author from posts order by post_date desc limit 10');
    } else {
        $query = $db->prepare('select id, title, post_date, author from posts where category_id=? order by post_date desc limit 10');
        $query->bind_param('i', $cat_id);
    }
    $query->execute();
    $query->bind_result($post_id, $title, $post_date, $author);
    while ($query->fetch()) {
?>
<h1><a href="?post_id=<?php echo $post_id; ?>"><?php echo $title; ?></a></h1>
<h2>Written by <?php echo $author; ?> on <?php echo $post_date; ?></h2>
<?php
    }
    $query->close();
    $db->close();
}
[/php]

File: header.php

[php htmlscript="true"]
<html>
<head>
<title>Blog Title!</title>
<!-- .. Snip other header elements .. -->
</head>
<body>
<h1><a href="?home">Home</a></h1>
<ul class="navigation">
<?php
$db = new mysqli('localhost', 'username', 'password', 'blog');
$query = $db->prepare('select cat_name, id from categories order by cat_order asc');
$query->execute();
$query->bind_result($cat_name, $cat_id);
while ($query->fetch()) {
?>
    <li><a href="?cat_id=<?php echo $cat_id; ?>"><?php echo $cat_name; ?></a></li>
<?php
}
$query->close();
$db->close();
?>
</ul>
<div class="blog">
[/php]

File: footer.php

[php htmlscript="true"]
</div><!-- END div.blog -->
<ul class="footer">
<?php
$db = new mysqli('localhost', 'username', 'password', 'blog');
$query = $db->prepare('select link from footer order by footer_order asc');
$query->execute();
$query->bind_result($link);
while ($query->fetch()) {
?>
    <li><?php echo $link; ?></li>
<?php
}
$query->close();
$db->close();
?>
</ul>
</body>
</html>
[/php]

Be Green and Recycle Your Work

There are two parts to output caching: capturing the initial output and later serving that captured data. Most output-caching techniques use PHP’s output buffer to capture in memory what would otherwise be written directly to the client’s browser. Because PHP keeps no state between requests, this in-memory version of the page needs to be stored somewhere before the request ends. The functions that let you grab the output are prefixed with “ob_” and are part of the “output control” library. Calling ob_start() is enough to begin buffering within a script, but PHP can also buffer every script automatically through the output_buffering directive in php.ini, which should be set to an integer representing the maximum buffer size in bytes. I recommend the 4096 used in PHP’s sample php.ini files as a good starting point, though each installation will vary. Whatever you do, do not set it to “On”: that removes the upper limit on the buffer size, so a single response can consume an unbounded amount of memory on your web server. With that in mind, take a look at the following snippets. I decided to add caching to the header and footer first, as they are generally static.

File: cache.php

[php]
<?php
$_cache_id = array(); // Stack of cache ids currently being captured

function cache_start($id) {
    global $_cache_id;
    array_push($_cache_id, $id);
    ob_start(); // Turn on output buffering
}

function cache_stop() {
    global $_cache_id;
    $_id = array_pop($_cache_id);
    if ($_id === null) { /* ... snip error handling code ... */ return; }
    // The in-memory version of what was generated since cache_start()
    $page_to_cache = ob_get_contents();
    $filename = cache_get_filename($_id);
    if (!is_dir(dirname($filename))) {
        mkdir(dirname($filename), 0700, true); // Create the cache directory on first use
    }
    file_put_contents($filename, $page_to_cache);
    ob_end_flush(); // Write the in-memory version out to the client
}

function cache_get_filename($id) {
    // Keep only characters that are safe in a file name
    return sys_get_temp_dir() . '/cache/c_' . preg_replace('/[^A-Za-z0-9_-]/', '', $id) . '.cache';
}
[/php]
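cache_stop() keeps the buffered text with ob_get_contents() and then sends it on with ob_end_flush(). To see the capture step on its own, here is a minimal sketch that is not part of the blog engine: `render_greeting()` and `capture_greeting()` are hypothetical helpers, and the example uses ob_get_clean(), which returns the buffer's contents and discards the buffer without sending anything.

```php
<?php
// Minimal sketch of capturing output with PHP's output-control ("ob_") functions.
// render_greeting() and capture_greeting() are hypothetical helpers, not part of
// the blog engine in this post.

function render_greeting($name)
{
    // Any code that echoes HTML would do; this stands in for a template.
    echo '<p>Hello, ' . htmlspecialchars($name) . '!</p>';
}

function capture_greeting($name)
{
    ob_start();              // From here on, echo writes into an in-memory buffer
    render_greeting($name);
    return ob_get_clean();   // Return the buffered text and close this buffer
}

$level_before = ob_get_level();
$html = capture_greeting('Nicole');

// Nothing has been sent to the client yet; $html could now be written to a
// cache file before (or instead of) being echoed.
echo $html, PHP_EOL;
```

The choice between the two styles depends on whether the page should still reach the browser on the same request: cache_stop() wants both a saved copy and a response, so it flushes, while ob_get_clean() suits code that only needs the string.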

File: index-caching-v1.php

[php highlight="7,9,22,24"]
<?php
// Fake Blogging Engine
// Load the functions for reading blog posts
include('lib/blog.php');
include('lib/cache.php');

cache_start('header');
include('template/header.php');
cache_stop();

// Get requested blog post, or if none specified get homepage (post_id===0)
$post_id = isset($_GET['post_id']) ? intval($_GET['post_id']) : 0;
$cat_id = isset($_GET['cat_id']) ? intval($_GET['cat_id']) : 0;
$cache_id = ($post_id ? 'page' . $post_id : ($cat_id ? 'category' . $cat_id : 'homepage'));

if ($post_id) {
    load_post($post_id);
} else {
    list_posts($cat_id);
}

cache_start('footer');
include('template/footer.php');
cache_stop();
[/php]
The code changes were easy to implement and didn’t create much clutter. The header is now cached and our performance should skyrocket! ::Dramatic Pause:: Not so fast! The cached header is being saved, but nothing is being done with it. The caching code needs to be altered a bit further to serve up the file if it exists and isn’t too old; it’s but a simple addition to the original caching code.

File: cache-updated.php

[php highlight="6,7,8,9,13"]
<?php
$_cache_id = array(); // Stack of cache ids currently being captured

function cache_start($id) {
    $filename = cache_get_filename($id);
    if (file_exists($filename)) {
        readfile($filename); // Send the cached copy as-is, without executing it as PHP
        return true;
    }
    global $_cache_id;
    array_push($_cache_id, $id);
    ob_start(); // Turn on output buffering
    return false;
}

function cache_stop() {
    global $_cache_id;
    $_id = array_pop($_cache_id);
    if ($_id === null) { /* ... snip error handling code ... */ return; }
    // The in-memory version of what was generated since cache_start()
    $page_to_cache = ob_get_contents();
    $filename = cache_get_filename($_id);
    if (!is_dir(dirname($filename))) {
        mkdir(dirname($filename), 0700, true); // Create the cache directory on first use
    }
    file_put_contents($filename, $page_to_cache);
    ob_end_flush(); // Write the in-memory version out to the client
}

function cache_get_filename($id) {
    // Keep only characters that are safe in a file name
    return sys_get_temp_dir() . '/cache/c_' . preg_replace('/[^A-Za-z0-9_-]/', '', $id) . '.cache';
}
[/php]
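The updated cache_start() serves a cached file whenever one exists, so nothing yet enforces the “isn’t too old” part. A minimal sketch of that check follows, assuming a new `$ttl` argument in seconds (a hypothetical addition; the 300-second default is arbitrary) and a hypothetical helper `cache_is_fresh()` that compares the file’s `filemtime()` with `time()`. The rest mirrors the post’s cache functions.

```php
<?php
// Sketch of a time-limited variant of the post's file cache.
// The $ttl parameter and cache_is_fresh() are assumptions added for illustration.

$_cache_id = array(); // Stack of cache ids currently being captured

function cache_get_filename($id)
{
    // Keep only characters that are safe in a file name.
    return sys_get_temp_dir() . '/cache/c_' . preg_replace('/[^A-Za-z0-9_-]/', '', $id) . '.cache';
}

function cache_is_fresh($filename, $ttl)
{
    clearstatcache(true, $filename); // PHP caches stat results such as filemtime()
    return is_file($filename) && (time() - filemtime($filename)) < $ttl;
}

function cache_start($id, $ttl = 300)
{
    $filename = cache_get_filename($id);
    if (cache_is_fresh($filename, $ttl)) {
        readfile($filename); // Serve the cached copy and skip the real work
        return true;
    }
    global $_cache_id;
    array_push($_cache_id, $id);
    ob_start(); // Missing or expired: capture a fresh copy
    return false;
}

function cache_stop()
{
    global $_cache_id;
    $id = array_pop($_cache_id);
    if ($id === null) {
        return; // cache_stop() called without a matching cache_start()
    }
    $filename = cache_get_filename($id);
    if (!is_dir(dirname($filename))) {
        mkdir(dirname($filename), 0700, true);
    }
    file_put_contents($filename, ob_get_contents()); // Overwrites any expired copy
    ob_end_flush();
}
```

Call sites keep the same shape as in index-cache-v2.php, for example `if (!cache_start('header', 3600)) { include('template/header.php'); cache_stop(); }`. An expired file is simply overwritten by the next cache_stop(), so no separate cleanup step is needed for correctness.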

File: index-cache-v2.php

[php highlight="7,10,22,26"]
<?php
// Fake Blogging Engine
// Load the functions for reading blog posts
include('lib/blog.php');
include('lib/cache.php');
// Load the header content
if (!cache_start('header')) {
    include('template/header.php');
    cache_stop();
}

// Get requested blog post, or if none specified get homepage (post_id===0)
$post_id = isset($_GET['post_id']) ? intval($_GET['post_id']) : 0;
$cat_id = isset($_GET['cat_id']) ? intval($_GET['cat_id']) : 0;

if ($post_id) {
    load_post($post_id);
} else {
    list_posts($cat_id);
}

if (!cache_start('footer')) {
    // Load the footer content
    include('template/footer.php');
    cache_stop();
}
?>
[/php]

The Fruits of Your Labor

You could easily add this cache to the rest of the blogging engine, which would result in a zippy little site. One viable option might be to cache the entire page with a single pair of cache_start()/cache_stop() calls, but that probably wouldn’t be optimal for a blogging engine. The reason is that the header changes whenever a new category is created, which would leave every other cached page out of sync. Sync issues can be resolved by purging stale caches when you save a new record, but with full-page caching a new category would force you to purge every cached post, whereas with the header cached separately you only need to purge the header’s entry. If commenting were added, full-page caching would become even less attractive, because every new comment would invalidate the whole page; caching only the comment section and updating it as new comments are added would be a far more efficient choice. Since I just made a case against full-page caching, I thought it would be prudent to plug my next blog post, “Enterprise PHP: Caching Part 3”, where I will explore full-page caching!
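Purging a stale entry on save can be sketched with `unlink()`. `cache_purge()` and `save_category()` are hypothetical names: the blog engine in this post has no admin code, so the database write is only indicated by a comment, and the file naming follows the cache functions shown earlier.

```php
<?php
// Sketch of purging a stale cache entry when the underlying data changes.
// cache_purge() and save_category() are hypothetical helpers for illustration.

function cache_get_filename($id)
{
    // Same naming scheme as the post's cache functions.
    return sys_get_temp_dir() . '/cache/c_' . preg_replace('/[^A-Za-z0-9_-]/', '', $id) . '.cache';
}

function cache_purge($id)
{
    $filename = cache_get_filename($id);
    // Deleting the file forces the next cache_start($id) to regenerate it.
    // A missing file counts as already purged.
    return !is_file($filename) || unlink($filename);
}

function save_category($name)
{
    // ... insert the new category into the categories table here ...

    // The navigation list in header.php shows every category, so only the
    // header's cache entry is stale; cached posts and the footer are untouched.
    cache_purge('header');
}
```

Purging instead of rewriting the entry keeps the save path simple: the next visitor pays the cost of regenerating the header once, and every request after that is served from the cache again.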

The Nyan Connection

Throughout this post I have talked about efficiency. It is important to define efficiency in relation to the task at hand, and I will endeavor to do so here. In this instance, efficiency means minimizing latency, CPU resources and network traffic.
  • Latency: the measure of time between a request being received and a response being returned. When calculating latency, do not factor in transport-layer issues such as physical distance between clients and server or low-bandwidth connections.
  • CPU resources: the measure of both the number of cycles executed and the time spent blocked waiting on a resource. The waiting time is important because an application that spends a great deal of time blocking on database queries might not execute many instructions, but it would still not be considered efficient.
  • Network traffic: the measure of data transferred between the web server and database server. Although you can have your database and web server on the same machine, that would hardly be considered an efficient enterprise solution, as they would be contending for the same resources.
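To make the difference between latency and CPU time concrete, here is a sketch that times a piece of work both ways: `microtime(true)` for wall-clock time and `getrusage()` for user plus kernel CPU time. The gap between the two approximates the time spent waiting. `measure()` and `cpu_seconds()` are hypothetical helpers, and `usleep()` merely stands in for blocking on the database server.

```php
<?php
// Sketch: compare wall-clock latency with CPU time for a piece of work.
// measure() and cpu_seconds() are hypothetical helpers; usleep() stands in
// for waiting on a database.

function cpu_seconds()
{
    $u = getrusage();
    return $u['ru_utime.tv_sec'] + $u['ru_utime.tv_usec'] / 1e6   // user-mode CPU time
         + $u['ru_stime.tv_sec'] + $u['ru_stime.tv_usec'] / 1e6;  // kernel-mode CPU time
}

function measure($work)
{
    $wall_start = microtime(true);
    $cpu_start = cpu_seconds();
    $work();
    $wall = microtime(true) - $wall_start;
    $cpu = cpu_seconds() - $cpu_start;
    // Whatever part of the latency was not spent on the CPU was spent waiting.
    return array('wall' => $wall, 'cpu' => $cpu, 'waiting' => max(0.0, $wall - $cpu));
}

$r = measure(function () {
    usleep(50000); // Pretend to block on a 50 ms database query
});
printf("latency %.3fs, cpu %.3fs, waiting %.3fs\n", $r['wall'], $r['cpu'], $r['waiting']);
```

Because this is measured inside PHP, it matches the latency definition above: transport-layer delays between the client and the server are not included.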
With those measures, I present a ridiculously “scientific” table detailing how caching has increased our efficiency.
Table: Latency, CPU Resources, Network Traffic and Total for four cases: No Cache (1st request), No Cache (subsequent requests), Cache (1st request) and Cache (subsequent requests).

Actual Results

Using JMeter, as I always do, I benchmarked the blog engine. I tested with and without caching, as well as with and without APC bytecode caching turned on. The final test was a variation of the file caching in which I used APC to cache the page output. The results:
  • No caching, no APC: 410.3 requests per second
  • Caching on, no APC: 632.0 requests per second
  • No caching, APC on: 675.0 requests per second
  • Caching on, APC on: 1170.8 requests per second
  • APC object caching, APC on: 1195.5 requests per second
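The APC variation mentioned above keeps page fragments in shared memory instead of files. The original APC extension is no longer maintained; its user-cache functions live on in the APCu extension, whose `apcu_fetch()` and `apcu_store()` are used below. This is a sketch under assumptions, not the code used in the benchmark: `FragmentStore`, `ApcuStore`, `ArrayStore` and `cached_fragment()` are hypothetical names, and the interface exists so the same capture logic can run against APCu in production or a plain array when APCu is not installed.

```php
<?php
// Sketch: the same capture-and-serve idea with a pluggable storage backend.
// FragmentStore, ApcuStore, ArrayStore and cached_fragment() are hypothetical.

interface FragmentStore
{
    /** Return the cached string, or null on a miss. */
    public function get($key);
    public function set($key, $value, $ttl);
}

// Shared-memory backend; requires the APCu extension at runtime.
class ApcuStore implements FragmentStore
{
    public function get($key)
    {
        $value = apcu_fetch($key, $success);
        return $success ? $value : null;
    }

    public function set($key, $value, $ttl)
    {
        apcu_store($key, $value, $ttl);
    }
}

// Per-process backend for testing; it does not survive between requests.
class ArrayStore implements FragmentStore
{
    private $items = array();

    public function get($key)
    {
        if (!isset($this->items[$key]) || $this->items[$key][1] < time()) {
            return null; // Missing or expired
        }
        return $this->items[$key][0];
    }

    public function set($key, $value, $ttl)
    {
        $this->items[$key] = array($value, time() + $ttl);
    }
}

// Echo the cached fragment for $key, or run $render, cache its output and echo it.
function cached_fragment(FragmentStore $store, $key, $ttl, $render)
{
    $html = $store->get($key);
    if ($html === null) {
        ob_start();
        $render();
        $html = ob_get_clean();
        $store->set($key, $html, $ttl);
    }
    echo $html;
}
```

A call would look like `cached_fragment($store, 'header', 3600, function () { include('template/header.php'); });`, with `$store` set to `new ApcuStore()` when `function_exists('apcu_fetch')` is true. Wrapping the render step in a closure also guarantees that every captured buffer is closed, which the manual cache_start()/cache_stop() pairing leaves to the caller.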
All of the source code for this post is hosted on GitHub. It is licensed under the MIT license, which means you can pretty much use it any way you want! Would I recommend that? Absolutely not. The code is written for illustrative purposes only and is not suitable for any type of production use. Additionally, we should all avoid reinventing the wheel when possible. That being said, the Zend Framework provides an excellent caching library that is enterprise-ready, fully featured and a breeze to work with. Included in the source code is a version that uses the Zend Cache library. It does have a bit of overhead, and in my simple blog it performed worse than no caching at all. However, in a larger system, where each cache hit saves more work, it should outperform no caching. I have also used PEAR Cache_Lite, which is quite a bit faster than the Zend Cache library, though not nearly as fully featured. If you are looking for an alternative to the Zend Cache library, you might also check out Stash, though I have no experience with it.

Topics: Technology
