Warm up + Refresh WordPress Varnish 3 Cache + CloudFlare

Varnish Cache is one of the most effective speed improvements for your WordPress site. It acts as a reverse proxy, caches your pages as HTML and serves them quickly, bypassing the slow PHP backend (Apache or nginx). With the WordPress Varnish cache method outlined here, you will always be serving fast Varnish cached versions of your pages. Varnish has a clever feature, hash_always_miss, which lets Varnish continue to serve its cached version while it refreshes its cache. This means you can slowly refill the Varnish cache instead of purging it all at once and serving slow uncached WordPress posts to new users.

I will assume you have shell or root access to your server (like with Digital Ocean) to install and edit files, and that you have XML sitemaps enabled. Yoast SEO creates XML sitemaps for you and I assume most WordPress users are using this plugin. This method works with Varnish 3 and CloudFlare, which requires a few extra steps compared to the CloudFlare-less guide, yet still works with your WordPress posts, pages and categories.

I have a WordPress plugin which automates this. Use the contact form to help test it and get a free copy.

VPS Provider  | Locations                | RAM    | Hard Drive | Speed     | Price
Vultr         | US, EU, Asia             | 768 MB | 15 GB SSD  | 100 Mbps  | $5 / month
Digital Ocean | US, EU, Asia             | 512 MB | 20 GB SSD  | 100 Mbps  | $5 / month
HostUS        | US, UK, China, Australia | 768 MB | 20 GB      | 1-10 Gbps | $15 / year

Warm up WordPress Varnish 3 Cache Behind CloudFlare

I will assume you already have Varnish installed and configured to work with WordPress; my configuration will be posted in the near future.

CloudFlare includes both the origin IP and the CloudFlare IP in the X-Forwarded-For header, so we need to extract the real IP as a string and convert it to IP format using a vmod. The extracted IP is then used to test whether a hash_always_miss request comes from an authorized IP (i.e. your web server).

Three WordPress Varnish cache bash scripts are included

  • Manual refresh one URL – script prompts for the URL
  • Automated Full site refresh – script refreshes your whole site
  • Manual Multiple URL refresh – paste a list of URLs to be refreshed

Enable Real IP from CloudFlare in Varnish

Prepare Varnish 3 for CloudFlare

You probably already have Varnish installed, however you will still need to prepare a Varnish source to build vmods which enable additional functionality for Varnish.

Note that if you are on a 64-bit system this will probably install Varnish 4 despite the 3.0 label. It is better to use the Varnish 4 CloudFlare guide here.

echo "deb-src https://repo.varnish-cache.org/debian/ wheezy varnish-3.0" | sudo tee -a /etc/apt/sources.list.d/varnish-cache.list
sudo apt-get update

Check your Varnish version

varnishd -V

If it is below 3.0.7, you need to install from source on top of the repo version, which is an extra step after preparing the Varnish source.
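If you would rather let the shell do the comparison, here is a minimal sketch. The function names version_lt and parse_varnish_version are my own for illustration, and it assumes GNU sort with the -V option is available and that the version string looks like varnish-3.0.5.

```bash
#!/usr/bin/env bash
# Succeeds (exit 0) when version $1 is strictly older than version $2.
# Assumes GNU sort, which supports -V (version sort).
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Pull "3.0.5" out of a line such as "varnishd (varnish-3.0.5 revision 1a89b1f)".
parse_varnish_version() {
  sed -n 's/.*varnish-\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Usage on a real server (2>&1 in case the version is printed on stderr):
#   installed=$(varnishd -V 2>&1 | parse_varnish_version)
#   version_lt "$installed" "3.0.7" && echo "Build 3.0.7 from source"
```

If the last command prints the message, follow the source install steps below.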

Prepare the Varnish source

cd ~
apt-get build-dep varnish -y
apt-get source varnish -y
cd varnish-3.0.7
./configure --prefix=/usr && make

If your installed Varnish version was below 3.0.7, install the freshly built 3.0.7

make install

Also copy the updated varnishstat and varnishlog if you are installing Varnish 3.0.7 from source

sudo cp ~/varnish-3.0.7/bin/varnishstat/varnishstat /usr/bin/varnishstat
sudo cp ~/varnish-3.0.7/bin/varnishlog/varnishlog /usr/bin/varnishlog

Install Varnish vmod building dependencies

sudo apt-get install dpkg-dev pkg-config build-essential -y

Find your vmods folder. You will need it when compiling the vmod so that it installs to the right directory.

sudo find / -name vmods

I got this output

/usr/lib/i386-linux-gnu/varnish/vmods

You may see the output below instead. Either way, use the path you received as the VMODDIR when compiling libvmod-ipcast.

/usr/lib/varnish/vmods
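To avoid copying the path by hand, you could capture it in a variable. This is only a sketch: find_vmoddir is a hypothetical helper name, and it assumes the directory you want is the first one named vmods with varnish somewhere in its path.

```bash
#!/usr/bin/env bash
# Print the first directory named "vmods" below the given root (default /)
# whose path contains "varnish".
find_vmoddir() {
  local root=${1:-/}
  find "$root" -type d -name vmods -path '*varnish*' 2>/dev/null | head -n 1
}

# Usage: VMODDIR=$(find_vmoddir /usr/lib); echo "$VMODDIR"
```

Check the printed path before passing it to ./configure.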

Install libvmod-ipcast

Install libvmod-ipcast, replacing VMODDIR with the directory you found before

apt-get install git autotools-dev automake libtool -y
cd ~
git clone https://github.com/lkarsten/libvmod-ipcast
cd libvmod-ipcast
sh autogen.sh
./configure VARNISHSRC=~/varnish-3.0.7 VMODDIR=/usr/lib/varnish/vmods/
make
sudo make install

Configuring Varnish for Smart Refreshing

Open your Varnish vcl file, usually default.vcl

sudo nano /etc/varnish/default.vcl

Add the Varnish ipcast vmod functionality in your default.vcl by adding this at the top of the file. Ipcast converts strings to IP addresses so you can match the real IP to your editors access control list below.

import ipcast;

Add a section for editors. This is for security, so that other machines cannot force refreshes on your pages. Add it after the backend section but before the sub vcl_recv section. Change IP.of.Server to the source IP address that will be sending the curl refresh commands. I am just using the web server to run the script, so it is the same IP that CloudFlare points to for this domain.

acl editors {
  "127.0.0.1";
  "IP.of.Server";
}

Adjust your VCL file to include hash_always_miss in your sub vcl_recv section; you only need to add the lines shown inside sub vcl_recv below. They check for a Cache-Control header containing no-cache (which we will send later using curl) and whether the sender is a member of the editors acl. Change Any.Valid.IP to any valid IP (e.g. 1.2.3.4); ipcast uses it as the fallback when the string cannot be converted. Trimming the header works because the first IP is always the real IP.

sub vcl_recv {

  # set realIP by trimming CloudFlare IP which will be used for various checks
  set req.http.X-Actual-IP = regsub(req.http.X-Forwarded-For, "[, ].*$", "");

  # check if the real IP is coming from your web server
  if (ipcast.ip(req.http.X-Actual-IP, "Any.Valid.IP") ~ editors) {

    # if the header is set to no-cache set always miss
    if (req.http.Cache-Control ~ "no-cache") {
      set req.hash_always_miss = true;
    }
  }
}
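To see what the regsub pattern does to the header, here is a small shell sketch that applies the same pattern with sed. The function name first_forwarded_ip and the sample addresses are made up for illustration.

```bash
#!/usr/bin/env bash
# Keep everything before the first comma or space,
# like the VCL pattern "[, ].*$" replaced with an empty string.
first_forwarded_ip() {
  printf '%s\n' "$1" | sed 's/[, ].*$//'
}

first_forwarded_ip "203.0.113.7, 198.51.100.23"   # prints 203.0.113.7
first_forwarded_ip "203.0.113.7"                  # prints 203.0.113.7
```

Whatever comes after the first address is dropped, so only the first IP is compared against the editors acl.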

I have also set my Varnish time to live – how long Varnish should keep cached version of the page – to 1 year in the sub vcl_fetch function because I can refresh the cache using hash_always_miss instead of purging.

set beresp.ttl = 52w;

Test your adjusted Varnish default.vcl works

varnishd -C -f /etc/varnish/default.vcl

If you didn’t get any errors, reload your Varnish configuration. Unlike restarting the Varnish service, a reload doesn’t empty your Varnish cache.

sudo service varnish reload
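The test and the reload can be chained so a broken VCL never gets loaded. This is a minimal sketch, assuming varnishd -C exits with a non-zero status when the VCL fails to compile; reload_varnish_if_valid is a name I made up.

```bash
#!/usr/bin/env bash
# Compile the VCL first and only reload Varnish when compilation succeeds.
reload_varnish_if_valid() {
  local vcl=${1:-/etc/varnish/default.vcl}
  if varnishd -C -f "$vcl" > /dev/null 2>&1; then
    sudo service varnish reload
  else
    echo "VCL did not compile, keeping the running configuration" >&2
    return 1
  fi
}

# Usage: reload_varnish_if_valid /etc/varnish/default.vcl
```

If it fails, run the varnishd -C command on its own to see the error message.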

Manual Varnish Smart Refresh Purge for Single URL

Create a new script that will send the hash_always_miss request for a URL you paste into the script when prompted

nano smartvarnishrefreshsingle.sh

Paste this code

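#!/usr/bin/env bash
# WordPress Varnish Cache Refresh from HTPCGuides.com
echo "Enter full URL to purge"
# -r keeps backslashes in the pasted URL as they are
read -r url
# Quote the URL so characters like ? and & are passed to curl untouched
curl -s "$url" -H "Cache-Control: no-cache" -o /dev/null
echo "Refreshed $url"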

Ctrl+X, Y and Enter to save the script

Make the script executable

sudo chmod +x smartvarnishrefreshsingle.sh

Run the manual single Varnish purge script like this

bash smartvarnishrefreshsingle.sh

You will see this output, just paste your URL and press Enter

Enter full URL to purge

Now that page has been intelligently refreshed using Varnish and no user will have received a slow WordPress post or page.
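If you want to confirm the refresh happened, one option is to look at the Age response header, which should drop back to a low number after a refresh. This is a sketch that assumes your setup passes the Age header through to the client; age_from_headers is a hypothetical helper name and the URL in the usage line is a placeholder.

```bash
#!/usr/bin/env bash
# Read HTTP response headers on stdin and print the Age value in seconds, if present.
age_from_headers() {
  tr -d '\r' | awk -F': *' 'tolower($1) == "age" { print $2; exit }'
}

# Usage: curl -sI "http://www.yourwebsite.com/some-post/" | age_from_headers
```

Run it before and after the refresh script and compare the two numbers.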

Automate Smart Varnish WordPress Refresh using VPS

The automated slow Varnish cache refresher script requires xml2 and curl so install them

sudo apt-get install xml2 curl -y

Create the Varnish Full Refresh script

nano varnishfullrefresh.sh

Paste this, adjust site to your site’s name and the number 30 at the bottom to the number of pages in your blogroll.

#!/usr/bin/env bash
# WordPress Varnish Cache Refresh Script for CloudFlare from htpcguides.com
site=http://www.htpcguides.com

# Download post sitemap
wget -q "$site/post-sitemap.xml" -O postsitemap.xml

# Parse the xml file and put the URLs into posts.txt
# (strip only up to the first "=" so URLs containing "=" stay intact)
xml2 < postsitemap.xml | grep /url/loc= | sed 's/^[^=]*=//' > posts.txt

# Loop through posts.txt and use curl to send an always miss request

while read -r post; do
  curl -s "$post" -H "Cache-Control: no-cache" -o /dev/null
  echo "Refreshed $post"
  echo Waiting
  sleep 10
done < posts.txt

# Download page sitemap
wget -q "$site/page-sitemap.xml" -O pagesitemap.xml

# Parse the xml file and put the URLs into pages.txt
xml2 < pagesitemap.xml | grep /url/loc= | sed 's/^[^=]*=//' > pages.txt

# Loop through pages.txt and use curl to send an always miss request

while read -r page; do
  curl -s "$page" -H "Cache-Control: no-cache" -o /dev/null
  echo "Refreshed $page"
  echo Waiting
  sleep 10
done < pages.txt

# Download category sitemap
wget -q "$site/category-sitemap.xml" -O categorysitemap.xml

# Parse the xml file and put the URLs into categories.txt
xml2 < categorysitemap.xml | grep /url/loc= | sed 's/^[^=]*=//' > categories.txt

# Loop through categories.txt and use curl to send an always miss request

while read -r category; do
  curl -s "$category" -H "Cache-Control: no-cache" -o /dev/null
  echo "Refreshed $category"
  echo Waiting
  sleep 30
done < categories.txt

# Warm up and refresh blogroll pages, change 30 to the number of pages back you show posts
for i in {1..30}; do
  echo "Refreshing $site/page/$i"
  curl -s "$site/page/$i/" -H "Cache-Control: no-cache" -o /dev/null
  echo "Refreshed $site/page/$i"
  echo Waiting
  sleep 1
done
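The three sitemap loops in the script have the same shape, so they could be folded into one function. Here is a rough sketch of that idea; the function name refresh_list and the DRY_RUN switch are my own additions and not part of the script above.

```bash
#!/usr/bin/env bash
# refresh_list FILE DELAY: send a no-cache request for every URL in FILE,
# pausing DELAY seconds between requests.
# With DRY_RUN=1 nothing is fetched and no pause is taken.
refresh_list() {
  local file=$1 delay=${2:-10} url
  while IFS= read -r url; do
    [ -z "$url" ] && continue            # skip blank lines
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "Would refresh $url"
      continue
    fi
    curl -s "$url" -H "Cache-Control: no-cache" -o /dev/null
    echo "Refreshed $url"
    sleep "$delay"
  done < "$file"
}

# Usage, once posts.txt and categories.txt exist:
#   refresh_list posts.txt 10
#   refresh_list categories.txt 30
```

Setting DRY_RUN=1 first lets you check the URL list without touching the cache.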

Ctrl+X, Y and Enter to save the script

Make the script executable

sudo chmod +x varnishfullrefresh.sh

Run the Varnish cache refresh script to test it

bash varnishfullrefresh.sh

You will see a lot of curl commands and Refreshed URL names, this will take a while depending on how many posts, pages and categories you have.

I have also added a daily cronjob for refreshing the WordPress Varnish Cache

crontab -l | { cat; echo "@daily /path/to/varnishfullrefresh.sh"; } | crontab -
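Running the command above twice adds the entry twice. If that bothers you, here is a sketch of a filter that only appends the line when it is missing; with_cron_line is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Read the current crontab on stdin and print it with LINE appended,
# unless an identical line is already present.
with_cron_line() {
  local line=$1 current
  current=$(cat)
  [ -n "$current" ] && printf '%s\n' "$current"
  if ! printf '%s\n' "$current" | grep -qxF -- "$line"; then
    printf '%s\n' "$line"
  fi
}

# Usage:
#   crontab -l 2>/dev/null | with_cron_line "@daily /path/to/varnishfullrefresh.sh" | crontab -
```

The 2>/dev/null just hides the message crontab prints when the user has no crontab yet.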

Manual Smart Refresh Varnish Method

Go to your sitemaps, you can use as many or as few as you want

  • Posts – http://www.yourwebsite.com/post-sitemap.xml
  • Pages – http://www.yourwebsite.com/page-sitemap.xml
  • Categories – http://www.yourwebsite.com/category-sitemap.xml

Highlight and copy the entire table to the clipboard

Go into Excel or any spreadsheet program and paste it

Use find and replace: enter your full URL http://www.yourwebsite.com in the find box and leave the replace box blank, so only the paths remain

Highlight the whole column and copy it to the clipboard, you will paste it in the manual script below
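If you prefer to skip the spreadsheet, the same trimming can be done in the shell. This is a sketch assuming you already have a text file with one full URL per line; strip_site and urls.txt are names I made up.

```bash
#!/usr/bin/env bash
# Print each URL read from stdin with the leading SITE removed.
strip_site() {
  local site=$1 url
  while IFS= read -r url; do
    printf '%s\n' "${url#"$site"}"
  done
}

# Usage: strip_site "http://www.yourwebsite.com" < urls.txt
```

The output is the list of paths, one per line, ready to paste into the script.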

sudo nano smartvarnishrefresh.sh

Change the site name and paste the WordPress URLs from the spreadsheet that you want refreshed in the Varnish cache (inspired by GiantDorks). Delete the # Paste your URL sitemap here line, because inside the quoted list it is not treated as a comment. Adjust the sleep value to whatever you want; it is the short pause taken before refreshing the next URL.

#!/usr/bin/env bash
site="https://www.htpcguides.com"
pages="
/
# Paste your URL sitemap here
/100-amazon-gift-card-giveaway-july-2015/
/40-amazon-gift-card-giveaway/
"
echo -----------------------------
echo Refresh old pages from cache
echo -----------------------------
for page in $pages; do
	echo "Setting always miss for $site$page"
	curl -s "$site$page" -H "Cache-Control: no-cache" -o /dev/null
	sleep 10
done
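Because the for loop splits the pages list on whitespace, a line starting with # inside the quotes is not a comment and each of its words would be requested as a page. If you want to keep notes in the list, one option is to filter them out first. This is a sketch; clean_paths is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Print only the usable paths from a pasted list:
# drop blank lines and lines that start with #.
clean_paths() {
  grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#'
}

pages=$(clean_paths <<'EOF'
/
# Paste your URL sitemap here
/40-amazon-gift-card-giveaway/
EOF
)
printf '%s\n' "$pages"
```

The resulting pages variable can be looped over exactly like the one in the script.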

Ctrl+X, Y and Enter to save the script

Make the script executable

sudo chmod +x smartvarnishrefresh.sh

Run the manual multiple URL refresh script like this

bash smartvarnishrefresh.sh