Page Load Count Accuracy
One would think counting page loads is a trivial task.
Simply increment a count whenever a browser loads the page.
Thinking about it a bit, one realizes the mechanism that
records page loads is likely to be less than 100% accurate.
What is a Page Load?
Introducing more uncertainty, the definition of a page load
is nebulous.
When a page begins to load and is then interrupted (the
browser window closed or a different page starts to load),
was it a page load? Some say yes. Others say no. Still
others say it depends.
If almost all of the content loads before the interruption,
is it then a page load?
How about if the page content just barely starts to appear
when the user clicks on a link, causing a different page to
start loading? Was the first page then a page load?
Then there are page loads by software other than browsers
(like content grabbers and search engine spiders). Is it a
page load only when a human is present to see it?
Some people consider page reloads to be page loads.
Others do not.
This is a situation where you'll need to determine for
yourself what a page load means to you. Deciding what a page
load is and using that definition consistently will allow
you to see trends and to spot significant achievements.
Several Ways to Count Page Loads
All of the methods described below rely on logging page
requests or page load events. Some are more reliable than
others. Similarly, some are easier to implement than others.
Whichever method you decide to use, when used consistently,
can provide valuable trend information even if the numbers
aren't entirely accurate according to your definition of
page load.
Scanning Server Access Logs
Scanning the server access log can provide accurate counts
of requests for pages.
All page load and reload requests are recorded in the log,
except those loaded into the browser from cache on local
hard drives. A "request" is, for this article, the server
log entry when any software asks the server for a web page.
When logged page requests are used as a guide for
determining number of page loads, consider the following:
- Log entries when page requests are not completed
  (the page load might be interrupted immediately
  after the request is made) could be considered
  invalid. Yet, it is nearly impossible to identify
  those entries.
- When pages are loaded from cache there is no log
  entry.
- Spiders and other page requesting software will
  cause a log entry to be made.
Whatever software is used to scan the server access log, it
will need to filter out all requests that are not requests
for web pages.
The access log can provide a page request count, not a page
load count. A page request is when a browser (or other
software) asks for a web page. A page load is when the page
is received (or whatever definition you've decided upon).
A real-world example: About half a year ago willmaster.com
went past the 20,000 page average per day mark, according
to AWStats software (which filtered out known spiders and
robots). Yet, when I scanned the logs, I realized that many
of the page request entries, at least 10%, were from our own
software retrieving templates and other files to create
and deliver composite web pages on the fly.
I did not want to count those retrievals as page loads.
Therefore, the "20,000 celebration" had to wait.
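As a rough illustration of the filtering described in this
section, the following Python sketch counts page requests in
an access log. It assumes the log uses the common Apache
"combined" format, that web pages end in the listed file name
extensions, and that unwanted software can be recognized by a
word in its user-agent string. The extension list, the agent
words, and the "template-fetcher" name are made up for the
example. Adjust them to your own site and your own definition
of a page load.

```python
import re

# Combined Log Format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# Assumption: these extensions (or none at all) mean "web page".
PAGE_EXTENSIONS = ("", ".html", ".htm", ".php", ".shtml")

# Assumption: requests whose user agent contains one of these words
# are not page loads. "template-fetcher" is a hypothetical name for
# a site's own page-building software.
IGNORED_AGENT_WORDS = ("bot", "spider", "crawler", "template-fetcher")


def is_page_request(line):
    """Return True if a log line looks like a successful page request."""
    match = LOG_LINE.match(line)
    if match is None:
        return False  # not a line in the expected format
    if match.group("method") != "GET" or match.group("status") != "200":
        return False
    # Drop the query string, then look at the last path segment.
    path = match.group("path").split("?", 1)[0]
    last_segment = path.rsplit("/", 1)[-1]
    if "." in last_segment:
        extension = "." + last_segment.rsplit(".", 1)[1].lower()
    else:
        extension = ""
    if extension not in PAGE_EXTENSIONS:
        return False  # image, style sheet, script, and so on
    agent = (match.group("agent") or "").lower()
    return not any(word in agent for word in IGNORED_AGENT_WORDS)


def count_page_requests(lines):
    """Count the lines that pass the page request filter."""
    return sum(1 for line in lines if is_page_request(line))
```

To try it, open a log file and pass the file object to
count_page_requests(). Keep in mind that the result is still
a page request count, not a page load count.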
PHP Page Load Logs
When PHP code within the web page is used to record when
that page loads, the log is updated before the page is
delivered to the browser. The PHP code is run every time the
page is loaded from the server.
The PHP code is run immediately after the server gets the
page request. Therefore, the log will show nearly the same
count as the number of web page requests in the server
access log.
This method has the same three inaccuracy considerations as
the server log request count method (above).
SSI Launches a Counter Script
Using SSI to launch a counter script to record the page load
has an effect similar to using PHP code. The script is run
before the page is delivered to the browser, every time the
page is loaded from the server.
The script is run immediately after the server gets the page
request. Therefore, like the PHP method, this page load log
will show nearly the same count as the number of web page
requests in the server access log.
This method also has the same three inaccuracy considerations
as the server access log request count method.
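A counter script of the kind SSI (or PHP code) would launch
can be very small. The following Python sketch appends one
line to a log file for each page load and can count the lines
for a page later. It is an illustration only: the function
names, the log format, and the log file location are
assumptions, and it assumes the server passes the page
address to the script in an environment variable such as
DOCUMENT_URI or REQUEST_URI.

```python
import time


def record_page_load(log_path, environ, now=None):
    """Append one tab-separated line: time, page address, user agent."""
    now = time.time() if now is None else now
    # Assumption: the server sets one of these variables for the script.
    page = environ.get("DOCUMENT_URI") or environ.get("REQUEST_URI") or "-"
    # Tabs separate the fields, so remove any tab from the agent text.
    agent = environ.get("HTTP_USER_AGENT", "-").replace("\t", " ")
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(now))
    line = f"{stamp}\t{page}\t{agent}\n"
    # Append mode adds to the end of the log without erasing it.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(line)
    return line


def count_page_loads(log_path, page):
    """Count the logged loads of one page address."""
    count = 0
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            fields = line.rstrip("\n").split("\t")
            if len(fields) == 3 and fields[1] == page:
                count += 1
    return count
```

A script launched by the server would call
record_page_load() with a log file path of your choosing and
the process environment (os.environ in Python). Because the
call happens when the page is requested, the log records
requests that were later interrupted, too.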
Image Launches a Counter Script
An image tag can be used to launch a page log counter
script. The script's URL is the value of the img tag's src
attribute. The script returns an image after it has logged
the page load.
This method kicks in only after the web page arrives at the
browser. The location of the image tag in the web page
source code determines how soon during the page load the
counting script will run.
The count misses browsers that have images turned off. Also,
if the browser caches images, reloads will not affect the
count. (The no-cache meta tag might prevent the image from
caching, causing otherwise missed reloads to be counted.)
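To make the image-launched idea concrete, here is a minimal
Python sketch of what such a counter script does on each
request: log the page load, then answer with a tiny image.
The function name, the log format, and the choice of a 1x1
transparent GIF are assumptions for illustration. Note that
the sketch asks browsers not to cache the image through HTTP
response headers, which is a different technique from the
no-cache meta tag mentioned above.

```python
import time

# A 1x1 transparent GIF, written out byte by byte (43 bytes).
TRANSPARENT_GIF = (
    b"GIF89a"                                    # signature and version
    b"\x01\x00\x01\x00"                          # width 1, height 1
    b"\x80\x00\x00"                              # 2-color global table
    b"\x00\x00\x00\xff\xff\xff"                  # colors: black, white
    b"\x21\xf9\x04\x01\x00\x00\x00\x00"          # color 0 is transparent
    b"\x2c\x00\x00\x00\x00\x01\x00\x01\x00\x00"  # image position and size
    b"\x02\x02\x44\x01\x00"                      # the single pixel
    b"\x3b"                                      # end of file
)

# Headers asking browsers and proxies not to reuse a cached copy.
NO_CACHE_HEADERS = [
    ("Cache-Control", "no-store, no-cache, must-revalidate"),
    ("Pragma", "no-cache"),
    ("Expires", "0"),
]


def image_counter_response(log, page, now=None):
    """Log one page load, then return (headers, image bytes)."""
    now = time.time() if now is None else now
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(now))
    # The load is logged first; the image is only the reply.
    log.write(f"{stamp}\t{page}\n")
    headers = [
        ("Content-Type", "image/gif"),
        ("Content-Length", str(len(TRANSPARENT_GIF))),
    ] + NO_CACHE_HEADERS
    return headers, TRANSPARENT_GIF
```

Here, log is any open file (or file-like object) and page is
whatever identifies the page, for example a value passed in
the query string of the image tag's src URL.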
The placement of the image, near the top of the page or near
the bottom, can have an effect on the count when a page load
is interrupted.
JavaScript Launches a Counter Script
A script tag can be used to launch a page log counter
script. The counter script's URL is the value of the script
tag's src attribute.
Like the image-launched script, this method kicks in only
after the web page arrives at the browser. The location of
the script tag in the web page source code determines how
soon during the page load the counting script will run.
The placement of the JavaScript, near the top of the page or
near the bottom, can have an effect on the count when a page
load is interrupted.
Using JavaScript requires browsers to have JavaScript
enabled. Otherwise, no page load is counted.
Also, reloads probably will not be counted because many
browsers default to caching JavaScript.
The Best Method
The method that may be the hardest/most expensive to
implement, scanning server logs, may also be the most
accurate. It takes sophisticated software to scan the
logs and extract only pertinent information.
The PHP and SSI methods are also highly accurate,
depending on your definition of page load.
If your definition of page load says a page load is counted
only when the page is completely loaded, then the image
launch method, with the image tag placed near the bottom of
the page, may be best.
The method that may be the easiest to implement, JavaScript,
is also the least accurate. A small percentage of browsers
have JavaScript turned off, making this easy method less
accurate by that percentage than an image-launched counter
might be.
The JavaScript method may be easiest because no attention
need be paid to special file name extensions for web pages,
unlike SSI and PHP. And the counter script does not need to
reply with an image, as an image-launched counter script
would.
The most accurate and the easiest are at opposite ends of
the spectrum, in this case.
Will Bontrager
©2007 Bontrager Connection, LLC