Website Statistics Reporting Discrepancies

By Richard Hamel

A longtime client of doWW’s came to me with a question about their website statistics. The client had noticed an apparent discrepancy between the reports from their server (that is, from their web host) and the reports Google Analytics produces for them. The server reports showed Total Visits of more than 4,500 for the previous month. For the same period, Google Analytics showed just under 1,000 Total Visits.

The client was, to say the least, very curious.

Since I too had been noticing variances within my own reports (web server and Analytics), I felt some investigating was called for. After a call to the client’s web host provider, which happens to be the same one I use, along with a day of web research, I learned that discrepancies between the two reporting systems (each accurate on its own terms) are not only common and significant, but should be expected.

“How’s that possible?” you may be asking.

Well, let’s take a look at what these reports actually measure, beginning with server reports.

Website server statistics reflect the contents of your “server logs.” Log files record, accurately and plainly, how many (and which) pages or files were requested within a given time period, and reporting software on the server (such as the popular program Webalizer) summarizes them. The problem is that server reports are subject to varying logging standards. A log entry cannot tell you whether the request came from a person or from another computer. Furthermore, the fields a log entry contains (e.g.: time stamp, filename, requesting IP, and browser type) do not by themselves provide a complete picture of a visit. To counteract these shortcomings, software designers define rules that they believe will distinguish a visit from a unique visitor. That inferred information is never fully accurate, however, and the rules are not standardized across reporting programs.
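To make the point concrete, here is a minimal sketch of how such rules might work, using a few made-up entries in a simplified Common Log Format and a crude user-agent heuristic (the field layout, sample entries, and bot markers are all illustrative assumptions, not how Webalizer or any specific reporter actually works):

```python
# Illustrative entries in a simplified Common Log Format:
# IP, timestamp, request, status, size, referrer, user-agent.
LOG_LINES = [
    '203.0.113.5 - - [01/Mar/2010:10:00:01] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [01/Mar/2010:10:00:02] "GET /logo.png HTTP/1.1" 200 840 "-" "Mozilla/5.0"',
    '198.51.100.9 - - [01/Mar/2010:10:05:00] "GET /index.html HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
]

# Crude heuristic for non-human traffic; real reporters use
# longer, vendor-specific (non-standardized) rule sets.
BOT_MARKERS = ("bot", "spider", "crawler")

def count_hits(lines):
    """Return (total hits, hits that look human) for a batch of log lines."""
    total, human = 0, 0
    for line in lines:
        # The last quoted field in this layout is the user-agent string.
        agent = line.rsplit('"', 2)[-2].lower()
        total += 1
        if not any(marker in agent for marker in BOT_MARKERS):
            human += 1
    return total, human

total, human = count_hits(LOG_LINES)
print(total, human)  # the Googlebot request inflates the raw total: 3 vs. 2
```

Notice that even this toy filter changes the counts, and every reporting program draws these lines differently, which is one reason two "accurate" reports disagree.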

In a Nutshell: Log files (which feed server reports) will always give you higher counts of page views, visits, and unique visitors than a web-based, JavaScript-driven program such as Google Analytics. And in most cases, experts believe, the server numbers are too high.

So much for server reports. Now let’s look at web-based reports such as Google Analytics. Since Google is not granted access to your web server, it tracks your traffic via a JavaScript tag that your webmaster places in your website’s templates, which reports each page view to an external log. Most such tags also set a cookie to track referring pages, browsing history, and visitor history (first-time or repeat). Web-based tracking programs track visits more accurately because they follow individuals (anonymously) and don’t over-count the way server logs do. However, they tend to under-count in the process.

Since these tracking systems rely on JavaScript and cookies, some visits will go unreported. The reason is simple: many visitors, for security reasons, disable cookies or clear them regularly. (Estimates of how many do so range from as few as 14% to as many as 40%.) What’s more, web-based tracking systems often do not track non-HTML file downloads well, so if your website’s performance is measured by file downloads, your reports will be affected.
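As a rough back-of-envelope exercise, if a fraction of visitors are invisible to a cookie-based tracker, the reported count only reflects the remaining fraction. This sketch scales a reported figure up by the untracked share, using the 14%–40% range from above (it assumes, simplistically, that cookie-blocking visitors are not tracked at all, whereas in reality some are partially tracked):

```python
def estimated_true_visits(reported, blocked_fraction):
    """Scale a cookie-tracked visit count up by the untracked share.

    Assumes visitors who block or clear cookies are entirely invisible
    to the tracker, which overstates the correction somewhat.
    """
    return reported / (1 - blocked_fraction)

reported = 1000  # e.g. an Analytics figure like the client's
low = estimated_true_visits(reported, 0.14)   # 14% blocking
high = estimated_true_visits(reported, 0.40)  # 40% blocking
print(round(low), round(high))  # roughly 1163 and 1667
```

Even under this crude model, the plausible "true" count spans a wide range, which is why any correction is an estimate rather than a measurement.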

In a Nutshell: Web-based programs (e.g.: Analytics) give you a better picture of the visits your website receives, but the numbers they report are usually much lower (and presumably more accurate) than those from log files. The tradeoff is that visitors with high security settings may be only partially tracked, or not tracked at all, and that file downloads are difficult to monitor.

Bottom Line: It is not at all unusual for a server report to show a visitor count 4-5 times greater than Analytics does (as was the case with my client). One practical adjustment is to add about 20% to the figures in the web-based reports (e.g.: Google Analytics); that should give you a more reliable picture of your visitor traffic. Even so, you may not want to dismiss the higher visitor counts that server reports generate. The non-human visitors (e.g.: “robots” or “spiders”) may have been viewing your website for a good reason, such as indexing it for better placement within a search engine.
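The 20% rule of thumb above amounts to a one-line calculation; this sketch just makes the arithmetic explicit (the function name and the 20% default are taken from the rule of thumb, not from any Analytics feature):

```python
def adjusted_visits(analytics_visits, uplift=0.20):
    """Apply the rule of thumb: add roughly 20% to web-based figures."""
    return round(analytics_visits * (1 + uplift))

# Applied to the Analytics figure from the opening example:
print(adjusted_visits(1000))  # 1200
```

Keep in mind this is an estimate to bridge the two reports, not a precise reconciliation; the right uplift for any given site depends on its audience's cookie and JavaScript habits.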

Sources: DevelopmentSeed.org, SEOmoz.org and Jumpline.com (technical department).