Logfile Analysis

Web servers have always recorded all their transactions in a logfile. It was soon realised that these logfiles could be read by a program to provide data on the popularity of the website. In the early 1990s, website statistics consisted primarily of counting the number of client requests made to the web server. This was a reasonable method initially, since each website often consisted of a single HTML file. However, with the introduction of images in HTML, and websites that spanned multiple HTML files, this count became less useful.

Two units of measure were introduced in the mid 1990s to gauge the amount of human activity on web servers more accurately. These were page views and visits (or sessions). A page view was defined as a request made to the web server for a page, as opposed to a graphic, while a visit was defined as a sequence of requests from a uniquely identified client that expired after a certain amount of inactivity, usually 30 minutes. Page views and visits are still commonly displayed metrics, but are now considered rather unsophisticated measurements.
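
The two definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input is taken to be already-parsed (client_id, timestamp, path) tuples, the list of non-page file extensions is a guess at what counts as a "graphic", and the function names are hypothetical rather than taken from any particular log analyzer.

```python
from datetime import datetime, timedelta

# The inactivity window after which a visit is considered to have expired.
SESSION_TIMEOUT = timedelta(minutes=30)

# Assumption: requests for these file types are not "pages".
NON_PAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def is_page_view(path):
    """Treat a request as a page view unless it is for a graphic or other asset."""
    path_without_query = path.lower().split("?", 1)[0]
    return not path_without_query.endswith(NON_PAGE_SUFFIXES)


def count_page_views_and_visits(requests):
    """Count page views and visits.

    requests: iterable of (client_id, timestamp, path) tuples.
    Returns a (page_views, visits) tuple.
    """
    page_views = 0
    visits = 0
    last_seen = {}  # client_id -> timestamp of that client's previous request

    # Process requests in chronological order so gaps are measured correctly.
    for client_id, timestamp, path in sorted(requests, key=lambda r: r[1]):
        if is_page_view(path):
            page_views += 1
        previous = last_seen.get(client_id)
        # A new visit starts on a client's first request, or after a long gap.
        if previous is None or timestamp - previous > SESSION_TIMEOUT:
            visits += 1
        last_seen[client_id] = timestamp

    return page_views, visits
```

Note that every request, including one for an image, resets the inactivity timer in this sketch; a real analyzer may make a different choice.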


The emergence of search engine spiders and robots in the late 1990s, along with web proxies and dynamically assigned IP addresses for large companies and ISPs, made it more difficult to identify unique human visitors to a website. Log analyzers responded by tracking visits by cookies, and by ignoring requests from known spiders.

The extensive use of web caches also presented a problem for logfile analysis. If a person revisits a page, the second request will often be retrieved from the browser's cache, and so no request will be received by the web server. This means that the person's path through the site is lost. Caching can be defeated by configuring the web server, but this can result in degraded performance for the visitor to the website.
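
The response described above, tracking visitors by cookie and ignoring known spiders, might look roughly like the following Python sketch. The record fields (`ip`, `cookie_id`, `user_agent`), the function names, and the short list of spider tokens are all assumptions made for illustration; a real log analyzer would maintain a much longer list of known spiders.

```python
# Illustrative only: substrings that commonly appear in spider user-agent strings.
KNOWN_SPIDER_TOKENS = ("googlebot", "bingbot", "slurp", "crawler", "spider")


def is_known_spider(user_agent):
    """Return True if the user-agent string matches a known spider token."""
    ua = user_agent.lower()
    return any(token in ua for token in KNOWN_SPIDER_TOKENS)


def visitor_key(record):
    """Identify a visitor by cookie when one is present, else by IP address.

    The cookie distinguishes people who share a proxy or a dynamically
    assigned IP address; the IP address is only a fallback.
    """
    return record.get("cookie_id") or record["ip"]


def unique_human_visitors(records):
    """Return the set of visitor keys, ignoring requests from known spiders."""
    return {
        visitor_key(record)
        for record in records
        if not is_known_spider(record.get("user_agent", ""))
    }
```

With this approach, two people behind the same proxy are counted separately as long as each carries a different cookie, while a spider's requests are dropped before counting.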

Advantages of logfile analysis

The main advantages of logfile analysis over page tagging are as follows.
  • The web server normally already produces logfiles, so the raw data is already available. To collect data via page tagging requires changes to the website.
  • The web server reliably records every transaction it makes. Page tagging relies on the visitors' browsers co-operating, which a certain proportion may not do.
  • The data is on the company's own servers, and is in a standard, rather than a proprietary, format. This makes it easy for a company to switch programs later, use several different programs, and analyze historical data with a new program. Page tagging solutions involve vendor lock-in.
  • Logfiles contain information on visits from search engine spiders. Although these should not be reported as part of the human activity, it is important data for performing search engine optimization.
  • Logfiles contain information on failed requests; page tagging only records an event if the page is successfully viewed.
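
As a rough illustration of the last point, the Python sketch below tallies failed requests straight from raw logfile lines. It assumes the logfile uses the widely used Common Log Format; the regular expression and function names are hypothetical, and other log formats would need a different pattern.

```python
import re
from collections import Counter

# Assumption: lines follow the Common Log Format, for example
# 192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def failed_requests(lines):
    """Count requests whose status code is in the 4xx or 5xx range.

    Returns a Counter keyed by (path, status).
    """
    failures = Counter()
    for line in lines:
        record = parse_line(line)
        if record and record["status"][0] in "45":
            failures[(record["path"], record["status"])] += 1
    return failures
```

Calling `failed_requests(open("access.log"))` on a hypothetical logfile named `access.log` would then show which missing or broken pages visitors hit most often, information that a page tag cannot capture because the page never loads.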
