Website Log File Analysis

 


Urchin Tips


The following are some useful tips for getting the most out of your Urchin installation. The Urchin software should be installed and working on your server and you should have a reasonable understanding of how web analytics works.

 

All tips have been tested on our own servers, though we take no responsibility for our advice.

 

Contents:

1. Ensure you are detecting all Search Engines
2. Freeserve Correction
3. Report all Pay-Per-Click campaigns in one place
4. How many people add my site to Favourites?
5. Filters - some regex tips
6. Can I use Urchin if I don't have access to a client's logfiles?
7. Log file rotation - Howto
8. Auto email Urchin reports to clients

 

 

 

1. Ensure you are detecting all Search Engines

 

By default, Urchin's out-of-the-box detection will not pick up EVERY single Search Engine, particularly country-specific or product-specific versions.

 

To correct this, as Admin, go to Profile Settings/Reporting and append to the 'Referral Keywords Match' value any entries applicable to you. For example, we use the following string, which includes entries additional to the Urchin default:

 

p,q,qs,qt,mt,kw,key,word,text,words,query,search,search_string,general,
ask,qry,qkw,search_term,txtsearch_Term,query_contain

 * recently changed (Oct 2003) to use q=

 ** No longer using query_stems

 

Note that these 'query_stems' can, and do, change as Search Engines update their technology. So always check the Search Engine query_stems relevant to you by analysing a test web site logfile. Alternatively, we can do this for you!
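One way to run such a check is to list the query-string parameter names that appear in the referrer field of a test logfile. The Python sketch below is illustrative only: it assumes the referrer URLs have already been extracted from the log, and the function name stem_counts and the sample URLs are our own inventions, not part of Urchin.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def stem_counts(referrers):
    """Count (host, parameter name) pairs seen in referrer query strings."""
    counts = Counter()
    for ref in referrers:
        parts = urlsplit(ref)
        if not parts.query:
            continue  # no query string (e.g. "-" for a missing referrer)
        # keep_blank_values so that stems with empty values are still listed
        for name, _value in parse_qsl(parts.query, keep_blank_values=True):
            counts[(parts.netloc, name)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical referrers, for illustration only.
    sample = [
        "http://www.google.com/search?q=web+analytics&hl=en",
        "http://search.example.com/find?qry=urchin+tips",
        "-",
    ]
    for (host, name), n in sorted(stem_counts(sample).items()):
        print(host, name, n)
```

Any parameter name that carries search phrases but is missing from your 'Referral Keywords Match' value is a candidate to append.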

 

 

 

2. Freeserve Correction

 Reported to Urchin [USCT-15041]

 

Freeserve uses a combination of variables in its query_stem that include qt, p and q. This confuses 'Referral Keywords Match' (see above). Within Urchin reports, under Page Query Terms you will see the term 'b' or sometimes '_searchbox' for searches that have originated from Freeserve. 

 

Apply the following Report Filter to correct for this:

 

 

In English, this reads:

'if &q= exists in the query_stem, replace any previous value of Page Query Terms with this one'.
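The logic of that rule can be sketched in Python as follows. This is not Urchin filter syntax, only an illustration of the rule as worded above; the function name corrected_query_term and the sample referrer are hypothetical.

```python
import re

# Matches "&q=<value>"; the value runs up to the next "&" or the end of the string.
Q_PARAM = re.compile(r"&q=([^&]*)")


def corrected_query_term(referrer, current_term):
    """Return the &q= value if the referrer contains one, else the current term."""
    match = Q_PARAM.search(referrer)
    if match:
        return match.group(1)
    return current_term


if __name__ == "__main__":
    # Hypothetical referrer combining p, qt and q, for illustration only.
    ref = "http://search.example.co.uk/results?p=_searchbox&qt=b&q=log+analysis"
    print(corrected_query_term(ref, "b"))
```

The value is returned as it appears in the URL (still URL-encoded), and a referrer without &q= keeps whatever term was already detected.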

 

 

 

3. Report all Pay-Per-Click campaigns in one place

 

This is more of a set-up issue with your PPC provider e.g. Google Adwords, Overture, Espotting etc. When creating your PPC campaigns, ensure you use tracking urls e.g.

 

http://www.aptstrategies.com.au/?googleads=term1+term2+...

http://www.aptstrategies.com.au/?overtureuk=term1+term2+...

http://www.aptstrategies.com.au/?espotting=term1+term2+...

 

Where term1, term2 etc., are the words in the phrase you are paying for. The portion that reads 'googleads=' is known as the Page Query Term and is automatically reported on within Urchin under Referrals.
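If you manage many campaigns, the tracking urls can be generated rather than typed. A minimal Python sketch, assuming the same url pattern as above; the function name tracking_url is our own choice for illustration.

```python
from urllib.parse import quote_plus


def tracking_url(base, source, terms):
    """Build a PPC landing url of the form base?source=term1+term2+..."""
    encoded = "+".join(quote_plus(term) for term in terms)
    return f"{base}?{source}={encoded}"


if __name__ == "__main__":
    print(tracking_url("http://www.aptstrategies.com.au/", "googleads",
                       ["web", "analytics"]))
```

Each term is URL-encoded individually, so a term containing characters such as & cannot break the query string.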

 

Using the above technique in your PPC set-up will produce the following Page Query Terms for immediate comparison:

 

 

Then click on the small blue arrow next to each PPC source to view the individual query terms that came from that PPC source:

 

 

 

 

4. How many people add my site to Favourites?

 

In MS Internet Explorer v5 and above, when a visitor adds your site to their favourites, IE requests a small icon file. Tracking this download allows you to monitor this.

 

As Admin, within Reporting, append ico to the 'Downloads Match' field, e.g.

pdf,zip,exe,sh,tar,gz,dmg,pkg,doc,xls,ppt,ico
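To sanity-check the figure Urchin reports, you can count icon requests directly from a logfile. The Python sketch below assumes you have already extracted the request field of each log entry (for example GET /path HTTP/1.0); the function name count_icon_requests is illustrative only.

```python
def count_icon_requests(request_lines):
    """Count requests whose path (ignoring any query string) ends in .ico."""
    total = 0
    for request in request_lines:
        fields = request.split()
        if len(fields) < 2:
            continue  # malformed or empty request field
        path = fields[1].split("?", 1)[0]
        if path.lower().endswith(".ico"):
            total += 1
    return total


if __name__ == "__main__":
    sample = ["GET /favicon.ico HTTP/1.0", "GET /index.html HTTP/1.0", "-"]
    print(count_icon_requests(sample))
```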

 

 
 

5. Filters - some regex tips

 

The following pdf file (1 page) is an excellent overview of using regular expressions when creating filters.

 

RegEx Tips

 

 

6. Can I use Urchin if I don't have access to a client's logfiles?

 

Yes, you can act as an Application Service Provider (ASP) by adding a few lines of javascript code to your clients' web pages. The javascript code simply calls a php program on your own server, and the resulting entries in your own server's logfiles are analysed by Urchin in the usual way.

 

 

 

7. Log file rotation - Howto

 

A "Howto" document for web server log file rotation with compression - Unix and Windows examples.

 

Why Rotate?

Urchin stores aggregated logfile information in its own database, enabling the end user to build 'real-time' visitor reports. With its own 'Log Tracker' keeping track of how far into your server logfiles it has processed, you could say logfile rotation can be ignored. However, this isn't recommended for the following reasons:

  • Disk Space - logfiles in use are uncompressed plain text and can consume large amounts of space. Typically compression will reduce file sizes by 20:1, so it makes sense to do this. However, a web server logfile cannot be compressed while in use (locked), so it must first be rotated out of use.
  • Periodically, you will need to check logfiles for warnings, error messages (scripts not working), Search Engine detection etc, and this is best done using manageable file sizes.
  • Opening, closing and manipulating data for very large file sizes consumes system resources and will therefore slow down your server. It is much more efficient from both a system and application standpoint to manage several smaller logs than one very large log.
  • Smaller files are much easier to back up and restore in the event of system failure. 

Logfile rotation is achieved quite simply on unix machines using crontab and logrotate. We describe a separate method below for Windows.

 

System

This example was developed and tested on:

Unix: RedHat 6.x and 7.1/2/3 using Apache v1.3.9-29.
Windows: NT4/SP6, Windows 2000 SP1.

 

Schematic Unix Example

  • Each night (or any set time period), Urchin runs on the selected log file*
  • Urchin Log Tracker keeps an internal marker as to how far into the file it has analysed.
  • Each month (or any set time period), rotate and compress logfiles
  • Keep rotated logfiles for 12 months (or any number) -  in case you need to re-analyse!

* If you wish to rotate/compress files each and every time Urchin is scheduled to run, you can do this within Urchin's Log Manager. However, this can create a large number of logfiles, especially if you have multiple logfiles/hosts.

 

Our logfile rotation results in the following monthly files being created:

  • httpd_log.1.gz, httpd_log.2.gz, httpd_log.3.gz etc...

In this example, apache is using the 'combined' log format described in httpd.conf e.g.

 

# The following directives define some format nicknames for use
# with a CustomLog directive (see below).
#
LogFormat "%h %v %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Cookie}i\"" combined

<VirtualHost www.adomain.tld>
...
CustomLog /home/httpd/logs_dir/httpd_log combined
</VirtualHost>

Note that the LogFormat directive may be slightly different from what you see in some installations of apache. Ours is recommended as it allows you to simply use the default Urchin format: 'Log Format = auto'.
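When checking a logfile by hand or by script, it helps to split each line into the fields named in the LogFormat directive above. The following Python sketch is one way to do this; the field names, the function name parse_line and the sample line are our own, and the pattern does not handle values that themselves contain escaped double quotes.

```python
import re

# One group per LogFormat item: %h %v %u %t "%r" %>s %b
# "%{Referer}i" "%{User-Agent}i" "%{Cookie}i"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) (?P<vhost>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" "(?P<cookie>[^"]*)"$'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Made-up log entry, for illustration only.
    sample = (
        '192.0.2.1 www.adomain.tld - [01/Oct/2003:00:01:02 +1000] '
        '"GET /index.html HTTP/1.0" 200 1234 '
        '"http://www.google.com/search?q=urchin" '
        '"Mozilla/4.0 (compatible; MSIE 5.5)" "-"'
    )
    print(parse_line(sample))
```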

 

Unix Method

Each minute, the system cron daemon checks which jobs are due to run. Scheduling is set in the /etc/crontab file.

 

SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
HOME=/

# column headings - thanks Toby
# mins, hr, date, month, day, command

# run-parts
# Min Hr Date Month Day Owner Command File
01 * * * * root run-parts /etc/cron.hourly
02 1 * * * root run-parts /etc/cron.daily
50 23 * * 0 root run-parts /etc/cron.weekly
01 00 1 * * root run-parts /etc/cron.monthly

The jobs to be run are the scripts placed in the corresponding directory, for example /etc/cron.monthly. In the above example, the directory /etc/cron.monthly is checked at 00:01 on the first day of every month.
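To make the five time fields concrete, the Python sketch below tests whether a given date and time matches a schedule such as 01 00 1 * *. It is a simplification for illustration, not how cron is implemented: it handles only * and plain numbers (no ranges, lists or steps), and the function names are our own.

```python
from datetime import datetime


def field_matches(spec, value):
    """True if a single crontab field ('*' or a plain number) matches value."""
    return spec == "*" or int(spec) == value


def is_due(schedule, when):
    """Check a five-field schedule (min hr date month day) against a datetime.

    Simplification: every field must match. Real cron treats date and day
    specially when both are restricted, which is not modelled here.
    """
    minute, hour, date, month, day = schedule.split()
    cron_weekday = (when.weekday() + 1) % 7  # crontab counts 0 = Sunday
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(date, when.day)
        and field_matches(month, when.month)
        and field_matches(day, cron_weekday)
    )


if __name__ == "__main__":
    # The cron.monthly entry above: 00:01 on the 1st of the month.
    print(is_due("01 00 1 * *", datetime(2003, 10, 1, 0, 1)))
    # The cron.weekly entry above: 23:50 on a Sunday.
    print(is_due("50 23 * * 0", datetime(2003, 10, 5, 23, 50)))
```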

 

My /etc/cron.monthly directory contains a file logrotate, contents of which are:

 

#!/bin/sh
/usr/sbin/logrotate /etc/logrotate.conf -f

The first line is required and simply informs the operating system to use the system shell to run the next line (command). Line 2 does the rotation, using the program located at /usr/sbin/logrotate and the configuration file /etc/logrotate.conf . My logrotate.conf contains:

 

# system-specific logs may be configured here

#########################################################
# #
# MONTHLY rotations #
# #
#########################################################

# rotate apache log files:
/home/httpd/logs/*_log {
    ifempty
    copytruncate
    rotate 12
    monthly
    compress
}

Note that the first part of this file (up to # system-specific logs may be configured here) contains the default parameters, which are omitted here for clarity. Below this comment, parameters over-ride the defaults.

 

[One caveat of the default parameters is:

 # send errors to root
errors your@emailaddress

This does not work, as /etc/crontab has MAILTO=root, which over-rides any address set in logrotate.conf.]

The part that does the rotating/compressing (you can even e-mail the rotated file) follows the comment:

 

# system-specific logs may be configured here

/home/httpd/logs/*_log {
defines which files are to be rotated. The block must end in a single closing brace '}'.

 

ifempty
defines that rotation will continue even if the file is empty.

 

copytruncate
specifies that a copy of the logfile is created first and the original is then emptied in place (instead of moving it aside and creating a new one). This is required by apache as it cannot be told to close (release) a logfile without stopping the service. By this method the apache server does not have to be restarted.

 

rotate 12
Over-rides the default (4) by keeping 12 previous files.

 

monthly
Over-rides the default (weekly) by performing rotations monthly.

 

compress
Compresses the logfile, usually by as much as 20:1. Auto-compression is probably logrotate's most powerful feature - something Windows struggles with! See below.

Read man logrotate for details concerning what other options may be useful to you. 

[A caveat: the man logrotate page appears to indicate that the order of the commands is unimportant. For instance, in the /var/log/news/* example, nocompress appears after endscript. However, changing this to compress will not work; it must come above postrotate. In other words, nocompress in that position is actually ignored, and only appears to work because it is the default action.]
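To show what the combination of copytruncate, rotate and compress amounts to, here is a rough Python sketch of the same steps. It is an illustration under our own assumptions, not logrotate's actual implementation; the function name rotate and its keep parameter are hypothetical.

```python
import gzip
import os
import shutil


def rotate(log_path, keep=12):
    """Rotate log_path into log_path.1.gz, keeping at most `keep` old files."""
    # Drop the oldest file, then shift .N.gz to .(N+1).gz, highest first.
    oldest = f"{log_path}.{keep}.gz"
    if os.path.exists(oldest):
        os.remove(oldest)
    for n in range(keep - 1, 0, -1):
        src = f"{log_path}.{n}.gz"
        if os.path.exists(src):
            os.replace(src, f"{log_path}.{n + 1}.gz")

    # copytruncate: copy the contents out (compressed), then empty the
    # original in place so the web server keeps writing to the same file.
    with open(log_path, "rb") as src, gzip.open(f"{log_path}.1.gz", "wb") as dst:
        shutil.copyfileobj(src, dst)
    with open(log_path, "r+b") as live:
        live.truncate(0)


if __name__ == "__main__":
    import tempfile

    path = os.path.join(tempfile.mkdtemp(), "httpd_log")
    with open(path, "w") as f:
        f.write("example entry\n")
    rotate(path)
    print(sorted(os.listdir(os.path.dirname(path))))
```

As with copytruncate itself, any lines written between the copy and the truncate are lost, so the approach suits quiet periods such as the early hours of the morning.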

 

Windows Schematic Method

This is very similar to the Unix method above. Setup is much easier than for Unix (simply select a radio button in the IIS control panel). However, there is little flexibility with this, and compression is more complicated to achieve. The differences between the Windows and Unix schematics are:

  • Windows log rotation is system wide - all virtual web sites on the same server must therefore have the same log rotation settings (for unix, this can be controlled on a per web site basis).
  • Windows rotation time periods are set to midnight only, on a daily, weekly or monthly basis (for unix, any time period can be specified).
  • Weekly rotation takes place on Sunday (not Monday), the first day of the week in the US.
  • For compression, additional software is required e.g. winzip, and a separate batch (script) file to run the compression with command line parameters.

Windows Method

The following discusses how you can add compression functionality and assumes you have already selected the required logfile rotation frequency in the IIS control panel and have installed winzip and the winzip commandline add-on on your Windows machine in their respective default directories.

 

Create a file ziplogs.bat with your text editor containing the following:

 

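if exist "c:\Inetpub\weblog\w3svc1\monthly-old.zip" del "c:\Inetpub\weblog\w3svc1\monthly-old.zip"
ren "c:\Inetpub\weblog\w3svc1\monthly.zip" "monthly-old.zip"
"c:\program files\winzip\wzzip" -exomT "c:\Inetpub\weblog\w3svc1\monthly.zip" "c:\weblog\w3svc1\*.log"

The first line deletes any existing monthly-old.zip, because ren will not overwrite an existing file. The second line renames the previous zip file. The third line calls the wzzip program (actually the winzip commandline add-on) with options = exomT, and compresses all logfiles from the default IIS logfile location into monthly.zip.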

 

[Note, *.log is used here because Windows names its log files using a unique timestamp. However, following the initial run of the script, there should in fact only be one logfile available - the last one just rotated.]

Options:

ex = Set the compression level to maximum
o = Change the zip file's file date to the same as the newest file in the Zip file
m = move files into the zip file
T = Include files older than the current date (if no date specified)

The most important option is 'T' (case sensitive). 'T' ensures only the last logfile is included in the compression, not the newly created one. Use the Windows scheduler to run your ziplogs.bat batch file at 01:00 on the day in question i.e. just after the rotation.

 

As you will have noticed, the Windows method only gives you two backups of your logfiles (monthly.zip and monthly-old.zip). A more advanced batch file is required if you wish to keep further (numbered) copies.
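As an alternative to a more advanced batch file, numbered copies could be kept with a short script. The Python sketch below is one possible approach and has not been tested against IIS: it assumes Python is available on the server, uses the file modification time to stand in for the 'T' option, and the names archive_old_logs and monthly.N.zip are our own.

```python
import glob
import os
import zipfile
from datetime import date, datetime, time


def archive_old_logs(log_dir, keep=12, today=None):
    """Move *.log files last modified before `today` into monthly.1.zip."""
    today = today or date.today()
    cutoff = datetime.combine(today, time.min).timestamp()
    old_logs = sorted(
        path for path in glob.glob(os.path.join(log_dir, "*.log"))
        if os.path.getmtime(path) < cutoff
    )
    if not old_logs:
        return []

    def archive_name(n):
        return os.path.join(log_dir, f"monthly.{n}.zip")

    # Drop the oldest archive, then shift monthly.N.zip to monthly.(N+1).zip.
    if os.path.exists(archive_name(keep)):
        os.remove(archive_name(keep))
    for n in range(keep - 1, 0, -1):
        if os.path.exists(archive_name(n)):
            os.replace(archive_name(n), archive_name(n + 1))

    with zipfile.ZipFile(archive_name(1), "w", zipfile.ZIP_DEFLATED) as archive:
        for path in old_logs:
            archive.write(path, os.path.basename(path))
    # Mirror the 'm' (move) option: delete originals once they are archived.
    for path in old_logs:
        os.remove(path)
    return old_logs
```

Run from the Windows scheduler just after the rotation, in the same way as ziplogs.bat, this keeps up to 12 numbered archives, with monthly.1.zip always the most recent.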

 

 

8. Auto email Urchin reports to clients

 

Tested on Linux, this perl script is a customisation of the original Urchin-supplied u5data_extractor.pl script. It is set to run on the 1st of each month, emailing a custom-defined report to you and the client. In this example, the report covers Search Engine visitors.

 

A cron job is set as follows:
/usr/local/urchin/util/u5data_extractor.pl --profile <<client profile>> | mutt -x -s "Last month's SE visitors" -c <<your@email.addr>> <<client@email.addr>>

 

where <<client profile>> is the profile name you set in the Urchin Admin console.

 

Rename the existing u5data_extractor.pl script and install this modified one u5data_extractor.txt (save and rename to .pl). Ensure the permissions are set correctly to execute this:

chmod ugo+x u5data_extractor.pl

The file is fully commented and has two main differences from the original:

  • Start and end time period is set to the 1st and last day of the previous month
  • Table values are summed to give a report total - similar to web interface.
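The two changes are simple to reason about. The sketch below shows the same date arithmetic and summing in Python for illustration; it is not taken from the perl script, and the function names and sample rows are our own.

```python
from datetime import date, timedelta


def previous_month_range(today):
    """Return (first_day, last_day) of the month before `today`."""
    # Step back one day from the 1st of this month to land on the last
    # day of the previous month, then move to that month's 1st.
    last_day = today.replace(day=1) - timedelta(days=1)
    return last_day.replace(day=1), last_day


def report_total(rows):
    """Sum the value column of (label, value) rows to give a report total."""
    return sum(value for _label, value in rows)


if __name__ == "__main__":
    start, end = previous_month_range(date(2003, 11, 1))
    print(start, end)
    # Made-up figures, for illustration only.
    print(report_total([("google", 120), ("yahoo", 45)]))
```

Stepping back from the 1st avoids having to know the length of each month, including February in leap years.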


For more information, please contact:


Contact:
APT Strategies Pty. Ltd.
PO Box 1644
Double Bay, NSW 1360
Telephone: +61 2 8354 1344
Facsimile: +61 2 9360 0385
Email: info@aptstrategies.com.au