Urchin Tips
The following are some useful tips for getting
the most out of your Urchin installation. The Urchin software should be
installed and working on your server and you should have a reasonable
understanding of how web analytics works.
All tips have been tested on our own servers, though we take no responsibility
for our advice.

1. Ensure you are detecting all Search Engines
By default, out-of-the-box detection will not pick up every single
Search Engine, particularly country-specific or product-specific versions.
To correct this, as Admin, go to Profile Settings/Reporting and append any
query parameters applicable to you to the 'Referral Keywords Match' value.
For example, we use the following string (text marked in red is additional
to the Urchin default):
p,q,qs,qt,mt,kw,key,word,text,words,query,search,search_string,general,ask,qry,qkw,search_term,txtsearch_Term,query_contain
* recently changed (Oct 2003) to use q=
** No longer using query_stems
Note that these 'query_stems' can, and do, change as Search Engines update
their technology, so always check the query_stems relevant to you by
analysing a test web site logfile. Alternatively, we can do this
for you!
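When checking a test logfile, it can help to see which parameter in a referrer URL would be picked up as the keyword. The following is a minimal Python sketch, not Urchin's own matching logic: it assumes the parameters are tried in the order listed in our string, and the function name referral_keywords is our own invention for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Parameter names copied from our 'Referral Keywords Match' string.
KEYWORD_PARAMS = (
    "p,q,qs,qt,mt,kw,key,word,text,words,query,search,search_string,general,"
    "ask,qry,qkw,search_term,txtsearch_Term,query_contain"
).split(",")


def referral_keywords(referrer_url):
    """Return (parameter, keywords) for the first listed parameter found, else None.

    Assumption: parameters are tried in list order; Urchin may behave differently.
    """
    params = parse_qs(urlsplit(referrer_url).query)
    for name in KEYWORD_PARAMS:
        if name in params:
            return name, params[name][0]
    return None
```

A referrer whose query string uses a parameter that is not in the list returns None, which is a hint that the parameter should be appended to 'Referral Keywords Match'.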

2. Freeserve Correction
Reported to Urchin [USCT-15041]
Freeserve uses a combination of variables in its query_stem that include
qt, p and q. This confuses 'Referral Keywords Match' (see above). Within
Urchin reports, under Page Query Terms you will see the term 'b' or sometimes
'_searchbox' for searches that have originated from Freeserve.
Apply the following Report Filter to correct for this:

In English, this reads:
'if &q= exists in the query_stem, replace any previous value of Page
Query Terms with this one'.
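The rule stated in English above can be sketched in Python as follows. This is only an illustration of the logic, not the Urchin Report Filter itself; the function name and the sample query_stem in the usage note are hypothetical.

```python
import re
from urllib.parse import unquote_plus

# Matches '&q=' followed by its value, i.e. q appearing after another parameter.
Q_PARAM = re.compile(r"&q=([^&]*)")


def corrected_query_term(query_stem, current_term):
    """If '&q=' exists in the query_stem, its value replaces the current term."""
    match = Q_PARAM.search(query_stem)
    if match:
        return unquote_plus(match.group(1))
    return current_term
```

For a made-up query_stem such as 'p=b&qt=_searchbox&q=web+analytics', the reported term 'b' would be replaced by 'web analytics'; a query_stem without '&q=' leaves the term unchanged.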

3. Report all Pay-Per-Click campaigns in one place
This is more of a set-up issue with your PPC provider, e.g. Google Adwords,
Overture, Espotting etc. When creating your PPC campaigns, ensure you
use tracking URLs, e.g.
http://www.aptstrategies.com.au/?googleads=term1+term2+...
http://www.aptstrategies.com.au/?overtureuk=term1+term2+...
http://www.aptstrategies.com.au/?espotting=term1+term2+...
Where term1, term2 etc., are the words in the phrase you are paying for.
The portion that reads 'googleads=' is known as the Page Query Term and
is automatically reported on within Urchin under Referrals.
Using the above technique in your PPC set-up will produce the following
Page Query Terms for immediate comparison:

Then click on the small blue arrow next to each PPC source to view the
individual query terms that came from that PPC source:
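If you manage many phrases, the tracking URLs can be generated rather than typed by hand. A minimal Python sketch, assuming the same URL pattern as our examples; the function name tracking_url is hypothetical and the source names are simply whatever labels you choose for each PPC provider.

```python
from urllib.parse import quote_plus

BASE_URL = "http://www.aptstrategies.com.au/"


def tracking_url(source, phrase, base_url=BASE_URL):
    """Build a tracking URL of the form <base>?<source>=term1+term2+..."""
    return f"{base_url}?{source}={quote_plus(phrase)}"


# One URL per PPC provider for the same paid phrase.
for source in ("googleads", "overtureuk", "espotting"):
    print(tracking_url(source, "web analytics"))
```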


4. How many people add my site to Favourites
In MS Internet Explorer v5 and above, when a visitor adds your site to
their favourites, IE requests a small icon file (favicon.ico). Tracking this
download allows you to monitor how often this happens.
As Admin, within Reporting, append ico to the 'Downloads
Match' field e.g.
pdf,zip,exe,sh,tar,gz,dmg,pkg,doc,xls,ppt,ico
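To sanity-check the figure Urchin reports, the icon requests can also be counted directly from a logfile. A rough Python sketch, assuming the icon is served as /favicon.ico from the site root; the function name is hypothetical, and it counts every request for the icon, whatever triggered it.

```python
import re

# A GET request for /favicon.ico, optionally followed by a query string.
FAVICON_REQUEST = re.compile(r'"GET /favicon\.ico[ ?]')


def count_favicon_requests(log_lines):
    """Count logfile lines that record a request for /favicon.ico."""
    return sum(1 for line in log_lines if FAVICON_REQUEST.search(line))
```

It accepts any iterable of lines, so an open logfile can be passed in directly: count_favicon_requests(open("httpd_log")).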

5. Filters - some regex tips
The following pdf file (1 page) is an excellent overview of using
regular expressions when creating filters.
RegEx Tips

6. Can I use Urchin if I don't have access to a client's logfiles?
Yes, you can act as an Application Service Provider (ASP) by adding
a few lines of JavaScript code to your clients' web pages. The JavaScript
code simply calls a PHP program on your own server, and Urchin then
analyses the resulting data in the usual way.

7. Log file rotation - Howto
A "Howto" document for web
server log file rotation with compression - Unix and Windows examples.
Why Rotate?
Urchin stores aggregated logfile information in its own database,
enabling the end user to build 'real-time' visitor reports. With
its own 'Log Tracker' keeping track of how far into your server logfiles
it has processed, you could say logfile rotation can be ignored.
However, this isn't recommended for the following reasons:
- Disk Space - logfiles in use are uncompressed plain text and
can consume large amounts of space. Typically compression will
reduce file sizes by 20:1, so it makes sense to do this. However,
a web server logfile cannot be compressed while in use (locked),
so it must first be rotated out of use.
- Periodically, you will need to check logfiles for warnings,
error messages (scripts not working), Search Engine detection
etc, and this is best done using manageable file sizes.
- Opening, closing and manipulating data for very large file sizes
consumes system resources and will therefore slow down your server.
It is much more efficient from both a system and application standpoint
to manage several smaller logs than one very large log.
- Smaller files are much easier to back up and restore in the
event of system failure.
Logfile rotation is achieved quite simply on unix machines using
crontab and logrotate. We describe a separate method below for Windows.
System
This example was developed and tested on:
Unix: RedHat 6.x and 7.1/2/3 using Apache v1.3.9-29.
Windows: NT4/SP6, Windows 2000 SP1.
Schematic Unix Example
- Each night (or any set time period), Urchin runs on the selected
log file*
- Urchin Log Tracker keeps an internal marker as to how far into
the file it has analysed.
- Each month (or any set time period), rotate and compress logfiles
- Keep rotated logfiles for 12 months (or any number) -
in case you need to re-analyse!
* If you wish to rotate/compress files each and every time Urchin
is scheduled to run, you can do this within Urchin's Log Manager.
However, this can create a large number of logfiles, especially
if you have multiple logfiles/hosts.
Our logfile rotation results in the following monthly files being
created:
- httpd_log.1.gz, httpd_log.2.gz, httpd_log.3.gz etc...
In this example, apache is using the 'combined' log format described
in httpd.conf e.g.
# The following directives define some format nicknames for use
# with a CustomLog directive (see below).
#
LogFormat "%h %v %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Cookie}i\"" combined
<VirtualHost www.adomain.tld>
...
CustomLog /home/httpd/logs_dir/httpd_log combined
</VirtualHost>
Note the LogFormat directive may be slightly different from what you see
in some installations of apache. Ours is recommended as it allows you to
simply use the default Urchin format: 'Log Format = auto'.
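When checking logfiles by hand, it can be useful to split a line into the fields named in the LogFormat directive. The following Python sketch is one way to do this; the regular expression and field names are our own, the sample line in the usage note is made up, and quoted fields containing escaped double quotes are not handled.

```python
import re

# One group per LogFormat field:
# %h %v %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" "%{Cookie}i"
LOG_LINE = re.compile(
    r'(?P<host>\S+) (?P<vhost>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" "(?P<cookie>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None
```

For a line such as '192.0.2.10 www.adomain.tld - [01/Oct/2003:00:01:02 +1000] "GET /index.html HTTP/1.0" 200 5120 "http://www.google.com/search?q=urchin" "Mozilla/4.0 (compatible; MSIE 6.0)" "-"', the 'referer' field holds the search engine URL whose query_stem you want to inspect.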
Unix Method
Each minute, the system crontab checks what jobs require scheduling.
Scheduling is set in the /etc/crontab file.
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
HOME=/
# column headings - thanks Toby
# mins, hr, date, month, day, command
# run-parts
# Min Hr Date Month Day Owner Command File
01 * * * * root run-parts /etc/cron.hourly
02 1 * * * root run-parts /etc/cron.daily
50 23 * * 0 root run-parts /etc/cron.weekly
01 00 1 * * root run-parts /etc/cron.monthly
The jobs to be run at each interval are held in the corresponding
directory, for example /etc/cron.monthly.
In the above example, the directory /etc/cron.monthly is checked
at 00:01 on the first day of every month.
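To make the five schedule columns concrete, the following Python sketch tests whether a schedule from the crontab above fires at a given time. It is a deliberate simplification and an assumption on our part, not how cron is implemented: it only understands plain numbers and '*', which is all the entries above use.

```python
from datetime import datetime


def cron_matches(schedule, when):
    """Check a 'min hr date month day' schedule against a datetime.

    Only plain numbers and '*' are supported (no ranges, lists or steps).
    """
    fields = schedule.split()
    # cron numbers weekdays with Sunday as 0; isoweekday() gives Sunday as 7.
    actual = (when.minute, when.hour, when.day, when.month, when.isoweekday() % 7)
    return all(
        field == "*" or int(field) == value
        for field, value in zip(fields, actual)
    )


# The cron.monthly entry fires at 00:01 on the 1st of the month.
print(cron_matches("01 00 1 * *", datetime(2003, 10, 1, 0, 1)))
```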
My /etc/cron.monthly directory contains a file logrotate,
contents of which are:
#!/bin/sh
/usr/sbin/logrotate /etc/logrotate.conf -f
The first line is required and simply informs the operating system
to use the system shell to run the next line (command). Line 2 does
the rotation, using the program located at /usr/sbin/logrotate
and the configuration file /etc/logrotate.conf; the -f flag forces
the rotation. My logrotate.conf contains:
# system-specific logs may be configured here
#########################################################
# #
# MONTHLY rotations #
# #
#########################################################
# rotate apache log files:
/home/httpd/logs/*_log {
ifempty
copytruncate
rotate 12
monthly
compress
}
Note that the first part of this file (up to the comment
'# system-specific logs may be configured here') contains default
parameters, which are omitted here for clarity. Below this comment,
parameters override the defaults.
[One caveat of the default parameters is:
# send errors to root
errors your@emailaddress
This does not work, as /etc/crontab has MAILTO=root,
which overrides any address set in logrotate.conf.]
The part that does the rotating/compressing (you can even
e-mail the rotated file) follows the comment:
# system-specific logs may be configured here
/home/httpd/logs/*_log {
defines which files are to be rotated. The block that follows must end
in a single closing brace '}'.
ifempty
defines that rotation will continue even if the file is
empty.
copytruncate
defines that a copy of the logfile is created first and the original is
then emptied in place (instead of moving it aside and creating a new one).
This is required because apache cannot be told to close (release) a
logfile without stopping the service. By this method the apache server
does not have to be restarted.
rotate 12
Overrides the default (4) by keeping 12 previous files.
monthly
Overrides the default (weekly) by performing rotations
monthly.
compress
Compresses the rotated logfile, usually by as much as 20:1. Auto-compression
is probably logrotate's most powerful feature - something Windows
struggles with! See below.
Read man logrotate
for details concerning what other options may be useful
to you.
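To show what the copytruncate, rotate and compress options amount to, here is a minimal Python sketch of one rotation. It is an illustration of the idea only, not logrotate's implementation; the function name is hypothetical, and the numbered .gz naming follows the httpd_log.1.gz pattern shown earlier.

```python
import gzip
import os
import shutil


def rotate_copytruncate(log_path, keep=12):
    """Perform one rotation of log_path; returns the newest archive's path."""
    # rotate N: shift older archives up by one; the oldest is overwritten.
    for number in range(keep - 1, 0, -1):
        older = f"{log_path}.{number}.gz"
        if os.path.exists(older):
            os.replace(older, f"{log_path}.{number + 1}.gz")
    # compress + copy: write a gzipped copy of the live log as archive 1.
    newest = f"{log_path}.1.gz"
    with open(log_path, "rb") as live, gzip.open(newest, "wb") as archive:
        shutil.copyfileobj(live, archive)
    # truncate: empty the live file in place so the web server can keep
    # writing to the same open file without a restart.
    with open(log_path, "r+b") as live:
        live.truncate(0)
    return newest
```

As with logrotate's own copytruncate, any lines written between the copy and the truncate steps can be lost, so this approach trades a small risk of missing entries for not having to restart the server.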
[A caveat: the man logrotate page appears to indicate that
the order of the commands is unimportant. For instance, in
its /var/log/news/* example, nocompress appears after endscript.
However, changing this to compress will not work; it must come
above postrotate. In other words, nocompress in that position is
actually ignored, and no compression simply occurs as the default
action.]
Schematic Windows Example
This is very similar to the Unix method above. Setup is much easier
than for unix (simply select a radio button in the IIS control panel).
However, there is little flexibility with this and compression is
more complicated to achieve. The differences between the Windows and
unix schematics are:
- Windows log rotation is system wide - all virtual web sites
on the same server must therefore have the same log rotation settings
(for unix, this can be controlled on a per web site basis).
- Windows rotation time periods are set to midnight only, on a
daily, weekly or monthly basis (for unix, any time period can
be specified).
- Weekly rotation takes place on Sunday (not Monday), the first
day of the week in the US.
- For compression, additional software is required e.g. winzip,
and a separate batch (script) file to run the compression with
command line parameters.
Windows Method
The following discusses how you can add compression functionality
and assumes you have already selected the required logfile rotation
frequency in the IIS control panel and have installed winzip
and the winzip commandline
add-on on your Windows machine in their respective default directories.
Create a file ziplogs.bat with your text editor containing the
following:
ren "c:\Inetpub\weblog\w3svc1\monthly.zip" "monthly-old.zip"
"c:\program files\winzip\wzzip" -exomT "c:\Inetpub\weblog\w3svc1\monthly.zip" "c:\weblog\w3svc1\*.log"
The first line renames the previous zip file. The second line calls
the wzzip program (actually the winzip commandline add-on) with
options = exomT, and compresses all logfiles from the default IIS
logfile location into monthly.zip.
[Note, *.log is used here because Windows names its log files
using a unique timestamp. However, following the initial run of
the script, there should in fact only be one logfile available
- the last one just rotated.]
Options:
ex = Set the compression level to maximum
o = Change the zip file's file date to the same as the newest
file in the Zip file
m = move files into the zip file
T = Include files older than the current date (if no date specified)
The most important option is 'T' (case sensitive). 'T' ensures
only the last logfile is included in the compression, not the newly
created one. Use the Windows scheduler to run your ziplogs.bat batch
file at 01:00 on the day in question i.e. just after the rotation.
As you will have noticed, the Windows method only gives you two
backups of your logfiles (monthly.zip and monthly-old.zip). A
more advanced batch file is required if you wish to keep further
(numbered) copies.
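As an alternative to a more advanced batch file, numbered copies could be kept with a short script. The following Python sketch is one possible approach under our own assumptions, not a tested replacement for ziplogs.bat: the monthly.1.zip, monthly.2.zip naming and the function name are hypothetical, and the date test imitates the 'T' and 'm' options described above.

```python
import os
import zipfile
from datetime import date, datetime


def zip_old_logs(log_dir, keep=12, today=None):
    """Move *.log files last modified before today into monthly.1.zip."""
    today = today or date.today()
    base = os.path.join(log_dir, "monthly")
    # Shift monthly.1.zip -> monthly.2.zip and so on, keeping 'keep' copies.
    for number in range(keep - 1, 0, -1):
        older = f"{base}.{number}.zip"
        if os.path.exists(older):
            os.replace(older, f"{base}.{number + 1}.zip")
    newest = f"{base}.1.zip"
    with zipfile.ZipFile(newest, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(os.listdir(log_dir)):
            if not name.endswith(".log"):
                continue
            path = os.path.join(log_dir, name)
            modified = datetime.fromtimestamp(os.path.getmtime(path)).date()
            if modified < today:  # like 'T': skip the newly created log
                archive.write(path, arcname=name)
                os.remove(path)  # like 'm': move rather than copy
    return newest
```

It would be scheduled in the same way as the batch file, i.e. shortly after the rotation has taken place.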

8. Auto email Urchin reports to clients
Tested on Linux, this perl script is a customisation of the original
Urchin-supplied u5data_extractor.pl script. It is set to run on
the 1st of each month, emailing a custom-defined report to you and
the client. In this example, the report covers Search Engine visitors.
A cron job is set as follows:
/usr/local/urchin/util/u5data_extractor.pl --profile <<client profile>> | mutt -x -s "Last month's SE visitors" -c <<your@email.addr>> <<client@email.addr>>
where <<client profile>> is the profile name you set in the
Urchin Admin console.
Rename the existing u5data_extractor.pl script and install this
modified one u5data_extractor.txt
(save and rename to .pl). Ensure the permissions are set correctly
to execute this:
chmod ugo+x u5data_extractor.pl
The file is fully commented and has two main differences from the
original:
- Start and end time period is set to the 1st and last day of
the previous month
- Table values are summed to give a report total - similar to
web interface.
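The first difference, working out the previous month's start and end dates, is shown here as a small Python sketch for illustration only. The modified script itself is written in perl; this is not taken from it, and the function name is our own.

```python
from datetime import date, timedelta


def previous_month_range(today):
    """Return (first day, last day) of the month before 'today'."""
    first_of_this_month = today.replace(day=1)
    # The day before the 1st of this month is the last day of last month.
    last_of_previous = first_of_this_month - timedelta(days=1)
    return last_of_previous.replace(day=1), last_of_previous


# Run on the 1st of the month, this gives the reporting period to extract.
start, end = previous_month_range(date.today())
print(start.isoformat(), end.isoformat())
```

Stepping back one day from the 1st avoids any table of month lengths and copes with leap years and the December to January boundary.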

For more information, please contact:
APT Strategies Pty. Ltd.
PO Box 1644
Double Bay, NSW 1360
Telephone: +61 2 8354 1344 Facsimile: +61 2 9360 0385
Email: info@aptstrategies.com.au