Urchin requires a web server, and comes with a CGI script and a mod_perl handler for Apache. The CGI script should work with any CGI-enabled web server, while the mod_perl handler requires Apache as the web server. Urchin uses MySQL for its database functionality. The installation instructions for these various system software components can be found at the relevant sites.

For the Apache server, see

http://httpd.apache.org/

Note that for mod_perl functionality Urchin requires Apache 2.0. This version of Urchin has been tested with Apache 2.0.40.

For mod_perl, see

http://perl.apache.org/

Note that for mod_perl functionality Urchin uses Apache 2.0 and therefore requires mod_perl 2.0. Because mod_perl 2.0 is still in development, the version to use as of May 21, 2004 is 1.99_14 or higher.

Finally for MySQL, see

http://www.mysql.com/

Note that the MySQL version must be 4.0.13 or higher.
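
The version minimums above can be checked in the shell before going further. The following is only a sketch: version_ge is a hypothetical helper name, it relies on the -V (version sort) option of GNU sort, and the "installed" value is a made-up example that you would replace with the version reported by your own server.

```shell
#!/bin/sh
# version_ge A B: succeed if version A is greater than or equal to version B.
# Hypothetical helper; relies on GNU sort's -V (version sort) option.
version_ge() {
    # The smaller of the two versions sorts first; A >= B when that one is B.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example: check an installed MySQL version against the 4.0.13 minimum.
installed="4.0.20"   # assumption: replace with the version your server reports
if version_ge "$installed" "4.0.13"; then
    echo "MySQL $installed is new enough"
else
    echo "MySQL $installed is too old; 4.0.13 or higher is required"
fi
```

The same helper can be used for the Apache and mod_perl minimums by passing the relevant version strings.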

0. Untar / CVS checkout

To install from tarball:

To install from CVS:

Note that there are currently (August 20, 2004) some problems with Sourceforge CVS access. During the outage, you can get the latest development code snapshot by downloading Urchin-dev-20040820.tar.gz from the project download page.

Then:

1. Install CPAN modules

Urchin requires these CPAN modules:

MODULE                    VER TESTED    STOCK IN PERL VER
------------------------------------------------------------
Apache::compat            -
Apache::Const             0.01
Apache::Emulator          0.04
Apache2                   -
Carp                      1.01          5.00307, 5.007003
CGI                       2.81          5.004, 5.008
DBI                       1.30 
Data::Dumper              2.12          5.005, 5.007003
Encode                    1.83          5.007003, 5.008001
ExtUtils::MakeMaker       6.03          5.00307, 5.008
File::Path                1.05          5.00307, 5.007003
FindBin                   1.43          5.00307, 5.007003
HTML::Entities            1.23
HTML::LinkExtractor       0.11
HTML::Sanitizer           0.04
HTML::Template            2.6
HTTP::Request             1.30
HTTP::Response            1.41
HTTP::Status              1.26
LWP::RobotUA              1.18
LWP::UserAgent            2.001
POSIX                     1.05          5.00307, 5.007003
Parse::RecDescent         1.94
RDF::Core                 0.30
Set::Array                0.11
Sys::Hostname::Long       1.2
Text::CSV                 0.01
Time::ParseDate           2003.1126
Time::Stopwatch           1.00
URI                       1.21
XML::DOM                  1.27
XML::RSS                  1.02
XML::RSS::Tools           0.13
XML::XPath                1.13
XML::XSLT                 0.45

The second column shows the version tested with Urchin; much earlier versions are unlikely to work. For modules that ship with Perl, the third column lists two Perl versions: the first release that included the module at all, and the first release that included the version shown in the second column (data from Module::CoreList 1.96). If you have a recent version of Perl such as 5.8.0, you can skip installation of the modules that have a Perl version listed.

You can install Perl CPAN modules with the CPAN shell, e.g.:

perl -MCPAN -e shell
cpan> install RDF::Core
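
Before installing anything, it can help to see which of the modules in the table are already present. This is a minimal sketch, assuming perl is on your PATH; have_module is a hypothetical helper name and the module list is only a short sample from the table. Modules that are meant to load inside a running mod_perl server (the Apache:: ones) may be reported as missing even when they are installed.

```shell
#!/bin/sh
# have_module NAME: succeed if Perl can load the named module.
# Hypothetical helper; it only runs "perl -MNAME -e 1".
have_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Report which modules from a list are missing.
# The list here is a short sample; extend it with the rest of the table.
for module in Carp DBI HTML::Template RDF::Core XML::RSS; do
    if have_module "$module"; then
        echo "ok       $module"
    else
        echo "MISSING  $module"
    fi
done
```

Anything reported as MISSING can then be installed from the CPAN shell as shown above.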

If you prefer system packages, Red Hat users can check http://rpmpan.sourceforge.net/ and Debian users can check their apt repository or man dh-make-perl.

XML::RSS::Tools needs XML::LibXML and XML::LibXSLT. These in turn require the system libraries they are built on (libxml2 and libxslt), which you must install first.

2. Database setup

***Your MySQL version must support InnoDB - use version 4.0.13 or higher***

To create a database, run the setup script. Running setup without parameters will provide help text:

cd db/scripts/mysql

./setup

An example of a working setup command might be:

./setup 'mysql -u root -p' create urchin urchin rss

3. System setup

Run the following commands as root to set up a group and directories for Urchin to write into:

groupadd urchin

# add apache user to urchin group; username may be www-data instead

gpasswd -a apache urchin

# add yourself to urchin group

gpasswd -a myuser urchin

mkdir /var/cache/urchin

chgrp urchin /var/cache/urchin

chmod 2775 /var/cache/urchin

mkdir /var/log/urchin

chgrp urchin /var/log/urchin

chmod 2775 /var/log/urchin
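
The directory commands above can be wrapped in a small function so that the step is safe to repeat. This is a sketch only; make_shared_dir is a hypothetical name, and it does nothing beyond the mkdir, chgrp and chmod commands already listed (with mkdir -p so that an existing directory is not an error).

```shell
#!/bin/sh
# make_shared_dir DIR GROUP: create DIR, hand it to GROUP, and set mode 2775
# (group-writable, with the setgid bit so new files inherit the group).
# Hypothetical helper; the commands are the same ones listed above.
make_shared_dir() {
    mkdir -p "$1" &&
    chgrp "$2" "$1" &&
    chmod 2775 "$1"
}

# Run as root once the urchin group exists:
#   make_shared_dir /var/cache/urchin urchin
#   make_shared_dir /var/log/urchin urchin
```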

4. Configuration

Copy the stock configuration file to the default location:

cp config /etc/urchin.conf

Edit /etc/urchin.conf to:

  1. use the correct database name, username and password for the urchin database:

    dbi.source = mysql:urchin
    dbi.username = user
    dbi.password = password

  2. specify the directories where Urchin should cache downloaded data and write its logs (for example, the ones created in step 3):

    cache.path = /path/to/urchin/cache
    log.path = /var/log/urchin
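
After editing the file, it may be worth confirming that the configured directories exist and are writable. The sketch below assumes the file holds one "key = value" setting per line, as in the lines shown above; conf_get and check_conf_dirs are hypothetical helper names and are not part of Urchin.

```shell
#!/bin/sh
# conf_get FILE KEY: print the value of the "KEY = value" line in FILE.
# Hypothetical helper; assumes one setting per line.
conf_get() {
    [ -r "$1" ] || return 1
    awk -v key="$2" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)          # drop leading indentation
            eq = index(line, "=")
            if (eq == 0) next                 # not a setting line
            name = substr(line, 1, eq - 1)
            sub(/[ \t]+$/, "", name)
            if (name == key) {
                value = substr(line, eq + 1)
                sub(/^[ \t]+/, "", value)
                sub(/[ \t]+$/, "", value)
                print value
                exit
            }
        }' "$1"
}

# check_conf_dirs FILE: report whether cache.path and log.path in FILE
# point at existing, writable directories. Fails if either does not.
check_conf_dirs() {
    status=0
    for key in cache.path log.path; do
        dir=$(conf_get "$1" "$key") || dir=""
        if [ -n "$dir" ] && [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "$key ok: $dir"
        else
            echo "$key problem: '$dir' is missing or not writable"
            status=1
        fi
    done
    return $status
}
```

For example, run check_conf_dirs /etc/urchin.conf as a user in the urchin group.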

5. CGI setup

Copy the CGI script into the appropriate directory:

cp urchin.cgi /var/www/cgi-bin

chmod a+x /var/www/cgi-bin/urchin.cgi

6. mod_perl setup (if used)

You should have the following items in your Apache configuration:

If the Urchin libraries are installed in a non-standard location, Perl must be told where to find them. To do this, place the following directive in your Apache configuration, outside all VirtualHost blocks. If your distribution has an /etc/httpd/conf.d/perl.conf, put the directive (with the correct directory name) near the bottom of that file; otherwise add it to /etc/httpd/conf/httpd.conf.

  PerlSwitches -I/var/www/cgi-bin/urchin_lib

All Urchin mod_perl instances will need the following; it can go inside or outside a VirtualHost:

  PerlModule Apache::Urchin
  <Location /urchinsearch>
    SetHandler perl-script
    PerlHandler Apache::Urchin
  </Location>
  

The PerlModule line is not strictly required, but it is recommended.

You may have to say PerlHandler Apache::Urchin::handler on some systems if Apache gives you an error about not being able to find or initialize the handler.

And this can be added for some web administration:

  <Location /urchinadmin>
    SetHandler perl-script
    PerlHandler Apache::Urchin
    PerlSetVar UrchinAdmin On
    PerlAuthenHandler Apache::Urchin::authen_handler
    AuthName "Urchin Administrative Commands"
    AuthType basic
    require valid-user
  </Location>

If you intend to lock the public out of even the non-admin section and require every user to supply a valid username and password, use a Location block for /urchinsearch that looks more like the /urchinadmin example: copy the PerlAuthenHandler, AuthName, AuthType and require lines over. Then set web.public_access = NO in /etc/urchin.conf.
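
As a sketch, a locked-down /urchinsearch block assembled from the two examples above might look like the following. Every directive is taken from those examples; the AuthName string is an arbitrary realm label chosen for illustration.

```apache
  # Search handler with the authentication lines copied from /urchinadmin.
  <Location /urchinsearch>
    SetHandler perl-script
    PerlHandler Apache::Urchin
    PerlAuthenHandler Apache::Urchin::authen_handler
    # The realm name below is only an example.
    AuthName "Urchin"
    AuthType basic
    require valid-user
  </Location>
```

Remember that this only takes full effect together with web.public_access = NO in /etc/urchin.conf.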

7. Import RSS feeds

Prepare a file of RSS feed URLs to import - for an example see feeds.txt

Add to the urchin database:

$ perl urchinadm add < feeds.txt

Alternatively, specify URLs to import on the command line:

$ perl urchinadm add http://www.nature.com/news/rss.rdf

$ perl urchinadm add http://slashdot.org/slashdot.rss

The urchinadm command offers other convenient administrative commands; run it without any parameters to see help text. You may find it useful to create a symlink to urchinadm somewhere in your PATH.
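
A feed list can be sanity-checked before it is imported. This is a minimal sketch under the assumption that the file holds one URL per line, with blank lines allowed; check_feeds is a hypothetical helper name, and it only checks the shape of each line, not whether the feed is reachable.

```shell
#!/bin/sh
# check_feeds FILE: print every line of FILE that is not an http(s) URL.
# Succeeds only if no such line is found. Hypothetical helper; assumes
# one URL per line, with blank lines allowed.
check_feeds() {
    [ -r "$1" ] || return 2
    # grep -v prints the offending lines and exits 0 when it finds any,
    # so the result is inverted to mean "all lines look fine".
    ! grep -v -E '^(https?://[^[:space:]]+)?$' "$1"
}
```

It could be combined with the import command, for example: check_feeds feeds.txt && perl urchinadm add < feeds.txt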

8. Set up cron job for database refresh

The database refresh shell script, urchin_refresh.cron, mails error reports to a user called 'urchinadm'. Edit /etc/aliases to set up an alias or aliases for this address:

  urchinadm:      john, paul, george, ringo

On most sendmail-compatible mail systems you need to run newaliases after editing /etc/aliases for the change to take effect.

Be nice in how often you run the refresh script. More than once an hour is too often – and for most sites, even that will be unnecessary.
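
To make the advice above concrete, here is a sketch that builds a crontab entry for a chosen refresh interval and refuses anything more frequent than hourly. cron_line is a hypothetical helper name, and the script path is a placeholder, since the location of urchin_refresh.cron depends on where you unpacked Urchin.

```shell
#!/bin/sh
# cron_line HOURS COMMAND: print a crontab entry that runs COMMAND on the
# hour, every HOURS hours within each day. Hypothetical helper.
cron_line() {
    case "$1" in
        ''|*[!0-9]*)
            echo "interval must be a whole number of hours" >&2
            return 1 ;;
    esac
    if [ "$1" -lt 1 ] || [ "$1" -gt 23 ]; then
        echo "interval must be between 1 and 23 hours" >&2
        return 1
    fi
    if [ "$1" -eq 1 ]; then
        printf '0 * * * * %s\n' "$2"
    else
        printf '0 */%s * * * %s\n' "$1" "$2"
    fi
}

# The script path below is a placeholder, not a real install location.
cron_line 6 /path/to/urchin/urchin_refresh.cron
```

The printed line can be pasted into the crontab of a user in the urchin group (crontab -e).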