[Lsb-infrastructure] Navigator and web performance

Denis Silakov silakov at ispras.ru
Mon Apr 28 00:49:34 PDT 2008


David Ames wrote:
> Earlier today we were hit by several web crawlers at the same time: Google, Yahoo, and a
> barrage from MSN. When they crawled the navigator/dbadmin tool, our database server CPU and
> load went through the roof, slowing down all our sites.
>
> As I mentioned at the collaboration summit, I want to move the LSB database to a dedicated
> server, which should improve performance for all involved. I plan to work on this in the
> middle of next week. Is there a particularly good or bad time to perform this? I imagine
> we are talking about very little downtime, less than 5 minutes.
>   

No limitations from our side here.

> In the meantime I want to get our robots.txt file up to date. Today I set it to disallow
> /dbadmin and /navigator. Can the ISPRAS guys suggest an appropriate robots.txt setup for
> dbadmin? What should and should not be indexed?

First, I think it is enough to index only the /navigator folder; /dbadmin
itself does not need to be indexed at all.

As for particular files and directories that should not be indexed and
that can really cause performance problems when crawled, the following
should definitely be excluded (a complete robots.txt sketch follows the
list):

Disallow: /navigator/admin/
Disallow: /navigator/admin_community/
Disallow: /navigator/consistency/
Disallow: /navigator/coverage/
Disallow: /navigator/browse/app_stats.php
Disallow: /navigator/browse/component.php
Disallow: /navigator/browse/queries.php
Disallow: /navigator/browse/rawclass.php
Disallow: /navigator/browse/rawcmd.php
Disallow: /navigator/browse/rawilmodule.php
Disallow: /navigator/browse/rawint.php
Disallow: /navigator/browse/rawlib.php
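
Putting these together with the /dbadmin entry you already added, the
whole robots.txt might look roughly like this (the wildcard User-agent
line is just an assumption, adjust it to whatever you have now):

User-agent: *
Disallow: /dbadmin
# ...plus all of the /navigator Disallow lines listed above

The blanket "Disallow: /navigator" entry would then be dropped, so that
the rest of the Navigator pages remain indexable.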

These are mainly pages for the community tables, which have lots of
cross-links, plus some additional pages such as auxiliary queries. I guess
the admin* folders are already 'closed' to robots through .htaccess, but
I've added them here for completeness.
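
For reference, 'closing' a directory that way usually amounts to requiring
authentication in the directory's .htaccess, roughly like the sketch below
(the .htpasswd path is only a placeholder; the actual rules on the server
may of course differ):

AuthType Basic
AuthName "LSB Navigator admin"
# placeholder path; the real location depends on the server setup
AuthUserFile /path/to/.htpasswd
Require valid-user

Crawlers then get a 401 for anything under the protected directory, so
they never reach the database-heavy admin pages.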

-- 
Regards,
Denis.


