[Lsb-infrastructure] Navigator and web performance
Denis Silakov
silakov at ispras.ru
Mon Apr 28 00:49:34 PDT 2008
David Ames wrote:
> Earlier today we were hit by several web crawlers at the same time: google, yahoo and a
> barrage from MSN. When they crawled the navigator/dbadmin tool our database server CPU and
> load went through the roof slowing down all our sites.
>
> As I mentioned at the collaboration summit, I want to move the LSB database to a dedicated
> server which should optimize performance for all involved. I plan to work on this in the
> middle of next week. Is there a particularly good or bad time to perform this? I imagine
> we are talking about very little downtime. Less than 5 minutes.
>
No limitations from our side here.
> In the meantime I want to get our robots.txt file up to date. Today I set it to disallow
> /dbadmin and /navigator. Can the ISPRAS guys suggest an appropriate robots.txt setup for
> dbadmin? What should and should not be indexed?
First, I think it is enough to index the /navigator folder only.
As for the particular files and directories that should not be indexed --
the ones that can cause real performance problems under crawler load --
the following should definitely be excluded:
Disallow: /navigator/admin/
Disallow: /navigator/admin_community/
Disallow: /navigator/consistency/
Disallow: /navigator/coverage/
Disallow: /navigator/browse/app_stats.php
Disallow: /navigator/browse/component.php
Disallow: /navigator/browse/queries.php
Disallow: /navigator/browse/rawclass.php
Disallow: /navigator/browse/rawcmd.php
Disallow: /navigator/browse/rawilmodule.php
Disallow: /navigator/browse/rawint.php
Disallow: /navigator/browse/rawlib.php
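
Putting the pieces together, a complete robots.txt might look like this
(a sketch combining the exclusions above with the /dbadmin block you
already added; the Crawl-delay line is an optional extra -- it is honored
by the Yahoo and MSN crawlers but ignored by Google):

```
User-agent: *
# Suggested throttle; non-standard, respected by Yahoo/MSN only.
Crawl-delay: 10
Disallow: /dbadmin/
Disallow: /navigator/admin/
Disallow: /navigator/admin_community/
Disallow: /navigator/consistency/
Disallow: /navigator/coverage/
Disallow: /navigator/browse/app_stats.php
Disallow: /navigator/browse/component.php
Disallow: /navigator/browse/queries.php
Disallow: /navigator/browse/rawclass.php
Disallow: /navigator/browse/rawcmd.php
Disallow: /navigator/browse/rawilmodule.php
Disallow: /navigator/browse/rawint.php
Disallow: /navigator/browse/rawlib.php
```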
These are mainly pages concerning community tables, which have lots of
cross-links, plus some auxiliary pages such as query listings. I guess
the admin* folders are already 'closed' to robots through .htaccess, but
I've added them here for completeness.
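
As a quick sanity check before deploying, the rules can be verified with
Python's standard urllib.robotparser module (a sketch; the abbreviated
rule list below just mirrors a few of the Disallow lines above):

```python
from urllib import robotparser

# A subset of the suggested robots.txt rules, as a list of lines.
rules = """\
User-agent: *
Disallow: /dbadmin/
Disallow: /navigator/admin/
Disallow: /navigator/browse/rawlib.php
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Excluded pages must not be fetchable by any crawler...
print(rp.can_fetch("*", "/navigator/admin/"))             # False
print(rp.can_fetch("*", "/navigator/browse/rawlib.php"))  # False
# ...while ordinary navigator pages stay indexable.
print(rp.can_fetch("*", "/navigator/browse/"))            # True
```

This catches typos such as a missing leading slash, which would silently
leave a page open to crawlers.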
--
Regards,
Denis.