Administrator Guide: Domain Filters

Following domain filters control where the spider crawls. This feature is useful when you want to prevent the spider from accessing specific areas of your Web site. Indexing domain filters control what the spider indexes.

Both following domain filters and indexing domain filters use regular expressions (RegEx) for pattern matching. A special sign at the start of each filter specifies whether the pattern must be matched, must not be matched, or whether at least one of a group of patterns must be matched.

These special signs are:

  1. Minus sign: The minus sign instructs the spider to ignore items that match the filter.
  2. Plus sign: The plus sign instructs the spider to only allow items that match the filter.
  3. Question mark: The question mark instructs the spider to only allow items that match at least one of the filters that begin with a question mark.

Examples:

  1. If we wanted to instruct the spider to ignore the private area of our Web site, we would add the following filter:
     
    -http://www.mydomain.com/privatearea/.*
     
  2. If we wanted to instruct the spider to only allow the support area of our Web site, we would add the following filter:
     
    +http://www.mydomain.com/support/.*
     
  3. If we wanted to instruct the spider to only allow our support area and our knowledge base area, we would add the following two filters:
     
    ?http://www.mydomain.com/support/.*
    ?http://www.mydomain.com/knowledgebase/.*
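
The way the three signs combine can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the spider's actual implementation: the function name is_allowed is hypothetical, the filter is assumed to be compared against the whole URL, and Python's re module is assumed to behave like the product's RegEx engine for these simple patterns.

```python
import re


def is_allowed(url, filters):
    """Return True if url passes every filter (assumed semantics, see above)."""
    any_of_results = []  # one entry per filter that begins with '?'
    for f in filters:
        sign, pattern = f[0], f[1:]
        # Assumption: the pattern must match the entire URL.
        matched = re.fullmatch(pattern, url) is not None
        if sign == "-" and matched:
            return False  # minus: ignore items that match
        if sign == "+" and not matched:
            return False  # plus: only allow items that match
        if sign == "?":
            any_of_results.append(matched)
    # Question mark: at least one '?' filter must match, if any were given.
    return not any_of_results or any(any_of_results)


filters = [
    "-http://www.mydomain.com/privatearea/.*",
    "?http://www.mydomain.com/support/.*",
    "?http://www.mydomain.com/knowledgebase/.*",
]
for url in (
    "http://www.mydomain.com/support/faq.html",
    "http://www.mydomain.com/privatearea/notes.html",
    "http://www.mydomain.com/products/",
):
    print(url, is_allowed(url, filters))
```

With these filters, the support URL is allowed, the private area URL is rejected by the minus filter, and the products URL is rejected because it matches neither question mark filter.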
      

In the above examples, we’ve included .* at the end. The .* is used for wildcard matching: the period matches any single character and the asterisk repeats it, so .* matches zero or more characters. It can be placed anywhere in the filter.

Example:

  1. If we wanted to instruct the spider to crawl all of our subdomains, but not other domains, we would add the following filter:
     
    +http://.*.mydomain.com/.*
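
The behavior of this filter can be checked with a short Python sketch. It assumes that Python's re module approximates the product's RegEx engine and that the filter is compared against the whole URL; the helper name matches and the escaped variant named strict are illustrations, not part of the product. Two points are worth noticing: the pattern requires at least one character before mydomain, so the bare domain is not matched, and an unescaped period matches any character, not only a literal dot.

```python
import re

# Filter from the example above, with the leading plus sign removed.
pattern = r"http://.*.mydomain.com/.*"
# Hypothetical variant with the literal dots escaped (assumes the
# product's RegEx dialect supports backslash escapes).
strict = r"http://.*\.mydomain\.com/.*"


def matches(regex, url):
    """Return True if regex matches the whole URL."""
    return re.fullmatch(regex, url) is not None


for url in (
    "http://www.mydomain.com/index.html",  # subdomain: matched by both
    "http://mydomain.com/",                # bare domain: matched by neither
    "http://www.otherdomain.com/",         # other domain: matched by neither
    "http://wwwXmydomain.com/",            # matched by pattern only
):
    print(url, matches(pattern, url), matches(strict, url))
```

If pages on the bare domain should also be crawled, a separate filter for http://mydomain.com/.* would be needed alongside this one, most likely using question mark filters so that either pattern is accepted.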

 

 

©2009 Innerprise. All rights reserved.