Administrator Guide: Domain Filters
Following domain filters control where the spider crawls. They are useful
when you want to prevent the spider from accessing specific areas of your Web
site. Indexing domain filters control what the spider indexes.
Both following domain filters and indexing domain filters use regular
expressions (RegEx) for pattern matching. Each filter begins with a special sign
that specifies whether its pattern must not be matched, must be matched, or
belongs to a group of patterns of which at least one must be matched.
These special signs are:
- Minus sign (-): Instructs the spider to ignore items that match the filter.
- Plus sign (+): Instructs the spider to allow only items that match the
  filter.
- Question mark (?): Instructs the spider to allow only items that match at
  least one of the filters that begin with a question mark.
Examples:
- To instruct the spider to ignore the private area of our Web site, we would
  add the following filter:
    -http://www.mydomain.com/privatearea/.*
- To instruct the spider to allow only the support area of our Web site, we
  would add the following filter:
    +http://www.mydomain.com/support/.*
- To instruct the spider to allow only the support area and the knowledge base
  area, we would add the following two filters:
    ?http://www.mydomain.com/support/.*
    ?http://www.mydomain.com/knowledgebase/.*
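The three signs can be combined in a single filter list. The Python sketch below shows one way such a list could be evaluated; it is not the spider's actual implementation. The function name url_allowed, the whole-URL matching with re.fullmatch, and the rule that question mark filters apply only when at least one is present are assumptions made for illustration.

```python
import re

def url_allowed(url, filters):
    """Return True if url passes every filter (hypothetical evaluation rules)."""
    any_of = []  # match results for the '?' filters
    for rule in filters:
        sign, pattern = rule[0], rule[1:]
        # Assumption: the pattern must match the whole URL.
        matched = re.fullmatch(pattern, url) is not None
        if sign == "-" and matched:
            return False      # minus: ignore items that match
        if sign == "+" and not matched:
            return False      # plus: allow only items that match
        if sign == "?":
            any_of.append(matched)
    # question mark: at least one '?' filter must match, if any were given
    return not any_of or any(any_of)

filters = [
    "?http://www.mydomain.com/support/.*",
    "?http://www.mydomain.com/knowledgebase/.*",
    "-http://www.mydomain.com/privatearea/.*",
]
print(url_allowed("http://www.mydomain.com/support/faq.html", filters))
print(url_allowed("http://www.mydomain.com/privatearea/a.html", filters))
```

Under these assumptions the first URL is allowed and the second is rejected.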
Each of the above examples ends with .*, which is used for wildcard matching.
The .* pattern matches zero or more characters and can be placed anywhere in
the filter.
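To see the wildcard behavior concretely, the short Python sketch below tests the support filter against two URLs and then tries .* in the middle of a pattern. It assumes the spider's regular expressions behave like Python's re module and that a pattern must match the whole URL; the helper name matches and the "print" directory pattern are hypothetical.

```python
import re

def matches(pattern, url):
    """True if the whole URL matches the pattern (assumed matching mode)."""
    return re.fullmatch(pattern, url) is not None

support = "http://www.mydomain.com/support/.*"
# .* may match zero characters, so the bare directory URL is accepted.
print(matches(support, "http://www.mydomain.com/support/"))
# .* may also match many characters, including further slashes.
print(matches(support, "http://www.mydomain.com/support/faq/1.html"))

# Hypothetical filter with .* in the middle and at the end.
printable = "http://www.mydomain.com/.*/print/.*"
print(matches(printable, "http://www.mydomain.com/docs/print/page.html"))
```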
Example:
- To instruct the spider to crawl all of our subdomains, but not other
  domains, we would add the following filter:
    +http://.*.mydomain.com/.*
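In standard regular expression syntax, an unescaped dot matches any single character, and .* can also match slashes, so the subdomain pattern may accept more than intended. The Python sketch below compares it with a stricter variant. It assumes the spider's regular expressions behave like Python's re module with whole-URL matching; the names LOOSE, STRICT, and matches are hypothetical, and whether the spider accepts \. and [^/] has not been confirmed.

```python
import re

def matches(pattern, url):
    """True if the whole URL matches the pattern (assumed matching mode)."""
    return re.fullmatch(pattern, url) is not None

LOOSE = r"http://.*.mydomain.com/.*"        # pattern from the example above
STRICT = r"http://[^/]*\.mydomain\.com/.*"  # escaped dots, host part cannot contain "/"

urls = [
    "http://www.mydomain.com/index.html",  # a real subdomain
    "http://mydomain.com/index.html",      # bare domain, no subdomain
    "http://wwwXmydomain.com/",            # different domain, "X" where a dot belongs
    "http://evil.com/x.mydomain.com/",     # other host with our domain name in the path
]
for url in urls:
    print(url, matches(LOOSE, url), matches(STRICT, url))
```

Under these assumptions both patterns accept the subdomain and reject the bare domain, but only the stricter one rejects the last two URLs.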