About The Thunderstone Web Site Catalog
This is one of several experimental search engines produced by Thunderstone's R&D group, whose mission is
to advance our overall technology leadership. We are
very pleased with the results.
We continuously survey all primary COM, NET, and ORG
web-servers and distill their contents to produce this database.
This is an index of sites, not pages. It is very good at
finding companies and organizations by purpose, product, subject
matter, or location. If you're trying to find things like
"BillyBob's personal beer can page on AOL", try Yahoo or
Dogpile. This engine attempts to focus on the quality of answers,
not the quantity. You may not add your own URL; addition
is automatic. You may review your information by selecting
"edit your site".
Note that the walker/indexer has been on hiatus since April 2017.
The 'Distillation' process
Our Webinator web robot is dispatched to
each site to obtain its web pages. Then, each site's pages are examined as a whole to determine
the principal subject matter areas that would best characterize the entire site. Additions and
updates are performed at the rate of about 100 sites per minute. In October 1998 more than 350 gigabytes of content was
represented in this database. The growth rate is about 200,000 newly discovered sites per week.
Webmasters may use the User-Agent "thunderstone" in their robots.txt file to control access.
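As an illustration of the robots.txt control mentioned above, here is a minimal sketch using Python's standard `urllib.robotparser`. Only the "thunderstone" User-Agent token comes from the text; the `/private/` path, the example.com URLs, and the `allowed` helper are hypothetical and chosen purely for demonstration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: it keeps the "thunderstone"
# robot out of /private/ and leaves every other robot unrestricted.
ROBOTS_TXT = """\
User-agent: thunderstone
Disallow: /private/

User-agent: *
Disallow:
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(allowed(ROBOTS_TXT, "thunderstone", "http://example.com/private/report.html"))
    print(allowed(ROBOTS_TXT, "thunderstone", "http://example.com/index.html"))
```

A webmaster who wants to exclude the robot from the whole site would presumably use `Disallow: /` under the same User-agent line.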
The Categorization process
After a site's content has been acquired, it is passed to Thunderstone's Automated Categorization Engine.
This process seeks to identify the general classifications under which a site belongs. The % figure
that follows a site's category indicates the degree of confidence that the categorization engine
had in its answer. While this technology is not always completely accurate, it does perform
a task that would otherwise require 75-80 people to accomplish. Each site is assigned up to 4 categories,
but our search interface currently only displays the best one.
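The relationship between the stored categories and the single displayed one can be sketched as follows. This is not the Categorization Engine itself: the `Category` type, the `assign` and `display` helpers, and the sample scores are assumptions made up for illustration. Only the limit of 4 categories, the confidence percentage, and the "show the best one" rule come from the text.

```python
from typing import List, Mapping, NamedTuple


class Category(NamedTuple):
    name: str
    confidence: float  # percent, 0-100


MAX_CATEGORIES = 4  # the text says each site is assigned up to 4 categories


def assign(scores: Mapping[str, float]) -> List[Category]:
    """Keep the highest-confidence categories, at most MAX_CATEGORIES of them."""
    ranked = sorted(
        (Category(name, conf) for name, conf in scores.items()),
        key=lambda c: c.confidence,
        reverse=True,
    )
    return ranked[:MAX_CATEGORIES]


def display(categories: List[Category]) -> str:
    """Format only the best category, followed by its confidence figure."""
    if not categories:
        return "Uncategorized"
    best = categories[0]
    return f"{best.name} ({best.confidence:.0f}%)"


if __name__ == "__main__":
    # Hypothetical scores for a single site.
    scores = {"Software": 87.0, "Databases": 64.0, "Publishing": 41.0,
              "News": 22.0, "Retail": 5.0}
    print(display(assign(scores)))
```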
The News Search
This database is created by a special version of our Webinator
program called a "differential crawler". It periodically examines 2300 news sites and indexes pages
which are different or new since the prior examination. A site's indexing frequency varies from
once an hour to once every three days based on how dynamically it updates its news.
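One common way to build a differential crawler is to compare a content hash of each page against the hash recorded on the previous visit. The sketch below assumes that approach; the text does not say how Webinator detects changes, and the halve/double revisit heuristic is likewise an assumption. Only the one-hour and three-day bounds come from the text.

```python
import hashlib
from datetime import timedelta
from typing import Dict, List

MIN_INTERVAL = timedelta(hours=1)  # fastest revisit rate mentioned in the text
MAX_INTERVAL = timedelta(days=3)   # slowest revisit rate mentioned in the text


def digest(body: bytes) -> str:
    """Fingerprint a page body so later visits can detect changes."""
    return hashlib.sha256(body).hexdigest()


def changed_pages(previous: Dict[str, str], fetched: Dict[str, bytes]) -> List[str]:
    """Return URLs that are new or changed, updating `previous` in place."""
    to_index = []
    for url, body in fetched.items():
        fingerprint = digest(body)
        if previous.get(url) != fingerprint:
            to_index.append(url)
            previous[url] = fingerprint
    return to_index


def next_interval(current: timedelta, change_ratio: float) -> timedelta:
    """Assumed heuristic: revisit sooner when most pages changed, later when none did."""
    if change_ratio > 0.5:
        current = current / 2
    elif change_ratio == 0:
        current = current * 2
    # Clamp to the one-hour to three-day range.
    return max(MIN_INTERVAL, min(MAX_INTERVAL, current))
```

On a first visit every page counts as new, so the whole site is indexed; later visits index only the pages whose fingerprint differs.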
The Machinery
There are 3 machines that support this service: two machines support the Webinator walker and Categorizer, and one
supports Webinator searches as well as the news indexer and search.
The walker and categorizer machines have 2 Intel Xeon 2.66GHz processors, 3.5GB of ram, and 1TB of SATA disks.
The search and news machine has an Intel P4 3.0GHz processor, 2GB of ram, and 1TB of
IDE disks.
By the way, if you are interested in purchasing this software or building any form of search / dynamic publishing
/ database-driven site, give us a call.