

The Thunderstone Web Site Catalog
This is one of several experimental search engines produced by Thunderstone's R&D group, whose mission is to advance our overall technology leadership. We are very pleased with the results.

We continuously survey all primary COM, NET, and ORG web servers and distill their contents to produce this database. This is an index of sites, not pages. It is very good at finding companies and organizations by purpose, product, subject matter, or location. If you're trying to find things like "BillyBob's personal beer can page on AOL", try Yahoo or Dogpile. This engine attempts to focus on the quality of answers, not the quantity. You cannot add your own URL; sites are added automatically. You may review your site's information by selecting "Edit Your Site".

The 'Distillation' process
Our Webinator web robot is dispatched to each site to retrieve its web pages. Each site's pages are then examined as a whole to determine the principal subject areas that best characterize the entire site. Additions and updates are performed at a rate of about 100 sites per minute. As of October 1998, more than 350 gigabytes of content were represented in this database. The database grows by about 200,000 newly discovered sites per week. Webmasters may use the User-Agent "thunderstone" in their robots.txt file to control the robot's access.
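As a sketch of how a webmaster might check a robots.txt rule aimed at this robot, the Python snippet below uses the standard library's urllib.robotparser. Only the "thunderstone" agent name comes from this page; the example site, the /private/ and /tmp/ paths, and the helper name thunderstone_may_fetch are hypothetical, and the standard-library parser only approximates how the Webinator itself interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep the "thunderstone" robot out of /private/,
# and keep every other robot out of /tmp/. A robot that matches a specific
# User-agent group follows only that group, not the "*" group.
ROBOTS_TXT = """\
User-agent: thunderstone
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""


def thunderstone_may_fetch(robots_txt: str, url: str) -> bool:
    """Return True if robots_txt permits the "thunderstone" agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network access
    return parser.can_fetch("thunderstone", url)


if __name__ == "__main__":
    for url in ("http://www.example.com/products.html",
                "http://www.example.com/private/report.html",
                "http://www.example.com/tmp/scratch.html"):
        print(url, thunderstone_may_fetch(ROBOTS_TXT, url))
```

Because the "thunderstone" group has its own rules, the /tmp/ restriction in the "*" group does not apply to it; to block this robot everywhere, its group would use "Disallow: /".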

The Categorization process
After a site's content has been acquired, it is passed to Thunderstone's Automated Categorization Engine, which seeks to identify the general classifications to which the site belongs. The % figure that follows a site's category indicates the categorization engine's degree of confidence in its answer.

While this technology is not always completely accurate, it performs a task that would otherwise require 75-80 people. Each site is assigned up to 4 categories, but our search interface currently displays only the best one.
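This page does not describe how the categorization engine scores categories, so the following Python sketch assumes only that it yields one confidence value per candidate category. It illustrates the selection step described above: keep up to 4 categories, best first, and display only the top one with its % figure. The category names, scores, the "Uncategorized" fallback, and the helper names assign_categories and display_label are hypothetical.

```python
from typing import Dict, List, Tuple

MAX_CATEGORIES = 4  # "Each site is assigned up to 4 categories"


def assign_categories(scores: Dict[str, float],
                      limit: int = MAX_CATEGORIES) -> List[Tuple[str, float]]:
    """Return at most `limit` (category, confidence) pairs, highest first.

    `scores` maps a category name to a confidence between 0 and 1; how the
    engine computes these values is an assumption, not described here.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


def display_label(assigned: List[Tuple[str, float]]) -> str:
    """Format only the best category with its % confidence figure."""
    if not assigned:
        return "Uncategorized"  # hypothetical fallback label
    category, confidence = assigned[0]
    return f"{category} ({confidence:.0%})"


if __name__ == "__main__":
    # Hypothetical scores for one site.
    scores = {
        "Computers/Software": 0.87,
        "Business/Services": 0.55,
        "Science/Research": 0.31,
        "News/Media": 0.12,
        "Recreation/Sports": 0.05,
    }
    assigned = assign_categories(scores)
    print(assigned)                 # four pairs; the 0.05 category is dropped
    print(display_label(assigned))  # Computers/Software (87%)
```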

The News Search
This database is created by a special version of our Webinator program called a "differential crawler". It periodically examines 2,300 news sites and indexes pages that are new or have changed since the previous examination. A site's indexing frequency varies from once an hour to once every three days, depending on how often the site updates its news.
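A differential crawler needs two pieces: a test for whether a page is new or changed, and a rule for adjusting how often a site is revisited. The Python sketch below assumes SHA-256 content hashing (via hashlib) for change detection and a simple halve-on-change, double-otherwise revisit policy clamped to the one-hour and three-day bounds mentioned above. Both techniques, the URLs, and the function names are illustrative assumptions, not a description of how the Webinator actually works.

```python
import hashlib
from datetime import timedelta
from typing import Dict, List, Tuple

MIN_INTERVAL = timedelta(hours=1)  # fastest revisit rate stated on this page
MAX_INTERVAL = timedelta(days=3)   # slowest revisit rate stated on this page


def content_hash(html: str) -> str:
    """Fingerprint a page so later visits can detect changes (assumed method)."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def diff_crawl(previous: Dict[str, str],
               fetched: Dict[str, str]) -> Tuple[List[str], Dict[str, str]]:
    """Return URLs that are new or changed since `previous`, plus the new snapshot.

    `previous` maps URL -> hash from the last visit; `fetched` maps URL -> HTML.
    """
    to_index: List[str] = []
    snapshot: Dict[str, str] = {}
    for url, html in fetched.items():
        digest = content_hash(html)
        snapshot[url] = digest
        if previous.get(url) != digest:  # unseen URL or different content
            to_index.append(url)
    return to_index, snapshot


def next_interval(interval: timedelta, changed: bool) -> timedelta:
    """Hypothetical policy: revisit sooner after changes, later after none."""
    proposed = interval / 2 if changed else interval * 2
    return max(MIN_INTERVAL, min(MAX_INTERVAL, proposed))


if __name__ == "__main__":
    a, b = "http://news.example.com/a", "http://news.example.com/b"
    new, snap = diff_crawl({}, {a: "<p>story A</p>"})
    print(new)  # first visit: page a is new
    new, snap = diff_crawl(snap, {a: "<p>story A</p>", b: "<p>story B</p>"})
    print(new)  # second visit: only page b is new
    print(next_interval(timedelta(hours=4), changed=bool(new)))
```

Hashing whole pages is the simplest change test; a real crawler might hash only the article text so that rotating ads or timestamps do not trigger re-indexing.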

The Machinery
Three machines support this service: two run the Webinator and Categorizer, and the third runs the news service and the web-site searches. Each 1U machine has an Intel P4 2.4GHz processor, 1GB of RAM, and 200GB of IDE disk storage.

By the way
If you are interested in purchasing this software or building any kind of search, dynamic publishing, or database-driven site, give us a call.

Copyright© 2024 Thunderstone