|
webbase is an internet web
crawler written in C and
later ported to C++. It uses
a MySQL database to store
information about crawled
URLs. It is available as a
command line program or as a
library (shared or static).
It has two main functions:
crawling the Web to fetch
documents and building a
full text database from
them. The crawler part
visits the documents and
stores interesting
information about them
locally. It revisits each
document on a regular basis
to make sure that it is
still there and updates the
local copy if the document
changes. The full text
database uses the local
copies of the documents to
build a searchable index.
The full text indexing
functions are not included
in webbase.
Date: Oct 27, 2000 |
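
The revisit behaviour described above (check that a document is still
there, refresh the local copy if it changed) can be sketched roughly as
follows. This is only an illustration, not webbase's actual code or
schema: the names UrlRecord, revisit and due are hypothetical, an
in-memory record stands in for the MySQL row, and the fetch function is
supplied by the caller instead of doing real HTTP.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class UrlRecord:
    """Hypothetical per-URL bookkeeping (webbase keeps this in MySQL)."""
    url: str
    digest: Optional[str] = None  # hash of the last stored copy, None if never fetched
    last_visit: float = 0.0       # timestamp of the most recent visit
    present: bool = True          # False once the document can no longer be fetched


def due(record: UrlRecord, now: float, interval: float) -> bool:
    """True when at least `interval` seconds have passed since the last visit."""
    return now - record.last_visit >= interval


def revisit(record: UrlRecord,
            fetch: Callable[[str], Optional[bytes]],
            now: float) -> str:
    """Visit one URL and update its record.

    `fetch` returns the document body, or None if the document is gone.
    Returns "gone", "changed" or "unchanged". A first visit counts as
    "changed" because there is no stored copy to compare against.
    """
    body = fetch(record.url)
    record.last_visit = now
    if body is None:
        record.present = False
        return "gone"
    record.present = True
    digest = hashlib.sha256(body).hexdigest()
    if digest != record.digest:
        record.digest = digest  # a real crawler would also store the new copy
        return "changed"
    return "unchanged"
```

Comparing a content hash is just one simple way to detect a change; a
real crawler might rely on HTTP headers such as Last-Modified instead.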