Datagrab Indexer - Web Crawler, Indexer & Search Engine
Extracts URLs from the web and starts building a search index. It can crawl more than 100,000 documents to produce a lightning-fast index, which is searchable via a front-end web interface. Supports AND, OR, NOT, Phrase and Fuzzy Search through advanced ruleset configurations. Packed with many other features, too many to mention here. This system has been proven to fulfill the needs of almost any website out there.
Date: Feb, 01 2006

Harvest-NG is a collection of Perl modules and scripts which provide a powerful web crawling and summarizing agent. The code is aimed at providing an open source, standards-compliant tool for fetching content from a wide variety of information sources, summarising it into a set of resource descriptions, and storing these in an easily accessible database from which search services can be built and statistical information compiled.
Date: Feb, 28 2000

I-Spy is a Perl script which identifies new files on various remote FTP and Web sites. It grabs and compares the contents of FTP directories and web pages. It will then compile a report and either send it via e-mail or save it as a web page. You may also request both deliveries of the report. For e-mail reports, you may request plain text or HTML. I-Spy logs its activity as it chugs along. You may specify the log directory, or I-Spy will try to find one automatically. For web page reports, I-Spy will attempt to store the log in a place where it may be referenced by the report and served by the web server.
Date: Jan, 15 2000

Set a starting point and let iuCrawler do the rest. Retrieve information from websites and build databases quickly, accurately and as often as you require. Completely customizable with HTML templates. Separate spider and crawler for maximum performance and ease of use. Export your data to all popular formats including MySQL, PostgreSQL, MS Excel and more.
Date: Jan, 06 2003
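The I-Spy description above says it compares directory contents to identify new files and then compiles a report. As a rough illustration of that idea only (not I-Spy's actual Perl code), the comparison can be sketched in Python as a set difference between two listing snapshots. The function names, the host name, and the sample listings are all hypothetical.

```python
def new_files(previous, current):
    """Return names that appear in the current listing but not the previous one."""
    return sorted(set(current) - set(previous))


def build_report(site, previous, current):
    """Build a plain-text report of newly seen files for one site."""
    added = new_files(previous, current)
    lines = [f"New files on {site}: {len(added)}"]
    lines.extend(f"  {name}" for name in added)
    return "\n".join(lines)


# Hypothetical snapshots of one remote directory taken on two separate runs.
previous_listing = ["README", "tool-1.0.tar.gz"]
current_listing = ["README", "tool-1.0.tar.gz", "tool-1.1.tar.gz"]

report = build_report("ftp.example.org", previous_listing, current_listing)
print(report)
```

A real monitor would also need to fetch the listings and persist the previous snapshot between runs; both are left out of this sketch.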
Web Secretary is web page monitoring software. However, it goes beyond the normal functionality offered by such software. Not only does it detect changes based on content analysis (instead of date/time stamp or simple textual comparison), it will email the changed page to you with the new contents highlighted. Web Secretary is written in Perl and should be able to run on all Unix systems with the Perl interpreter (and LWP module) installed.
Date: Nov, 26 2003

This is a proof-of-concept of a tool to automate web browsing / data collection. It works like AWK except that instead of working on files and lines it works on HTML pages and hyperlinks. It is meant to be run as a command line script and includes base_url (the URL the script was initially invoked on), base_path (root of saved data tree), url (current URL being processed), linked_from (parent of current URL), and content (the actual data corresponding to the current URL).
Date: Jan, 02 2000
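To make the "AWK for HTML pages and hyperlinks" idea concrete, here is a minimal Python sketch, not the tool's own code. It walks pages breadth-first and calls a user-supplied action once per page with the names from the description (base_url, url, linked_from, content). The `crawl` function, the `LinkParser` class, the in-memory `PAGES` map standing in for real HTTP fetches, and the example URLs are all assumptions made for illustration; base_path is omitted because nothing is saved to disk here.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(base_url, fetch, action):
    """Visit pages breadth-first from base_url, calling action once per page."""
    seen = {base_url}
    queue = deque([(base_url, None)])
    while queue:
        url, linked_from = queue.popleft()
        content = fetch(url)
        if content is None:  # page could not be fetched; skip it
            continue
        action(base_url=base_url, url=url, linked_from=linked_from, content=content)
        parser = LinkParser()
        parser.feed(content)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links against the page
            if target not in seen:
                seen.add(target)
                queue.append((target, url))


# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "http://example.test/": '<a href="a.html">A</a> <a href="b.html">B</a>',
    "http://example.test/a.html": '<a href="b.html">B again</a>',
    "http://example.test/b.html": "no links here",
}

visited = []


def record(base_url, url, linked_from, content):
    """Example per-page action: remember each URL and the page that linked to it."""
    visited.append((url, linked_from))


crawl("http://example.test/", PAGES.get, record)
for url, parent in visited:
    print(url, "<-", parent)
```

Passing the fetch function in as a parameter keeps the traversal testable offline; a real run would swap `PAGES.get` for an HTTP client and would likely restrict the crawl to URLs under base_url.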