|
Datagrab Indexer - Web Crawler, Indexer & Search Engine. Extracts URLs from the web and starts building a search index. Crawls more than 100,000 documents to produce a lightning-fast index, which is searchable via a front-end web interface. Supports AND, OR, NOT, phrase and fuzzy search through advanced ruleset configurations. Packed with many other features, too many to list here. This system has been proven to fulfill the needs of almost any website out there.
Date: Feb, 01 2006

Set a starting point and let iuCrawler do the rest. Retrieve information from websites and build databases quickly, accurately and as often as you require. Completely customizable with HTML templates. Separate spider and crawler for maximum performance and ease of use. Export your data to all popular formats, including MySQL, PostgreSQL, MS Excel and more.
Date: Jan, 06 2003

I-Spy is a Perl script which identifies new files on various remote FTP and Web sites. It grabs and compares the contents of FTP directories and web pages. It will then compile a report and either send it via e-mail or save it as a web page. You may also request both forms of delivery. For e-mail reports, you may request plain text or HTML. I-Spy logs its activity as it chugs along. You may specify the log directory, or I-Spy will try to find one automatically. For web page reports, I-Spy will attempt to store the log in a place where it can be referenced by the report and served by the web server.
Date: Jan, 15 2000

This is a proof-of-concept of a tool to automate web browsing / data collection. It works like AWK, except that instead of working on files and lines it works on HTML pages and hyperlinks. It is meant to be run as a command-line script and includes base_url (the URL the script was initially invoked on), base_path (the root of the saved data tree), url (the current URL being processed), linked_from (the parent of the current URL), and content (the actual data corresponding to the current URL).
Date: Jan, 02 2000
|
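The AWK-like tool above is described only by the names it exposes (base_url, base_path, url, linked_from, content). As a rough illustration of that idea, and not the tool's actual code, the sketch below visits pages breadth-first and calls a rule once per page with those five names. The function `run_rules`, the `fetch` callback, the `max_pages` limit and the example.test site are all assumptions made for the example; pages are served from a dictionary so the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def run_rules(base_url, base_path, fetch, rule, max_pages=100):
    """Visit pages breadth-first and call rule(ctx) once per page.

    fetch(url) is a hypothetical callback returning the page text or None.
    base_path is only passed through to the rule; nothing is written to disk.
    """
    queue = deque([(base_url, None)])
    seen = {base_url}
    visited = 0
    while queue and visited < max_pages:
        url, linked_from = queue.popleft()
        content = fetch(url)
        if content is None:
            continue
        visited += 1
        rule({
            "base_url": base_url,
            "base_path": base_path,
            "url": url,
            "linked_from": linked_from,
            "content": content,
        })
        parser = LinkParser()
        parser.feed(content)
        for href in parser.links:
            target = urljoin(url, href)
            # Stay below the starting URL, much as AWK stays inside its input files.
            if target.startswith(base_url) and target not in seen:
                seen.add(target)
                queue.append((target, url))


# A two-page in-memory site stands in for the web (assumed data).
SITE = {
    "http://example.test/": '<a href="a.html">A</a> <a href="http://other.test/">X</a>',
    "http://example.test/a.html": '<a href="/">home</a>',
}
records = []
run_rules("http://example.test/", "/tmp/data", SITE.get,
          lambda ctx: records.append((ctx["url"], ctx["linked_from"])))
```

The rule here only records which page linked to which; a real rule could just as well match on content, the way an AWK pattern matches on a line.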
|
Web Secretary is a web page monitoring tool. However, it goes beyond the normal functionality offered by such software. Not only does it detect changes based on content analysis (instead of a date/time stamp or simple textual comparison), it will also email the changed page to you with the new contents highlighted. Web Secretary is written in Perl and should be able to run on all Unix systems with the Perl interpreter (and the LWP module) installed.
Date: Nov, 26 2003

Harvest-NG is a collection of Perl modules and scripts which provide a powerful web crawling and summarizing agent. The code is aimed at providing an open source, standards-compliant tool for fetching content from a wide variety of information sources, summarising it into a set of resource descriptions, and storing these in an easily accessible database from which search services can be built and statistical information compiled.
Date: Feb, 28 2000
|
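To make "new contents highlighted" concrete, here is a minimal sketch of marking what changed between two versions of a page. It uses a plain word-level diff from Python's standard difflib module, which is a simpler technique than the content analysis Web Secretary is said to perform, and it is written in Python rather than Perl. The function name `highlight_changes` and the `<b>` markers are assumptions for the example.

```python
import difflib


def highlight_changes(old_text, new_text, start="<b>", end="</b>"):
    """Return new_text with words that are new or changed wrapped in markers."""
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        segment = " ".join(new_words[j1:j2])
        if not segment:
            continue  # pure deletion: nothing from the new page to show
        if op == "equal":
            out.append(segment)
        else:
            out.append(f"{start}{segment}{end}")
    return " ".join(out)


marked = highlight_changes(
    "Version 1.0 released today",
    "Version 1.1 released today with fixes",
)
print(marked)  # Version <b>1.1</b> released today <b>with fixes</b>
```

Splitting on whitespace keeps the sketch short; a monitor working on real pages would probably strip the HTML markup first so that tag changes are not reported as content changes.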