Datagrab Indexer - Web Crawler, Indexer & Search Engine
Extracts URLs from the web and starts building a search index. It can crawl more than 100,000 documents to produce a lightning-fast index, which is searchable via a front-end web interface. Supports AND, OR, NOT, phrase and fuzzy search through advanced ruleset configurations. Packed with many other features, too many to mention here. This system has been proven to fulfill the needs of almost any website out there.
Date: Feb, 01 2006

Web Secretary is web page monitoring software. However, it goes beyond the normal functionality offered by such software. Not only does it detect changes based on content analysis (instead of date/time stamps or simple textual comparison), it will also email the changed page to you with the new contents highlighted. Web Secretary is written in Perl and should be able to run on all Unix systems with the Perl interpreter (and the LWP module) installed.
Date: Nov, 26 2003

Set a starting point and let iuCrawler do the rest. Retrieve information from websites and build databases quickly, accurately and as often as you require. Completely customizable with HTML templates. Separate spider and crawler for maximum performance and ease of use. Export your data to all popular formats, including MySQL, PostgreSQL, MS Excel and more.
Date: Jan, 06 2003

Harvest-NG is a collection of Perl modules and scripts which provide a powerful web crawling and summarizing agent. The code is aimed at providing an open source, standards-compliant tool for fetching content from a wide variety of information sources, summarising it into a set of resource descriptions, and storing these in an easily accessible database from which search services can be built and statistical information compiled.
Date: Feb, 28 2000
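
The Web Secretary entry above describes detecting changes by content analysis rather than by timestamp, and reporting the new contents highlighted. The listing does not say how Web Secretary does this, so the following is only a minimal Python sketch of the general idea, not the tool's own (Perl) code. The function names `visible_text` and `highlight_changes`, the regex-based tag stripping, and the `>>> ... <<<` marker are all assumptions made for illustration.

```python
import difflib
import re


def visible_text(html: str) -> list:
    """Crudely strip tags and return the non-empty text lines of a page."""
    text = re.sub(r"<[^>]+>", "\n", html)
    return [line.strip() for line in text.splitlines() if line.strip()]


def highlight_changes(old_html: str, new_html: str) -> list:
    """Return the new page's text lines, marking lines that are new or changed.

    Only the visible text is compared, so a change in markup alone
    (or in the file's timestamp) does not count as a change.
    """
    old_lines = visible_text(old_html)
    new_lines = visible_text(new_html)
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines)
    report = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        for line in new_lines[j1:j2]:
            # "equal" spans are unchanged; "insert" and "replace" spans are new.
            report.append(line if tag == "equal" else f">>> {line} <<<")
    return report
```

A real monitor would use a proper HTML parser and would fetch and store page snapshots between runs; both are left out here to keep the sketch short.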

I-Spy is a Perl script which identifies new files on various remote FTP and Web sites. It grabs and compares the contents of FTP directories and web pages. It will then compile a report and either send it via e-mail or save it as a web page. You may also request both deliveries of the report. For e-mail reports, you may request plain text or HTML. I-Spy logs its activity as it chugs along. You may specify the log directory, or I-Spy will try to find one automatically. For web page reports, I-Spy will attempt to store the log in a place where it can be referenced by the report and served by the web server.
Date: Jan, 15 2000

This is a proof-of-concept of a tool to automate web browsing / data collection. It works like AWK, except that instead of working on files and lines it works on HTML pages and hyperlinks. It is meant to be run as a command line script and includes the variables base_url (the URL the script was initially invoked on), base_path (root of the saved data tree), url (the current URL being processed), linked_from (the parent of the current URL), and content (the actual data corresponding to the current URL).
Date: Jan, 02 2000
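
To make the AWK analogy in the last entry concrete, here is a rough Python sketch of a walker that runs an action once per page, much as AWK runs an action once per line. Only the five names base_url, base_path, url, linked_from and content come from the description; the `PageContext` class, the `walk` function, the `fetch` callback and the simple `href` regex are hypothetical and are not the tool's actual interface.

```python
import re
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional
from urllib.parse import urljoin


@dataclass
class PageContext:
    base_url: str               # URL the walk was initially invoked on
    base_path: str              # root of the saved data tree
    url: str                    # current URL being processed
    linked_from: Optional[str]  # parent of the current URL (None for the start page)
    content: str                # data corresponding to the current URL


HREF = re.compile(r'href="([^"]+)"')


def walk(base_url: str, base_path: str,
         fetch: Callable[[str], Optional[str]],
         action: Callable[[PageContext], None]) -> None:
    """Breadth-first walk below base_url, calling action once per fetched page."""
    seen = {base_url}
    queue = deque([(base_url, None)])
    while queue:
        url, parent = queue.popleft()
        content = fetch(url)  # hypothetical fetcher; returns None on failure
        if content is None:
            continue
        action(PageContext(base_url, base_path, url, parent, content))
        for href in HREF.findall(content):
            link = urljoin(url, href)
            # Stay below the starting URL and visit each page only once.
            if link.startswith(base_url) and link not in seen:
                seen.add(link)
                queue.append((link, url))
```

Passing `fetch` in as a parameter keeps the sketch free of network code: it can be a dictionary lookup for testing, or a function that wraps an HTTP client for real use.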