Web Indexing

Datagrab Indexer - Web Crawler, Indexer & Search Engine

Extracts URLs from the web and starts building a search index. Crawls more than 100,000 documents to produce a lightning fast index, which is searchable via a front end web interface. Supports AND, OR, NOT, Phrase and Fuzzy Search through advanced ruleset configurations. Packed with many other features, too many to list here. This system has been proven to fulfill the needs of almost any website out there.

Date: Feb, 01 2006

iuCrawler

Hits: 14
Rating: 0.0

Set a starting point and let iuCrawler do the rest. Retrieve information from websites and build databases quickly, accurately and as often as you require. Completely customizable with HTML templates. Separate spider and crawler for maximum performance and ease of use. Export your data to all popular formats including MySQL, PostgreSQL, MS Excel and more.

Date: Jan, 06 2003

Internet Spy (i-spy)

Hits: 11
Rating: 0.0

I-Spy is a Perl script which identifies new files on various remote FTP and Web sites. It grabs and compares the contents of FTP directories and web pages, then compiles a report and either sends it via e-mail or saves it as a web page. You may also request both deliveries of the report. For e-mail reports, you may request plain text or HTML. I-Spy logs its activity as it runs. You may specify the log directory, or I-Spy will try to find one automatically. For web page reports, I-Spy will attempt to store the log where it can be referenced by the report and served by the web server.

Date: Jan, 15 2000

WebAwk

Hits: 11
Rating: 0.0

This is a proof-of-concept of a tool to automate web browsing and data collection. It works like AWK, except that instead of working on files and lines it works on HTML pages and hyperlinks. It is meant to be run as a command line script and includes:

  base_url - the URL the script was initially invoked on
  base_path - root of the saved data tree
  url - current URL being processed
  linked_from - parent of the current URL
  content - the actual data corresponding to the current URL

Date: Jan, 02 2000
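
To make the WebAwk idea above more concrete, here is a rough Python sketch of an AWK-like loop over pages and hyperlinks. It is not WebAwk's code (WebAwk's language and interface are not shown in this listing). Only the five names base_url, base_path, url, linked_from and content come from the description; the crawl function, the rule callback, the in-memory PAGES table and the example.test URLs are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "http://example.test/": '<a href="a.html">A</a> <a href="b.html">B</a>',
    "http://example.test/a.html": '<p>alpha</p> <a href="b.html">B</a>',
    "http://example.test/b.html": "<p>beta</p>",
}


class LinkParser(HTMLParser):
    """Collect href values from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(base_url, base_path, rule, fetch=PAGES.get):
    """Visit every page reachable from base_url, calling rule() once per page.

    rule() plays the part of an AWK action: it receives the five values
    named in the WebAwk description. base_path is only passed through here;
    this sketch does not save anything to disk.
    """
    seen = set()
    queue = [(base_url, None)]  # pairs of (url, linked_from)
    while queue:
        url, linked_from = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        content = fetch(url)
        if content is None:  # unknown page in the fake web
            continue
        rule(base_url=base_url, base_path=base_path, url=url,
             linked_from=linked_from, content=content)
        parser = LinkParser()
        parser.feed(content)
        for href in parser.links:
            # Resolve relative links against the page they appear on.
            queue.append((urljoin(url, href), url))


def collect_visits(start="http://example.test/"):
    """Example rule: record each visited URL, its parent and content length."""
    visits = []

    def rule(base_url, base_path, url, linked_from, content):
        visits.append((url, linked_from, len(content)))

    crawl(start, "/tmp/webawk-data", rule)  # base_path value is a placeholder
    return visits
```

Calling collect_visits() walks the three fake pages breadth-first and returns one tuple per page. A real tool would replace PAGES.get with an HTTP fetch and would need politeness limits, which this sketch leaves out.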

Web Secretary

Web Secretary is web page monitoring software that goes beyond the normal functionality offered by such software. Not only does it detect changes based on content analysis (instead of date/time stamp or simple textual comparison), it will email the changed page to you with the new contents highlighted. Web Secretary is written in Perl and should be able to run on all Unix systems with the Perl interpreter (and LWP module) installed.

Date: Nov, 26 2003

Harvest-NG

Harvest-NG is a collection of Perl modules and scripts which provide a powerful web crawling and summarizing agent. The code is aimed at providing an open source, standards compliant tool for fetching content from a wide variety of information sources, summarizing it into a set of resource descriptions, and storing these in an easily accessible database from which search services can be built and statistical information compiled.

Date: Feb, 28 2000

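
The Harvest-NG description above outlines a pipeline: fetch content, summarize it into resource descriptions, store those in a database, then build search and statistics on top. The following Python sketch illustrates that flow only in spirit. Harvest-NG itself is Perl, and none of the names here (Summariser, summarise, build_store, search, the resources table and its columns) come from it; the record layout and the example.test URLs are assumptions chosen to keep the example small, and fetching is replaced by in-memory HTML strings.

```python
import sqlite3
from html.parser import HTMLParser


class Summariser(HTMLParser):
    """Pull the title and the visible words out of an HTML document."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data.strip()
        else:
            self.words.extend(data.split())


def summarise(url, html, max_words=20):
    """Build a hypothetical resource description for one document."""
    parser = Summariser()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": parser.title,
        "summary": " ".join(parser.words[:max_words]),
        "word_count": len(parser.words),
    }


def build_store(documents):
    """Summarise each (url -> html) pair and store it in an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE resources "
        "(url TEXT PRIMARY KEY, title TEXT, summary TEXT, word_count INTEGER)"
    )
    for url, html in documents.items():
        db.execute(
            "INSERT INTO resources VALUES (:url, :title, :summary, :word_count)",
            summarise(url, html),
        )
    db.commit()
    return db


def search(db, term):
    """A very small 'search service': substring match on the stored summary."""
    rows = db.execute(
        "SELECT url FROM resources WHERE summary LIKE ? ORDER BY url",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]
```

With a store built from a few documents, search(db, "Perl") returns the URLs whose summary mentions the term, and a query such as SELECT SUM(word_count) FROM resources gives the kind of simple statistic the description alludes to.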
Copyright ©2006 NuclearScripts.com