Simple method for indexing MS Word documents
Building indexers/spiders that can read binary MS Word (.doc) documents can be difficult, especially on *nix servers, where PHP's COM extension is not available.
Solutions usually involve installing extra binaries on the server, which is often impossible or disallowed.
This simple PHP snippet does a reasonably good job of extracting text from an MS Word document for use in a search index. It does not pretend to be perfect, but it has proved useful on thousands of test documents.
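The snippet itself is not reproduced in this listing. As a rough illustration of the general idea only, here is a minimal sketch of one common heuristic for pulling text out of a binary .doc file in pure PHP. It is not the author's code. The function name `extract_doc_text` is hypothetical, and the sketch assumes the document body is stored as 8-bit text with paragraphs separated by carriage returns, which is true of many but not all .doc files.

```php
<?php
// Hypothetical helper (not the author's snippet): heuristic text
// extraction from the raw bytes of a binary .doc file.
function extract_doc_text(string $bytes): string
{
    // Assumption: paragraphs in the text stream end with a carriage return.
    $chunks = explode("\r", $bytes);

    $kept = [];
    foreach ($chunks as $chunk) {
        // Skip empty chunks and chunks containing NUL bytes, which are
        // most likely binary structures rather than 8-bit body text.
        if ($chunk === '' || strpos($chunk, "\0") !== false) {
            continue;
        }
        $kept[] = $chunk;
    }

    $text = implode(' ', $kept);

    // Replace anything outside printable ASCII with a space, then
    // collapse runs of whitespace so the result suits a search index.
    $text = preg_replace('/[^\x20-\x7E]/', ' ', $text);
    $text = preg_replace('/\s+/', ' ', $text);

    return trim($text);
}
```

A caller would typically pass in the file contents, for example `extract_doc_text(file_get_contents('report.doc'))`, where `report.doc` is a placeholder file name. Because the heuristic discards any chunk containing NUL bytes, text stored as 16-bit characters is lost, so the output is good enough for indexing but not for faithful conversion.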
Platform(s): Windows
Date: Apr 30, 2006
Author: The Mouse Whisperer, http://www.mousewhisperer.co.uk/php_page.html