DEGU is a J2EE-based distributed indexing and retrieval engine written in 100% Java (license: LGPL).

The philosophy behind DEGU is to index a relatively small collection of documents while providing high-quality search capabilities. Unlike other search engines, DEGU retrieves not only whole documents but also document parts such as chapters and pages. For example, a search of the IBM Redbooks is likely to return documents of around 1000 pages each, when it would be more useful to get only the relevant chapters. Each hit produced by DEGU is accompanied by the document's table of contents (TOC), if one could be extracted. TOCs are searchable as well. Since TOC entries represent document chunks such as sections, subsections, and subsubsections, DEGU can annotate each TOC entry with its number of relevant pages, and any chunk can be downloaded separately.
Moreover, DEGU modifies the hits themselves: in PDF files, for example, it underlines the keywords and adds bookmarks that point to the relevant pages.
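
The idea of annotating each TOC entry with its number of relevant pages can be illustrated with a minimal sketch. This is not DEGU's actual code: the class TocHitCount, the TocEntry type, its inclusive page range, and the output format are all assumptions made for illustration. The sketch assumes that a search yields a set of matching page numbers and that each TOC entry knows the first and last page of its chunk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: none of these names come from DEGU itself.
class TocHitCount {

    /** Hypothetical TOC entry: a title, an inclusive page range, and nested sub-entries. */
    static final class TocEntry {
        final String title;
        final int firstPage;
        final int lastPage;
        final List<TocEntry> children = new ArrayList<>();

        TocEntry(String title, int firstPage, int lastPage) {
            if (firstPage > lastPage) {
                throw new IllegalArgumentException("firstPage must not exceed lastPage");
            }
            this.title = title;
            this.firstPage = firstPage;
            this.lastPage = lastPage;
        }

        /** Adds a sub-entry and returns this entry so calls can be chained. */
        TocEntry add(TocEntry child) {
            children.add(child);
            return this;
        }

        /** Counts the hit pages that fall inside this entry's page range. */
        int relevantPages(Set<Integer> hitPages) {
            int count = 0;
            for (int page : hitPages) {
                if (page >= firstPage && page <= lastPage) {
                    count++;
                }
            }
            return count;
        }
    }

    /** Renders the TOC as indented lines, each annotated with its relevant-page count. */
    static String render(TocEntry entry, Set<Integer> hitPages) {
        StringBuilder out = new StringBuilder();
        render(entry, hitPages, 0, out);
        return out.toString();
    }

    private static void render(TocEntry entry, Set<Integer> hitPages, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(entry.title)
           .append(" [relevant pages: ")
           .append(entry.relevantPages(hitPages))
           .append("]\n");
        for (TocEntry child : entry.children) {
            render(child, hitPages, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        // Made-up document structure and hit pages, for demonstration only.
        TocEntry doc = new TocEntry("Example document", 1, 1000)
            .add(new TocEntry("1 Introduction", 1, 40))
            .add(new TocEntry("2 Installation", 41, 300)
                .add(new TocEntry("2.1 Prerequisites", 41, 90)));
        Set<Integer> hitPages = new TreeSet<>(Arrays.asList(12, 45, 47, 210));
        System.out.print(render(doc, hitPages));
    }
}
```

Storing a page range on every entry keeps the count for a section independent of its subsections, so a reader can see at a glance which chunk of a long document is worth downloading.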

Current features

Planned features

Many thanks to these projects

Michael Barth, 2006
