degu.degudocumentbuilder.ejb
Class DocumentBuilder

java.lang.Object
  extended by degu.degudocumentbuilder.ejb.DocumentBuilder
Direct Known Subclasses:
FromPDFDocumentBuilder

public abstract class DocumentBuilder
extends java.lang.Object

Base class for all DocumentBuilders. A DocumentBuilder builds an XML representation of a RawDocument of a specific file type.


Field Summary
protected  org.jdom.Document jDomDocument
           
protected  RawDocument rawDocument
           
protected  java.util.Vector<TextStripper> textStrippers
           
 
Constructor Summary
DocumentBuilder(java.util.Vector<TextStripper> textStrippers, RawDocument rawDocument)
          Sets up the document builder with a vector of text strippers and a raw document. Use setRawDocument() and/or setTextStrippers() to reuse a DocumentBuilder instance.
 
Method Summary
abstract  DeguDocument build()
          Builds a degu document
protected  java.lang.String calculateDigestString(java.io.InputStream is)
          Calculates a digest string for the input stream is.
protected  void initDoc()
          Creates the initial JDOM document. Put file-type-dependent logic into the implementation of build(), or override methods of this class as needed. This method must be called at the beginning of build().
protected  org.jdom.Element makeParsedTextElement()
          Makes the parsed-text JDOM Element.
protected  org.jdom.Element makeRawTextElement()
          Makes the raw-text JDOM Element.
protected abstract  org.jdom.Element makeTOCElement()
          Extracts the TOC. TODO: implement the command pattern here (like TextStrippers).
protected  void setRawDocument(RawDocument rawDocument)
          Sets a new rawDocument so that a DocumentBuilder instance can be reused.
protected  void setTextStrippers(java.util.Vector<TextStripper> textStrippers)
          Sets a new set of textStrippers so that a DocumentBuilder instance can be reused.
protected  java.lang.String stripVoted(java.io.InputStream is)
          Strips the InputStream is. This method tries every algorithm given by textStrippers (set in the constructor) and returns the best stripping result.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

rawDocument

protected RawDocument rawDocument

textStrippers

protected java.util.Vector<TextStripper> textStrippers

jDomDocument

protected org.jdom.Document jDomDocument
Constructor Detail

DocumentBuilder

public DocumentBuilder(java.util.Vector<TextStripper> textStrippers,
                       RawDocument rawDocument)
Sets up the document builder with a vector of text strippers and a raw document. Use setRawDocument() and/or setTextStrippers() to reuse a DocumentBuilder instance.

Parameters:
textStrippers - the text strippers to try when stripping text
rawDocument - the raw document to build from
Method Detail

build

public abstract DeguDocument build()
                            throws FailedToStripText,
                                   DocumentEncryptedException,
                                   java.io.IOException
Builds a degu document

Returns:
the built DeguDocument
Throws:
FailedToStripText
DocumentEncryptedException
java.io.IOException

makeTOCElement

protected abstract org.jdom.Element makeTOCElement()
                                            throws java.io.IOException
Extracts the TOC. TODO: implement the command pattern here (like TextStrippers).

Returns:
the TOC element
Throws:
java.io.IOException

stripVoted

protected java.lang.String stripVoted(java.io.InputStream is)
                               throws FailedToStripText
Strips the InputStream is. This method tries every algorithm given by textStrippers (set in the constructor) and returns the best stripping result.

Parameters:
is - the input stream to strip
Returns:
the best stripping result
Throws:
FailedToStripText
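
The try-every-stripper-and-keep-the-best behaviour described above could look roughly like the following sketch. It is not the actual implementation: SketchStripper is a hypothetical stand-in for TextStripper (whose API is not shown on this page), "best" is assumed to mean the longest non-blank result, and IOException stands in for FailedToStripText.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

// Hypothetical stand-in for the real TextStripper type, whose API is not shown here.
interface SketchStripper {
    String strip(InputStream in) throws IOException;
}

final class StripVotedSketch {

    // Tries every stripper on its own copy of the input and keeps the "best" result.
    // Assumption: "best" means the longest non-blank text; the real criterion is not documented.
    static String stripVoted(InputStream is, List<SketchStripper> strippers) throws IOException {
        byte[] data = is.readAllBytes(); // buffer once so each stripper can re-read the input
        String best = null;
        for (SketchStripper stripper : strippers) {
            try {
                String text = stripper.strip(new ByteArrayInputStream(data));
                if (text != null && !text.trim().isEmpty()
                        && (best == null || text.length() > best.length())) {
                    best = text;
                }
            } catch (IOException | RuntimeException e) {
                // A failing stripper simply loses the vote; the others still get a chance.
            }
        }
        if (best == null) {
            // The real method throws FailedToStripText; IOException stands in for it here.
            throw new IOException("no stripper produced any text");
        }
        return best;
    }
}
```

Buffering the stream first matters because an InputStream can usually be consumed only once, while every stripper needs to see the full input.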

setRawDocument

protected void setRawDocument(RawDocument rawDocument)
Sets a new rawDocument so that a DocumentBuilder instance can be reused.

Parameters:
rawDocument - the new raw document

setTextStrippers

protected void setTextStrippers(java.util.Vector<TextStripper> textStrippers)
Sets a new set of textStrippers so that a DocumentBuilder instance can be reused.

Parameters:
textStrippers - the new set of text strippers

calculateDigestString

protected java.lang.String calculateDigestString(java.io.InputStream is)
Calculates a digest string for the input stream is.

Parameters:
is - the input stream to digest
Returns:
the digest string
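
As an illustration only, a digest string over an input stream can be computed as in the sketch below. The algorithm (SHA-256) and the lowercase hexadecimal encoding are assumptions, since this page does not say which digest or encoding the real method uses. The sketch also declares IOException, which the documented signature does not.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class DigestSketch {

    // Assumption: SHA-256 with lowercase hex output; the real algorithm is not documented.
    static String calculateDigestString(InputStream is) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
        // Feed the stream to the digest in chunks so large documents are not held in memory.
        byte[] buffer = new byte[8192];
        int read;
        while ((read = is.read(buffer)) != -1) {
            md.update(buffer, 0, read);
        }
        // Encode the digest bytes as two hex characters each.
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```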

initDoc

protected void initDoc()
                throws java.io.IOException,
                       FailedToStripText
Creates the initial JDOM document. Put file-type-dependent logic into the implementation of build(), or override methods of this class as needed. This method must be called at the beginning of build().

Throws:
java.io.IOException
FailedToStripText
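
The calling convention described for initDoc() (shared setup first, file-type-dependent work afterwards) can be sketched as follows. SketchBuilder and SketchPdfBuilder are hypothetical stand-ins that use plain strings instead of org.jdom types, and the element names added by initDoc() here are assumptions, not the documented behaviour of the real method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the base class; the real one works on org.jdom types.
abstract class SketchBuilder {
    protected final List<String> elements = new ArrayList<>();

    // Shared setup: resets the state and creates the parts common to every file type.
    // Assumption: the common parts are the raw-text and parsed-text elements.
    protected void initDoc() {
        elements.clear();
        elements.add("raw-text");
        elements.add("parsed-text");
    }

    abstract List<String> build();
}

// Hypothetical subclass playing the role of a file-type-specific builder.
final class SketchPdfBuilder extends SketchBuilder {
    @Override
    List<String> build() {
        initDoc();            // must come first so the shared skeleton exists
        elements.add("toc");  // file-type-dependent part
        return new ArrayList<>(elements);
    }
}
```

Because initDoc() resets the shared state in this sketch, calling build() again on the same instance yields a fresh document rather than appending to the previous one.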

makeRawTextElement

protected org.jdom.Element makeRawTextElement()
                                       throws FailedToStripText,
                                              java.io.IOException
Makes the raw-text JDOM Element.

Returns:
the raw-text element
Throws:
FailedToStripText
java.io.IOException

makeParsedTextElement

protected org.jdom.Element makeParsedTextElement()
                                          throws FailedToStripText,
                                                 java.io.IOException
Makes the parsed-text JDOM Element.

Returns:
the parsed-text element
Throws:
java.io.IOException
FailedToStripText