degu.util.textstripper
Class PDFBoxTextStripper

java.lang.Object
  extended by degu.util.textstripper.TextStripper
      extended by degu.util.textstripper.PageSupportedTextStripper
          extended by degu.util.textstripper.PDFBoxTextStripper
All Implemented Interfaces:
PageSupport

public class PDFBoxTextStripper
extends PageSupportedTextStripper

PDF text stripper based on PDFBox.


Field Summary
(package private)  java.io.InputStream is
           
(package private)  org.pdfbox.pdmodel.PDDocument pdDocument
           
(package private)  org.pdfbox.util.PDFTextStripper stripper
           
 
Constructor Summary
PDFBoxTextStripper()
           
 
Method Summary
private  void abortIfNotInitialized()
           
protected  void finalize()
           
 void initialize()
          Performs initialization; this method must be called before any other method.
private  void setUpPdDocument(java.io.InputStream is)
          Sets up pdDocument from is, optimizing subsequent calls of any stripAs* method on the same InputStream.
 java.lang.String stripAsString(java.io.InputStream is)
          Returns the contents of is as a String.
 java.lang.String stripAsString(java.io.InputStream is, int page)
          Strips a single page out of the document.
 java.lang.String stripAsString(java.io.InputStream is, int begin, int end)
          Strips all pages from begin to end into a single String.
 java.lang.String[] stripAsStringArray(java.io.InputStream is)
          Strips the document into a string array, each entry representing one page.
 java.lang.String[] stripAsStringArray(java.io.InputStream is, int begin, int end)
          Strips the document pages from page number "begin" to page number "end" into a string array, each entry representing one page.
 
Methods inherited from class java.lang.Object
clone, equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

stripper

org.pdfbox.util.PDFTextStripper stripper

pdDocument

org.pdfbox.pdmodel.PDDocument pdDocument

is

java.io.InputStream is

Constructor Detail

PDFBoxTextStripper

public PDFBoxTextStripper()

Method Detail

initialize

public void initialize()
                throws TextStripperInitializeException
Description copied from class: TextStripper
Performs initialization; this method must be called before any other method.

Specified by:
initialize in class TextStripper
Throws:
TextStripperInitializeException

finalize

protected void finalize()
                 throws java.lang.Throwable
Overrides:
finalize in class java.lang.Object
Throws:
java.lang.Throwable

abortIfNotInitialized

private void abortIfNotInitialized()
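The contract that initialize() must run before any other method is presumably enforced by abortIfNotInitialized(). The sketch below shows one way such a guard can work. It is not the degu implementation: the class name GuardedStripper is hypothetical, a List of page texts stands in for the PDF input, and because abortIfNotInitialized() declares no checked exception, the sketch assumes it throws the unchecked IllegalStateException.

```java
import java.util.List;

// Hypothetical sketch of the initialize-before-use contract; not degu's actual code.
class GuardedStripper {
    private boolean initialized = false;

    // Stand-in for initialize(): the real method presumably sets up the
    // PDFTextStripper and may throw TextStripperInitializeException.
    void initialize() {
        initialized = true;
    }

    // Assumed behaviour of abortIfNotInitialized(): it declares no checked
    // exception, so this sketch throws the unchecked IllegalStateException.
    private void abortIfNotInitialized() {
        if (!initialized) {
            throw new IllegalStateException("initialize() must be called before any other method");
        }
    }

    // Every public strip method starts with the guard; joining the page texts
    // stands in for the real PDF text extraction.
    String stripAsString(List<String> pages) {
        abortIfNotInitialized();
        return String.join("\n", pages);
    }
}
```

Calling stripAsString before initialize() fails fast instead of surfacing later as a NullPointerException from an unset stripper field.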

setUpPdDocument

private void setUpPdDocument(java.io.InputStream is)
                      throws java.io.IOException
Sets up pdDocument from is, optimizing subsequent calls of any stripAs* method on the same InputStream.

Parameters:
is - the input stream to read the PDF document from
Throws:
java.io.IOException
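The optimization described for setUpPdDocument can be read as identity-based caching: the fields is and pdDocument remember the last stream and its parsed document, so a second stripAs* call with the same stream object skips parsing. This matters because an InputStream can normally be consumed only once. A minimal sketch, assuming that reading; ParsedDoc, parse() and parseCount are hypothetical stand-ins for PDDocument and PDF parsing, and the stream's bytes are treated as plain UTF-8 text.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of caching the parsed document per InputStream.
class CachingStripper {
    // Stand-in for PDDocument.
    static final class ParsedDoc {
        final String text;
        ParsedDoc(String text) { this.text = text; }
    }

    private InputStream is;        // stream the cached document was parsed from
    private ParsedDoc pdDocument;  // cached parse result
    int parseCount = 0;            // only here to make the caching observable

    private void setUpPdDocument(InputStream is) throws IOException {
        if (is == this.is && pdDocument != null) {
            return; // same stream object: reuse the already-parsed document
        }
        // A real implementation would also close the previously cached document here.
        pdDocument = parse(is);
        this.is = is;
    }

    // Stand-in for PDF parsing: reads the whole stream as UTF-8 text.
    private ParsedDoc parse(InputStream in) throws IOException {
        parseCount++;
        return new ParsedDoc(new String(in.readAllBytes(), StandardCharsets.UTF_8));
    }

    String stripAsString(InputStream is) throws IOException {
        setUpPdDocument(is);
        return pdDocument.text;
    }
}
```

The second call with the same stream still returns the full text even though the stream is already drained, because the cached document is reused.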

stripAsString

public java.lang.String stripAsString(java.io.InputStream is)
                               throws FailedToStripText
Description copied from class: TextStripper
Returns the contents of is as a String.

Specified by:
stripAsString in class TextStripper
Returns:
the text of the whole document
Throws:
FailedToStripText
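stripAsString declares only FailedToStripText, while reading a stream can throw IOException. One common way to reconcile the two is exception translation: catch the low-level exception and rethrow the API's own checked exception with the original as its cause. A minimal sketch, assuming that approach; FailedToStripText is redefined here as a hypothetical stand-in (degu's real definition is not shown), and reading the stream as UTF-8 text stands in for PDF text extraction.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for degu's FailedToStripText.
class FailedToStripText extends Exception {
    FailedToStripText(String message, Throwable cause) { super(message, cause); }
}

class TranslatingStripper {
    // Any I/O failure is rethrown as the API's checked exception,
    // keeping the original IOException as the cause.
    String stripAsString(InputStream is) throws FailedToStripText {
        try {
            return new String(is.readAllBytes(), StandardCharsets.UTF_8); // placeholder for PDF text extraction
        } catch (IOException e) {
            throw new FailedToStripText("could not strip text from stream", e);
        }
    }
}
```

Callers then handle a single exception type, and the cause chain still shows what actually went wrong.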

stripAsString

public java.lang.String stripAsString(java.io.InputStream is,
                                      int page)
                               throws FailedToStripText
Description copied from interface: PageSupport
Strips a single page out of the document.

Parameters:
page - the page to be stripped
Returns:
the text of the given page
Throws:
FailedToStripText
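A single-page strip can be expressed as the degenerate range [page, page]. The sketch below assumes 1-based page numbers, as used by PDFBox's PDFTextStripper (setStartPage/setEndPage); this page does not say which convention degu uses. SinglePageStripper, stripRange and the List of page texts are hypothetical stand-ins for the parsed document.

```java
import java.util.List;

// Hypothetical sketch: single-page stripping as the range [page, page].
class SinglePageStripper {
    private final List<String> pages; // stand-in for the parsed document's page texts

    SinglePageStripper(List<String> pages) { this.pages = pages; }

    // Inclusive, 1-based range of pages concatenated into one String.
    String stripRange(int begin, int end) {
        StringBuilder sb = new StringBuilder();
        for (int p = begin; p <= end; p++) {
            sb.append(pages.get(p - 1)); // convert 1-based page number to 0-based index
        }
        return sb.toString();
    }

    String stripAsString(int page) {
        return stripRange(page, page);
    }
}
```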

stripAsStringArray

public java.lang.String[] stripAsStringArray(java.io.InputStream is)
                                      throws FailedToStripText
Description copied from interface: PageSupport
Strips the document into a string array, each entry representing one page.

Returns:
the text of each page, in page order
Throws:
FailedToStripText
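The per-page array can be built by stripping page 1 through N one at a time and storing page i at index i - 1. A minimal sketch, assuming 1-based page numbers; pageCount and stripPage are hypothetical parameters standing in, roughly, for PDDocument.getNumberOfPages() and a PDFTextStripper restricted to a single page.

```java
import java.util.function.IntFunction;

// Hypothetical sketch: one array entry per page, in page order.
class PerPageArray {
    static String[] stripAsStringArray(int pageCount, IntFunction<String> stripPage) {
        String[] result = new String[pageCount];
        for (int page = 1; page <= pageCount; page++) {
            result[page - 1] = stripPage.apply(page); // entry i holds page i + 1
        }
        return result;
    }
}
```

For example, with page texts {"a", "b", "c"} and stripPage = p -> texts[p - 1], the result is the array {"a", "b", "c"}.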

stripAsString

public java.lang.String stripAsString(java.io.InputStream is,
                                      int begin,
                                      int end)
                               throws FailedToStripText,
                                      InvalidPageIndexException
Description copied from interface: PageSupport
Strips all pages from begin to end into a single String.

Returns:
the text of the pages from begin to end
Throws:
InvalidPageIndexException
FailedToStripText
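The ranged variant additionally declares InvalidPageIndexException. The sketch below shows one plausible validation rule before concatenating the pages: 1 <= begin <= end <= number of pages, with an inclusive, 1-based range. Both the exact rule and the local InvalidPageIndexException class are assumptions; degu's real definitions are not shown on this page.

```java
import java.util.List;

// Hypothetical stand-in for degu's InvalidPageIndexException.
class InvalidPageIndexException extends Exception {
    InvalidPageIndexException(String message) { super(message); }
}

class RangeStripper {
    private final List<String> pages; // stand-in for the parsed document's page texts

    RangeStripper(List<String> pages) { this.pages = pages; }

    // Assumed rule: 1 <= begin <= end <= pages.size(); anything else is rejected.
    String stripAsString(int begin, int end) throws InvalidPageIndexException {
        if (begin < 1 || end > pages.size() || begin > end) {
            throw new InvalidPageIndexException(
                "invalid page range " + begin + ".." + end + " for " + pages.size() + " pages");
        }
        // subList is 0-based and end-exclusive, so pages [begin, end] map to [begin - 1, end)
        return String.join("", pages.subList(begin - 1, end));
    }
}
```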

stripAsStringArray

public java.lang.String[] stripAsStringArray(java.io.InputStream is,
                                             int begin,
                                             int end)
                                      throws FailedToStripText,
                                             InvalidPageIndexException
Description copied from interface: PageSupport
Strips the document pages from page number "begin" to page number "end" into a string array, each entry representing one page.

Returns:
the text of each page in the range, in page order
Throws:
FailedToStripText
InvalidPageIndexException
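Since setUpPdDocument keeps the parsed document for repeated calls on the same stream, one plausible way to serve the ranged array variant is to slice an already-stripped per-page array. The validation matters because Arrays.copyOfRange silently pads with null when the end index exceeds the array length. A minimal sketch, assuming 1-based inclusive page numbers; RangeArrayStripper is hypothetical, and it throws the unchecked IllegalArgumentException as a simplification of degu's checked InvalidPageIndexException.

```java
import java.util.Arrays;

// Hypothetical sketch: the ranged array variant as a slice of the full per-page array.
class RangeArrayStripper {
    static String[] stripAsStringArray(String[] allPages, int begin, int end) {
        // Without this check, copyOfRange would pad out-of-range pages with null.
        if (begin < 1 || end > allPages.length || begin > end) {
            throw new IllegalArgumentException("invalid page range " + begin + ".." + end);
        }
        // copyOfRange's "to" index is exclusive, so page "end" (index end - 1) is included
        return Arrays.copyOfRange(allPages, begin - 1, end);
    }
}
```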