java.lang.Object com.asprise.util.pdf.PDFReader
Represents a PDF reader with image rendering and text extraction features.
Basic flow:
PDFReader reader = new PDFReader(new File("my.pdf"));
reader.open(); // open the file.
int pages = reader.getNumberOfPages();
for (int i = 0; i < pages; i++) { // page indexes are zero-based
    String text = reader.extractTextFromPage(i);
    System.out.println("Page " + i + ": " + text);
}
... // perform other operations on pages.
reader.close(); // finally, close the file.
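The flow above leaves the file open if text extraction throws partway through. A minimal sketch of the same loop with close() in a finally block follows. So that it compiles without the Asprise library, `PageReader` and `InMemoryReader` are hypothetical stand-ins that mirror only the method signatures documented on this page; they are not part of the library.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in mirroring the documented PDFReader methods,
// so the sketch compiles without the Asprise jar.
interface PageReader {
    void open() throws IOException;
    int getNumberOfPages();
    String extractTextFromPage(int pageIndex) throws IOException;
    void close() throws IOException;
}

// Hypothetical in-memory implementation, used only for illustration.
// A null entry simulates a page whose text cannot be extracted.
class InMemoryReader implements PageReader {
    private final String[] pages;
    boolean closed = false;

    InMemoryReader(String... pages) { this.pages = pages; }

    public void open() { }
    public int getNumberOfPages() { return pages.length; }
    public String extractTextFromPage(int pageIndex) throws IOException {
        if (pages[pageIndex] == null) {
            throw new IOException("unreadable page " + pageIndex);
        }
        return pages[pageIndex];
    }
    public void close() { closed = true; }
}

class BasicFlow {
    // Opens the reader, collects the text of every page (zero-based),
    // and closes the reader even when extraction fails.
    static List<String> extractAll(PageReader reader) throws IOException {
        List<String> texts = new ArrayList<>();
        reader.open();
        try {
            int pages = reader.getNumberOfPages();
            for (int i = 0; i < pages; i++) {
                texts.add(reader.extractTextFromPage(i));
            }
        } finally {
            reader.close(); // runs on both the normal and the exceptional path
        }
        return texts;
    }
}
```

With the real class, the same try/finally shape would wrap the calls on a PDFReader instance directly.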
Constructor Summary
PDFReader(java.io.File pdfFile) | Creates a new PDF reader for the given PDF file.
PDFReader(java.io.InputStream pdfStream) | Creates a new PDF reader with the specified stream as the input.
Method Summary
void | close() | Closes the PDF and releases the resources used.
java.lang.String | extractTextFromPage(int pageIndex) | Extracts text from the specified page.
int | getNumberOfPages() | Returns the total number of pages in the PDF.
java.awt.image.BufferedImage | getPageAsImage(int pageIndex) | Renders the specified page as a buffered image.
java.awt.Rectangle | getPageSize(int pageIndex) | Returns the page size of the specified page.
PDFSecurityObject | getSecurityObject() | Returns the security object.
static void | main(java.lang.String[] args) | A utility that extracts text from a PDF file.
void | open() | Opens and parses the PDF content.
void | savePageAsImageFile(int pageIndex, java.lang.String formatName, java.io.File output) | Saves the specified page as an image file with the given format.
void | setSecurityObject(PDFSecurityObject securityObject) | If the PDF is encrypted, supply a security object to unlock it before calling open().
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail
public PDFReader(java.io.InputStream pdfStream)
Parameters:
pdfStream - the stream to read the PDF from.
public PDFReader(java.io.File pdfFile) throws java.io.FileNotFoundException
Parameters:
pdfFile - the PDF file to read.
Throws:
java.io.FileNotFoundException
Method Detail
public void open() throws java.io.IOException
This method may throw an exception when the PDF page is too complex to rasterize (for example, when it uses a Type 0 font). In that case, you can use this free utility.
Throws:
java.io.IOException
public void close() throws java.io.IOException
Throws:
java.io.IOException
public int getNumberOfPages()
public java.awt.image.BufferedImage getPageAsImage(int pageIndex) throws java.io.IOException
This method may throw an exception when the PDF page is too complex to rasterize (for example, when it uses a Type 0 font). In that case, you can use this free utility.
Parameters:
pageIndex - zero-based page index, i.e., the first page is page 0.
Throws:
java.io.IOException
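Because getPageAsImage returns a standard java.awt.image.BufferedImage, the result can be handed to javax.imageio.ImageIO like any other image. The sketch below encodes an image to PNG bytes in memory; a blank image stands in for the rendered page, and the class name `PageImageEncoding` is a hypothetical name chosen for illustration.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

class PageImageEncoding {
    // Encodes an image as PNG bytes. ImageIO.write returns false when
    // no writer for the requested format is registered.
    static byte[] toPng(BufferedImage image) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(image, "png", out)) {
            throw new IOException("no PNG writer available");
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for: BufferedImage page = reader.getPageAsImage(0);
        BufferedImage page = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        byte[] png = toPng(page);

        // Decode again to confirm the dimensions survived the round trip.
        BufferedImage decoded = ImageIO.read(new ByteArrayInputStream(png));
        System.out.println(decoded.getWidth() + "x" + decoded.getHeight());
    }
}
```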
public void savePageAsImageFile(int pageIndex, java.lang.String formatName, java.io.File output) throws java.io.IOException
This method may throw an exception when the PDF page is too complex to rasterize (for example, when it uses a Type 0 font). In that case, you can use this free utility.
Parameters:
pageIndex - zero-based page index, i.e., the first page is page 0.
formatName - valid values are "gif", "jpeg", "png".
output - the file to write the image to.
Throws:
java.io.IOException
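Since only "gif", "jpeg", and "png" are listed as valid format names, it can help to check the name before calling savePageAsImageFile. The sketch below is a hypothetical helper, not part of the library: it rejects other names and builds an output file such as page-1.png from a zero-based page index. The file naming scheme is an assumption made for illustration, and the check is an exact match because this page does not say whether other spellings are accepted.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

class PageImageFiles {
    // Format names listed as valid for savePageAsImageFile.
    static final List<String> FORMATS = Arrays.asList("gif", "jpeg", "png");

    // Hypothetical helper: validates the arguments and returns a file
    // named "page-<1-based number>.<format>" inside the given directory.
    static File outputFor(File dir, int pageIndex, String formatName) {
        if (pageIndex < 0) {
            throw new IllegalArgumentException("negative page index: " + pageIndex);
        }
        if (!FORMATS.contains(formatName)) {
            throw new IllegalArgumentException("unsupported format: " + formatName);
        }
        return new File(dir, "page-" + (pageIndex + 1) + "." + formatName);
    }
}
```

With a real reader, the returned file would be passed as the third argument, for example `reader.savePageAsImageFile(0, "png", PageImageFiles.outputFor(dir, 0, "png"))`.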
public java.lang.String extractTextFromPage(int pageIndex) throws java.io.IOException
Parameters:
pageIndex - zero-based page index, i.e., the first page is page 0.
Throws:
java.io.IOException
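Page numbers shown to users usually start at 1, while pageIndex starts at 0. A small conversion with a bounds check, sketched below, keeps the off-by-one in a single place. `PageIndexes` and `toPageIndex` are hypothetical names, not part of the library.

```java
class PageIndexes {
    // Converts a 1-based page number into the zero-based index expected
    // by methods such as extractTextFromPage, rejecting out-of-range values.
    static int toPageIndex(int pageNumber, int numberOfPages) {
        if (pageNumber < 1 || pageNumber > numberOfPages) {
            throw new IndexOutOfBoundsException(
                "page " + pageNumber + " is not in 1.." + numberOfPages);
        }
        return pageNumber - 1;
    }
}
```

The second argument would typically come from getNumberOfPages().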
public java.awt.Rectangle getPageSize(int pageIndex)
Parameters:
pageIndex - zero-based page index, i.e., the first page is page 0.
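One possible use of the returned java.awt.Rectangle is computing a thumbnail size that preserves the page's aspect ratio. The sketch below works purely on the rectangle's width and height; this page does not state the units of the page size, so the code makes no assumption about them. `PageScaling` and `fitWithin` are hypothetical names.

```java
import java.awt.Dimension;
import java.awt.Rectangle;

class PageScaling {
    // Returns the largest size with the page's aspect ratio that fits
    // inside maxWidth x maxHeight (each side is at least 1).
    static Dimension fitWithin(Rectangle page, int maxWidth, int maxHeight) {
        if (page.width <= 0 || page.height <= 0) {
            throw new IllegalArgumentException("empty page rectangle");
        }
        double scale = Math.min((double) maxWidth / page.width,
                                (double) maxHeight / page.height);
        int w = Math.max(1, (int) Math.round(page.width * scale));
        int h = Math.max(1, (int) Math.round(page.height * scale));
        return new Dimension(w, h);
    }
}
```

For a 612 x 792 rectangle and a 306 x 1000 bounding box, the limiting side is the width, so the scale factor is 0.5 and the result is 306 x 396.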
public static void main(java.lang.String[] args) throws java.lang.Exception
Parameters:
args
Throws:
java.lang.Exception
public PDFSecurityObject getSecurityObject()
public void setSecurityObject(PDFSecurityObject securityObject)
Parameters:
securityObject - the security object used to unlock an encrypted PDF; set it before calling open().