Apache PDFBox is an open source Java library for working with PDF documents. In this article, we will learn how to use PDFBox to create and read PDFs in Java.
To use Apache PDFBox, we need to add the following dependencies to our project.
- pdfbox-2.0.7.jar
- fontbox-2.0.7.jar
- commons-logging-1.2.jar
If you are using Maven, add the dependency below to your pom.xml. Maven resolves fontbox and commons-logging transitively, so only pdfbox needs to be declared.
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>

        <groupId>com.javainterviewpoint</groupId>
        <artifactId>PDFBoxExample</artifactId>
        <version>0.0.1-SNAPSHOT</version>
        <packaging>jar</packaging>

        <name>PDFBoxExample</name>
        <url>http://maven.apache.org</url>

        <properties>
            <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        </properties>

        <dependencies>
            <dependency>
                <groupId>org.apache.pdfbox</groupId>
                <artifactId>pdfbox</artifactId>
                <version>2.0.7</version>
            </dependency>
            <dependency>
                <groupId>junit</groupId>
                <artifactId>junit</artifactId>
                <version>3.8.1</version>
                <scope>test</scope>
            </dependency>
        </dependencies>

        <build>
            <plugins>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>3.7.0</version>
                    <configuration>
                        <source>1.8</source>
                        <target>1.8</target>
                    </configuration>
                </plugin>
            </plugins>
        </build>
    </project>
How to use PDFBox to Create / Read PDF
PDFBox create PDF Example
    package com.javainterviewpoint;

    import java.io.IOException;

    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.PDPageContentStream;
    import org.apache.pdfbox.pdmodel.font.PDFont;
    import org.apache.pdfbox.pdmodel.font.PDType1Font;

    public class CreatePDF
    {
        public static void main(String[] args)
        {
            // Creating a new document
            PDDocument document = new PDDocument();

            // Creating a new page and adding it to the document
            PDPage page = new PDPage();
            document.addPage(page);

            PDFont font = PDType1Font.HELVETICA_BOLD_OBLIQUE;

            try
            {
                // ContentStream holds the content
                PDPageContentStream contentStream = new PDPageContentStream(document, page);

                // Begin the text block and set the font
                contentStream.beginText();
                contentStream.setFont(font, 14);

                // Text offset
                contentStream.newLineAtOffset(100, 500);

                // Display the given text at the offset specified
                contentStream.showText("PDF created using Apache PDFBox 2.0");
                contentStream.endText();

                // Closing the contentStream
                contentStream.close();

                // Location for saving the pdf file
                document.save("c://JavaInterviewPoint//Hello.pdf");

                // Closing the document
                document.close();
            }
            catch (IOException ie)
            {
                ie.printStackTrace();
            }
        }
    }
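The newLineAtOffset(100, 500) call above is easier to read once the coordinate system is clear. In default PDF user space the origin is the bottom-left corner of the page, y grows upward, and one unit is a point (1/72 inch). A page created with new PDPage() is US Letter, 612 x 792 points. The following is a minimal sketch in plain Java (no PDFBox calls) for turning a distance measured from the top edge into a y value. The class PdfCoords and its methods are hypothetical names introduced here for illustration only.

```java
public final class PdfCoords {

    /** Points per inch in default PDF user space. */
    public static final float POINTS_PER_INCH = 72f;

    /** Height of a US Letter page in points (11 inches x 72). */
    public static final float LETTER_HEIGHT = 792f;

    private PdfCoords() {
    }

    /** Converts a length in inches to PDF points. */
    public static float inchesToPoints(float inches) {
        return inches * POINTS_PER_INCH;
    }

    /**
     * Converts a distance measured down from the top edge of the page
     * into a y coordinate measured up from the bottom edge.
     */
    public static float yFromTop(float pageHeight, float distanceFromTop) {
        return pageHeight - distanceFromTop;
    }

    public static void main(String[] args) {
        // One inch below the top edge of a US Letter page
        float y = yFromTop(LETTER_HEIGHT, inchesToPoints(1f));
        System.out.println(y); // prints 720.0
    }
}
```

The resulting value could then be passed as the second argument of the offset call, for example contentStream.newLineAtOffset(100, y).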
To create a new PDF, all we need to do is:
- Create an instance of PDDocument and PDPage
    PDDocument document = new PDDocument();
    PDPage page = new PDPage();
- Add the page to the document
    document.addPage(page);
- Create a new PDPageContentStream instance, passing the document and page created above
    PDPageContentStream contentStream = new PDPageContentStream(document, page);
- Between beginText() and endText(), set the font and the offset, then use the showText() method to write the text
    contentStream.beginText();
    contentStream.setFont(font, 14);
    contentStream.newLineAtOffset(100, 500);
    contentStream.showText("PDF created using Apache PDFBox 2.0");
    contentStream.endText();
- Finally, close the PDPageContentStream, save the document, and close the PDDocument
    contentStream.close();
    document.save("c://JavaInterviewPoint//Hello.pdf");
    document.close();
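The save step will typically fail with an IOException when the target folder does not exist yet. Below is a small sketch, using only java.nio.file, that creates the parent folder before saving. OutputPaths and prepare are hypothetical names, and the temp-directory location used in main is only a placeholder for a path like the one in the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public final class OutputPaths {

    private OutputPaths() {
    }

    /**
     * Builds an absolute path from the given parts and makes sure
     * its parent folder exists, creating it if necessary.
     */
    public static Path prepare(String first, String... more) {
        Path target = Paths.get(first, more).toAbsolutePath();
        Path parent = target.getParent();
        try {
            if (parent != null) {
                // No-op if the folder is already there
                Files.createDirectories(parent);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    public static void main(String[] args) {
        // Placeholder location: a JavaInterviewPoint folder under the temp directory
        Path target = prepare(System.getProperty("java.io.tmpdir"),
                "JavaInterviewPoint", "Hello.pdf");
        System.out.println("Ready to save to " + target);
    }
}
```

With PDFBox on the classpath, the returned path could be handed to the save step as document.save(target.toFile()).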
Changing the Font with PDType0Font
By default, PDFBox supports the standard set of 14 fonts listed below, which are always available when working with PDF documents.
Standard Font | Description |
---|---|
PDType1Font.TIMES_ROMAN | Times regular |
PDType1Font.TIMES_BOLD | Times bold |
PDType1Font.TIMES_ITALIC | Times italic |
PDType1Font.TIMES_BOLD_ITALIC | Times bold italic |
PDType1Font.HELVETICA | Helvetica regular |
PDType1Font.HELVETICA_BOLD | Helvetica bold |
PDType1Font.HELVETICA_OBLIQUE | Helvetica italic |
PDType1Font.HELVETICA_BOLD_OBLIQUE | Helvetica bold italic |
PDType1Font.COURIER | Courier |
PDType1Font.COURIER_BOLD | Courier bold |
PDType1Font.COURIER_OBLIQUE | Courier italic |
PDType1Font.COURIER_BOLD_OBLIQUE | Courier bold italic |
PDType1Font.SYMBOL | Symbol Set |
PDType1Font.ZAPF_DINGBATS | Dingbat Typeface |
In our previous example we used the “HELVETICA_BOLD_OBLIQUE” font. “PDType1Font” supports only the 14 fonts listed above. To use a custom font, we have to use “PDType0Font” and load our font file into it. Let's look at the example below, where we create a PDF with the “CALIBRI” font.
    package com.javainterviewpoint;

    import java.io.File;
    import java.io.IOException;

    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.PDPageContentStream;
    import org.apache.pdfbox.pdmodel.font.PDFont;
    import org.apache.pdfbox.pdmodel.font.PDType0Font;

    public class ChangeFont
    {
        public static void main(String[] args)
        {
            // Creating a new document
            PDDocument document = new PDDocument();

            // Creating a new page and adding it to the document
            PDPage page = new PDPage();
            document.addPage(page);

            try
            {
                // Manually loading the font
                PDFont font = PDType0Font.load(document, new File("c://JavaInterviewPoint//calibri.ttf"));

                // ContentStream holds the content
                PDPageContentStream contentStream = new PDPageContentStream(document, page);

                // Begin the text block and set the font
                contentStream.beginText();
                contentStream.setFont(font, 14);

                // Text offset
                contentStream.newLineAtOffset(100, 500);

                // Display the given text at the offset specified
                contentStream.showText("Changing the font - Apache PDFBox 2.0");
                contentStream.endText();

                // Closing the contentStream
                contentStream.close();

                // Location for saving the pdf file
                document.save("c://JavaInterviewPoint//Hello1.pdf");

                // Closing the document
                document.close();
            }
            catch (IOException ie)
            {
                ie.printStackTrace();
            }
        }
    }
PDFBox Extract Text Line By Line
To extract text from a PDF, we use the PDFTextStripper class. In the example below, we extract the text from the first page of the PDF.
    package com.javainterviewpoint;

    import java.io.File;
    import java.io.IOException;

    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.text.PDFTextStripper;

    public class ExtractText
    {
        public static void main(String[] args)
        {
            try
            {
                File file = new File("c://JavaInterviewPoint//Hello.pdf");

                // Reading the pdf file
                PDDocument document = PDDocument.load(file);

                // Get the number of pages
                System.out.println("Number of pages in the pdf :" + document.getNumberOfPages());

                // Strip the text from a particular page
                PDFTextStripper textStripper = new PDFTextStripper();

                // Let's read page 1 (page numbers start at 1)
                textStripper.setStartPage(1);
                textStripper.setEndPage(1);

                System.out.println("Text in the pdf >>> " + textStripper.getText(document));

                // Closing the document
                document.close();
            }
            catch (IOException ie)
            {
                ie.printStackTrace();
            }
        }
    }
Output:
    Number of pages in the pdf :1
    Text in the pdf >>> PDF created using Apache PDFBox 2.0
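The example prints everything getText returns as a single string. To process the result line by line, as the heading suggests, that string can be split on line breaks. A minimal sketch in plain Java follows. PdfLines and toLines are hypothetical names, and the literal in main stands in for the value returned by textStripper.getText(document).

```java
import java.util.ArrayList;
import java.util.List;

public final class PdfLines {

    private PdfLines() {
    }

    /**
     * Splits extracted text into trimmed, non-empty lines.
     * The \R pattern matches any line break sequence (\n, \r\n, \r, ...).
     */
    public static List<String> toLines(String extracted) {
        List<String> lines = new ArrayList<>();
        for (String raw : extracted.split("\\R")) {
            String line = raw.trim();
            if (!line.isEmpty()) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by textStripper.getText(document)
        String extracted = "PDF created using Apache PDFBox 2.0\r\nSecond line\n";

        int number = 1;
        for (String line : toLines(extracted)) {
            System.out.println(number + ": " + line);
            number++;
        }
    }
}
```

Dropping blank lines and trimming whitespace is a choice made for this sketch; keep the raw pieces instead if the spacing of the original page matters.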