
How to Read PDF and Write into Text file in Java? PDF to Text file in Java

Copy PDF text and paste to Text file in Java

Extract text from an existing PDF document to a text file in Java

In this article, we will see how to create a new text file and extract the text of a PDF document into it.

We will use Apache PDFBox to extract the text. To use Apache PDFBox, we can either create a Maven project and add the dependency, or create a Dynamic Web Project and add the PDFBox JAR file. In this article we will use a Dynamic Web Project.

Step 1 : Create a new Dynamic Web Project in Eclipse

Go to File -> New -> Dynamic Web Project

Create a Java class.

Step 2 : Add the PDFBox JAR file to the project

Click on the link below and download the JAR file.

To include the JAR in the project, follow the steps below:

  1. Right-click on the project -> Build Path -> Configure Build Path
  2. Go to the Libraries tab -> Click on the Add External JARs button. Select the Apache PDFBox JAR.
  3. Click the Apply and Close button.

Now everything is set to extract the PDF text and store it in a text file.

Step 3 : Java code to extract PDF text to a text file

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class FileReadWrite {

    public static void main(String[] args) {
        // Folder that holds the input PDF and receives the output text file
        String filePath = "D:/JavaFileDemo/";

        // The PDF file that you want to extract text from
        File input = new File(filePath + "input.pdf");

        // The text file where the extracted text is stored
        File output = new File(filePath + "output.txt");

        // try-with-resources closes the document and the writer even if extraction fails.
        // PDDocument.load is the PDFBox 2.x API; PDFBox 3.x uses Loader.loadPDF(input) instead.
        try (PDDocument pd = PDDocument.load(input);
             BufferedWriter wr = new BufferedWriter(
                     new OutputStreamWriter(new FileOutputStream(output)))) {
            PDFTextStripper stripper = new PDFTextStripper();
            // Writes the text of every page of the document to the writer
            stripper.writeText(pd, wr);
        } catch (IOException e) {
            e.printStackTrace();
            return;
        }

        // Reached only after both resources were closed without an exception
        System.out.println("Successfully extracted : PDF to text file");
    }
}

If the above code compiles and runs successfully, it prints the message "Successfully extracted : PDF to text file".
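The writer in Step 3 is created with new OutputStreamWriter(new FileOutputStream(output)), without a character set, so Java falls back to the platform default, which can differ between machines. If the PDF contains non-ASCII text, naming the charset explicitly may be safer. The following is a minimal sketch using only the standard library, not part of the original listing; the class name Utf8WriterSketch and the method writeUtf8 are hypothetical names chosen for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of the original article's code.
public class Utf8WriterSketch {

    // Opens a UTF-8 writer on the target file, writes the text,
    // and closes the writer automatically via try-with-resources.
    public static void writeUtf8(Path target, String text) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for output.txt in this sketch.
        Path target = Files.createTempFile("output", ".txt");

        // Unicode escapes keep this source file ASCII-only.
        writeUtf8(target, "R\u00e9sum\u00e9");

        // Read the bytes back and decode them with the same charset.
        String readBack = new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        System.out.println("Round trip matches: " + readBack.equals("R\u00e9sum\u00e9"));
    }
}
```

A BufferedWriter opened this way is a java.io.Writer, so it could be passed to stripper.writeText(pd, wr) in place of the writer from Step 3.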

Open the folder you set in filePath and check that output.txt has been created and contains the text from the PDF.
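The same check can also be done from code. The sketch below assumes the output path from Step 3 (D:/JavaFileDemo/output.txt) unless another path is passed as the first argument; the class name OutputCheck and the method isNonEmptyFile are hypothetical. It only confirms that the file exists and is not empty. An empty file can mean, for example, that the PDF contains only scanned images, since PDFTextStripper reads the text layer and does not perform OCR.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, not part of the original article's code.
public class OutputCheck {

    // True when the path is a regular file that contains at least one byte.
    public static boolean isNonEmptyFile(Path file) throws IOException {
        return Files.isRegularFile(file) && Files.size(file) > 0;
    }

    public static void main(String[] args) throws IOException {
        // Assumed location: the same folder and file name used in Step 3.
        Path output = Paths.get(args.length > 0 ? args[0] : "D:/JavaFileDemo/output.txt");

        if (isNonEmptyFile(output)) {
            System.out.println("Found " + output + " (" + Files.size(output) + " bytes)");
        } else {
            System.out.println("Missing or empty: " + output);
        }
    }
}
```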

