
[Solved] Tesseract OCR error: JFIF APP0 must be first marker after SOI

1. Reason

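This error occurs when the image file is incomplete or malformed, or when the format implied by its file extension conflicts with its actual internal encoding. As the message says, the reader expects the JFIF APP0 segment to be the first marker after the SOI (start of image) marker and does not find it there.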
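To make the message concrete, here is a minimal sketch that checks whether a file begins the way a JFIF file is expected to: the SOI marker (FF D8), then the APP0 marker (FF E0), a two-byte length, and the identifier "JFIF" followed by a zero byte. The class and method names are hypothetical and not part of Tesseract or any library. A false result is only a hint: the file may not be a JPEG at all, or it may be a JPEG that carries no JFIF segment.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for illustration only.
final class JfifHeaderCheck {

    // True if the bytes start with SOI (FF D8), APP0 (FF E0),
    // a 2-byte segment length, and the identifier "JFIF\0".
    static boolean startsWithJfifApp0(byte[] head) {
        if (head == null || head.length < 11) {
            return false;
        }
        boolean soi = (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xD8;
        boolean app0 = (head[2] & 0xFF) == 0xFF && (head[3] & 0xFF) == 0xE0;
        byte[] id = "JFIF\0".getBytes(StandardCharsets.US_ASCII);
        boolean ident = Arrays.equals(Arrays.copyOfRange(head, 6, 11), id);
        return soi && app0 && ident;
    }

    public static void main(String[] args) throws IOException {
        // Reads the whole file for simplicity; only the first 11 bytes are inspected.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("JFIF APP0 directly after SOI: " + startsWithJfifApp0(bytes));
    }
}
```

Running it against the file that Tesseract rejects can help tell a mislabeled or truncated file apart from one that looks like a regular JFIF image.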

2. Solution

Convert the image to PNG first, then pass the converted file to Tesseract.

I wrapped the conversion in a utility method:

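    import java.awt.Color;
    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.io.IOException;
    import javax.imageio.ImageIO;

    // Convert the image to PNG format; returns the new file path, or "" on failure
    public static String convertPng(String url) {
        // Replace the existing extension (if any) with ".png"
        int dot = url.lastIndexOf(".");
        String tarFilePath = (dot < 0 ? url : url.substring(0, dot)) + ".png";
        try {
            BufferedImage bufferedImage = ImageIO.read(new File(url));
            if (bufferedImage == null) {
                return ""; // no registered reader could decode the file
            }
            // Redraw onto an RGB canvas; transparent pixels become white
            BufferedImage newBufferedImage = new BufferedImage(bufferedImage.getWidth(), bufferedImage.getHeight(), BufferedImage.TYPE_INT_RGB);
            newBufferedImage.createGraphics().drawImage(bufferedImage, 0, 0, Color.white, null);
            ImageIO.write(newBufferedImage, "png", new File(tarFilePath));
        } catch (IOException e) {
            return "";
        }
        return tarFilePath;
    }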

Remember that simply renaming the file extension does not work: the image must actually be re-encoded, for example with ImageIO as above.
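The difference between renaming and re-encoding can be seen with a small sketch. It writes a tiny JPEG to a temporary directory, makes one copy that only changes the name to .png, makes a second copy that is re-encoded through ImageIO, and then checks each file for the 8-byte PNG signature. The class name, method name, and file names are hypothetical and chosen only for illustration.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import javax.imageio.ImageIO;

// Hypothetical demo class for illustration only.
final class RenameVsConvert {

    // The fixed 8-byte signature that every PNG file starts with.
    private static final int[] PNG_SIGNATURE = {0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};

    // True if the file content starts with the PNG signature, whatever its name says.
    static boolean isPng(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            for (int expected : PNG_SIGNATURE) {
                if (in.read() != expected) {
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rename-vs-convert");

        // A small blank image saved as a real JPEG.
        Path jpg = dir.resolve("sample.jpg");
        ImageIO.write(new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB), "jpg", jpg.toFile());

        // "Renaming": same bytes, different extension.
        Path renamed = dir.resolve("renamed.png");
        Files.copy(jpg, renamed, StandardCopyOption.REPLACE_EXISTING);

        // Converting: decode the JPEG and encode it again as PNG.
        Path converted = dir.resolve("converted.png");
        ImageIO.write(ImageIO.read(jpg.toFile()), "png", converted.toFile());

        System.out.println("renamed is PNG:   " + isPng(renamed));
        System.out.println("converted is PNG: " + isPng(converted));
    }
}
```

The renamed copy still contains JPEG data, so only the converted file carries the PNG signature. This is why the conversion has to go through an encoder rather than a change of suffix.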