Questions tagged [ocr]

Optical Character Recognition, the process of converting printed or handwritten text or images of text into digitally encoded text on a computer (so that, for example, it can be reproduced, machine-translated, reformatted, edited, distributed, used as input to software such as text-to-speech and so on)

101 votes
9 answers

What's the best, simplest OCR solution?

I'd like to scan a good amount of papers I have lying around, with the least possible hassle. I would like to convert them to images using Simple Scan, then convert them to text using OCR. Is there a ...
Bou's user avatar
  • 4,502
48 votes
5 answers

How can I extract text from images?

How can I extract text from images? I am not talking about scanned files, but garden variety images, such as when you take a high-def picture of a blackboard at class, and it is nicely handwritten; ...
Strapakowsky's user avatar
42 votes
1 answer

How do I install a new language pack for Tesseract on 16.04

Just installed gscan2pdf v1.3.9 as well as Tesseract. As for the latter, first it appeared at the bottom of my Installed Software list, but now it seems to be gone, although still working (I think). ...
m.a.a.'s user avatar
  • 645
41 votes
12 answers

How can I instantaneously extract text from a screen area using OCR tools?

In Ubuntu 12.10, if I type gnome-screenshot -a | tesseract output it returns: ** Message: Unable to use GNOME Shell's builtin screenshot interface, resorting to fallback X11. How can I select a ...
Erling's user avatar
  • 497
37 votes
7 answers

How to turn a pdf into a text searchable pdf?

I have a number of scanned documents in pdf and I want to be able to search them. How can I do that? Essentially I have to OCR the pdf and then blend the extracted text back into a new pdf. I have ...
don.joey's user avatar
  • 28.8k
35 votes
10 answers

Adding OCR info to a PDF

I have a good-quality scan of a document; the scan is in PDF format. How can I add OCR information to the PDF, so that it becomes searchable? By searchable I mean that the goal is that when viewing ...
fdierre's user avatar
  • 1,023
15 votes
5 answers

How do I edit text in a scanned .jpeg?

I need to upload a scanned image as a PDF document. After scanning the document, I have a .jpeg with small text that I want to edit before converting to PDF for the upload. I have never done this ...
Mysterio's user avatar
  • 12.1k
6 votes
2 answers

Document management for private users

I am searching for a document management system that supports: bulk scanning of documents; automatic OCR of scanned documents; data storage on my local HD / external server of my choice; automatic backups ...
Alex's user avatar
  • 489
6 votes
1 answer

How can I specify the language to be used by Tesseract when using OCRFeeder

I'm using the OCR-utility of OCRFeeder. OCRFeeder is using the tesseract-engine. I have installed the several language-packs needed for tesseract. How can I set the language such that tesseract will ...
Bernard Decock's user avatar
6 votes
1 answer

How do I produce a multi-page sandwich pdf with hocr2pdf?

I used tesseract to produce the special html to use with hocr2pdf, starting from a multi-page tif. I tried using hocr2pdf to produce a "sandwich pdf" (image + hidden text layer). Hocr2pdf produces a ...
To Do's user avatar
  • 15.6k
5 votes
2 answers

ocrfeeder doesn't detect anything

When I try to detect text on my jpeg, it correctly shows all areas where it suspects text and images, but when I export it to ODT it only creates an ODT with empty text and image frames. Do I have to ...
rubo77's user avatar
  • 32.8k
5 votes
2 answers

ABBYY FineReader-like application for Ubuntu 13.04

I have a lot of images, and what I want to do is scan those images and get the output in an MS Word file that can be edited later. For Windows, I have ABBYY FineReader. But I don't want to go back to ...
Faisal Aslam's user avatar
4 votes
1 answer

How to create high fidelity PDFs with copyable text from scans?

Some companies provide software for Windows with their scanners* that can create PDFs from scanned pages which look exactly like the scanned material (as if it were just full-page images) but the text ...
Damn Terminal's user avatar
4 votes
1 answer

How do I prevent hocr2pdf from using a large font with a tesseract-generated .hocr file?

Tesseract now creates an .hocr file rather than an .html file for ocr output, but this is not exactly what is at issue here. When hocr2pdf uses this output it uses a large text size with small ...
user299889's user avatar
4 votes
0 answers

How to add OCRed text to original pdf in gscan2pdf? [closed]

I am new to gscan2pdf 0.9.31, and just used it to OCR a scanned pdf. After saving the pdf, the OCRed text is stored on the top left corner. However I wish each OCRed character to be added to exactly ...
Tim's user avatar
  • 25.4k
3 votes
2 answers

Optical character recognition for LibreOffice

I have a paper document. It has several pages containing a table with 3 columns (current number, name and a grade). I scanned it and got 16 jpeg documents. Each jpeg is a scanned page. Now, I need ...
Mihaita's user avatar
  • 31
3 votes
1 answer

OCR of a pdf with gocr

I installed gocr, with the command suggested by the ubuntu terminal (sudo apt install gocr), in order to carry out an OCR recognition of the text in a pdf file. How could I use it? I didn't find a ...
Gennaro Arguzzi's user avatar
3 votes
1 answer

How can I use OCR on a partial screen capture to get text?

When I was still using Windows I loved using the capture2text OCR program to grab Japanese kanji from manga and dump them into, and was wondering how I could get the same functionality on ...
TakingItCasual's user avatar
3 votes
1 answer

pdfsandwich - how to not change page colour

I am using pdfsandwich but it changes the colour of the pages from colour to black and white. Since I have a document with many coloured pictures how can I avoid it?
brasileiro's user avatar
3 votes
1 answer

What program is suitable for making scanned PDF files searchable?

I would like to be able to scan paper documents to PDF files and make the text searchable. I believe the Tesseract program can assist this, but don't know how to begin, and don't know what would be ...
Hedley Finger's user avatar
3 votes
3 answers

Is there a good OCR-readable font

As part of my backups, I would like to be able to print and later re-scan a Base64-encoded copy of my private key. Unfortunately, neither gocr nor tesseract seems to be able to properly read any font ...
user1207177's user avatar
3 votes
1 answer

How to improve tesseract performance?

By all accounts, tesseract is superb. However, my results are dismal. I need to convert (digital, as opposed to from a book) text that I only have as a png. For instance: 2 3 academics 1 1711 2 ...
katriel's user avatar
  • 447
3 votes
0 answers

Extract text from image

I am looking for software that recognizes text within images. I tried out all of the tools mentioned here (gocr, fuzzyocr, libhocr0, ocrad, ocrfeeder, ocropus, tesseract-ocr, cuneiform). My input was ...
Socrates's user avatar
  • 2,513
2 votes
2 answers

How to wildcard tesseract?

I want tesseract to convert all the files of a folder. I do not want to merge the files in any way as I am having trouble with programs like hocr2pdf and pdfbeads merging more than one file at a time. ...
user140393's user avatar
2 votes
1 answer

OCRopus installing problem

I'm working on a project and need to use OCRopus. I tried to install it on Windows but failed, so I moved to Ubuntu. I'm not an expert when it comes to Ubuntu, so I'm stuck now. I have installed python ...
Hendk's user avatar
  • 23
2 votes
1 answer

Scan receipts with a GUI

I'm new to Ubuntu, and I'm trying to find an application to scan my receipts in order to create an expense report. Is there any software available? Any help is greatly appreciated. Thanks.
user273008's user avatar
2 votes
1 answer

Help with Canon CanoScan LiDE scanned PDF Documents

I have just started working with Ubuntu for the last 10 days, with the intention to stop using Windows permanently. So far it has been awesome. I have replaced almost all my Microsoft applications ...
learner's user avatar
  • 31
2 votes
1 answer

Cannot Scan from Gscan2PDF on 13.10 or OCR with Tesseract

I am having a little bit of trouble with one of my favourite pieces of open source software. I had installed Gscan2PDF (1.0.4) from the Software Centre on my 13.10 64bit machine (clean install from 13....
Dustin's user avatar
  • 2,103
2 votes
2 answers

pdfbeads will only output a single page

Following the instructions from this page I take a djvu document, check it for any sign of corruption by opening it in djvulibre and it checks out fine. Copy it to my testing folder and rename it ...
user140393's user avatar
2 votes
1 answer

Copy+Paste from Screenshot [duplicate]

I receive a lot of screenshots during my daily work. Most of them contain numbers which I need to copy+paste. Is there a magic way to copy+paste numbers from images? I use thunderbird and firefox ...
guettli's user avatar
  • 1,397
2 votes
0 answers

OCR with two-page layout

I'm trying to do OCR on a pdf with a two-page layout - in a landscape-orientation page of the PDF, the left half is one (portrait-orientation) page, the right half is the next (portrait-orientation) ...
Raffi's user avatar
  • 121
2 votes
0 answers

Converting an image PDF to text

I have a 500-page pdf scan of a 15th-century book. I wish to convert it into a single txt file of any format so as to be able to work on it and/or export it to epub. Calibre is unable to process it. ...
Arnaud Doolittle's user avatar
2 votes
2 answers

Why are no OCR engines working in Gscan2pdf after upgrading to 14.04?

I recently upgraded to Ubuntu 14.04, but the OCR in gscan2pdf stopped working. I am using the latest gscan2pdf (1.2.4) with both Tesseract and Cuneiform available. When loading pdf documents in ...
user273895's user avatar
2 votes
0 answers

Alternative to Paperwork

I am looking for an alternative to the program Paperwork as it is rather complicated to install under Ubuntu. I am specifically interested in finding a way to read dates, prices and other details from ...
user235334's user avatar
2 votes
1 answer

Conversion of tiff image in Python script - OCR using Tesseract

I want to convert a tiff image file to a text document. My code works as expected for tiff images with a regular font, but it's not working for a French script font. My tiff image file contains text. ...
PYTHON TEAM's user avatar
2 votes
1 answer

ASCII art generator with OCR

What program can turn an image into ASCII art, but also replace any text in the image with the actual text, recognized with OCR? For example, for converting a comic to ASCII art.
Christopher King's user avatar
1 vote
3 answers

Error installing tesseract on Ubuntu 18.04

I've installed Ubuntu 18.04. I've installed tesseract using sudo apt-get install tesseract-ocr. When I type tesseract -v I get an error: tesseract: symbol lookup error: /usr/lib/x86_64-linux-gnu/...
mayur panchal's user avatar
1 vote
3 answers

How to install Mathpix Snip in Ubuntu 22.04 using terminal?

I installed Mathpix Snip in Ubuntu 20.04 by following these instructions from their documentation: Getting Snip app and launching it from your Terminal (Advanced) Open your terminal. And ...
leo's user avatar
  • 11
1 vote
1 answer

Tesseract OCR engine on Ubuntu: how-to

I've installed tesseract-ocr. I was looking at the manual, but I can't see an option that lets me define image bounds (X, Y, W, H). Can someone help with this, or am I asking in the wrong place?
Ahmed Al-attar's user avatar
1 vote
2 answers

Process tesseract output: remove line breaks, concatenate individual pages

I have a pdf that I cannot directly process into .txt to go through piper TTS because the output from the .pdf is missing letters and otherwise generally unintelligible (
iconoclasthero's user avatar
1 vote
3 answers

gImageReader OCR

I have recently installed gImageReader OCR. It is not obvious how to use it. I have not yet worked out how to get an editable text file. My aim is to get a LibreOffice file to edit and save. Thanks ...
TonyB's user avatar
  • 19
1 vote
2 answers

"sh: 1: cannot open /tmp/pdfsandwich4e375e.html: No such file" when using pdfsandwich [closed]

I tried to add a text layer to some pdf files in order to make them searchable. This technique is explained in the German Ubuntu wiki: . After installing ...
highsciguy's user avatar
1 vote
1 answer

Why is ocrodjvu's OCR engine not found?

In Ubuntu 14.04 LTS, trying to get better OCR of a DjVu document, I run ocrodjvu --in-place document.djvu without success and get ocrodjvu: error: OCR engine (tesseract) was not found. I found out ...
Léo Léopold Hertz 준영's user avatar
1 vote
1 answer

OCR software sought

What OCR software is recommended that is free for commercial use? It should be accessible via Python or a Python library. And it should run under Linux. The one I came across myself is tesseract. Are there ...
empedokles's user avatar
  • 3,943
1 vote
1 answer

Use xsane as OCR without scanner

When I start xsane, it exits with the message no device accessible. But I would like to use it just as an OCR tool. How can I suppress the device search?
rubo77's user avatar
  • 32.8k
1 vote
2 answers

Trouble converting djvu to pdf with this OCR-preserving code

I want to convert djvu to pdf while preserving OCR. This page describes how to do so, but I am getting a blank html file. In /home/steven/Documents/djvu2pdf/1/, djvu2hocr -p 1 Intro.djvu gives me: ...
user140393's user avatar
1 vote
1 answer

Tesseract and OCRopus

I was wondering what the relationship is between Tesseract and OCRopus. Is OCRopus a wrapper of Tesseract? Or are they now developing independently? What are some advantages of one over the other? Thanks ...
Tim's user avatar
  • 25.4k
1 vote
0 answers

I'm having trouble installing OCRopy; I want to use it to create training data for an old manuscript in Latin. What prereqs are needed, and what lines do I need to write?

So I am new to using Ubuntu and I am trying to install OCRopy to make training data, with the end goal of creating a transcript for a 15th c. manuscript. So far I am considering that my problem may be a ...
mumbot's user avatar
  • 11
1 vote
1 answer

Can Qt-box-editor be used for tesseract 4.0?

I am using tesseract 4.0 for character recognition. Many blogs say that Qt-box-editor can be used with tesseract 3.x. My question is: can Qt-box-editor be used with tesseract 4.0?
Ashna Eldho's user avatar
1 vote
0 answers

Why is OCR-Text recognized with whitespaces after every char?

I'm trying to get all my documents scanned and throw away those nasty papers. To simplify this process I recently bought a Brother ADS-2100e scanner. I thought this scanner could create OCR-PDF on USB-...
Alex's user avatar
  • 43