[Okular-devel] OCR Tool for Okular

Discussion:

Anıl Özbek

2013-04-03 11:20:35 UTC

Hi,

Last week I started writing a simple OCR tool for Okular.
It generally got a good response from KDE users [1-3].

What do you think about adding such a tool to Okular? Is it possible?
If so, I'd be happy to help as much as I can, but I should say
that I'm not experienced in KDE/Qt development.

Currently my code (mostly copied and pasted from other projects) takes
part of the active document as an image and saves it to the OS's temp
dir. It then runs a particular OCR application's executable (for now
only Tesseract) to convert the image to a text file. Finally, the code
opens the text file and copies its content to the clipboard. After all
that, the temporary files are deleted.
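
The steps above can be sketched roughly as follows. This is only an
illustration in Python, not the code from the patch: the function names and
the "selection.png" / "result" file names are made up for the example, and
the clipboard step is left out because it depends on the GUI toolkit. It
relies on the usual Tesseract command line, "tesseract IMAGE OUTPUTBASE -l
LANG", which writes OUTPUTBASE.txt.

```python
import subprocess
import tempfile
from pathlib import Path


def build_ocr_command(image_path, output_base, language="eng"):
    """Return the argv for `tesseract IMAGE OUTPUTBASE -l LANG`."""
    return ["tesseract", str(image_path), str(output_base), "-l", language]


def ocr_image_bytes(image_bytes, language="eng", runner=subprocess.run):
    """Save the image to a temp dir, run the OCR executable, return the text.

    `runner` is a hypothetical hook so the sketch can be exercised without
    Tesseract installed; by default it is subprocess.run.
    """
    # The temporary directory and everything in it is deleted on exit.
    with tempfile.TemporaryDirectory(prefix="okular-ocr-") as tmp:
        image_path = Path(tmp) / "selection.png"
        output_base = Path(tmp) / "result"
        image_path.write_bytes(image_bytes)

        # Tesseract appends ".txt" to the output base name.
        runner(build_ocr_command(image_path, output_base, language), check=True)

        return (Path(tmp) / "result.txt").read_text(encoding="utf-8")
```

Returning the text instead of copying it directly keeps the clipboard
handling in the caller, where the toolkit's own clipboard API can be used.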

I think before going any further it would be better to clarify some
issues that I encountered.

API vs Executable
-------------------
Which one would be better to use? The executable is easier to use, but
using the API seems the more correct approach. As far as I can see,
Tesseract [4] and Cuneiform [5] provide an API, but I don't know about
other OCR software.

Maybe instead of trying to support more than one OCR program we could
just choose a default one. But that would restrict the users.

If we use the API, Okular will link to the OCR libraries, which means
more dependencies for the Okular package. If we use the executable, we
can check for it before running it and, if it's not installed, show the
user an info message saying something like:
"additional packages must be installed to use this feature".

If we go the API way, these [6-9] may help.
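
For the executable route, the check before running could look something like
this minimal Python sketch. The function names are invented for the example,
and the candidate names "tesseract" and "cuneiform" are assumed to be the
names the two programs are installed under.

```python
import shutil


def find_ocr_executable(candidates=("tesseract", "cuneiform")):
    """Return (name, path) of the first OCR executable found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return name, path
    return None


def availability_message(found):
    """Turn the lookup result into the text shown to the user."""
    if found is None:
        return "Additional packages must be installed to use this feature."
    name, path = found
    return f"Using {name} at {path}"
```

The same lookup could also decide whether the OCR action is shown at all.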

OCR Output's Accuracy
-----------------------
OCR accuracy isn't good enough (at least for comics) for now. The
success rate is almost 50%. My current code uses the image directly
from the comic; maybe it would be better to first convert the image to
black and white or 2-bit and apply some other image operations to make
the letters clearer. Do you have any suggestions about this?
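
One possible form of the black and white conversion is global thresholding
with Otsu's method. The sketch below is plain Python working on rows of
(r, g, b) tuples, purely to show the idea. It is an assumption about what
might help, not something tested on comics, and real code would use an image
library instead of nested lists.

```python
def to_gray(rgb):
    """Convert an (r, g, b) tuple to an 8-bit luma value (ITU-R BT.601 weights)."""
    r, g, b = rgb
    return (299 * r + 587 * g + 114 * b) // 1000


def otsu_threshold(gray_pixels):
    """Return the threshold t that maximizes the between-class variance.

    Pixels with value <= t form the dark class, the rest the light class.
    """
    hist = [0] * 256
    for v in gray_pixels:
        hist[v] += 1

    total = len(gray_pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of their gray values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no dark pixels yet
        w1 = total - w0
        if w1 == 0:
            break  # every pixel is already in the dark class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(rows):
    """Map rows of (r, g, b) pixels to rows of 0 (black) / 255 (white)."""
    gray = [[to_gray(p) for p in row] for row in rows]
    t = otsu_threshold([v for row in gray for v in row])
    return [[0 if v <= t else 255 for v in row] for row in gray]
```

A single global threshold can fail on pages with uneven backgrounds, so this
is only a starting point.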

Icon for OCR Tool
-------------------
Currently I use the scanner icon from Oxygen [10], but if there is a
better option we can use it.

Document Language
-------------------
To give the OCR software the correct parameters we must know the
document's language. For now Okular can't determine the language of
opened documents [11]. Until this feature is implemented we could add a
new section for the OCR tool to Okular's configuration, where users
could select the language for the OCR process as well as which OCR
software to use.
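
As a rough illustration of how such a setting could turn into OCR
parameters, here is a small Python sketch. OcrSettings, its fields and the
locale table are hypothetical names for this example only. Tesseract selects
the language with "-l" followed by a three-letter code such as "eng" or
"tur"; other engines would need their own mapping.

```python
from dataclasses import dataclass

# Tesseract names its language data with three-letter codes.
# Only a handful of locales are listed here for illustration.
TESSERACT_LANGUAGES = {"en": "eng", "tr": "tur", "de": "deu", "fr": "fra", "es": "spa"}


@dataclass
class OcrSettings:
    """Hypothetical settings a user would pick in the configuration dialog."""
    engine: str = "tesseract"
    locale: str = "en"


def language_args(settings):
    """Return the command-line arguments that select the OCR language."""
    if settings.engine != "tesseract":
        raise ValueError(f"no language mapping for engine {settings.engine!r}")
    code = TESSERACT_LANGUAGES.get(settings.locale)
    if code is None:
        raise ValueError(f"no Tesseract language known for locale {settings.locale!r}")
    return ["-l", code]
```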

Links
-------
[1] http://wklej.org/id/995982/
[2] http://youtu.be/duSTyByIPLc
[3] https://plus.google.com/113435503145887565355/posts/RqzC3hMcGcd
[4] https://code.google.com/p/tesseract-ocr/
[5] https://launchpad.net/cuneiform-linux
[6] https://raw.github.com/ruediger/VobSub2SRT/master/CMakeModules/FindTesseract.cmake
[7] https://raw.github.com/ck1125/sikuli/master/cmake_modules/FindTesseract.cmake
[8] https://projects.kde.org/projects/playground/libs/kolena/repository/revisions/master/entry/cmake/modules/FindTesseract.cmake
[9] https://raw.github.com/uliss/quneiform_tests/master/cmake/FindCuneiform.cmake
[10] http://i.imgur.com/xn8iyDw.png
[11] https://bugs.kde.org/show_bug.cgi?id=317486

Regards,
--
Anıl Özbek

Albert Astals Cid

2013-04-03 17:18:34 UTC

Why should this be a part of Okular and not a separate binary? I can imagine
millions of other places I'd like to have OCR in. What's the benefit of it
being Okular-only?

Cheers,
Albert

Post by Anıl Özbek
Links
-------
[1] http://wklej.org/id/995982/
[2] http://youtu.be/duSTyByIPLc
[3] https://plus.google.com/113435503145887565355/posts/RqzC3hMcGcd
[4] https://code.google.com/p/tesseract-ocr/
[5] https://launchpad.net/cuneiform-linux
[6] https://raw.github.com/ruediger/VobSub2SRT/master/CMakeModules/FindTesseract.cmake
[7] https://raw.github.com/ck1125/sikuli/master/cmake_modules/FindTesseract.cmake
[8] https://projects.kde.org/projects/playground/libs/kolena/repository/revisions/master/entry/cmake/modules/FindTesseract.cmake
[9] https://raw.github.com/uliss/quneiform_tests/master/cmake/FindCuneiform.cmake
[10] http://i.imgur.com/xn8iyDw.png
[11] https://bugs.kde.org/show_bug.cgi?id=317486
Regards,
--
Anıl Özbek
_______________________________________________
Okular-devel mailing list
Okular-devel at kde.org
https://mail.kde.org/mailman/listinfo/okular-devel

Anıl Özbek

2013-04-03 18:12:30 UTC

Post by Albert Astals Cid
Why should this be a part of Okular and not a separate binary? I can imagine
millions of other places I'd like to have OCR in. What's the benefit of it
being Okular-only?

I need small-scale OCR in only three types of software:

* Web Browser (Chrome)
* Image Viewer (Gwenview)
* Document Viewer (Okular)

And the document viewer comes first for me. Actually there aren't many
benefits to having the OCR tool in Okular. I can count only a few:

* It's easier to run a built-in tool (possibly more than once) than to
run a third-party app.
* It is better integrated with Okular (the selected text is shown with
Okular's notification system) and it can be used much like the other
tools in Okular (just select something and it is in your clipboard, as
with the Copy to Clipboard tool).
* Not everyone would find such an app if it were released standalone,
because it's not a very generic type of software. But if it comes with
a widely used and related application, more people can use it.

Regards,
--
Anıl Özbek

Albert Astals Cid

2013-04-03 22:52:19 UTC

Post by Anıl Özbek

Post by Albert Astals Cid
Why should this be a part of Okular and not a separate binary? I can
imagine millions of other places I'd like to have OCR in. What's the
benefit of it being Okular-only?

* Web Browser (Chrome)
* Image Viewer (Gwenview)
* Document Viewer (Okular)

So you are planning to implement the same feature three times? That seems a
bit boring, and hell maintenance-wise.

Honestly, I think a standalone application/library is much better. Once that
standalone application/library exists, we might even add a toolbar entry or
something in Okular if we detect it is installed, to make it easier for the
end user.

Cheers,
Albert

Anıl Özbek

2013-04-04 09:19:11 UTC

Post by Albert Astals Cid
Honestly, I think a standalone application/library is much better. Once that
standalone application/library exists, we might even add a toolbar entry or
something in Okular if we detect it is installed, to make it easier for the
end user.

Thank you for your feedback. I'll try to write a simple Qt
application. In fact I've even got the name: scr2ocr(tm).

Regards,
--
Anıl Özbek

Albert Astals Cid

2013-04-04 21:05:20 UTC

Post by Anıl Özbek

Post by Albert Astals Cid
Honestly, I think a standalone application/library is much better. Once
that standalone application/library exists, we might even add a toolbar
entry or something in Okular if we detect it is installed, to make it
easier for the end user.

Thank you for your feedback. I'll try to write a simple Qt
application. In fact I've even got the name: scr2ocr(tm).

Good :-)

Have you thought of making this application part of the KDE family? I'm
sure we'd be interested in hosting it in our repos and in you becoming part
of our community :-)

Cheers,
Albert

Anıl Özbek

2013-04-05 21:36:39 UTC

That would be great. I've started reading the related documents, such as
"The development lifecycle for a new application" and "Get a Contributor
Account", on KDE TechBase.

By the way, when I searched deeper I found two apps similar to my future scr2ocr:

* ocr_copy: https://github.com/spreelanka/ocr_copy
* liveocr: https://github.com/gkovacs/liveocr

These (they're incomplete and therefore don't work as expected, at
least for me) and screen capture programs such as KSnapshot, QSnapshot,
ScreenGrab and YASCU-Qt may help me in writing scr2ocr.

But I'm not sure about the screen capturing part of scr2ocr; maybe I
don't need to implement it at all. Screen capture programs or
applications' internal tools can do this for me :) For example (maybe
not a good one):

$ ksnapshot --region --copy-to-stdout | scr2ocr

https://bugs.kde.org/show_bug.cgi?id=298493
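
On the receiving end of such a pipe, scr2ocr would have to read the image
from standard input and work out what it was given. Here is a minimal Python
sketch of that part only. The function names are made up, and the
"--region --copy-to-stdout" options in the example above are taken as given
rather than as confirmed ksnapshot flags.

```python
import sys

# Leading "magic" bytes of a few common image formats.
SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"\xff\xd8\xff", ".jpg"),
    (b"BM", ".bmp"),
    (b"II*\x00", ".tif"),  # little-endian TIFF
    (b"MM\x00*", ".tif"),  # big-endian TIFF
)


def guess_suffix(data):
    """Return a file suffix for the image data, or None if it is not recognized."""
    for magic, suffix in SIGNATURES:
        if data.startswith(magic):
            return suffix
    return None


def read_image(stream=None):
    """Read all bytes from `stream` (default: binary stdin) and identify them."""
    if stream is None:
        stream = sys.stdin.buffer
    data = stream.read()
    suffix = guess_suffix(data)
    if suffix is None:
        raise ValueError("input does not look like a PNG, JPEG, BMP or TIFF image")
    return data, suffix
```

The suffix can then be used to name a temporary file that is handed to the
OCR executable.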

I hope I can contribute directly to Okular in the future. There are some
very nice feature requests in KDE Bugzilla:

https://bugs.kde.org/buglist.cgi?quicksearch=okular

Regards,
--
Anıl Özbek

Albert Astals Cid

2013-04-05 22:54:31 UTC

Post by Anıl Özbek
That would be great. I've started reading the related documents, such as
"The development lifecycle for a new application" and "Get a Contributor
Account", on KDE TechBase.
* ocr_copy: https://github.com/spreelanka/ocr_copy
* liveocr: https://github.com/gkovacs/liveocr
These (they're incomplete and therefore don't work as expected, at
least for me) and screen capture programs such as KSnapshot, QSnapshot,
ScreenGrab and YASCU-Qt may help me in writing scr2ocr.
But I'm not sure about the screen capturing part of scr2ocr; maybe I
don't need to implement it at all. Screen capture programs or
applications' internal tools can do this for me :) For example (maybe
not a good one):
$ ksnapshot --region --copy-to-stdout | scr2ocr

This kind of interfacing with ksnapshot would make a lot of sense; maybe you
could even call ksnapshot internally to make it transparent to the user. You
can try approaching Richard to see how he'd feel about such a feature.

Post by Anıl Özbek
https://bugs.kde.org/show_bug.cgi?id=298493
I hope I can contribute directly to Okular in the future. There are some
very nice feature requests in KDE Bugzilla:
https://bugs.kde.org/buglist.cgi?quicksearch=okular

We have lots of things to work on :-)

Cheers,
Albert
