OCR Test (v. 0.5.9) Developed by Robert Theis |
Experimental app for optical character recognition (OCR).
Runs the Tesseract 3.00 open source OCR engine to find text in images captured by the device camera.
The purpose of this app is to demonstrate OCR running on an Android device. Conventionally, OCR is run on printed pages of text scanned with a flatbed scanner. In contrast, running OCR on images captured by the camera of an Android smartphone or tablet gives much lower-quality, but interesting, recognition results.
The default single-shot capture runs OCR on a snapshot image that's captured when you click the shutter button, like a regular photo.
When the "continuous preview" checkbox is checked, the app shows a dynamic, real-time display of what the device is recognizing right beside the camera viewfinder. The continuous preview works best on a fast device.
A translation capability (powered by Google/Bing) can be run after the OCR is finished. The translator results are not too useful on a practical level, though, because the errors in OCR are compounded by the machine translation.
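To give a rough sense of why OCR errors hurt translation so much, here is a minimal sketch. It assumes each character is misrecognized independently with the same probability, which is a simplification and not a measurement of this app. The function name `word_accuracy` and the sample accuracy values are hypothetical, chosen only for illustration.

```python
def word_accuracy(char_accuracy: float, word_length: int) -> float:
    """Chance that every character of a word is recognized correctly.

    Assumes independent, identically likely per-character errors
    (an illustrative simplification, not how the app measures anything).
    """
    if not 0.0 <= char_accuracy <= 1.0:
        raise ValueError("char_accuracy must be between 0 and 1")
    if word_length < 0:
        raise ValueError("word_length must not be negative")
    return char_accuracy ** word_length


if __name__ == "__main__":
    # Hypothetical per-character accuracies, applied to a 6-letter word.
    for acc in (0.99, 0.95, 0.90):
        print(f"{acc:.2f} per character -> {word_accuracy(acc, 6):.2f} per word")
```

Under these assumptions, a modest drop in per-character accuracy leaves a large share of words with at least one wrong character, and a translator that receives a misspelled word generally cannot recover the intended one.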
Some notes on using this app:
- Hold down the on-screen shutter button to autofocus, and release your finger from the button to take the picture. Or just tap the button to take the picture without autofocus.
- The same suggestions for using the camera effectively apply here as in other apps, such as Google Goggles or Google Docs: hold the camera steady, with the camera lens perpendicular to the word or characters you want to capture, and be sure the autofocus engages before taking the picture. Also, the OCR engine expects text to be approximately horizontal when the picture is captured in landscape orientation.
- To copy text to the clipboard, long-press on the recognized text or translated text.
- For recognizing individual Chinese characters, set the page segmentation mode to "single character."
- Supported languages for OCR: Bulgarian, Catalan, Chinese (Simplified), Chinese (Traditional), Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Polish, Portuguese, Romanian, Russian, Serbian (Latin), Slovak, Slovenian, Spanish, Swedish, Tagalog, Turkish, Ukrainian, and Vietnamese.
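For anyone scripting against the same language data, the supported languages above can be mapped to the short codes used to name Tesseract language files. This is a sketch under assumptions: the codes below are the conventional tesseract-ocr codes as I understand them, not taken from this app's source, and should be checked against the download list. The names `LANGUAGE_CODES` and `tesseract_code` are hypothetical.

```python
# Assumed mapping from the language names listed above to the
# conventional Tesseract language-data codes. Verify each code against
# the tesseract-ocr download list before relying on it.
LANGUAGE_CODES = {
    "Bulgarian": "bul", "Catalan": "cat",
    "Chinese (Simplified)": "chi_sim", "Chinese (Traditional)": "chi_tra",
    "Czech": "ces", "Danish": "dan", "Dutch": "nld", "English": "eng",
    "Finnish": "fin", "French": "fra", "German": "deu", "Greek": "ell",
    "Hungarian": "hun", "Indonesian": "ind", "Italian": "ita",
    "Japanese": "jpn", "Korean": "kor", "Latvian": "lav",
    "Lithuanian": "lit", "Polish": "pol", "Portuguese": "por",
    "Romanian": "ron", "Russian": "rus", "Serbian (Latin)": "srp",
    "Slovak": "slk", "Slovenian": "slv", "Spanish": "spa",
    "Swedish": "swe", "Tagalog": "tgl", "Turkish": "tur",
    "Ukrainian": "ukr", "Vietnamese": "vie",
}


def tesseract_code(language_name: str) -> str:
    """Return the assumed Tesseract code for a listed language name."""
    try:
        return LANGUAGE_CODES[language_name]
    except KeyError:
        # Languages outside the supported list are rejected explicitly.
        raise ValueError(
            f"No OCR language data listed for {language_name!r}"
        ) from None


if __name__ == "__main__":
    print(tesseract_code("Japanese"))
```

A lookup for a language outside the list raises `ValueError` rather than falling back to English, so an unsupported request is visible instead of silently producing poor results.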
Development notes:
- This app is a mash-up of several open source projects: the Tesseract OCR engine, Tesseract Tools for Android (tesseract-android-tools), the ZXing Barcode Scanner, google-api-translate-java, microsoft-translator-java-api, and a grad school class project. Language data downloads use files in the tesseract-ocr project at http://code.google.com/p/tesseract-ocr/downloads/list.
- Students and others researching how to make an app that performs OCR may be interested in this short tutorial: http://rmtheis.wordpress.com/2011/08/06/using-tesseract-tools-for-android-to-create-a-basic-ocr-app/. Questions about image processing using Tesseract should be directed to the tesseract-ocr mailing list: http://groups.google.com/group/tesseract-ocr.
- The source code for this project will be available on GitHub in a few weeks under the Apache 2 license. I'll put a link to the project page here when it's available.
- Thanks to the contributors: Spoorthi, Hunvil, Jingjing, Xuyuan, and Mandar.
|
[2011-10-23] Gaui: Awesome Awesome, although it needs to support special letters like: Ð, ð, É, é, Ý, ý, Ú, ú, Í, í Ó, ó, Ö, ö, Á, á, Æ, æ, Þ, þ ... then it would be super!! |
|
[2011-10-23] Monty: Can't download update |
|
[2011-10-10] DJ AndieK: Awesome simple app Exactly what I was looking for. A simple OCR app that can recognise Kanji and use google translator to translate into English. |
|
[2011-10-10] Rob: Good start Works well with clear sans-serif fonts; struggles with script. |
|
[2011-09-24] Devashish: Need help in making OCR app for learning project Hi, I am an undergrad student and I am trying to make an Android application that needs to perform OCR reading. I tried looking for open source libraries but I was not able to find any. Then I stumbled upon this, and I must say the app is really incredibly accurate and fast! I was wondering if you could publish your code that performs the OCR reading? Undocumented code will also be fine! If you would like, I can talk to you about my complete app. Thanks. |
|
[2011-09-22] The Bat: Interesting Looks like a decent stab at a project of this magnitude. Lots of kinks to work out, though. Samsung Fascinate. |
|
[2011-09-09] Mark: OCR can't resolve clear text at large size. Much work needs to be done on the basics! HTC Desire. |
|
[2011-09-09] Phillip: Truly amazing Any way you can open source this? |
|
[2011-09-06] James: Oct test Love it!!! |
|
[2011-09-06] Waleed: Good effort so far. Please add Arabic to it. |
|
[2011-09-05] p: Excellent proof of concept, very impressive to get it running on a mobile CPU. Looking forward to more rounded versions in future! |
|
[2011-09-03] simon: A work in progress, but awesome. Keep going. |
|
[2011-09-03] Pokester: Cool demo! Could easily become a useful app to capture model nos., serial nos., recipes or biz cards... or translate on the fly |
|
[2011-08-30] sy: Amazing Extremely accurate on my Thunderbolt. I would pay for this app! Please keep updating it! |
|
[2011-08-18] Jake: Brilliant. Android is sadly lacking. If you can add Pinyin to the Chinese OCR translation, I am good for a $20 donation. |
|
[2011-08-12] Pritpaul: Nice! I know you intend this as a test app, but it works well enough to be useful in some cases. Could we please get a button to copy the text to clipboard? |
|
[2011-08-04] psmith: Cool experimental app Cool experimental app. Is the source code available? Please support Thai text recognition, it would be awesome. |
|
[2011-07-20] Folko: Please continue! Goggles sometimes returns images only. |
|
[2011-07-17] Del Deegi: Impressive... Russian translation works awesome. Like the focus when I hold the shutter button... |
|
[2011-07-04] John: Was this coded on a bar napkin? Needs a lot of work. |
|
|