Improve Optical Character Recognition

The existing OCR is poor. Many words are garbled, which renders search results meaningless.

132 votes

Anonymous shared this idea · May 21, 2014

Open - Ongoing process · Oct 14, 2014

Anonymous commented · September 23, 2015 2:43 AM

I agree. Although BNA say they use "state of the art" OCR, it's often very poor, and simply feeding an image of an article to one of the free web-based OCR services often produces far better results than BNA's transcription.

Anonymous commented · January 29, 2015 3:16 AM

Improve your character-recognition software: it really needs to recognise a long 's'.

Anonymous commented · December 29, 2014 5:55 AM

Ability to correct text from within Findmypast.

johnjo commented · October 17, 2014 1:41 AM

This short example shows how bad the OCR can be!
b^*T. fad to ala t.o* unfit r *f b* f aaapar.aj b* nvglMt or d«la> Th* 1«44i u*4l •u'kjntb «iU rtrw >• •oiiiMbiog • hi* L will |>ut them ,n 4 batuw tb*y lit«id iu u*for* •*» I tii4»*a* •urkinn. *f» 4l(*Adj m Uua bat tar pu«itA<4 in <i v«rk lOtalitVo* <jf lhe wvrti"g Ulna 1* *bonar condition* .J labcvr 4r« oaora aatiafactory; mwii cannot t» dri»#i» 4iid harAaaad tliay •ar» Tf.at ha* a trad# »n.wn •bt«b abiatda a«aiiiat tnanr abtiaaa •OK-h fortoarlr lb a* anbnut Bur* •* Worth But tb*M improtovitoou or* port

johnjo commented · October 16, 2014 10:32 AM

I am referring to recently scanned pages. The Lincs Chronicle of 1919 is almost useless. I would love to post the OCR text here to illustrate this.

johnjo commented · September 8, 2014 1:50 PM

Sometimes it seems that the newspapers you are scanning are of poor quality. They do fade and get damaged, and the way they are stored affects their survival. It must be difficult to get hold of good copies quickly, so some rescanning may have to take place in the future.

Anonymous commented · August 28, 2014 2:39 AM

If we could use wildcards, it would get round some of these problems.
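
A minimal sketch of how wildcard search could work over OCR text. It assumes '?' stands for exactly one letter and '*' for any run of letters, a common convention that the site is not confirmed to use; `wildcard_to_regex` and `wildcard_search` are hypothetical names.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Hypothetical helper: turn a search term using '?' (exactly one letter)
    and '*' (any run of letters) into a case-insensitive whole-word regex."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(r"\w")       # exactly one word character
        elif ch == "*":
            parts.append(r"\w*")      # zero or more word characters
        else:
            parts.append(re.escape(ch))
    # \b on both sides stops the term matching inside a longer word
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)


def wildcard_search(pattern: str, text: str) -> list[str]:
    """Return every word in the OCR text that fits the wildcard pattern."""
    return [m.group(0) for m in wildcard_to_regex(pattern).finditer(text)]


ocr_text = "The Lincoln market: tbe prices rose, and the farmers left."
print(wildcard_search("t?e", ocr_text))    # ['The', 'tbe', 'the']
print(wildcard_search("farm*", ocr_text))  # ['farmers']
```

A '?' stands in for a single misread letter, so `t?e` also finds the 'tbe' misread. Symbols such as '*' or '«' that appear in badly garbled text are not word characters, so this sketch would not match them.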

John Woolman commented · May 29, 2014 3:29 AM

There are some quite common misreads, such as 'tbe' instead of 'the'. Would it be possible to do a find-all and replace-all?
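
A rough sketch of the find-all and replace-all idea as a table of known misreads applied to whole words only. The 'tbe' entry comes from the comment; the 'aud' entry and the names `COMMON_MISREADS` and `correct_misreads` are illustrative assumptions.

```python
import re

# Hypothetical table of frequent OCR misreads (misread -> intended word)
COMMON_MISREADS = {
    "tbe": "the",   # example given in the comment
    "aud": "and",   # illustrative assumption
}


def correct_misreads(text: str, table: dict[str, str]) -> str:
    """Replace whole-word misreads, keeping a leading capital letter."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, table)) + r")\b", re.IGNORECASE
    )

    def fix(match: re.Match) -> str:
        word = match.group(0)
        replacement = table[word.lower()]
        # 'Tbe' at the start of a sentence becomes 'The', not 'the'
        return replacement.capitalize() if word[0].isupper() else replacement

    return pattern.sub(fix, text)


print(correct_misreads("Tbe farmer aud tbe miller met.", COMMON_MISREADS))
# The farmer and the miller met.
```

A blanket replacement could also alter genuine words, so a real tool would probably want each correction reviewed before it is saved.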

Anonymous commented · May 22, 2014 12:12 PM

I find that searching with 'EXACT' for a surname like IVIN picks up 'Giving' and other such words; this shouldn't happen with good OCR.
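
The behaviour described sounds like the search matching the letters inside longer words. Assuming that is the cause (rather than, say, OCR splitting a word apart), the difference between substring matching and whole-word matching can be sketched as follows; `substring_hits` and `exact_word_hits` are hypothetical names.

```python
import re


def substring_hits(term: str, text: str) -> list[str]:
    """Naive matching: any word that contains the letters anywhere."""
    return [w for w in re.findall(r"\w+", text) if term.lower() in w.lower()]


def exact_word_hits(term: str, text: str) -> list[str]:
    """Whole-word matching: the term must not be part of a longer word."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return pattern.findall(text)


text = "Giving evidence, Mr Ivin said the giving of alms had ceased."
print(substring_hits("IVIN", text))   # ['Giving', 'Ivin', 'giving']
print(exact_word_hits("IVIN", text))  # ['Ivin']
```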
