Improve Optical Character Recognition
The existing OCR is poor. Many words are garbled, rendering search results meaningless.
-
Anonymous commented
I agree. Although BNA say they use "state of the art" OCR, it's often very poor, and simply feeding an image of an article to one of the free web-based OCR services often produces far better results than BNA's transcription.
-
Anonymous commented
Improve your character-recognition software: it really needs to recognise a long 's'.
-
Anonymous commented
Add the ability to correct text from within Findmypast.
-
johnjo commented
This short example shows how bad the OCR can be!
b^*T. fad to ala t.o* unfit r *f b* f aaapar.aj b* nvglMt or d«la> Th* 1«44i u*4l •u'kjntb «iU rtrw >• •oiiiMbiog • hi* L will |>ut them ,n 4 batuw tb*y lit«id iu u*for* •*» I tii4»*a* •urkinn. *f» 4l(*Adj m Uua bat tar pu«itA<4 in <i v«rk lOtalitVo* <jf lhe wvrti"g Ulna 1* *bonar condition* .J labcvr 4r« oaora aatiafactory; mwii cannot t» dri»#i» 4iid harAaaad tliay •ar» Tf.at ha* a trad# »n.wn •bt«b abiatda a«aiiiat tnanr abtiaaa •OK-h fortoarlr lb a* anbnut Bur* •* Worth But tb*M improtovitoou or* port
-
johnjo commented
I am referring to recently scanned pages. The Lincs Chronicle of 1919 is almost useless. I would love to post here the text as read by OCR to illustrate this.
-
johnjo commented
Sometimes it seems that the newspapers you are scanning are of poor quality. They do fade and get damaged, and their storage affects their survival. It must be difficult to get hold of good copies quickly, so some rescanning may have to take place in the future.
-
Anonymous commented
If we could use wildcards, it would get round some of these problems.
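To illustrate the idea, here is a minimal sketch of wildcard matching over OCR text. It is not how the site's search works; the function name `wildcard_search` and the sample sentence are made up for the example, and it uses Python's standard `fnmatch` patterns, where `?` stands for any single character.

```python
from fnmatch import fnmatchcase

def wildcard_search(text, pattern):
    """Return the words in text that match a shell-style wildcard pattern.

    Hypothetical helper for illustration only; '?' matches one character
    and '*' matches any run of characters.
    """
    words = text.lower().split()
    return [w for w in words if fnmatchcase(w, pattern.lower())]

# Made-up sample containing a typical misread ('tbe' for 'the').
sample = "tbe ship left the harbour"
matches = wildcard_search(sample, "t?e")
print(matches)
```

A pattern such as `t?e` would find both the correct word and the misread one, which is the sort of workaround a wildcard search could offer.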
-
John Woolman commented
There are some quite common misreads, such as 'tbe' instead of 'the'. Would it be possible to do a find-all-and-replace-all search?
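As a rough sketch of what such a bulk correction might look like, assuming a hand-curated table of known misreads (the `MISREADS` table and `fix_misreads` function here are hypothetical, not anything the site provides):

```python
import re

# Hypothetical table of common misreads; a real one would need careful curating.
MISREADS = {"tbe": "the"}

def fix_misreads(text, table=MISREADS):
    """Replace whole-word occurrences of known misreads (case-sensitive)."""
    # \b restricts matches to whole words, so longer words are left alone.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, table)) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], text)

fixed = fix_misreads("tbe ship left tbe harbour")
print(fixed)
```

Matching whole words only is a deliberate choice in this sketch: a blanket replace without word boundaries could damage correctly read words that happen to contain the same letters.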
-
Anonymous commented
I find that searching with 'EXACT' for a surname like IVIN picks up 'Giving' and other such words. This shouldn't be so with good OCR.
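For comparison, this is a small sketch of what an exact, whole-word match could be expected to do. How the site's 'EXACT' option actually works is not known here; the function `exact_word_hits` and the sample sentence are invented for the example.

```python
import re

def exact_word_hits(text, term):
    """Return whole-word, case-insensitive matches of term in text.

    Hypothetical helper: \\b word boundaries stop the term from matching
    inside a longer word.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [m.group(0) for m in pattern.finditer(text)]

# Made-up sample: 'Giving' contains the letters 'ivin' but is not a whole-word hit.
hits = exact_word_hits("Giving thanks, Mr Ivin spoke", "IVIN")
print(hits)
```

Under this definition 'Giving' would not be returned for IVIN, so hits of that kind suggest either substring matching or a misread in the underlying transcription.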