Use Google OCR and resubmit all pages through it.
Google OCR gives an almost flawless image-to-text conversion. I think you are using Tesseract, which is hopeless on this material. I suggest you collaborate with Google and resubmit all the pages already processed; the improvement would be enormous. As an example, the Keith Waterhouse column in the Daily Mirror, 19780810, page 8, currently gives utterly garbled output. The first section as it comes out of Google OCR, without any manual corrections, is:
My hit that missed..
THAT splendid actor Jack Hedley, in an interview about the making of the BBC serial "Who Pays the
Ferryman?" tells of the mysterious influence that the island of Crete had over him and other members of
"It had a profound effect on me," he reports. "I have changed considerably since I came back. One of
the fundamental changes is that I don't take The Times any more."
As well as being inscrutable this is a great shame, for Mr. Hedley is a thoughtful man, and The Times
has just commenced a series that will cause furrows on many a ruminative forehead.
These articles are about historical events which never took place, and the first one poses the
question: "What would have happened it Hitler had been assassinated in July, 1944 7" **
Your version reads (in part):
THAT splendid actor Jack Hedley, in an interview . ' 959/ 5 -. WashingMachine7l42A Sale about the making of the BB C serial " Who Pays the 9 Programmes rpm spn speed ary wort capably. PRICE Ferryman?” tells of the mysterious influence that the ','• 17 5 as island of Crete had over him and other members of the cast. -- "It had a ofound effect on me,” he reports. " I have \ changed considerably since I came back. One of the la 1 f 35 TRADE-IN f ra u o n re
This is really not acceptable. As a result, search never finds articles it otherwise would (say, when searching for a relative or a place).