When performing OCR, it's quite remarkable how much the quality of the output depends on which fonts the software supports or understands. For example, I was asked to digitise a printed document for a family history society (which owned the copyright to the document). The first attempt at OCR-ing the document produced absolute rubbish. I then spent some time identifying the font (not long, at most an hour). With the font identified, the quality of the OCR output rose to 99% ...
I have no detailed knowledge of the history of newspapers, but I suspect that many used the same small set of fonts, which are no longer standard today. I reckon that a newspaper historian would be able to point to documents which chart the history of newspaper fonts, and that this would allow the BNA to install the appropriate fonts into its OCR software. (Don't forget to include the bold and italic flavours as well.)