Transcriptions - focus on re-doing older papers before starting new ones.
We are missing so many stories about our ancestors due to Abysmal Transcriptions. Yes, I can edit and update existing garbage to make more sense, but that does not help me find articles in the first place. If what is transcribed isn't anything approaching what is printed, the article will never be found except by sheer chance. Occasionally I have found an article by chance, then copied the image into Google Keep, where it does a far better job of transcription than what is on this site. Now that there is AI (or rather large language models), the transcriptions from any of the specialised sites are superior to that.
Why focus on new newspapers when there is already a treasure trove of unfindable information in what you have already scanned? Focus on re-doing older papers before starting new ones.
For example - anyone searching for Bucksey (BCCKSBY or B<Kkiwr) and Househam (HOCSEHAM or Houehm), Dickinson (Dickinaoo) and Murchie (MurAie) is unlikely to ever find this record unless someone edits it for you.
The only reason the middle marriage is correct is that I edited it - only ONE of the 3 instances of the word Barton was correctly spelled!
-
Stray
commented
Improving older newspaper transcriptions should make a major difference for researchers, especially when names and key details are currently unreadable in search results. Ongoing OCR reprocessing sounds like the right priority.
-
Thank you for your feedback. You're absolutely right that poor transcriptions can make valuable articles effectively invisible to search, and the examples you've provided illustrate that problem well.
Improving the searchability of our existing archive is a major area of focus for us. We are currently undertaking a large-scale reprocessing programme, continually re-running existing content through newer OCR technology. This is not a one-off project but an ongoing process, meaning transcription quality should continue to improve over time as we work through the archive.
You mention AI, and it's something we look at closely. However, our primary goal is to create a trustworthy historical archive. While Large Language Models can sometimes produce impressive results, they can also introduce hallucinations and generate text that was never present in the original newspaper. For historical research, we believe it is better to have an imperfect transcription that reflects the source material than a fluent transcription that may contain invented content.
That said, OCR technology has improved significantly in recent years, and we expect the combination of improved OCR, better page zoning and ongoing reprocessing to make a substantial difference to the discoverability of material already in the archive.