A Few OCR Thoughts Based on a Marriage Announcement

I’ve been working in digital newspaper collections lately, and this clipping from the Mendon [Illinois] Dispatch of December 1935 got me thinking about the ways we need to search newspapers. In this case, the typos and errors make several key points. The clipping itself was located the old-fashioned way, though: a manual search based upon my grandparents’ date of marriage and where they were living at the time.

Trautretter

Grandma’s maiden name was actually Trautvetter. For some reason it is spelled “Trautretter” throughout the announcement. Soundex searches will not catch the reference, because swapping the “v” for an “r” changes the Soundex code. Other search formulations might not catch it either, depending upon how they are constructed.
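To see why Soundex misses this one, here is a minimal sketch of the standard American Soundex rules in Python. The function name `soundex` is my own, and a given newspaper site may implement Soundex (or some other phonetic match) differently, so treat this as an illustration rather than a description of any particular search engine.

```python
def soundex(name: str) -> str:
    """Return the 4-character American Soundex code for a name (illustrative sketch)."""
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]                 # the first letter is kept as-is
    prev = codes.get(letters[0], "")    # its code still suppresses an adjacent duplicate
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not separate same-coded letters
        digit = codes.get(ch, "")       # vowels (and Y) have no code
        if digit and digit != prev:
            result += digit
        prev = digit                    # a vowel resets prev, so repeats across it count
    return (result + "000")[:4]         # pad or truncate to one letter plus three digits


print(soundex("Trautvetter"))  # T631
print(soundex("Trautretter"))  # T636
```

The two codes differ in the last digit, so a search that relies on matching Soundex codes would treat the names as unrelated.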

The Headline

The last name of the groom, Neill, is spelled correctly throughout the announcement. However, there is a blob over part of the name in the headline. If the headline had been the only location where the last name of Neill appeared, searches based upon that name might not have located the reference.

Still-well or Stillwell?

There is a hyphen in the name “Stillwell” in the last reference to it in the announcement. Why eludes me, but that hyphen might keep a search for just the name of the town from locating the reference if the hyphenated version were the only one used.
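If you have the OCR text itself to work with, one way around a stray hyphen is to strip hyphens from both the search term and the text before comparing. The sketch below assumes that approach; the names `search_key` and `mentions` are made up for illustration, and the set of hyphen-like characters is a guess at what OCR output might contain, not a complete list.

```python
import re

# A plain hyphen plus a few Unicode hyphen/dash characters (an assumed, non-exhaustive set),
# together with any whitespace or line break that follows the hyphen.
_HYPHEN_RUN = re.compile(r"[-\u2010-\u2013]\s*")


def search_key(text: str) -> str:
    """Lowercase the text and remove hyphens so 'Still-well' and 'Stillwell' compare equal."""
    return _HYPHEN_RUN.sub("", text.lower())


def mentions(term: str, text: str) -> bool:
    """True if the hyphen-free form of term appears in the hyphen-free form of text."""
    return search_key(term) in search_key(text)


print(mentions("Stillwell", "formerly of Still-well, Ill."))  # True
```

This is a plain substring test, so it will also match the term inside longer words. For a town name that is usually acceptable, but it is worth keeping in mind for short surnames.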

Kaithsburg

The town is actually Keithsburg. “Kaithsburg” is an easy typo to make.
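A one-letter slip like this is the kind of thing an approximate (“fuzzy”) comparison can catch. The sketch below uses `difflib.SequenceMatcher` from the Python standard library to flag words in a block of text that are close to the spelling you expect. The function name `near_matches` and the 0.85 cutoff are my own assumptions for illustration; a lower cutoff finds more variants but also more false hits.

```python
import re
from difflib import SequenceMatcher


def near_matches(target: str, text: str, cutoff: float = 0.85) -> list[str]:
    """Return the words in text whose similarity to target is at least cutoff."""
    target = target.lower()
    hits = []
    for word in re.findall(r"[A-Za-z]+", text):   # crude word split, ASCII letters only
        ratio = SequenceMatcher(None, target, word.lower()).ratio()
        if ratio >= cutoff:
            hits.append(word)
    return hits


print(near_matches("Keithsburg", "Mr. Neill of Kaithsburg"))  # ['Kaithsburg']
```

The same comparison also pairs “Trautvetter” with “Trautretter”, since only one letter differs in each case.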

——————–
Fortunately the dates and other details in the document are correct, based upon the actual record of the marriage. But it never hurts to keep some of these things in mind when searching digital versions of newspapers.

And if you have the date of an event, a manual search is still a good idea, just in case.

2 thoughts on “A Few OCR Thoughts Based on a Marriage Announcement”

  1. Good examples of why our searches don’t produce results! It’s a miracle they work as well as they do. If you know a date and location, actually reading the newspaper (online or on microfilm) may be the only way to locate the information.

  2. Clorinda Madsen says:

    The Ancestor Hunt blog has put out an article on common mistakes that OCR can make and which letter substitutions can be useful in searching.

    Also, if you ever have the chance to look at raw OCR data and compare it to the actual print, you’ll be glad that we actually have the number of hits we do. Sometimes, it is just downright nasty.
