Saturday, January 29, 2011

Using Wordle to Generate a Family Name Cloud

Now that I've finished Randy's Saturday Night Genealogy Fun, I decided to have my own fun by playing with word clouds.

Earlier today I was in a presentation where my boss had used the Wordle tool to generate a word cloud that described our library. A word cloud visually captures a snapshot of text and presents the words in different font sizes and weights to illustrate how frequently each word is used within that text.

This is a tool I'd heard of before and had played around with a little, but I had never seen it used in a meaningful or illustrative way. I wondered if I could find an interesting way to generate a genealogy-related word cloud. In particular, I wondered if I could somehow extract surnames from my genealogy software and create a cloud that would illustrate how frequently a family name appears in my database.

I have to say I'm fairly pleased with the results:

For privacy reasons, I decided to first exclude all living people from my cloud, since there are a handful of names used in recent generations only among the living. To generate the data (using Reunion), I:
  1. Identified all non-living people using one of Reunion's preset searches. It finds all people with a death date, death place, or burial date, or who are over 100 years old. Imperfect, I know, since it is possible to have relatives living more than 100 years (I have none). It also omits anyone who has no birth or death dates at all -- many of whom in my database are in fact deceased. Nevertheless, I got a decent-sized sample of 590 people.
  2. Marked the people Reunion had identified as non-living, then exported a text file of their surnames.
  3. Copied and pasted the list of names into Wordle to generate the word cloud. Once in Wordle, you can play with fonts, colors and layouts, though Wordle determines the sizes of the words. (Regarding privacy: the site claims that, because I did not save my Wordle to their gallery, none of my text was saved to their site: http://www.wordle.net/faq#secure)
My cloud turned out to be a pretty decent visual representation of the families populating my database. The largest names do tend to be the most populous. And in general, most of the names in the cloud that I can read with "the naked eye" (with a couple of exceptions) are families I remember researching and entering in Reunion.
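
For anyone who wants to check the numbers behind a cloud like this, here is a minimal Python sketch that tallies how often each surname appears in a list with one surname per line. It is not part of Reunion or Wordle; the function name and the sample names are made up for illustration, and it assumes the exported file is plain text with one surname per line.

```python
from collections import Counter


def surname_counts(lines):
    """Tally surnames from an iterable of lines, skipping blank lines."""
    counts = Counter()
    for line in lines:
        name = line.strip()  # drop the trailing newline and stray spaces
        if name:
            counts[name] += 1
    return counts


# Made-up sample standing in for an exported surname list (an assumption)
sample = ["Smith", "Jones", "Smith", "", " Smith ", "Brown", "Jones"]

# Print the surnames from most to least frequent
for name, count in surname_counts(sample).most_common():
    print(name, count)
```

To run it against a real export, the sample list could be swapped for an open file, for example `surname_counts(open("surnames.txt"))`, where `surnames.txt` is a hypothetical file name. The names with the highest counts should be the ones that appear largest in the cloud.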

4 comments:

  1. Sara,

    I was looking for your email address on your Blog. Sorry, I must have missed it.

    We may have a couple of things in common. Worrall, Sharpless ... I'm just in the next county to Hunterdon, go through Trenton lots, actually around it, to Burlington County ...

    As some would say "we gotta talk".

    Thank you,

    Russ

    hrworth@gmail.com

  2. Great minds think alike! I recently did a Wordless Wednesday post using a Wordle image of the surnames in my family tree: http://goo.gl/uLjwT. I have a Surnames page on my blog which lists all the surnames of direct ancestors (my own, my husband's and my children's). I generated the image directly from that page by entering the URL in Wordle.

  3. Caroline, that's a great idea for generating a family name Wordle. I'd originally tried doing something with my blog, but wasn't liking the results. I'd thought of doing a names page of sorts for this blog, but haven't gotten to it -- you've inspired me to think more seriously about producing it.

  4. I feel sheepish that I'd never heard of Wordle before reading your post. I've been fiddling with it for the last half hour. Thank you.
