Sunday, January 27, 2008

Parsing Names with Honorifics

In Railscast #16, Ryan Bates goes over Virtual Attributes in Rails, using the standard example of storing first and last names but getting/setting full names. He uses the following simple snippet:


def full_name=(name)
  split = name.split(' ', 2)
  self.first_name = split.first
  self.last_name = split.last
end

Which -- given that the focus was on virtual attributes -- is fine for explanation. However, that snippet gets names like "Franklin Delano Roosevelt" wrong (it yields a last name of "Delano Roosevelt"). Here's a method which our 32nd President will like better:


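# Replaces each run of characters that are not letters, hyphens, or commas
# (whitespace and punctuation included) with one space, then trims the ends.
def clean(n, re = /[^[:alpha:],\-]+/)
    n.gsub(re, ' ').strip
end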

# Returns [first_name, last_name]. first_name is '' when there is only one
# word; last_name is nil when the name is blank.
# Leading/trailing spaces ignored.
def first_last_from_name(n)
    parts = clean(n).split(' ')
    [parts.slice(0..-2).join(' '), parts.last]
end

names = [
    "Bill! Merkin,PhD.",
    "Jim               Thurston Howell III   ",
    "Charo", 
    "Heywood Jablowmie",
    "Sergei Rodriguez-Ivanoviv",
    "Polly Romanesq. ",
    "   ", 
    "",
    ]
p names.map { |n| first_last_from_name n }
# => [["Bill", "Merkin,PhD"], ["Jim Thurston Howell", "III"], ["", "Charo"], ["Heywood", "Jablowmie"], ["Sergei", "Rodriguez-Ivanoviv"], ["Polly", "Romanesq"], ["", nil], ["", nil]]
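For comparison, here is a minimal sketch that runs the Railscast-style two-way split next to the last-word split on the name that started all this. The helper name `railscast_split` is made up for illustration, and this `clean` uses a single character class that keeps letters, hyphens, and commas, which is an assumption based on the sample output above.

```ruby
# Railscast-style split: everything after the first space is the last name.
# (railscast_split is a hypothetical name used only for this comparison.)
def railscast_split(name)
  parts = name.split(' ', 2)
  [parts.first, parts.last]
end

# Collapse each run of characters that are not letters, hyphens, or commas
# into a single space, then trim the ends. (Assumed cleanup rule.)
def clean(n, re = /[^[:alpha:],\-]+/)
  n.gsub(re, ' ').strip
end

# Everything before the final word is the first name; the final word is the last name.
def first_last_from_name(n)
  parts = clean(n).split(' ')
  [parts[0..-2].join(' '), parts.last]
end

fdr = "Franklin Delano Roosevelt"
p railscast_split(fdr)       # => ["Franklin", "Delano Roosevelt"]
p first_last_from_name(fdr)  # => ["Franklin Delano", "Roosevelt"]
```

Note that the two-way split also hands a one-word name like "Charo" back as both first and last name, since `first` and `last` of a one-element array are the same element.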

A regex is more extensible, and makes more sense for Perl refugees like me.


# Returns [first_name, last_name] (first_name is nil if there's only one word).
# Leading/trailing spaces ignored.
def first_last_from_name_re(n)
    n = clean(n)
    (n =~ / /) ? n.scan(/(.*)\s+(\S+)$/).first : [nil, n]
end

p names.map { |n| first_last_from_name_re n }
# => [["Bill", "Merkin,PhD"], ["Jim Thurston Howell", "III"], [nil, "Charo"], ["Heywood", "Jablowmie"], ["Sergei", "Rodriguez-Ivanoviv"], ["Polly", "Romanesq"], [nil, ""], [nil, ""]]

However, as someone who can't check in at the automatic kiosks in airports because -- no joke -- the credit card thinks my last name is "IV", I like this version better.


# Returns [first_name, last_name, appendix]
# (first name and appendix are nil if there aren't any).
# Leading/trailing spaces ignored.
# Pass appendix_re (a regex source string) to override the default suffix list.
def first_last_appendix_from_name_re(n, appendix_re = nil)
    n = clean(n)
    appendix_re ||= %q((I|II|III|IV|(?:jr|sr|m\.?d|esq|Ph\.?D)\.?))
    if n !~ / /
        [nil, n, nil]           # with no spaces return n as last name
    else
        n.scan(
          /\A(.*?)\s+           # everything up to the last name
           (\S+?)               # last name is last stretch of non-whitespace
           (?:                  # But! there may be an appendix.  Look for an optional group
             (?:,\s*|\s+)       #   that is set off by a comma or spaces
             #{appendix_re}     #   and that matches any of our standard honorifics.
             )?                 # but if not, don't worry about it.
           \Z/ix).first         # scan gives array of arrays; \A..\Z guarantees exactly one match
    end
end

p names.map { |n| first_last_appendix_from_name_re n }
# => [["Bill", "Merkin", "PhD"], ["Jim Thurston", "Howell", "III"], [nil, "Charo", nil], ["Heywood", "Jablowmie", nil], ["Sergei", "Rodriguez-Ivanoviv", nil], ["Polly", "Romanesq", nil], [nil, "", nil], [nil, "", nil]]
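The kiosk problem can be checked directly. What follows is a minimal, self-contained sketch of the same idea rather than a drop-in replacement: `SUFFIX_RE`, `split_name`, and the sample names are made up for illustration, and the cleanup step assumes commas are kept so that "King, Jr." style suffixes can still be recognized.

```ruby
# Hypothetical suffix list, matched case-insensitively. Extend as needed.
SUFFIX_RE = /(?:I|II|III|IV|(?:jr|sr|m\.?d|esq|ph\.?d)\.?)/i

# Returns [first_name, last_name, suffix]; missing pieces are nil.
def split_name(name)
  # Assumed cleanup: keep letters, hyphens, and commas; squash the rest to one space.
  n = name.gsub(/[^[:alpha:],\-]+/, ' ').strip
  return [nil, n, nil] unless n.include?(' ')

  # Lazy first name, lazy last name, then an optional suffix set off by a
  # comma or whitespace. Anchoring at both ends forces a single full match.
  m = n.match(/\A(.*?)\s+(\S+?)(?:(?:,\s*|\s+)(#{SUFFIX_RE}))?\z/)
  m ? m.captures : [nil, n, nil]
end

p split_name("Thurston Howell IV")       # => ["Thurston", "Howell", "IV"]
p split_name("Martin Luther King, Jr.")  # => ["Martin Luther", "King", "Jr"]
p split_name("Ivan Ivanov")              # => ["Ivan", "Ivanov", nil]
p split_name("Charo")                    # => [nil, "Charo", nil]
```

The suffix only counts when it stands alone as the final word, so a last name that merely starts with "I" or ends in "esq" is left alone.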

All three versions assume the family name comes last, which might make Japanese names (and those from other "FamilyName GivenNames" cultures) sad.
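One possible way around that, sketched under the assumption that the caller already knows which order a name is in: the helper `given_and_family` and its `family_first` option are hypothetical, and nothing here tries to guess the order from the name itself.

```ruby
# Hypothetical helper: returns [given_names, family_name], taking the family
# name from the front when family_first is true and from the back otherwise.
def given_and_family(name, family_first: false)
  parts = name.strip.split(/\s+/)
  return [nil, parts.first] if parts.size < 2

  if family_first
    [parts[1..-1].join(' '), parts.first]
  else
    [parts[0..-2].join(' '), parts.last]
  end
end

p given_and_family("Akira Kurosawa")                      # => ["Akira", "Kurosawa"]
p given_and_family("Kurosawa Akira", family_first: true)  # => ["Akira", "Kurosawa"]
```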


Friday, October 26, 2007

The Asdrubal Carrera Hall of Fame

Inspired by one of Tim McCarver's flights of fancy during the ALCS, I present The Asdrubal Carrera Hall of Fame, open to anyone in unique possession of a particular first name among Major League Baseball players: LIST.

You may be familiar with Honus Wagner, Eppa Rixey, Boog Powell or Yogi Berra. But have you heard the storied diamond exploits of Firpo Mayberry, Zoilo Versalles, Pi Schwert or Bevo LeBourveau? OK, then how about Mysterious Walker, The Only Nolan, or Phenomenal Smith?

For some dinnertime fun over the holidays, discuss the relative merits of Urban Shocker, Twink Twining, Pussy Tebeau, Bris Lord, Boob Fowler, Crazy Schmit, Creepy Crespi, Cuddles Marshall, Vinegar Bend Mizell, and Buttercup Dickerson. (Unfortunately, 12 other players keep Rusty Kuntz off this list.)

Other stunningly yclept combatants include Ambiorix Burgos, Alamazoo Jennings, Welcome Gaston, Chicken Hawks, Sixto Lezcano, Wheezer Dell, Yam Yaryan, Yo-Yo Davalillo, Admiral Schlei, Boss Schmidt, Brick Smith, Brickyard Kennedy, Broadway Jones, Cannonball Titcomb, Baby Doll Jacobson, Sweetbreads Bailey, Zaza Harvey, Bubbles Hargrave, Pickles Dillhoefer, Double Joe Dwyer, Cowboy Jones, Coot Veal, Mul Holland, Live Oak Taylor, Skyrocket Smith, Kaiser Wilhelm, Kewpie Pennington, Possum Whitted, Snooks Dowd, and Mox McQuery.

See the list for links to each player's Baseball Reference page. Nerds may additionally view the generating MySQL query here.
