<?xml version='1.0' encoding='UTF-8'?><rss xmlns:atom='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' version='2.0'><channel><atom:id>tag:blogger.com,1999:blog-4201613802871642889</atom:id><lastBuildDate>Fri, 01 Feb 2008 18:27:58 +0000</lastBuildDate><title>Vizsage</title><description/><link>http://vizsage.com/blog/</link><managingEditor>flip</managingEditor><generator>Blogger</generator><openSearch:totalResults>28</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4201613802871642889.post-8350476070712450886</guid><pubDate>Mon, 28 Jan 2008 01:42:00 +0000</pubDate><atom:updated>2008-01-27T20:15:47.258-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>acts</category><category domain='http://www.blogger.com/atom/ns#'>plugins</category><category domain='http://www.blogger.com/atom/ns#'>rails</category><category domain='http://www.blogger.com/atom/ns#'>views</category><category domain='http://www.blogger.com/atom/ns#'>acts_as_authenticated</category><category domain='http://www.blogger.com/atom/ns#'>templates</category><category domain='http://www.blogger.com/atom/ns#'>irb</category><category domain='http://www.blogger.com/atom/ns#'>ruby</category><category domain='http://www.blogger.com/atom/ns#'>log</category><category domain='http://www.blogger.com/atom/ns#'>debug</category><category domain='http://www.blogger.com/atom/ns#'>restful-authentication</category><category domain='http://www.blogger.com/atom/ns#'>attr_accessible</category><category domain='http://www.blogger.com/atom/ns#'>migrations</category><category domain='http://www.blogger.com/atom/ns#'>as</category><category domain='http://www.blogger.com/atom/ns#'>authenticated</category><category domain='http://www.blogger.com/atom/ns#'>console</category><category 
domain='http://www.blogger.com/atom/ns#'>layouts</category><title>Rails Lessons Learned the Hard Way</title><description>Things I've learned the hard way in Rails:
&lt;ul&gt;
&lt;li&gt;Views are rendered &lt;strong&gt;before&lt;/strong&gt; layouts, not the other way round.  Set an instance variable in app/views/monkeys/show.html.erb and it will be defined in app/views/layouts/monkey.html.erb, but not vice versa.
&lt;ul&gt;
  &lt;li&gt;set instance vars in view&lt;br /&gt;
      &lt;code&gt;@foo_val = find_foo_val&lt;/code&gt;
&lt;/li&gt;
  &lt;li&gt;pass variables to partials using&lt;br /&gt;
      &lt;code&gt;&lt;%= render :partial =&gt; "root/license", :locals =&gt; { :foo =&gt; @foo_val } -%&gt;&lt;/code&gt;
&lt;/li&gt;
  &lt;li&gt;use the instance var freely in the layout; it will take the value defined in the view&lt;/li&gt;
&lt;/ul&gt;
&lt;li&gt;Dump an object for bobo debugging through the console or log:&lt;br /&gt;
      &lt;code&gt;$stderr.puts tag_list.to_yaml&lt;/code&gt;
&lt;/li&gt;

&lt;li&gt;In a migration, if you define a unique index on an attribute, make sure both the index AND attribute are &lt;code&gt;:unique =&gt; true&lt;/code&gt;, or else you'll get no uniqueness validation from Rails:&lt;br /&gt;
&lt;pre&gt;&lt;code&gt;
   create_table  :monkeys do |t|
     # set :unique here
     t.string :name, :default =&gt; "", :null =&gt; false, :unique =&gt; true
   end
   # if you have :unique here
   add_index :monkeys, [:name], :name =&gt; :name,  :unique =&gt; true
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;&lt;li&gt;If you scaffold a User or other object with private data, MAKE SURE you &lt;a href="http://blog.wolfman.com/articles/2007/06/26/rest-scaffold_resource-security-warning"&gt;strip out fields you don't want a user setting or viewing&lt;/a&gt;:&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;Set attr_accessible, which controls data coming *in* -- prevents someone setting an attribute by stuffing in a form value.  &lt;/li&gt;
&lt;li&gt;In each view (.html.erb &amp;amp;c) and render method (to_xml), strip out fields you don't want anyone to see using the &lt;code&gt;:only =&gt; [:ok_to_see, :this_too]&lt;/code&gt; parameter.&lt;/li&gt;
&lt;li&gt;Set filter_parameter_logging, which controls what goes into your logs. (Logs should of course be outside the public purview, but 'Defense in Depth' is ever our creed.)&lt;/li&gt;
&lt;/ul&gt;

Using the restful-authentication generator as an example:&lt;br /&gt;
 &lt;ul&gt;
  &lt;li&gt;In the model, whitelist fields the user is allowed to set (this excludes things like confirmation code or usergroup):&lt;br /&gt;
      &lt;code&gt;attr_accessible :login, :email, :password, :password_confirmation&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;In the controller file, whitelist only the fields you wish to xml serialize:&lt;br /&gt;
      &lt;code&gt;format.xml  { render :xml =&gt; @user.to_xml(:only =&gt; [:first_name, :last_name]) }&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;In show.html.erb and edit.html.erb, strip out fields that shouldn't be seen.&lt;/li&gt;
  &lt;li&gt;In the controller (typically ApplicationController), blacklist fields from the logs:&lt;br /&gt;
      &lt;code&gt;filter_parameter_logging :password, :salt, "activation-code"&lt;/code&gt;
 &lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;I won't even tell you how often this happens to me: If you edit or install code in a plugin, &lt;strong&gt;restart the server&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"
  src="http://vizsage.com/assets/adsense-728x90.js"&gt;
&lt;/script&gt;
&lt;script type="text/javascript"
  src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;/div&gt;</description><link>http://vizsage.com/blog/2008/01/rails-lessons-learned-hard-way.html</link><author>flip</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4201613802871642889.post-5366061723671071075</guid><pubDate>Mon, 28 Jan 2008 00:34:00 +0000</pubDate><atom:updated>2008-01-27T20:06:16.943-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>whitespace</category><category domain='http://www.blogger.com/atom/ns#'>rails</category><category domain='http://www.blogger.com/atom/ns#'>honorific</category><category domain='http://www.blogger.com/atom/ns#'>name</category><category domain='http://www.blogger.com/atom/ns#'>regex</category><category domain='http://www.blogger.com/atom/ns#'>MD</category><category domain='http://www.blogger.com/atom/ns#'>match</category><category domain='http://www.blogger.com/atom/ns#'>appendix</category><category domain='http://www.blogger.com/atom/ns#'>parse</category><category domain='http://www.blogger.com/atom/ns#'>sr</category><category domain='http://www.blogger.com/atom/ns#'>ruby</category><category domain='http://www.blogger.com/atom/ns#'>jr</category><category domain='http://www.blogger.com/atom/ns#'>virtual</category><category domain='http://www.blogger.com/atom/ns#'>attributes</category><title>Parsing Names with Honorifics</title><description>&lt;p&gt;In &lt;a href="http://railscasts.com/episodes/16"&gt;Railscast #16&lt;/a&gt;, Ryan Bates goes over Virtual Attributes in Rails, using the standard example of storing first and last names but getting/setting full names. He uses the following simple snippet:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
def full_name=(name)
  split = name.split(' ', 2)
  self.first_name = split.first
  self.last_name = split.last
end&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;Which -- given that the focus was on virtual attributes -- is fine for explanation.  However, that snippet will fail on names like "Franklin Delano Roosevelt" (it yields a last name of "Delano Roosevelt").  Here's a method which our 32nd President will like better:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
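# Collapse every run of whitespace or punctuation into a single space.
# Letters, hyphens and commas survive (the appendix regex further down relies on the comma).
def clean(n, re = /(?:\s|[^[:alpha:],\-])+/)
  n.gsub(re, ' ').strip
end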

# Returns [first_name, last_name]; first_name is '' and last_name is nil when missing.
# Leading/trailing spaces ignored.
def first_last_from_name(n) 
    parts    = clean(n).split(' ')
    [parts.slice(0..-2).join(' '), parts.last]
end

names = [
    "Bill! Merkin,PhD.",
    "Jim               Thurston Howell III   ",
    "Charo", 
    "Heywood Jablowmie",
    "Sergei Rodriguez-Ivanoviv",
    "Polly Romanesq. ",
    "   ", 
    "",
    ]
p names.map { |n| first_last_from_name n }
# =&gt; [["Bill", "Merkin,PhD"], ["Jim Thurston Howell", "III"], ["", "Charo"], ["Heywood", "Jablowmie"], ["Sergei", "Rodriguez-Ivanoviv"], ["Polly", "Romanesq"], ["", nil], ["", nil]]
&lt;/code&gt;&lt;/pre&gt; 
&lt;p&gt;A &lt;a href="http://www.regular-expressions.info/tutorial.html"&gt;regex&lt;/a&gt; is more extensible, and makes more sense for Perl refugees like me.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
# Returns [first_name, last_name] (or nil if there isn't any).
# Leading/trailing spaces ignored.
def first_last_from_name_re(n)
    n = clean(n); 
    (n =~ / /) ? (n.scan(/(.*)\s+(\S+)$/).first) : [nil, n]     
end

p names.map { |n| first_last_from_name_re n }
# =&gt; [["Bill", "Merkin,PhD"], ["Jim Thurston Howell", "III"], [nil, "Charo"], ["Heywood", "Jablowmie"], ["Sergei", "Rodriguez-Ivanoviv"], ["Polly", "Romanesq"], [nil, ""], [nil, ""]]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;However, as someone who can't check in at the automatic kiosks in airports because -- no joke -- the credit card thinks my last name is "IV", I like this version better.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
# Returns [first_name, last_name, appendix] 
# (first name and appendix are nil if there isn't any).
# Leading/trailing spaces ignored.
# 
def first_last_appendix_from_name_re(n, appendix_re = nil)
    n = clean(n)
    appendix_re ||= %q((I|II|III|IV|(?:jr|sr|m\.?d|esq|Ph\.?D)\.?))
    if (n !~ / /) then
        [nil, n, nil]           # with no spaces return n as last name
    else
        n.scan(
          /\A(.*?)\s+           # everything up to the last name
           (\S+?)               # last name is last stretch of non-whitespace
           (?:                  # But! there may be an appendix.  Look for an optional group
             (?:,\s*|\s+)       #   that is set off by a comma or spaces
             #{appendix_re}     #   and that matches any of our standard honorifics.
             )?                 # but if not, don't worry about it.
           \Z/ix).first         # scan gives array of arrays; \A..\Z guarantees exactly one match
    end
end

p names.map { |n| first_last_appendix_from_name_re n }
# =&gt; [["Bill", "Merkin", "PhD"], ["Jim Thurston", "Howell", "III"], [nil, "Charo", nil], ["Heywood", "Jablowmie", nil], ["Sergei", "Rodriguez-Ivanoviv", nil], ["Polly", "Romanesq", nil], [nil, "", nil], [nil, "", nil]]
&lt;/code&gt;&lt;/pre&gt;
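&lt;p&gt;To tie this back to the virtual-attribute example, here is a minimal sketch of a &lt;code&gt;full_name=&lt;/code&gt; setter built on the same idea. It is an illustration under assumptions, not the Railscast code: the &lt;code&gt;Person&lt;/code&gt; Struct, its &lt;code&gt;suffix&lt;/code&gt; field and the &lt;code&gt;SUFFIX_RE&lt;/code&gt; constant are hypothetical names, a plain Struct stands in for an ActiveRecord model so the snippet runs on its own, and it splits on spaces and commas instead of using the big regex above.&lt;/p&gt;

```ruby
# Assumed list of suffixes (hypothetical constant); extend to taste.
SUFFIX_RE = /\A(?:I|II|III|IV|jr|sr|esq|m\.?d|ph\.?d)\.?\z/i

# Hypothetical stand-in for an ActiveRecord model, so the sketch runs standalone.
Person = Struct.new(:first_name, :last_name, :suffix) do
  # Virtual attribute writer: split a full name into its stored parts.
  def full_name=(name)
    parts = name.to_s.split(/[\s,]+/).reject { |part| part.empty? }
    # Peel off a trailing suffix only if something is left over to be the last name.
    self.suffix = nil
    self.suffix = parts.pop if (parts[1] and SUFFIX_RE.match?(parts.last))
    self.last_name  = parts.pop
    self.first_name = parts.empty? ? nil : parts.join(' ')
  end

  # Virtual attribute reader: reassemble the parts, skipping the missing ones.
  def full_name
    [first_name, last_name, suffix].compact.join(' ')
  end
end

person = Person.new
person.full_name = "Jim Thurston Howell, III"
p [person.first_name, person.last_name, person.suffix]
# prints ["Jim Thurston", "Howell", "III"]
```

&lt;p&gt;Keeping the suffix in its own field means a lone "IV" is still treated as a last name, since a suffix is only peeled off when another word remains.&lt;/p&gt;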
&lt;p&gt;All three versions might make Japanese (and other "FamilyName GivenNames" cultures) sad.&lt;/p&gt;</description><link>http://vizsage.com/blog/2008/01/parsing-names-with-honorifics.html</link><author>flip</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4201613802871642889.post-3129687550268833035</guid><pubDate>Wed, 23 Jan 2008 03:15:00 +0000</pubDate><atom:updated>2008-01-22T21:28:02.677-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>scrabble</category><category domain='http://www.blogger.com/atom/ns#'>scrabulous</category><category domain='http://www.blogger.com/atom/ns#'>foolish</category><category domain='http://www.blogger.com/atom/ns#'>software</category><category domain='http://www.blogger.com/atom/ns#'>facebook</category><category domain='http://www.blogger.com/atom/ns#'>engineering</category><category domain='http://www.blogger.com/atom/ns#'>programming</category><category domain='http://www.blogger.com/atom/ns#'>business</category><title>Copyright Disputes are usually Failures of Imagination</title><description>&lt;p&gt;Hasbro is &lt;a href="http://www.huffingtonpost.com/2008/01/11/hasbro-tries-to-shut-down_n_81176.html"&gt;trying to shut down Scrabulous&lt;/a&gt;, a successful online Scrabble game -- perhaps the most successful Facebook app to date.
&lt;/p&gt;&lt;p&gt;
On the one hand, I think that Hasbro is completely within their rights: it's a clear infringement. 
&lt;/p&gt;&lt;p&gt;
On the other hand, it's a departure from form (they have long licensed gray-market implementations), and a failure of imagination that doesn't account for important subtleties in software engineering and social networks.
&lt;/p&gt;&lt;p&gt;
On the software engineering end, all of the interesting computer Scrabble implementations I know of were created independently and *then* brought into the fold, to both parties' benefit.  Hasbro is a board game company: it doesn't, and shouldn't, employ the kind of brilliant independent software engineers who create new entries in the Scrabble ecosystem.  The other thing to note is that Scrabulous solves some difficult problems in a way no previous product has.
&lt;/p&gt;&lt;p&gt;
Here's a brief history of the important Scrabble programs I know of.  The first ones let you play against a computer; this requires a powerful artificial intelligence (AI) engine and an unobtrusive interface.  (The hard part is the AI; note that Scrabulous was written in Flash, a very constrained programming environment.)  Maven was the first Scrabble program that played at an expert level (at one time it was the best Scrabble player in the world).  Though developed independently, it was purchased by Hasbro (or their licensee) and adopted as the AI agent in the official Hasbro Scrabble software.  I don't believe that the official software has been updated for some years; it was Windows-only, and the &lt;a href="http://www.hasbro.com/games/adult-games/scrabble/home.cfm?page=Products/catalog&amp;amp;sort=displayname"&gt;official Scrabble site&lt;/a&gt; has no link to it.  &lt;a href="http://web.archive.org/web/20021212213306/http://www.doe.carleton.ca/%7Ejac/scrab.html"&gt;ACbot&lt;/a&gt;, another early implementation, was independently developed by James A Cherry and could play at a low-expert level.  A current offering is &lt;a href="http://web.mit.edu/jasonkb/www/quackle/"&gt;Quackle&lt;/a&gt;, a free Scrabble robot developed by a student at MIT.  Its AI engine is extremely strong (also one of the best players in the world) and its front end, while /quite/ rough, is usable and works on Windows/OSX/Linux.  All of these programs were written outside Hasbro's aegis.  They were developed by experts in computer artificial intelligence and game theory and are far superior to anything that was or could be developed in-house by a board game company.
&lt;/p&gt;&lt;p&gt;
Another approach lets you play against a person using the network in real time. One of the first was MarlDOoM -- a primitive (text only, pre-web technology) free online scrabble bulletin board.  It was developed by &lt;a href="http://www.math.toronto.edu/jjchew"&gt;John Chew&lt;/a&gt;, who at the time was simply a scrabble enthusiast but is now on the official Nat'l Scrabble Association's dictionary committee and the webmaster for their site -- I believe that implementing MarlDOoM helped bring this about.  There are modern programs and websites that are officially licensed and let you compete remotely.  However, their price or subscription fee exceeds the cost of the physical version, and they require that *both* parties pay for the game, which the physical version does not.
&lt;/p&gt;&lt;p&gt;
A third approach is 'scrabble by mail' -- one move every day or so, with as much or as little time commitment and deliberation as you care to devote.  If there are licensed products that allow this I'm not aware of them.
&lt;/p&gt;&lt;p&gt;
In all, here's what you'd like a compelling software version of a board game to offer:
&lt;ol&gt;
&lt;li&gt;Play from any computer, anywhere; simple to acquire, install and use.&lt;/li&gt;&lt;li&gt;Reasonable price compared to the physical game&lt;/li&gt;
&lt;li&gt;Skill level:
&lt;ul&gt;
  &lt;li&gt;Enjoyable for an expert player&lt;/li&gt;
  &lt;li&gt;Enjoyable for a casual player&lt;/li&gt;
  &lt;li&gt;A casual player and a strong player may enjoy a game where their focus is on socializing and not gameplay&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;Time commitment:
&lt;ul&gt;
  &lt;li&gt;Play for 10 minutes at a time -- a quick diversion.&lt;/li&gt;
  &lt;li&gt;Play for an hour at a time -- a leisure activity&lt;/li&gt;
  &lt;li&gt;Play without having to meet at the same time&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;Social play:
&lt;ul&gt;
  &lt;li&gt;Play remotely against a friend, in real time (complete a game at one sitting)&lt;/li&gt;
  &lt;li&gt;Play remotely in a "Chess by Mail" context: make a move every day or so, when you have time.&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;Competitive play:
&lt;ul&gt;&lt;li&gt;Compete remotely against a skill-matched stranger, in real time or move-a-day&lt;/li&gt;
&lt;li&gt;Track durable competitive rankings &lt;/li&gt;
&lt;li&gt;Tie those ratings to a reputation system to prevent gaming the rating mechanism.&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ol&gt;
&lt;/p&gt;&lt;p&gt;
None of the licensed programs or sites, as far as I know, cost less than the one-time-only, one-person-plays price of a physical Hasbro scrabble.  Scrabulous is free, requires only a browser, and is available from any computer anywhere.  It provides a simple experience that my computer-incompetent mom can enjoy. (As far as she knows, facebook *is* a scrabble program.) 
&lt;/p&gt;&lt;p&gt;
Scrabulous is the first solution that enables me (an intermediate tournament-level player) to play remotely against any of my casual-level friends -- friends who would never pay for, or seek out, or regularly visit, a Scrabble-only site.  My friend Jen lives in Shanghai -- no previous approach that I'm aware of lets me play on my lunch break against her on her lunch break.  None let me *easily* discover when a casual friend is on: all require that you go to their sandbox when you want to play, and that all the people you'd like to compete with patronize the same sandbox.  None of them let me jump in / jump out for a quick 10-minute timewaster.   Since Scrabulous/Facebook is part of a compelling portal, it's natural to check in and meet friends; it understands my social network; and the play-by-turns feature lets me scale the time commitment and schedule.
&lt;/p&gt;&lt;p&gt;
No previous approach effectively prevents a cheater from manipulating his rating.  However, in Facebook you are a person: you have friends, you have a name, you are part of a community.  It's still feasible to be a troll or a sock-puppet, or to use any of the other strategies to game or disrupt a community rating, but there are barriers and consequences for doing so.
&lt;/p&gt;&lt;p&gt;
If Hasbro shuts down -- rather than licenses -- Scrabulous it will be a business failure.  They should be ecstatic that people are integrating scrabble into their social lives, and should see a modest halo effect in board game sales.  The revenue stream from Scrabulous' share of Facebook advertising is, I believe, quite significant -- enough for Hasbro and Scrabulous to both enjoy while keeping the game free.
&lt;/p&gt;&lt;p&gt;
More importantly, Social Network research consistently highlights the importance of "Network Effects" in technology adoption (http://en.wikipedia.org/wiki/Metcalfe%27s_law). There are many, many social games on Facebook, and if Scrabulous is taken down the large body of casual users will move to another entry in this niche.  After all, these games are only interesting if your friends also play.  Any Hasbro implementation must not only match the quality of Scrabulous' implementation, but must build a network of friends who select it for their social gameplay arena -- and they must build that network against the ill-will that will accrue from shutting Scrabulous down.
&lt;/p&gt;&lt;p&gt;
Creating a software program (and more importantly a community) like the one Scrabulous has is HARD: look at all of the previous attempts that have failed to get millions of people to play online.  It's hard because there are subtle and serious software engineering challenges, and it's hard because there are subtle and serious community building challenges.  If Hasbro shuts down the Scrabulous guys there's no reason to think Hasbro will be able to reproduce their success.&lt;/p&gt;</description><link>http://vizsage.com/blog/2008/01/copyright-disputes-are-usually-failures.html</link><author>flip</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4201613802871642889.post-905722303542146474</guid><pubDate>Thu, 17 Jan 2008 01:22:00 +0000</pubDate><atom:updated>2008-01-16T20:09:18.859-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>data</category><category domain='http://www.blogger.com/atom/ns#'>disk usage</category><category domain='http://www.blogger.com/atom/ns#'>mipmap</category><category domain='http://www.blogger.com/atom/ns#'>free</category><category domain='http://www.blogger.com/atom/ns#'>open</category><category domain='http://www.blogger.com/atom/ns#'>osx</category><category domain='http://www.blogger.com/atom/ns#'>utilities</category><category domain='http://www.blogger.com/atom/ns#'>usage</category><category domain='http://www.blogger.com/atom/ns#'>visualization</category><category domain='http://www.blogger.com/atom/ns#'>infographics</category><category domain='http://www.blogger.com/atom/ns#'>useful</category><category domain='http://www.blogger.com/atom/ns#'>apps</category><category domain='http://www.blogger.com/atom/ns#'>clustering</category><category domain='http://www.blogger.com/atom/ns#'>mac</category><title>The power of a good visualization</title><description>&lt;div style="text-align: left;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp1.blogger.com/_KypMAWXENa4/R46aSQy0NqI/AAAAAAAAACw/3wOs1kBNlFA/s1600-h/work-all-diskusage.jpg"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://bp1.blogger.com/_KypMAWXENa4/R46aSQy0NqI/AAAAAAAAACw/3wOs1kBNlFA/s320/work-all-diskusage.jpg" alt="" id="BLOGGER_PHOTO_ID_5156228261922223778" border="0" /&gt;&lt;/a&gt;I just found a program called &lt;a href="http://grandperspectiv.sourceforge.net/"&gt;Grand Perspective&lt;/a&gt; that 
presents your disk usage as an interactive treemap (see pic on the right).  Helping web nerds save hard drive space isn't finding hidden heart defects or keeping planes in the air, but I was struck by how well this program demonstrates the power of intelligent data exploration tools. Here are the &lt;a href="http://www.sciam.com/article.cfm?chanID=sa006&amp;amp;colID=13&amp;amp;articleID=00033494-443B-1237-81CB83414B7FFE9F"&gt;Tufte criteria&lt;/a&gt; for information presentation:
&lt;blockquote&gt;&lt;strong&gt;Documentary · Comparative · Causal · Explanatory · Quantified · Multivariate · Exploratory · Skeptical&lt;/strong&gt;&lt;/blockquote&gt;Each box is a file, and each top-level directory takes a contiguous rectangular portion of the view.  Scanning a 350GB disk with a /lot/ of tiny files (5+ million for just the far top left corner, the MLB gameday dataset) took &amp;lt; 5 minutes.  You may highlight any box in a segment and navigate "down" to make that segment fill the screen, and may choose to color files by location, depth, name or extension (exploratory, multivariate).

&lt;/div&gt;The giant orange box in the top left was 15GB of pure junk -- apparently a CGI script generating some page I was screenscraping went crazy and sent me 15GB of junk data, the same line repeated nearly a billion times. I had /no/ idea it was sitting there. That dataset was supposed to be huge, so I had never drilled into the directory beyond my standard &lt;tt&gt;du -sc | sort -n&lt;/tt&gt; on the containing directory. The picture, however, showed at a glance what a table of numbers had dramatically failed to show: that the directory consumed twice as much as it should. The &lt;span style="font-weight: bold;"&gt;simple metaphor&lt;/span&gt; of diskspace=area and the &lt;span style="font-weight: bold;"&gt;whole-disk view&lt;/span&gt; (explanatory, documentary) highlighted something important I'd never noticed.

The giant cluster in the bottom right corner is a huge (~51GB) collection of video ephemera I only kinda cared about. I planned, someday, to sort them -- but for that effort and 51GB usage, it was clearly not worth it. By  &lt;span style="font-weight: bold;"&gt;enforcing comparisons&lt;/span&gt;, the data display made me reconsider the value vs. resource consumption of that project and make a more sound decision.
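
The diskspace=area metaphor is simple enough to sketch in a few lines of Ruby. This is only an illustration, assuming a naive "slice-and-dice" layout that may well differ from what Grand Perspective actually does; the names total_size and treemap and the toy disk hash are made up for the example.

```ruby
# Total size of a node: a number is a file, a Hash is a directory.
def total_size(node)
  node.is_a?(Hash) ? node.values.sum { |child| total_size(child) } : node
end

# Slice-and-dice layout: split the rectangle (x, y, w, h) among the children
# in proportion to their sizes, alternating the split direction at each depth.
# Returns a flat list of [path, x, y, w, h] entries, one per file.
def treemap(node, x, y, w, h, path = [], depth = 0)
  return [[path.join('/'), x, y, w, h]] unless node.is_a?(Hash)
  total  = total_size(node).to_f
  offset = 0.0
  node.flat_map do |name, child|
    share = total.zero? ? 0.0 : total_size(child) / total
    rect  = if depth.even?
              [x + offset * w, y, share * w, h]   # slice left to right
            else
              [x, y + offset * h, w, share * h]   # slice top to bottom
            end
    offset += share
    treemap(child, *rect, path + [name], depth + 1)
  end
end

# Toy disk (hypothetical sizes in GB) drawn into a 100 x 100 view.
disk = { music: 25, video: { clips: 50, films: 25 } }
treemap(disk, 0, 0, 100, 100).each do |path, _x, _y, w, h|
  p [path, (w * h).round]
end
# prints ["music", 2500]
#        ["video/clips", 5000]
#        ["video/films", 2500]
```

Each file gets a rectangle whose area is proportional to its size, and a directory's files stay together in one contiguous block, which is why an oversized directory jumps out at a glance.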
In all, I freed up almost 100GB and put a few bucks in the author's tip jar.  Joe Bob says &lt;a href="http://grandperspectiv.sourceforge.net/"&gt;Check it out&lt;/a&gt;.  (&lt;a href="http://lifehacker.com/software/disk-space/geek-to-live--visualize-your-hard-drive-usage-219058.php"&gt;Similar programs&lt;/a&gt; exist for Linux (Baobab) and Windows (WinDirStat) too.)</description><link>http://vizsage.com/blog/2008/01/power-of-good-visualization.html</link><author>flip</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4201613802871642889.post-701349001020216456</guid><pubDate>Tue, 15 Jan 2008 02:08:00 +0000</pubDate><atom:updated>2008-01-14T20:16:37.589-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>infographic</category><category domain='http://www.blogger.com/atom/ns#'>data</category><category domain='http://www.blogger.com/atom/ns#'>personal</category><category domain='http://www.blogger.com/atom/ns#'>visualization</category><category domain='http://www.blogger.com/atom/ns#'>neato</category><category domain='http://www.blogger.com/atom/ns#'>graphic</category><category domain='http://www.blogger.com/atom/ns#'>metadata</category><category domain='http://www.blogger.com/atom/ns#'>music</category><category domain='http://www.blogger.com/atom/ns#'>design</category><category domain='http://www.blogger.com/atom/ns#'>restaurants</category><category domain='http://www.blogger.com/atom/ns#'>yearly</category><title>The 2007 Feltron Annual Report</title><description>&lt;a href="http://feltron.com/index.php?/content/2007_annual_report/"&gt;The 2007 Feltron Annual Report&lt;/a&gt; is available now.  In a series of elegant infographics, see the ambit of places Mr. Feltron walked to in Brooklyn and Manhattan, review how many albums he bought in the year (12 CDs, 1 LP and 98 download tracks), and how often he visited bars in October (6 times; he made 57 total bar visits in the year, down 39% from 2006).  My print copy is on the way.  (See also &lt;a href="http://feltron.com/06report_index.html"&gt;last year's report&lt;/a&gt;.)

Metadata is the new Eyeballs (which is the old Interaction).</description><link>http://vizsage.com/blog/2008/01/2007-feltron-annual-report.html</link><author>flip</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4201613802871642889.post-8305377648952151972</guid><pubDate>Tue, 08 Jan 2008 18:48:00 +0000</pubDate><atom:updated>2008-01-27T19:40:56.893-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>software</category><category domain='http://www.blogger.com/atom/ns#'>tracker</category><category domain='http://www.blogger.com/atom/ns#'>bug</category><category domain='http://www.blogger.com/atom/ns#'>search</category><category domain='http://www.blogger.com/atom/ns#'>time</category><category domain='http://www.blogger.com/atom/ns#'>fast</category><category domain='http://www.blogger.com/atom/ns#'>delay</category><category domain='http://www.blogger.com/atom/ns#'>report</category><category domain='http://www.blogger.com/atom/ns#'>idea</category><category domain='http://www.blogger.com/atom/ns#'>clock</category><category domain='http://www.blogger.com/atom/ns#'>late</category><category domain='http://www.blogger.com/atom/ns#'>NTP</category><category domain='http://www.blogger.com/atom/ns#'>procrastinator</category><title>More things I wish someone else will write</title><description>&lt;p&gt;More random software ideas:&lt;/p&gt;&lt;ul&gt;
&lt;li&gt;Google search, restricted to find bug reports only.  You'd crawl usenet, sourceforge/google code, build farms (debian etc.), open issue trackers, mailing list archives and blogs; extract things in 'pre' tags; and look for repeated stanzas (these indicate where a bug was pasted in).&lt;/li&gt;

&lt;li&gt;NTP server along the lines of the &lt;a href="http://davidseah.com/blog/comments/a-chindogu-clock-for-procrastinators/#commentstart"&gt;procrastinator's clock&lt;/a&gt;, that would dither the time (by shortening or lengthening each second) so that it runs up to a set amount fast, and never slow.  You'd have to be careful with rsync, server logs, kerberos/cookie session stores/other authentication... or maybe just use it at the app level, if your clocks will use NTP themselves.&lt;/li&gt;
&lt;/ul&gt;</description><link>http://vizsage.com/blog/2008/01/more-things-i-wish-someone-else-will.html</link><author>flip</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4201613802871642889.post-7048843856909344856</guid><pubDate>Mon, 07 Jan 2008 15:56:00 +0000</pubDate><atom:updated>2008-01-07T09:58:01.676-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>mechanical</category><category domain='http://www.blogger.com/atom/ns#'>woodworking</category><category domain='http://www.blogger.com/atom/ns#'>periodic</category><category domain='http://www.blogger.com/atom/ns#'>fasteners</category><category domain='http://www.blogger.com/atom/ns#'>screws</category><category domain='http://www.blogger.com/atom/ns#'>reference</category><category domain='http://www.blogger.com/atom/ns#'>machining</category><category domain='http://www.blogger.com/atom/ns#'>guide</category><category domain='http://www.blogger.com/atom/ns#'>scale</category><category domain='http://www.blogger.com/atom/ns#'>table</category><category domain='http://www.blogger.com/atom/ns#'>bolts</category><title>Reference Cards</title><description>&lt;p&gt;Here are some pretty  &lt;a href="http://vizsage.com/other/cheatsheets/"&gt;reference cards&lt;/a&gt; I made a while back:&lt;/p&gt;&lt;ul&gt;
    &lt;li&gt;&lt;strong&gt;&lt;a href="http://vizsage.com/other/cheatsheets/ScaleLandmarks.pdf"&gt;Scale Landmarks&lt;/a&gt;&lt;/strong&gt;:    What's something you're familiar with that is about 10 nm big?  How do the speed of continental drift, a raindrop, a champion sprinter and an SR-71A Blackbird compare? What is the range between the least massive (electron) and most massive (universe) objects science can describe?&lt;/li&gt;
    &lt;li&gt;&lt;strong&gt;&lt;a href="http://vizsage.com/other/cheatsheets/PeriodicTable.pdf"&gt;Periodic Table&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
    &lt;li&gt;&lt;strong&gt;&lt;a href="http://vizsage.com/other/cheatsheets/PeriodicTableFlat.pdf"&gt;Periodic Table, Flat&lt;/a&gt;&lt;/strong&gt; -- material properties as a table and not as Mendeleev puts it.&lt;/li&gt;
    &lt;li&gt;&lt;strong&gt;&lt;a href="http://vizsage.com/other/cheatsheets/MechanicalInfo-Fasteners.pdf"&gt;Mechanical, Geometric and Material properties of Screws, Bolts and Fasteners&lt;/a&gt;&lt;/strong&gt; - 
        probably the most useful among these, this gives thread geometry, decimal inch/screw/metric equivalents, mechanical strengths, torque ratings and more.  Super handy for machining or general shop use.&lt;/li&gt;
    &lt;li&gt;A similar table for &lt;strong&gt;&lt;a href="http://vizsage.com/other/cheatsheets/MechanicalInfo-ANHardware.pdf"&gt;AN Hardware&lt;/a&gt;&lt;/strong&gt; (milspec fasteners used in airplanes, racecars and hot rods).&lt;/li&gt;
    &lt;li&gt;&lt;strong&gt;&lt;a href="http://vizsage.com/other/cheatsheets/MechanicalInfo-DecimalEquivalents.pdf"&gt;A flat table of decimal equivalents&lt;/a&gt;&lt;/strong&gt;: decimal and fractional inch, metric, and standard (US) screw sizes.&lt;/li&gt;
    &lt;li&gt;&lt;strong&gt;&lt;a href="http://vizsage.com/other/cheatsheets/MechanicalInfo-ASCIIChart.pdf"&gt;ASCII Chart&lt;/a&gt;&lt;/strong&gt; - easily index up hex, octal, ascii, symbol font/latin font/DOS font values for characters.&lt;/li&gt;
    &lt;/ul&gt;</description><link>http://vizsage.com/blog/2008/01/reference-cards.html</link><author>flip</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4201613802871642889.post-559955923258779283</guid><pubDate>Sun, 06 Jan 2008 06:10:00 +0000</pubDate><atom:updated>2008-01-06T21:51:25.636-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>data</category><category domain='http://www.blogger.com/atom/ns#'>TAGS</category><category domain='http://www.blogger.com/atom/ns#'>reclaim</category><category domain='http://www.blogger.com/atom/ns#'>tagging</category><category domain='http://www.blogger.com/atom/ns#'>flickr</category><category domain='http://www.blogger.com/atom/ns#'>tag</category><category domain='http://www.blogger.com/atom/ns#'>ratings</category><category domain='http://www.blogger.com/atom/ns#'>semantic</category><category domain='http://www.blogger.com/atom/ns#'>amazon</category><category domain='http://www.blogger.com/atom/ns#'>screenscrape</category><title>Owning my Metadata</title><description>&lt;p&gt;Dear Lazyweb,&lt;/p&gt;&lt;p&gt;

I'd like someone to invent a 'Metadata reclaimer': a program to screenscrape all my amazon ratings, flickr tags, facebook posts, etc.&lt;/p&gt;&lt;p&gt;

I try, as far as possible, to only use apps that let me keep ownership of my metadata.  As our friend &lt;a href="http://pud.com"&gt;pud&lt;/a&gt; has remarked, all successful internet enterprises share the same business model: either &lt;ul&gt;&lt;li&gt;&lt;strong&gt;People pay to Enter Data into your Database&lt;/strong&gt; (eBay, Google AdWords, Flickr, Second Life, World of Warcraft, IMDB pro, Craigslist), or less defensible,&lt;/li&gt;&lt;li&gt;&lt;strong&gt;People Enter Data into Your Database For Free while Other People Pay to Get it Out&lt;/strong&gt; (rapidshare, iTunes Music Store, Pud's Internal Memos; with youtube, myspace, epinions etc viewers pay with the tenuous currency of their ad brain).&lt;/li&gt;&lt;/ul&gt;&lt;/p&gt;&lt;p&gt;
There's nothing wrong with that; all these companies levelled their playing field in some fundamental and important way. (Well, nothing wrong unless you're the loathsome gracenote.com (formerly cddb), who turned an open community-generated resource into a closed database, without even the courtesy of a copy to fork from.)  &lt;/p&gt;&lt;p&gt;But it's fair to ask that I be able to export &lt;strong&gt;my&lt;/strong&gt; copy of the data I've added to their business asset, and to do so easily.&lt;p&gt;&lt;/p&gt;

Sites that play well with others:&lt;ul&gt;&lt;li&gt;my del.icio.us tags and bookmarks&lt;/li&gt;&lt;li&gt;my bloglines/google reader feeds&lt;/li&gt;&lt;li&gt;my librarything.com everything&lt;/li&gt;&lt;li&gt;my last.fm history&lt;/li&gt;&lt;li&gt;my iTunes playcounts, tags and ratings: mostly, I think?&lt;/li&gt;&lt;li&gt;Firefox bookmarks and history&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Sites with an 'I gave up my metadata and all I got was this stupid webpage' policy:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;facebook posts, friends, photos, everything&lt;/li&gt;&lt;li&gt;flickr tags &amp;c&lt;/li&gt;&lt;li&gt;amazon recommendations&lt;/li&gt;&lt;li&gt;Google calendar mostly no (at least, the last time I tried to sync my address books it was a Giant Pain in the Ass: nothing was durably id'd and recurring events were semantically incorrect. (Yes, I'd love to have 96 separate entries for my Grandmother's birthday!))&lt;/li&gt;&lt;li&gt;eBay bids, purchases, ratings&lt;/li&gt;&lt;li&gt;Blogger: Blogs, yes if you remote host your site.  However, you can't even /list/ the blogger comments you've made, let alone export them.&lt;/li&gt;&lt;li&gt;I believe Myspace's engineers can't even spell XML&lt;/li&gt;&lt;/ul&gt;(I could be wrong about any of these except the last one).&lt;/p&gt;&lt;p&gt;

I'm picturing something with a plugin architecture -- the main app handles the screenscraping, authentication, form submission, web crawling and file export details; the plugin supplies URL wildcards and regexp's the data back into semantic structure.  With XML export, a motivated plugin author or well-itched user could supply a decent XSLT stylesheet to represent that metadata in a useful local fashion (and with helpful links back to the main site).  It would be useful to have plugins (trivial) and stylesheets (no more or less so) even for sites like Last.fm and Library Thing that Do The Right Thing by granting transparent access to your metadata.&lt;/p&gt;&lt;p&gt;
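
Here's roughly the split I have in mind, as a minimal Python sketch and nothing more.  Everything in it is made up for illustration: the Plugin class, the scrape and to_xml helpers, the example-shop site and its page text.  The host app would own the fetching and the XML export; the plugin is just a URL wildcard, a regexp and some field names.&lt;/p&gt;&lt;pre&gt;
```python
import fnmatch
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Plugin:
    """Hypothetical plugin: one site, one URL wildcard, one record regexp."""
    site: str
    url_glob: str     # which pages this plugin knows how to read
    record_re: str    # one match per record, one group per field
    fields: tuple     # names for the regexp groups, in order


def scrape(plugin, url, page_text):
    """Return a list of dicts, one per record the plugin finds on the page."""
    if not fnmatch.fnmatchcase(url, plugin.url_glob):
        return []
    return [dict(zip(plugin.fields, groups))
            for groups in re.findall(plugin.record_re, page_text)]


def to_xml(plugin, records):
    """Serialize records as XML: one element per record, fields as attributes."""
    root = ET.Element("metadata", site=plugin.site)
    for record in records:
        ET.SubElement(root, "item", record)
    return ET.tostring(root, encoding="unicode")


# Made-up site and page text, purely for illustration.
ratings = Plugin(
    site="example-shop",
    url_glob="http://shop.example.com/ratings/*",
    record_re=r"Title: (.+?) \| Rating: (\d)",
    fields=("title", "rating"),
)
page = "Title: Dune | Rating: 5\nTitle: Emma | Rating: 3\n"
records = scrape(ratings, "http://shop.example.com/ratings/page1", page)
print(to_xml(ratings, records))
```
&lt;/pre&gt;&lt;p&gt;A real plugin would need at least two regexp groups per record for this zip trick to work, plus the authentication and crawling that the main app is supposed to handle; none of that is shown here.&lt;/p&gt;&lt;p&gt;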

Much of this may exist in some form or another; for example the &lt;a href="http://connectedflow.com/flickrexport/"&gt;Aperture/iPhoto plugin&lt;/a&gt; will apparently sync your flickr and iPhoto tags, and embed the result into the app database.  But going from XML =&gt; app is more flexible -- and possibly easier -- than the other way 'round.&lt;/p&gt;&lt;p&gt;

I &lt;a href="http://mrflip.com/resources/Ratings.html"&gt;one off'ed this a while back for my Amazon ratings&lt;/a&gt;, but I just saw where I'd gone from ~350 to ~650 'things rated' since then.  I'm hoping the LazyWeb has solved my problem, since I'm not sure where I put those scripts.  (Ironic, considering my previous post.)&lt;p&gt;</description><link>http://vizsage.com/blog/2008/01/owning-my-metadata.html</link><author>flip</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4201613802871642889.post-3876997615767394570</guid><pubDate>Sat, 05 Jan 2008 18:38:00 +0000</pubDate><atom:updated>2008-01-06T21:59:18.607-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>data</category><category domain='http://www.blogger.com/atom/ns#'>explore</category><category domain='http://www.blogger.com/atom/ns#'>screenscrape</category><category domain='http://www.blogger.com/atom/ns#'>ncdc</category><category domain='http://www.blogger.com/atom/ns#'>infochimp</category><category domain='http://www.blogger.com/atom/ns#'>retrosheet</category><category domain='http://www.blogger.com/atom/ns#'>perl</category><category domain='http://www.blogger.com/atom/ns#'>retrosheet.org</category><category domain='http://www.blogger.com/atom/ns#'>baseball</category><category domain='http://www.blogger.com/atom/ns#'>get</category><category domain='http://www.blogger.com/atom/ns#'>visualization</category><category domain='http://www.blogger.com/atom/ns#'>POST</category><category domain='http://www.blogger.com/atom/ns#'>vizsage</category><category domain='http://www.blogger.com/atom/ns#'>xml</category><category domain='http://www.blogger.com/atom/ns#'>python</category><category domain='http://www.blogger.com/atom/ns#'>weather</category><category domain='http://www.blogger.com/atom/ns#'>html</category><category domain='http://www.blogger.com/atom/ns#'>mashup</category><category domain='http://www.blogger.com/atom/ns#'>mash-up</category><title>50 years of Baseball Play-by-play data mashed with 50 years of hourly weather data FTW</title><description>&lt;p&gt;&lt;em&gt;&lt;small&gt;Note: I found this sitting in my drafts folder, unpublished.  It actually dates from October.&lt;/small&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I've had two interesting realizations from the &lt;a href="http://vizsage.com/blog/2007/10/hourly-weather-data-for-each-retrosheet.html"&gt;Retrosheet Baseball data vs. Hourly Weather information&lt;/a&gt; mashup I've implemented.  The first is how my two favorite scripting languages (Python and Perl) compare.  The second is that the hardest part of this process is actually the stupidest part.  There are four steps in doing an interesting visualization of open data, listed here in order of execution, which is also the order of decreasing difficulty and decreasing stupidity:&lt;/p&gt;&lt;ul&gt;
&lt;li&gt;Bring the data from behind its bureaucratic barriers&lt;/li&gt;
&lt;li&gt;Unlock it into a universal format&lt;/li&gt;
&lt;li&gt;Process and digest the data&lt;/li&gt;
&lt;li&gt;Actually explore, visualize and share the data&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
The hardest and least justifiable steps are the first two, a problem we have to fix.  [Edit: this is why I'm &lt;a href="http://infochimp.org"&gt;starting infochimp.org&lt;/a&gt;.]  &lt;/p&gt;&lt;p&gt;

Here's a longer description of how I did the baseball games / weather data mashup.
&lt;/p&gt;&lt;p&gt;

Several significant parts of this project were written in Perl, for its superior text handling and for the ease of XML::Simple (which I love); several other parts were done in Python, for its more gracious object-orientation.&lt;/p&gt;&lt;p&gt;

To suck in the Hourly Weather Data files, you have to &lt;a href="http://cdo.ncdc.noaa.gov/pls/plclimprod/poemain.accessrouter?datasetabbv=DS3505"&gt;click through a 4-screen web form process&lt;/a&gt; to prepare a query.  Although it sends the final form submission as a POST query, the backend script does accept a GET url (you know, where the data is sent in the URL form.pl?param=val&amp;amp;param2=val&amp;amp;submit=yay instead of in the HTTP request).  There's an excellent &lt;a href="https://www.squarefree.com/bookmarklets/forms.html"&gt;POST to GET bookmarklet&lt;/a&gt; that will take any webpage form and make the parameters appear in the URL.  No guarantees that the backend script will accept this, but it's always worth a twirl for screenscraping webpages or just trying to understand what's going on behind the curtain.&lt;/p&gt;&lt;p&gt;
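
Once you know the backend takes GET, building the URLs is a one-liner in either language.  Here's a minimal Python sketch; the endpoint and the field names (station, year, submit) are placeholders I made up, not the real NCDC form parameters.&lt;/p&gt;&lt;pre&gt;
```python
from urllib.parse import urlencode


def form_to_get_url(action_url, params):
    """Append form fields to the form's action URL as a query string."""
    return action_url + "?" + urlencode(params)


# Hypothetical endpoint and field names, for illustration only.
url = form_to_get_url(
    "http://data.example.gov/form.pl",
    {"station": "010010", "year": 1995, "submit": "yay"},
)
print(url)
```
&lt;/pre&gt;&lt;p&gt;urlencode takes care of escaping spaces and punctuation in the values, which is the part that's easy to get wrong by hand.&lt;/p&gt;&lt;p&gt;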

Now I need to know what queries to generate.  First I needed the location of each major league baseball stadium: Brian Foy posted a &lt;a href="http://www252.pair.com/comdog/google_earth/major_league_baseball_stadiums.kml"&gt;Google Earth index of Major League Stadiums&lt;/a&gt;, a structured XML file with latitude, longitude and other information.  I used the Perl XML::Simple package to bring in this file.  Its simple routines just pull in the XML file and create a data structure (hashes and arrays of hashes) that mirrors the XML tree.  The stream-based (SAX) parsers are burlier and more efficient, but for this one-off script, who cares?&lt;/p&gt;&lt;p&gt;
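
For comparison, the same pull-it-all-into-memory approach in Python looks something like the sketch below.  This is not the script I ran (that was Perl and XML::Simple); it uses ElementTree from the standard library, and it assumes each stadium is a Placemark holding a name and a single Point coordinate, which I haven't checked against every entry in the file.  The function names are mine.&lt;/p&gt;&lt;pre&gt;
```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def stadium_locations(kml_text):
    """Map each Placemark name to a (latitude, longitude) pair."""
    locations = {}
    for element in ET.fromstring(kml_text).iter():
        if local_name(element.tag) != "Placemark":
            continue
        found = {local_name(child.tag): (child.text or "").strip()
                 for child in element.iter()}
        if "name" in found and "coordinates" in found:
            # KML stores coordinates as "longitude,latitude[,altitude]".
            lon, lat = found["coordinates"].split(",")[:2]
            locations[found["name"]] = (float(lat), float(lon))
    return locations
```
&lt;/pre&gt;&lt;p&gt;Feed it the contents of the .kml file and you get back a plain dictionary, which is about as close to the XML::Simple hash-of-hashes as Python gets.  Note the longitude-first order inside the coordinates element; that one bites everybody once.&lt;/p&gt;&lt;p&gt;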

Next I needed the locations of all the weather stations.  Perl and Python both have excellent flat-file capabilities.  The global weather station directory is held in a flat file (meaning that each field is a fixed number of characters that line up in columns).  Here's the column header, a sample entry, and numbers showing the width of each field:&lt;pre&gt;
USAF   NCDC  STATION NAME                  CTRY  ST CALL  LAT    LON     ELEV*10
010010 99999 JAN MAYEN                     NO JN    ENJA  +70933 -008667 +00090
123456 12345 12345678901234567890123456789 12 12 12 1234  123123 1234123 123456&lt;/pre&gt;&lt;/p&gt;&lt;p&gt;

To break this apart, you just specify an 'unpack' format string.  'A' followed by a count means a text field that many (8-bit) ASCII characters wide, with trailing spaces stripped; 'x' means skip one junk character:&lt;pre&gt;
A6    xA5   xA29                          xA2xA2xA2xA4  xxA6    xA7     xA6
&lt;/pre&gt; The result is an array holding each interesting (non-'x') field. The Perl code snippet:&lt;pre&gt;    # Flat file format
    my $fmt    = "A6x    A5x   A29x                          A2xA2xA2xA4xx  A6x  A7x   A6";
    my @fields = qw{id_USAF id_WBAN name region country state callsign lat lng elev};
    # Pull in each line
    while (my $line = &amp;lt;WSTNSFLAT&amp;gt;) {
        next if length($line) &amp;lt; 79;   # skip blank or truncated lines
        chomp $line;
        # Unpack flat record
        my @flat = unpack($fmt, $line);
        # Process raw record 
        ...
    }
&lt;/pre&gt;  &lt;/p&gt;&lt;p&gt;I also grabbed the station files for Daily weather reports, since that data goes back much farther (generally, we have data since ~1945 for Hourly and since ~1900 for Daily).&lt;/p&gt;&lt;p&gt;
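
Python can do the same fixed-width trick as the Perl unpack above.  Here's a minimal sketch using the standard struct module as a stand-in; the field names are the ones from the Perl snippet, and the function name unpack_station is made up.  In struct, a count followed by 's' is a text field that many bytes wide and 'x' skips a byte; unlike Perl's 'A' it keeps the padding, so the trailing spaces get stripped by hand.&lt;/p&gt;&lt;pre&gt;
```python
import struct

# Same layout as the Perl template: "Ns" is an N-byte text field, "x" skips a byte.
RECORD = struct.Struct("6sx 5sx 29sx 2sx 2sx 2sx 4sxx 6sx 7sx 6s")
FIELDS = ("id_USAF", "id_WBAN", "name", "region", "country",
          "state", "callsign", "lat", "lng", "elev")


def unpack_station(line):
    """Split one fixed-width directory line into a dict of stripped strings."""
    raw = line.rstrip("\r\n").encode("ascii")[:RECORD.size]
    if len(raw) != RECORD.size:
        return None   # blank or truncated line
    values = (field.decode("ascii").rstrip() for field in RECORD.unpack(raw))
    return dict(zip(FIELDS, values))


# The sample entry from the listing above, with the name padded out to 29 columns.
sample = ("010010 99999 " + "JAN MAYEN".ljust(29)
          + " NO JN    ENJA  +70933 -008667 +00090")
print(unpack_station(sample))
```
&lt;/pre&gt;&lt;p&gt;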

Then I score each station by (Proximity and Amount-of-Data), and select the five best stations for each stadium.&lt;/p&gt;&lt;p&gt;
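
The scoring step can be sketched in a few lines of Python.  Big caveat: the formula below (distance divided by years of data, lower is better) is invented for this example and is not the weighting I actually used; the dictionary keys (lat, lon, years_of_data) are likewise made up.  Only the great-circle distance is standard.&lt;/p&gt;&lt;pre&gt;
```python
import math

EARTH_RADIUS_KM = 6371.0


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def score(stadium, station):
    """Made-up score (lower is better): distance discounted by years of data."""
    km = distance_km(stadium["lat"], stadium["lon"],
                     station["lat"], station["lon"])
    return (km + 1.0) / station["years_of_data"]


def best_stations(stadium, stations, count=5):
    """Return the `count` best-scoring stations that have any data at all."""
    usable = [s for s in stations if s["years_of_data"]]
    return sorted(usable, key=lambda s: score(stadium, s))[:count]
```
&lt;/pre&gt;&lt;p&gt;Whatever formula you pick, the shape is the same: throw out stations with nothing to offer, sort the rest, keep the top five.&lt;/p&gt;&lt;p&gt;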

Now, I could of course use Perl to generate the POST request using the HTTP modules, but it was simpler to spit out an HTML file with a big matrix of URLs for each station, for a subset of years, and then just mindlessly control-click on a dozen links at a time and answer each form.  (You can see the linkdump file here: http://vizsage.com/apps/baseball/results/weather/ParkWeatherGetterDirectory.html)&lt;/p&gt;&lt;p&gt;

I also use perl to clean up the XML generated by the MySQL Query Browser -- which returns a flat XML file with all fields as content, not attributes.  I just suck the file in with XML::Simple, walk down the resultant hash to create a saner (and semantic) data structure, then spit back out as XML.&lt;/p&gt;&lt;p&gt;

The python parts are not terribly interesting. I pull in the flat file, clean up a few data fields and convert in-band NULLs into actual NULLs (they use 99999 to represent a null value in a 5-digit field, for instance) then export the data as a CSV file (for a MySQL LOAD DATA INFILE query).  I chose python for this part because I find its object model cleaner -- it's easier to toss structured records around -- and the CSV module is a tad nicer. &lt;/p&gt;&lt;p&gt;
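
A stripped-down sketch of that cleanup, for the curious.  The rule here (a field made up entirely of nines is a NULL) is a simplification I'm assuming for the example; a careful version would check each column against its own documented sentinel, since a real value can be all nines too.  The function names are made up.  The one real fact is that MySQL's LOAD DATA INFILE reads a bare \N as NULL under its default settings.&lt;/p&gt;&lt;pre&gt;
```python
import csv
import io

MYSQL_NULL = "\\N"   # LOAD DATA INFILE reads a bare \N as NULL by default


def clean_field(raw):
    """Return MYSQL_NULL for an all-nines sentinel, else the stripped text."""
    text = raw.strip()
    digits = text.lstrip("+-")
    if digits and set(digits) == {"9"}:
        return MYSQL_NULL
    return text


def write_csv(records, out):
    """Write cleaned records to `out`, one CSV line per record."""
    writer = csv.writer(out, lineterminator="\n")
    for record in records:
        writer.writerow([clean_field(value) for value in record])


# One made-up record: the second field is an in-band NULL.
buffer = io.StringIO()
write_csv([("010010", "99999", "JAN MAYEN", "+70933")], buffer)
print(buffer.getvalue(), end="")
```
&lt;/pre&gt;&lt;p&gt;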

The idea I find most interesting is that we're starting to get enough rich data on the web to make these cross-domain data mashups easy and fun -- I did all this in less than a week.  With the effortless XML handling and text processing of modern scripting languages (and relieved from any efficiency concerns) it's easy to see forward to a future where we'll have all these datasets sitting at our fingertips.  This data set lets you examine ideas such as "How does the break distance of curveballs change with atmospheric temperature and pressure for a full baseball season?" "Effectiveness of pitchers against gametime temperature, stratified by age of pitcher or inning?"  "Batting average on fly balls vs. ground balls against % of total cloud cover?".  It's easy to come up with a variety of other "This Rich Dataset vs. That Rich Dataset" opportunities. Stock price and Earnings of Harley-Davidson vs. average household income, unemployment and percent of the population that has reached retirement age? Year-by-year movie attendance at comedies compared to dramas, Attendance at Baseball Games, and Sales of Fast Food vs. Consumer Satisfaction Index, national Suicide Rate, and Persons treated for mental health/substance abuse?  Presidential approval rating vs. gasoline prices and Consumer Price Index? Amazon.com sales rank, # mentions on Technorati blogs and # of mentions in mainstream media vs. time?&lt;/p&gt;&lt;p&gt;

The hard part is actually the stupidest part: to unlock the data from behind bureaucratic barriers (the first script I described), then to convert into a universal semantically rich data format (the second set of scripts I described).  Once one person has unlocked this data, however, it's there for the whole world to enjoy, and tools will evolve to capitalize on this bounty of rich, semantically tagged and freely available information.&lt;/p&gt;</description><link>http://vizsage.com/blog/2007/01/50-years-of-baseball-play-by-play-data.html</link><author>flip</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4201613802871642889.post-4209040339248091342</guid><pubDate>Fri, 04 Jan 2008 04:36:00 +0000</pubDate><atom:updated>2008-01-06T21:28:01.232-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>time machine</category><category domain='http://www.blogger.com/atom/ns#'>subversion</category><category domain='http://www.blogger.com/atom/ns#'>rsync</category><category domain='http://www.blogger.com/atom/ns#'>hosting</category><category domain='http://www.blogger.com/atom/ns#'>svn</category><category domain='http://www.blogger.com/atom/ns#'>version</category><category domain='http://www.blogger.com/atom/ns#'>bluehost</category><category domain='http://www.blogger.com/atom/ns#'>disk</category><category domain='http://www.blogger.com/atom/ns#'>file</category><category domain='http://www.blogger.com/atom/ns#'>backup</category><category domain='http://www.blogger.com/atom/ns#'>distributed</category><category domain='http://www.blogger.com/atom/ns#'>quota</category><category domain='http://www.blogger.com/atom/ns#'>versioned</category><category domain='http://www.blogger.com/atom/ns#'>system</category><category domain='http://www.blogger.com/atom/ns#'>diff</category><title>Time Machine is neat-o, but I want a Time and Space Machine for my files</title><description>&lt;p&gt;I've long wished for a versioned home directory, but svn and its ilk seem too heavyweight, and it's nice to have a live copy and not an opaque DB-ball.  The right answer is the stunningly elegant &lt;a href="http://arstechnica.com/reviews/os/mac-os-x-10-5.ars/14"&gt;Time Machine&lt;/a&gt;.  
The idea &lt;a href="http://www.mikerubel.org/computers/rsync_snapshots/"&gt;isn't&lt;/a&gt; &lt;a href="http://rsnapshot.org/"&gt;actually&lt;/a&gt; &lt;a href="http://users.softlab.ece.ntua.gr/~ttsiod/backup.html"&gt;new&lt;/a&gt;, and it can be approximated &lt;a href="http://code.google.com/p/flyback/"&gt;across platforms&lt;/a&gt; and &lt;a href="http://www.macosxhints.com/article.php?story=20071220105635147"&gt;remotely&lt;/a&gt; using standard Unix tools.  Now you just need a landing spot.&lt;/p&gt;&lt;p&gt;

Some remote backup solutions have come to the fore lately.  $Zero/year gets you 2GB remote backup from &lt;a href="https://mozy.com/?code=T7LAHN"&gt;Mozy&lt;/a&gt;. $60/year buys unlimited backup space at &lt;a href="https://mozy.com/?code=T7LAHN"&gt;Mozy&lt;/a&gt;, where 'unlimited' means the ~40 GB/month you'll see by leaving your pipe fully saturated 24/7.  A price between free and $100/year gets you the more intriguing &lt;a href="http://www.crashplan.com/"&gt;CrashPlan&lt;/a&gt;.  These are slick and easy.  If you don't know what ssh is, you want one of these.  If you do have ssh and you don't set your mom's computer up with the free Mozy account then you're a bad person. &lt;/p&gt;&lt;p&gt;

For the uber-uber-nerds, use what I'm using: an $85/year &lt;a href="http://www.bluehost.com/track/mrflipco/text1/"&gt;bluehost account&lt;/a&gt;.  You may think of it as a 600GB ssh'able rsync'able remote backup host that, by the way, can also act as a webserver.  You can install svn (as I have) to do versioning over svn+ssh.  Two years of bluehost costs the same as a 500GB hard drive+cheap enclosure, and they regularly increase your diskspace allowance at the same monthly price. &lt;small&gt;[Note: the preceding bluehost link will give me a kickback if you sign up through it.  Hit bluehost.com directly if that rubs you the wrong way.]&lt;/small&gt;&lt;/p&gt;&lt;p&gt;

As described below, all my various project shards get sync'ed back to my desktop PC.  The desktop then pushes my files out to a bluehost account via rsync, versioned with an &lt;a href="http://blog.interlinked.org/tutorials/rsync_time_machine.html"&gt;rsync-as-poor-man's-time machine&lt;/a&gt; script.  This gives me a live, versioned backup, accessible to me from anywhere by ssh, on Bluehost's offsite, secure, RAID-UPS-and-diesel generator protected colocation, and at the end of a fat pipe.  After about a week or so for the initial ~50GB backup to roll in, daily incrementals will take an hour or two each day (bandwidth choked to 25 kBps).  When I leave town for a week next month I'll start pushing my music collection into its own unversioned directory.&lt;/p&gt;&lt;p&gt;
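
The heart of the poor-man's-time-machine trick is rsync's --link-dest option.  Here's a minimal Python sketch that just builds the command line; it is not the linked script, and the host, paths, snapshot naming and the snapshot_command function are all placeholders of mine.  The rsync flags themselves (-a, -z, --delete, --bwlimit, --link-dest) are real.&lt;/p&gt;&lt;pre&gt;
```python
import datetime
import shlex


def snapshot_command(source, remote, root, previous=None, kbps=25, now=None):
    """Build the rsync argv for one dated snapshot under `root` on `remote`."""
    stamp = (now or datetime.datetime.now()).strftime("%Y-%m-%d_%H%M")
    argv = ["rsync", "-az", "--delete", "--bwlimit=" + str(kbps)]
    if previous:
        # Unchanged files become hard links into the previous snapshot,
        # so each snapshot looks complete but only new data uses space.
        argv.append("--link-dest=" + root + "/" + previous)
    argv += [source, remote + ":" + root + "/" + stamp]
    return argv


# Placeholder host and paths, for illustration only.
command = snapshot_command(
    "/home/me/now/", "me@backup.example.com", "/home/me/snapshots",
    previous="2008-01-03_0400",
)
print(shlex.join(command))
```
&lt;/pre&gt;&lt;p&gt;Hand the list to subprocess.run from a cron job and remember the new snapshot's name for the next run.  Each dated directory is then a browsable full copy, which is exactly the 'live copy, not an opaque DB-ball' property I'm after.&lt;/p&gt;&lt;p&gt;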

This is only for the stuff I've created or can't replace: not for system software and not for music/movies/media (apart from my iTunes.xml, inbox, and various bookmarks/pref's/stickiesDB/MySQLCachedQueries folders).  Unlike most people, I don't worry too much about backing up my system software.  Maybe I'm damaged from my Windows upbringing, but just reinstall from scratch if your OS gets hosed (I keep install disks and images around). The nascent defect may be present in the restore; the accumulated OS cruft certainly is.  You're already kinda screwed; better to take a certain two days and finish with a clean system than to fight a flaky restore and then spend those two days again.  Yes, you are allowed to come point and laugh if this happens to me.&lt;/p&gt;&lt;p&gt;

Beyond the backup, I have various levels of defense-in-depth for my data -- data that is created, changes daily and is essential for my economic well-being has four or more levels of redundancy.  Data that is intransigently huge but can be sourced elsewhere has no redundancy.  (There's no reason to back up my processed wikipedia dump, for instance: only the scripts that process it.)

Right now my fileverse is spread across &lt;ul&gt;&lt;li&gt;Desktop computer ~ 1TB&lt;/li&gt;&lt;li&gt;Flivo (my homebrew Tivo computer) ~0.5TB&lt;/li&gt;&lt;li&gt;School account ~3GB&lt;/li&gt;&lt;li&gt;sourceforge account&lt;/li&gt;&lt;li&gt;Four webservers holding sites I operate or caretake ~ 1-5GB each&lt;/li&gt;&lt;li&gt;GMail account ~4GB&lt;/li&gt;&lt;li&gt;Flickr account ~6GB&lt;/li&gt;&lt;li&gt;iPhone, Google Calendar, Yahoo Address book, Plaxo&lt;/li&gt;&lt;li&gt;blogger/twitter/facebook/etc&lt;/li&gt;&lt;/ul&gt;&lt;/p&gt;

&lt;p&gt;This is organized as&lt;ul&gt;&lt;li&gt;All the stuff I'm currently messing with is in a 'now' folder.  This is what sync's to my school account, and every time I go on a trip I burn a DVD of the now folder to toss in my backpack.  (You never know when you'll want a file, and it enforces an occasional hardcopy backup).&lt;/li&gt;&lt;li&gt;Within the now folder, the stuff I develop for work is versioned with svn.  I house a private repository on my bluehost account and connect over svn+ssh.  I'm reasonably good about checking in every few hours or when I shift conceptual gears.&lt;/li&gt;&lt;li&gt;Most of the 'now' folder I keep in sloppy sync with the school account. ('sloppy' because I sync when I think of it and not through a cron script as I should).&lt;/li&gt;&lt;li&gt;Each year I move everything that isn't under current work out of the now folder and into an 'archive/YEAR' folder; there it sits and changes almost never.&lt;/li&gt;&lt;li&gt;GMail holds all my mail (sync'ed with IMAP)&lt;/li&gt;&lt;li&gt;Flickr holds an incomplete and poorly correlated segment of my photos (they own my metadata, and yes that bugs me).&lt;/li&gt;&lt;li&gt;iPhone sync handles the address book; gcal is still quite difficult.&lt;/li&gt;&lt;li&gt;The rest of the little metacontent is trapped, meh.&lt;/li&gt;&lt;li&gt;Each webserver's content is replicated to the desktop.  For small changes I'll sometimes diddle the file on the live server and then sync back later (tsk tsk); for heavier work I proceed locally and then deploy.&lt;/li&gt;&lt;/ul&gt;&lt;/p&gt;&lt;p&gt;

The usage breaks down like this:&lt;pre&gt;
== Work space -- changes ~ daily ==
  2   GB         vizsage project software       Desktop, svn, school, vizsage, bkup
  9   GB         infochimp site &amp; working data  Desktop, svn, school, infochimp, bkup
== Work resources -- changes ~ weekly ==
100   GB         'huge' datasets                Desktop, infochimp
 60   GB         live local DB of datasets      Desktop
 30   GB         infochimp website DB           infochimp, bkup
== Slowly changing -- changes ~ semiyearly ==
  3   GB         Other projects, docs, stuff    Desktop, bkup
  4   GB         Archive - doesn't change       Desktop, bkup
                 (~ 300 MB / year for 11 years) 
  3   GB         Library (prefs,caches,etc)     Desktop
== Metacontent -- changes daily-weekly ==
 12   GB         Photos                         Desktop, Flickr
  3   GB         Mail                           Desktop, GMail, bkup
  ~   MB         iPhone/Addr Book/Calendar      Desktop, iPhone, GCal, Yahoo AddrBk
  ~   MB         this blogger blog              vizsage, Blogger
== Websites ==
  1   GB         website1                       Desktop, website1, bkup 
  2   GB         website2                       Desktop, website2, bkup
  6   GB         website3                       Desktop, website3, bkup
== Media ==
many  GB         music                          Desktop, some on iPhone, some on DVD
many  GB         recorded tv shows, movies, etc Desktop, data DVD
== System Software ==
some  GB         OS &amp; installed programs        on each machine
&lt;/pre&gt;&lt;/p&gt;

&lt;p&gt;I'd like to live in a world where I wouldn't have to worry about how these are partitioned across machines.  Changes made to 'website1', say, or to a project in 'now', would lazily propagate to each interested shard as well as to the remote time-machineish versioned backup.  At any time I could force an immediate sync, whether to deploy a change, to repair a mistake, or to satiate an OCD twinge, if I don't want to wait for automatic synchronization.&lt;/p&gt;&lt;p&gt;

I'm actually pretty close to having this out of a MacGyvered patchwork of rsync, svn, time machine, IMAP/Aperture+Flickr and distributed file systems, all enforced by cron.  I'm planning to soon waste a weekend buttoning up my sync scripts, getting everything to run daily and being superattentive in case I screw it up.  &lt;/p&gt;&lt;p&gt;

But it sure would be nifty to augment (a cross-platform) Time Machine into a Time and Space Machine. I'd see an overview of my distributed fileverse (versioned in time, distributed in shards according to how I use it), and I could delegate various live realizations, svn/diff-versioned backups or hard-link-versioned backups to each local or remote instance.  No single machine would necessarily hold the entire fileverse: note that a few things up there don't propagate back to my main desktop.  And hopefully the whole thing would have polished Apple Fit And Finish instead of Mad Max Homebrew Itworksithinkihope.&lt;/p&gt;</description><link>http://vizsage.com/blog/2008/01/time-machine-is-neat-o-but-i-want-time.html</link><author>flip</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4201613802871642889.post-4936790771049023263</guid><pubDate>Thu, 13 Dec 2007 15:51:00 +0000</pubDate><atom:updated>2008-01-06T22:01:17.831-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>data</category><category domain='http://www.blogger.com/atom/ns#'>tool</category><category domain='http://www.blogger.com/atom/ns#'>info</category><category domain='http://www.blogger.com/atom/ns#'>woodworking</category><category domain='http://www.blogger.com/atom/ns#'>fasteners</category><category domain='http://www.blogger.com/atom/ns#'>vintage</category><category domain='http://www.blogger.com/atom/ns#'>screws</category><category domain='http://www.blogger.com/atom/ns#'>design</category><category domain='http://www.blogger.com/atom/ns#'>information</category><category domain='http://www.blogger.com/atom/ns#'>guide</category><category domain='http://www.blogger.com/atom/ns#'>infochimp</category><category domain='http://www.blogger.com/atom/ns#'>shop</category><title>Old-School Shop Guide</title><description>&lt;div style="float: right; margin-left: 10px; margin-bottom: 10px;"&gt; &lt;a href="http://www.flickr.com/photos/mrflip/2108973686/" title="photo sharing"&gt;&lt;img src="http://farm3.static.flickr.com/2078/2108973686_88944a07fd_m.jpg" alt="" style="border: solid 2px #000000;" /&gt;&lt;/a&gt; &lt;br /&gt; &lt;span style="font-size: 0.9em; margin-top: 0px;"&gt;  &lt;a href="http://www.flickr.com/photos/mrflip/2108973686/"&gt;ShopGuideFront300dpi.png&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;I rediscovered this super-compact reference-and-tool-and-measuring device while looking for a tool. It is jam-packed with handy information for anyone doing things mechanical or woodworking. 

I got this from a family friend of a family friend -- I bought their lathe after her husband, an avid (and skilled) woodworker, had passed away.  She wanted his tools to go to someone who would love them and use them, which was me, which I do.  The lathe is good, but I've discovered after the fact that the throw-ins were the best part. The chisels are *top notch*, but still pale in comparison to getting "his old woodworking magazines." This turned out to be almost every issue of Fine Woodworking magazine, beginning in its first year of publication; somewhere in the stack was this nifty Shop Guide. &lt;br clear="all" /&gt;

&lt;div style="float: right; margin-left: 10px; margin-bottom: 10px;"&gt; &lt;a href="http://www.flickr.com/photos/mrflip/2108971426/" title="photo sharing"&gt;&lt;img src="http://farm3.static.flickr.com/2033/2108971426_2f506c2d01_m.jpg" alt="" style="border: solid 2px #000000;" /&gt;&lt;/a&gt; &lt;br /&gt; &lt;span style="font-size: 0.9em; margin-top: 0px;"&gt;  &lt;a href="http://www.flickr.com/photos/mrflip/2108971426/"&gt;ShopGuideBack300dpi.png&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;I think this thing is so neat -- so much information in such a small space.  My &lt;a href="http://vizsage.com/other/flipopedia/MechanicalInfo.pdf"&gt;own mechanical data reference table&lt;/a&gt; (&lt;a href="http://vizsage.com/other/flipopedia"&gt;more here&lt;/a&gt;) has more numbers but less intrinsic functionality ...  What's really neat about this shop guide is how they used the shape of the guide itself as a tool.

Print this onto heavy cardstock and punch brother punch with care... enjoy!&lt;br clear="all" /&gt;</description><link>http://vizsage.com/blog/2007/12/shopguideback300dpipng.html</link><author>flip</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4201613802871642889.post-8673647017453335325</guid><pubDate>Thu, 13 Dec 2007 06:33:00 +0000</pubDate><atom:updated>2008-01-07T15:29:27.974-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>data</category><category domain='http://www.blogger.com/atom/ns#'>underground</category><category domain='http://www.blogger.com/atom/ns#'>legal</category><category domain='http://www.blogger.com/atom/ns#'>dataset</category><category domain='http://www.blogger.com/atom/ns#'>legitimate</category><category domain='http://www.blogger.com/atom/ns#'>feed</category><category domain='http://www.blogger.com/atom/ns#'>movie</category><category domain='http://www.blogger.com/atom/ns#'>art</category><category domain='http://www.blogger.com/atom/ns#'>bittorrent</category><category domain='http://www.blogger.com/atom/ns#'>piracy</category><category domain='http://www.blogger.com/atom/ns#'>music</category><category domain='http://www.blogger.com/atom/ns#'>semantic</category><category domain='http://www.blogger.com/atom/ns#'>cover</category><category domain='http://www.blogger.com/atom/ns#'>rss</category><title>Leveraging the Bittorrent Underground for semantic data and media</title><description>&lt;p&gt;I just ran across a pretty interesting site called &lt;a href="http://www.coverbrowser.com"&gt;coverbrowser.com&lt;/a&gt;, which uses a variety of image APIs to pull in comic book, game, book, music, movie and other cover art. (Read the &lt;a href="http://blogoscoped.com/archive/2006-10-09-n22.html"&gt;technical details here&lt;/a&gt;).&lt;/p&gt;&lt;p&gt;

It reminded me of an idea I had a while back but will never get around to implementing --- maybe you will, or for all I know someone's already been doing it for years.  (Sidenote: I've had some people express interest in this, and have worked out some parts of it, but just don't have the time to complete it right now. If you'd like to help develop it, get in touch.)&lt;/p&gt;&lt;p&gt;

Many of the movie and music torrents on the, ahem, "Unauthorized Evaluation Copy" bittorrent sites contain hi-res scans of their cover art, and all of the major bittorrent sites maintain topic-specific RSS feeds.  &lt;/p&gt;&lt;p&gt;

As long as the torrent indexes the files individually (and not as an opaque .zip or .rar) -- and most do index individually -- you can target specific files within the torrent.  I don't know whether you could chop all the large-file-size copyright-problematic files that you don't want out of the torrent, or whether you'd have to hack Azureus or another bittorrent client (instructing it to get only *.{png,gif,jpg,jpeg,bmp,tiff,tif} or what have you).  Either way, you would then only be spending the bandwidth required to grab the images and not the accompanying multi-megabyte file, and you would only be getting the material to which you presumably have fair-use rights.  &lt;/p&gt;&lt;p&gt;

So you'd set up a daemon process that would &lt;/p&gt;&lt;ul&gt;
&lt;li&gt;watch the Movies and the Music RSS feeds off whichever or all of the sites, 
&lt;/li&gt;&lt;li&gt;identify albums whose cover art you lack, 
&lt;/li&gt;&lt;li&gt;pull in the bittorrent,
&lt;/li&gt;&lt;li&gt;but download only the cover art 
&lt;/li&gt;&lt;li&gt;and perhaps also process any of the accompanying semantic data&lt;/li&gt;
&lt;/ul&gt;You might have to get yourself a &lt;a href="http://en.wikipedia.org/wiki/Seedbox"&gt;seedbox&lt;/a&gt; to make this work, but they're not unaffordable. &lt;/p&gt;&lt;p&gt;
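The "download only the cover art" step might look something like this minimal Python sketch. It assumes you already have the torrent's file listing as a list of paths; the function name, the extension set and the sample listing are made up for illustration, and actually telling a bittorrent client to fetch only the selected files is client-specific and not shown.

```python
import os.path

# Extensions treated as cover art; this set is an assumption for illustration.
IMAGE_EXTENSIONS = {".png", ".gif", ".jpg", ".jpeg", ".bmp", ".tiff", ".tif"}


def select_cover_art(file_paths):
    """Return only the paths whose extension marks them as an image."""
    wanted = []
    for path in file_paths:
        extension = os.path.splitext(path)[1].lower()
        if extension in IMAGE_EXTENSIONS:
            wanted.append(path)
    return wanted


# Hypothetical file listing, as it might appear in a torrent's index.
listing = [
    "Some Album/01 - First Track.mp3",
    "Some Album/02 - Second Track.mp3",
    "Some Album/Scans/front.JPG",
    "Some Album/Scans/back.png",
    "Some Album/info.nfo",
]
print(select_cover_art(listing))
```

Matching on the lowercased extension keeps scans named like front.JPG; anything fancier (say, skipping thumbnails) would need the file sizes too.&lt;/p&gt;&lt;p&gt;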

I think this would lead to a large stream of incoming cover art for music and other media files, complete with a reasonable amount of semantic information.&lt;/p&gt;&lt;p&gt;

There's probably a lot of other crowdsourced semantic data flowing through the underground, if someone actually created such a torrenting robot.  (And yes, I feel yucky using "crowdsourced" and "semantic data" in the same sentence).&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"
  src="http://vizsage.com/assets/adsense-728x90.js"&gt;
&lt;/script&gt;
&lt;script type="text/javascript"
  src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;/div&gt;</description><link>http://vizsage.com/blog/2007/12/leveraging-bittorrent-underground-for.html</link><author>flip</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4201613802871642889.post-5492872193469094186</guid><pubDate>Wed, 05 Dec 2007 22:17:00 +0000</pubDate><atom:updated>2008-01-07T04:16:38.070-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>templating</category><category domain='http://www.blogger.com/atom/ns#'>xml</category><category domain='http://www.blogger.com/atom/ns#'>python</category><category domain='http://www.blogger.com/atom/ns#'>kid</category><category domain='http://www.blogger.com/atom/ns#'>databinding</category><category domain='http://www.blogger.com/atom/ns#'>perl</category><category domain='http://www.blogger.com/atom/ns#'>tree</category><category domain='http://www.blogger.com/atom/ns#'>element</category><category domain='http://www.blogger.com/atom/ns#'>xslt</category><title>Moving from Perl to Python with XML and Templating</title><description>Mr. XKCD is &lt;a href="http://xkcd.com/353/"&gt;correct in this&lt;/a&gt;.  (My friend &lt;a href="http://larssono.com/"&gt;Dr. Larsson&lt;/a&gt; has been saying this all along).

As I'm moving from data munging to data working-with, I've been moving from perl to python.

Recommended:&lt;ul&gt;
&lt;li&gt;&lt;a href="http://codespeak.net/lxml"&gt;lxml&lt;/a&gt; is a beautiful interface for dealing with XML in Python.  You get XPath and validation and namespaces and all that hooha but you don't have to think hard and you don't have to write SAX stream parsers or walk a DOM path.  You just say crap like&lt;pre&gt;
from urllib.request import urlopen   # Python 3 home of urlopen
from lxml import etree
# Load file
uri   = "http://vizsage.com/apps/baseball/results/parkinfo/parkinfo-all.xml"
parks = etree.ElementTree(file=urlopen(uri))
# for each park (&amp;lt;park&amp;gt; tag anywhere in document)
for park in parks.xpath('//park'):
  # dump its id, time of service and name (@attr is XPath for 'corresponding attribute')
  print(' -- '.join(
    [ s+': '+','.join(park.xpath('@'+s))
      for s in ('parkID', 'beg', 'end', 'games', 'name',)
    ]))
&lt;/pre&gt;and you get this in return&lt;pre&gt;
parkID: MIL01 -- beg: 1878-05-14 -- end: 1878-09-14 -- games: 25   -- name: Milwaukee Base-Ball Grounds
parkID: MIL02 -- beg: 1884-09-27 -- end: 1885-09-25 -- games: 14   -- name: Wright Street Grounds
parkID: MIL03 -- beg: 1891-09-10 -- end: 1891-10-04 -- games: 20   -- name: Borchert Field
parkID: MIL04 -- beg: 1901-05-03 -- end: 1901-09-12 -- games: 70   -- name: Lloyd Street Grounds
parkID: MIL05 -- beg: 1953-04-14 -- end: 2000-09-28 -- games: 3484 -- name: County Stadium
parkID: MIL06 -- beg: 2001-04-06 -- end: NULL       -- games: 486  -- name: Miller Park&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://codespeak.net/lxml/objectify.html"&gt;lxml.objectify&lt;/a&gt; is the replacement for &lt;a href="http://search.cpan.org/perldoc/XML::Simple"&gt;perl&lt;/a&gt;'s &lt;a href="http://www.mclean.net.nz/cpan/"&gt;XML::Simple&lt;/a&gt; we've all been looking for.  You just say gimme and it pulls in an XML file as the corresponding do-what-I-mean data structure (identical elements become arrays, tree leaves become atoms, tree structures become maps).&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.kid-templating.org/"&gt;Kid Templating&lt;/a&gt; is a great solution for XML transmogrifying, and I think I like it much better than XSLT.  It looks perfect for your "Anything =&amp;gt; XML" purposes, which is the hard part.  I suppose XSLT can do the "XML =&amp;gt; anything" tasks but those always look like stunts; the whole point of XML is that "Turn XML into whatever" tasks are easy, especially given a simple API like lxml or lxml.objectify.&lt;/li&gt;
&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"
  src="http://vizsage.com/assets/adsense-728x90.js"&gt;
&lt;/script&gt;
&lt;script type="text/javascript"
  src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;/div&gt;</description><link>http://vizsage.com/blog/2007/12/moving-from-perl-to-python-with-xml-and.html</link><author>flip</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4201613802871642889.post-5714427084729204511</guid><pubDate>Fri, 26 Oct 2007 23:03:00 +0000</pubDate><atom:updated>2008-01-07T04:15:54.760-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>data</category><category domain='http://www.blogger.com/atom/ns#'>stadium</category><category domain='http://www.blogger.com/atom/ns#'>dataset</category><category domain='http://www.blogger.com/atom/ns#'>retrosheet</category><category domain='http://www.blogger.com/atom/ns#'>baseball</category><category domain='http://www.blogger.com/atom/ns#'>retrosheet.org</category><category domain='http://www.blogger.com/atom/ns#'>park</category><category domain='http://www.blogger.com/atom/ns#'>geolocation</category><category domain='http://www.blogger.com/atom/ns#'>weather</category><category domain='http://www.blogger.com/atom/ns#'>earth</category><category domain='http://www.blogger.com/atom/ns#'>google</category><category domain='http://www.blogger.com/atom/ns#'>mashup</category><category domain='http://www.blogger.com/atom/ns#'>mash-up</category><title>Hourly Weather data for each Retrosheet game</title><description>I noticed some suspect entries for game conditions in the eventfiles
and realized I could not only fix it but add a pretty useful dimension
to the retrosheet collection. The National Climatic Data Center makes available &amp;quot;Global Hourly Surface Data&amp;quot; -- several dozen physical and
observational characterizations of the current weather, taken hourly.
This data goes back to the forties and sometimes to the start of the
century.&lt;p&gt;

Please enjoy this preliminary dataset giving the hourly weather data
for each game in Fenway since 1957:
&lt;a href="http://vizsage.com/apps/baseball/results/weather/"&gt;http://vizsage.com/apps/baseball/results/weather/&lt;/a&gt;&lt;p&gt;

(open the WeatherData-BOS07.* file of your choice)

I don't have all the data in hand yet, but I thought I'd get your
thoughts and see if anyone would like to help with some of the drudge
work.&lt;p&gt;


I'm excited about doing some fun things with the data, like seeing
knuckleball effectiveness vs. humidity or elderly pitchers vs.
temperature. Combined with the MLB gameday pitch trajectory info you could do physics &amp;quot;experiments&amp;quot;: show the break distance of all
curveballs vs. atmospheric pressure.&lt;p&gt;

Email me back if you're interested or with comments.&lt;p&gt;

&lt;Pre&gt;
-----------------------
DATA FIELDS AVAILABLE
-----------------------


The fields I've spit out are

-- game_ID, gamedate, gamenum_in_day, start_time, daygame_flag from
the cwgame output.
- temp deg C
The temperature of the air in degrees Celsius.
- press_atmos HPa
The atmospheric pressure at the observation point.
- press_sealvl HPa
The air pressure relative to Mean Sea Level (MSL).
- press_altim HPa
The pressure value to which an aircraft altimeter is set so that it
will indicate the altitude relative to mean sea level of an aircraft
on the ground at the location for which the value was determined.
- press_chg_3hr_del HPa
The absolute value of the quantity of change in atmospheric pressure
measured at the beginning and end of a three hour period.
- press_chg_3hr_obs --
The code that denotes the characteristics of an
ATMOSPHERIC-PRESSURE-CHANGE that occurs over a period of
three hours.
- wind_dir deg
The angle, measured in a clockwise direction, between true north and
the direction from which the wind is blowing.
- wind_obs --
The code that denotes the character of the WIND-OBSERVATION.
- wind_speed m/s
The rate of horizontal travel of air past a fixed point.
- wind_gust_speed m/s
The rate of speed of a wind gust.
- cloud_cover_low (frac)
The code that represents the fraction of the celestial dome covered
by all low clouds present. If no low clouds are present; the code
denotes the fraction covered by all middle level clouds present.
- vis_dist m
The horizontal distance at which an object can be seen and identified.

- sunshine_time min
The quantity of time sunshine occurred over the reporting period.
- wea_pr_m_obs_1 --
The code that denotes a specific type of weather observed manually.
- wea_pr_m_obs_2 --
The code that denotes a specific type of weather observed manually.
- wea_pr_m_obs_3 --
The code that denotes a specific type of weather observed manually.
- groundcond --
The code that denotes a type of Ground condition
- precip_hist_contin bool
The code that denotes whether precipitation is continuous (true) or
intermittent (false).
- precip_lq1_depth mm
The depth of LIQUID-PRECIPITATION that is measured at the time of an
observation. Unit:Millimeters
- precip_lq1_period hours
The quantity of time over which the LIQUID-PRECIPITATION was measured.

&lt;/pre&gt;

----------
WHAT I DID
----------&lt;p&gt;

I used Brian Foy's Google Earth index of Major League Stadiums:&lt;br/&gt;

&lt;a href="http://www252.pair.com/comdog/google_earth/major_league_baseball_stadiums.kml"&gt;http://www252.pair.com/comdog/google_earth/major_league_baseball_stadiums.kml&lt;/a&gt;
and the NCDC ISH-HISTORY file (gives locations for each weather station)
&lt;a href="ftp://ftp.ncdc.noaa.gov/pub/data/inventories/"&gt;ftp://ftp.ncdc.noaa.gov/pub/data/inventories/&lt;/a&gt;&lt;br/&gt;

to find the closest station with continuous data. (Turns out I could
have saved a ton of trouble by just using the nearest airport -- in
almost every case it was the best match.)&lt;p&gt;
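A minimal sketch of the closest-station step, assuming each stadium and station is a record with 'lat' and 'lon' in decimal degrees. It uses the haversine great-circle distance; the function names and the sample coordinates (approximate) are illustrative only, and the sketch does not check whether a station has continuous data.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(park, stations):
    """Return the station record closest to the park; both use 'lat' and 'lon' keys."""
    return min(
        stations,
        key=lambda st: haversine_km(park["lat"], park["lon"], st["lat"], st["lon"]),
    )


# Approximate coordinates, for illustration only.
fenway = {"name": "Fenway Park", "lat": 42.3467, "lon": -71.0972}
stations = [
    {"name": "Boston Logan airport", "lat": 42.3656, "lon": -71.0096},
    {"name": "Worcester airport", "lat": 42.2673, "lon": -71.8757},
]
print(nearest_station(fenway, stations)["name"])
```

With these two candidates the airport a few kilometres away wins, which matches the "just use the nearest airport" shortcut.&lt;p&gt;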

Then I pulled down data sets from

&lt;a href="http://cdo.ncdc.noaa.gov/pls/plclimprod/poemain.accessrouter?datasetabbv=DS3505"&gt;http://cdo.ncdc.noaa.gov/pls/plclimprod/poemain.accessrouter?datasetabbv=DS3505&lt;/a&gt;
(If you're interested in replicating any of this I have a script that
sends a GET url to help automate the weather data collection.) The
last step is to match games with stadiums with locations, and dates
and times with hourly observations.&lt;p&gt;

I could be clever and subtle and use the start time and game duration
to grab only the hours of gameplay, but instead I just pull in the
records from 10:00am to 11:59pm for day games, and 5:00pm to 11:59pm
for night games. I suppose I'll fix it to see if a game overhangs
midnight and get the post-12am data for those only.&lt;p&gt;
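The day/night windows described above could be expressed as a small lookup. This is only a sketch: the names are hypothetical, and observations are assumed to arrive as (hour, reading) pairs with the hour on a 24-hour clock.

```python
# Hour ranges on a 24-hour clock, taken from the description above.
OBSERVATION_HOURS = {
    "day": range(10, 24),    # hours 10 through 23, i.e. 10:00am to 11:59pm
    "night": range(17, 24),  # hours 17 through 23, i.e. 5:00pm to 11:59pm
}


def observations_for_game(daynight, hourly_observations):
    """Keep the (hour, reading) pairs that fall in the window for this game type."""
    window = OBSERVATION_HOURS[daynight]
    return [(hour, reading) for hour, reading in hourly_observations if hour in window]


# Hypothetical temperature readings (deg C) keyed by hour of day.
readings = [(9, 14.0), (13, 18.5), (17, 19.0), (20, 16.5), (23, 15.0)]
print(observations_for_game("day", readings))
print(observations_for_game("night", readings))
```

Handling games that run past midnight would mean carrying the date along with the hour, which this sketch leaves out.&lt;p&gt;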


-----------------------

WHAT YOU CAN DO TO HELP
-----------------------&lt;p&gt;

Geolocation for the rest of the stadiums&lt;p&gt;

Inspect the data for consistency and correctness&lt;p&gt;

If you have access to a computer at a .edu or .k12.us, or fancy GIS
data, help me grab the rest of the weather files.&lt;p&gt;

Email me if you'd like to help.&lt;p&gt;&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"
  src="http://vizsage.com/assets/adsense-728x90.js"&gt;
&lt;/script&gt;
&lt;script type="text/javascript"
  src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;/div&gt;</description><link>http://vizsage.com/blog/2007/10/hourly-weather-data-for-each-retrosheet.html</link><author>flip</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4201613802871642889.post-1843018927171937051</guid><pubDate>Fri, 26 Oct 2007 23:01:00 +0000</pubDate><atom:updated>2008-01-07T04:14:43.336-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>data</category><category domain='http://www.blogger.com/atom/ns#'>consistency</category><category domain='http://www.blogger.com/atom/ns#'>mining</category><category domain='http://www.blogger.com/atom/ns#'>format</category><category domain='http://www.blogger.com/atom/ns#'>error</category><category domain='http://www.blogger.com/atom/ns#'>bug</category><category domain='http://www.blogger.com/atom/ns#'>weather</category><category domain='http://www.blogger.com/atom/ns#'>retrosheet</category><category domain='http://www.blogger.com/atom/ns#'>retrosheet.org</category><category domain='http://www.blogger.com/atom/ns#'>mashup</category><category domain='http://www.blogger.com/atom/ns#'>baseball</category><category domain='http://www.blogger.com/atom/ns#'>mash-up</category><title>Retrosheet Eventfile Inconsistencies II</title><description>&lt;p&gt;I've found a few more inconsistencies and minor inaccuracies in the
retrosheet event files and game logs.&lt;/p&gt;&lt;p&gt;

I made a diff
(applied using the 'patch' tool) to mechanically recreate these
corrections:
&lt;a href="http://vizsage.com/apps/baseball/results/rseventfiles_20070923_patch.diff"&gt;http://vizsage.com/apps/baseball/results/rseventfiles_20070923_patch.diff&lt;/a&gt;&lt;/p&gt;&lt;p&gt;

I pulled these out by whipping up a few simple scripts (one-liners,
mostly) that extract all unique values for each event file field.
For example, the only values for the "info,pitches" field are 'count',
'none' and 'pitches' -- just as promised in the documentation. The
"info,temp" field, however, has not only normal temperatures ("78", or
"104", or "0" for [unknown]) but also spurious values of '670' and
'700' (wrong), '8/7' (ill-formed) and '' (differs from the format
documentation).&lt;/p&gt;&lt;p&gt;
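A rough Python equivalent of those one-liners, as a sketch only: it assumes each record is a comma-separated line whose first field is the record type and, for info records, whose second field is the field name. The function name and the placeholder game id are made up for illustration.

```python
from collections import defaultdict


def unique_info_values(lines):
    """Map each 'info' field name to the set of distinct values seen for it."""
    values = defaultdict(set)
    for line in lines:
        parts = line.rstrip("\r\n").split(",", 2)
        if len(parts) == 3 and parts[0] == "info":
            values[parts[1]].add(parts[2])
    return values


# Sample records in event-file style; the game id is a placeholder and the
# temp values are ones mentioned in the text above.
sample = [
    "id,XXX190001010",
    "info,temp,78",
    "info,temp,670",
    "info,temp,8/7",
    "info,temp,",
    "info,pitches,pitches",
    "info,pitches,none",
]
for field, seen in sorted(unique_info_values(sample).items()):
    print(field, sorted(seen))
```

Eyeballing the sorted value list for each field is usually enough to spot the odd ones out.&lt;/p&gt;&lt;p&gt;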

I'll be posting all the dubious entries (event files version 2007 Sep 23)
I find at
http://vizsage.com/blog/2007/10/retrosheet-eventfile-inconsistencies.html
as comments.&lt;/p&gt;&lt;p&gt;

==================== Incorrect Data ====================&lt;/p&gt;&lt;p&gt;

&lt;pre&gt;
In 1993MIL.EVA:
info,start,spieb001,"Bim| Spiers",1,9,4
should be
info,start,spieb001,"Bill Spiers",1,9,4

These temperatures need fixing:
1988MON.EVN,info,temp,670
1988MON.EVN,info,temp,700
1964NYA.EVA,info,temp,8/7

I looked at a few suspiciously short games (&lt; 60 minutes):
This should be 1:58, according to the NYT box score:

http://select.nytimes.com/gst/abstract.html?res=FB0614F73D59107B93C4A8178FD85F4C8585F9
1958BOS.EVA,info,timeofgame,58
These two are correct:
1971BAL.EVA,info,timeofgame,48 BAL197107300 -- Game called due to rain
1976BOS.EVA,info,timeofgame,57 BOS197609100 -- Game called due to rain
Another thing to look at would be suspicious game length/number of
outs ratio, but I haven't done this yet.

I also checked a few games with attendance below 1000, but these seem
to be very cold or rescheduled days. I'll take a peek sometime soon at
"game attendance more than two and a half standard deviations below
that year's average attendance" to see what sticks out. (I also
peeked at 2.5+ above -- those look like bandwagon games.)

&lt;/pre&gt;
&lt;p&gt;
==================== Badly Formatted ====================&lt;/p&gt;

&lt;pre&gt;
These are probably correct but just ill-formatted:
1959CHN.EVN,info,timeofgame,0158
2001PIT.EVN,info,attendance, 34915
1962BOS.EVA,info,daynight,day,
1966ATL.EVN,info,howscored,"park"
1966HOU.EVN,info,howscored,"park"
1970CHA.EVA:data,er,roung101,4#
1958PIT.EVN:data,er,wills102,1y

In these files, the "howscored" field is spelled "howentered":
1990BOS.EVA,info,howentered,game
1990DET.EVA,info,howentered,game
1990DET.EVA,info,howentered,game
1990DET.EVA,info,howentered,game
1990DET.EVA,info,howentered,game
1990HOU.EVN,info,howentered,game
1990HOU.EVN,info,howentered,game
1990LAN.EVN,info,howentered,game
1990MON.EVN,info,howentered,game
1990MON.EVN,info,howentered,game
1990PIT.EVN,info,howentered,game
1990SFN.EVN,info,howentered,game
1990SFN.EVN,info,howentered,game
1990SLN.EVN,info,howentered,game
1990TEX.EVA,info,howentered,game
1990TEX.EVA,info,howentered,game

There are no "info,edittime" records -- is this purposeful?

&lt;/pre&gt;

&lt;p&gt;==================== Inconsistent with Documentation ====================&lt;/p&gt;

&lt;pre&gt;
In the 2003TBA.EVA file, the umpires are given by name and not by ID.

These are supposed to use 0 as the unknown value but in a few places
use a blank.
1990NYA.EVA,info,temp,
1978ATL.EVN,info,attendance,
1978NYA.EVA,info,attendance,
1979SDN.EVN,info,attendance,
2000PIT.EVN:info,windspeed,

There are some "info,ump[...],(None)" fields, and there are some
"info,ump[...]," fields. Does one indicate "unknown" and the other
indicate "none"? Or is this a formatting inconsistency?

These files have a bunch of "info,windspeed,unknown" fields (the dox
say "An unknown windspeed is indicated by -1."):
1969ATL.EVN 1969HOU.EVN 1969MON.EVN 1969PIT.EVN 1969SDN.EVN
1970ATL.EVN 1970HOU.EVN
These files have an "info,temp,unknown" field (the dox say "An unknown
temp is indicated by 0."):
1969ATL.EVN 1969HOU.EVN 1969MON.EVN 1969PIT.EVN 1969SDN.EVN 1970ATL.EVN
1970HOU.EVN 1990NYA.EVA

These lines have trailing spaces, which is harmless but still
shouldn't be there:
1958CHA.EVA:info,save,
1957BOS.EVA:com,"xwas a lot of action. Had this game been played today, it no doubt"
1957BRO.EVN:com,"$In addition to 12,559 paid, 6000 knothole,"
1957CLE.EVA:com,"xCC4 changed E9/F.2-3;BX2(9)# to 9/F.2-3(E9)#"
1957MLN.EVN:com,"xCC4 per film, TSN 26 is DP"
1958CLE.EVA:com,"$ Strong wind to left; cool"
1958KC1.EVA:com,"xScoresheet scores DP as 142. I Checked with newspaper"
1958NYA.EVA:com,"$Total attendance: 13323"
1958SFN.EVN:com,"$paper box and Cin s/s has Cepeda and Sauer reversed"
1958SFN.EVN:com,"$paper box has stats that match SF s/s not Cin s/s"

Here are all the well-formed windspeed values:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32 33 35 36 37 38 40 59 60 66 67 68 69 74 78 87
What are the units on these? If this is in MPH, 39 is Gale force
("Difficult to walk against wind. Twigs and small branches blown off
trees."), 55 is Storm ("Trees uprooted, structural damage likely") and 64
is Violent Storm ("Widespread damage").

Here are games with windspeeds over 40:
id,CHA197408270|windspeed,67
id,MIN198008190|windspeed,87
id,TOR198208030|windspeed,68
id,CHN198307042|windspeed,74
id,TOR198307270|windspeed,87
id,LAN199006050|windspeed,78
id,DET199506160|windspeed,87
id,CLE199609141|windspeed,69
id,COL199606150|windspeed,59
id,DET199704300|windspeed,66
id,TEX200104220|windspeed,40
id,SLN200610010|windspeed,60

&lt;/pre&gt;
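&lt;p&gt;To put the values above on the Beaufort scale, a lookup like the following sketch would do, assuming the recorded values are miles per hour (which is exactly what is in question). The bounds are the usual mph lower limits for forces 1 to 12; the function and variable names are illustrative.

```python
import bisect

# Lower bound in mph of Beaufort forces 1 through 12 (force 0 is below 1 mph).
LOWER_BOUNDS_MPH = [1, 4, 8, 13, 19, 25, 32, 39, 47, 55, 64, 73]
FORCE_NAMES = [
    "Calm", "Light air", "Light breeze", "Gentle breeze", "Moderate breeze",
    "Fresh breeze", "Strong breeze", "Near gale", "Gale", "Strong gale",
    "Storm", "Violent storm", "Hurricane force",
]


def beaufort_force(mph):
    """Return the Beaufort force number (0-12) for a wind speed in mph."""
    return bisect.bisect_right(LOWER_BOUNDS_MPH, mph)


# A few of the windspeed records listed above.
records = {
    "CHA197408270": 67,
    "MIN198008190": 87,
    "TEX200104220": 40,
    "SLN200610010": 60,
}
for game_id, mph in records.items():
    force = beaufort_force(mph)
    print(game_id, mph, force, FORCE_NAMES[force])
```

Read as mph, every one of these games would have been played in a gale or worse, which is a good hint that the values are not what they claim to be.&lt;/p&gt;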
&lt;p&gt;
The SLN200610010 event file gives a wind speed of 60mph (from
baseball-reference and ESPN),
but a) that's crazy and b) the weather report from that day doesn't
confirm it:&lt;/p&gt;&lt;p&gt;

http://www.wunderground.com/history/airport/KSTL/2006/10/1/DailyHistory.html?req_city=NA&amp;req_state=NA&amp;req_statename=NA
Which gives 83F, 9mph SSW wind, clear&lt;/p&gt;&lt;p&gt;

See also my next message, about getting weather data for each game.&lt;/p&gt;&lt;p&gt;

The BGAME.exe documentation says "WindSpeed: 0 Unknown, 1 Known, other
value is the wind speed" but I think it should be "WindSpeed: -1
Unknown, other value is the wind speed in miles per hour".&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"
  src="http://vizsage.com/assets/adsense-728x90.js"&gt;
&lt;/script&gt;
&lt;script type="text/javascript"
  src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;/div&gt;</description><link>http://vizsage.com/blog/2007/10/retrosheet-eventfile-inconsistencies-ii.html</link><author>flip</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4201613802871642889.post-3710601585060580220</guid><pubDate>Fri, 26 Oct 2007 09:33:00 +0000</pubDate><atom:updated>2007-10-26T04:55:51.615-05:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>rusty kuntz</category><category domain='http://www.blogger.com/atom/ns#'>asdrubal</category><category domain='http://www.blogger.com/atom/ns#'>phenomenal smith</category><category domain='http://www.blogger.com/atom/ns#'>mysql</category><category domain='http://www.blogger.com/atom/ns#'>name</category><category domain='http://www.blogger.com/atom/ns#'>baseball</category><category domain='http://www.blogger.com/atom/ns#'>yclept</category><category domain='http://www.blogger.com/atom/ns#'>hall of fame</category><category domain='http://www.blogger.com/atom/ns#'>first name</category><category domain='http://www.blogger.com/atom/ns#'>cabrera</category><title>The Asdrubal Carrera Hall of Fame</title><description>&lt;p&gt;Inspired by one of &lt;a href="http://awfulannouncing.blogspot.com/"&gt;Tim McCarver&lt;/a&gt;'s &lt;a href="http://shutuptimmccarver.com/"&gt;flights of fancy&lt;/a&gt; during the ALCS, I present &lt;a href="http://vizsage.com/apps/baseball/results/UniqueFirstNames.xml"&gt;The Asdrubal Carrera Hall of Fame&lt;/a&gt;, open to anyone in unique possession of a particular first name among Major League baseball players: &lt;a href="http://vizsage.com/apps/baseball/results/UniqueFirstNames.xml"&gt;LIST&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You may be familiar with Honus Wagner, Eppa Rixey, Boog Powell or Yogi Berra.  But have you heard the storied diamond exploits of Firpo Mayberry, Zoilo Versalles, Pi Schwert or Bevo LeBourveau?  OK, then how about Mysterious Walker, The Only Nolan, or Phenomenal Smith?&lt;/p&gt;

&lt;p&gt;For some dinnertime fun over the holidays, discuss the relative merits of Urban  Shocker, Twink  Twining, Pussy  Tebeau, Bris  Lord, Boob  Fowler, Crazy  Schmit, Creepy  Crespi, Cuddles  Marshall,    Vinegar Bend Mizell, and Buttercup Dickerson.  (Unfortunately, 12 other players keep Rusty Kuntz off this list.)&lt;/p&gt;

&lt;p&gt;Other stunningly yclept combatants include Ambiorix Burgos,  Alamazoo Jennings, Welcome  Gaston, Chicken  Hawks, Sixto  Lezcano, Wheezer  Dell, Yam  Yaryan, Yo-Yo  Davalillo, Admiral Schlei, Boss  Schmidt, Brick  Smith, Brickyard Kennedy, Broadway Jones, Cannonball Titcomb, Baby Doll Jacobson, Sweetbreads Bailey, Zaza  Harvey, Bubbles  Hargrave, Pickles  Dillhoefer, Double Joe Dwyer, Cowboy  Jones, Coot  Veal, Mul  Holland, Live Oak Taylor, Skyrocket Smith, Kaiser  Wilhelm, Kewpie  Pennington, Possum  Whitted, Snooks  Dowd, and Mox  McQuery.&lt;/p&gt;

&lt;p&gt;See &lt;a href="http://vizsage.com/apps/baseball/results/UniqueFirstNames.xml"&gt;the list&lt;/a&gt; for links to each player's Baseball Reference page.  Nerds may additionally view the generating mySQL query &lt;a href="http://vizsage.com/apps/baseball/results/UniqueFirstNames.sql.txt"&gt;here&lt;/a&gt;.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"
  src="http://vizsage.com/assets/adsense-728x90.js"&gt;
&lt;/script&gt;
&lt;script type="text/javascript"
  src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;/div&gt;</description><link>http://vizsage.com/blog/2007/10/asdrubal-carrera-hall-of-fame.html</link><author>flip</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4201613802871642889.post-5298338102971272016</guid><pubDate>Wed, 17 Oct 2007 22:01:00 +0000</pubDate><atom:updated>2008-01-07T04:11:45.560-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>data</category><category domain='http://www.blogger.com/atom/ns#'>consistency</category><category domain='http://www.blogger.com/atom/ns#'>mining</category><category domain='http://www.blogger.com/atom/ns#'>format</category><category domain='http://www.blogger.com/atom/ns#'>error</category><category domain='http://www.blogger.com/atom/ns#'>bug</category><category domain='http://www.blogger.com/atom/ns#'>retrosheet</category><category domain='http://www.blogger.com/atom/ns#'>retrosheet.org</category><category domain='http://www.blogger.com/atom/ns#'>baseball</category><title>Retrosheet Eventfile Inconsistencies</title><description>Here are a few inconsistent records in the &lt;a href="http://retrosheet.org"&gt;retrosheet.org&lt;/a&gt; event files of 2007 Sep 23.  I'm using chadwick and not the retrosheet DOS utils, but I think I've sourced all these to the original event files.

Weird Attendance in gamelog GL1941.TXT:&lt;pre&gt;  WS1194107220 (WS1 vs DET) has '1500 e' as its attendance&lt;/pre&gt;Weird Start Time in eventfiles:
  Many starttime records lack an AM or PM.  I assume the times map as follows:&lt;pre&gt;   daynight  start_time   24hr Time
   D or N    0            Unknown
    D        1000..1259   1000h to 1259h
    D        100..459     1300h to 1659h
    N        500..1150    1700h to 2350h&lt;/pre&gt;  In that case, here are some weird start times reported by cwgame:&lt;pre&gt;  - Negative start time:
      2003 D 0  -195 SEA 2003 04 15        SEA200304150    info,starttime,-2:05PM   info,daynight,day
  - No daynight flag:
      1998 D 0   506 LAN 1998 08 30        LAN199808300    info,starttime,5:06      -- no daynight --
  - Plainly inconsistent daynight flag:
      1985 D 1   605 CIN 1985 06 21        CIN198506211    info,starttime,6:05PM    info,daynight,day
      1960 N 0   135 BOS 1960 04 19        BOS196004190    info,starttime,1:35PM    info,daynight,night
  - Second half of a double header, listed as a day game despite 5pm or later start:
      1966 D 2   507 BAL 1966 10 02        BAL196610022    info,starttime,5:07PM    info,daynight,day
      2001 D 2   500 PHI 2001 05 27        PHI200105272    info,starttime,5:00PM    info,daynight,day
      2001 D 2   519 PIT 2001 06 03        PIT200106032    info,starttime,5:19PM    info,daynight,day
      2001 D 2   625 MIN 2001 05 26        MIN200105262    info,starttime,6:25PM    info,daynight,day
      2001 D 2   719 CHA 2001 09 04        CHA200109042    info,starttime,7:19PM    info,daynight,day
      2001 D 2   738 CHN 2001 08 20        CHN200108202    info,starttime,7:38PM    info,daynight,day
      2001 D 2   752 PIT 2001 09 03        PIT200109032    info,starttime,7:52PM    info,daynight,day
      2001 D 2   753 SLN 2001 08 03        SLN200108032    info,starttime,7:53PM    info,daynight,day
  - Start times that appear to be after midnight (this could be correct):
      1996 N 1    35 CIN 1996 06 25        CIN199606251    info,starttime,0:35      info,daynight,night
      1998 N 0   105 LAN 1998 06 13        LAN199806130    info,starttime,1:05      info,daynight,night
      1966 N 2  1207 BAL 1966 06 08        BAL196606082    info,starttime,12:07AM   info,daynight,night
 &lt;/pre&gt;These eventfile games have more than one "info,daynight" record&lt;pre&gt;  ATL197004150    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197004160    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197005260    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197006191    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197006192    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197006200    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197006210    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197007031    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197007032    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197007050    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197009220    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197009230    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197009240    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197009250    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197009260    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  ATL197009270    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  HOU197006220    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  HOU197008031    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  HOU197008032    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  HOU197008040    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  HOU197009010    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  HOU197009110    info,starttime,0:00PM   info,daynight,day       info,daynight,night
  HOU197009130    info,starttime,0:00PM   info,daynight,day       info,daynight,night&lt;/pre&gt;This eventfile game is missing an "info,daynight" record:&lt;pre&gt;  LAN199808300    info,starttime,5:06&lt;/pre&gt;File Structure in eventfile 2001HOU.EVN:&lt;pre&gt;  2001HOU.EVN lacks a trailing newline (unix commands hate this).&lt;/pre&gt;Here are the unix commands I used to dump all that info.  Sorry for the one-linerism.&lt;pre&gt;# How many have a negative starttime?
grep 'info,starttime,-' *.EV*

# How many have missing or extra "info,daynight" fields?
# -- pull out the info, daynight and starttime records in order
# -- slurp the whole file as one giant string with internal linebreaks;
# -- split each stretch following an id,XXXX record into one line
# -- dump lines that have none or more than one daynight record
  cat *.EV* | egrep '^(id,|info,daynight|info,starttime)' | \
    perl -e '$_ = join(" ",&lt;&gt;); s/[\r\n]+/!!!/g; @games= (split /id,/, $_);
      shift @games;
      for $game (@games) {
          $game =~ s/!!!/\t/g; print "$game\n" if (($game !~ m/daynight/) || ($game =~ m/daynight.*daynight/));
      }'

# How many have a start_time and daynight_flag that disagree?
# -- use cwgame to pull off the gameID,start_time,daynight_flag records;
#    put it into a temporary file    
# -- Use a big stupid regex to find
#    . start_time that is &gt;  500 and marked day
#    . start_time that is &lt;  500 and marked night 
#    . start_time that is &gt; 1200 and marked night 
#    . start_time that is &lt;  100 
#    . start_time that is negative
( for ((year=1957;$year&lt;=2006;year++)) ; do \
     for teamfile in ${year}*.[Ee][Vv]* ; do \
     cwgame -y $year -f '0-0,4-4,6-6' $teamfile 2&gt;/dev/null ; \
     done; \
  done ) &gt; /tmp/starttimeIDs.txt
cat /tmp/starttimeIDs.txt | \
  perl -ne '(m/"(\w\w\w)(\d\d\d\d)(\d\d)(\d\d)(\d)",(12\d\d|[1234]\d\d|\d\d|[1-9]|-\d+),"(N)"/ ||
    m/"(\w\w\w)(\d\d\d\d)(\d\d)(\d\d)(\d)",((?:5|6|7)\d\d|.*-.*|\d\d|[1-9]),"(D)"/)    &amp;&amp;
    printf "%s %s %5d %s %s %s %s\n", $7, $5, $6, $1, $2, $3, $4;' | sort
&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"
  src="http://vizsage.com/assets/adsense-728x90.js"&gt;
&lt;/script&gt;
&lt;script type="text/javascript"
  src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;/div&gt;</description><link>http://vizsage.com/blog/2007/10/retrosheet-eventfile-inconsistencies.html</link><author>flip</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4201613802871642889.post-8796229594604558329</guid><pubDate>Fri, 07 Sep 2007 21:02:00 +0000</pubDate><atom:updated>2007-09-07T17:13:21.544-05:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>data</category><category domain='http://www.blogger.com/atom/ns#'>scrabble</category><category domain='http://www.blogger.com/atom/ns#'>nerd nerd nerd nerd nerd</category><title>Rules of thumb for Rack Leave in Scrabble</title><description>This isn't exactly within the ambit of this blog but at least it's about data.  

While I should have been doing work, I instead made an awesome
spreadsheet to find rules of thumb for what the best Scrabble rack
leaves are.  (rules of thumb below, tables here: &lt;a href="http://vizsage.com/other/scrabble/RackLeaveRules.html"&gt;http://vizsage.com/other/scrabble/RackLeaveRules.html&lt;/a&gt;)

The computer program &lt;a href="http://web.mit.edu/jasonkb/www/quackle/"&gt;Quackle&lt;/a&gt; is one of the strongest scrabble players in the world.  It uses the following 'Superleave' valuation: &lt;ul&gt;  &lt;li&gt;&lt;a href="http://web.mit.edu/~jasonkb/Public/scrabble/superleaves"&gt;http://web.mit.edu/~jasonkb/Public/scrabble/superleaves&lt;/a&gt;&lt;br/&gt;(warning: huge-assed file)&lt;/li&gt;  &lt;li&gt;&lt;a href="http://vizsage.com/other/data/superleaves.xls"&gt;http://vizsage.com/other/data/superleaves.xls&lt;/a&gt;&lt;br/&gt;(The above, sorted by value, only up-to-4 leaves)&lt;/li&gt;&lt;/ul&gt;
To find "synergies" and "anti-synergies" (dysphoria?), I calculated the marginal valuation for each combo.  Basically, how much of the value for each two-leave is explained by the valuation of the component one-leaves, etc? For example, &lt;br/&gt; - From S (7.35) and M (0.08), the joint valuation of MS is 7.44, a marginal gain of 0.01: the joint valuation is almost entirely from S&amp;M.  (&lt;-- will lead to interesting google hits).  This combination has no synergy.&lt;br/&gt; - From Q (-9.0) and U (-5.1), the joint valuation of QU is 0.2, a marginal gain of 14.3.  This is by far the largest synergy; next is ZO at +3.2.&lt;p&gt;
I also played with three-letter synergies -- 3-leave valuations marginally different from the most explanatory 2-leave.  
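
The marginal-valuation arithmetic described above can be sketched in a few lines of Python.  This is only an illustration built from the one-leave and two-leave values quoted in this post; the dictionary and function names are made up for the example and do not come from Quackle or from the spreadsheet.
&lt;pre&gt;
```python
# One-leave and two-leave values as quoted in the post (Quackle superleaves).
ONE_LEAVE = {"S": 7.35, "M": 0.08, "Q": -9.0, "U": -5.1}
TWO_LEAVE = {"MS": 7.44, "QU": 0.2}


def marginal_gain(leave):
    """Joint value of a two-leave minus the sum of its one-leave values."""
    explained = sum(ONE_LEAVE[tile] for tile in leave)
    return TWO_LEAVE[leave] - explained


# A gain near zero means no synergy; a large positive gain means synergy.
for leave in sorted(TWO_LEAVE):
    print(leave, round(marginal_gain(leave), 2))
```
&lt;/pre&gt;
The three-letter case would presumably work the same way, subtracting the most explanatory 2-leave instead of the summed 1-leaves, but that step is not shown here.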

General Lessons:&lt;ul&gt;&lt;li&gt;Get a feel for the 1-leave list, and then learn these:&lt;ul&gt; &lt;li&gt; Synergy:       
     QU OZ JU CH GN WY IN DE JK ER EV
     GIN JKY JKU ERS KWY HWY ?IN EST JOW ?AL ?EL ?IL IST&lt;/li&gt;&lt;li&gt;Anti-Synergy:    
     BP CG FP MV PV CW CQ QS SX LQ 
     BV SZ QR BC CZ VZ MQ RX GQ + most things with blank
     BPV CGQ BCG LQR FPV LNQ SVZ CMQ CLQ BCV BNV KTV 
     LMQ GKT CFV GMQ FSV LNR DGT&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt; Worth keeping with a blank:
   The letters in "Lei an orc DTM" + the following digrams
    IN AL IL EL CI AN ER EN AC AR IT NO 
    QU ET DE CO AT OR LO GN OT AM DI CE 
    IM IR DO MO GI AB AG&lt;/li&gt;&lt;li&gt; Double letters are bad (duh), except FF, which is good.&lt;/li&gt;&lt;/ul&gt;See the spreadsheet at &lt;a href="http://vizsage.com/other/data/superleaves.xls"&gt;http://vizsage.com/other/data/superleaves.xls&lt;/a&gt;. Don't go betting the house on these results.&lt;/p&gt;

Tables (including 1-tile-leave values) are available &lt;a href="http://vizsage.com/other/scrabble/RackLeaveRules.html"&gt;here&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"
  src="http://vizsage.com/assets/adsense-728x90.js"&gt;
&lt;/script&gt;
&lt;script type="text/javascript"
  src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;/div&gt;</description><link>http://vizsage.com/blog/2007/09/rules-of-thumb-for-rack-leave-in.html</link><author>flip</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4201613802871642889.post-9178160423512100690</guid><pubDate>Sat, 01 Sep 2007 07:49:00 +0000</pubDate><atom:updated>2007-09-04T13:46:19.078-05:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>flex</category><category domain='http://www.blogger.com/atom/ns#'>symbolic</category><category domain='http://www.blogger.com/atom/ns#'>as3</category><category domain='http://www.blogger.com/atom/ns#'>intersection</category><category domain='http://www.blogger.com/atom/ns#'>vector</category><category domain='http://www.blogger.com/atom/ns#'>integral</category><category domain='http://www.blogger.com/atom/ns#'>Mathematics</category><category domain='http://www.blogger.com/atom/ns#'>algebra</category><category domain='http://www.blogger.com/atom/ns#'>equation</category><category domain='http://www.blogger.com/atom/ns#'>matrix</category><category domain='http://www.blogger.com/atom/ns#'>as3mathlib</category><category domain='http://www.blogger.com/atom/ns#'>derivative</category><category domain='http://www.blogger.com/atom/ns#'>Math</category><category domain='http://www.blogger.com/atom/ns#'>polynomial</category><category domain='http://www.blogger.com/atom/ns#'>actionscript</category><category domain='http://www.blogger.com/atom/ns#'>as</category><category domain='http://www.blogger.com/atom/ns#'>bezier</category><title>as3mathlib (formerly WIS math libraries)</title><description>&lt;p&gt;I've just imported the WIS mathematics library -- an excellent collection of mathematics routines -- onto &lt;a href="http://code.google.com/p/as3mathlib/"&gt;Google Code&lt;/a&gt;.  
(You'll find the &lt;a href="http://members.shaw.ca/flashprogramming/wisASLibrary/wis/index.html"&gt;Actionscript 2 version of the library&lt;/a&gt; at its original site)&lt;/p&gt;&lt;p&gt;This library carries a BSD-ish license and includes support for
&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Geometric Objects and Intersection calculations&lt;/li&gt;&lt;li&gt;    Integral and Differential equation calculations&lt;/li&gt;&lt;li&gt;Bezier, Quadric, Polynomial, Complex, Vector and Matrix calculations&lt;/li&gt;&lt;li&gt;    Symbolic expression parsing &lt;/li&gt;&lt;/ul&gt;&lt;p&gt;I'm converting the library to Actionscript 3 from Actionscript 2 as time and necessity allow.  (That's converting as in getting it to work, and converting as in getting it to be object/pattern oriented).  Right now it builds without errors and with only a few warnings, but I haven't applied any of the unit tests or checked it for correctness or compatibility.&lt;/p&gt;&lt;p&gt;If you see the value of updating this well-thought-out collection of functions, please get in touch and I will add you as a developer.  The code is quite modular: it will be straightforward to take modest chunks and get them working independently.  I wrote to the original author and maintainer, who responded "By all means, continue in the evolution/integration of my library to support AS3" -- but please let me know of any other efforts to update this code, or if a similar or superior math library exists, so that I don't waste my time :).
&lt;/p&gt;&lt;p&gt;Email me [flip at the mrflip with the dot and the com] or comment on this post if you'd like to pitch in!&lt;/p&gt;
&lt;!-- ckey="2D834602" --&gt;&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"
  src="http://vizsage.com/assets/adsense-728x90.js"&gt;
&lt;/script&gt;
&lt;script type="text/javascript"
  src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;/div&gt;</description><link>http://vizsage.com/blog/2007/09/as3mathlib-formerly-wis-math-libraries.html</link><author>flip</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4201613802871642889.post-774573965206618225</guid><pubDate>Mon, 27 Aug 2007 15:06:00 +0000</pubDate><atom:updated>2008-01-07T04:11:00.703-06:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>data</category><category domain='http://www.blogger.com/atom/ns#'>DC</category><category domain='http://www.blogger.com/atom/ns#'>map</category><category domain='http://www.blogger.com/atom/ns#'>metro</category><category domain='http://www.blogger.com/atom/ns#'>differential</category><category domain='http://www.blogger.com/atom/ns#'>underground</category><category domain='http://www.blogger.com/atom/ns#'>washington</category><category domain='http://www.blogger.com/atom/ns#'>mapping</category><category domain='http://www.blogger.com/atom/ns#'>time</category><category domain='http://www.blogger.com/atom/ns#'>geometry</category><category domain='http://www.blogger.com/atom/ns#'>london</category><category domain='http://www.blogger.com/atom/ns#'>visualization</category><category domain='http://www.blogger.com/atom/ns#'>vizsage</category><category domain='http://www.blogger.com/atom/ns#'>isochronic</category><category domain='http://www.blogger.com/atom/ns#'>tube</category><category domain='http://www.blogger.com/atom/ns#'>subway</category><title>Subway Geography and Geometry</title><description>I've written an applet that lets you &lt;a href="http://vizsage.com/apps/subzero/"&gt;reimagine the geography of greater Washington, DC&lt;/a&gt; area with "distance" measured by subway-travel-time, measured by subway-travel-cost, or as the standard clarified subway wall map would deform it.

&lt;img src="http://vizsage.com/apps/subzero/assets/screenshots/time-kingst-700.jpg" /&gt;

This was in large part inspired by Oskar Karlin's beautifully rendered &lt;a href="http://www.oskarlin.com/2005/11/29/time-travel"&gt;Isochronic Elephant-Castle map of the London Underground&lt;/a&gt; and the &lt;a href="http://www.tom-carden.co.uk/p5/tube_map_travel_times/applet/"&gt;interactive     tube mapplet&lt;/a&gt; from Tom Carden.

&lt;a href="http://www.fakeisthenewreal.org/subway/" rel="nofollow"&gt;Subway Maps of the world all on the same scale&lt;/a&gt; is pretty interesting, as is this directory of &lt;a href="http://ni.chol.as/media/sillytube.html" rel="nofollow"&gt;remixed London Underground maps&lt;/a&gt;.  There are a few interesting images on Wikimedia Commons, like &lt;a href="http://commons.wikimedia.org/wiki/Image:NYC_subway_simplified_map.png"&gt;this geographical map&lt;/a&gt; within this &lt;a href="http://commons.wikimedia.org/wiki/New_York_City_Subway"&gt;gallery&lt;/a&gt;.

Also, you can download the image files (very large; they register with one another) from Wikimedia Commons:
  &lt;ul&gt;   &lt;li&gt;&lt;a href="http://commons.wikimedia.org/wiki/Image:DC_Area_Road_Map_With_FontSubset.svg"&gt;Greater Washington, DC Area:
    Road Map&lt;/a&gt;&lt;/li&gt;      &lt;li&gt;&lt;a href="http://commons.wikimedia.org/wiki/Image:WashingtonDCTopoMap.jpg"&gt;Greater Washington, DC Area: Topographic
    Map&lt;/a&gt;&lt;/li&gt;      &lt;li&gt;&lt;a href="http://commons.wikimedia.org/wiki/Image:WashingtonDCAerialPhoto_2590x2000.jpg"&gt;Greater Washington, DC
    Area: Aerial Photo&lt;/a&gt;&lt;/li&gt;    &lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"
  src="http://vizsage.com/assets/adsense-728x90.js"&gt;
&lt;/script&gt;
&lt;script type="text/javascript"
  src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;/div&gt;</description><link>http://vizsage.com/blog/2007/08/ive-written-applet-that-lets-you.html</link><author>flip</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4201613802871642889.post-995116906846019340</guid><pubDate>Fri, 24 Aug 2007 20:08:00 +0000</pubDate><atom:updated>2007-08-25T03:57:45.290-05:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>o'reilly</category><category domain='http://www.blogger.com/atom/ns#'>flex</category><category domain='http://www.blogger.com/atom/ns#'>oreilly</category><category domain='http://www.blogger.com/atom/ns#'>actionscript cookbook</category><category domain='http://www.blogger.com/atom/ns#'>actionscript</category><category domain='http://www.blogger.com/atom/ns#'>patch</category><category domain='http://www.blogger.com/atom/ns#'>diff</category><title>Patches to the AS3 Cookbook Code</title><description>&lt;p&gt;
The &lt;a href="http://www.oreilly.com/catalog/actscpt3ckbk/"&gt;Actionscript 3 Cookbook&lt;/a&gt; is a very helpful reference, and the code that came with it has many good examples.  Unfortunately there's a modicum of bitrot in the code: it produces compiler warnings and errors when compiled under strict mode.
&lt;/p&gt;&lt;p&gt;Here is a &lt;a href="http://www.blogger.com/files/AS3CB-patch-2007-08-24.diff"&gt;patch&lt;/a&gt; against the version of the code I downloaded on 2007-08-24:&lt;/p&gt;&lt;blockquote&gt;&lt;a href="http://vizsage.com/blog/files/AS3CB-patch-2007-08-24.diff"&gt;files/AS3CB-patch-2007-08-24.diff&lt;/a&gt;&lt;/blockquote&gt;&lt;p&gt;&lt;/p&gt;Apply it &lt;a href="http://vizsage.com/blog/2007/08/how-to-make-patch-using-diff.html"&gt;thusly&lt;/a&gt;.
&lt;p&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"
  src="http://vizsage.com/assets/adsense-728x90.js"&gt;
&lt;/script&gt;
&lt;script type="text/javascript"
  src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;/div&gt;</description><link>http://vizsage.com/blog/2007/08/patches-to-as3-cookbook-code.html</link><author>flip</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4201613802871642889.post-4773520512962351771</guid><pubDate>Fri, 24 Aug 2007 15:44:00 +0000</pubDate><atom:updated>2007-09-01T02:31:42.001-05:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>command line</category><category domain='http://www.blogger.com/atom/ns#'>cvs</category><category domain='http://www.blogger.com/atom/ns#'>patch</category><category domain='http://www.blogger.com/atom/ns#'>diff</category><category domain='http://www.blogger.com/atom/ns#'>svn</category><title>How to make a patch using diff</title><description>I always forget the command to use, and google is strangely devoid of helpful/correct advice.  Therefore I'm posting this for my own and future generations' reference.
&lt;h3&gt;How to Generate a Patch from Standalone Code&lt;/h3&gt;For non-(svn|cvs),
&lt;ul&gt;&lt;li&gt;You should have a directory holding the original source (e.g. "dir-orig") and a directory holding the modified source (e.g. "dir").&lt;/li&gt;&lt;li&gt;Obviously, don't modify dir-orig (that is, it should match the author's).  If you don't trust yourself, do a &lt;span style="font-family:monospace;"&gt;chmod -R a-w dir-orig&lt;/span&gt; to recursively mark the directory read-only.&lt;/li&gt;&lt;li&gt;Generate the patch by going to the parent directory (holding &lt;code&gt;dir-orig &lt;/code&gt;and &lt;code&gt; dir&lt;/code&gt;) and running the command&lt;blockquote&gt;&lt;code&gt;diff -Nuwr dir-orig dir &gt; /tmp/my-happy-patch.diff&lt;/code&gt;&lt;/blockquote&gt;&lt;ul&gt;&lt;li&gt;dir and dir-orig should be paths to the dirs in question, obviously.&lt;/li&gt;&lt;li&gt;&lt;code&gt;-N&lt;/code&gt; includes newly added files in the patch (treats absent files as empty files)&lt;/li&gt;&lt;li&gt;&lt;code&gt;-u&lt;/code&gt; creates a "unified" diff -- it's human-readable and works well with patch&lt;/li&gt;&lt;li&gt;&lt;code&gt;-w&lt;/code&gt; ignores whitespace, which is polite if your (clearly superior) formatting policy differs from the original.&lt;/li&gt;&lt;li&gt;&lt;code&gt;-r&lt;/code&gt; recursively descends the source tree.&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;Other helpful options:&lt;/li&gt;&lt;ul&gt;&lt;li&gt;&lt;code&gt;-p&lt;/code&gt; (applied to C or C++ code) shows the function the new code appears in.  Only use this with C or C++ code (YMMV with anything else)
&lt;/li&gt;&lt;li&gt;&lt;code&gt;-u6&lt;/code&gt; (or any number following the -u) gives that many lines of context (the default is 3, which should be fine for code that isn't changing like the star of a Tootsie stage performance)&lt;/li&gt;&lt;li&gt;&lt;code&gt;-x ".??*"&lt;/code&gt; ignores .DS_Store, Eclipse and other hidden-file turds, if you have those.  Emacsen should add &lt;code&gt;-x "*~"&lt;/code&gt;.&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;Sanity-check the change
&lt;code&gt;less /tmp/my-happy-patch.diff&lt;/code&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h3&gt;How to Generate a Patch from Subversion or CVS&lt;/h3&gt;To generate a patch from cvs or svn, (advice horked from &lt;a href="http://www.xsmiles.org/participate.html"&gt;X-Smiles.org&lt;/a&gt;):
&lt;ul&gt;&lt;li&gt;Make sure you are synchronized with the latest sources:
&lt;code&gt;$ cvs update src&lt;/code&gt;
(or wherever your changes are; use a directory that spans all the changed modules or the trunk directory.)
&lt;/li&gt;&lt;li&gt;Sanity-check the change:
&lt;code&gt;$ cvs diff src&lt;/code&gt;
&lt;/li&gt;&lt;li&gt;Generate the patch (replace -Nuwr with whatever you decided works for you from the options above).
&lt;code&gt;$ svn diff --diff-cmd=diff -x-Nuw src &gt; /tmp/my-happy-patch.diff&lt;/code&gt;&lt;/li&gt;&lt;li&gt;(Rather than include &lt;code&gt;-x ""&lt;/code&gt; args, you should be adding turd files to your &lt;code&gt;~/.subversion/config&lt;/code&gt;, for instance
&lt;code&gt;global-ignores = *.o *.lo *.la #*# .*.rej *.rej .*~ *~ .#* .DS_Store&lt;/code&gt;
or however you pronounce that in a &lt;code&gt;~/.cvsrc&lt;/code&gt; or &lt;code&gt;.cvsignore&lt;/code&gt;.)&lt;/li&gt;&lt;li&gt;Sanity-check the patch
&lt;code&gt;less /tmp/my-happy-patch.diff&lt;/code&gt;&lt;/li&gt;&lt;/ul&gt;(By the way, if your &lt;a href="http://www.google.com/search?hl=en&amp;q=windows+sucks&amp;amp;btnG=Search"&gt;broken OS&lt;/a&gt; lacks a command line, you might be able to &lt;a href="http://cygwin.com/"&gt;add one&lt;/a&gt;).
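To see what the unified format looks like without touching a real source tree, here is a minimal sketch in Python.  It uses the standard difflib module as a stand-in for the diff command (it is not how diff itself works), and the file names and contents are hypothetical.  Passing an empty list for the original mimics what -N does for a newly added file; nothing here models -w or -r.
&lt;pre&gt;
```python
import difflib


def make_patch(old_lines, new_lines, path, context=3):
    """Return unified-diff text comparing dir-orig/path with dir/path."""
    diff_lines = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile="dir-orig/" + path,
        tofile="dir/" + path,
        n=context,  # lines of context, like the number after -u
    )
    return "".join(diff_lines)


# Hypothetical file contents, one list entry per line.
original = ["alpha\n", "beta\n", "gamma\n"]
modified = ["alpha\n", "BETA\n", "gamma\n", "delta\n"]

patch_text = make_patch(original, modified, "notes.txt")

# An absent original is treated as an empty file, as with -N.
new_file_patch = make_patch([], ["hello\n"], "added.txt")

print(patch_text)
print(new_file_patch)
```
&lt;/pre&gt;
The header lines carry the dir-orig/ and dir/ prefixes, which is the leading path component that patch is told to strip.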
&lt;h3&gt;How to use a Patch&lt;/h3&gt;To apply such a patch,
&lt;ul&gt;&lt;li&gt;Download the patch and save it somewhere intelligent.&lt;/li&gt;&lt;li&gt;If you're &lt;span style="font-style: italic;"&gt;not&lt;/span&gt; using version control, &lt;span style="font-weight: bold;"&gt;!!make a copy of the source tree!!&lt;/span&gt;
&lt;/li&gt;&lt;li&gt;Change directories &lt;span style="font-style: italic;"&gt;into&lt;/span&gt; that new folder --the one holding the unmolested (or svn trunk) code.&lt;/li&gt;&lt;li&gt;Sanity check: run the command
&lt;code&gt;cat &lt;/code&gt;&lt;code&gt;/tmp/my-happy-patch.diff | &lt;/code&gt;&lt;code&gt;patch -p1 --dry-run
&lt;/code&gt;&lt;/li&gt;&lt;li&gt;If you see results like
&lt;code&gt;patching file foo/bar/my-happy-file.as&lt;/code&gt;
...
you're good to go:
&lt;code&gt;cat &lt;/code&gt;&lt;code&gt;/tmp/my-happy-patch.diff | &lt;/code&gt;&lt;code&gt;patch -p1&lt;/code&gt;&lt;/li&gt;&lt;li&gt;Pitfalls:&lt;/li&gt;&lt;ul&gt;&lt;li&gt;If you get a patch taken from &lt;span style="font-style: italic;"&gt;within&lt;/span&gt; the modified directory, change -p1 to -p0.&lt;/li&gt;&lt;li&gt;If you get a patch with the original and modified dirs reversed, add a --reverse flag.
&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"
  src="http://vizsage.com/assets/adsense-728x90.js"&gt;
&lt;/script&gt;
&lt;script type="text/javascript"
  src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;/div&gt;</description><link>http://vizsage.com/blog/2007/08/how-to-make-patch-using-diff.html</link><author>flip</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4201613802871642889.post-5618334501307336030</guid><pubDate>Mon, 20 Aug 2007 17:09:00 +0000</pubDate><atom:updated>2007-08-24T14:39:32.990-05:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>flex</category><category domain='http://www.blogger.com/atom/ns#'>error</category><category domain='http://www.blogger.com/atom/ns#'>Mathematics</category><category domain='http://www.blogger.com/atom/ns#'>rotate</category><category domain='http://www.blogger.com/atom/ns#'>scale</category><category domain='http://www.blogger.com/atom/ns#'>translate</category><category domain='http://www.blogger.com/atom/ns#'>concat</category><category domain='http://www.blogger.com/atom/ns#'>matrix</category><category domain='http://www.blogger.com/atom/ns#'>matrices</category><category domain='http://www.blogger.com/atom/ns#'>actionscript</category><category domain='http://www.blogger.com/atom/ns#'>transform</category><category domain='http://www.blogger.com/atom/ns#'>adobe</category><category domain='http://www.blogger.com/atom/ns#'>shear</category><category domain='http://www.blogger.com/atom/ns#'>invert</category><title>Flex Demo: Matrix Math (and an error in the Actionscript docs)</title><description>I'm working on something that uses (an algorithm similar to) texture mapping, for which I want to precalculate the .invert() of a whole bunch of .transform.matrix objects.  I'll post something about that in the next coupla days.

Meanwhile, I found something perplexing in the Actionscript documentation but the possibility exists that I am just a dope so please point out an error in my reasoning.

As you may know, you can represent any arbitrary combination of 2-D scalings, skews, rotations and translations using standard matrix operations. The &lt;a href="http://livedocs.adobe.com/flex/2/langref/flash/geom/Matrix.html"&gt;Actionscript docs for the Matrix class&lt;/a&gt; mention this in passing, but have elements .b and .c switched: it should be
&lt;blockquote&gt;&lt;img style="width: 72px; height: 65px;" src="http://vizsage.com/demos/matrixmathdemo/theory/genericMatrix.png" alt="right: [ [a,c,tx] [b,d,ty] [0,0,1] ]" border="0" /&gt; and not &lt;img style="width: 72px; height: 65px;" src="http://vizsage.com/demos/matrixmathdemo/theory/genericMatrix-wrong.png" alt="wrong: [ [a,b,tx] [c,d,ty] [0,0,1] ]" border="0" /&gt;.&lt;/blockquote&gt;
I whipped up a &lt;a href="http://vizsage.com/demos/matrixmathdemo/MatrixMathDemo.html"&gt;MatrixMathDemo&lt;/a&gt; in flex to demonstrate the issue:
&lt;ul&gt;&lt;li&gt;&lt;a href="http://vizsage.com/demos/matrixmathdemo/MatrixMathDemo.html"&gt;Demo&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://vizsage.com/demos/matrixmathdemo/srcview/index.html"&gt;Source&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://vizsage.com/demos/matrixmathdemo/docs/index.html"&gt;Docs&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;and a writeup on &lt;a href="http://vizsage.com/demos/matrixmathdemo/theory/MatrixMath.pdf"&gt;Mathematical matrices and the actionscript Matrix transformations&lt;/a&gt; [PDF].  The 2nd, 3rd and 4th tabs compare the Matrix methods concat(), invert() and deltaTransformPoint()/transformPoint() respectively with an explicit calculation of the corresponding matrix operation: they show that in fact the documentation has &lt;span style="font-style: italic;"&gt;b&lt;/span&gt; and&lt;span style="font-style: italic;"&gt; c&lt;/span&gt; switched.  (The code is &lt;a href="http://vizsage.com/license/Visage-Deed-BY.html"&gt;free to reuse or modify&lt;/a&gt; (but give credit) in case that's useful.)
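
The difference between the two layouts can be checked by hand with a small Python sketch.  This is not the demo's code: the function names are invented for illustration, and it assumes that a rotation by angle q has a = cos q, b = sin q, c = -sin q, d = cos q, which is how I understand the Matrix class to define it.
&lt;pre&gt;
```python
import math


def transform_point(matrix, x, y):
    """Apply [ [a,c,tx] [b,d,ty] [0,0,1] ] to the column vector (x, y, 1)."""
    a, b, c, d, tx, ty = matrix
    return (a * x + c * y + tx, b * x + d * y + ty)


def transform_point_as_documented(matrix, x, y):
    """Same product, but with b and c in the swapped positions."""
    a, b, c, d, tx, ty = matrix
    return (a * x + b * y + tx, c * x + d * y + ty)


def rotation(theta):
    """(a, b, c, d, tx, ty) for a rotation by theta radians (assumed form)."""
    return (math.cos(theta), math.sin(theta),
            -math.sin(theta), math.cos(theta), 0.0, 0.0)


quarter_turn = rotation(math.pi / 2)

# With b and c in the right places, (1, 0) lands on roughly (0, 1).
print(transform_point(quarter_turn, 1.0, 0.0))

# With them swapped, the same point lands on roughly (0, -1):
# the rotation runs the opposite way.
print(transform_point_as_documented(quarter_turn, 1.0, 0.0))
```
&lt;/pre&gt;
Swapping b and c transposes the upper-left 2x2 block, which for a pure rotation is the same as rotating in the opposite direction.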

Some references:
&lt;ul&gt;&lt;li&gt;This &lt;a href="http://www.senocular.com/flash/tutorials/transformmatrix/"&gt;flash-specific tutorial at senocular.com&lt;/a&gt; is good, though the matrices are transposed from what is typically presented.
&lt;/li&gt;&lt;li&gt;The posts on the flashcoders list &lt;a href="http://www.mail-archive.com/flashcoders@chattyfig.figleaf.com/msg11331.html"&gt;A little matrix.invert() mystery&lt;/a&gt; and &lt;a href="http://www.mail-archive.com/flashcoders@chattyfig.figleaf.com/msg11401.html"&gt;followup&lt;/a&gt; are explained by &lt;span style="font-style: italic;"&gt;b&lt;/span&gt; and&lt;span style="font-style: italic;"&gt; c&lt;/span&gt; being switched; they link to &lt;a href="http://kiroukou.media-box.net/blog/mes-recherches-sur-flash/62-classe-matrix-de-flash8-eronnee.html"&gt;a post on a French blog&lt;/a&gt; that is helpful if you parlez.
&lt;/li&gt;&lt;li&gt;To brush up on matrix mathematics, please see the &lt;a href="http://mccammon.ucsd.edu/%7Eadcock/matrixfaq.html#Q41"&gt;Matrix and Quaternion FAQ&lt;/a&gt; or &lt;a href="http://en.wikipedia.org/wiki/Matrix_%28mathematics%29"&gt;Wikipedia&lt;/a&gt;. &lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"
  src="http://vizsage.com/assets/adsense-728x90.js"&gt;
&lt;/script&gt;
&lt;script type="text/javascript"
  src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;/div&gt;</description><link>http://vizsage.com/blog/2007/08/flex-demo-matrix-math-and-error-in.html</link><author>flip</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4201613802871642889.post-1172639074595725267</guid><pubDate>Wed, 25 Jul 2007 09:51:00 +0000</pubDate><atom:updated>2007-08-24T14:33:47.087-05:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>flex</category><category domain='http://www.blogger.com/atom/ns#'>elisp</category><category domain='http://www.blogger.com/atom/ns#'>mode</category><category domain='http://www.blogger.com/atom/ns#'>mxml</category><category domain='http://www.blogger.com/atom/ns#'>xml</category><category domain='http://www.blogger.com/atom/ns#'>emacs</category><category domain='http://www.blogger.com/atom/ns#'>actionscript</category><category domain='http://www.blogger.com/atom/ns#'>as</category><title>Emacs modes for Flex</title><description>&lt;ul&gt;&lt;li&gt;Emacs modes for Flex:&lt;/li&gt;&lt;ul&gt;&lt;li&gt;XML:
&lt;a href="http://www.oreillynet.com/mac/blog/2003/10/a_new_xmlediting_mode_for_emac.html"&gt;nXML-mode for Emacs from James Clark&lt;/a&gt;
&lt;a href="http://www.ibm.com/developerworks/xml/library/x-emacs/"&gt;Using Emacs for XML documents&lt;/a&gt;
&lt;/li&gt;&lt;li&gt;Actionscript:
&lt;a href="http://blog.pettomato.com/?cat=7"&gt;actionscript-mode.el&lt;/a&gt; for editing actionscript files in emacs.
&lt;/li&gt;&lt;li&gt;At least right now it seems you want this &lt;a href="http://www.thaiopensource.com/download/nxml-mode-20041004.tar.gz"&gt;xml mode&lt;/a&gt; and this &lt;a href="http://blog.pettomato.com/content/actionscript-mode.el"&gt;actionscript-mode.el&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;Then, add
&lt;blockquote&gt;&lt;pre&gt;(setq auto-mode-alist (append (list
 '("\\.as\\'"   . actionscript-mode)
 '("\\.\\(xml\\|xsl\\|rng\\|xhtml\\|mxml\\)\\'" . nxml-mode)
 ;; add more modes here
 ) auto-mode-alist))

;;
;; ------------------ Magic for XML Mode ----------------
;;

;; add-hook appends to the hook rather than overwriting whatever is already there
(add-hook 'nxml-mode-hook
  (lambda ()
    ;; indent with two spaces, never tabs
    (setq tab-width        2
          indent-tabs-mode nil)
    (setq nxml-child-indent     2
          nxml-attribute-indent 2)))
&lt;/pre&gt;&lt;/blockquote&gt;&lt;/li&gt;&lt;li&gt;You can use &lt;code&gt; M-x customize-group RET nxml-highlighting-faces RET&lt;/code&gt; to fix your colors the way you like 'em.
&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;Setting up asdoc to work within Flex Builder:&lt;/li&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://www.peterelst.com/blog/2006/09/03/flex-builder-2-ant-support/"&gt;First, install ant support&lt;/a&gt; (Ant is &lt;a href="http://ant.apache.org/"&gt;an offshoot of apache&lt;/a&gt; and is like Makefile only more betterer.)&lt;/li&gt;&lt;li&gt;Then set up a &lt;a href="http://blog.bittube.com/2006/08/15/ant-buildxml-for-asdocs-generation/"&gt;build.xml&lt;/a&gt; in your docs/ directory to build the documentation set.
&lt;/li&gt;&lt;li&gt;I had to modify mine a bit: I added
&lt;code&gt;&amp;lt;property name="Templates.dir" location="${FlexSDK.dir}/asdoc/templates/"/&amp;gt;
&amp;lt;arg line='-templates-path ${Templates.dir}'/&amp;gt;
&lt;/code&gt;&lt;/li&gt;&lt;li&gt;I also linked the flex home to a no-funny-characters dir:
&lt;pre&gt;    ln -s "/Applications/Applications/Adobe Flex Builder 2" /work/ProgramStores/Flex
   cd /work/ProgramStores/Flex
   ln -s "Flex SDK 2" FlexSDK&lt;/pre&gt;Then I exported the location for the asdoc file:
&lt;pre&gt;    export FLEX_HOME=/work/ProgramStores/Flex/FlexSDK&lt;/pre&gt; or else I got the error message:
&lt;pre&gt;    Exception in thread "main" java.lang.NoClassDefFoundError: Flex&lt;/pre&gt;
&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;script type="text/javascript"
  src="http://vizsage.com/assets/adsense-728x90.js"&gt;
&lt;/script&gt;
&lt;script type="text/javascript"
  src="http://pagead2.googlesyndication.com/pagead/show_ads.js"&gt;
&lt;/script&gt;&lt;/div&gt;</description><link>http://vizsage.com/blog/2007/07/emacs-modes-for-flex.html</link><author>flip</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4201613802871642889.post-6039094362099134586</guid><pubDate>Tue, 24 Jul 2007 05:54:00 +0000</pubDate><atom:updated>2007-08-25T03:53:30.248-05:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>flex</category><category domain='http://www.blogger.com/atom/ns#'>custom</category><category domain='http://www.blogger.com/atom/ns#'>xmlns</category><category domain='http://www.blogger.com/atom/ns#'>namespace</category><category domain='http://www.blogger.com/atom/ns#'>mxml</category><category domain='http://www.blogger.com/atom/ns#'>xml</category><category domain='http://www.blogger.com/atom/ns#'>compc</category><category domain='http://www.blogger.com/atom/ns#'>manifest</category><category domain='http://www.blogger.com/atom/ns#'>URL</category><title>Adobe Flex and Custom Namespace / manifest.xml</title><description>Create a file "manifest.xml" and add the following:
&lt;blockquote&gt;&lt;pre&gt;&amp;lt;?xml version="1.0"?&amp;gt;
&amp;lt;componentPackage&amp;gt;
&amp;lt;!--
URI http://vizsage.com/vzg
namespace gg
package com.vizsage
--&amp;gt;
&amp;lt;component id="widget1" class="com.vizsage.controls.widget1"/&amp;gt;
&amp;lt;component id="widget2" class="com.vizsage.controls.widget2"/&amp;gt;
&amp;lt;/componentPackage&amp;gt;
&lt;/pre&gt;&lt;/blockquote&gt;Notes:
&lt;ul&gt;&lt;li&gt;The class part should give the full path (with / turned into .) to the corresponding .as files.
&lt;/li&gt;&lt;li&gt;You don't need one &amp;lt;component/&amp;gt; for each .as file, just one for each component.
&lt;/li&gt;&lt;li&gt;The comment part, like most comments (and many goggles), does nothing: all you need is a &amp;lt;component/&amp;gt; for each widget.
&lt;/li&gt;&lt;li&gt;You don't have to follow the tld.domain.clevername.widgetname format, but it's what all the cool kids are doing.  Just make sure the dotted path matches your files' path.&lt;/li&gt;&lt;li&gt;The dotted path and the namespace URL don't have anything to do with each other.&lt;/li&gt;&lt;li&gt;In fact, the namespace URL is &lt;span style="font-style: italic;"&gt;completely made up&lt;/span&gt;: it doesn't have to exist; the compiler doesn't look for it; hell, Adobe's URL doesn't even &lt;a href="http://www.adobe.com/2006/mxml"&gt;exist&lt;/a&gt;.  It's just a tag for uniquely identifying a namespace. All that matters is that the namespace in your compiler flags and the one in your mxml files match up.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;If you use Flex Builder, open the library project's properties and go to the "Library Compiler" section, then enter the namespace and manifest.xml in the respective fields. If you use the standalone package, you'll have to add these options for the &lt;span style="text-decoration: underline;"&gt;Component Compiler&lt;/span&gt; (compc): &lt;blockquote&gt;&lt;pre&gt;-include-namespaces="http://vizsage.com/vzg"  -namespace "http://vizsage.com/vzg" manifest.xml&lt;/pre&gt;&lt;/blockquote&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;You need to include the namespace &lt;span style="font-style: italic;"&gt;and&lt;/span&gt; define it.&lt;/li&gt;&lt;li&gt;The &lt;code&gt;-namespace&lt;/code&gt; flag takes &lt;span style="font-style: italic;"&gt;two&lt;/span&gt; arguments (a namespace and a manifest.xml).&lt;/li&gt;&lt;li&gt;The URI here has to match the xmlns URI in your .mxml file.&lt;/li&gt;&lt;/ul&gt;Now your .mxml files (which can live anywhere, not necessarily in that project) start off like this:&lt;blockquote&gt;&lt;pre&gt;&amp;lt;?xml version="1.0" encoding="utf-8"?&amp;gt;
&amp;lt;mx:Application
xmlns:mx="http://www.adobe.com/2006/mxml"
xmlns:gg="http://vizsage.com/vzg"
layout="absolute" width="100%" height="100%"
viewSourceURL="srcview/index.html"&amp;gt;
&amp;lt;!-- ... your mxml file ... --&amp;gt;
&lt;/pre&gt;&lt;/blockquote&gt;&lt;ul&gt;&lt;li&gt;Make damn sure the xmlns URI matches what you used before.  I spent 30 minutes figuring out that &lt;code&gt;http://www.vizsage.com/vzg&lt;/code&gt; and &lt;code&gt;http://vizsage.com/vzg&lt;/code&gt; weren't the same thing.
&lt;/li&gt;&lt;li&gt;In Flex Builder 2, open your project's properties, go into "Flex Build Path", then the "Library Path" pane, and "Add SWC" (the one you built with your custom components).
&lt;/li&gt;&lt;li&gt;For the command-line tools, add the flag &lt;pre&gt;-library-path+=/abs/olute/path/to/library.swc&lt;/pre&gt;Make sure that's a += there, so the SWC is appended to the existing library path rather than replacing it.&lt;/li&gt;&lt;li&gt;Either way, applications (as opposed to libraries) don't need any namespace compiler flags or a manifest.xml.  The library uniquely identifies itself within a namespace, and provides files in the right com.foo.bar hierarchy.  Once your .mxml file declares the namespace and the build includes the library, everything turns out right.
&lt;/li&gt;&lt;/ul&gt;For more about namespaces, &lt;a href="http://blog.flashgen.com/2007/07/04/manifests-namespaces-and-flex-builder-2/"&gt;see here&lt;/a&gt;, with one caveat: I think you're better off using the &lt;code&gt;&lt;a href="http://livedocs.adobe.com/flex/201/html/wwhelp/wwhimpl/common/html/wwhelp.htm?context=LiveDocs_Book_Parts&amp;file=compilers_123_09.html"&gt;-load-config+=&lt;/a&gt;&lt;/code&gt; trick (to just tack on your changes) than hacking stuff into the main flex-config.xml file.&lt;code&gt;
&lt;/code&gt;</description><link>http://vizsage.com/blog/2007/07/adobe-flex-and-custom-namespace.html</link><author>flip</author></item></channel></rss>