Lego Mindstorm Codecs Added to ffmpeg

Open source is so awesome.  So are nerds.

Demo of Scary Vulnerability Using Safari Autofill

Brilliant + terrifying.

Visit the above link in Safari.  It’s just a demo of the vulnerability; nothing bad will happen. 

Can I get a w00t w00t‽

Can I get a w00t w00t‽

Two fantastic tidbits from Ars on the recent zero-day Window’s exploit that takes advantage of a flaw in Windows shortcuts.

The best option for mitigating the flaw is to disable Windows’ ability to show shortcuts’ icons. However, this mitigation comes at some cost; it removes all the icons from the Start menu, for example, which is sure to be detrimental to usability.

and

Microsoft doesn’t list Windows 2000 or Windows XP Service Pack 2 as vulnerable, but this is because they are no longer supported—they are just as vulnerable as more recent versions, but will not receive a patch.

Sigh.

Tuesday, July 20, 2010 — 2 notes
hugehuge:

True.

hugehuge:

True.

No really, Fluid is open source

This is awesome.  Todd Ditchendorf open-sourced Fluid.app on GitHub.  So what are you waiting for?  

New in Campfire: Drag and drop uploading

Yes!

Pretty interesting post from Percona this week about the cost of scaling out MySQL.  They compared the cost/benefit of increasing memory vs switching to an SSD.
The above graph is a little confusing at first, but take a look at the 6GB point on the x-axis.  By switching to an SSD, you can get a 5x speed boost (TPS).  To do the same thing with RAM, you’d have to add another 10GB.
Anecdotally we all knew SSDs improve performance, but it’s great to see someone throw some numbers behind that claim. Of course there are a million YMMV caveats here (data size, read-write ratio, active data vs historical, etc) but it’s always good to have alternative ways to scale out a single host that don’t just involve throwing RAM at it.

Pretty interesting post from Percona this week about the cost of scaling out MySQL.  They compared the cost/benefit of increasing memory vs switching to an SSD.

The above graph is a little confusing at first, but take a look at the 6GB point on the x-axis.  By switching to an SSD, you can get a 5x speed boost (TPS).  To do the same thing with RAM, you’d have to add another 10GB.

Anecdotally we all knew SSDs improve performance, but it’s great to see someone throw some numbers behind that claim. Of course there are a million YMMV caveats here (data size, read-write ratio, active data vs historical, etc) but it’s always good to have alternative ways to scale out a single host that don’t just involve throwing RAM at it.

OS X Geek Tip of the Day

When OS X Disk Utility tells you that your disk has “issues” and you must boot up from a system disk and run Disk Utility again, you can simply boot in single-user mode by holding down command-S during bootup and run `fsck -fy` at the command prompt. It does the same thing.

Now there’s no need to (a) look for your OS X install disk or (b) wait the 10 minutes for your system to boot off a DVD.

Monday, March 29, 2010

Never mind that real businesses track all of their data in SQL databases that scale just fine… By replacing MySQL or Postgres with a different, new data store, you have traded a well-enumerated list of limitations and warts for a newer, poorly understood list of limitations and warts, and that is a huge business risk… The sooner your company admits this, the sooner you can get down to some real work.

You said it, Ted Dziuba!

Did you know that you can specify an IP address in your browser using a base other than base 10?  For example, the following are all equivalent representations of 66.102.13.99 (one of Google’s IPs)

From a post on Viruslist:

Now, by itself, this isn’t terribly interesting from a technical perspective; this “feature” of IP specification has been around for quite a while.
What is interesting is that due to the relative obscurity of using such methods to denote an IP or URL, it is quite feasible that existing security products do not correctly identify the URLs as valid or flag them as malicious when they point to existing known bad websites.

In other words, known malicious sites may be able to sneak through web filters because the different representations aren’t recognized.

Wednesday, March 24, 2010 — 4 notes
Counting down the days until a VZW Nexus One or HTC Incredible. 

Counting down the days until a VZW Nexus One or HTC Incredible. 

Too many tables; MySQL can only use 61 tables in a join

AKA You’re Doing it Wrong

Huge CSV Data Exports in Rails

I’m a strong believer in data portability for software as a service applications.  As a consumer, I never want to use a service that won’t let me get my data out.  And as a provider, I’d much prefer to compete on features, quality, customer service, etc than on locking in customers.   Additionally,  it takes a lot of hubris to assume that your application can provide all the analysis a customer may want to do with her data.  Letting customers import data into Excel or a similar tool is incredibly valuable.

At Mobile Commons, we have a ton of data that customers may want to export.  User profiles, text messages, donations, phone call records, etc.  And customers want to use different query parameters for their export.  For example, pull every constituent in New York that has opted in to a particular advocacy campaign and donated money.

This basic idiom repeats itself over and over again in our application:

  • A customer pulls up a page with a customizeable set of search parameters
  • An HTML view displays the data with pagination
  • A CSV export button enables them to download the data, but obviously without pagination this time

In Rails, this is pretty easy to accomplish with respond_to blocks.  We setup the ActiveRecord finders, then return the appropriate content type:

For example, consider the following URL: http://foo.example.com/articles/123/comments?published=true&after=2010-01-01&order=updated_at


class CommentsController
  def index
    finder = current_user.articles.find(params[:article_id]).comments
    finder = finder.published if params[:published] = true
    finder = finder.scoped(:conditions=>'created_at > ?', params[:after])
    finder = finder.scoped(:order=>params[:order_by])
    respond_to do |format|
      format.html do
        finder.paginate( params[:page])
      end
      format.csv do
        send_csv finder.all.to_csv
      end
    end
  end
end

This works quite well.  The code is DRY and easy to read.  The problem comes when the number of comments grows large.  Let’s say we have 1,000 comments in the database (for example, if we become the first google result for “facebook login”).  The pagination version still works fine. We show 10 on a page and let the user click through lots of pages. But the CSV version falls flat as we try to pull 1,000 records from the database at once on every request.

No problem!  This is a perfect opportunity for asychronous processing.  We push a message onto a queue, run the report in the background, and provide a callback URL for the user to pickup the CSV when it’s complete.

This approach works great, but makes for an interesting architectural decision: where should the finder logic live?  It seemed like the controller was a good place for it, but now we want to access it from another process that knows nothing about MVC.  You probably don’t realize how much business logic is implicitly encoded in your routes until you try something like this.  For example, how did the current_user variable get set?  Maybe by parsing the subdomain?  And how do we know what article_id is?  We extracted it from the URL path.

Suddenly you realize that repackaging that logic to be used in both places and remain DRY may not be trivial.  For one controller it’s not so bad, but remember that  you may have this idiom repeated over and over in 20 different controllers, each with their own routes and query params.  The question really becomes: what message should you put on the queue so that your workers don’t all need custom duplicate logic?

We ended up taking an rather clever approach to this problem (thanks, Justin).  We already have code that works when it’s a web request.  What if we just serialize the URL and HTTP params and put them on the queue.  Our worker can pick them up and just call the controller method directly.

Rails already has this packaged nicely for us with the ActionController::Integration::Session class, which is typically used for integration testing.


#first deserialize the message to a hash object
path   = message[:path]
params = message[:params]
app = ActionController::Integration::Session.new
app.host! "foo.example.com"
app.get(path, params)
csv = app.response.body

Works like a charm!  We go through the same code path as a regular web request, but we do it all in-process.  We never go over the wire.  Suddenly you don’t care if the process takes 2 seconds or 2 minutes.  It’s running happily in the background and you’re good to go.

Then days go by, then the weeks.  Suddenly your comment thread has gone from 1,000 records to 10,000 records.  The background job gets slower and slower.  Which is okay, but you notice it’s using more and more machine resources.

What’s happening?  Well, a good guess is that finder.all is putting all 10,000 ActiveRecord objects into memory at once, which consumes a ton of resources.  If you haven’t read the Engine Yard post on memory bloat, I highly recommend it.

No problem!  Rails 2.3 ships with easy utility functions find_in_batches & find_each which pulls 1,000 records at a time and iterates through them all.


format.csv do
  csv = ""
  finder.all.find_each do |records|
    csv << records.to_csv
  end
  send_csv csv
end

Great!  Memory usage plummets and everything is honky dory.

Then days go by, then the weeks.  Suddenly your comment thread has gone from 10,000 records to 100,000 records.  The background job gets slower and slower.  Which is okay, but you notice it’s using more and more machine resources.

Wait, what?  Why did you repeat that?  What happened?  I thought we fixed that.

Yeah, me too.  But suddenly our export started taking forever and using 1GB+ of memory.  What the deuce?  find_each should have solved this problem.  We tested the export code over and over.  We couldn’t reproduce the problem.  We had unit tests and integration tests and load tests. We tried in IRB and rake tasks and just couldn’t reproduce.

After much investigation and a great tip from Aaron Quint, we discovered a nasty little gotcha:  in Rails 2, the ActiveRecord cache is enabled automatically for all controller actions (http://dev.rubyonrails.org/changeset/6189).  Because we were calling app.get in our worker task, it was automatically loading the query cache.  So every instance of a Comment was being cached, all 100,000 of them, even if we were only finding 1,000 at a time.  The problem only showed up when called via the controller because the cache is disabled by default everywhere else.

Turns out the problem is trivially fixable by wrapping find_each in an uncached{} block:


format.csv do
  csv = ""
  Comment.uncached do
    finder.all.find_each do |records|
      csv << records.to_csv
    end
  end
  send_csv csv
end

And voila!  We now have a background task that calls a single bit of DRY controller code, processes in batches, and doesn’t use a lot of memory.  Wawaweewa!

Tuesday, March 9, 2010

Admit It, Microsoft: You Suck at the Web – GigaOM

Let’s see… 60% of the browser market, one of the top 3 most visited web sites, one of the top 3 search engines, and $40B in cash…

I would hate to see what things would look like if they were just “pretty good at the web.”