Wednesday, October 20, 2010

<< vs +=

So I was assigned a story today at work that called for parsing an 80k+ row CSV file, adding some data to it, and then throwing it into a model. Simple enough. My first pass, just to get it running, looped over the rows and called model.create for each one. This was obviously pretty slow: it had to create a new object for every row in the CSV file. So I refactored and decided to build one very large SQL insert query string. The code looked something like this:

sql = "INSERT INTO table (column_1) VALUES "
rows.each do |row|
  sql += "(#{row.value}),"
end

To my surprise, when I benchmarked it versus the previous method, it was slower! After almost giving up on the insert statement, I changed += to <<. I re-ran the code expecting a minimal speed increase. Instead, the code went from taking about 2 hours to about 2 minutes!
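For reference, here is a minimal sketch of what the << version might look like. The Row struct and the build_insert helper are made-up names for illustration (the real rows came from the CSV file), and the values are forced to integers only to keep the sketch safe; real data should go through the database adapter's quoting.

```ruby
# Hypothetical row type; the post does not show what a row actually looks like.
Row = Struct.new(:value)

# Build one multi-row INSERT statement by appending to a single string in place.
def build_insert(rows)
  # String.new gives a mutable string even when string literals are frozen.
  sql = String.new("INSERT INTO table (column_1) VALUES ")
  rows.each_with_index do |row, i|
    sql << "," unless i.zero?           # separator between tuples, no trailing comma
    sql << "(#{Integer(row.value)})"    # << mutates sql instead of copying it
  end
  sql
end

puts build_insert([Row.new(1), Row.new(2), Row.new(3)])
```

Adding the comma before every tuple except the first also avoids the trailing comma that the original loop leaves at the end of the statement.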

I think the speed increase comes from += creating a new string object on every iteration (copying everything built so far), whereas << appends to the existing string in place. Here are some benchmarks:
>> slow = Benchmark.measure {
?> str = ""
>> 100000.times {
?> str += "a"
>> }
>> }
=> #

>>
?> fast = Benchmark.measure {
?> str = ""
>> 100000.times {
?> str << "a"
>> }
>> }
>> puts slow
86.310000 0.400000 86.710000 ( 86.780992)
=> nil
>> puts fast
0.030000 0.020000 0.050000 ( 0.049042)
=> nil
>>

About 86 seconds vs. about 0.05 seconds.
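One way to check the "new object vs. in place" explanation is to compare object identity before and after each operation. This is only a small sketch; the two helper method names are invented for the demonstration.

```ruby
# str += "b" is shorthand for str = str + "b": it builds a new String and
# rebinds the local variable, leaving the caller's string untouched.
def append_with_plus_equals(str)
  str += "b"
  str
end

# str << "b" appends to the existing String object and returns that same object.
def append_with_shovel(str)
  str << "b"
end

plus_input = String.new("a")
plus_result = append_with_plus_equals(plus_input)
puts plus_result.equal?(plus_input)     # false: a different object came back
puts plus_input                         # a  (original unchanged)

shovel_input = String.new("a")
shovel_result = append_with_shovel(shovel_input)
puts shovel_result.equal?(shovel_input) # true: same object, mutated
puts shovel_input                       # ab (original changed)
```

Since every += copies the whole string built so far, the total work in a loop grows roughly with the square of the final length, while << only appends the new piece. The flip side is that << changes the string for everyone holding a reference to it.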

8 comments:

  1. This comment has been removed by the author.

  2. Thanks for catching that. I wrote it too quickly and didn't proof read :/ I've fixed it. Thanks!

  3. Great result, Adam! Using << is more than a hundred times faster on my system too. (I use Ruby 1.9.2 on Ubuntu 10.04.)

    This is an excellent gain and IMHO worthwhile to alert the Ruby core team to consider optimization.

  4. Sometimes I forget that not everyone knows this. Thanks for spreading the word.

  5. I love ruby.

    class String
      def +(o)
        self << o.to_s
      end
    end

    ----

    require 'benchmark'

    class String
      def +(o)
        self << o.to_s
      end
    end

    slow = Benchmark.measure {
      str = ""
      100000.times {
        str += "a"
      }
    }

    fast = Benchmark.measure {
      str = ""
      100000.times {
        str << "a"
      }
    }

    puts slow
    # BEFORE: 2.490000 0.960000 3.450000 ( 3.628836)
    # AFTER: 0.030000 0.000000 0.030000 ( 0.037329)

    puts fast
    # BEFORE: 0.030000 0.000000 0.030000 ( 0.026932)
    # AFTER: 0.030000 0.000000 0.030000 ( 0.025144)

  6. 1.8.7 is about as fast as 1.9.2. Rubinius is surprisingly slow. http://gist.github.com/650719

  7. > This is an excellent gain and IMHO worthwhile to alert the Ruby core team to consider optimization.

    No, var += sth resolves to var = var + sth, and operators like +, at least on String, should always create a new object (so as not to modify the original).

    @Rob Lowe: So that is not a great idea, except for fun :)

    You just have to know about the right method, and avoid unnecessary object creation, for example by mutating in place.

  8. Also, check out the MySQL LOAD DATA command. It's much faster than INSERT.
