Thursday, November 30, 2006

String#split, String#scan, and String#each_byte: which is faster?

irb(main):026:0> Benchmark.bm do |bm|
irb(main):027:1* bm.report("split:") { 10000.times do a = "1234567890".split('') end }
irb(main):028:1> bm.report(" scan:") { 10000.times do a = "1234567890".scan(/./) end }
irb(main):029:1> bm.report(" eb:") { 10000.times do "1234567890".each_byte { |by| (a ||= []) << by } end }
irb(main):030:1> end
            user     system      total        real
split:  0.320000   0.000000   0.320000 (  0.321568)
 scan:  0.200000   0.000000   0.200000 (  0.210951)
 eb:    0.260000   0.030000   0.290000 (  0.345428)

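I am surprised that scan was the fastest. Did you guess that? I wonder whether pre-compiling the regex will make it even faster. (In the run below, rx is the pre-compiled regex; its definition is not shown in the transcript.)
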
irb(main):033:0> Benchmark.bm do |bm|
irb(main):034:1* bm.report("split:") { 10000.times do a = "1234567890".split('') end }
irb(main):035:1> bm.report(" scan:") { 10000.times do a = "1234567890".scan(rx) end }
irb(main):036:1> bm.report(" eb:") { 10000.times do "1234567890".each_byte { |by| (a ||= []) << by } end }
irb(main):037:1> end
            user     system      total        real
split:  0.280000   0.010000   0.290000 (  0.292449)
 scan:  0.180000   0.000000   0.180000 (  0.180988)
 eb:    0.280000   0.050000   0.330000 (  0.367461)
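
A possible reason the pre-compiled run changes so little: Ruby builds a regex literal that has no interpolation once, when the code is parsed, not on every pass through the loop. The sketch below checks this. It assumes rx was defined as the same /./ pattern used in the first run, which the transcript does not show.

```ruby
# Assumption: rx holds the same pattern as the literal in the first benchmark.
rx = /./

# Evaluate the same literal three times and collect the object ids.
# If the literal were recompiled on each pass, the ids would differ.
literal_ids = Array.new(3) { /./.object_id }
same_literal = literal_ids.uniq.size == 1

# Both forms should split the string into the same one-character strings.
same_result = "1234567890".scan(rx) == "1234567890".scan(/./)

puts "literal reused across evaluations: #{same_literal}"
puts "scan(rx) matches scan(/./): #{same_result}"
```

If the literal is reused, the gap between the two scan timings above is more likely run-to-run noise than a real gain from pre-compiling.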

Interesting. I may need to dig into the Ruby core to see what makes the speed difference. I believe split and scan are implemented as native code. I can understand why each_byte is the slowest, given the extra work done in the block for every byte. More may follow as I dig in, just for fun and for the learning value.
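
One thing to watch in the transcripts above: unless a was already defined earlier in the irb session, it is local to the each_byte block, so (a ||= []) << by builds a fresh one-element array for every byte instead of one ten-element array. each_byte also yields integers, not one-character strings, so the three reports are not doing quite the same work. Here is a minimal sketch of a fairer comparison as a standalone script. The helper names are made up for illustration, and Benchmark.bmbm is swapped in for Benchmark.bm to add a rehearsal pass.

```ruby
require 'benchmark'

# Hypothetical helpers, one per approach; each returns a full array.
def chars_via_split(str)
  str.split('')
end

def chars_via_scan(str)
  str.scan(/./)
end

def bytes_via_each_byte(str)
  bytes = []                        # declared outside the block so it accumulates
  str.each_byte { |b| bytes << b }  # b is an Integer byte value, not a String
  bytes
end

str = "1234567890"
n   = 10_000

# bmbm runs a rehearsal first, then the measured pass.
Benchmark.bmbm do |bm|
  bm.report("split:")     { n.times { chars_via_split(str) } }
  bm.report("scan:")      { n.times { chars_via_scan(str) } }
  bm.report("each_byte:") { n.times { bytes_via_each_byte(str) } }
end
```

No timings are listed here on purpose, since they depend on the Ruby version and the machine.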

2 Comments:

Developer said...

That is really cool. Thanks for the article!

2:57 AM

Anonymous said...

This is backwards. Each byte is fastest. Each character next.

9:59 AM  
