Thursday, November 30, 2006

String#split, String#scan, and String#each_byte: which is faster?

irb(main):026:0> Benchmark.bm do |bm|
irb(main):027:1* bm.report("split:") { 10000.times do a = "1234567890".split('') end }
irb(main):028:1> bm.report(" scan:") { 10000.times do a = "1234567890".scan(/./) end }
irb(main):029:1> bm.report(" eb:") { 10000.times do "1234567890".each_byte { |by| (a ||= []) << by } end }
irb(main):030:1> end
        user      system    total     real
split:  0.320000  0.000000  0.320000  ( 0.321568)
scan:   0.200000  0.000000  0.200000  ( 0.210951)
eb:     0.260000  0.030000  0.290000  ( 0.345428)

I am surprised that scan was faster. Did you guess that? I wonder whether pre-compiling the regex will make it even faster. Below, rx holds the pre-compiled regex:
irb(main):033:0> Benchmark.bm do |bm|
irb(main):034:1* bm.report("split:") { 10000.times do a = "1234567890".split('') end }
irb(main):035:1> bm.report(" scan:") { 10000.times do a = "1234567890".scan(rx) end }
irb(main):036:1> bm.report(" eb:") { 10000.times do "1234567890".each_byte { |by| (a ||= []) << by } end }
irb(main):037:1> end
        user      system    total     real
split:  0.280000  0.010000  0.290000  ( 0.292449)
scan:   0.180000  0.000000  0.180000  ( 0.180988)
eb:     0.280000  0.050000  0.330000  ( 0.367461)

Interesting. I may need to dig into the Ruby core to see what makes the speed difference. I believe split and scan are implemented as native code. I can understand why each_byte is the slowest in real time, since it calls a Ruby block for every byte. More may follow as I dig in, just for fun and the learning value.
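
For anyone who wants to rerun the comparison outside of irb, here is a minimal sketch of the same benchmark as a script. The helper names (via_split, via_scan, via_each_byte) and the constants are my own, and RX = /./ is an assumption, since the session above does not show how rx was defined. The each_byte helper collects into an array created outside the block so that one array is built per string; note that it yields integer byte values rather than one-character strings, so the three results are not identical.

```ruby
require 'benchmark'

STR = "1234567890"
RX  = /./      # assumption: stands in for the rx used in the session
N   = 10_000   # same iteration count as the irb runs

# One-character strings via an empty separator.
def via_split(s)
  s.split('')
end

# One-character strings via a regex that matches any single character.
def via_scan(s, rx)
  s.scan(rx)
end

# Integer byte values, collected one block call at a time.
def via_each_byte(s)
  bytes = []
  s.each_byte { |b| bytes << b }
  bytes
end

Benchmark.bm(7) do |bm|
  bm.report("split:") { N.times { via_split(STR) } }
  bm.report("scan:")  { N.times { via_scan(STR, RX) } }
  bm.report("eb:")    { N.times { via_each_byte(STR) } }
end
```

Timings will differ by Ruby version and machine, so treat the numbers above as a snapshot rather than a rule.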

Monday, November 13, 2006

Strange how string splitting is the same in Ruby, Python, and Perl, but maybe not?

Recently I wanted to assign the individual characters of a string to local variables on the left-hand side of an assignment. The string is always the same size, so after validating the size I wanted a single statement that assigns each character to its own local variable. Python was the most recent language I had spent time in, so I was reminded of the following:
>>> s = "12"
>>> a, b = s
>>> a
'1'
>>> b
'2'
>>>

This was exactly how I accomplished it before, so how about Ruby? Let's try that in Ruby using irb:
irb(main):001:0> s = "12"
=> "12"
irb(main):002:0> a, b = s
=> ["12"]
irb(main):003:0> a
=> "12"
irb(main):004:0> b
=> nil
irb(main):005:0>

That is definitely not what I wanted. I suspected as much. How about Perl?
$ perl -e '$s="12";($a,$b)=split("", $s);print "$a\n";print "$b\n"'
1
2

That works, and it looks like it would be close to the same in Ruby. Let's try:
irb(main):001:0> s = "12"
=> "12"
irb(main):002:0> s.split("")
=> ["1", "2"]
irb(main):003:0> a, b = s.split("")
=> ["1", "2"]
irb(main):004:0> a
=> "1"
irb(main):005:0> b
=> "2"
irb(main):006:0>

Now I have accomplished my original goal in Ruby. Looking back, I see Python using an implicit action, which wins for brevity, not that this was a competition. Ruby is close to that, with an explicit action in an object-esque, message-receiver way. Perl is explicit too, but dare I say a little harder to follow, and certainly not in an object-esque syntax. It is amazing how close and how far apart the implementations are. As a final note, Python's split does not allow the empty string as a separator; look at the ipython session below:
In [1]: "12".split("")
---------------------------------------------------------------------------
exceptions.ValueError Traceback (most recent call last)
/home/johnnyp/
ValueError: empty separator
In [2]:

Again, because I had done a fair amount of Python most recently, I was surprised and expected the Python behaviour of a, b = "12" in Ruby. This all led to asking on the ruby-talk mailing list, because I thought I might be missing some hidden Ruby syntax. I initially used String#scan in my script and was quickly reminded of split by responses on the mailing list. Split can more than likely be found in your language of the day; however, be aware that it might misbehave ;-)
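
To pull the Ruby side together, here is a small sketch of the one-statement assignment using both split and scan, plus String#chars, which returns an array in current Ruby versions. The size check and the variable names are only for illustration.

```ruby
s = "12"

# Validate the size first, as described above.
raise ArgumentError, "expected 2 characters" unless s.size == 2

# split with an empty separator returns one string per character.
a, b = s.split("")

# scan with /./ returns every single-character match.
c, d = s.scan(/./)

# String#chars needs neither a separator nor a pattern.
e, f = s.chars

puts a, b
```

All three forms destructure the same two-element array, so which one to use is mostly a matter of taste.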

Thursday, November 09, 2006

Why Python? Actually why not python?

Eric Raymond (ESR) wrote "Why Python?", which is amazingly persistent in popularity on the linuxjournal.com website. I even asked ESR at a LinuxWorld what he thought about this fact, and he remarked that he was pleasantly surprised. I can remember reading it and being inspired to try Python. I used Python in a successful project and enjoyed it, mostly for its dynamic nature. So, this blog is called ruby-talk; where am I going with this, and what does it have to do with Ruby? Let's see what led me away from Python.

The repetitious "self" is often cited, but it actually did not bother me, nor did strict indentation. What really got me was the global keyword, Python's super syntax, and the overabundance of web development frameworks. The web development framework situation has really changed; I only mention it because it was an issue for me then and would not be today.

Looking at the global keyword, in a Python module you might have:

foo = "bar"
def zoo():
    global foo
    foo = "baz"

This code shows how zoo can rebind foo, that is, change it from "bar" to "baz" and have that change visible outside of zoo. I just could not get used to that, and while it is small, it continually annoyed me. Now, on to Python's super syntax, for example:

class Derived(base.Base):
    def __init__(self, x, y, z, **kw):
        # super() returns a bound proxy, so self is not passed again
        super(Derived, self).__init__(x, y, **kw)

Every time I looked at this, it was not evident within seconds what was happening, nor was it obvious even after studying it. I won't explain it here, as that is not my point or intention; if you need an explanation, then you are getting my point.

These may be small things, but they continually bothered me. I am thankful to "Why Python?" for getting me to look at Python, but what it really did was get me on the dynamic language page. This led me to look at another dynamic language, Ruby. Upon looking at Ruby, I began to see that it strikes a really nice balance between several influential languages. What bothered me in Python no longer bothered me in Ruby, and I also gained some of the things I missed in Perl. I am continually able to look at Ruby code, whether written by others or by me, and it meshes nicely with the way I think. What a nice surprise.
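
For comparison, here is a rough Ruby sketch of the same two situations. The names zoo, Base, and Derived mirror the Python snippets above; the attribute readers and the constructor bodies are assumptions added only so the example runs.

```ruby
# A Ruby global carries a $ sigil, so the method needs no
# separate declaration before rebinding it.
$foo = "bar"

def zoo
  $foo = "baz"
end

zoo

# Hypothetical base class standing in for base.Base.
class Base
  attr_reader :x, :y, :opts

  def initialize(x, y, **kw)
    @x = x
    @y = y
    @opts = kw
  end
end

class Derived < Base
  attr_reader :z

  def initialize(x, y, z, **kw)
    # super calls the parent's initialize; no class name or self needed.
    super(x, y, **kw)
    @z = z
  end
end

d = Derived.new(1, 2, 3, color: "red")
puts $foo
puts d.z
```

Whether this reads better is of course a matter of taste; it is simply the shape that clicked for me.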