Sunday, April 8, 2007

Using Regexp

I got a task to parse a file consisting of SQL commands mixed with various noise lines: empty lines, comments and echo statements. I decided to use Ruby's String#scan to extract the SQL commands from the file. The only problem was how to write a correct regexp. I started with a test:
entire=<<-EOF
create table A(id number)
/
create table B(id number)
/
EOF

entire.scan(/create table.*\//) #=>[]
I passed String#scan a regexp describing a string that starts with 'create table', is followed by any number of characters and ends with '/'. But something was wrong. Instead of
=> ["create table A(id number)\n/", "create table B(id number)\n/"]
I got an empty array
=>[]
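I looked at the documentation and tried different alternatives without success. Then I picked up the book Mastering Regular Expressions and read the chapter. The cause became clear immediately: by default '.' does not match a newline, and the closing '/' sits on the line after 'create table ...', so the pattern can never reach it. Instead of using '.' for any character, I should use [^\/], which means any character other than '/', newlines included. The correct regexp is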
entire.scan(/create table[^\/]*\//) 
#=> ["create table A(id number)\n/", "create table B(id number)\n/"]
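To see why the first attempt fails and the negated class works, here is a minimal sketch that runs several variants of the pattern against the same input. The constant names SAMPLE and PATTERNS are only illustrative. It also shows that adding the 'm' option to the original '.*' pattern is not enough on its own, because the greedy '.*' then runs to the last '/'.

```ruby
# Same content as the heredoc in the test above.
SAMPLE = "create table A(id number)\n/\ncreate table B(id number)\n/\n"

PATTERNS = {
  # '.' stops at the newline, so the '/' on the next line is never reached.
  dot: /create table.*\//,
  # With /m, '.' also matches newlines, but greedy '.*' swallows both statements.
  dot_multiline: /create table.*\//m,
  # A lazy '.*?' stops at the first '/', giving one match per statement.
  lazy_multiline: /create table.*?\//m,
  # The negated class matches newlines without any option.
  negated_class: /create table[^\/]*\//
}

PATTERNS.each do |name, regexp|
  puts "#{name}: #{SAMPLE.scan(regexp).inspect}"
end
```

Only the last two variants return one element per statement; the negated class is the simpler of the two because it does not depend on any regexp option.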
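To make the match case-insensitive, I add the 'i' option. I also add 'm', which in Ruby makes '.' match newlines; it changes nothing for this pattern, since [^\/] already matches newlines, but it does no harm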
entire.scan(/create table[^\/]*\//im) 
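As a rough sketch of how this could look on a file with noise lines, the example below wraps the scan in a helper and feeds it a made-up input. The method name extract_create_statements and the sample content are assumptions for illustration, not the real file I had to parse.

```ruby
# Hypothetical helper: returns every 'create table ... /' block in the text.
# The 'm' option is left out because the pattern contains no '.'.
def extract_create_statements(text)
  text.scan(/create table[^\/]*\//i)
end

# Made-up input with a comment, an echo line, an empty line and mixed case.
noisy = <<~SQL
  -- schema setup
  echo creating tables

  CREATE TABLE A(id number)
  /
  Create Table B(
    id number
  )
  /
SQL

extract_create_statements(noisy).each do |statement|
  puts statement
  puts "----"
end
```

The sketch relies on '/' never appearing inside a statement. A '/' in a column expression or in a /* ... */ comment would end the match early, so a real parser would need a stricter rule, for example requiring the '/' to stand alone on its line.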
