I’m going to take a break from all my recent QGIS posts to talk about some MapBasic news… I’m proud to announce the release of MbRegEx, an open-source library for using regular expressions in MapBasic scripts!
If you’re not familiar with regular expressions, they’re an extremely powerful tool for string manipulation. They can be somewhat daunting at first, but with a bit of practice they’ll open up all kinds of string processing which would otherwise be extremely convoluted or impossible.
Up until now there’s been no way of using the beauty of regular expressions within MapInfo. Now, with MbRegEx, all their mighty power can be fully utilised within your MapBasic scripts!
Included Functions
The library contains 5 functions for use in MapBasic scripts:
RegExTest(string, regex): returns true or false depending on whether string contains the specified regular expression. For example, RegExTest( “ABC123”, “[A-Z]{3}\d{3}” ) returns true, RegExTest( “AB12”, “[A-Z]{3}\d{3}” ) returns false. This function really comes in handy when selecting records, eg:
SELECT * FROM address WHERE RegExTest( street, "MAIN (ST|RD)( [NESW])?$")
to select all addresses where the street is either “MAIN ST” or “MAIN RD” and which may have an optional N/E/S/W suffix.
RegExMatch(string, regex): returns the first part of string which matches the specified regular expression. Great for quickly extracting specific parts of a string. Let’s say we’ve got an address field, and we’d like to grab just the street number from it. We could use the regular expression “^(\d+[A-Z]*)”, which will match any leading numbers with optional letter suffixes. So RegExMatch( “12A Main St”, “^(\d+[A-Z]*)” ) returns “12A”, whereas RegExMatch( “Upper Main St”, “^(\d+[A-Z]*)”) will return an empty string. Trying to achieve this same task using built-in MapBasic functions alone would be extremely tedious and error-prone!
RegExMatchAll(string, regex, array) and RegExMatchMultiple(string, regex, array) will fill array with all the parts of string which pass the specified regular expression. RegExMatchAll is used when you have just one capturing group in your regular expression, and RegExMatchMultiple is used for multiple capturing groups.
This is probably best demonstrated with some examples:
RegExMatchAll( "12A Main St", "\b(\w+?)\b", sMatches )
will fill the sMatches array with all the individual words from “12A Main St”. So sMatches(1) = “12A”, sMatches(2) = “Main”, etc.
RegExMatchMultiple( "10.5.2", "(\d+)\.(\d+)\.(\d+)", sMatches )
fills sMatches with the results of the three capturing groups (the bracketed parts) of the regular expression. Thus sMatches(1) = “10”, sMatches(2) = “5” and sMatches(3) = “2”.
Again, these two functions are great for splitting a string into its component parts or doing something fancy like extracting all the phone numbers from a paragraph of text.
Lastly, RegExReplace(string, regex, replacement, destination) replaces the parts of string which match the regular expression with replacement, and stores the result in destination. One handy use for this is removing invalid characters from a string, such as
RegExReplace("T/:e@st!i~n&g#1,^2}3", "[^A-Za-z0-9]", "_", sDest)
This results in sDest = “T__e_st_i_n_g_1__2_3”.
Getting Started
There’s a few extra examples and full instructions for using these functions in the “test.mb” file included with MbRegEx. You can use these as a starting point for your own MapBasic scripts. If regular expressions are new to you, I recommend regular-expressions.info as a great resource together with an online regex visualiser such as debuggex.
Also, a quick warning – MbRegEx does no validation or testing of your regular expressions, so be careful with badly formatted expressions as they’ll cause MapInfo to crash. If in doubt, run the expression through an online tool to validate it first.
Downloads
The full source code is available on GitHub, under a public domain license. If you’re looking for the easiest way to get started, just download this zip which contains both the MbRegEx DLL and a sample MapBasic program demonstrating all the different regular expression functions which are available in the library.
End note – building the MbRegEx DLL file
I’ve included a CodeBlocks project file with the source, but building the MbRegEx.dll file can be difficult – you’ll need to first compile the boost libraries using mingw on your system. Then, make sure that in the CodeBlocks project build options, under Linker settings you have your “stage\lib\libboost_regex-mgw46-1_52.a” (or similar) correctly linked. Also, under Search directories add your boost folder to the Compiler tab.