Tuesday, November 29, 2011

Selecting random table row

Say we have a table of distributors. Each row contains a link to the distributor's details that matches:
*flow=distributors&transition=view_details*

To count such rows, one would normally either:

  1. use SET EVAL on the HTML source extracted with EXTRACT=HTM. This fails if the HTML contains quotation marks. Which it does.
  2. use the SEARCH command with a REGEXP match, but global matching does not work yet.

Oh dear, nothing really works. Let's be clever and exploit the fact that whenever we EXTRACT, the result is appended to the !EXTRACT variable. Extraction results are separated by the string [EXTRACT]. If an extraction fails, #EANF# is added instead.

So if we extract 3 links on a page and only 1 is real, we get:
*flow=distributors&transition=view_details*[EXTRACT]#EANF#[EXTRACT]#EANF#
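To make the format concrete, here is a minimal Python sketch that runs outside iMacros and simply mimics the counting on a made-up !EXTRACT value. The sample link and the function name count_found are assumptions for illustration only.

```python
SEPARATOR = "[EXTRACT]"  # iMacros puts this between extraction results
NOT_FOUND = "#EANF#"     # placeholder added when an extraction fails


def count_found(extract_value):
    """Count the extraction results that are not the #EANF# placeholder."""
    parts = extract_value.split(SEPARATOR)
    return sum(1 for part in parts if part != NOT_FOUND)


# Hypothetical !EXTRACT content: three attempts, only the first one matched.
sample = SEPARATOR.join([
    "index.php?flow=distributors&transition=view_details&id=7",
    NOT_FOUND,
    NOT_FOUND,
])

print(count_found(sample))
```

Every entry that is not the placeholder counts as a real row, so the number of matching rows falls out of a single split.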

Now we can use SET/EVAL on this string:

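js = """
/* iMacros substitutes {{!EXTRACT}} before the script is evaluated */
var la1 = "{{!EXTRACT}}";
la1 = la1.split('[EXTRACT]');
var x = 0;
for (var i = 0; i < la1.length; i++) {
   /* count only the extractions that actually matched */
   if (la1[i] != "#EANF#") {
      x++;
   }
}
/* the value of the last expression is the result of EVAL */
x;
"""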

set_eval('!VAR1', js)
Please refer to the toolbox for the set_eval function.
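If the toolbox is not at hand, the following is only a guess at what such a helper might do, not the actual toolbox code. It assumes that the helper returns a single SET ... EVAL("...") macro line, that the script has to fit on one line, and that backslashes and double quotes inside it need escaping.

```python
def set_eval(variable, js_source):
    """Hypothetical stand-in for the toolbox helper: build one SET ... EVAL line."""
    # Collapse the script onto one line (assumes statements end with ';'
    # and that the script contains no '//' line comments).
    one_line = " ".join(
        line.strip() for line in js_source.splitlines() if line.strip()
    )
    # Escape backslashes first, then double quotes, so the script can be
    # wrapped in EVAL("...").
    escaped = one_line.replace("\\", "\\\\").replace('"', '\\"')
    return 'SET {} EVAL("{}")'.format(variable, escaped)


# With the row count in !VAR1, a second EVAL could pick a random 1-based row.
pick_js = """
Math.floor(Math.random() * {{!VAR1}}) + 1;
"""
print(set_eval('!VAR2', pick_js))
```

The second call shows where the title comes in: once the number of real rows is known, a random position between 1 and that number selects the row to click.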