For the love of parsing

by Laura Betz

We look at a project and see the many things that need to be accomplished. One of the things I rather enjoy figuring is how to parse text. With data conformity being enforced during data entry there should not be as much to parse out. However, sometimes we acquire information from other avenues such as API calls or HTML that has a “text blob” which needs sorted out.

There are different methods to parse data, and it helps that the data has a considerable amount of consistency. One of the more popular ways is regular expressions which the setup is to make a pattern (or a rule) of how to find a certain piece of text. Without plugins there isn’t an easy way to use regular expressions, but parsing is still possible. In FileMaker, using calculation functions like PatternCount or Substitute make it possible to search for specific text strings. Then after finding the text, it can be processed and formatted into the type of data needed.

Below, I have an example of a calculation that parses out the amount of time entered and then changes the data into a uniform format. Time can be entered and formatted differently, which both examples of time format and parsing are illustrated in another one of my past blogs Just In Time. Since the hour comes before the minute, when extracting the numbers, it can be assumed that the first number in the text is the hour and the second is the minute. In the event that there is only one number entered and unable to tell whether the number is an hour or minute, then there is likely a time notation that can be referred. With a somewhat consistent pattern the certain pieces of information have been extracted and put into a similar format.

In combination of the FileMaker platform and parsing data can be better managed and more useful. Instead of having to search through blobs of text, they can now be properly shown in a way that is easier to read. Also, since the data would be more uniformed and segmented, it is easier for those to be used elsewhere in the system, and therefore accomplishing more in the system.

 


NOTE: PatternCount

Checks for patterns of strings in the input. This is used below to check the time notation for whether it includes hours and/or minutes (or neither).

 

NOTE: Substitute

Listed to be possibly formatted as:

Substitute ( text ; [search1; replace1]; [search2; replace2]; … [searchN; replaceN] )

The part in italics is optional and can be repeated up to 999 times.

Also, the order of the pairs is able to affect the output.

This function is used below to remove the time notation from the input so that only the numbers remain.

 

EXAMPLE

Input Output
1 Hour 30 Minutes 1:30:00
3HR45MIN 3:45:00
2:30 2:30:00

Let (
[
   ttime = TABLE::TimeElapsed ;
   hrBool = Case (
      PatternCount ( ttime ; "hour" ) or PatternCount ( ttime ; "hr" ) ) or PatternCount ( ttime ; "hs" ) ;
1 ; 0 ) ;
   minBool = Case (
      PatternCount ( ttime ; "minute" ) or PatternCount ( ttime ; "min" ) ;
1 ; 0 ) ;
   numTime = GetAsNumber ( ttime ) ;
   tformat = GetAsTime ( ttime ) ;
   tnum =  Substitute ( Upper ( ttime ) ; [ " " ; "" ] ;
       [ "HS" ; "¶" ] ; [ "HRS." ; "¶" ] ; [ "HRS" ; "¶" ] ; [ "HR" ; "¶" ] ;
       [ "HOURS" ; "¶" ] ; [ "HOUR" ; "¶" ] ;
       [ "MINUTES" ; "¶" ] ; [ "MINUTE" ; "¶" ] ;
       [ "MIN." ; "¶" ] ; [ "MINS" ; "¶" ] ; [ "MIN" ; "¶" ]
     ) ;
   tnumMin = GetValue ( tnum ; 1 ) * 60  + GetValue ( tnum ; 2 ) ;
   hrMin = Case (
      ValueCount ( tnum ) > 1 ; "both" ;
   hrBool = 1 ; numTime * 60 ;
   minBool = 1 ; numTime ;
    0 ) ; 
timeFormat =
      Case ( tformat ≠ "?" ; tformat ;
         PatternCount ( ttime ; "n/a" ) or PatternCount ( ttime ; "na" ) ; "N/A" ;
         hrMin = "both" ; GetAsTime ( tnumMin * 60 ) ;
         GetAsTime ( hrMin * 60 )
      ) ;
result = timeFormat
] ;
   result
)