Detect time in text with Regexes

For Clye, I want people to be able to send date suggestions and to add time-based notifications. If someone types something like "tomorrow 6PM", it should be detected as a time reference. So what are the options? I could use an existing, working parser, but unfortunately I could not find one that detects all such references within a longer text. That's why I built my own.

When you think of text detection, regular expressions naturally come to mind, and that is exactly what I used. But first I defined an interface: a string goes in, and an ISO 8601 formatted string comes out if the input contains a valid time (null otherwise). Since times can also be relative ("tomorrow") and I want the function to be easily testable, it also takes the current time as a parameter.

function parseNaturalTime(now: Date, inp: string): string | null

Simple enough. I am German and the parser should handle German input, so please excuse that the logic is specific to the German language; a version for other languages would not look very different.

I started by specifying some test cases (I use Jest, but Mocha or similar would work just as well):

import { anyRegex, parseNaturalTime } from "./date";
import { parseISO } from "date-fns";

const now = parseISO("2021-05-05T12:00");

function check(inp: string, value: string, n = now) {
   test(`"${inp}" relative to ${n.toISOString()}`, () => {
      expect(parseNaturalTime(n, inp)).toEqual(value);
   });
}

check("06.11.1996", "1996-11-06");
check("6.11.96", "1996-11-06");
check("1.5.1996", "1996-05-01");

check("morgen", "2021-05-06");
check("Morgen", "2021-05-06");
check("heute", "2021-05-05");

check("5:00", "2021-05-05T05:00");

check("morgen 15:00", "2021-05-06T15:00");

check("morgen um 15:00", "2021-05-06T15:00");

check("montag 8:00", "2021-05-10T08:00");
check("dienstag 8:00", "2021-05-11T08:00");
check("mittwoch 8:00", "2021-05-12T08:00");
check("donnerstag 8:00", "2021-05-06T08:00");
check("freitag 8:00", "2021-05-07T08:00");
check("samstag 8:00", "2021-05-08T08:00");
check("sonntag 8:00", "2021-05-09T08:00");

Good, now we can start implementing the logic.

There are a few ways to go from here. I could, for example, use a parser generator. While I have used those before and really liked Chevrotain, it is probably a bit overkill and too heavy for what I am trying to do here, so I opted for regular expressions instead. Yes, they can get very complicated and hard to read, but with the help of named capturing groups and template strings, working with them is actually relatively pleasant. Without further ado, this is what I came up with:

// match a numeric date like 06.11.1996 or 6.11.96 (escaped dots, so date.split(".") below always works)
const dateRegex = /([0-3]?[0-9])\.([0-1]?[0-9])\.([0-9]{4}|[0-9]{2})/;
// match day names or relative day modifiers ("morgen" = tomorrow, "gestern" = yesterday, ...)
const dayRegex = /(morgen|heute|gestern|vorgestern|übermorgen|montag|dienstag|mittwoch|donnerstag|freitag|samstag|sonntag)/i;
// match a time like 8:00 or 15:00
const timeRegex = /([0-2]?[0-9]):([0-5][0-9])/;
// a date or day name, optionally followed by a time (with the optional keyword "um", German for "at"), or a bare time
export const anyRegex = new RegExp(
   `(((?<date>${dateRegex.source})|(?<day>${dayRegex.source}))(\\s+(?:um\\s+)?(?<time>${timeRegex.source}))?)|(?<time2>${timeRegex.source})`,
   "i"
);
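To see what the extraction code gets to work with, here is a small sketch that repeats the three building blocks (dots escaped, "vorgestern" spelled out) so it runs on its own, plus a hypothetical helper namedParts that keeps only the named groups that took part in a match. The comments spell out what the template-string composition does with .source and flags; the remark about time2 is my reading of why that group has its own name, not something stated above.

```typescript
// The building blocks from above (dots escaped, "vorgestern" spelled out).
const dateRegex = /([0-3]?[0-9])\.([0-1]?[0-9])\.([0-9]{4}|[0-9]{2})/;
const dayRegex =
   /(morgen|heute|gestern|vorgestern|übermorgen|montag|dienstag|mittwoch|donnerstag|freitag|samstag|sonntag)/i;
const timeRegex = /([0-2]?[0-9]):([0-5][0-9])/;

// .source is the pattern text without slashes and without flags, so the "i" on
// dayRegex is dropped here; only the "i" passed to the constructor applies.
// Inside a template literal "\\s" becomes the two characters \s of the pattern.
// Many engines reject a group name used twice in one pattern, which is
// presumably why the bare-time branch uses the separate name time2.
const anyRegex = new RegExp(
   `(((?<date>${dateRegex.source})|(?<day>${dayRegex.source}))(\\s+(?:um\\s+)?(?<time>${timeRegex.source}))?)|(?<time2>${timeRegex.source})`,
   "i"
);

// Hypothetical helper: match.groups lists every named group, and the ones that
// did not participate in the match are undefined. Keep only the real values.
export function namedParts(input: string): Record<string, string> {
   const groups = input.match(anyRegex)?.groups ?? {};
   const parts: Record<string, string> = {};
   for (const [name, value] of Object.entries(groups)) {
      if (value !== undefined) parts[name] = value;
   }
   return parts;
}
```

For example, namedParts("morgen um 15:00") returns { day: "morgen", time: "15:00" }, namedParts("5:00") returns { time2: "5:00" }, and namedParts("6.11.96") returns { date: "6.11.96" }. No index counting is involved, even though the embedded patterns bring their own unnamed groups along.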

The combined anyRegex is not the prettiest, but it is actually not too complicated and relatively easy to use. One of the most annoying things with regular expressions is getting the group offsets right to extract the correct values. Named capturing groups solve this, so even the extraction code is quite easy.

export function parseNaturalTime(now: Date, inp: string): string | null {
   const match = inp.match(anyRegex);
   if (!match?.groups) return null;
   const { date, day, time, time2 } = match.groups;
   let res = "";
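   if (date) {
      // explicit date "d.m.y"; two-digit years above 30 become 19xx, the rest 20xx
      const [d, m, y] = date.split(".");
      let year = +y;
      if (year < 100) year = year > 30 ? year + 1900 : year + 2000;
      res = printDate(year, +m, +d);
   } else if (day) {
      const dow = now.getDay(); // 0 = Sunday ... 6 = Saturday
      let dayOffset = 0; // "heute" (today) keeps the offset at 0
      // days until the next occurrence of weekday v; today's weekday means one week ahead
      const c = (v: number) => {
         const r = (v + 7 - dow) % 7;
         return r === 0 ? 7 : r;
      };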
      switch (day.toLowerCase()) {
         case "morgen":
            dayOffset += 1;
            break;
         case "gestern":
            dayOffset -= 1;
            break;
         case "vorgestern":
            dayOffset -= 2;
            break;
         case "übermorgen":
            dayOffset += 2;
            break;
         case "montag":
            dayOffset = c(1);
            break;
         case "dienstag":
            dayOffset = c(2);
            break;
         case "mittwoch":
            dayOffset = c(3);
            break;
         case "donnerstag":
            dayOffset = c(4);
            break;
         case "freitag":
            dayOffset = c(5);
            break;
         case "samstag":
            dayOffset = c(6);
            break;
         case "sonntag":
            dayOffset = c(0);
            break;
      }
      const nd = new Date(now);
      nd.setDate(nd.getDate() + dayOffset);
      res = printDate(nd.getFullYear(), nd.getMonth() + 1, nd.getDate());
   } else {
      res = printDate(now.getFullYear(), now.getMonth() + 1, now.getDate());
   }

   const t = time ?? time2;
   if (t)
      res +=
         "T" +
         t
            .split(":")
            .map((v) => v.padStart(2, "0"))
            .join(":");

   return res;
}

function printDate(year: number, month: number, day: number) {
   const y = year.toString().padStart(4, "0");
   const m = month.toString().padStart(2, "0");
   const d = day.toString().padStart(2, "0");
   return `${y}-${m}-${d}`;
}
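All the weekday cases reduce to the same modular arithmetic in c(). As a sketch of a more compact alternative to the switch, the day words can be looked up in tables instead. The names WEEKDAYS, RELATIVE, daysUntil and dayOffset are hypothetical and not part of the code above:

```typescript
// Hypothetical table-driven replacement for the switch in parseNaturalTime.
// Weekday numbers follow Date.prototype.getDay(): 0 = Sunday ... 6 = Saturday.
const WEEKDAYS = new Map<string, number>([
   ["sonntag", 0], ["montag", 1], ["dienstag", 2], ["mittwoch", 3],
   ["donnerstag", 4], ["freitag", 5], ["samstag", 6],
]);

// Relative words map to a fixed offset in days.
const RELATIVE = new Map<string, number>([
   ["vorgestern", -2], ["gestern", -1], ["heute", 0], ["morgen", 1], ["übermorgen", 2],
]);

// Same arithmetic as c(): adding 7 keeps the left operand of % non-negative,
// and a result of 0 (today's weekday) becomes 7, i.e. the same weekday next week.
export function daysUntil(dow: number, target: number): number {
   const r = (target + 7 - dow) % 7;
   return r === 0 ? 7 : r;
}

// Offset in days for a matched day word, or undefined for an unknown word.
export function dayOffset(word: string, dow: number): number | undefined {
   const key = word.toLowerCase();
   const relative = RELATIVE.get(key);
   if (relative !== undefined) return relative;
   const target = WEEKDAYS.get(key);
   return target === undefined ? undefined : daysUntil(dow, target);
}
```

With now on Wednesday 2021-05-05 (getDay() returns 3), dayOffset("montag", 3) is 5, which lands on 2021-05-10 just like the test case. Using a Map rather than a plain object avoids accidental hits on inherited keys such as "constructor".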

If you read closely, there is one last thing to do: I wanted to find all occurrences in a string. Given the code above, that is easy too. Running a global copy of anyRegex over the text finds every occurrence (anyRegex itself only has the i flag, so on its own it stops at the first match). Then I can call parseNaturalTime on each match and voilà, there are the dates.
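A minimal sketch of that last step, assuming it is fed anyRegex and a parse callback that wraps parseNaturalTime. The names findAllTimes and TimeMatch are hypothetical. It uses an exec() loop on a global copy of the regex, so the exported anyRegex keeps its original flags:

```typescript
// One hit: the matched text, where it starts, and what the parse callback made of it.
export interface TimeMatch {
   text: string;
   index: number;
   value: string | null;
}

// Runs `re` over the whole input and parses every hit.
// exec() only walks through the text when the "g" flag is set, so a global
// copy is built and the caller's regex is left untouched.
export function findAllTimes(
   re: RegExp,
   input: string,
   parse: (hit: string) => string | null
): TimeMatch[] {
   const flags = re.flags.includes("g") ? re.flags : re.flags + "g";
   const globalRe = new RegExp(re.source, flags);
   const hits: TimeMatch[] = [];
   let m: RegExpExecArray | null;
   while ((m = globalRe.exec(input)) !== null) {
      hits.push({ text: m[0], index: m.index, value: parse(m[0]) });
      // Guard against an endless loop if the pattern can match the empty string.
      if (m[0] === "") globalRe.lastIndex++;
   }
   return hits;
}
```

Usage would look like findAllTimes(anyRegex, text, (hit) => parseNaturalTime(new Date(), hit)). Since parseNaturalTime runs anyRegex again on the matched substring, each hit is parsed exactly as if it had been typed on its own.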

Of course this is far from a complete, let alone multilingual, solution, but it is a good starting point. I hope you learned something or got inspired to use these awesome regex features ;-).
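One concrete gap, as an example: the date branch turns "31.02.2021" into "2021-02-31", a day that does not exist. A hypothetical check like the following could run before printDate (the name isRealDate is mine, not from the code above):

```typescript
// Hypothetical helper: true only if year-month-day names a real calendar day.
// Date silently rolls over (day 31 of February becomes early March), so the
// parts are compared again after construction. setFullYear is used instead of
// the Date constructor because the constructor maps years 0-99 to 1900-1999.
export function isRealDate(year: number, month: number, day: number): boolean {
   if (![year, month, day].every(Number.isInteger)) return false;
   const dt = new Date(2000, 0, 1, 12); // any valid local date; noon avoids DST edge cases
   dt.setFullYear(year, month - 1, day);
   return (
      dt.getFullYear() === year &&
      dt.getMonth() === month - 1 &&
      dt.getDate() === day
   );
}
```

In the date branch this could be used as if (!isRealDate(year, +m, +d)) return null; before calling printDate.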