April 10, 2020

A regex puzzle: match a, b, or a and b

For a project I'm parsing a textual representation of a time duration. I stumbled on an interesting regex problem while doing that.

Concretely, I'm parsing these indications on my Audible library page of how long I've listened to an audiobook. I want to extract the hours and/or minutes from this duration.

This was my first regex attempt: /(\d{1,2})h(?:\s?(\d{1,2})m)?/

Regex 101 link

It seemed to work well. But as the regex nerd might see, I soon encountered a bug when there was less than one hour left: the hours had disappeared, and my regex didn't account for that.

Regex 101 link

So how to solve this issue?

A first idea is to make the hour part and minute part optional with the ? operator:


This seems to work. And it does, given that the input string is always exactly in the format "8h 2m left".

Regex 101 unit test link

But one "issue": this regex isn't robust in the input it accepts.

If any characters are added before or after the duration, it will fail, because the regex also matches the empty string!

See this test, note the space in front of the 8m.

Regex 101 link

Why? Because the regex a?b? matches either a or b, but matching nothing is also OK! And for the quick thinkers: a|b would in this case not be a solution, because there are capturing groups in a and b.

I'm abstracting a as the hour-matcher and b as the minute matcher here.
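
The empty-match behaviour is easy to reproduce. Here is a minimal Python sketch of it. The exact optional regex is not shown above, so the pattern below is an assumed reconstruction of the a?b? idea, and the name OPTIONAL_DURATION is made up for illustration.

```python
import re

# Assumed reconstruction of the "a?b?" idea: both parts are optional.
# Group 1 = hours, group 2 = minutes.
OPTIONAL_DURATION = re.compile(r"(?:(\d{1,2})h\s?)?(?:(\d{1,2})m)?")

# Input that starts right at the duration: the groups are filled as hoped.
clean = OPTIONAL_DURATION.search("8h 2m left")
print(clean.groups())

# Input with a leading space: neither part matches at position 0, so the
# regex settles for the empty string there and stops looking.
padded = OPTIONAL_DURATION.search(" 8m left")
print(repr(padded.group(0)), padded.groups())
```

The second search returns a match object, not a failure, which is exactly what makes the bug easy to miss.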

That's the problem that intrigued me:

How can you match 'a', 'b', or 'a and b' while preserving the same single capturing groups in a and b?

I took this as a regex puzzle and came up with some solutions:

Solution 1: Brute force with code

The first solution is not really a regex solution: use the global flag on the previous regex, and then write code to scan for "valid" matches. Empty-string matches are discarded; only a filled match is kept.
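
A rough sketch of that filtering in Python, where re.finditer plays the role of the global flag. The pattern is again an assumed reconstruction of the optional regex, and first_filled_match is a hypothetical helper name.

```python
import re

# Assumed reconstruction of the optional regex (a?b?).
# Group 1 = hours, group 2 = minutes.
OPTIONAL_DURATION = re.compile(r"(?:(\d{1,2})h\s?)?(?:(\d{1,2})m)?")


def first_filled_match(text):
    """Return (hours, minutes) from the first non-empty match, or None."""
    for match in OPTIONAL_DURATION.finditer(text):
        if match.group(0):  # discard the empty-string matches
            hours, minutes = match.groups()
            return int(hours or 0), int(minutes or 0)
    return None


print(first_filled_match(" 8m left"))
print(first_filled_match("8h 2m left"))
```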

Works, but I'm looking for a regex-only solution.

Solution 2: Brute force regex

A straightforward regex solution is this: a|b|ab

But I didn't like this. Because I have a capturing group in both a and b, and by duplicating a and b into ab, that means I'd also duplicate the capturing groups.

Then we go from this simple scenario:

  • match group 1 contains hours
  • match group 2 contains minutes

To this more complicated one:

  • match group 1 contains hours if the string is an hour-only string like '8h left'
  • match group 2 contains minutes if the string is a minute-only string like '8m left'
  • match group 3 contains hours if the string is combined hour-minute string like '8h 3m left'
  • match group 4 contains minutes if the string is combined hour-minute string like '8h 3m left'

That means the code handling the regex match would need if/else logic to deal with these scenarios. A compromise you might take, but I'm looking for a "regex-only" solution.
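
To make that cost concrete, here is a sketch of what the handling code could look like in Python. The pattern is one possible spelling of a|b|ab, not the author's. The trailing " left" is an assumption of this sketch: it forces the engine to fall through to the combined branch when the hour-only branch cannot finish the match. parse_duplicated is a hypothetical name.

```python
import re

# Sketch of a|b|ab with the capturing groups duplicated.
DUPLICATED = re.compile(
    r"(?:(\d{1,2})h"                  # group 1: hours, hour-only string
    r"|(\d{1,2})m"                    # group 2: minutes, minute-only string
    r"|(\d{1,2})h\s?(\d{1,2})m)"      # groups 3 and 4: combined string
    r" left"                          # assumed suffix, see the note above
)


def parse_duplicated(text):
    """Return (hours, minutes), or None when nothing matches."""
    match = DUPLICATED.search(text)
    if match is None:
        return None
    h_only, m_only, h_both, m_both = match.groups()
    if h_both is not None:            # combined: groups 3 and 4
        return int(h_both), int(m_both)
    if h_only is not None:            # hour-only: group 1
        return int(h_only), 0
    return 0, int(m_only)             # minute-only: group 2


print(parse_duplicated("8h 3m left"))
```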

Solution 3: Explicitly not matching an empty string?

We could solve this if we could tell the regex that it can never match an empty string somehow.

I tried to look around for this briefly, but couldn't find anything that worked.

See this StackOverflow thread and let me know if you make it work :)

Solution 4: Only match when something good lies ahead

This is the solution that finally worked for me: a positive lookahead.

This piece of regex (?=\d{1,2}\s*(?:h|m)) in front of the rest tells the matcher to look ahead for one or two digits, optional whitespace, and an 'h' or 'm'. Only when it finds this directly ahead can it start the real matching with the capturing groups.

It can be abstracted as (?=a|b)a?b?, but my concrete implementation takes some shortcuts there.

This solution has the robustness against spaces and only 2 capturing groups. I added the global flag in this example to demonstrate what it does:

See this regex101 for the full regex.
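
For readers who prefer code to a link, here is a small Python sketch of the idea. The lookahead is the one quoted above. The optional body after it is an assumed reconstruction, not necessarily the regex behind the link, and find_durations is a hypothetical helper. re.finditer stands in for the global flag.

```python
import re

# Lookahead from the text, followed by an assumed optional body (a?b?).
# Group 1 = hours, group 2 = minutes.
GUARDED_DURATION = re.compile(
    r"(?=\d{1,2}\s*(?:h|m))"          # only start where digits + h/m follow
    r"(?:(\d{1,2})h\s?)?"             # optional hours
    r"(?:(\d{1,2})m)?"                # optional minutes
)


def find_durations(text):
    """Return the (hours, minutes) groups of every match in the text."""
    return [match.groups() for match in GUARDED_DURATION.finditer(text)]


print(find_durations(" 8m left"))
print(find_durations("8h 2m left"))
```

With the lookahead in place, positions that do not start a duration are rejected before the optional groups get a chance to match nothing.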

The regex does not support the notion of "days" or "seconds", or expressions like "less than one minute". I hope these will not appear on the Audible page :)

June 09, 2019

✅ Voc Enhancer's first feature request

A while ago, a new review came in for Vocabulary.com Enhancer, an extension I made that adds functionality like translations to the popular English word learning site vocabulary.com.

very useful! can you add the pronunciation button to the "list page"? thank you!

- Yao Xiao, May 19 2019, Chrome

Reason enough for me to jump back into development! I feel it's important to maintain the extension and help users where necessary, especially now that 150 people are using it daily. That's the only way it will stay relevant, grow and provide more value.

The pronunciation button is now there in v0.6, enjoy Yao! And being back in it, I tackled one more task.

Upping that UX ✨

For another feature of the extension, a collection of links to external services in a list of saved words, I was annoyed by the repeated icons to external services: they were bloating the page. The words should stay the main content of the page, and this was not the case.

The solution: piggyback

Now the icon row is collapsible and collapsed by default to avoid visual bloat. I had some fun (and trouble) implementing CSS animations in inline-blocks 🙃. Here's a little demo of this functionality (or disguised advertisement for Google Define, youglish, and the Urban Dictionary).

Some trade-offs & decisions I made while designing this change:

  • A horizontal collapse with a triangle icon ▸ is not common. They are mostly used to hide and show a vertical content container, like a review with spoilers. In this case, the three-dot "More" burger menu would have been the more standard design pattern. I still chose the horizontal collapse, just because it was fun to implement and to experiment with it.
  • The "link" 🔗 icon to the left is slightly messy. Omitting it looks better, but it does add necessary meaning to the open/hide triangle button.
  • To open the link configuration, the standard settings ⚙️ icon was used because it was available in the existing font symbol set of the site. Otherwise, an editing pen symbol would have been more fitting.

I still want to see how I can visually tune the menu so it looks more organized. If a proper UI designer is reading this, feel free to send help! Feedback is also welcome, always.

May 26, 2019

Sleep Visualization

Last week I spent some hours on visualizing my sleep pattern. It is based on 99 days of manually tracked data (yes!). This is a first part of a much larger data tracking project in progress. See a rendered full screen version here.

Stylistically I used a minimalistic color scheme that could've come straight from Hundred Rabbits¹. Obviously, this visualization still misses a legend or visual cues to give meaning to the elements. But maybe the air of mystery around my actual sleep/wake times is not too bad ;)

A little legend:

  • Each vertical white line represents a night, time goes from left to right.
  • The top red dots are wake-up times. The higher the earlier.
  • The bottom white dots are sleep times. The lower the later.
  • The dotted lines are stay-awake-in-bed times - somehow I found this data interesting - maybe because I want to reduce that time 🙃

Technically I had fun implementing a parser that could understand my ambiguous notes on sleep & wake times. I used an Airtable² spreadsheet with a row for every day and columns for sleep and wake times. The format is not consistent though. I could write "2:15" to mean a bed time at 2:15 AM the next day, or "11" to mean 11 PM that same day. Moment.js was helpful. For the visualization itself I used the JS vector interface library paper.js just to try it out. It's great for what it does - more advanced than the p5.js library I had used before. But I might still replace it with D3.js as the latter offers more tools for working with data dynamically, as I used in a similar project for visualizing the movies I've seen.
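
To give an idea of the kind of disambiguation involved, here is a small Python sketch. It is not the actual parser (that one is JavaScript with Moment.js), and the rule it uses is an assumption: hours below a made-up cutoff of 6 are read as after midnight on the next day, other hours below 12 as PM on the same day. The names parse_bed_time and AFTER_MIDNIGHT_BEFORE are hypothetical.

```python
from datetime import date, datetime, timedelta

# Hypothetical cutoff: hours below this are read as "after midnight".
AFTER_MIDNIGHT_BEFORE = 6


def parse_bed_time(note, day):
    """Turn a note like "2:15" or "11" into a datetime for the night of `day`."""
    hour_text, _, minute_text = note.strip().partition(":")
    hour = int(hour_text)
    minute = int(minute_text) if minute_text else 0
    if hour < AFTER_MIDNIGHT_BEFORE:
        # "2:15" -> 2:15 AM on the next calendar day
        day = day + timedelta(days=1)
    elif hour < 12:
        # "11" -> 11 PM on the same day
        hour += 12
    # hours from 12 upward are taken as 24-hour clock values (assumption)
    return datetime(day.year, day.month, day.day, hour, minute)


print(parse_bed_time("2:15", date(2019, 5, 20)))
print(parse_bed_time("11", date(2019, 5, 20)))
```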

You might already be able to spot that starting a 9 to 5 job had an effect on my sleep stability the last weeks. To be continued!

  1. Hundred Rabbits is a 2-person artistic collective making tools, games, recipes, and videos aboard a sailboat. Hundred Rabbits is cool, check them out.

  2. Airtable.com is spreadsheets on steroids. Google Sheets but better, with a nice auto-generated API. My Net Promoter Score is skyrocketing here. 

April 10, 2019

Disinformation documentaries

Today I watched an interesting documentary (in Dutch) from Flemish public television about Russian disinformation in the US and Europe. Among the cases it explores are Russian blaming of Belgian F-16 fighters for killing civilians in Syria and the defamation of a Finnish reporter that investigated Russian interference in Finland.

The techniques used to disinform are not new, however. While social media is the prevalent medium today, masses have been misled for centuries by parties that have an incentive to do so. It reminded me of Merchants of Doubt, a 2014 feature-length documentary that tells the stories of the tobacco industry fighting medical counterpressure with lies, and climate change deniers tainting scientific consensus with fake science.

The main takeaway is that sowing lies causes people to doubt truths spread in the mainstream media. Little false discourse is necessary to achieve a sufficient level of confusion about e.g. climate change. When people are confused, they stick to what they want to believe and ignore information that disconfirms their beliefs (see cognitive dissonance). That is the goal of the disinformers.

These documentaries remain relevant today with climate change denier Trump in office and the Russians doing their thing as we speak. They teach us to be wary of counter-information, wherever it may come from. Recommended to watch!

November 29, 2018

Selecting text where you can't

Sometimes you want to copy-paste text from a web page, but it won't work. Here are a few common reasons and workarounds.

1. The text is embedded in a link

The text might be contained by a link. If you click+drag to select text, you will move the link element! Sometimes it is not visible that the text you're trying to select is, in fact, a link.

Solution: hold the ALT key while selecting (Option key on Mac). This will allow you to select text in a clickable area like a button.

Try to select me!

2. Selecting is intentionally disabled on the site

The site developer probably does not want you to copy the text. This happens regularly on news websites. They include JavaScript code that captures all your clicks and selections and 'kills' these events. Do not try the workaround below on a site where you don't want to lose progress of some sort, because it requires a page reload.

Solution: block JavaScript on the site

  1. Install the μBlock Origin plugin (Firefox, Chrome, Safari, Microsoft Edge)
  2. Click on the μBlock icon to block scripts for the site
  3. Reload the page
  4. Copy/pasting is now possible.

You'll want to re-enable JavaScript after your copy-paste: a lot of sites depend on it today to function properly.

PS: μBlock Origin is a versatile, lightweight and open-source ad-blocker. It is probably better than other adblock plugins you might have installed.

May 01, 2018

Installing your Firefox add-on permanently

When you're developing a cross-browser add-on, you probably want to try it out for a while in your daily browser. Unfortunately, contrary to Chrome, if you temporarily load your extension in Firefox, it will be gone after a restart.

That's because Firefox needs to sign your add-on before you can install it anywhere.

The documentation explains so, but in a convoluted way. It's actually pretty simple:

  1. Register at FF's Developer Hub (top right) if you don't have a FF account yet.
  2. Go to the add-on submission page.
  3. Choose 'on your own'. This will immediately sign your add-on, but it won't be listed on the add-on site for distribution. Ideal for a test version.
  4. Upload your zipped add-on files.
  5. Sign, download the .xpi file & enjoy. It can be installed from about:addons → gear icon → Install Add-on from file.

February 23, 2018

Dear Professor,

Like most students, I made a resolution this semester to have a fresh start, to attend all classes and make sure I understand the main points in the lectures. I've been able to live up to that promise these two weeks, but I'm afraid you just made me break it.

The reason? It's not your stuttering, I can live with that. Your verbal skills and physical presence are decent too. The problem is, there is almost zero added value in coming to your class.

Almost every single word you say is projected in black and white next to you.

Then, what do you expect me to do when sitting there? Should I listen to you, or read the PowerPoint content? When I try to listen, the slides distract me. When I try to read, your talking distracts me. The people around me checking their 9GAG feeds on Facebook don't help either. I'd rather just browse the slides at home.

This is the prime example of PowerPoint illiteracy. Slides are meant to support your message, they should not overwhelm everyone with an exhaustive summary of the course content. Do you want to distribute your course notes? Awesome, just publish them online, thank you. They would form a terrific summary for later reference.

But please, do not project these slides. They're the reason I won't be in your class today.

With kind regards,

Thor Galle

PS: check out this TED talk. It might help you.

January 21, 2018

Technology extension opportunities in real life

It's striking how some insights from school can slip into daily life when you least expect them to. Today my sister showed off a Brother LW-20, an electronic typewriter. She had rescued it from our grandparents who wanted to discard it. It's a nice example of how a new technology (the computer) substitutes an old one (the typewriter), something I learned about recently.

When hearing "typewriter", I would think about the mechanical type with pounding metal letter sticks. Yet, this particular model was on the market somewhere in the 90's: a period when computers were growing fast. A computer had more possibilities for word processing than a typewriter: Microsoft Word was already available in the 80's.

And that's where the insight struck: according to a paper by Adner & Kapoor (2015), old technologies can still survive for some time while better technologies are available, given that there is an extension opportunity for the old technology. More specifically, some innovations for the new technology can be "spilled back" to the old one.

This seems to be the case for the Brother: a German Wikipedia article tells us it was marketed as an affordable, dedicated word processor. It had some "modern" features like a screen to edit text on and compatibility with floppy storage for documents: features that got spilled over from computers.

It's the closest thing to a computer my grandparents have ever owned, and at the same time, it represented the last gasp of the typewriter.

October 27, 2017

Whatever happened to Google Tasks

As a meek Google user, I've been sporting their Calendar for years. Years in which I have used a feature that is being cast into obscurity: Google Tasks. It once started as a fancy addition to Calendar. Now it seems to be present only for legacy purposes:

  • You have to activate it by activating a separate Tasks "calendar".
  • The tasks overview still has the calendar styling from 2009.
  • You can't display them at the same time as Reminders. You have to "switch". One of the most clumsy UX things I've seen from Google.

Then why not only use Reminders? There is no overview of Reminders on the web version. Now let this just be the most useful way to keep an eye on long-term deadlines. I will stick with the 2009 tasks, until the reminder overview jumps from the phone to the desktop. Or until the plug gets pulled, of course.

October 10, 2017

A week at Hackages

Last week I got the opportunity to join a TypeScript/Angular/Ionic training at Hackages in Brussels. They're a consulting and teaching company specialized in modern web technologies.

The training followed a learning-by-doing methodology. We were submerged in a continuous exercise session, of which a large part revolved around fixing common mistakes planted in example code. Sometimes important concepts were briefly explained & demoed, but mostly we tried to fulfill the requirements of an incomplete program, with few pointers on how to code these.

So, we were still scouring the web for documentation, like you would when learning a new language on your own. The difference with self-study is: 1) you can ask when you're really stuck for a while 2) you're doing pair programming.

This is generally a valuable learning approach. But I have my critiques:

  • The line between learning by fixing mistakes & being frustrated by annoying bugs is thin.
  • Maybe this is not the best way to learn "good practices": the internet or your intuition does not always point in the right direction.

August 23, 2017

Ideating the logs tab

Sometimes I want to share a thought that doesn't fit in a single tweet, nor deserves a full-blown blog article.

That's why I started this section of my site, where I'll log these thoughts or whatever they might be. You'll be able to scroll right through, every log should fit on a large screen in its entirety.