Regex in Unqork
Overview
When working with end-user input fields, you'll want to control what's entered into them. One way to do that is by using Regex. Regex, short for regular expression, is a set of rules that describes the pattern a string must follow as it's entered into a field.
Regex is a concept that exists outside of Unqork, but you can use it in your Unqork application in a few ways.
What You'll Learn
After completing this
What is Regex?
Regex is a character sequence that describes what pattern a string must follow. You've probably encountered Regex before without even realizing it. One of the most common places you find Regex is signing up for a website. You're asked to enter a password with specific characters. If you enter a password that doesn't meet the specified criteria, you're given an error. This is all possible thanks to Regex. You can even use Regex to ensure that an email address is formatted like an actual email address.
Regex Syntax
Regex can be confusing when you first look at it. You might even think someone ran their hands across the keyboard. But like everything you learned until now, Regex has a syntax you can follow. This syntax keeps your Regex effective and consistent. Here's a look at a Regex used to check for a valid email address:
/^([a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6})*$/
Let's break it down.
We know that email addresses always have a couple of things in common. The most famous is the @ sign. You'll find this around the middle of the whole address. So, we know the Regex is looking for a pattern something like _________@____ in the email address string. Now, look at that sample Regex again and see if you can spot where this comes in:
/^([a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6})*$/
To be more specific, you'll notice the @ sign separates two sets of square brackets. And there's also a third set of square brackets towards the end of the expression, this one set off by a period. Let's simplify this expression:
[ ] @ [ ] . [ ]
Things are starting to resemble an email address. Let's keep going, this time focusing on that first set of square brackets and what it contains:
[a-zA-Z0-9._%-]
Here, we list what characters are allowed in this section of the email address. Regex uses the ASCII code index to control the following ranges:
Display | ASCII Range | Character Range | Notes |
---|---|---|---|
a-z | 97-122 | a-z | Case sensitive, lower case |
A-Z | 65-90 | A-Z | Case sensitive, upper case |
0-9 | 48-57 | 0-9 | Numeric digits |
You can find the full ASCII code index here: https://www.ascii-code.com/.
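To see how these ranges line up with character codes, here's a minimal sketch in Python. Python's built-in re module is only used for illustration here, and the variable names are made up for this example.

```python
import re

# Each range in a set of square brackets is defined by character codes.
for first, last in [("a", "z"), ("A", "Z"), ("0", "9")]:
    print(f"{first}-{last}: {ord(first)}-{ord(last)}")

# Ranges are case sensitive: [a-z] covers codes 97-122 only.
lowercase = re.compile(r"[a-z]")
print(lowercase.fullmatch("m") is not None)  # True
print(lowercase.fullmatch("M") is not None)  # False
```

Because uppercase letters sit in a different code range, the email Regex lists both a-z and A-Z.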
The a-zA-Z0-9 in that bracket states that any letter (uppercase or lowercase) and any number is valid. But what about the ._%-? These are specific allowed characters rather than ranges. Inside square brackets, a period is treated as a literal period. So, your end-user is also allowed to type a period (.), an underscore (_), a percentage sign (%), or a hyphen (-) in the first part of their email. The hyphen comes last in the brackets so it's read as a literal hyphen instead of the start of a range.
The last piece to note here is the plus sign (+) that follows the first set of square brackets. This means the preceding section must contain at least one character that matches the criteria, and it can contain as many as needed.
So, let's recap the requirements for the first bracketed section. Our Regex states that the string:
- Can include any uppercase or lowercase letter.
- Can include any number.
- Can include periods, underscores, percentage signs, and hyphens.
- Must include at least one of the above characters.
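Here's a short sketch of that first bracketed section on its own, assuming Python's re module. The LOCAL_PART name and the is_valid_local_part helper are hypothetical and only exist for this example.

```python
import re

# The first bracketed section of the email Regex, followed by the plus sign.
LOCAL_PART = re.compile(r"[a-zA-Z0-9._%-]+")

def is_valid_local_part(text: str) -> bool:
    """Return True if every character in text is allowed and there's at least one."""
    return LOCAL_PART.fullmatch(text) is not None

print(is_valid_local_part("jane.doe_99"))  # True
print(is_valid_local_part("jane doe"))     # False: spaces aren't in the brackets
print(is_valid_local_part(""))             # False: + requires at least one character
```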
That covers everything before the @ in an email address! From there, we have the @ itself, which is looking for a literal match. (You can tell this because there are no other characters around it in the regular expression.) And after that, we have our second set of square brackets:
[a-zA-Z0-9.-]
This looks a lot like what we saw in the first set of square brackets. The only difference is the removal of underscores and percentage signs. This means these characters are not allowed in this portion of the email address.
Moving right along, we see a \. combination. Inside square brackets, a period is a literal period. Outside of square brackets, it means something different: a period on its own matches any single character. By placing a backslash before a period, you're telling the Regex to treat it like a literal period. In other words, the \. means you're looking for a period at this point in the email address. And that brings us to our last set of square brackets:
[a-zA-Z]
This, again, looks a lot like our other two sets of brackets. Here, though, we're only allowing for uppercase and lowercase letters. No numbers or special characters here! This represents the .com, .net, .org, .etc part of an email address. We know authentic emails only have letters here, so our Regex doesn't allow anything else. And the next part of our regular expression ties right into that too:
{2,6}
Notice how this section uses curly braces instead of square brackets. So, it's not specifying acceptable characters. Instead, it indicates how many characters from the preceding brackets are allowed. This means the portion of the email address after the final period must be between two and six characters long.
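To try the last piece of the expression by itself, here's a minimal sketch assuming Python's re module. SUFFIX, UNESCAPED, and is_suffix are hypothetical names. The UNESCAPED pattern isn't part of the email Regex. It's only there to show what happens if you leave off the backslash.

```python
import re

# A literal period, then two to six letters.
SUFFIX = re.compile(r"\.[a-zA-Z]{2,6}")
# The same piece without the backslash, for comparison only.
UNESCAPED = re.compile(r".[a-zA-Z]{2,6}")

def is_suffix(text: str) -> bool:
    """Return True if text is a period followed by 2 to 6 letters."""
    return SUFFIX.fullmatch(text) is not None

print(is_suffix(".com"))      # True
print(is_suffix(".c"))        # False: only one letter
print(is_suffix(".museum"))   # True: six letters
print(is_suffix(".company"))  # False: seven letters
print(is_suffix("xcom"))      # False: the first character must be a period
print(UNESCAPED.fullmatch("xcom") is not None)  # True: an unescaped period matches any character
```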
From there, the rest is standard Regex housekeeping:
- The / at the beginning and end of the regular expression are delimiters. They mark where the regular expression starts and stops.
- The ^ right after the opening slash anchors the pattern to the start of the string.
- The parentheses group what's enclosed so it's treated as a single unit.
- The * near the end means the group before it can repeat 0 times or as many times as needed. So, an empty string passes, and so does any number of valid email addresses entered back to back.
- The $ anchors the pattern to the end of the string, so nothing else can follow it.
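Here's a sketch of the full expression in action, assuming Python's re module. Python doesn't use the opening and closing slashes, so they're left off. The EMAIL name and the matches helper are hypothetical.

```python
import re

# The full email Regex from above, without the / delimiters.
EMAIL = re.compile(r"^([a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6})*$")

def matches(text: str) -> bool:
    """Return True if the whole string fits the email pattern."""
    return EMAIL.search(text) is not None

print(matches("jane.doe@example.com"))        # True
print(matches("jane.doe@example"))            # False: no period and letters after the @ section
print(matches("jane.doe@example.com extra"))  # False: $ doesn't allow anything after the address
print(matches(""))                            # True: * allows the group to appear 0 times
print(matches("first@example.comsecond@example.org"))  # True: * allows repeats with nothing between them
```

The last two results come from the * at the end of the expression. Keep them in mind if a blank field or run-together addresses shouldn't pass.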
With all that in mind, take one more look at the entire Regex string and break it down into the parts you just read:
/^([a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6})*$/
If you want a shortcut to breaking down Regex syntax in the future, visit https://regex101.com/ and add it to your bookmarks.
Regex can do more than just verify an email address. For a full breakdown on Regex syntax, visit https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Cheatsheet. And for syntax specific to querying MongoDB, visit https://docs.mongodb.com/manual/reference/operator/query/regex/.
Using Regex in Unqork
Now that you know the basics of Regex, you can use it in Unqork. You have two options for how to use Regex while configuring:
-
Using the Regular Expression Pattern settings in Unqork components.
-
Using the REGEXMATCH formula in your Calculators and Data Workflows.
Native Unqork Components
Some of the most common places that use Regex in Unqork are hidden in plain sight. By now, you've probably used an Email or a Phone Number component. Have you ever wondered how these components control what's entered into them? The answer is Regex. In these instances, it all happens behind the scenes.
Sometimes you want to customize the restrictions on certain fields, though. For that, you have other options. For example, you can customize a Text Field component.
The Text Field has a dedicated setting where you can add your custom Regex. In Advanced, you can see a setting called Regular Expression Pattern (Regex).
Here, you can add the regular expression that fits your needs. If your end-user enters text that doesn't meet these requirements, they see an error. (You can set a custom error message here, too.)
This is a designated Regex setting. So, you don't need the opening or closing forward slashes when creating your expression.
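If you're copying an expression that's written with slashes, here's a small sketch of removing them first. It assumes Python, and strip_slashes is a hypothetical helper, not something built into Unqork.

```python
import re

def strip_slashes(literal: str) -> str:
    """Remove one leading and one trailing slash, if both are present."""
    if len(literal) >= 2 and literal.startswith("/") and literal.endswith("/"):
        return literal[1:-1]
    return literal

literal = r"/^([a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6})*$/"
pattern = strip_slashes(literal)

print(pattern)  # the same expression, starting with ^ and ending with $
print(re.search(pattern, "jane.doe@example.com") is not None)  # True
```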
Here's what the configuration window looks like with the Regex set to check for a proper email:
And here's what you see in Express View if something other than an email is entered:
Regex Formulas
Another option for using Regex in Unqork is to use Regex formulas. REGEXMATCH is a formula you can use in components to check a string against a regular expression. If you're familiar with using formulas in Calculators and Data Workflow components, you can add the REGEXMATCH formula to your list of go-to formulas. This is what the REGEXMATCH formula looks like:
REGEXMATCH(stringToCheck,regexPattern)
In this formula, regexPattern is also specifically used for regular expressions. So, you'll omit the opening and closing slashes.
REGEXMATCH works like any other formula. In a Data Workflow you can use Formula or Create Value operators to check a single value. Or, you can use Create Field to check a field across an entire table. Both stringToCheck and regexPattern can be hard coded directly into the formula. Or, you can set them dynamically using the argument ports and inserting A into the formula itself. Either way, your result is always a boolean: true, indicating a match, or false, indicating no match.
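To picture what a boolean check across a table looks like, here's a rough Python stand-in. The regexmatch function, the rows list, and the isEmail field are all hypothetical. This sketch assumes the formula returns true when the pattern is found in the string, so the ^ and $ in the pattern are what force the whole value to match. Unqork's own implementation may differ.

```python
import re

EMAIL_PATTERN = r"^([a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6})*$"

def regexmatch(string_to_check: str, regex_pattern: str) -> bool:
    """Hypothetical stand-in for REGEXMATCH: True on a match, False otherwise."""
    return re.search(regex_pattern, string_to_check) is not None

# Check one field across every row of a table, adding the result as a new field.
rows = [{"email": "jane.doe@example.com"}, {"email": "not-an-email"}]
for row in rows:
    row["isEmail"] = regexmatch(row["email"], EMAIL_PATTERN)

print(rows[0]["isEmail"])  # True
print(rows[1]["isEmail"])  # False
```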
There's also a second formula called REGEXEXTRACT. REGEXEXTRACT is a formula that returns the part of a string that matches a regular expression. So, let's say you only want to pull the portion of an email that comes before the @ sign. You can do that using the REGEXEXTRACT formula. This is what the REGEXEXTRACT formula looks like:
REGEXEXTRACT(stringToCheck,regexPattern)
The syntax is the same as REGEXMATCH. But instead of a boolean, the output is part of the string matching your regular expression. If no part of the string matches, your output is a null value.
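Here's a rough Python stand-in for pulling the portion of an email that comes before the @ sign. The regexextract function is hypothetical. This sketch assumes the formula returns the first matching portion of the string, with Python's None standing in for a null value. Unqork's own behavior may differ, for example in how it handles parentheses.

```python
import re
from typing import Optional

def regexextract(string_to_check: str, regex_pattern: str) -> Optional[str]:
    """Hypothetical stand-in for REGEXEXTRACT: the first matching portion, or None."""
    match = re.search(regex_pattern, string_to_check)
    return match.group(0) if match else None

# Reuse the first bracketed section of the email Regex, anchored to the start.
BEFORE_AT = r"^[a-zA-Z0-9._%-]+"

print(regexextract("jane.doe@example.com", BEFORE_AT))  # jane.doe
print(regexextract("@example.com", BEFORE_AT))          # None
```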