How to: Create a Batch Loop in Data Workflow

Overview

Sometimes you'll want to perform one action on a large group of data at once. Think of working with a massive data set that you want to turn into Unqork submissions. You'll use one of Unqork's internal APIs (application programming interfaces) to do that. You know that the API call will be the same to create each submission. But a Create Submissions API call has a limit, allowing for the creation of only 50 submissions at a time. So, you'll need to group your data into sets of 50 before sending it through the API call. To do that, you'll build a batch loop Data Workflow.

What is a Batch Loop?

A batch loop Data Workflow works by separating your data into groups of a certain size. You can set the number of items you want in each batch while configuring your Data Workflow. Then, the Data Workflow separates your data and assigns each group its own index. It's this group index the looping Data Workflow then references while running.

Let's look more closely at the Create Submissions API call example above. Say you have a data set of Fortune 500 companies. That data set includes each company's name, rank, revenue, and profit. You want to turn each of these companies into its own Unqork submission. The easiest way to do this is to send your data through an API call in batches. So, you'll configure one Data Workflow to separate out 50 companies at a time. Then, you'll configure a second Data Workflow to send each batch of 50 through a Create Submissions API call.
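The batching idea described above can be sketched outside of Unqork in a few lines of Python. This is only an illustration of the concept, not how the Data Workflow is implemented. The batches function and the companies list are hypothetical stand-ins for the Data Workflow and the 500-row data set.

```python
from typing import Iterator, List, TypeVar

T = TypeVar("T")


def batches(items: List[T], batch_size: int = 50) -> Iterator[List[T]]:
    """Yield consecutive slices holding at most batch_size items each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical stand-in for the 500-row Fortune 500 data set.
companies = [{"rank": n} for n in range(1, 501)]
groups = list(batches(companies))
print(len(groups), len(groups[0]))  # 10 50
```

Each item in groups is small enough to fit within the 50-submission limit mentioned in the Overview.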

Here's how your module will look in the Module Builder:

Your completed use case won't show anything in Express View because the loop happens behind the scenes. So, you'll only see activity in the DevTools Console. Here's how that will look:

What You'll Need

For this use case, you'll need:

  • 1 Data Table

  • 2 Hidden components

  • 1 Initializer component

  • 2 Data Workflow components

To set up your first Data Workflow, you'll need:

  • 2 Input operators

  • 1 Create Index operator

  • 1 Create Field operator

  • 1 Size operator

  • 3 Formula operators

  • 1 Decision operator

  • 1 Convert Value operator

  • 1 Filter operator

  • 4 Output operators

  • 4 Console operators

To set up your second Data Workflow, you'll need:

  • 1 Input operator

  • 1 Console operator

  • 1 Output operator

NOTE  These instructions assume you have a new module open, saved, and with a title.

Configuration

Configure the Data Table Component

The Data Table you'll use to build this use case is quite long. Normally, you'd use a batch loop Data Workflow to handle a large number of submissions. To simulate that, this Data Table has 500 rows. To make things easier, we've created one you can copy into your own module.

1. Open the sample data module here: https://training.unqork.io/#/form/6086c37bb08e7e0a26942e1f/edit.
2. Hover over the dtFortuneFive Data Table.

A 5-button toolbar appears above the component on hover-over.

3. Holding Command or Control on your keyboard, click the (Settings) button. This copies the component to your clipboard.
4. In your own module, click the More + button next to your module's title.
5. Under Clipboard, click Paste.
6. In the Paste Contents Here field, press Command + V or Control + V. This pastes the component from your clipboard.
7. Click OK.

You'll see the dtFortuneFive Data Table added to your canvas. You can open this component to verify the data copied successfully. Your Data Table will have 500 rows and should look like this:

Configure the First Hidden Component

Each row in your table has a number assigned to it. That number is the index. In a simple looping Data Workflow, this is the index you'd reference in your Data Workflow. When looping in batches, you'll assign one index to several rows of your data. This is how your Data Workflow processes batches of data at once. So, you'll use a Hidden component to keep track of which group was last processed. Just like with a simple looping Data Workflow, you'll set the Default Value to 0. Later, you'll add logic that increases this value by 1 each time the loop runs.

1. Drag and drop a Hidden component onto your canvas. Place your Hidden component below your Data Table.
2. Enter indexLoopBulk in the Property ID and Canvas Label Text fields.
3. Enter 0 in the Default Value field.
4. Click Save.

Configure the First Data Workflow

Next, let's start building out your loop. You'll manage that with a Data Workflow component. The configuration here may look complex, but we'll explain what each operator does as you add it.

1. Drag and drop a Data Workflow component onto your canvas. Place your Data Workflow below your Hidden component.
2. Enter dwfLoopGroup in the Canvas Label Text and Property Name fields.

NOTE  This image shows how your Data Workflow will look at the end of this configuration. You'll add some operators shown here later.

Configure the First Input Operator

This Input operator references your indexLoopBulk Hidden component. This is how your Data Workflow knows which group of data needs to be processed next.

1. Drag and drop an Input operator onto your Data Workflow canvas.
2. Configure the Input operator's Info window as follows:

Setting    Value
Category   Input
Component  indexLoopBulk
Required   No
Source     Default

Configure the First Console Operator

You'll add several Console operators during this configuration. These come in handy so you can see what's happening behind the scenes. You can reference these in the DevTools Console in Express View for troubleshooting purposes. This Console lets you view the current index being processed.

1. Drag and drop a Console operator onto your Data Workflow canvas.
2. Configure the Console operator's Info window as follows:

Setting   Value
Category  Console
Label     Current Index

3. Connect the output port (right) of the Input operator to the input port (left) of the Console operator.

Configure the First Formula Operator

This Formula operator adds 1 to the index of the group currently being processed. This is how your Data Workflow keeps track of which group to process when the loop comes back around. The formula =SUM(A,1) takes the value passed to the operator (A) and adds 1 to it. Here, A is an alias for the value entering the operator's input port.

1. Drag and drop a Formula operator onto your Data Workflow canvas.
2. Configure the Formula operator's Info window as follows:

Setting             Value
Category            Formula Value
Label               Iterate
Formula/Expression  =SUM(A,1)

3. Connect the output port (right) of the Input operator to the input port (left) of the Formula operator.

Configure the First Output Operator

This Output operator replaces the value in your indexLoopBulk Hidden component with the result of your Formula operator. If you didn't do this, your Data Workflow would always process the index group 0 because that's the Hidden component's Default Value. Now with each loop of the Data Workflow, the value coming into the indexLoopBulk Input operator will increase by 1.

1. Drag and drop an Output operator onto your Data Workflow canvas.
2. Configure the Output operator's Info window as follows:

Setting    Value
Category   Output
Component  indexLoopBulk
Action     value

3. Connect the output port (right) of the Formula operator to the input port (left) of the Output operator.
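The Input, Console, Formula, and Output operators configured so far behave like a counter that is read, logged, and written back. Here is a minimal Python sketch of that branch. The module_state dictionary and both function names are hypothetical, and the sketch assumes the stored value is already a number.

```python
# Hypothetical stand-in for module state: component IDs mapped to values.
module_state = {"indexLoopBulk": 0}


def iterate(a: int) -> int:
    """Mirror =SUM(A,1): add 1 to the value entering the operator."""
    return a + 1


def run_index_branch(state: dict) -> int:
    """Read the index, log it, and store the incremented value."""
    current = state["indexLoopBulk"]           # Input operator
    print("Current Index", current)            # Console operator
    state["indexLoopBulk"] = iterate(current)  # Formula, then Output operator
    return current


first = run_index_branch(module_state)   # logs 0
second = run_index_branch(module_state)  # logs 1
```

Without the write-back on the last line of run_index_branch, every run would read 0 again, which is the problem the Output operator avoids.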

Configure the Second Input Operator

This Input operator is what brings your actual data into your Data Workflow.

1. Drag and drop another Input operator onto your Data Workflow canvas.
2. Configure the Input operator's Info window as follows:

Setting    Value
Category   Input
Component  dtFortuneFive
Required   No
Source     Default

Configure the Create Index Operator

When processing items one at a time, you can use the index of each row in your table. But to process items as a group, you'll need to create a new index, one that refers to a group of items together. This process takes a few operators to complete. To start, let's create a new index that you can manipulate. For that, you'll add a Create Index operator.

1. Drag and drop a Create Index operator onto your Data Workflow canvas.
2. Configure the Create Index operator's Info window as follows:

Setting         Value
Category        Create Index
Label           Indexer
Index Name      indexer
Starting Index  0
Keys            (blank)

3. Connect the output port (right) of the dtFortuneFive Input operator to the input port (left) of the Create Index operator.

Configure the Create Field Operator

Once your data has passed through the Create Index operator, each row now has a new field called indexer. This gives you a version of your index you can work with. So, let's assign each row in your table to a group. You'll store this index assignment in a new field using a Create Field operator.

Remember, each group can hold 50 items. So, you'll divide the value in the indexer field by 50. You'll also want to round that down to the nearest integer so the same value is assigned to 50 items. You'll store the result of this equation in a new field called indexGrp. So, your entire formula will be indexGrp=INT(indexer/50).

1. Drag and drop a Create Field operator onto your Data Workflow canvas.
2. Configure the Create Field operator's Info window as follows:

Setting                  Value
Category                 Formula
Label                    Index Group
Do Not Sanitize Formula  Yes (checked)
Field 1                  indexGrp=INT(indexer/50)
Field 2                  (blank)
Field 3                  (blank)
Field 4                  (blank)
Field 5                  (blank)

3. Connect the output port (right) of the Create Index operator to the input port (left) of the Create Field operator.
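To see what the Create Index and Create Field operators produce together, here is a rough Python equivalent. It is a sketch under assumptions: the rows are made-up placeholders for the Data Table, the function names are hypothetical, and floor division stands in for INT, which matches for non-negative values.

```python
from typing import Dict, List


def create_index(rows: List[Dict], index_name: str = "indexer", start: int = 0) -> List[Dict]:
    """Add a running index field to every row, starting at `start`."""
    return [{**row, index_name: start + i} for i, row in enumerate(rows)]


def create_index_group(rows: List[Dict], batch_size: int = 50) -> List[Dict]:
    """Mirror indexGrp=INT(indexer/50): the same group number for each run of 50 rows."""
    return [{**row, "indexGrp": row["indexer"] // batch_size} for row in rows]


# Made-up placeholder rows standing in for the 500-row Data Table.
rows = [{"company": f"Company {n}"} for n in range(1, 501)]
indexed = create_index_group(create_index(rows))

print(indexed[0]["indexGrp"], indexed[49]["indexGrp"],
      indexed[50]["indexGrp"], indexed[499]["indexGrp"])  # 0 0 1 9
```

Rows 0 through 49 share group 0, row 50 starts group 1, and the final row lands in group 9.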

Configure the Second Console Operator

This is the second Console in this configuration. This Console lets you see the new group indexes assigned to your data.

1. Drag and drop another Console operator onto your Data Workflow canvas.
2. Configure the Console operator's Info window as follows:

Setting   Value
Category  Console
Label     Index Group

3. Connect the output port (right) of the Create Field operator to the input port (left) of the Index Group Console operator.

Configure the Size Operator

To split your data into groups of 50, you'll first need to determine how many total groups you'll have. The first step in doing that is to find how large your data table is. You'll use a Size operator to find that.

1. Drag and drop a Size operator onto your Data Workflow canvas.
2. Configure the Size operator's Info window as follows:

Setting   Value
Category  Size
Label     Total Size of Array

3. Connect the output port (right) of the dtFortuneFive Input operator to the input port (left) of the Size operator.

Configure the Second Formula Operator

Next, you'll need to divide the size of your table by the intended size of your groups. In this case, we want each group to hold 50 pieces of data. So, you'll use the formula =INT(A/50). Here, A refers to the value passed through the operator's input port, which is the total number of rows. INT rounds the result down to an integer. You'll want an integer because group indexes are always whole numbers.

1. Drag and drop another Formula operator onto your Data Workflow canvas.
2. Configure the Formula operator's Info window as follows:

Setting             Value
Category            Formula Value
Label               Grouped Index
Formula/Expression  =INT(A/50)

3. Connect the output port (right) of the Size operator to the input port (left) of the Grouped Index Formula operator.
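The arithmetic behind the Size and Grouped Index operators can be worked through in Python. This is a sketch, not platform code. Both function names are hypothetical, and the 510 and 49 row counts are made-up values added only for comparison with the 500-row table.

```python
def grouped_index(total_rows: int, batch_size: int = 50) -> int:
    """Mirror =INT(A/50) for a non-negative row count."""
    return total_rows // batch_size


def highest_index_group(total_rows: int, batch_size: int = 50) -> int:
    """Largest indexGrp assigned when indexer starts at 0 (needs total_rows >= 1)."""
    return (total_rows - 1) // batch_size


for total in (500, 510, 49):
    print(total, grouped_index(total), highest_index_group(total))
# 500 10 9
# 510 10 10
# 49 0 0
```

When the row count is not a multiple of 50, the two values match and the last group is the partial one. With exactly 500 rows, the Grouped Index value (10) is one higher than the last group that contains rows (9).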

Configure the Third Formula Operator

Just like in a simple looping Data Workflow, you need a way to know whether there is more data to process or whether you've reached the end of your data set. To do that, you'll use another Formula operator with the formula =CONCATENATE(A,"<=",_arg). Here, A is the value in your indexLoopBulk Hidden component, and _arg is the value calculated by your Grouped Index Formula operator. The result is a text expression, such as 0<=10 for the first pass over 500 rows. Later, you'll use a Decision operator to check whether this statement is true.

1. Drag and drop another Formula operator onto your Data Workflow canvas.
2. Configure the Formula operator's Info window as follows:

Setting             Value
Category            Formula Value
Label               Decision Argument
Formula/Expression  =CONCATENATE(A,"<=",_arg)

3. Connect the output port (right) of the indexLoopBulk Input operator to the input port (left) of the Decision Argument Formula operator.
4. Connect the output port (right) of the Grouped Index Formula operator to the argument port (top) of the Decision Argument Formula operator.

Configure the Third Console Operator

This is the third Console in this configuration. This Console lets you view the result of your Decision Argument Formula operator.

1. Drag and drop another Console operator onto your Data Workflow canvas.
2. Configure the Console operator's Info window as follows:

Setting   Value
Category  Console
Label     Index Argument

3. Connect the output port (right) of the Decision Argument Formula operator to the input port (left) of the Index Argument Console operator.

Configure the Decision Operator

The Decision operator decides whether there are more submissions to create. It does this by looking at the expression created by your Decision Argument Formula operator. If this expression is true, the Decision passes your data through the upper output port. If this expression is false, the Decision passes your data through the lower output port. So, your Create Field operator serves as the input here, passing along data from your dtFortuneFive Data Table. And your Decision Argument Formula operator serves as the argument.

Let's take a moment to explain why you're having the Decision operator check whether the statement A<=_arg is true. You need a way to tell your Data Workflow when to stop the loop. It should stop when there are no more rows to process in your Data Table. Or in this case, no more batches to process. So, you're creating a way for the Data Workflow to check whether the number stored in the indexLoopBulk Hidden component is less than or equal to the value from your Grouped Index Formula operator. That's what A<=_arg represents here.

After the first loop of the Data Workflow, the value stored in the indexLoopBulk Hidden component is 1. So, on the next loop, the Data Workflow processes the second group (index 1). Your formula uses <= to account for any data that doesn't fit evenly into your batches of 50. If your number of rows doesn't divide neatly into batches, the loop needs to go one index further to pick up the leftover rows, and the equals sign lets it do that. Once the statement evaluates to false, the Data Workflow has its signal to stop.

1. Drag and drop a Decision operator onto your Data Workflow canvas.
2. Configure the Decision operator's Info window as follows:

Setting     Value
Category    Decision
Input List  (blank)
Condition   _arg

3. Connect the output port (right) of the Index Group Create Field operator to the input port (left) of the Decision operator.
4. Connect the output port (right) of the Decision Argument Formula operator to the argument port (top) of the Decision operator.
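The Decision Argument formula and the Decision operator can be approximated in Python as a string that is built and then evaluated. This is a sketch under assumptions: the evaluate helper is hypothetical and only understands two integers joined by <=, and the 120-row count is a made-up example of data that does not divide evenly by 50.

```python
def decision_argument(a: int, arg: int) -> str:
    """Mirror =CONCATENATE(A,"<=",_arg): build the text of a comparison."""
    return f"{a}<={arg}"


def evaluate(expression: str) -> bool:
    """Evaluate a 'left<=right' string of two integers (hypothetical helper)."""
    left, right = expression.split("<=")
    return int(left) <= int(right)


# 120 rows is a made-up count: two full batches plus 20 leftover rows.
grouped = 120 // 50  # 2, as =INT(A/50) would give

for index in range(4):
    expr = decision_argument(index, grouped)
    print(expr, evaluate(expr))
# 0<=2 True
# 1<=2 True
# 2<=2 True
# 3<=2 False
```

Index 2 is the partial batch holding rows 100 through 119. With a strict < comparison, that pass would be rejected and those 20 rows would never be selected.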

Configure the Convert Value Operator

Next, we want to filter the data based on the index group currently being processed. To do that, you'll need to reference that index group number. You can do this by converting the value in your indexLoopBulk Hidden component to a number. You'll use a Convert Value operator for that.

1. Drag and drop a Convert Value operator onto your Data Workflow canvas.
2. Configure the Convert Value operator's Info window as follows:

Setting   Value
Category  Convert to Value
Label     Index to Number
Cast To   Number

3. Connect the output port (right) of the indexLoopBulk Input operator to the input port (left) of the Convert Value operator.
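One way to see why the conversion matters is to compare a text value with a number. The Python sketch below assumes the stored index can arrive as text. That is an assumption made for illustration, not something this article confirms about the Hidden component, and convert_to_number is a hypothetical stand-in for the Convert Value operator.

```python
def convert_to_number(value) -> int:
    """Rough stand-in for casting to Number; accepts ints or numeric strings."""
    return int(value)


stored_index = "3"  # assumed: the stored value arrives as text
row_group = 3       # indexGrp values are numbers produced by INT(...)

print(stored_index == row_group)                     # False
print(convert_to_number(stored_index) == row_group)  # True
```

After conversion, both sides of the comparison have the same type, so matching rows can be found reliably.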

Configure the Filter Operator

Now, let's add your Filter operator. Here, you'll select only the data in the index group you're currently processing. So, you'll set the upper output port of your Decision operator as the input. Then, you'll set the Convert Value operator as the argument. Finally, you'll set your Expression as indexGrp=_arg so the operator looks for the data that has an indexGrp value matching the argument. Any data that matches passes through the upper output port of the Filter operator. And any data that doesn't match passes through the lower output port.

1. Drag and drop a Filter operator onto your Data Workflow canvas.
2. Configure the Filter operator's Info window as follows:

Setting                  Value
Category                 Filter
Label                    indexGrp=_arg
Do Not Sanitize Formula  Yes (checked)
Expression               indexGrp=_arg

3. Connect the upper output port (right) of the Decision operator to the input port (left) of the Filter operator.
4. Connect the output port (right) of the Convert Value operator to the argument port (top) of the Filter operator.
5. Click Save.
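The Filter operator's two output ports can be pictured as a split of the rows into matches and non-matches. Here is a minimal Python sketch of that split. The filter_group function is hypothetical, and the rows are generated placeholders carrying only the indexer and indexGrp fields.

```python
from typing import Dict, List, Tuple


def filter_group(rows: List[Dict], arg: int) -> Tuple[List[Dict], List[Dict]]:
    """Mirror indexGrp=_arg: return (matching rows, non-matching rows)."""
    matches = [row for row in rows if row["indexGrp"] == arg]
    rest = [row for row in rows if row["indexGrp"] != arg]
    return matches, rest


# Placeholder rows with the fields added by Create Index and Create Field.
rows = [{"indexer": i, "indexGrp": i // 50} for i in range(500)]

selected, remaining = filter_group(rows, 3)
print(len(selected), selected[0]["indexer"], selected[-1]["indexer"], len(remaining))
# 50 150 199 450
```

The first list corresponds to the upper output port and the second to the lower output port.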

Configure the Second Hidden Component

Before we configure the rest of your Data Workflow, let's add a place to store its result. You'll use another Hidden component for that. This Hidden component will store each group of data as it's retrieved by your Filter operator. Your loop overwrites this each time it runs. So, this will only ever hold the group of data you're currently processing.

1. Drag and drop another Hidden component onto your canvas. Place your Hidden component below your Data Workflow.
2. Enter selectedGroup in the Property ID and Canvas Label Text fields.
3. Click Save.

Update the First Data Workflow Component

Now that you have a Hidden component to hold your output, let's add that to your Data Workflow. This tells your Filter operator where to store the data it retrieves. You'll later reference this data in a separate Data Workflow. That second Data Workflow processes your actual operation. In this use case, that's where you'd actually create your submissions. This first Data Workflow only manages your loop.

1. Hover over the dwfLoopGroup Data Workflow.

A 5-button toolbar appears above the component on hover-over.

2. Using the toolbar, click the (Settings) button.

Configure the Second Output Operator

This Output operator shows your Filter operator where to store the data it retrieves. So, you'll select the selectedGroup Hidden component you just added.

1. Drag and drop another Output operator onto your Data Workflow canvas.
2. Configure the Output operator's Info window as follows:

Setting    Value
Category   Output
Component  selectedGroup
Action     value

3. Connect the upper output port (right) of the Filter operator to the input port (left) of the selectedGroup Output operator.

Configure the Fourth Console Operator

This is the fourth Console operator in this configuration. This Console operator shows the data separated out by the Filter operator. You can use this to view what's passed to your selectedGroup Hidden component.

1. Drag and drop another Console operator onto your Data Workflow canvas.
2. Configure the Console operator's Info window as follows:

Setting   Value
Category  Console
Label     Rows

3. Connect the upper output port (right) of the Filter operator to the input port (left) of the Rows Console operator.
4. Click Save.

Configure the Second Data Workflow

Now that you have your loop configured, you can set up a Data Workflow to perform your operation. In this use case, this is where you would configure the creation of your submissions.

NOTE  This article is focused on the looping aspect of a Data Workflow. So, we won't build the entire logic to create submissions. Instead, we'll focus on what you need in this second Data Workflow so your loop runs.

1. Drag and drop another Data Workflow component onto your canvas. Place your Data Workflow below your selectedGroup Hidden component.
2. Enter dwfOperationGroup in the Canvas Label Text and Property Name fields.

Configure the Input Operator

This Input operator brings in the data stored in your selectedGroup Hidden component. When creating your submissions, you would reference this instead of your larger data table. That's because this Hidden component holds only the data for one group of 50 entries at a time.

1. Drag and drop an Input operator onto your Data Workflow canvas.
2. Configure the Input operator's Info window as follows:

Setting    Value
Category   Input
Component  selectedGroup
Required   Yes
Source     Default

Configure the Console Operator

This is the only Console operator you'll add in this Data Workflow. This Console lets you see the data pulled into your Data Workflow from the selectedGroup Hidden component.

1. Drag and drop a Console operator onto your Data Workflow canvas.
2. Configure the Console operator's Info window as follows:

Setting   Value
Category  Console
Label     Create Module Submissions

3. Connect the output port (right) of the Input operator to the input port (left) of the Console operator.

Configure the Output Operator

Next, you'll add an Output operator here to trigger your first Data Workflow to run again once your submissions are created.

1. Drag and drop an Output operator onto your Data Workflow canvas.
2. Configure the Output operator's Info window as follows:

Setting    Value
Category   Output
Component  dwfLoopGroup
Action     trigger

3. Connect the output port (right) of the Input operator to the input port (left) of the Output operator.
4. Click Save.

Update the First Data Workflow

Now that you have a Data Workflow configured to run your operation, let's configure your first Data Workflow to trigger it. In this step, we'll also add a way to stop your loop once you've created all of your submissions.

1. Hover over the dwfLoopGroup Data Workflow.

A 5-button toolbar appears above the component on hover-over.

2. Using the toolbar, click the (Settings) button.

Configure the Third Output Operator

You'll need to tell your second Data Workflow that the data it needs is ready. So, you'll use an Output operator to tell the second Data Workflow to run.

1. Drag and drop another Output operator onto your Data Workflow canvas.
2. Configure the Output operator's Info window as follows:

Setting    Value
Category   Output
Component  dwfOperationGroup
Action     trigger

3. Connect the upper output port (right) of the Filter operator to the input port (left) of the dwfOperationGroup Output operator.

Configure the Fourth Output Operator

Remember that your Decision operator knows whether there are more submissions to create. If there aren't, the Decision passes data through its lower output port. So, let's add an Output operator there to stop the looping process.

1. Drag and drop another Output operator onto your Data Workflow canvas.
2. Configure the Output operator's Info window as follows:

Setting    Value
Category   Output
Component  _self
Action     value

3. Connect the lower output port (right) of the Decision operator to the input port (left) of the _self Output operator.
4. Click Save.

Configure the Initializer Component

Finally, let's add an Initializer component to start the whole operation. You'll set this component to trigger your looping Data Workflow.

1. Drag and drop an Initializer component onto your canvas. Place your Initializer below your Data Table.
2. Enter initLoop in the Property ID and Canvas Label Text fields.
3. Select New Submission as the Trigger Type.
4. In the Outputs table, enter the following:
Property ID   Type     Value
dwfLoopGroup  trigger  GO

5. Click Save.
6. Save your module.

Now you can check your work. Preview your module in Express View and open the DevTools Console. You'll see a lot of activity from your various Consoles. That shows your loop is working properly. Let's look at some key points.

First, you'll see your Index Group Console. This is where your Data Workflow assigns the index group. So, the first 50 items show an indexGrp of 0, the next 50 show an indexGrp of 1, and so on. You'll see this in the image below:

From there, you'll see your Current Index Console. This shows which indexGrp is being processed. In the image below, you'll see the current indexGrp is 0. Next, you'll see your Index Argument and your Rows Console. These show how your Data Workflow checked to see if there was more data to process, as well as the data as it's passed through your Filter operator. Here, you'll only see 50 pieces of data at a time. That's because you're only seeing each group as it passes through this portion of your Data Workflow.

And finally, you'll see your Create Module Submissions Console. This shows that your batch of data has made it to your second Data Workflow. You'll see each of these steps repeated until all your data has been processed.
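To tie the pieces together, the whole loop can be modeled in plain Python. This is a simplified sketch, not a description of how Unqork executes Data Workflows. The run_batch_loop function and its state dictionary are hypothetical, and the back-and-forth triggering between dwfLoopGroup and dwfOperationGroup is collapsed into a single while loop.

```python
from typing import List


def run_batch_loop(total_rows: int, batch_size: int = 50) -> List[int]:
    """Return the size of each group handed to the second workflow, in order."""
    # Create Index + Create Field: every row gets indexer and indexGrp.
    rows = [{"indexer": i, "indexGrp": i // batch_size} for i in range(total_rows)]
    grouped_index = total_rows // batch_size  # Size + Grouped Index formula
    state = {"indexLoopBulk": 0, "selectedGroup": []}
    batch_sizes = []

    while True:
        current = state["indexLoopBulk"]      # Input operator
        state["indexLoopBulk"] = current + 1  # Iterate formula + Output
        if not current <= grouped_index:      # Decision: lower port stops the loop
            break
        # Filter: keep only the rows in the current group.
        state["selectedGroup"] = [r for r in rows if r["indexGrp"] == current]
        # Stand-in for dwfOperationGroup reading selectedGroup, then re-triggering.
        batch_sizes.append(len(state["selectedGroup"]))
    return batch_sizes


print(run_batch_loop(120))  # [50, 50, 20]
print(run_batch_loop(500))  # ten batches of 50 followed by one empty batch
```

In this model, the 500-row run makes an eleventh pass with an empty group because 10<=10 is still true. Whether the platform triggers the second Data Workflow for an empty group is not something this sketch can confirm, so use the Rows Console to check what the final pass contains.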

Lab

You can view this complete use case here: https://training.unqork.io/#/form/6081961fd455ab3509979026/edit.

Best Practices

  • Data Workflows time out after 5 minutes in all environments. Build Batch Loop Data Workflows to complete operations within 5 minutes to prevent timeouts.

  • If you don't plan to use disabled components in your application, remove them to ensure optimal performance. Remember to check all active components that connect to disabled components. Ensure active components still function properly after you remove the disabled ones.

  • Add labels to all Data Workflow operators to describe their function. These labels make it easier to know the purpose of an operator without having to open the Info window.

  • Select the Do Not Sanitize setting in all your operators to improve application performance.

  • Organize Data Workflow components based on their function in your application.

  • Use the component's Notes tab to comment on complex data processes. Add notes to explain what components are being triggered, trigger types, and the importance of each component.