How to: Create a Batch Loop in Data Workflow


Overview

Create a Batch Loop in a Data Workflow to perform one action on a large group of data at once. Think of working with a massive data set that you want to turn into Unqork submissions. You'll use one of Unqork's internal APIs (application programming interfaces) to do that. You know that the API call will be the same to create each submission. But a Create Submissions API call has a limit, allowing for the creation of only 50 submissions at a time. So, you'll need to group your data into sets of 50 before sending it through the API call. To do that, you'll build a batch loop Data Workflow.

What is a Batch Loop?

A batch loop Data Workflow works by separating your data into groups of a certain size. You can set the number of items you want in each batch while configuring your Data Workflow. Then, the Data Workflow separates your data and assigns each group its own index. It's this group index the looping Data Workflow then references while running.

Let's look more closely at the Create Submissions API call example above. Say you have a data set of Fortune 500 companies. That data set includes the company's name, their rank, their revenue, and their profit. You want to turn each of these companies into its own Unqork submission. The easiest way to do this is to send your data through an API call in batches. So, you'll configure one Data Workflow to separate out 50 companies at a time. Then, you'll configure a second Data Workflow to send each batch of 50 through a Create Submissions API call.
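As a rough illustration outside of Unqork, the batching idea can be sketched in Python. The companies list and the split_into_batches helper are hypothetical stand-ins for the Data Table and the Data Workflow. Only the batch size of 50 comes from the use case above.

```python
from typing import Dict, List

BATCH_SIZE = 50  # Create Submissions limit described above


def split_into_batches(rows: List[Dict], size: int = BATCH_SIZE) -> List[List[Dict]]:
    """Split rows into consecutive batches of at most `size` items."""
    return [rows[start:start + size] for start in range(0, len(rows), size)]


# Hypothetical sample data standing in for the Fortune 500 table.
companies = [{"rank": i + 1, "name": f"Company {i + 1}"} for i in range(500)]
batches = split_into_batches(companies)
print(len(batches), len(batches[0]))  # 10 50
```

Each inner list is one batch that a single API call could accept under the 50-item limit.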

Here's how your module will look in the Module Builder:

Your completed use case won't show anything in Express View because the loop happens behind the scenes. So, you'll only see activity in the DevTools Console. Here's how that will look:

What You'll Need

For this use case, you need:

To set up your first Data Workflow, you need:

To set up your second Data Workflow, you need:

These instructions assume you have a new module open, saved, and with a title.

Configuration

Configure the Data Table Component

The Data Table you'll use to build this use case is quite long. Normally, you'd use a batch loop Data Workflow to handle a large number of submissions. To simulate that, this Data Table has 500 rows. To make things easier, we've created one you can copy into your own module.

  1. Open the sample data module here: https://training.unqork.io/#/form/6086c37bb08e7e0a26942e1f/edit.

  2. Hover over the dtFortuneFive Data Table component.

    A 5-button toolbar appears above the component on hover-over.

  3. Holding Command or Control on your keyboard, click the (Settings) button. This copies the component to your clipboard.

  4. In the upper-right corner of your own module, click the (Options) button.

  5. Click Paste Module Definition.

  6. In the Module Definition field, press Command + V or Control + V. This pastes the component from your clipboard.

  7. Click Paste.

You'll see the dtFortuneFive Data Table added to your canvas. You can open this component to verify the data copied successfully. Your Data Table will have 500 rows and should look like this:

Configure the indexLoopBulk Hidden Component

Each row in your table has a number assigned to it. That number is the index. In a simple looping Data Workflow, this is the index you'd reference in your Data Workflow. When looping in batches, you'll assign one index to several rows of your data. This is how your Data Workflow processes batches of data at once. So, you'll use a Hidden component to keep track of which group was last processed. Just like with a simple looping Data Workflow, you'll set the Default Value to 0. And later, you'll add logic that adds to this as your submissions are created.

  1. Drag and drop a Hidden component onto your canvas, placing it below the dtFortuneFive Data Table component.

  2. In the Property ID field, enter indexLoopBulk.

  3. In the Canvas Label Text field, enter indexLoopBulk.

  4. In the component's configuration menu, click Data.

  5. In the Default Value field, enter 0.

  6. Click Save & Close.

Configure the dwfLoopGroup Data Workflow

Next, let's start building out your loop. You'll manage that with a Data Workflow component. The configuration here may look complex, but we'll explain what each operator does as you add it.

  1. Drag and drop a Data Workflow component onto your canvas, placing it below the indexLoopBulk Hidden component.

  2. In the Canvas Label Text and Property Name fields, enter dwfLoopGroup.

    This image shows how your Data Workflow will look at the end of this configuration. You'll add some operators shown here later.

Configure the First Input Operator

This Input operator references your indexLoopBulk Hidden component. This is how your Data Workflow knows which group of data needs to be processed next.

  1. Drag and drop an Input operator onto the Data Workflow canvas.

  2. Configure the Input operator's Info window as follows:

    Setting   | Value
    Category  | Input
    Component | indexLoopBulk
    Required  | No
    Source    | Default

Configure the First Console Operator

You'll add several Console operators during this configuration. These come in handy so you can see what's happening behind the scenes. You can reference these in the DevTools Console in Express View for troubleshooting purposes. This Console lets you view the current index being processed.

  1. Drag and drop a Console operator onto the Data Workflow canvas.

  2. Configure the Console operator's Info window as follows:

    Setting  | Value
    Category | Console
    Label    | Current Index

  3. Connect the output port (right) of the indexLoopBulk Input operator to the input port (left) of the Current Index Console operator.

Configure the First Formula Operator

This Formula operator adds 1 to the index of the group that was just processed. This is how your Data Workflow records that a group is done, so the next loop moves on to the next group. The formula =SUM(A,1) takes the value passed to the operator (A) and adds 1 to it. Here, A acts as an alias for the value entering the operator's input port.

  1. Drag and drop a Formula operator onto your Data Workflow canvas.

  2. Configure the Formula operator's Info window as follows:

    Setting            | Value
    Category           | Formula Value
    Label              | Iterate
    Formula/Expression | =SUM(A,1)

  3. Connect the output port (right) of the indexLoopBulk Input operator to the input port (left) of the Iterate Formula operator.

Configure the First Output Operator

This Output operator replaces the value in your indexLoopBulk Hidden component with the result of your Formula operator. If you didn't do this, your Data Workflow would always process the index group 0 because that's the Hidden component's Default Value. Now with each loop of the Data Workflow, the value coming into the indexLoopBulk Input operator will increase by 1.

  1. Drag and drop an Output operator onto the Data Workflow canvas.

  2. Configure the Output operator's Info window as follows:

    Setting   | Value
    Category  | Output
    Component | indexLoopBulk
    Action    | value

  3. Connect the output port (right) of the Iterate Formula operator to the input port (left) of the indexLoopBulk Output operator.
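The read, add 1, and write-back cycle built by the Input, Formula, and Output operators above can be sketched in plain Python. This is only an illustration: the module_state dictionary and the run_iterate_branch function are hypothetical stand-ins for the Hidden component and this branch of the Data Workflow.

```python
# Hypothetical stand-in for module state; the key mirrors the Property ID used above.
module_state = {"indexLoopBulk": 0}  # Default Value of the Hidden component


def run_iterate_branch(state: dict) -> int:
    """Read the current index, then write back index + 1 (like =SUM(A,1))."""
    current = int(state["indexLoopBulk"])   # Input operator
    state["indexLoopBulk"] = current + 1    # Formula + Output operators
    return current


processed = [run_iterate_branch(module_state) for _ in range(3)]
print(processed, module_state["indexLoopBulk"])  # [0, 1, 2] 3
```

Without the write-back step, every run would read 0 again, which matches the behavior described for the Default Value.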

Configure the Second Input Operator

This Input operator is what brings the data into the Data Workflow.

  1. Drag and drop another Input operator onto the Data Workflow canvas.

  2. Configure the Input operator's Info window as follows:

    Setting   | Value
    Category  | Input
    Component | dtFortuneFive
    Required  | No
    Source    | Default

Configure the Create Index Operator

When processing items one at a time, you can use the index of each row in your table. But to process items as a group, you'll need to create a new index, one that refers to a group of items together. This process takes a few operators to complete. To start, let's create a new index that you can manipulate. For that, you'll add a Create Index operator.

  1. Drag and drop a Create Index operator onto the Data Workflow canvas.

  2. Configure the Create Index operator's Info window as follows:

    Setting        | Value
    Category       | Create Index
    Label          | Indexer
    Index Name     | indexer
    Starting Index | 0
    Keys           | (blank)

  3. Connect the output port (right) of the dtFortuneFive Input operator to the input port (left) of the Indexer Create Index operator.

Configure the Create Field Operator

Once your data has passed through the Create Index operator, each row now has a new field called indexer. This gives you a version of your index you can work with. So, let's assign each row in your table to a group. You'll store this index assignment in a new field using a Create Field operator.

Remember, each group can hold 50 items. So, you'll divide the value in the indexer field by 50. You'll also want to round that down to the nearest integer so the same value is assigned to 50 items. You'll store the result of this equation in a new field called indexGrp. So, your entire formula will be indexGrp=INT(indexer/50).

  1. Drag and drop a Create Field operator onto the Data Workflow canvas.

  2. Configure the Create Field operator's Info window as follows:

    Setting                 | Value
    Category                | Formula
    Label                   | Index Group
    Do Not Sanitize Formula | Yes (checked)
    Field 1                 | indexGrp=INT(indexer/50)
    Field 2                 | (blank)
    Field 3                 | (blank)
    Field 4                 | (blank)
    Field 5                 | (blank)

  3. Connect the output port (right) of the Indexer Create Index operator to the input port (left) of the Index Group Create Field operator.
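To see what the Create Index and Create Field operators produce together, here is a minimal Python sketch of the same arithmetic. The sample rows are hypothetical. The field names indexer and indexGrp and the group size of 50 come from the steps above, and integer division (//) plays the role of INT(indexer/50) for non-negative values.

```python
from collections import Counter

GROUP_SIZE = 50

# Hypothetical rows standing in for the Data Table.
rows = [{"name": f"Company {i + 1}"} for i in range(500)]

for position, row in enumerate(rows):         # Create Index, Starting Index 0
    row["indexer"] = position
    row["indexGrp"] = position // GROUP_SIZE  # indexGrp=INT(indexer/50)

group_sizes = Counter(row["indexGrp"] for row in rows)
print(rows[49]["indexGrp"], rows[50]["indexGrp"], len(group_sizes))  # 0 1 10
```

Rows 0 through 49 share group 0, rows 50 through 99 share group 1, and so on, so 500 rows end up in groups 0 through 9.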

Configure the Second Console Operator

This is the second Console in this configuration. This Console lets you see the new group indexes assigned to your data.

  1. Drag and drop another Console operator onto the Data Workflow canvas.

  2. Configure the Console operator's Info window as follows:

    Setting  | Value
    Category | Console
    Label    | Index Group

  3. Connect the output port (right) of the Index Group Create Field operator to the input port (left) of the Index Group Console operator.

Configure the Size Operator

To split your data into groups of 50, you'll first need to determine how many total groups you'll have. The first step in doing that is to find how large your data table is. You'll use a Size operator to find that.

  1. Drag and drop a Size operator onto the Data Workflow canvas.

  2. Configure the Size operator's Info window as follows:

    Setting  | Value
    Category | Size
    Label    | Total Size of Array

  3. Connect the output port (right) of the dtFortuneFive Input operator to the input port (left) of the Total Size of Array Size operator.

Configure the Second Formula Operator

Next, you'll need to divide the size of your table by the intended size of your groups. In this case, we want each group to hold 50 pieces of data. So, you'll use the formula =INT(A/50). Here, A refers to the data passed through the operator's input port. And INT rounds the result down to the nearest integer. You need an integer because group indexes are always whole numbers.

  1. Drag and drop a Formula operator onto the Data Workflow canvas.

  2. Configure the Formula operator's Info window as follows:

    Setting            | Value
    Category           | Formula Value
    Label              | Grouped Index
    Formula/Expression | =INT(A/50)

  3. Connect the output port (right) of the Total Size of Array Size operator to the input port (left) of the Grouped Index Formula operator.
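As a quick check of the =INT(A/50) arithmetic, the following Python sketch computes the Grouped Index value for a few table sizes. The grouped_index function name and the sizes other than 500 are hypothetical and used only for illustration.

```python
GROUP_SIZE = 50


def grouped_index(total_rows: int, size: int = GROUP_SIZE) -> int:
    """Mirror =INT(A/50): divide the table size by the batch size and round down."""
    return total_rows // size


for total in (500, 520, 49):
    print(total, grouped_index(total))
# 500 10
# 520 10
# 49 0
```

Note that 500 and 520 rows give the same result, because rounding down discards the 20 leftover rows from the count.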

Configure the Third Formula Operator

Just like in a simple looping Data Workflow, you need a way to know whether there is more data to process or you've reached the end of your data set. To do that, you'll use another Formula operator with the formula =CONCATENATE(A,"<=",_arg). Here, A is the value in your indexLoopBulk Hidden component. And _arg is the value found by your Grouped Index Formula operator. Later, you'll use a Decision operator to find whether this statement is true or not.

  1. Drag and drop another Formula operator onto the Data Workflow canvas.

  2. Configure the Formula operator's Info window as follows:

    Setting            | Value
    Category           | Formula Value
    Label              | Decision Argument
    Formula/Expression | =CONCATENATE(A,"<=",_arg)

  3. Connect the output port (right) of the indexLoopBulk Input operator to the input port (left) of the Decision Argument Formula operator.

  4. Connect the output port (right) of the Grouped Index Formula operator to the argument port (top) of the Decision Argument Formula operator.

Configure the Third Console Operator

This is the third Console in this configuration. This Console lets you view the result of your Decision Argument Formula operator.

  1. Drag and drop another Console operator onto your Data Workflow canvas.

  2. Configure the Console operator's Info window as follows:

    Setting  | Value
    Category | Console
    Label    | Index Argument

  3. Connect the output port (right) of the Decision Argument Formula operator to the input port (left) of the Index Argument Console operator.

Configure the Decision Operator

The Decision operator decides whether there are more submissions to create. It does this by looking at the expression created by your Decision Argument Formula operator. If this expression is true, the Decision passes your data through the upper output port. If this expression is false, the Decision passes your data through the lower output port. So, your Create Field operator serves as the input here, passing along data from your dtFortuneFive Data Table. And your Decision Argument Formula operator serves as the argument.

Let's take a moment to explain why you're having the Decision operator check whether the statement A<=_arg is true. You need a way to tell your Data Workflow when to stop the loop. It should stop when there are no more rows to process in your Data Table. Or in this case, no more batches to process. So, you're creating a way for the Data Workflow to check if the number stored in the indexLoopBulk Hidden component is less than or equal to your total number of grouped indexes. That's what A<=_arg represents here.

After the first loop of the Data Workflow, the value stored in the indexLoopBulk Hidden component is 1. So, on the next loop, the Data Workflow processes the second group (index 1). Your formula includes <= so it can account for any data that may not fit evenly into your batches of 50. If you have a number of rows that don't divide neatly into your batches, you'll want to go one index further during your loop. And including that equals sign here lets you do that. And once the statement returns as false, you now have a way to stop the Data Workflow.

  1. Drag and drop a Decision operator onto the Data Workflow canvas.

  2. Configure the Decision operator's Info window as follows:

    Setting    | Value
    Category   | Decision
    Input List | (blank)
    Condition  | _arg

  3. Connect the output port (right) of the Index Group Create Field operator to the input port (left) of the Decision operator.

  4. Connect the output port (right) of the Decision Argument Formula operator to the argument port (top) of the Decision operator.
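The stop condition described above can be traced in a short Python sketch. All function names here are hypothetical, and the sketch assumes the Decision operator evaluates the concatenated text as an ordinary numeric comparison.

```python
def decision_argument(current_index: int, grouped_index: int) -> str:
    """Build the comparison text, as =CONCATENATE(A,"<=",_arg) would."""
    return f"{current_index}<={grouped_index}"


def should_continue(current_index: int, grouped_index: int) -> bool:
    """Evaluate the comparison that the Decision operator checks."""
    return current_index <= grouped_index


def indexes_processed(total_rows: int, size: int = 50) -> list:
    """List every group index for which the loop keeps running."""
    grouped = total_rows // size   # Grouped Index: INT(A/50)
    index = 0                      # indexLoopBulk Default Value
    seen = []
    while should_continue(index, grouped):
        seen.append(index)
        index += 1                 # Iterate: SUM(A,1)
    return seen


print(decision_argument(0, 10))     # 0<=10
print(indexes_processed(520)[-1])   # 10
print(len(indexes_processed(500)))  # 11
```

With 520 rows, index 10 is the extra pass that picks up the 20 leftover rows. With exactly 500 rows, this sketch also reaches index 10, a group that no row belongs to, so that final pass has nothing to select.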

Configure the Convert Value Operator

Next, we want to filter the data based on the index group currently being processed. To do that, you'll need to reference that index group number. You can do this by converting the value in your indexLoopBulk Hidden component to a number. You'll use a Convert Value operator for that.

  1. Drag and drop a Convert Value operator onto the Data Workflow canvas.

  2. Configure the Convert Value operator's Info window as follows:

    Setting  | Value
    Category | Convert to Value
    Label    | Index to Number
    Cast To  | Number

  3. Connect the output port (right) of the indexLoopBulk Input operator to the input port (left) of the Index to Number Convert Value operator.

Configure the Filter Operator

Now, let's add your Filter operator. Here, you'll filter for the data in the index group you're currently processing. So, you'll set the upper output port of your Decision operator as the input. Then, you'll set the Convert Value operator as the argument. Finally, you'll set your Expression as indexGrp=_arg so the operator looks for the data that has an indexGrp value matching the argument. Any data that matches passes through the upper output port of the Filter operator. And any data that doesn't match passes through the lower output port.

  1. Drag and drop a Filter operator onto the Data Workflow canvas.

  2. Configure the Filter operator's Info window as follows:

    Setting                 | Value
    Category                | Filter
    Label                   | indexGrp=_arg
    Do Not Sanitize Formula | Yes (checked)
    Expression              | indexGrp=_arg

  3. Connect the output port (right) of the Decision operator to the input port (left) of the indexGrp=_arg Filter operator.

  4. Connect the output port (right) of the Index to Number Convert Value operator to the argument port (top) of the indexGrp=_arg Filter operator.

  5. Click Save.
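The combined effect of the Convert Value and Filter operators can be approximated in Python as follows. The sample rows and the filter_group function are hypothetical. The sketch assumes the Hidden component value may arrive as text, which is why it is cast to a number before the comparison.

```python
GROUP_SIZE = 50

# Hypothetical rows that already carry the indexGrp field.
rows = [{"indexer": i, "indexGrp": i // GROUP_SIZE} for i in range(120)]


def filter_group(rows, current_index):
    """Split rows into (matching, non_matching), like the Filter's upper and lower ports."""
    arg = int(current_index)  # Convert Value: Cast To Number
    matching = [r for r in rows if r["indexGrp"] == arg]      # indexGrp=_arg
    non_matching = [r for r in rows if r["indexGrp"] != arg]
    return matching, non_matching


selected, rest = filter_group(rows, "2")
print(len(selected), len(rest))  # 20 100
```

Only the matching rows move on, which is what keeps each pass limited to one batch.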

Configure the selectedGroup Hidden Component

Before we configure the rest of your Data Workflow, let's add a place to store its result. You'll use another Hidden component for that. This Hidden component will store each group of data as it's retrieved by your Filter operator. Your loop overwrites this each time it runs. So, this will only ever hold the group of data you're currently processing.

  1. Drag and drop a Hidden component onto your canvas, placing it below the dwfLoopGroup Data Workflow component.

  2. In the Property ID and Canvas Label Text fields, enter selectedGroup.

  3. Click Save & Close.

Update the First Data Workflow Component

Now that you have a Hidden component to hold your output, let's add that to your Data Workflow. This tells your Filter operator where to store the data it retrieves. You'll later reference this data in a separate Data Workflow. That second Data Workflow processes your actual operation. In this use case, that's where you'd actually create your submissions. This first Data Workflow only manages your loop.

  1. Hover over the dwfLoopGroup Data Workflow component.

    A 5-button toolbar appears above the component on hover-over.

  2. Using the toolbar, click the (Settings) button.

Configure the Second Output Operator

This Output operator shows your Filter operator where to store the data it retrieves. So, you'll select the selectedGroup Hidden component you just added.

  1. Drag and drop an Output operator onto the Data Workflow canvas.

  2. Configure the Output operator's Info window as follows:

    Setting   | Value
    Category  | Output
    Component | selectedGroup
    Action    | value

  3. Connect the upper output port (right) of the indexGrp=_arg Filter operator to the input port (left) of the selectedGroup Output operator.

Configure the Fourth Console Operator

This is the fourth Console operator in this configuration. This Console operator shows the data separated out by the Filter operator. You can use this to view what's passed to your selectedGroup Hidden component.

  1. Drag and drop another Console operator onto your Data Workflow canvas.

  2. Configure the Console operator's Info window as follows:

    Setting  | Value
    Category | Console
    Label    | Rows

  3. Connect the upper output port (right) of the indexGrp=_arg Filter operator to the input port (left) of the Rows Console operator.

  4. Click Save.

Configure the Second Data Workflow

Now that you have your loop configured, you can set up a Data Workflow to perform your operation. In this use case, this is where you would configure the creation of your submissions.

This article is focused on the looping aspect of a Data Workflow. So, we won't build the entire logic to create submissions. Instead, we'll focus on what you need in this second Data Workflow so your loop runs.

  1. Drag and drop a Data Workflow component onto your canvas, placing it below the selectedGroup Hidden component.

  2. In the Canvas Label Text and Property Name fields, enter dwfOperationGroup.

Configure the Input Operator

This Input operator brings in the data stored in your selectedGroup Hidden component. When creating your submissions, you would reference this instead of your larger data table. That's because this Hidden component holds only the data for one group of 50 entries at a time.

  1. Drag and drop an Input operator onto the Data Workflow canvas.

  2. Configure the Input operator's Info window as follows:

    Setting   | Value
    Category  | Input
    Component | selectedGroup
    Required  | Yes
    Source    | Default

Configure the Console Operator

This is the only Console operator you'll add in this Data Workflow. This Console lets you see the data pulled into your Data Workflow from the selectedGroup Hidden component.

  1. Drag and drop a Console operator onto your Data Workflow canvas.

  2. Configure the Console operator's Info window as follows:

    Setting  | Value
    Category | Console
    Label    | Create Module Submissions

  3. Connect the output port (right) of the selectedGroup Input operator to the input port (left) of the Create Module Submissions Console operator.

Configure the Output Operator

Next, you'll add an Output operator here to trigger your first Data Workflow to run again once your submissions are created.

  1. Drag and drop an Output operator onto the Data Workflow canvas.

  2. Configure the Output operator's Info window as follows:

    Setting   | Value
    Category  | Output
    Component | dwfLoopGroup
    Action    | trigger

  3. Connect the output port (right) of the selectedGroup Input operator to the input port (left) of the dwfLoopGroup Output operator.

  4. Click Save.

Update the First Data Workflow

Now that you have a Data Workflow configured to run your operation, let's configure your first Data Workflow to trigger it. In this step, we'll also add a way to stop your loop once you've created all of your submissions.

  1. Hover over the dwfLoopGroup Data Workflow component.

    A 5-button toolbar appears above the component on hover-over.

  2. Using the toolbar, click the (Settings) button.

Configure the Third Output Operator

You'll need to tell your second Data Workflow that the data it needs is ready. So, you'll use an Output operator to tell the second Data Workflow to run.

  1. Drag and drop another Output operator onto your Data Workflow canvas.

  2. Configure the Output operator's Info window as follows:

    Setting   | Value
    Category  | Output
    Component | dwfOperationGroup
    Action    | trigger

  3. Connect the upper output port (right) of the indexGrp=_arg Filter operator to the input port (left) of the dwfOperationGroup Output operator.

Configure the Fourth Output Operator

Remember that your Decision operator knows whether there are more submissions to create. If there aren't, the Decision passes data through its lower output port. So, let's add an Output operator there to stop the looping process.

  1. Drag and drop another Output operator onto the Data Workflow canvas.

  2. Configure the Output operator's Info window as follows:

    Setting   | Value
    Category  | Output
    Component | _self
    Action    | value

  3. Connect the lower output port (right) of the Decision operator to the input port (left) of the _self Output operator.

  4. Click Save.

Configure the Initializer Component

Finally, let's add an Initializer component to start the whole operation. You'll set this component to trigger your looping Data Workflow.

  1. Drag and drop an Initializer component onto your canvas, placing it below the dtFortuneFive Data Table.

  2. In the Property ID and Canvas Label Text fields, enter initLoop.

  3. In the component's configuration menu, click Actions.

  4. From the Trigger Type drop-down, select New Submission.

  5. In the Outputs table, enter the following:

    #  | Property ID  | Type    | Value
    1  | dwfLoopGroup | trigger | GO

  6. Click Save & Close.

  7. Save your module.

Now you can check your work. Preview your module in Express View and open the DevTools Console. You'll see a lot of activity from your various Consoles. That shows your loop is working properly. Let's look at some key points.

First, you'll see your Index Group Console. This is where your Data Workflow assigns the index group. So, the first 50 items show an indexGrp of 0, the next 50 show an indexGrp of 1, and so on. You'll see this in the image below:

From there, you'll see your Current Index Console. This shows which indexGrp is being processed. In the image below, you'll see the current indexGrp is 0. Next, you'll see your Index Argument and your Rows Console. These show how your Data Workflow checked to see if there was more data to process, as well as the data as it's passed through your Filter operator. Here, you'll only see 50 pieces of data at a time. That's because you're only seeing each group as it passes through this portion of your Data Workflow.

And finally, you'll see your Create Module Submissions Console. This shows that your batch of data has made it to your second Data Workflow. You'll see each of these steps repeated until all your data has been processed.
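To tie the Console output back to the configuration, the whole hand-off between the two Data Workflows can be simulated in a few lines of Python. This is a sketch under stated assumptions, not how Unqork runs the module: a while loop stands in for the two components triggering each other, and run_batch_loop, process_batch, and the sample table are hypothetical names.

```python
GROUP_SIZE = 50


def run_batch_loop(table, process_batch, size=GROUP_SIZE):
    """Simulate dwfLoopGroup and dwfOperationGroup handing off to each other."""
    state = {"indexLoopBulk": 0, "selectedGroup": []}
    grouped_index = len(table) // size                   # Size + Grouped Index
    rows = [dict(row, indexGrp=i // size) for i, row in enumerate(table)]

    while True:
        current = state["indexLoopBulk"]                 # Current Index
        state["indexLoopBulk"] = current + 1             # Iterate
        if not current <= grouped_index:                 # Decision, lower port
            break
        state["selectedGroup"] = [r for r in rows if r["indexGrp"] == current]
        process_batch(current, state["selectedGroup"])   # dwfOperationGroup
    return state


log = []
table = [{"rank": i + 1} for i in range(120)]
run_batch_loop(table, lambda index, batch: log.append((index, len(batch))))
print(log)  # [(0, 50), (1, 50), (2, 20)]
```

Each tuple in the log corresponds to one repetition of the Current Index, Rows, and Create Module Submissions Console entries.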

Lab

You can view this complete use case here: https://training.unqork.io/#/form/6081961fd455ab3509979026/edit.

Best Practices

  • Data Workflows time out after 5 minutes in all environments. Build Batch Loop Data Workflows to complete operations within 5 minutes to prevent timeouts.

  • If you don't plan to use disabled components in your application, remove them to ensure optimal performance. Remember to check all active components that connect to disabled components. Ensure active components still function properly after you remove the disabled ones.

  • Add labels to all Data Workflow operators to describe their function. These labels make it easier to know the purpose of an operator without having to open the Info window.

  • Select the Do Not Sanitize setting in all your operators to improve application performance.

  • Organize Data Workflow components based on their function in your application.

  • Use the component's Notes tab to comment on complex data processes. Add notes to explain what components are being triggered, trigger types, and the importance of each component.
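For the 5-minute timeout mentioned in the first bullet above, a rough budget check can be sketched in Python. The per-batch duration is a hypothetical figure you would need to measure yourself, and the sketch assumes the whole loop has to finish inside the timeout window.

```python
import math

TIMEOUT_SECONDS = 5 * 60  # Data Workflow timeout stated above


def fits_in_timeout(total_rows: int, batch_size: int, seconds_per_batch: float) -> bool:
    """Return True if all batches are estimated to finish before the timeout."""
    batches = math.ceil(total_rows / batch_size)
    return batches * seconds_per_batch <= TIMEOUT_SECONDS


print(fits_in_timeout(500, 50, 20.0))  # True: 10 batches * 20 s = 200 s
print(fits_in_timeout(500, 50, 40.0))  # False: 10 batches * 40 s = 400 s
```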