Web Automation Actions

ThinkAutomation includes a set of web automation actions that allow your Automations to retrieve data from websites, send data to web services, work with APIs, and download files. These actions enable you to make HTTP requests, authenticate with external systems, post data (including JSON or files), convert and parse returned content, and securely handle OAuth authorization. They are commonly used for system integrations, web data retrieval, API workflows, and automated file transfers.

HTTP Get

Reads an HTTP resource using HTTP GET, or a local file path, and assigns the returned HTML to a variable.

Use this Action to read any HTTP resource (web page) or a local HTML file. Specify the URL Or File Path To Get (including any query string).

If the web resource requires authentication then specify the Authentication method and optionally a User Name/Password or an OAuth Auth Token retrieved from a previous OAuth SignIn action. Authentication does not apply when reading from a local file path. You can also use Amazon AWS Signed Request authentication for signing AWS requests.

Optionally specify any Query String Parameters and Custom Headers to add to the request. Query string parameters can either be specified in the URL itself or in the Query String Parameters grid. If you specify query string parameters in the grid any %variable% replacements will be automatically URL encoded.
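As a rough illustration of what URL encoding of query string parameters involves, the Python sketch below appends encoded parameters to a base address. The function name build_url, the example.com address and the sample values are hypothetical, and ThinkAutomation's own encoding may differ in detail (for example in how spaces are encoded).

```python
from urllib.parse import urlencode, quote


def build_url(base_url: str, params: dict) -> str:
    """Append URL-encoded query string parameters to base_url."""
    if not params:
        return base_url
    # quote_via=quote encodes spaces as %20 rather than '+'
    query = urlencode(params, quote_via=quote)
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + query


url = build_url("https://api.example.com/search", {"q": "red & blue", "page": "2"})
print(url)
```

Characters such as spaces and ampersands inside a value are escaped, so they cannot be mistaken for separators between parameters.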

You can also optionally specify to Add To Local Cache. If this option is enabled, ThinkAutomation will maintain a local cached copy of the content. You can specify the number of Minutes the content should remain cached. If the same URL is requested again within this period it will be read from the cache.
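The general idea of a time-limited cache can be sketched in Python as follows. This is only an illustration of the concept: the class name TimedCache, the fetch callback and the injectable clock are assumptions made for the example, and how ThinkAutomation stores its cached copies is not described here.

```python
import time


class TimedCache:
    """Keeps fetched content for a fixed number of minutes per URL."""

    def __init__(self, minutes: float, clock=time.monotonic):
        self.ttl_seconds = minutes * 60
        self.clock = clock  # injectable so the behavior can be tested
        self._entries = {}  # url -> (time stored, content)

    def get(self, url: str, fetch):
        now = self.clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # still fresh: serve from the cache
        content = fetch(url)  # missing or stale: request it again
        self._entries[url] = (now, content)
        return content
```

A request for the same URL inside the caching period returns the stored content without calling fetch again.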

You can specify the Connection Timeout (in seconds). This is the number of seconds to wait for the initial connection. The Response Timeout is the number of seconds to wait for a response after the connection has been made.

The Convert Returned Content To option enables you to convert the http response content. Options are:

  • Nothing : The response is returned as is.
  • Convert HTML To Plain Text : Removes all HTML tags and returns only readable text.
  • Convert HTML To Markdown : Converts the HTML to Markdown text. Images will be removed. The tags <nav> and <footer> will also be removed before conversion. If you need finer control over HTML to Markdown conversion, leave the HTML as is and use the Text Operation action.
  • Convert HTML To XML : Converts the HTML to well-formed XML allowing easier parsing.
  • Convert HTML To XML (Drop Formatting) : Converts the HTML to XML and drops all formatting tags, styles, images, scripts etc. This allows easier parsing of specific text elements. See: HTML To XML.
  • Convert HTML To Json (Drop Formatting) : Converts HTML To Json and drops all formatting tags, styles, images, scripts etc. This allows easier parsing of specific text elements. See: HTML To Json.
  • Convert XML To Json : Converts XML to Json. Useful if the HTTP response is XML format and you need to work with Json.
  • Convert CSS To Inline Styles : Moves all CSS style sheets to inline style attributes. This enables the HTML to be sent via email as most email clients only support inline styles.
  • Convert Relative Links To Absolute Links : Converts all relative links to absolute links. For example, if requesting a URL from http://www.mysite.com: <img src="image.png"> becomes <img src="http://www.mysite.com/image.png">
  • Convert CSS To Inline And Relative Links To Absolute : Performs both above operations.
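The relative-to-absolute conversion described in the list above can be approximated in Python with urljoin. This is a minimal sketch, not the product's implementation: the function name make_links_absolute is hypothetical, and the regular expression only handles double-quoted src and href attributes.

```python
import re
from urllib.parse import urljoin

# Simplified: matches only double-quoted src="..." and href="..." attributes.
ATTR_PATTERN = re.compile(r'\b(src|href)="([^"]*)"')


def make_links_absolute(html: str, base_url: str) -> str:
    """Rewrite relative src/href values against base_url."""
    def replace(match):
        attr, link = match.group(1), match.group(2)
        # urljoin leaves links that are already absolute unchanged
        return '%s="%s"' % (attr, urljoin(base_url, link))

    return ATTR_PATTERN.sub(replace, html)


converted = make_links_absolute('<img src="image.png">', "http://www.mysite.com")
print(converted)
```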

The returned content can then be assigned to a variable. Select from the Assign Content To list. You can then make use of the returned content in subsequent Actions.

You can optionally assign any HTML <title> tag to a variable. Select from the Assign Title To list. If the returned content is not HTML or has no <title> tag the variable will be set to blank.

You can optionally assign the page's meta description (the <meta name="description" content="xxx"> tag) to a variable. Select from the Assign Description To list. If the returned content is not HTML or has no description tag then the variable will be set to blank.

Response Status

The HTTP response status code & response headers can also optionally be assigned to variables.

The status code will be the HTTP response status (200, 404 etc). A status code of <100 indicates a connection error (eg: 2 = 'DNS lookup failed', 3 = 'DNS lookup timeout', 6 = 'Connect timeout'). The error details will be added to the log.

If the Throw Error On HTTP Errors option is enabled then the Automation will log an error if the HTTP status is an error status (404, 500 etc). If this option is not enabled then an error will not be raised (the status will still be logged). This is useful if the purpose of your Automation is to check for HTTP errors. Note: Connection errors will always throw an error.
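If you pass the status code on to your own code, it could be interpreted along these lines. The sketch below is in Python and makes two assumptions: only the three connection error codes listed above are known, and any status of 400 or above is treated as an HTTP error. The function name describe_status is hypothetical.

```python
# Connection error codes quoted in the text above; other values may exist.
CONNECTION_ERRORS = {
    2: "DNS lookup failed",
    3: "DNS lookup timeout",
    6: "Connect timeout",
}


def describe_status(status_code: int) -> str:
    """Classify a status value as a connection error, an HTTP error or neither."""
    if status_code < 100:
        reason = CONNECTION_ERRORS.get(status_code, "unknown connection error")
        return f"connection error ({reason})"
    if status_code >= 400:  # assumption: 4xx and 5xx count as errors
        return "HTTP error"
    return "no error"


for code in (200, 404, 2):
    print(code, describe_status(code))
```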


HTTP Post

Post data to a web resource using HTTP POST/PUT/PATCH or DELETE.

Specify the Post To URL of the http resource to post to.

Note: If you use %variable% replacements inside the URL - the variable values must be URL Encoded.

If the web resource requires authentication then specify the Authentication method and optionally a User Name/Password or an OAuth Auth Token retrieved from a previous OAuth SignIn action.

Select the Post Type you wish to perform. This can be:

  • Regular POST - for posting form field values.
  • Json POST - for posting Json data.
  • Json PUT - for posting Json data using an HTTP PUT instead of POST.
  • Json PATCH - for posting Json patch data.
  • Custom - for performing custom HTTP Posts using data you specify.
  • Stream Binary File - for binary POST/PUT of files.
  • HTTP DELETE - for sending delete requests.

For regular Posts you can specify any number of Names and Values. The Values can be fixed or %variable% replacements or combinations of both. You can also specify a Type of Text (the default) or File (see Uploading Files below).

For JSON POST or PUT you must specify the JSON Text.

For Custom Posts you must specify the Custom data. This is the body of the request, not the headers.

For all post types you can specify any Query String Parameters. These can either be specified in the URL itself or in the Query String Parameters grid. If you specify query string parameters in the grid any %variable% replacements will be automatically URL encoded.

For all post types you can specify any Custom Headers. Multiple header names & value pairs can be specified. Any existing HTTP headers will be replaced if you use a standard header.

You can specify the Connection Timeout (in seconds). This is the number of seconds to wait for the initial connection. The Response Timeout is the number of seconds to wait for a response after the post has been made.

The HTTP response status code, response headers & response body can also optionally be assigned to variables.

Uploading Files

You can upload files in two ways. Using the Regular Post option, or the Stream Binary File option. The choice will depend on the end point you are posting to.

Using the Regular Post option you can specify the Type as File against a specific form Post Value. The Value should then be a local file path (or a %variable% pointing to a file path). If the Name is not specified then the file name will be used. Adding files to a regular post will result in a POST content-type of 'multipart/form-data'.

Using the Stream Binary File option you specify a File Path. The file content will be posted and the content-type will be set based on the file extension.
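To see how a content-type can be derived from a file extension, the Python sketch below uses the standard mimetypes module. The function name guess_content_type and the fallback to application/octet-stream are assumptions for illustration. The mapping ThinkAutomation uses may differ.

```python
import mimetypes


def guess_content_type(file_name: str) -> str:
    """Return a content-type for file_name based on its extension."""
    content_type, _encoding = mimetypes.guess_type(file_name)
    # Fall back to a generic binary type when the extension is unknown
    return content_type or "application/octet-stream"


print(guess_content_type("quote1.pdf"))
print(guess_content_type("blob.nosuchext"))
```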

Response Status

The HTTP response status code & response headers can also optionally be assigned to variables.

The status code will be the HTTP response status (200, 404 etc). A status code of <100 indicates a connection error (eg: 2 = 'DNS lookup failed', 3 = 'DNS lookup timeout', 6 = 'Connect timeout'). The error details will be added to the log.

If the Throw Error On HTTP Errors option is enabled then the Automation will log an error if the HTTP status is an error status (404, 500 etc). If this option is not enabled then an error will not be raised (the status will still be logged). This is useful if the purpose of your Automation is to check for HTTP errors. Note: Connection errors will always throw an error.


Download File

Download a file via HTTP from any URL.

Enter the URL To Download. This can be any HTTP or HTTPS URL that points to a downloadable file.

If the web resource requires authentication then specify the Authentication method and optionally a User Name/Password or an OAuth Auth Token retrieved from a previous OAuth SignIn action.

Optionally specify any Query String Parameters and Custom Headers to add to the request.

You can specify the Connection Timeout (in seconds). This is the number of seconds to wait for the initial connection. The Response Timeout is the number of seconds to wait for the file to be downloaded after the connection has been made.

Enter or select the Save To Folder. This is the folder on your file system that you want the downloaded file to be saved in.

Optionally enter a Use Filename. If no filename is specified then the filename will be extracted from the URL.

Enable the Make Filename Unique option to append a unique number to the filename if a file already exists in the selected folder with the same name.
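The idea behind a unique filename can be sketched in Python as follows. The exact naming scheme ThinkAutomation uses is not stated above, so the "_1", "_2" suffix and the function name unique_path are assumptions for illustration only.

```python
import tempfile
from pathlib import Path


def unique_path(folder: str, filename: str) -> Path:
    """Return a path in folder that does not clash with an existing file."""
    candidate = Path(folder) / filename
    stem, suffix = candidate.stem, candidate.suffix
    counter = 1
    while candidate.exists():
        # Assumed scheme: report.pdf, report_1.pdf, report_2.pdf, ...
        candidate = Path(folder) / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate


with tempfile.TemporaryDirectory() as folder:
    first = unique_path(folder, "report.pdf")
    first.touch()  # simulate a completed download
    second = unique_path(folder, "report.pdf")
    print(first.name, second.name)
```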

Once the file is downloaded the resulting full path/filename will be assigned to the variable selected from the Assign Returned File Path To list.

You can then use this variable to perform other actions such as adding the file as an attachment to outgoing emails.

The Automation will raise an error if the download fails for any reason.


OAuth SignIn

Sign in to an OAuth endpoint to obtain an authorization token.

This action can be used to obtain an authorization token from an OAuth enabled web API. The token can then be used on subsequent HTTP GET or POST actions.

Select the Type. This can be one of the ThinkAutomation connected app types, or Generic.

Enter the Name. This is simply an identifier to show in your Automation actions list. Any text can be used.

Generic OAuth

For generic OAuth you must then supply the Authorization Endpoint, Token Endpoint, Client ID, Client Secret and Scope. These settings depend on the API you want to use. Consult the API documentation for the service you want to use to obtain the correct values.

The Additional Options tab provides some additional options that your OAuth provider may require.

Some OAuth providers can provide additional parameters in the redirect request that is sent back to ThinkAutomation. One such case is for QuickBooks, which returns a realmId parameter. For these cases you can specify parameter names in the Extract Custom Redirect URL Parameters grid. In the Assign To column specify the ThinkAutomation variable to receive each parameter value. You can then use this value on subsequent actions.

Click the Sign In button to begin the sign in process. A browser session will be started to complete the sign in.

Select a variable to receive the authorization token from the Assign Authorization To list.

On any subsequent HTTP GET, HTTP POST or Read JSON Document actions you can then set the Authentication method to OAuth and then Auth Token to the variable value selected above.

ThinkAutomation will automatically refresh the authorization token when it expires.

As with all other Action settings, the Client ID and Client Secret are stored securely in the ThinkAutomation metadata database.


Cloud Storage

Download, Upload or Delete files using various cloud storage providers.

Select the Provider:

  • Amazon S3
  • Google Drive
  • Google Cloud Storage
  • Microsoft OneDrive
  • IBM Cloud Object Storage
  • Wasabi
  • DigitalOcean Spaces
  • Linode
  • Azure Blob
  • Azure File

Depending on the provider, click the Sign In button or enter your Access Key and Secret Key (Amazon S3, Google Cloud Storage, IBM, Wasabi, DigitalOcean Spaces & Linode will also need the Region) and click the Connect button to connect.

You can create Global or Solution constants for your Access Key/Secret Keys so they can be used on multiple actions.

Select Upload, Download or Delete from the Operation selector.

Downloading

In the Remote Files navigator you can navigate folders and files. Double-click a file to add it to the Download Files entry. Multiple files can be downloaded within the same action. Separate each file with a comma.

You can also specify the Downloaded Files directly by entering the paths (or use %variable% replacements). Each file must specify the full path (beginning with /). For S3 compatible providers (Amazon S3, Wasabi, IBM, DigitalOcean, Linode) the Bucket Name must be the first part of the path (Eg: /bucketname/docs/quote1.pdf).

Wildcards

Download paths can contain wildcards (Eg: /myfiles/docs/*.pdf). Each file in the folder matching the mask will be downloaded. You can specify multiple download masks separated by commas (Eg: /myfiles/docs/*.pdf, /myfiles/docs/*.docx).
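The path and mask rules above can be illustrated with a short Python sketch. It assumes that a mask only matches files directly inside the named folder and that matching is case sensitive. The function names split_bucket_path and select_files are hypothetical.

```python
import posixpath
from fnmatch import fnmatchcase


def split_bucket_path(remote_path: str):
    """Split '/bucketname/docs/quote1.pdf' into ('bucketname', 'docs/quote1.pdf')."""
    if not remote_path.startswith("/"):
        raise ValueError("remote path must begin with /")
    bucket, _, key = remote_path[1:].partition("/")
    return bucket, key


def select_files(remote_files, masks: str):
    """Return the files that match any of the comma-separated masks."""
    patterns = [m.strip() for m in masks.split(",") if m.strip()]
    selected = []
    for path in remote_files:
        folder, name = posixpath.split(path)
        for pattern in patterns:
            mask_folder, mask_name = posixpath.split(pattern)
            # Same folder, and the file name matches the wildcard part
            if folder == mask_folder and fnmatchcase(name, mask_name):
                selected.append(path)
                break
    return selected


files = [
    "/myfiles/docs/a.pdf",
    "/myfiles/docs/b.docx",
    "/myfiles/docs/c.txt",
    "/myfiles/docs/old/d.pdf",
]
print(select_files(files, "/myfiles/docs/*.pdf, /myfiles/docs/*.docx"))
print(split_bucket_path("/bucketname/docs/quote1.pdf"))
```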

Specify the Save To folder where you want the downloaded files saved to.

The Assign To variable will receive the local path/filename where the file(s) have been downloaded to. Multiple files will be separated by commas.

Uploading

In the Remote Files navigator you can navigate folders and files. Select the folder where you want to upload files to.

You can optionally specify an Append To Remote Path folder. If a path is specified here then files will be uploaded to Remote Path + Append To Remote Path. For example: If you selected a remote path of '/Documents/' and the Append To Remote Path was set to 'Attachments\PDF' then files will be uploaded to '/Documents/Attachments/PDF'. You can use %variables% in the Append To Remote Path entry.
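The way the two path parts combine in the example above can be expressed in Python. This is a sketch under the assumption that backslashes are treated as path separators and that duplicate separators are collapsed. The function name join_remote_path is hypothetical.

```python
def join_remote_path(remote_path: str, append_path: str) -> str:
    """Combine a remote path and an appended path into one '/'-separated path."""
    parts = []
    for piece in (remote_path, append_path):
        # Treat backslashes as separators and drop empty segments
        parts += [p for p in piece.replace("\\", "/").split("/") if p]
    return "/" + "/".join(parts)


print(join_remote_path("/Documents/", "Attachments\\PDF"))
```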

In the Upload Files entry enter or select the local files to upload (use %variable% replacements if required). Multiple files should be separated by commas.

You also have the option of uploading Attachments. Select the Include Incoming Attachments option and specify the Mask.

The Assign To variable will receive the remote path names for the uploaded file(s). Multiple files will be separated by commas.

Deleting

In the Remote Files navigator you can navigate folders and files. Double-click a file to add it to the Delete Files entry. Multiple files can be deleted within the same action. Separate each file with a comma.

You can also specify the Delete Files directly by entering the paths (or use %variable% replacements). Each file must specify the full path (beginning with /). For S3 compatible providers (Amazon S3, Wasabi, IBM, DigitalOcean, Linode) the Bucket Name must be the first part of the path (Eg: /bucketname/docs/quote1.pdf).

Wildcards

Delete paths can contain wildcards (Eg: /myfiles/docs/*.pdf). Each file in the folder matching the mask will be deleted. You can specify multiple delete masks separated by commas (Eg: /myfiles/docs/*.pdf, /myfiles/docs/*.docx).

The Assign To variable will receive the remote path/filename for each deleted file. Multiple files will be separated by commas.


Wait For Webhook

Pauses execution of the Automation until a webhook callback is made from a 3rd party web service.

This action can be used to integrate with external web APIs. Each processed message has a unique callback URL. This is available via the %Msg_WebCallbackUrl% built-in variable.

If you pass this variable via any HTTP Get or HTTP Post actions to a 3rd party API that offers a webhook response, then you can use this action to pause execution of the current message until the webhook response is received. You can then assign any of the parameters passed back with the webhook to variables in your Automation.

Enter the Name. The name shows in the actions list - but is not used otherwise.

Specify the Maximum Wait in minutes. If a response is not received before this time the Automation will continue.

You can send a response to the webhook call. Specify the Response Type and Response Data. This is optional and will depend on the 3rd party API.

The Request Parameter Assignments grid can be used to map parameters sent by the webhook to fields & variables in your Automation.

To use this action you would first use the HTTP Get or HTTP Post actions to make a request to the 3rd party web API. As part of the request you would include the %Msg_WebCallbackUrl% variable. This will tell the API the URL to use to make the webhook callback. Consult the API documentation for the parameter name to use.

After the HTTP Get/Post action you would add a Wait For Webhook action to pause execution until the webhook callback is received.
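Conceptually, the Request Parameter Assignments grid maps parameter names sent with the callback to variables. The Python sketch below shows that mapping for parameters sent in a query string. The callback address, parameter names and variable names are invented for the example, and setting a missing parameter to blank is an assumption.

```python
from urllib.parse import urlsplit, parse_qs


def map_webhook_parameters(callback_request_url: str, assignments: dict) -> dict:
    """Map webhook query parameters to variable names.

    assignments maps a parameter name to the variable that should receive it.
    """
    query = parse_qs(urlsplit(callback_request_url).query)
    return {
        variable: query.get(parameter, [""])[0]  # blank if not sent (assumption)
        for parameter, variable in assignments.items()
    }


variables = map_webhook_parameters(
    "https://example.com/callback/abc?status=paid&ref=INV-42",
    {"status": "PaymentStatus", "ref": "Reference", "missing": "Other"},
)
print(variables)
```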


Call A Soap Web Service

Executes a SOAP or .NET Web Service and returns the results to ThinkAutomation variables.

Method URI (Namespace)

This is the namespace of the web service. You can find the namespace by viewing the Web Service Definition (WSDL). For .NET Web Services view the .asmx page and click the Service Description link. The namespace will be shown in the targetNamespace element. .NET Web Services have a default namespace of http://tempuri.org. You should change this before making your web services public.

Method Name

This is the name of the method you want to call. You can view a list of available methods provided by the web service by viewing the asmx file.

Action URI

This defaults to the Method URI/Method name.

URL (asmx)

Enter the full public URL to the asmx page. You can use a secure address (https://) if required.

If your web services require a login before being accessed enter the User Name & Password and the Authentication type.

You must specify a value for all of the Parameters that the web service method expects. Enter the parameter Name and Value. For the value you can use %variable% replacements.
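To show how the Method URI, Method Name and Parameters fit together, the Python sketch below builds a standard SOAP 1.1 request. It is an illustration of the protocol, not of ThinkAutomation's internals. The function name build_soap_request, the GetQuote method and the symbol parameter are hypothetical.

```python
from xml.sax.saxutils import escape


def build_soap_request(method_uri: str, method_name: str, parameters: dict):
    """Return (headers, envelope) for a SOAP 1.1 call."""
    # The action defaults to Method URI + "/" + Method Name
    action = method_uri.rstrip("/") + "/" + method_name
    params_xml = "".join(
        f"<{name}>{escape(str(value))}</{name}>" for name, value in parameters.items()
    )
    envelope = (
        '<?xml version="1.0" encoding="utf-8"?>'
        '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
        "<soap:Body>"
        f'<{method_name} xmlns="{method_uri}">{params_xml}</{method_name}>'
        "</soap:Body>"
        "</soap:Envelope>"
    )
    headers = {
        "Content-Type": "text/xml; charset=utf-8",
        "SOAPAction": f'"{action}"',
    }
    return headers, envelope


headers, envelope = build_soap_request("http://tempuri.org", "GetQuote", {"symbol": "A&B"})
print(headers["SOAPAction"])
print(envelope)
```

Parameter values are XML-escaped so that characters such as & do not break the envelope.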

Assign Returned Value To

You can assign the returned value to a ThinkAutomation variable. Select the variable from the drop down list. If the web service returns a single value then this will be assigned to the variable. If the web service returns a complex data type - such as a DataSet - then the entire XML response will be assigned.

Output Parameters

If the web service returns output parameters then you can assign individual parameter values to ThinkAutomation fields or variables. If a parameter name specified is not returned as an output parameter then ThinkAutomation will scan the returned XML, look for a tag with the specified parameter name and assign the tag value to the ThinkAutomation variable.


Check SSL Certificate

Checks the validity and expiry date for the SSL certificate used on any host/URL.

This action can be used to monitor the SSL certificates used on your web sites. For example, your Automation could send a notification email or SMS when a certificate is about to expire.

Specify the Host Name. This would normally be the root web address, eg: thinkautomation.com. If you specify a full URL then the host name will be extracted. Use %variable% replacements if required. Specify the Port. This would normally be 443.

You can specify the Connection Timeout (in seconds). This is the number of seconds to wait for the initial connection.

In the Expiry Days entry, specify the number of days before the certificate expiry date where the status should be set to 'expiring'.

Select the variable to receive the certificate status from the Assign Certificate Status to list. The status will be set to valid, invalid, expired, expiring, none or an error message:

  • valid - the certificate is valid and not about to expire within the Expiry Days.
  • invalid - the certificate is invalid.
  • expired - the certificate has expired.
  • expiring - the certificate is about to expire (within the Expiry Days).
  • none - no certificates.
  • error: {message} - if the host cannot be reached.

If the host cannot be reached, then the status will be set to 'error: {error description}' (eg: 'error: DNS lookup failed').
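The relationship between the expiry date, the Expiry Days setting and the valid, expiring and expired statuses can be sketched in Python. The boundary handling (whether a certificate expiring in exactly Expiry Days counts as expiring) is an assumption, and the function name certificate_status is hypothetical. The invalid, none and error statuses are outside this sketch.

```python
from datetime import datetime, timedelta


def certificate_status(valid_to: datetime, now: datetime, expiry_days: int) -> str:
    """Return 'expired', 'expiring' or 'valid' for a certificate expiry date."""
    if valid_to <= now:
        return "expired"
    if valid_to - now <= timedelta(days=expiry_days):
        return "expiring"  # inside the warning window
    return "valid"


valid_to = datetime.fromisoformat("2024-11-01T23:59:59")
print(certificate_status(valid_to, datetime(2024, 10, 20), 14))
print(certificate_status(valid_to, datetime(2024, 10, 20), 7))
print(certificate_status(valid_to, datetime(2024, 11, 2), 14))
```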

Select the variable to receive the expiry date from the Assign Expiry Date To list.

The complete certificate chain can also be assigned to a variable in Json format:

[
  {
    "subject": "CN=mydomain.com",
    "issuer": "C=US, O=DigiCert Inc, OU=www.digicert.com, CN=RapidSSL TLS RSA CA G1",
    "validFrom": "2023-11-02T00:00:00",
    "validTo": "2024-11-01T23:59:59",
    "usage": "serverAuth,clientAuth",
    "root": false
  },
  {
    "subject": "C=US, O=DigiCert Inc, OU=www.digicert.com, CN=RapidSSL TLS RSA CA G1",
    "issuer": "C=US, O=DigiCert Inc, OU=www.digicert.com, CN=DigiCert Global Root G2",
    "validFrom": "2017-11-02T12:24:33",
    "validTo": "2027-11-02T12:24:33",
    "usage": "serverAuth,clientAuth",
    "root": false
  },
  {
    "subject": "C=US, O=DigiCert Inc, OU=www.digicert.com, CN=DigiCert Global Root G2",
    "issuer": "C=US, O=DigiCert Inc, OU=www.digicert.com, CN=DigiCert Global Root G2",
    "validFrom": "2013-08-01T12:00:00",
    "validTo": "2038-01-15T12:00:00",
    "usage": "",
    "root": true
  }
]

Select the variable to receive the Json from the Assign Json to list. This is optional, but allows you to examine the certificate chain further in your Automation.


FTP Upload

Uploads files or attachments to an FTP or SFTP server.

First select FTP or SFTP.

Enter your FTP/SFTP Host, User Name & Password.

Click the Connect button to connect to your FTP/SFTP Server.

For FTP servers: You may need to uncheck the Passive Mode option if your FTP server doesn't support passive mode. You can select the Secure Mode of 'Auth TLS', 'SSL' & 'None' if your FTP server requires a secure connection.

You can then select a Remote Path to upload your files to.

Instead of selecting a remote path you can specify a path in the Force Remote Path To entry. You can manually enter a path or use a %variable% (or combination). Enable the Create If Missing option if the path should be created on the remote server if it does not exist.

Syncing A Local Folder

To sync a local folder, select the folder in the Sync From Local Folder entry. Enable Include Sub Folders to include all sub-folders. Enter any Sync Masks. This is a comma separated list of file masks to include in the Sync Folder upload. For example: "*.html,*.css" would only upload files with html or css extensions. Leave blank or set to *.* for all file types. Enable the Skip Upload If Existing File Is The Same Size & Date option to only upload new/updated files during the folder sync.

Uploading Individual Files

The Upload Local Files can be used to select individual local files to upload. Multiple files can be added, separated by commas. You can use %variable% replacements for filenames if required.

Uploading Attachments

To upload attachments enable the Include Incoming Attachments option and specify the Attachment Mask.

In all upload cases the Skip Upload If Existing File Is The Same Size & Date option can be enabled to prevent files that already exist in the Remote Path from being uploaded again if the file size and date are the same.

Enable the Show Progress In Log option to add log entries to show the progress of the upload.

The number of files uploaded can be returned to a variable. Select the variable from the Assign Results To list.


FTP Download

Download files from an FTP or SFTP server.

First select FTP or SFTP.

Enter your FTP Host, User Name & Password.

Click the Connect button to connect to your FTP Server. You may need to uncheck the Passive Mode option if your FTP server doesn't support passive mode.

You can select the Secure Mode of 'Auth TLS', 'SSL' & 'None' if your FTP server requires a secure connection.

You can then select a Remote Path to download files from.

Download Or Sync

From the Download Or Sync option select:

Download Files

Select or enter the Remote File Or Mask to download. You can specify a remote file or a mask (eg: *.pdf)

Select the Save To folder to save the downloaded files to.

If Delete Downloaded Files After Message Is Processed is enabled then ThinkAutomation will remove the file when the Automation completes for the current message. This is useful if you wish to use the file in the Automation (for example, to send the document as an attachment with the Send Email action), but do not need to keep a local copy afterwards.

The Assign Downloaded Files To variable will receive a comma separated list of local file paths.

Sync Local Folder

This option allows you to synchronize a remote path to a local folder (and optionally all sub folders). All new and changed files in the specified remote path will be downloaded to the specified local folder.

To sync a local folder, select the local folder in the Local Folder entry. Enable Include Sub Folders to include all sub-folders. Enter any Sync Masks. This is a comma separated list of file masks to include in the Sync Folder download. For example: "*.html,*.css" would only sync files with html or css extensions. Leave blank or set to *.* for all file types.

When the Automation executes, the Remote Path will be scanned and compared to the Local Folder. Any missing files, newer files or files with size differences will be downloaded. Any new remote sub folders will be created in the local folder (if the Include Sub Folders option is enabled).
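The comparison described above (missing files, newer files, or files with size differences) can be sketched as a simple decision function in Python. The FileInfo type and the function name needs_download are hypothetical, and the precision used when comparing dates is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInfo:
    size: int         # size in bytes
    modified: float   # last modified time as a Unix timestamp


def needs_download(remote: FileInfo, local: Optional[FileInfo]) -> bool:
    """Decide whether a remote file should be downloaded during a sync."""
    if local is None:
        return True   # missing locally
    if remote.modified > local.modified:
        return True   # remote copy is newer
    return remote.size != local.size  # same age but different size


print(needs_download(FileInfo(100, 2000.0), None))
print(needs_download(FileInfo(100, 2000.0), FileInfo(100, 2000.0)))
```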

The Assign Downloaded Files To variable will receive a comma separated list of local file paths for all new/changed files downloaded for the current sync. If no new or changed files are downloaded then the assign to variable will be blank.


Get Browser Info

Extracts browser name, version & operating system information from a User Agent string.

Specify a user agent or %variable% in the Get Browser Info For User Agent entry.

You can then select variables to receive:

  • Browser Name
  • Version
  • Operating System
  • Is Mobile (true or false)
  • Is Spider (true or false)

This Action is useful when receiving messages via the API - from web forms, or web requests. It allows you to obtain browser information for the user making the request.

For API received messages the originating user-agent is added to the Message Headers. You can extract this using the Set Variable action with the Extract Header Value operation and the Value set to User-Agent.


Wrap HTML

Wraps text content inside HTML tags to create a viewable HTML page. This action is useful if you have content that you have created earlier in your Automation that you then want to wrap inside a HTML page. The resulting HTML can then be used for outgoing emails, API responses or for any other purpose.

Specify the Content To Wrap. This can contain %variable% replacements. You can use Markdown to easily render tables etc. The Markdown will be converted to HTML before being wrapped. You can also specify HTML directly (HTML should not include the body tags). You should not use HTML and Markdown combined.

You can also specify a Title, Header and Footer. These are optional. The Header and Footer can contain Markdown or HTML.

Styling

On the Style tab you can specify various styling options:

  • None - no styling will be added.
  • Default - a simple stylesheet will be added (you can edit the default stylesheet in the Server Settings).
  • Bootstrap - the page will use Bootstrap styles.

You can also specify the page Background and Foreground colors and optionally a Header Image URL.

You can optionally specify your own Style Sheet Path Or URL. If a file path is used then the file will be read and added to the page. If a URL is specified then a link to it will be added.

If the Convert CSS To Inline option is enabled then any stylesheets will be converted to inline style attributes and the stylesheets removed. This enables the HTML to be sent via email as most email clients only support inline styles (this option cannot be used if you select the Bootstrap style option).

Meta Tags

Click the Meta Tags tab to add any Meta Tags (description, keywords, author etc). Tag values can use %variable% replacements.

Click the Preview button to preview the results.

Select the variable to receive the new HTML from the Assign To list.

Example:

<!doctype html>
<html>
  <head>
    <meta charset="utf-8">
    <meta name="viewport" value="width=device-width, initial-scale=1.0">
    <meta name="description" value="Test Page">
    <meta name="author" value="Parker Software">
    <title>Title</title>
  </head>
  <body style="color:#262626">
    <div class="container">
      <div>
        <h2>Title</h2>
        <div class="header">
          <p>Header</p>
        </div>
        <div id="ta_mainDiv">
          Hello World
        </div>
      </div>
      <div class="footer">Footer</div>
    </div>
  </body>
</html>

The above example uses no styling. To create your own stylesheet, define rules for the h2 element and for the header & footer classes. The element with the id ta_mainDiv contains the wrapped content.


Web Spider

Crawls (spiders) a URL and returns a list of the URLs found. The list can be returned as text with one URL per line, or as CSV or Json containing each URL, Title, Description, Keywords and Last Modified Date.

The Web Spider Action only crawls the specified URL. It does not crawl outbound links.

Specify the URL to spider.

Specify any Avoid Patterns (separated by semicolons). These are wildcard patterns that prevent matching URLs from being spidered. For example, if "*/assets/*" is added, then any URL containing "/assets/" is not spidered. The "*" character matches zero or more of any character.
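The wildcard rule above (where "*" matches zero or more of any character) can be reproduced in Python by translating each pattern to a regular expression. This is a sketch only. The function names are hypothetical, and case-sensitive matching against the whole URL is an assumption.

```python
import re


def compile_avoid_patterns(patterns: str):
    """Compile a semicolon-separated list of wildcard patterns."""
    compiled = []
    for pattern in patterns.split(";"):
        pattern = pattern.strip()
        if pattern:
            # Escape everything literally, then let '*' match any run of characters
            regex = ".*".join(re.escape(part) for part in pattern.split("*"))
            compiled.append(re.compile(regex, re.DOTALL))
    return compiled


def is_avoided(url: str, compiled) -> bool:
    """Return True if the whole URL matches any avoid pattern."""
    return any(regex.fullmatch(url) for regex in compiled)


avoid = compile_avoid_patterns("*/assets/*; *.pdf")
print(is_avoided("https://www.testsite.com/assets/logo.png", avoid))
print(is_avoided("https://www.testsite.com/page2.htm", avoid))
```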

Optionally specify a date (or %variable% containing a date) in the Only Modified Since entry. If a date is specified then only URLs with a Last-Modified header date later than this date will be returned.

Set the Maximum URLs that you want to spider for the site.

Enable the Chop Querystrings option to remove the ?query portion from any URLs. This can be done to avoid auto-generated content.
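Removing the ?query portion from a URL can be illustrated in Python as follows. The function name chop_querystring and the sample URL are hypothetical. This sketch keeps any #fragment, which is an assumption about behavior not described above.

```python
from urllib.parse import urlsplit, urlunsplit


def chop_querystring(url: str) -> str:
    """Return url without its ?query portion."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", parts.fragment))


print(chop_querystring("https://www.testsite.com/page2.htm?session=123&sort=asc"))
```

URLs that differ only in their query string collapse to the same address, which is how repeated auto-generated pages are avoided.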

The Web Spider Action will check any robots.txt file. It will not download pages denied by robots.txt.

The Return As option can be set to:

URLs one per line

For example:

https://www.testsite.com/
https://www.testsite.com/page2.htm

CSV Containing URL, Title, Description, Keywords, Modified Date

For example:

URL,Title,Description,Keywords,LastModDate
https://www.testsite.com/,Title 1,Test Description 1,"keyword1,keyword2",2025-02-26 15:12:12
https://www.testsite.com/page2.htm,Title 2,Test Description 2,"keyword1,keyword2",2025-02-26 15:12:12

JSON Array Containing URL, Title, Description, Keywords, Modified Date

For example:

[
  {
    "URL": "https://www.testsite.com/",
    "Title": "Title 1",
    "Description": "Test Description 1",
    "Keywords": "keyword1,keyword2",
    "LastModDate": "2025-02-26T15:12:12" 
  },
  {
    "URL": "https://www.testsite.com/page2",
    "Title": "Title 2",
    "Description": "Test Description 2",
    "Keywords": "keyword1,keyword2",
    "LastModDate": "2025-02-26T15:12:12"
  }
]

Select the variable to receive the results from the Assign To list.
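If the CSV form of the results is passed to your own code, note that the Keywords column is quoted because it contains commas. The Python sketch below reads data shaped like the CSV example using the standard csv module. The sample data and the function name parse_spider_csv are illustrative only.

```python
import csv
import io

SAMPLE_CSV = (
    "URL,Title,Description,Keywords,LastModDate\n"
    'https://www.testsite.com/,Title 1,Test Description 1,"keyword1,keyword2",2025-02-26 15:12:12\n'
    'https://www.testsite.com/page2.htm,Title 2,Test Description 2,"keyword1,keyword2",2025-02-26 15:12:12\n'
)


def parse_spider_csv(csv_text: str):
    """Return one dictionary per spidered URL, keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


pages = parse_spider_csv(SAMPLE_CSV)
for page in pages:
    # The quoted Keywords field stays intact and can be split afterwards
    print(page["URL"], page["Keywords"].split(","))
```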

You can also assign a list of outbound links found across all URLs spidered. Select the variable to receive outbound links from the Assign Outbound Links to list. Outbound links are returned as a text string with one link per line.

This Action is useful when you need to load content for an entire site, for example when loading a site into a Knowledge Store or Vector Database for use with AI. You could first spider a site and then use the For..Each.. Line In action to loop through the URLs, adding each page's content to a Knowledge Store/Vector Database Collection and using the page title as the article title.

For example:

Spider Site For AI Use Automation
// Add site to vector database
SpiderURL = "https://www.optimagpt.com"
Markdown =
Title =
Content =
PageURL =
URLList =
LastDate =

// Get the last date we spidered
LastDate = Embedded Value Store Get In "SpiderDates" For Key %SpiderURL%

// Get a list of page changes since the last run
URLList = Web Spider URL %SpiderURL% Only Modified Since %LastDate%

For Each Line In %URLList% [Assign To: PageURL]
  Content = HTTP Get From %PageURL% [Assign Title To: Title]
  If %Content% Is Not Blank Then
    // Convert the page content to Markdown
    Markdown = Text Operation Convert: HTML To Markdown %Content% Drop Tags "header,nav,footer,form" (Suppress Links) (Suppress Images)
    // Add/update the Markdown in the vector database.
    Embedded Vector Database Update In "OptimaGPT" Key %Title% = %Markdown%
  End If
Next Loop

Embedded Value Store Set In "SpiderDates" Key %SpiderURL% = %DateTimeUtc%

Return %URLList%

Note: This action may take several minutes for large sites.