Workflows · From Tasks · from_web_scraper

Purpose

Launches a headless browser and scrapes data from a website.

Currently Workflows supports the following extractions:

cookies

Task Type: `cookies`

Store the cookies that were collected at the end of the (headless) browser session.

Properties

property	type	required	description
`type`	enumerator (see description)	yes	Must be `cookies`.
`actions`	array	yes	Array of actions to perform. Supports both `url` and `click`. See below for more details per action type.

Action properties

Action type = url

property	type	required	description
`type`	enumerator (url)	yes	Must be `url`.
`value`	string	yes	The url to navigate to.

Action type = click

property	type	required	description
`type`	enumerator (click)	yes	Must be `click`.
`by`	enumerator (class_name, css_selector, id, link_text, name, partial_link_text, tag_name, xpath)	yes	What method should be used to find the element to click on.
`value`	string	yes	Element specific value that should be used to find the element to click on.
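
As an illustration of the `by` options in the table above, the following sketch shows two click actions that locate their target by CSS selector and by XPath instead of by id. The selector values are hypothetical and are not taken from any real page; only the `type`, `by` and `value` property names and the `by` options come from the tables above.

```yaml
actions:
  # Hypothetical CSS selector for a cookie consent button.
  - type: click
    by: css_selector
    value: button.cookie-accept
  # Hypothetical XPath for a link with the text "Contact".
  # Quoted so that YAML treats the whole expression as a single string.
  - type: click
    by: xpath
    value: "//a[text()='Contact']"
```

Quoting the `value` is a cautious choice whenever the expression contains brackets, quotes or other characters that YAML could interpret itself.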

Example usage

extract:
    type: cookies
    actions:
      - type: url
        value: https://www.onesecondbefore.com
      - type: click
        by: id
        value: btn-f82hf-allow
      - type: url
        value: https://www.onesecondbefore.com
      - type: url
        value: https://www.onesecondbefore.com/contact/
      - type: url
        value: https://www.onesecondbefore.com/resources/

Details

item	description
`Pre-formatted schema`	Yes. This from task comes with a pre-formatted schema. Schema depends on the type.
`Used Technology`	The from task uses Selenium and a headless Chrome browser to scrape the data.