Workflows · From Tasks · from_web_scraper

Purpose

Launches a headless browser and scrapes data from a website.

Currently, Workflows supports the following extraction types:

  1. cookies

Task Type: cookies

Stores the cookies collected by the headless browser at the end of the browser session.

Properties

property | type | required | description
--- | --- | --- | ---
type | enum (see description) | yes | Must be cookies.
actions | array | yes | Array of actions to perform. Supported action types are url and click; the properties of each action type are described below.

Action Properties

Action type = url

property | type | required | description
--- | --- | --- | ---
type | enum (url) | yes | Must be url.
value | string | yes | The URL to navigate to.

Action type = click

property | type | required | description
--- | --- | --- | ---
type | enum (click) | yes | Must be click.
by | enum (class_name, css_selector, id, link_text, name, partial_link_text, tag_name, xpath) | yes | The method used to locate the element to click.
value | string | yes | The value matched by the chosen by method (for example, the element's id when by is id) to locate the element to click.
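The property tables above can be read as a small schema. The following is a minimal sketch, not the Workflows implementation, that checks an already-parsed extract configuration (a plain Python dict, as a YAML loader would produce) against the documented rules. The function name validate_extract and the error messages are hypothetical.

```python
from typing import Any

# Action types and `by` values as listed in the property tables above.
ACTION_TYPES = {"url", "click"}
BY_VALUES = {
    "class_name", "css_selector", "id", "link_text",
    "name", "partial_link_text", "tag_name", "xpath",
}


def validate_extract(config: dict[str, Any]) -> list[str]:
    """Return the problems found in an extract block (empty list if valid).

    Hypothetical helper: it only encodes the documented property rules.
    """
    errors: list[str] = []
    if config.get("type") != "cookies":
        errors.append("type must be 'cookies'")

    actions = config.get("actions")
    if not isinstance(actions, list):
        return errors + ["actions must be an array"]

    for i, action in enumerate(actions):
        if not isinstance(action, dict):
            errors.append(f"actions[{i}] must be a mapping")
            continue
        kind = action.get("type")
        if kind not in ACTION_TYPES:
            errors.append(f"actions[{i}].type must be one of {sorted(ACTION_TYPES)}")
            continue
        # Both action types require a string `value`.
        if not isinstance(action.get("value"), str):
            errors.append(f"actions[{i}].value must be a string")
        # Only click actions take a `by` locator method.
        if kind == "click" and action.get("by") not in BY_VALUES:
            errors.append(f"actions[{i}].by must be one of {sorted(BY_VALUES)}")
    return errors


example = {
    "type": "cookies",
    "actions": [
        {"type": "url", "value": "https://www.onesecondbefore.com"},
        {"type": "click", "by": "id", "value": "btn-f82hf-allow"},
    ],
}
print(validate_extract(example))  # prints []
```

Returning a list of messages rather than raising on the first problem lets a user fix every mistake in one pass.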

Example usage

extract:
    type: cookies
    actions:
      - type: url
        value: https://www.onesecondbefore.com
      - type: click
        by: id
        value: btn-f82hf-allow
      - type: url
        value: https://www.onesecondbefore.com
      - type: url
        value: https://www.onesecondbefore.com/contact/
      - type: url
        value: https://www.onesecondbefore.com/resources/

Details

item | description
--- | ---
Pre-formatted schema | Yes. This from task comes with a pre-formatted schema. The schema depends on the type.
Used technology | The from task uses Selenium and a headless Chrome browser to scrape the data.
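Because the task is built on Selenium, the actions plausibly map onto standard Selenium WebDriver calls: driver.get(url) for url actions, driver.find_element(by, value).click() for click actions, and driver.get_cookies() once all actions have run. The sketch below illustrates that mapping under those assumptions; it is not the Workflows source code. The BY_TO_LOCATOR table translates the documented by values into the locator strings that Selenium 4's By constants hold (for example, By.ID is "id" and By.CSS_SELECTOR is "css selector"); run_cookie_actions is a hypothetical name.

```python
from typing import Any, Protocol

# Documented `by` values mapped to Selenium 4 locator strings
# (the values of the selenium.webdriver.common.by.By constants).
BY_TO_LOCATOR = {
    "class_name": "class name",
    "css_selector": "css selector",
    "id": "id",
    "link_text": "link text",
    "name": "name",
    "partial_link_text": "partial link text",
    "tag_name": "tag name",
    "xpath": "xpath",
}


class Driver(Protocol):
    """The subset of the Selenium WebDriver API this sketch relies on."""

    def get(self, url: str) -> None: ...
    def find_element(self, by: str, value: str) -> Any: ...
    def get_cookies(self) -> list[dict]: ...


def run_cookie_actions(driver: Driver, actions: list[dict[str, str]]) -> list[dict]:
    """Replay url/click actions in order, then return the session's cookies."""
    for action in actions:
        if action["type"] == "url":
            driver.get(action["value"])
        elif action["type"] == "click":
            locator = BY_TO_LOCATOR[action["by"]]
            driver.find_element(locator, action["value"]).click()
        else:
            raise ValueError(f"unsupported action type: {action['type']!r}")
    # Cookies are read once, after the last action has run.
    return driver.get_cookies()
```

With a real browser, driver would be a selenium.webdriver.Chrome instance created with ChromeOptions that include the --headless=new argument, followed by driver.quit() when done. A production version would likely also need explicit waits (for example, WebDriverWait) before clicking elements that load asynchronously.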