`to` tasks handle all data exports from your data lake to a destination. You can configure the items below; click on a name for more info.
The `extract` part of the configuration defines which data is selected from your data lake to be uploaded to a destination.
Below are the types of extract. Click on an extract type for an explanation.
Extract data from a BigQuery table.
Example of extracting the data for the `to_aws_sqs` task.

source: database
query: |
  SELECT
    'https://sqs.eu-west-1.amazonaws.com/12345678909/my-message-queue' AS QueueUrl,
    '{"run_id": "{{ task.run_id }}", "task_type": "{{ task.type }}"}' AS MessageBody,
    123 AS DelaySeconds
property | type | required | description |
---|---|---|---|
conn_id | string | no | Name of the connection. If not declared, client_cloud.db_conn_id is used. |
source | enumerator (database, storage) | no | Default is `database`. |
query | string | yes, unless `template` is used | Query to be executed, whose results will be uploaded to the destination. |
template | string | yes, unless `query` is used | Link to a file in the `includes` folder in your repository that contains the SQL statement. This query will be executed and the results will be uploaded to the destination. |
project_id | string | no | Project ID of the destination table. If not declared, client_cloud.project_id is used. |
dataset_id | string | no | Dataset ID of the destination table. If not declared, client_cloud.dataset_id is used. |
table_id | string | no | Table ID of the destination table. If not declared, task.id is used. |
use_legacy_sql | yesno (boolean) | no | Default is `no`. Whether legacy SQL should be used. |
params | object | no | Parameters that can be set. Useful for templating. |
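For illustration, a minimal sketch of a BigQuery extract that uses `template` with `params` instead of an inline query is shown below. The file path, project, dataset, table and parameter names are hypothetical; the sketch assumes the `{{ ... }}` templating shown in the query examples also resolves `params`.

source: database
template: includes/my_export_query.sql
project_id: my-gcp-project
dataset_id: my_dataset
table_id: my_export_table
use_legacy_sql: no
params:
  country: NL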
Extract data from a Snowflake table.
Example of extracting the data for the `to_aws_sqs` task.
source: database
query: |
  SELECT
    'https://sqs.eu-west-1.amazonaws.com/12345678909/my-message-queue' AS QueueUrl,
    '{"run_id": "{{ task.run_id }}", "task_type": "{{ task.type }}"}' AS MessageBody,
    123 AS DelaySeconds
property | type | required | description |
---|---|---|---|
source | enumerator (database, storage) | no | Default is `database`. |
conn_id | string | no | Name of the connection. If not declared, client_cloud.db_conn_id is used. |
query | string | yes, unless `template` is used | Query to be executed, whose results will be uploaded to the destination. |
template | string | yes, unless `query` is used | Link to a file in the `includes` folder in your repository that contains the SQL statement. This query will be executed and the results will be uploaded to the destination. |
database | string | no | Database of the destination table. If not declared, client_cloud.database is used. |
schema | string | no | Schema of the destination table. If not declared, client_cloud.schema is used. |
table | string | no | Table of the destination table. If not declared, task.id.upper() is used. |
params | object | no | Parameters that can be set. Useful for templating. |
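For illustration, a minimal sketch of a Snowflake extract using a `template` file and explicit `database`, `schema` and `table` is shown below. The connection name, file path, identifiers and parameter names are hypothetical.

source: database
conn_id: my_snowflake_conn
template: includes/my_export_query.sql
database: MY_DATABASE
schema: MY_SCHEMA
table: MY_EXPORT_TABLE
params:
  country: NL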
Example of extracting the data files with prefix `s3://my-test-bucket/some/folder/part-`.

extract:
  source: storage
  conn_id: s3
  bucket: my-test-bucket
  prefix: some/folder/part-
Extracts data files from Amazon S3, Google Cloud Storage or Azure Blob Storage (in beta).
property | type | required | description |
---|---|---|---|
source | enumerator (database, storage) | yes | Set to storage |
conn_id | string | no | Name of the connection. If not declared, client_cloud.storage_conn_id is used. |
bucket | string | no | Name of the bucket. If not declared, client_cloud.bucket is used. |
prefix | string | no | Prefix of the file(s). |
project_id | string | no | Google Cloud only. Project ID of the bucket. If not declared, client_cloud.project_id is used. |
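For illustration, a Google Cloud Storage variant of the example above could look like the sketch below; the connection name, bucket and project ID are hypothetical.

extract:
  source: storage
  conn_id: gcs
  bucket: my-test-bucket
  prefix: some/folder/part-
  project_id: my-gcp-project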
The `load` part of the configuration defines how the data is loaded to external destinations. The vocabulary of the destination is leading in this part: where Workflows normally talks about fields and records, it uses columns and rows here if the data destination uses those terms.
We identify two types of data destinations: management and data uploads. Management tasks sync a table with a destination, for example syncing a table of all DoubleClick display ads, including text and bidding amounts, with DV360. Data upload tasks upload data to an endpoint, for example uploading a file to an external FTP server or uploading audiences to Facebook.
New task types are added to Workflows on a monthly basis. Currently Workflows supports the following `to` tasks: