Tasks · Overview

Tasks are the individual units of work that make up a job. Onesecondbefore divides the world into three parts: from, do and to.

  • From tasks
    `From` tasks handle all data imports. Onesecondbefore can connect to any system that supports data export. If your system is not in our current list, contact our support team to request the data import. Every `from` task performs an extract (from the data source), transform (manipulate the incoming data if needed), load (into your data lake) and validate (check the result). `From` tasks come with pre-defined and documented schemas (field comments in the table) where possible. `From` tasks also deduplicate the target table, so that you don't import the same data twice.
  • Do tasks
    `Do` tasks handle all intra-data-lake work. Typical use cases are loading files from storage into a table, or running a query and saving the results in a table. `Do` tasks also include process-flow tasks, such as do_zilch and do_continue.
  • To tasks
    `To` tasks handle all data uploads to data destinations. We identify two categories: management and data uploads. Management tasks sync an online spreadsheet (e.g. with audience or campaign information such as maximum click cost) with a marketing platform. This is especially useful for mass updates: you can use a single source (your data lake) and manage all your marketing platforms at the press of a button. Data upload tasks push data to external systems, for example uploading a file to an external FTP server (to_ftp) or uploading audiences to Facebook or DoubleClick.
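
As a sketch, a pipeline could combine one task from each category. The task types below are taken from the lists in this document, but the combination and the comments are illustrative, not a prescribed setup:

# from: import data from a source system into the data lake
task:
    type: from_google_analytics
    start_date: yesterday
    end_date: yesterday

# do: run intra-data-lake work, e.g. a query whose results land in a table
task:
    type: do

# to: upload the prepared data to a destination, e.g. a Meta custom audience
task:
    type: to_meta_custom_audience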

Task

The `task` part of the configuration covers task-related settings. It can be configured for all task types.

Example usage

task:
    type: from_google_analytics
    start_date: yesterday -3 days
    end_date: today

Properties

type (enumerator, required)
    Contains the type of task. Must be one of:
  1. from_appfigures
  2. from_apple_app_store_connect
  3. from_aws_s3
  4. from_bigquery
  5. from_bing
  6. from_bluesky
  7. from_dcm
  8. from_dcbm
  9. from_dpg_datalab
  10. from_facebook
  11. from_ftp
  12. from_google_ads
  13. from_google_analytics
  14. from_google_drive
  15. from_google_search_console
  16. from_google_sheets
  17. from_imap
  18. from_linkedin
  19. from_looker
  20. from_url
  21. from_salesforce
  22. from_snowflake
  23. from_x
  24. from_xandr

The items below are discussed in more detail in the Do section.

  1. do
  2. do_zilch
  3. do_continue
  4. do_profiles
  5. do_sessionize

The items below are discussed in more detail in the To section.

  1. to_aws_s3
  2. to_aws_sns
  3. to_aws_sqs
  4. to_dcm
  5. to_dpg_datalab
  6. to_doubleclick_offline_conversions
  7. to_meta
  8. to_meta_custom_audience
  9. to_meta_offline_conversions
  10. to_google_analytics_data_import
  11. to_google_analytics_management
  12. to_google_measurement_protocol_v3
  13. to_ftp
  14. to_xandr
  15. to_xandr_server_side_segmentation
id (string, required)
    Unique name for the task. Defaults to the filename without extension.

trigger_date (string, read-only)
    Timestamp when the job (not the task) was triggered, in the local timezone. Useful for deduplicating tables and in SQL templates.

run_id (string, read-only)
    Unique id per run. Every time a task runs, it receives a unique 8-character alphanumeric string.

tmp_dir (string, read-only)
    Temporary folder on the worker machine where data is stored during the task's lifetime. Once the task is done, the worker and all data on it are irreversibly deleted.

start_date (relative or absolute date or date & time, required)
    Start date of the period that will be selected in the data source. Can be an absolute or a relative date. Read more about relative date and time here.

end_date (relative or absolute date or date & time, required)
    End date of the period that will be selected in the data source. Can be an absolute or a relative date. Read more about relative date and time here.

loop_by (enumerator: year, month, week, day, hour, minute, second, file, list; optional)
    Loops the task depending on the enumerator value. With year, month, week, day, hour, minute or second, each loop adds an equal time frame to the start_date until the end_date is reached. With list, the loop cycles through the values in loop_list. With file, the loop cycles through each file on a data source. This is especially useful when downloading a large data set in many chunks.

loop_list (array, optional)
    Contains a list of values to loop through.

loop_value (string or int, read-only)
    Contains the current value of the loop when loop_by is used. Automatically set by Workflows.

loop_index (int, read-only)
    Contains the number of the current loop, starting at 0. Use in combination with loop_by.

loop_start_date (date or datetime, read-only)
    Set to task.start_date. Only available when loop_by=hour or loop_by=day. When used, task.start_date is overwritten with the time frame of the current loop.

loop_end_date (date or datetime, read-only)
    Set to task.end_date. Only available when loop_by=hour or loop_by=day. When used, task.end_date is overwritten with the time frame of the current loop.

resource_size (enumerator: 0, 1, 2, 4, 8, 16, 32, 64, 128, 256; optional)
    Resource size to use for the task; defaults to 0. The number corresponds to the number of CPUs (0 being 0.25). The memory of the instance is 8 × resource_size GiB, e.g. a resource_size of 16 means 16 CPUs with 8 × 16 = 128 GiB of memory.
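
Putting the loop properties together, a day-by-day backfill could be sketched as follows. The task type and property names are from this document; the specific date range and resource size are illustrative:

task:
    type: from_google_analytics
    # Loop one day at a time: each iteration advances start_date by one day
    # until end_date is reached, and sets loop_start_date / loop_end_date
    # (and loop_index / loop_value) for the current iteration.
    start_date: yesterday -7 days
    end_date: yesterday
    loop_by: day
    # 2 CPUs with 8 x 2 = 16 GiB of memory, per the resource_size description
    resource_size: 2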