Recipes allow a series of Wrangles to be defined and run as an automated sequence.
Recipes are written in YAML and follow a typical ETL format. Recipes are divided into four main sections:
Autocompletion and validation can be added in many code editors.
Set up Recipe Validation
wrangles.recipe.run(recipe = 'my_recipe.wrgl.yml', variables = my_variables, dataframe = my_data, functions = [my_function], timeout = 30)
Recipes execute code. Be careful running recipes from remote sources. Only run recipes from sources you trust.
Parameter | Required | Data Type | Notes |
---|---|---|---|
recipe | ✓ | str | YAML recipe, filepath to a YAML file containing the recipe, url to a YAML file containing the recipe or a model id to the recipe. |
variables | | dict | (Optional) A dictionary of custom variables to override placeholders in the recipe. Variables can be indicated as ${MY_VARIABLE}. Variables can also be overwritten by Environment Variables. |
dataframe | | Pandas Dataframe | (Optional) Pass in a pandas dataframe, instead of defining a read section within the YAML |
functions | | list, dict | (Optional) A function or list of functions that can be called as part of the recipe. Functions can be referenced as custom.function_name |
timeout | | str | (Optional) Set a timeout for the recipe in seconds. If not provided, the time is unlimited. |
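To illustrate how `${MY_VARIABLE}` placeholders in the `variables` row behave, here is a minimal sketch using Python's standard `string.Template`, which happens to share the `${...}` syntax. This is not how Wrangles implements substitution; the helper name `substitute_variables` and the `INPUT_FILE` variable are hypothetical, and leaving unknown placeholders untouched is an assumption of this sketch.

```python
from string import Template


def substitute_variables(recipe_text: str, variables: dict) -> str:
    """Replace ${NAME} placeholders with values from `variables`.

    Hypothetical helper for illustration only. Placeholders without a
    matching key are left as-is (a choice made for this sketch).
    """
    return Template(recipe_text).safe_substitute(variables)


# A recipe fragment with a placeholder for the input file name.
recipe_text = "read:\n  - file:\n      name: ${INPUT_FILE}\n"

print(substitute_variables(recipe_text, {"INPUT_FILE": "file.csv"}))
```

In practice the dictionary is passed to `wrangles.recipe.run` through the `variables` parameter and the library performs the replacement itself.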
# file: recipe.wrgl.yml
# ---
# Convert a CSV file to an Excel file
# and change the case of a column.
read:
  - file:
      name: file.csv
wrangles:
  - convert.case:
      input: column
      case: upper
write:
  - file:
      name: file.xlsx
To run a recipe from within another recipe, see the recipe connector.
Several tools are available to select columns dynamically. They apply to the input of wrangles and to the columns of read and write.
A (*) can be used as a wildcard character to match any characters.
# Using concatenate to combine multiple columns.
wrangles:
  - merge.concatenate:
      input:
        - Col* # This will match any column beginning 'Col'
      output: Join Col
      char: ', '
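The wildcard behaviour described above can be sketched in plain Python with the standard `fnmatch` module. This is an illustration of the matching rule only, not the library's implementation; the helper name `expand_wildcards` and the sample column names are hypothetical. Note that `fnmatchcase` also gives special meaning to `?` and `[...]`, which this sketch does not try to reconcile with Wrangles' own syntax.

```python
from fnmatch import fnmatchcase


def expand_wildcards(patterns, columns):
    """Return the columns matched by any pattern, in dataframe order per pattern.

    Hypothetical helper: `*` matches any run of characters.
    """
    selected = []
    for pattern in patterns:
        for column in columns:
            # Case-sensitive match; skip columns already selected.
            if fnmatchcase(column, pattern) and column not in selected:
                selected.append(column)
    return selected


columns = ["Col1", "Col2", "Col3", "Other"]
print(expand_wildcards(["Col*"], columns))
```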
Wrangles will generally fail if applied to a column that doesn't exist. A question mark (?) can be added to the end of a column name to make it optional. This disables the validation for that column, but the wrangle may still fail if the column is essential to its functioning.
wrangles:
  # Merge the contents of column1 and column2
  # to a list. Also merge column3 if it exists,
  # but do not fail if it does not exist
  - merge.to_list:
      input:
        - Col1
        - Col2
        - Col3? # this will include Col3 if it exists
      output: output_column
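As a rough model of the optional-column rule, the sketch below resolves a list of requested names against the columns that actually exist. It is a sketch of the described behaviour, not Wrangles' code; the helper name `resolve_columns` and the choice of `KeyError` for a missing required column are assumptions.

```python
def resolve_columns(requested, available):
    """Resolve requested column names, treating a trailing '?' as optional.

    Hypothetical helper: required columns that are missing raise KeyError,
    optional ones are silently skipped.
    """
    resolved = []
    for name in requested:
        optional = name.endswith("?")
        column = name[:-1] if optional else name
        if column in available:
            resolved.append(column)
        elif not optional:
            raise KeyError(f"Column '{column}' does not exist")
    return resolved


# Col3 is absent, but marked optional, so no error is raised.
print(resolve_columns(["Col1", "Col2", "Col3?"], ["Col1", "Col2"]))
```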
A dash (-) at the front of a column name excludes that column. Typically, this is combined with a wildcard to select all columns matching a pattern except certain ones.
# Concatenate all columns beginning Col, except Col2
wrangles:
  - merge.concatenate:
      input:
        - Col*
        - -Col2
      output: Join Col
      char: ', '
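The combination of a wildcard with an exclusion can be modelled as "match any include pattern, then drop anything matching an exclude pattern". The sketch below shows that logic with the standard `fnmatch` module; it is an assumption about the semantics rather than the library's implementation, and `select_columns` is a hypothetical name.

```python
from fnmatch import fnmatchcase


def select_columns(patterns, columns):
    """Select columns matching include patterns, minus those prefixed with '-'.

    Hypothetical helper for illustration only.
    """
    include = [p for p in patterns if not p.startswith("-")]
    exclude = [p[1:] for p in patterns if p.startswith("-")]
    return [
        column
        for column in columns
        if any(fnmatchcase(column, p) for p in include)
        and not any(fnmatchcase(column, p) for p in exclude)
    ]


columns = ["Col1", "Col2", "Col3", "Other"]
print(select_columns(["Col*", "-Col2"], columns))
```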