Recipes allow a series of Wrangles to be defined and run as an automated sequence.
Recipes are written in YAML and follow a typical ETL format. Recipes are divided into four main sections:
Autocompletion and validation can be added in many code editors; see Set up Recipe Validation.
Example Run Command
import wrangles

# my_variables, my_data and my_function stand in for your own objects
wrangles.recipe.run(
    recipe='my_recipe.wrgl.yml',
    variables=my_variables,
    dataframe=my_data,
    functions=[my_function],
    timeout=30
)
Recipes execute code. Be careful running recipes from remote sources. Only run recipes from sources you trust.
| Parameter | Required | Data Type | Notes |
|---|---|---|---|
| recipe | ✓ | str | A YAML recipe string, a filepath or URL to a YAML file containing the recipe, or the model ID of a recipe. |
| variables | | dict | (Optional) A dictionary of custom variables used to fill placeholders in the recipe. Placeholders are written as ${MY_VARIABLE}. Variables can also be overwritten by environment variables. |
| dataframe | | pandas DataFrame | (Optional) A pandas DataFrame to use as input instead of defining a read section in the YAML. |
| functions | | list, dict | (Optional) A function or list of functions that can be called from the recipe. Functions are referenced as custom.function_name. |
| timeout | | int, float | (Optional) A timeout for the recipe, in seconds. If not provided, the recipe can run indefinitely. |
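To illustrate how ${MY_VARIABLE} placeholders might be filled, here is a minimal sketch using Python's standard `string.Template`. The helper `resolve_variables` is hypothetical and is not part of the wrangles API. It assumes environment variables act as a fallback and that entries in the `variables` dictionary win when both define the same name; the precedence wrangles itself applies may differ.

```python
import os
from string import Template
from typing import Dict, Optional


def resolve_variables(recipe_text: str, variables: Optional[Dict[str, str]] = None) -> str:
    """Fill ${NAME} placeholders in a recipe string (hypothetical helper)."""
    # Assumption: environment variables are the base layer and the
    # `variables` dict overrides them. wrangles may use another order.
    values = dict(os.environ)
    values.update(variables or {})
    # safe_substitute leaves unknown placeholders as-is instead of raising KeyError
    return Template(recipe_text).safe_substitute(values)


recipe_text = """
read:
  - file:
      name: ${INPUT_FILE}
"""
print(resolve_variables(recipe_text, {"INPUT_FILE": "file.csv"}))
```

Using `safe_substitute` rather than `substitute` means a placeholder with no value stays visible in the output, which makes a missing variable easy to spot.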
# file: recipe.wrgl.yml
# ---
# Convert a CSV file to an Excel file
# and change the case of a column.
read:
  - file:
      name: file.csv
wrangles:
  - convert.case:
      input: column
      case: upper
write:
  - file:
      name: file.xlsx
To run a recipe from within another recipe, see the recipe connector.
Dynamic Column Selections
For the input of wrangles, and for the columns of read and write, several tools can be used to select columns dynamically.
An asterisk (*) can be used as a wildcard that matches any sequence of characters.
# Using concatenate to combine multiple columns.
wrangles:
  - merge.concatenate:
      input:
        - Col* # This will match any column beginning with 'Col'
      output: Join Col
      char: ', '
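The wildcard rule can be sketched in plain Python. The function `match_wildcard` is a hypothetical illustration, not the library's implementation; it assumes matching is case-sensitive, applies to the whole column name, and treats only `*` as special (everything else is literal).

```python
import re
from typing import List


def match_wildcard(pattern: str, columns: List[str]) -> List[str]:
    """Return the columns whose full name matches `pattern`, where * matches any characters."""
    # Escape regex metacharacters, then turn each escaped '*' back into '.*'
    regex = re.escape(pattern).replace(r"\*", ".*")
    return [col for col in columns if re.fullmatch(regex, col)]


columns = ["Col1", "Col2", "Column", "Name"]
print(match_wildcard("Col*", columns))  # ['Col1', 'Col2', 'Column']
```

Escaping the pattern first keeps characters such as `.` or `(` in column names from being interpreted as regular-expression syntax.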
Optional Columns
Wrangles will generally fail if applied to a column that doesn't exist. Adding a question mark (?) to the end of a column name makes that column optional. This disables the existence check for that column, but the wrangle may still fail if the column is essential to how it works.
wrangles:
  # Merge the contents of Col1 and Col2
  # to a list. Also merge Col3 if it exists,
  # but do not fail if it does not exist.
  - merge.to_list:
      input:
        - Col1
        - Col2
        - Col3? # this will include Col3 if it exists
      output: output_column
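A minimal sketch of the optional-column rule, assuming a missing required column raises an error while a missing `name?` column is silently skipped. `resolve_optional` is a hypothetical helper for illustration only, and the `KeyError` it raises is an assumption, not the exception wrangles actually uses.

```python
from typing import List


def resolve_optional(inputs: List[str], columns: List[str]) -> List[str]:
    """Keep required columns (error if missing) and optional 'name?' columns only if present."""
    selected = []
    for name in inputs:
        if name.endswith("?"):
            name = name[:-1]  # strip the optional marker
            if name not in columns:
                continue  # optional column is absent: skip it without failing
        elif name not in columns:
            raise KeyError(f"Column {name!r} does not exist")
        selected.append(name)
    return selected


print(resolve_optional(["Col1", "Col2", "Col3?"], ["Col1", "Col2"]))  # ['Col1', 'Col2']
```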
Exclude Columns
Prefixing a column name with a dash (-) excludes that column. This is typically combined with a wildcard to select all columns matching a pattern except certain ones.
# Concatenate all columns beginning with Col, except Col2
wrangles:
  - merge.concatenate:
      input:
        - Col*
        - -Col2
      output: Join Col
      char: ', '
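Wildcards and exclusions together can be sketched as follows. `select_columns` and `_match` are hypothetical helpers, not wrangles functions. The sketch assumes exclusions are applied after all inclusions regardless of where the `-` entry appears in the list, and that a leading dash always means exclusion (so a column literally named `-x` could not be selected this way).

```python
import re
from typing import List


def _match(pattern: str, columns: List[str]) -> List[str]:
    # '*' matches any characters; everything else is literal
    regex = re.escape(pattern).replace(r"\*", ".*")
    return [col for col in columns if re.fullmatch(regex, col)]


def select_columns(inputs: List[str], columns: List[str]) -> List[str]:
    """Expand wildcards, then drop every column matched by a '-' entry."""
    included: List[str] = []
    excluded = set()
    for item in inputs:
        if item.startswith("-"):
            excluded.update(_match(item[1:], columns))
        else:
            for col in _match(item, columns):
                if col not in included:  # avoid duplicates from overlapping patterns
                    included.append(col)
    return [col for col in included if col not in excluded]


print(select_columns(["Col*", "-Col2"], ["Col1", "Col2", "Col3", "Name"]))  # ['Col1', 'Col3']
```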
Positional Inputs
Positional inputs allow input columns to be referenced by their zero-based index. Columns can be referenced individually by index, or a range can be selected with slice notation (start:end), where the end index is not included. An individual index that is out of range raises an error, while a slice that extends past the last column is allowed. Negative indexes are not allowed and also raise an error.
# Convert the first column (index 0) to lower case
wrangles:
  - convert.case:
      input: 0
      output: Lower
      case: lower

# Concatenate the first and third columns (indexes 0 and 2)
wrangles:
  - merge.concatenate:
      input:
        - 0
        - 2
      output: Join Col
      char: ', '

# Concatenate the columns at indexes 0 through 4 (the end index 5 is excluded)
wrangles:
  - merge.concatenate:
      input: 0:5
      output: Join Col
      char: ', '
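The positional rules above (single index, list of indexes, non-inclusive slice, errors for out-of-range indexes and negative values) can be sketched as a small resolver. `select_by_position` is hypothetical; it assumes slices arrive as `"start:end"` strings with both bounds given and handles only positional selectors, not column names. The exception types are Python's own choices, not necessarily those wrangles raises.

```python
from typing import List, Union

Selector = Union[int, str, List[Union[int, str]]]


def select_by_position(selector: Selector, columns: List[str]) -> List[str]:
    """Resolve a zero-based index, a 'start:end' slice, or a list of either."""
    if isinstance(selector, list):
        result: List[str] = []
        for item in selector:
            result.extend(select_by_position(item, columns))
        return result
    if isinstance(selector, int):
        if selector < 0:
            raise ValueError("Negative indexes are not allowed")
        return [columns[selector]]  # raises IndexError when out of range
    start, end = (int(part) for part in selector.split(":"))
    if start < 0 or end < 0:
        raise ValueError("Negative indexes are not allowed")
    return columns[start:end]  # end is excluded; out-of-range slices are clipped


columns = ["A", "B", "C", "D"]
print(select_by_position([0, 2], columns))  # ['A', 'C']
print(select_by_position("0:5", columns))   # ['A', 'B', 'C', 'D']
```

Separately from this sketch: a YAML 1.1 parser such as PyYAML reads an unquoted value like `1:3` as a base-60 integer (63) rather than a string, while `0:5` stays a string. Quoting a slice (`'1:3'`) avoids that ambiguity; whether wrangles handles this case itself is not covered here.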