Recipes allow a series of Wrangles to be defined and run as an automated sequence.
Recipes are written in YAML and follow a typical ETL format. Recipes are divided into three main sections: read, wrangles and write.
Autocompletion and validation can be added in many code editors.
Set up Recipe Validation
wrangles.recipe.run(recipe = 'my_recipe.wrgl.yml', variables = my_variables, dataframe = my_data, functions = [my_function], timeout = 30)
Recipes execute code. Be careful running recipes from remote sources. Only run recipes from sources you trust.
Parameter | Required | Data Type | Notes |
---|---|---|---|
recipe | ✓ | str | YAML recipe, filepath to a YAML file containing the recipe, url to a YAML file containing the recipe or a model id to the recipe. |
variables | | dict | (Optional) A dictionary of custom variables to override placeholders in the recipe. Variables can be indicated as ${MY_VARIABLE}. Variables can also be overwritten by Environment Variables. |
dataframe | | Pandas Dataframe | (Optional) Pass in a pandas dataframe, instead of defining a read section within the YAML |
functions | | list, dict | (Optional) A function or list of functions that can be called as part of the recipe. Functions can be referenced as custom.function_name |
timeout | | str | (Optional) Set a timeout for the recipe in seconds. If not provided, the time is unlimited. |
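The `${MY_VARIABLE}` placeholder syntax described for the `variables` parameter can be illustrated with Python's standard `string.Template`, which happens to use the same notation. This is only a sketch of the substitution idea, not how Wrangles resolves variables internally; the names `INPUT_FILE` and `OUTPUT_FILE` are hypothetical, and environment variable overrides are deliberately left out.

```python
from string import Template

# A recipe containing two placeholders (names are hypothetical)
recipe_text = """
read:
  - file:
      name: ${INPUT_FILE}
write:
  - file:
      name: ${OUTPUT_FILE}
"""

# The dictionary plays the role of the `variables` parameter
variables = {"INPUT_FILE": "file.csv", "OUTPUT_FILE": "file.xlsx"}

# safe_substitute leaves unknown placeholders untouched instead of raising
resolved = Template(recipe_text).safe_substitute(variables)
print(resolved)
```

Each placeholder is replaced by the matching dictionary value, so the same recipe text can be reused with different files.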
# file: recipe.wrgl.yml
# ---
# Convert a CSV file to an Excel file
# and change the case of a column.
read:
  - file:
      name: file.csv

wrangles:
  - convert.case:
      input: column
      case: upper

write:
  - file:
      name: file.xlsx
To run a recipe from within another recipe, see the recipe connector.
Additional tools let users adjust how a Wrangle is applied cleanly, without adding extra steps to the recipe.
Wildcard expansion lets a Wrangle reference all columns that share the same name but end in a number, without explicitly naming each column.
# Using concatenate to combine multiple columns
wrangles:
  - merge.concatenate:
      input:
        - Col*
      output: Join Col
      char: ', '
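The matching behavior of a wildcard input can be sketched in plain Python. This is an illustration under a stated assumption, not the library's implementation: here `*` is treated as "zero or more digits", following the description above, and the real matching rules may be broader. The helper name `expand_wildcard` and the sample column names are hypothetical.

```python
import re

def expand_wildcard(pattern, columns):
    """Return the columns matched by `pattern`, in their original order."""
    # Escape the literal parts, then let each * stand for zero or more digits
    # (an assumption based on the description, not the library's actual rule)
    regex = re.compile(r"\d*".join(re.escape(part) for part in pattern.split("*")))
    return [c for c in columns if regex.fullmatch(c)]

columns = ["ID", "Col1", "Col2", "Col3", "Colour"]
matched = expand_wildcard("Col*", columns)
print(matched)  # ['Col1', 'Col2', 'Col3']

# Joining the matched columns of one row, as merge.concatenate would with char ', '
row = {"ID": "7", "Col1": "A", "Col2": "B", "Col3": "C", "Colour": "red"}
joined = ", ".join(row[c] for c in matched)
print(joined)  # A, B, C
```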
Wildcards can also be used in the output for Wrangles that output more than one column where the columns are not yet named.
wrangles:
  - split.list:
      input: Column
      output: Column*
Column | | Column1 | Column2 | Column3 |
---|---|---|---|---|
['A', 'B', 'C'] | → | A | B | C |
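The naming of wildcard outputs shown in the table can be sketched as follows. This is a minimal illustration, assuming the `*` is replaced by a counter starting at 1 (as the table suggests); the helper name `name_outputs` is hypothetical and is not part of the Wrangles API.

```python
def name_outputs(pattern, count):
    """Build `count` output column names by replacing * with 1, 2, 3, ..."""
    return [pattern.replace("*", str(i)) for i in range(1, count + 1)]

values = ["A", "B", "C"]                      # the list held in the input column
names = name_outputs("Column*", len(values))  # one name per list element
row = dict(zip(names, values))
print(row)  # {'Column1': 'A', 'Column2': 'B', 'Column3': 'C'}
```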
Not Columns can be used anywhere that a wildcard input can be used. When combined with a wildcard input, Not Columns exclude any column listed with a dash (-) prefixed to its name.
# Using concatenate to combine multiple columns
wrangles:
  - merge.concatenate:
      input:
        - Col*
        - -Col2
      output: Join Col
      char: ', '
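The combination of a wildcard with a Not Column can be sketched in plain Python. As before, this is an illustration rather than the library's implementation: it assumes `*` matches zero or more digits and that any input starting with a dash is an exclusion. The helper name `select_columns` and the sample columns are hypothetical.

```python
import re

def select_columns(inputs, columns):
    """Resolve wildcard inputs, then drop columns named by '-' entries."""
    include, exclude = [], set()
    for item in inputs:
        negated = item.startswith("-")
        pattern = item[1:] if negated else item
        # Assumption: * stands for zero or more digits
        regex = re.compile(r"\d*".join(re.escape(p) for p in pattern.split("*")))
        hits = [c for c in columns if regex.fullmatch(c)]
        if negated:
            exclude.update(hits)
        else:
            include.extend(c for c in hits if c not in include)
    return [c for c in include if c not in exclude]

columns = ["ID", "Col1", "Col2", "Col3"]
selected = select_columns(["Col*", "-Col2"], columns)
print(selected)  # ['Col1', 'Col3']
```

With the inputs from the recipe above, only `Col1` and `Col3` would be concatenated.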