Run a custom classification wrangle on the specified column or columns. A classification wrangle must be trained first.
Click here to learn how to use Classify Wrangles in Excel.
SampleParameterswrangles:
- classify:
input: Products
output: Category
model_id: ${model_id}
where: Products = Milk
| | |
|
→
|
Products |
Category |
Rice |
|
Milk |
Dairy |
|
Parameter |
Required |
Data Type |
Notes |
input |
✓ |
str, list |
|
output |
✓ |
str, list |
|
model_id |
✓ |
str |
ID of the classification model to be used |
where |
|
str |
Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
where_params |
|
str |
Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
if |
|
str |
A condition that will determine whether the action runs or not as a whole. |
Add or Subtract time from a date
SampleParameterswrangles:
- date_calculator:
input: Date
output: New Date
operation: subtract # Optional default is addition
time_unit: days
time_value: 1
Parameter |
Required |
Data Type |
Notes |
input |
✓ |
str, pandas timestamp |
|
output |
|
str, pandas timestamp |
|
operation |
|
str |
Date operation, defaults to add. |
time_unit |
|
str |
Time unit for operation, defaults to none. See below for options |
time_value |
|
float |
Time unit value for operation, defaults to none. |
where |
|
str |
Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
where_params |
|
str |
Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
if |
|
str |
A condition that will determine whether the action runs or not as a whole. |
List of time units
- years
- months
- weeks
- days
- hours
- minutes
- seconds
- milliseconds
Look up data from a saved Lookup Wrangle
Click here to learn how to use Lookup Wrangles in Excel.
SampleParameterswrangles:
- lookup:
input: State
output:
- Abbreviation
model_id: 55555555-5555-5555
| | |
State |
Texas |
New York |
Virginia |
|
→
|
|
Parameter |
Required |
Data Type |
Notes |
input |
✓ |
str |
The input column that contains the keys to be looked up in the Lookup Wrangle |
output |
|
str, list |
Output columns from Lookup Wrangle. If not present a dictionary with output values will be returned. |
model_id |
✓ |
str |
ID of the classification model to be used |
where |
|
str |
Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
where_params |
|
str |
Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
if |
|
str |
A condition that will determine whether the action runs or not as a whole. |
Added v0.5
Apply mathematical calculations to columns. Also called as maths
Note: Spaces within column headers are replaced with an underscore automatically. In order for this wrangle to function properly, this must be taken into account in the input.
SampleParameterswrangles:
- math:
input: sqrt(Values)
output: Square Root
| | |
|
→
|
Values |
Square Root |
4 |
2 |
9 |
3 |
16 |
4 |
|
Math supports the set of operators listed below:
- Logical operators:
&, |, ~
- Comparison operators:
<, <=, ==, !=, >=, >
- Unary arithmetic operators:
-
- Binary arithmetic operators:
+, -, *, /, **, %, <<, >>
Parameter |
Required |
Data Type |
Notes |
input |
✓ |
str |
The mathematical expression using column names. e.g. column1 * column2 + column3 |
output |
✓ |
str |
The column to output the results to |
where |
|
str |
Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
where_params |
|
str |
Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
if |
|
str |
A condition that will determine whether the action runs or not as a whole. |
Added v1.6
The Python Wrangle allows executing simple Python commands inline within a recipe. Row values are referenced by the column name and commands are evaluated once per row. Spaces within column names are replaced by underscores (_). Additionally, all columns are available as a dict named kwargs. For more complex Python, use Custom Functions.
Note, this evaluates the python command - be especially cautious including variables from untrusted sources within the command string.
SampleParameterswrangles:
- python:
output: result
command: My_Column.upper()
My Column |
result |
example text |
EXAMPLE TEXT |
Note: The python wrangle replaces spaces in column headers with an underscore so it is important to keep this in mind when writing your commands.
Parameter |
Required |
Data Type |
Notes |
input |
|
str, list |
Name or list of input column(s) to filter the data available to the command. Useful in conjunction with kwargs to target a variable range of columns. |
output |
✓ |
str, list |
Name or list of output column(s). To output multiple columns, return a list of the corresponding length. |
command |
✓ |
str |
Python command. This must return a value. |
if |
|
str |
A condition that will determine whether the action runs or not as a whole. |
except |
|
any |
Value to return for the row if an exception occurs during the evaluation. If not provided, an exception will be raised as normal. If multiple output columns are specified, this must match the length. |
The python wrangle supports including parameters which allows you to parameterize a variable of unknown origin safely. For example, if you wanted to slice a string or a list like shown below:
wrangles:
- python:
output: sliced
command: input_column[:i]
i: ${var}
In the above code, i is a variable who's value will be dictated by the user.
Added v0.5
Run a recipe as a wrangle.
SampleParameterswrangles:
- recipe:
name: recipe1.wrgl.yaml
The output columns of the Wrangle-Recipe will be added to the dataframe.
Parameter |
Required |
Data Type |
Notes |
name |
✓ |
str |
file name/path of the recipe or the model id |
input |
|
str, list |
Columns to only apply the recipe to. |
output |
|
str, list |
Columns to output from the recipe. |
variables |
|
dict |
A dictionary of variables to pass to the recipe, defaults to pass all variables through. |
output_columns |
|
str, list |
Columns to output from the recipe, defaults to none. |
functions |
|
str, list |
A list of functions used in the recipe |
where |
|
str |
Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
where_params |
|
str |
Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
if |
|
str |
A condition that will determine whether the action runs or not as a whole. |
Remove all the elements that occur in one list from another.
SampleParameters¶ Removing Words From a Column
wrangles:
- remove_words:
input: Description
to_remove: # To Remove columns must be list
- Materials
- Colours
output: Product
tokenize_to_remove: True
ignore_case: False
| | |
Description |
Materials |
Colours |
Steel Blue Bottle |
['Steel'] |
['Blue'] |
['Steel', 'Blue', 'Bottle'] |
['Steel'] |
['Blue'] |
|
→
|
|
Parameter |
Required |
Data Type |
Notes |
input |
✓ |
str, list |
|
output |
|
str, list |
If omitted, overwrites input |
to_remove |
✓ |
str, list |
Column or list of columns with a list of words to be removed |
tokenize_to_remove |
|
bool |
Tokenize all to_remove inputs, defaults to false |
ignore_case |
|
bool |
Ignore input and to_remove case, defaults to true |
where |
|
str |
Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
where_params |
|
str |
Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
if |
|
str |
A condition that will determine whether the action runs or not as a whole. |
Quick find and replace for simple values. Can use regex in the find field.
Click here to learn how to use Replace Wrangles in Excel.
SampleParameterswrangles:
- replace:
input: Product Data
find: brg
replace: bearing
| | |
Product Data |
SKF ball brg |
brg seal |
|
→
|
Product Data |
SKF ball bearing |
bearing seal |
|
Parameter |
Required |
Data Type |
Notes |
input |
✓ |
str, list |
|
output |
|
str, list |
Overwrites input if none given |
find |
✓ |
str |
String or regex pattern to find |
replace |
✓ |
str |
Value to replace the string or regex pattern |
where |
|
str |
Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
where_params |
|
str |
Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
if |
|
str |
A condition that will determine whether the action runs or not as a whole. |
Round numbers in a column to the nearest decimal point of your choosing.
Click here to learn how to use Round Wrangles in Excel.
SampleParameters¶ Rounding a Column
wrangles:
- round:
input: Cost Per Unit
output: Cost Rounded
decimals: 2
| | |
Cost Per Unit |
3.14159 |
2.71828 |
|
→
|
Cost Per Unit |
Cost Rounded |
3.14159 |
3.14 |
2.71828 |
2.72 |
|
Parameter |
Required |
Data Type |
Notes |
input |
✓ |
str, list |
Name of the input column |
output |
|
str, list |
Overwrites input if none given |
decimals |
|
int |
Number of decimal places to round to, default is zero |
where |
|
str |
Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
where_params |
|
str |
Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
if |
|
str |
A condition that will determine whether the action runs or not as a whole. |
Calculate the similarity of two vectors.
Note: Similarity only works on vectors. In order to produce vectors for a column of strings, use create.embeddings.
SampleParameters¶ Rounding a Column
wrangles:
- create.embeddings:
input: col1
api_key: ${my_key}
output: col1 embeddings
- create.embeddings:
input: col2
api_key: ${my_key}
output: col2 embeddings
- similarity:
input:
- col1 embeddings
- col2 embeddings
output: similarity
method: adjusted cosine
| | |
col1 |
col2 |
col1 embeddings |
col2 embeddings |
similarity |
SKF |
Timken |
[1, 2, 3, 4] |
[4, 3, 2, 1] |
0.158931 |
Ball Bearing |
Roller Bearing |
[5, 6, 7, 8] |
[5, 6, 7, 9] |
0.942437 |
|
Parameter |
Required |
Data Type |
Notes |
input |
✓ |
list |
Two columns of vectors to compare the similarity of. |
output |
✓ |
str |
Name of the output column. |
method |
|
str |
The type of similarity to calculate (cosine or euclidean), defaults to cosine. Adjusted cosine adjusts the default cosine calculation to cover a range of 0-1 for typical comparisons. |
where |
|
str |
Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
where_params |
|
str |
Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
if |
|
str |
A condition that will determine whether the action runs or not as a whole. |
List of methods:
- cosine
- adjusted cosine
- euclidean
¶ Standardize
Run a standardize wrangle. e.g. A wrangle that expands abbreviations. A standardization wrangle must be trained first.
Click here to learn how to use Standardize Wrangles in Excel.
SampleParameterswrangles:
- standardize:
input: Abbrev
output: Abbreviations
model_id: code_here
| | |
|
→
|
Abbreviations |
As Soon As Possible |
Estimated Time of Arrival |
|
Parameter |
Required |
Data Type |
Notes |
input |
✓ |
str, list |
|
output |
|
str, list |
If omitted, overwrites input |
model_id |
✓ |
str, list |
The ID of the wrangle to use (do not include 'find' and 'replace') |
case_sensitive |
|
boolean |
Allows the wrangle to be case sensitive if set to True, default is False. |
where |
|
str |
Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
where_params |
|
str |
Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
if |
|
str |
A condition that will determine whether the action runs or not as a whole. |
This does not currently work with objects. If your table contains objects, convert.to_json prior to using SQL.
Apply a SQL command to the current dataframe.
Only SELECT statements are supported - the result will be the output. The current table is called df.
SampleParameterswrangles:
- sql:
command: |
SELECT header1, header2
FROM df
WHERE header1 >= 2
| | |
header1 |
header2 |
header3 |
1 |
a |
x |
2 |
b |
y |
3 |
c |
z |
|
→
|
|
Parameter |
Required |
Data Type |
Notes |
command |
✓ |
str |
SQL Command. The table is called df. For specific SQL syntax, this uses the SQLite dialect. |
if |
|
str |
A condition that will determine whether the action runs or not as a whole. |
Sql is not compatible with where filtering
Translate the input column to another language. Powered by DeepL. A list of language codes can be found here.
Click here to learn how to use Translate Wrangles in Excel.
SampleParameterswrangles:
- translate:
input: Español
output: English
source_language: Spanish
target_language: English (British)
Parameter |
Required |
Data Type |
Notes |
input |
✓ |
str, list |
|
output |
✓ |
str, list |
|
target_language |
✓ |
str |
Code of the language to translate to |
source_language |
|
str |
Code of the language to translate from. If omitted, automatically detects the input language |
case |
|
lower/ upper/ title/ sentence |
Allow changing the case of the input prior to translation. |
where |
|
str |
Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
where_params |
|
str |
Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
if |
|
str |
A condition that will determine whether the action runs or not as a whole. |