Select columns from the dataframe
wrangles:
- select.columns:
input: Manufacturer
| → |
|
Parameter | Required | Data Type | Notes |
---|---|---|---|
input | ✓ | str, list | Name of the column(s) to select |
if | str | A condition that will determine whether the action runs or not as a whole. |
Select a named element of a dictionary.
wrangles:
- select.dictionary_element:
input: Properties
output: Shapes
element: shapes
default: square
where: Part Number = 1234
| → |
|
When passing element as a list, the output will remane a dictionary even if it is only a list of one. In the example below, we have overwritten the input column by not providing an output.
wrangles:
- select.dictionary_element:
input: Properties
element:
- shapes
- materials
| → |
|
Wildcards can be used to select dictionary elements. As before, when passing element as a list, the output remanes a dictionary. Note: When using a wildcard or regex, element must be passed as a list and therefore the output will remain a dictionary.
wrangles:
- select.dictionary_element:
input: Properties
output: Output Dict
element:
- Col*
| → |
|
Regex patterns can be used to select dictionary elements. As before, when passing element as a list, the output remanes a dictionary. Note: When using a wildcard or regex, element must be passed as a list and therefore the output will remain a dictionary.
wrangles:
- select.dictionary_element:
input: Properties
output: Output Dict
element:
- "regex: .*2"
| → |
|
Selected dictionary elements can also be renamed in the output by following the example below.
wrangles:
- select.dictionary_element:
input: Properties
output: Output Dict
element:
- Col1: Column 1
- Col2: Column 2
| → |
|
By adding a default, missing dictionary elements are filled in upon output. Without the use of a default, the wrangle will throw an error when it encounters a missing element.
wrangles:
- select.dictionary_element:
input: Properties
output: Output Dict
element:
- Col1
- Col3
default:
Col1: Z
Col3: Y
| → |
|
Parameter | Required | Data Type | Notes |
---|---|---|---|
input | ✓ | str, list | Name of the input column |
output | str, list | If omitted, the input column will be overwritten. | |
element | ✓ | str, list | Key(s) to select from the dictionary. |
default | str | Set the default value to return if the specified element doesn't exist. | |
where | str | Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' | |
where_params | str | Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) | |
if | str | A condition that will determine whether the action runs or not as a whole. |
Select elements of lists or dicts using python syntax like col[1:3]['key'].
wrangles:
- select.element:
input: Column 1[0]
output: First Element
Column 1 | Column 1 | First Element | ||
---|---|---|---|---|
['A', 0.6] | → | ['A', 0.6] | 'A' |
Parameter | Required | Data Type | Notes |
---|---|---|---|
input | ✓ | str, list | Name of the input column and sub elements. This permits by index for lists or dict and by key for dicts e.g. col[0]['key'] // [{"key":"val"}] -> "val" |
output | str, list | Name of the output column(s) | |
default | any | Set the default value to return if the specified element doesn't exist. | |
if | str | A condition that will determine whether the action runs or not as a whole. |
Group and aggregate data based on certain criteria.
wrangles:
- select.group_by:
by:
- Product Type
sum: Quanitity
mean: Price ($)
| → |
|
wrangles:
- select.group_by:
by: group
custom.sum_times_two: agg
def sum_times_two(x):
return sum(x) * 2
| → |
|
Note: Recipes using select.group_by must be output to a new sheet because they change the shape of the dataframe.
Parameter | Required | Data Type | Notes |
---|---|---|---|
by | list | List of the input columns to group on, defaults to none | |
first | str, list | The first value for these column(s) | |
last | str, list | The last value for these column(s) | |
list | str, list | Group and return all values for these column(s) as a list | |
min | str, list | The minimum value for these column(s) | |
max | str, list | The maximum value for these column(s) | |
mean | str, list | The mean (average) value for these column(s) | |
median | str, list | The median value for these column(s) | |
nunique | str, list | The count of unique values for these column(s) | |
count | str, list | The count of values for these column(s) | |
std | str, list | The standard deviation of values for these column(s) | |
sum | str, list | The total of values for these column(s) | |
any | str, list | Return true if any of the values for these column(s) are true | |
all | str, list | Return true if all of the values for these column(s) are true | |
p75 | str, list | Get a percentile. Note, you can use any integer here for the corresponding percentile. | |
if | str | A condition that will determine whether the action runs or not as a whole. | |
custom.placeholder | str, list | Placeholder for custom functions. Replace 'placeholder' with the name of the function. |
Return the first n rows
wrangles:
- select.head:
n: 2
| → |
|
Parameter | Required | Data Type | Notes |
---|---|---|---|
n | ✓ | integer | Number of rows to return |
if | str | A condition that will determine whether the action runs or not as a whole. |
Select the option with the highest confidence from multiple columns.
Inputs are expected to be of the form [value, confidence_score]
wrangles:
- select.highest_confidence:
input:
- Col1
- Col2
- Col3
output: Highest Confidence
Col1 | Col2 | Col3 | Highest Confidence | |||
---|---|---|---|---|---|---|
['A', 0.79] | ['B', 0.77] | ['C', 0.99] | → | ['C', 0.99] |
When a list of two outputs is given, the item and it's confidence are split into different columns
wrangles:
- select.highest_confidence:
input:
- Col1
- Col2
- Col3
output:
- Item
- Confidence
Col1 | Col2 | Col3 | Item | Confidence | |||
---|---|---|---|---|---|---|---|
['A', 0.79] | ['B', 0.77] | ['C', 0.99] | → | C | .99 |
Parameter | Required | Data Type | Notes |
---|---|---|---|
input | ✓ | list | The name of the input column |
output | ✓ | str, list | If a list of two, seperates the element and it's confidence |
where | str | Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' | |
where_params | str | Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) | |
if | str | A condition that will determine whether the action runs or not as a whole. |
Select characters from the left of the input. Using a negative length will reverse the side of select, ie select right.
wrangles:
- select.left:
input: Column
output: Result
length: 3
Column | Result | |||
---|---|---|---|---|
pudding | → | pud |
Parameter | Required | Data Type | Notes |
---|---|---|---|
input | ✓ | str, list | |
output | str, list | If omitted, overwrites input | |
length | ✓ | int | Number of characters to include |
where | str | Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' | |
where_params | str | Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) | |
if | str | A condition that will determine whether the action runs or not as a whole. |
Select a numbered element of a list (zero indexed)
wrangles:
- select.list_element:
input: Col1
output: Second Element
element: 2 # Zero Indexed
default: F
Col1 | Third Element | |||
---|---|---|---|---|
['A', 'B', 'C'] | → | C | ||
['D', 'E'] | → | F |
Parameter | Required | Data Type | Notes |
---|---|---|---|
input | ✓ | str, list | |
output | str, list | If omitted, overwrites input | |
element | int | The numbered element of the list to select. Starts from zero. If omitted, defaults to 0 | |
default | str | Set the default value to return if the specified element doesn't exist. | |
where | str | Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' | |
where_params | str | Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) | |
if | str | A condition that will determine whether the action runs or not as a whole. |
Select characters from the right of the input. Using a negative length will reverse the side of select, ie select left.
wrangles:
- select.right:
input: Column
output: Result
length: 3
Column | Result | |||
---|---|---|---|---|
pudding | → | ing |
Parameter | Required | Data Type | Notes |
---|---|---|---|
input | ✓ | str, list | |
output | str, list | If omitted, overwrites input | |
length | ✓ | int | Number of characters to include |
where | str | Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' | |
where_params | str | Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) | |
if | str | A condition that will determine whether the action runs or not as a whole. |
Select a random sample of rows.
wrangles:
- select.sample:
rows: 2
| → |
|
wrangles:
- select.sample:
rows: .25
| → |
|
Parameter | Required | Data Type | Notes |
---|---|---|---|
rows | ✓ | int, number, str | If a whole number, will select that number of rows. If a decimal between 0 and 1 will select that fraction of the rows e.g. 0.1 => 10% of rows will be returned. Can only be a string if it is a valid number e.g. "two". |
if | str | A condition that will determine whether the action runs or not as a whole. |
Select characters from the middle of the input.
wrangles:
- select.substring:
input: Column
output: Result
start: 2
length: 3
Column | Result | |||
---|---|---|---|---|
pudding | → | udd |
wrangles:
- select.substring:
input: Column
output: Result
start: 2
Column | Result | |||
---|---|---|---|---|
pudding | → | udding |
wrangles:
- select.substring:
input: Column
output: Result
length: 3
Column | Result | |||
---|---|---|---|---|
pudding | → | pud |
Parameter | Required | Data Type | Notes |
---|---|---|---|
input | ✓ | str, list | |
output | str, list | If omitted, overwrites input | |
start | int | The position of the first character to select. If ommited will start from the beginning and length must be provided. | |
length | int | The length of the string to select. If ommited will select to the end of the string and start must be provided. | |
where | str | Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' | |
where_params | str | Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) | |
if | str | A condition that will determine whether the action runs or not as a whole. |
Return the last n rows
wrangles:
- select.tail:
n: 2
| → |
|
Parameter | Required | Data Type | Notes |
---|---|---|---|
n | ✓ | integer | Number of rows to return |
if | str | A condition that will determine whether the action runs or not as a whole. |
Select the first option if it exceeds a given threshold, else the second option.
wrangles:
- select.threshold:
input:
- Col1
- Col2
output: Result
threshold: .77
Col1 | Col2 | Result | |||
---|---|---|---|---|---|
['A', 0.6] | ['B', 0.79] | → | B |
Parameter | Required | Data Type | Notes |
---|---|---|---|
input | ✓ | list | |
output | ✓ | str | |
threshold | ✓ | float | Threshold above which to choose the first option, otherwise the second |
where | str | Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' | |
where_params | str | Variables to use in conjunctions with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) | |
if | str | A condition that will determine whether the action runs or not as a whole. |