Pandas functions within Wrangles are currently under development and therefore do not possess all the functionality of pandas or other Wrangles. See below for details.
Pandas functions within recipes let users call the pandas Python package directly from a recipe, without writing custom code or a Python script. Pandas offers a wide range of functions, but only some of them work in a recipe. Any pandas function that does not work in a recipe will still work within a custom function.
Note: The examples below cover only a small handful of pandas functions. Many of them have a native Wrangles counterpart, which is noted below each example where applicable.
wrangles:
  - pandas.drop_duplicates: {}
This example does not use any parameters. See pandas.drop_duplicates for the function's parameters.
See format.remove_duplicates for the native Wrangles equivalent.
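For comparison, the following is a minimal sketch of the plain pandas call that this recipe step is assumed to correspond to. The dataframe and its column names are hypothetical and only serve to show the effect of calling drop_duplicates with no parameters.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame(
    {
        "Product": ["Hammer", "Hammer", "Wrench"],
        "Price": [10, 10, 15],
    }
)

# With no arguments, drop_duplicates compares all columns and keeps
# the first occurrence of each fully identical row.
deduplicated = df.drop_duplicates()

print(deduplicated)
```

The original row labels are kept, so the remaining rows retain their index values from the input.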
wrangles:
  - pandas.groupby:
      parameters:
        by: Product Type
Parameter | Required | Data Type | Notes |
---|---|---|---|
parameters | ✓ | dictionary | A dictionary of the parameters to pass to the pandas function |
by | ✓ | str, list | mapping, function, str, or iterable to be used for grouping |
where | | str | Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
where_params | | str | Variables to use in conjunction with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
if | | str | A condition that will determine whether the action runs or not as a whole. |
More parameters for this function can be found in the pandas.groupby documentation.
See select.group_by for the native Wrangles equivalent.
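To illustrate what the by parameter accepts, here is a sketch in plain pandas using a single column name and a list of column names. The sample data, the Brand and Quantity columns, and the sum aggregation are assumptions made for illustration. The recipe above does not specify an aggregation, so this is not necessarily the output the Wrangle produces.

```python
import pandas as pd

# Hypothetical sample data; only "Product Type" comes from the recipe above.
df = pd.DataFrame(
    {
        "Product Type": ["Tools", "Tools", "Fasteners"],
        "Brand": ["A", "B", "A"],
        "Quantity": [1, 2, 5],
    }
)

# by given as a single column name (str).
# The sum aggregation is an assumption chosen for this sketch.
by_type = df.groupby(by="Product Type", as_index=False).agg({"Quantity": "sum"})

# by given as a list of column names.
by_type_and_brand = df.groupby(
    by=["Product Type", "Brand"], as_index=False
).agg({"Quantity": "sum"})

print(by_type)
print(by_type_and_brand)
```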
wrangles:
  - pandas.sample:
      parameters:
        n: 2
Parameter | Required | Data Type | Notes |
---|---|---|---|
n | | integer | The number of rows to be selected, defaults to 1. |
where | | str | Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
where_params | | str | Variables to use in conjunction with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
if | | str | A condition that will determine whether the action runs or not as a whole. |
See pandas.sample for more parameters and information on this function.
See select.sample for the native Wrangles equivalent.
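As a rough point of comparison, the equivalent plain pandas call might look like the sketch below. The sample data is hypothetical, and random_state is not part of the recipe above; it is added here only so the sketch gives repeatable results.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"Product": ["Hammer", "Wrench", "Pliers", "Saw"]})

# n=2 selects two rows at random, without replacement by default.
# random_state is an addition for repeatability, not part of the recipe.
sampled = df.sample(n=2, random_state=0)

print(sampled)
```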
Pandas functions within recipes are restricted to those that return a dataframe, or a column of the same length as the input dataframe. Functions which return a series or another object need a custom function in order to work.
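One way to get a feel for this restriction is to inspect what a pandas call returns before putting it in a recipe. The sketch below is not part of Wrangles: fits_restriction is a hypothetical helper, and it reads "a column of the same length" as a pandas Series whose length matches the dataframe, which is an interpretation of the rule above.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame(
    {"Product": ["Hammer", "Hammer", "Wrench"], "Price": [10, 10, 15]}
)

# A few pandas calls and the kind of object each one returns.
results = {
    "drop_duplicates": df.drop_duplicates(),        # DataFrame
    "str.upper": df["Product"].str.upper(),         # Series, same length as df
    "value_counts": df["Product"].value_counts(),   # Series, different length
    "sum": df["Price"].sum(),                       # single scalar value
}


def fits_restriction(result, original):
    """Hypothetical check: a DataFrame, or a Series as long as the input."""
    if isinstance(result, pd.DataFrame):
        return True
    return isinstance(result, pd.Series) and len(result) == len(original)


for name, result in results.items():
    print(name, type(result).__name__, fits_restriction(result, df))
```

Under this reading, the last two calls would be candidates for a custom function.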
Some functions may work on the dataframe as a whole but not on individual columns. If this occurs, try running the function on the entire dataframe and verify that the results are what was intended.