Pandas functions within Wrangles are currently under development and therefore do not possess all the functionality of pandas or other Wrangles. See below for details.
Pandas functions within recipes let users call the pandas Python package directly from a recipe, without writing custom code or a Python script. Pandas offers a wide range of functions, but only some of them work in a recipe. Any pandas function that does not work in a recipe can still be used within a custom function.
¶ pandas.drop_duplicates
wrangles:
  - pandas.drop_duplicates: {}
| Part Number | Item |
| --- | --- |
| 123456 | ball bearing |
| 789123 | angle grinder |
| 456789 | screwdriver |
| 123456 | ball bearing |

→

| Part Number | Item |
| --- | --- |
| 123456 | ball bearing |
| 789123 | angle grinder |
| 456789 | screwdriver |
This example does not use any parameters. See the pandas.drop_duplicates documentation for the available function parameters.
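For comparison, the recipe above corresponds roughly to the following plain pandas call. This is a minimal sketch that rebuilds the example table by hand; it does not show how Wrangles invokes pandas internally.

```python
import pandas as pd

# Same rows as the input table in the example above
df = pd.DataFrame(
    {
        "Part Number": [123456, 789123, 456789, 123456],
        "Item": ["ball bearing", "angle grinder", "screwdriver", "ball bearing"],
    }
)

# With no arguments, drop_duplicates compares every column
# and keeps the first occurrence of each repeated row
deduplicated = df.drop_duplicates()
print(deduplicated)
```

The repeated "ball bearing" row is dropped and the first three rows remain in their original order.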
¶ pandas.groupby
Group DataFrame using a mapper or by a Series of columns.
wrangles:
  - pandas.groupby:
      parameters:
        by: Product Type
| Product Type | Description |
| --- | --- |
| bearings | 14mm skf radial ball bearing |
| hardware | 1/4-20x3" machine screw |
| bearings | 3"odx2.5"id thrust bearing |
| hardware | m6x35mm stainless steel bolt |

→

| Product Type | Description |
| --- | --- |
| bearings | 14mm skf radial ball bearing |
| bearings | 3"odx2.5"id thrust bearing |
| hardware | 1/4-20x3" machine screw |
| hardware | m6x35mm stainless steel bolt |
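In plain pandas, `groupby` returns a GroupBy object rather than a DataFrame, so the grouped table shown above has to be materialized somehow. The sketch below is one way to reproduce that output by concatenating the groups in key order. It is an assumption for illustration only and may not match what Wrangles does internally.

```python
import pandas as pd

# Same rows as the input table in the example above
df = pd.DataFrame(
    {
        "Product Type": ["bearings", "hardware", "bearings", "hardware"],
        "Description": [
            "14mm skf radial ball bearing",
            '1/4-20x3" machine screw',
            '3"odx2.5"id thrust bearing',
            "m6x35mm stainless steel bolt",
        ],
    }
)

# Iterating a GroupBy yields (key, sub-DataFrame) pairs,
# in sorted key order by default (sort=True)
groups = [group for _, group in df.groupby("Product Type")]

# Stack the groups back into a single DataFrame, one group after another
grouped = pd.concat(groups)
print(grouped)
```

Rows keep their original order within each group, so the two "bearings" rows come first, followed by the two "hardware" rows.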
| Parameter | Required | Data Type | Notes |
| --- | --- | --- | --- |
| parameters | ✓ | dictionary | A dictionary of all the parameters needed for the function |
| by | ✓ | str, list | Mapping, function, str, or iterable to be used for grouping |
| where | | str | Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
| where_params | | str | Variables to use in conjunction with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
| if | | str | A condition that determines whether the action runs at all. |
More parameters for this function can be found in the pandas.groupby documentation.
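The `where` and `where_params` notes above refer to SQLite placeholder syntax. The sketch below uses Python's built-in `sqlite3` module to show the two placeholder styles (`?` and `:name`) with the example criteria from the table. The table name `data` and the placeholder names `number` and `text` are hypothetical, and this only illustrates the syntax, not how Wrangles evaluates `where`.

```python
import sqlite3

# In-memory database with a small hypothetical table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (column1 INTEGER, column2 TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(123, "xyz"), (456, "abc"), (789, "def")],
)

# Positional placeholders: each ? is filled in order from a sequence
positional = conn.execute(
    "SELECT * FROM data WHERE column1 = ? OR column2 = ? ORDER BY column1",
    (123, "abc"),
).fetchall()

# Named placeholders: each :name is filled from a dictionary
named = conn.execute(
    "SELECT * FROM data WHERE column1 = :number OR column2 = :text ORDER BY column1",
    {"number": 123, "text": "abc"},
).fetchall()

conn.close()
print(positional)
print(named)
```

Both queries select the same two rows. Named placeholders tend to be easier to read when a criteria string uses several variables.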
¶ pandas.sample
Selects a random sample from the dataframe.
wrangles:
  - pandas.sample:
      parameters:
        n: 2
| Voltage | Current | Resistance |
| --- | --- | --- |
| 12v | 6a | 2ohm |
| 18v | 6a | 3ohm |
| 24v | 12a | 2ohm |
| 36v | 12a | 3ohm |

→

| Voltage | Current | Resistance |
| --- | --- | --- |
| 12v | 6a | 2ohm |
| 24v | 12a | 2ohm |
| Parameter | Required | Data Type | Notes |
| --- | --- | --- | --- |
| n | | integer | The number of rows to be selected, defaults to 1. |
| where | | str | Filter the data to only apply the wrangle to certain rows using an equivalent to a SQL where criteria, such as column1 = 123 OR column2 = 'abc' |
| where_params | | str | Variables to use in conjunction with where. This allows the query to be parameterized. This uses sqlite syntax (? or :name) |
| if | | str | A condition that determines whether the action runs at all. |
See pandas.sample for more parameters and information on this function.
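As a rough plain pandas equivalent of the recipe above, the sketch below samples two rows from the example table. The `random_state` argument is a standard pandas parameter used here only to make the sketch repeatable. Which rows are returned depends on the random draw, so they will not necessarily match the output table shown above.

```python
import pandas as pd

# Same rows as the input table in the example above
df = pd.DataFrame(
    {
        "Voltage": ["12v", "18v", "24v", "36v"],
        "Current": ["6a", "6a", "12a", "12a"],
        "Resistance": ["2ohm", "3ohm", "2ohm", "3ohm"],
    }
)

# Draw two distinct rows at random (sampling is without replacement by default)
sampled = df.sample(n=2, random_state=0)
print(sampled)
```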
Pandas functions within recipes are restricted to those that return a dataframe, or a column of the same length as the input dataframe. Functions which return a series or an object require a custom function in order to work.
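To make the restriction concrete, the sketch below checks the return type of a few common pandas calls. The helper `fits_recipe_restriction` is hypothetical and reflects one reading of the rule above (a DataFrame, or a Series with as many rows as the input). It is not a check that Wrangles itself exposes.

```python
import pandas as pd


def fits_recipe_restriction(result, original):
    """Hypothetical check: is the result a DataFrame, or a column
    with the same number of rows as the original DataFrame?"""
    if isinstance(result, pd.DataFrame):
        return True
    if isinstance(result, pd.Series):
        return len(result) == len(original)
    return False


df = pd.DataFrame({"Item": ["a", "b", "a"], "Quantity": [1, 2, 3]})

checks = {
    # Returns a DataFrame
    "drop_duplicates": fits_recipe_restriction(df.drop_duplicates(), df),
    # Returns a Series with one value per input row
    "cumsum": fits_recipe_restriction(df["Quantity"].cumsum(), df),
    # Returns a Series with one value per distinct item (shorter than the input)
    "value_counts": fits_recipe_restriction(df["Item"].value_counts(), df),
    # Returns a single number
    "sum": fits_recipe_restriction(df["Quantity"].sum(), df),
}
print(checks)
```

Under this reading, calls like `value_counts` and `sum` are the kind that would need a custom function.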
Some functions may work on the dataframe as a whole but not on individual columns. If this occurs, try running the function on the entire dataframe and verify that the results are what was intended.