Dppd verbs
Pandas DataFrame methods
Within a dp(), all pandas.DataFrame methods and accessors work as you’d expect them to [1].
Example:
>>> dp(mtcars).rank().pd.head(5)
name mpg cyl disp hp drat wt qsec vs am gear carb
0 18.0 19.5 15.0 13.5 13.0 21.5 9.0 6.0 9.5 26.0 21.5 25.5
1 19.0 19.5 15.0 13.5 13.0 21.5 12.0 10.5 9.5 26.0 21.5 25.5
2 5.0 24.5 6.0 6.0 7.0 20.0 7.0 23.0 25.5 26.0 21.5 4.0
3 13.0 21.5 15.0 18.0 13.0 8.5 16.0 26.0 25.5 10.0 8.0 4.0
4 14.0 15.0 25.5 27.5 21.0 10.5 19.0 10.5 9.5 10.0 8.0 12.5
You can even continue working with Series within the dp and convert them back to a DataFrame later on:
>>> dp(mtcars).set_index('name').sum().loc[X > 15].to_frame().pd
0
mpg 642.900
cyl 198.000
disp 7383.100
hp 4694.000
drat 115.090
wt 102.952
qsec 571.160
gear 118.000
carb 90.000
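In plain pandas, the role that X plays in the filter above (the object at that point in the chain, here the Series produced by sum()) can be filled by a callable passed to .loc. This is a sketch on a small toy frame, assuming X stands for the current Series at that step, which is what the example suggests; it is not dppd's implementation.

```python
import pandas as pd

# Toy frame with two mtcars-style rows (illustrative data only)
df = pd.DataFrame({
    "name": ["Mazda RX4", "Datsun 710"],
    "mpg": [21.0, 22.8],
    "vs": [0, 1],
})

# The lambda receives the Series returned by sum(), like X in the dp() chain
result = df.set_index("name").sum().loc[lambda s: s > 15].to_frame()
print(result)
```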
concat
concat combines this DataFrame with another one.
Example:
>>> len(mtcars)
32
>>> len(dp(mtcars).concat(mtcars).pd)
64
unselect
unselect drops the columns selected by a column specification [2].
Example:
>>> dp(mtcars).unselect(lambda x: len(x) <= 3).pd.head(1)
name disp drat qsec gear carb
0 Mazda RX4 160.0 3.9 16.46 4 4
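To map the callable column specification above back to plain pandas, a rough equivalent is to apply the predicate to each column name and pass the matches to pandas.DataFrame.drop(). A sketch on a small toy frame, not dppd's implementation:

```python
import pandas as pd

# Toy frame with a few mtcars-style column names (illustrative data only)
df = pd.DataFrame({
    "name": ["Mazda RX4", "Datsun 710"],
    "mpg": [21.0, 22.8],
    "cyl": [6, 4],
    "disp": [160.0, 108.0],
    "carb": [4, 1],
})

# Evaluate the predicate on every column name, then drop the matches
predicate = lambda x: len(x) <= 3
to_drop = [c for c in df.columns if predicate(c)]
result = df.drop(columns=to_drop)
print(list(result.columns))  # ['name', 'disp', 'carb']
```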
distinct
distinct selects unique rows, optionally considering only the columns given by a column specification.
Example:
>>> dp(mtcars).distinct('cyl').pd
name mpg cyl disp hp drat wt qsec vs am gear carb
0 Mazda RX4 21.0 6 160.0 110 3.90 2.62 16.46 0 1 4 4
2 Datsun 710 22.8 4 108.0 93 3.85 2.32 18.61 1 1 4 1
4 Hornet Sportabout 18.7 8 360.0 175 3.15 3.44 17.02 0 0 3 2
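A comparable result in plain pandas comes from DataFrame.drop_duplicates() with subset: it keeps the first row for each value by default, so the original index labels survive, as in the output above. A sketch on toy data built from the first five rows shown in this page:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Mazda RX4", "Mazda RX4 Wag", "Datsun 710",
             "Hornet 4 Drive", "Hornet Sportabout"],
    "cyl": [6, 6, 4, 6, 8],
})

# keep='first' (the default) retains the first row seen for each cyl value
unique_cyl = df.drop_duplicates(subset="cyl")
print(unique_cyl.index.tolist())  # [0, 2, 4]
```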
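transassign
transassign creates a new DataFrame based on this one: as the example shows, the result keeps this DataFrame’s index but contains only the columns passed as keyword arguments.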
Example:
>>> dp(mtcars).head(5).set_index('name').transassign(kwh = X.hp * 0.74).pd
kwh
name
Mazda RX4 81.40
Mazda RX4 Wag 81.40
Datsun 710 68.82
Hornet 4 Drive 81.40
Hornet Sportabout 129.50
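A plain-pandas sketch of the same idea: build a fresh DataFrame that reuses the source index and holds only the newly computed columns. The helper name transassign_sketch is hypothetical; it mirrors the example above, not dppd’s code.

```python
import pandas as pd

def transassign_sketch(df: pd.DataFrame, **columns) -> pd.DataFrame:
    """Hypothetical helper: a new frame with df's index and only the given columns."""
    return pd.DataFrame(columns, index=df.index)

df = pd.DataFrame(
    {"hp": [110, 93, 175]},
    index=pd.Index(["Mazda RX4", "Datsun 710", "Hornet Sportabout"], name="name"),
)

# Only 'kwh' ends up in the result; the 'name' index is carried over
out = transassign_sketch(df, kwh=df["hp"] * 0.74)
print(out)
```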
add_count
add_count adds the group count to each row, as a new count column.
This is a good example verb to get started on writing your own.
Example:
>>> dp(mtcars).groupby('cyl').add_count().ungroup().sort_index().head(5).select(['name','cyl','count']).pd
name cyl count
0 Mazda RX4 6 7
1 Mazda RX4 Wag 6 7
2 Datsun 710 4 11
3 Hornet 4 Drive 6 7
4 Hornet Sportabout 8 14
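In plain pandas, a per-group row count can be broadcast back onto every row with groupby(...).transform('size'). A sketch on toy data; because it uses only the five rows shown above, the counts differ from the full mtcars output.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Mazda RX4", "Mazda RX4 Wag", "Datsun 710",
             "Hornet 4 Drive", "Hornet Sportabout"],
    "cyl": [6, 6, 4, 6, 8],
})

# transform('size') returns one value per row: the size of that row's group
df["count"] = df.groupby("cyl")["name"].transform("size")
print(df)
```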
as_type
as_type quickly converts the type of the columns selected by a column specification.
Example:
>>> dp(mtcars).astype(['-qsec', '-name'], int).pd.head()
name mpg cyl disp hp drat wt qsec vs am gear carb
0 Mazda RX4 21 6 160 110 3 2 16.46 0 1 4 4
1 Mazda RX4 Wag 21 6 160 110 3 2 17.02 0 1 4 4
2 Datsun 710 22 4 108 93 3 2 18.61 1 1 4 1
3 Hornet 4 Drive 21 6 258 110 3 3 19.44 1 0 3 1
4 Hornet Sportabout 18 8 360 175 3 3 17.02 0 0 3 2
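The specification ['-qsec', '-name'] above reads as "every column except qsec and name". A plain-pandas sketch of the same conversion on a toy frame (not dppd's implementation); note that astype(int) truncates floats toward zero, which is why drat 3.90 shows up as 3.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Mazda RX4", "Datsun 710"],
    "mpg": [21.0, 22.8],
    "drat": [3.90, 3.85],
    "qsec": [16.46, 18.61],
})

# '-col' entries exclude columns; everything else is converted
excluded = {"qsec", "name"}
cols = [c for c in df.columns if c not in excluded]
df[cols] = df[cols].astype(int)
print(df)
```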
[1] Except for the deprecated pandas.DataFrame.select(), which is shadowed by our verb select.
[2] ‘drop’ is already a pandas method name: pandas.DataFrame.drop().
categorize
Turn columns into pandas.Categoricals. By default, the categories are the unique values in the order they first appear in the DataFrame. Pass None to use the sorted unique values instead (i.e. the pandas.Categorical default behaviour).
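The two orderings can be compared with plain pandas.Categorical: categories in order of first appearance (pandas.unique preserves appearance order) versus the sorted default. A sketch of the difference, not dppd's implementation:

```python
import pandas as pd

values = pd.Series(["low", "high", "mid", "low"])

# Categories in order of first appearance
by_appearance = pd.Categorical(values, categories=pd.unique(values))
# pandas.Categorical default: sorted unique values
by_default = pd.Categorical(values)

print(list(by_appearance.categories))  # ['low', 'high', 'mid']
print(list(by_default.categories))     # ['high', 'low', 'mid']
```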
unique_in_order
Does what it says on the tin: returns the unique values in the order of their first appearance.
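A minimal pure-Python sketch of the idea, assuming hashable values: dict keys preserve insertion order (Python 3.7+), so dict.fromkeys drops repeats while keeping first appearances. The function name is hypothetical, not dppd's.

```python
def unique_in_order_sketch(values):
    """Hypothetical helper: unique values, ordered by first appearance."""
    return list(dict.fromkeys(values))

print(unique_in_order_sketch([3, 1, 3, 2, 1]))  # [3, 1, 2]
```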
binarize
Convert categorical columns into ‘regression columns’, i.e. a column X with values a, b, c becomes three binary columns X-a, X-b, X-c, each True exactly where X had that value.
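pandas.get_dummies produces the same kind of indicator columns: prefix_sep='-' gives the X-a naming used above, and dtype=bool requests True/False values. This is a plain-pandas sketch of the idea, not necessarily how binarize is implemented.

```python
import pandas as pd

df = pd.DataFrame({"X": pd.Categorical(["a", "c", "b", "a"])})

# One boolean column per category, named X-a, X-b, X-c
dummies = pd.get_dummies(df["X"], prefix="X", prefix_sep="-", dtype=bool)
print(dummies)
```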
rename_columns / reset_columns
Wraps df.columns = … into an inline call. Accepts either a list, a function, a callable (called once for each column with the old column), or a string (for single-column DataFrames). Also accepts None, which resets the columns to list(X.columns) (useful to work around a categorical-columns-can’t-add-any bug).
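For comparison, the plain-pandas assignment that rename_columns wraps, shown for the list case and for applying a function to each old column name. A sketch on a toy frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2]})

# List form: replace all column names at once
df.columns = ["x", "y"]

# Per-column form: apply a function to each old column name
df.columns = [name.upper() for name in df.columns]
print(list(df.columns))  # ['X', 'Y']
```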
ends
Heads and tails at once: the first and the last rows of the DataFrame in one result.
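A plain-pandas sketch of the idea, assuming ends simply stacks head and tail; the helper name and its n parameter are hypothetical. On frames shorter than 2 * n, this version would show some rows twice.

```python
import pandas as pd

def ends_sketch(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Hypothetical helper: first n rows followed by last n rows."""
    return pd.concat([df.head(n), df.tail(n)])

df = pd.DataFrame({"v": range(10)})
print(ends_sketch(df, 2))
```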
natsort
Sort via the natsort package.
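To show what natural ordering means, here is a minimal stdlib stand-in for the natsort package (not its API): runs of digits are compared as numbers instead of character by character.

```python
import re

def natural_key(text):
    """Split into digit and non-digit runs; digit runs compare numerically."""
    # re.split with a capture group alternates text, digits, text, ...
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", text)]

names = ["item10", "item2", "item1"]
print(sorted(names))                   # ['item1', 'item10', 'item2']
print(sorted(names, key=natural_key))  # ['item1', 'item2', 'item10']
```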
display
Calls display(X) for inline display in Jupyter notebooks.