Dppd verbs

Pandas DataFrame methods

Within a dp(), all pandas.DataFrame methods and accessors work as you’d expect them to [1].


>>> dp(mtcars).rank().pd.head(5)
  name   mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
0  18.0  19.5  15.0  13.5  13.0  21.5   9.0   6.0   9.5  26.0  21.5  25.5
1  19.0  19.5  15.0  13.5  13.0  21.5  12.0  10.5   9.5  26.0  21.5  25.5
2   5.0  24.5   6.0   6.0   7.0  20.0   7.0  23.0  25.5  26.0  21.5   4.0
3  13.0  21.5  15.0  18.0  13.0   8.5  16.0  26.0  25.5  10.0   8.0   4.0
4  14.0  15.0  25.5  27.5  21.0  10.5  19.0  10.5   9.5  10.0   8.0  12.5

You can even continue working with Series within the dp and convert them back to a DataFrame later on:

>>> dp(mtcars).set_index('name').sum().loc[X > 15].to_frame().pd
mpg    642.900
cyl    198.000
disp  7383.100
hp    4694.000
drat   115.090
wt     102.952
qsec   571.160
gear   118.000
carb    90.000


concat combines this DataFrame and another one.


>>> len(mtcars)
32
>>> len(dp(mtcars).concat(mtcars).pd)
64
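
For comparison, here is a minimal plain-pandas sketch of the same operation. It assumes the verb behaves like pandas.concat applied to the two frames; the small `cars` frame is a made-up stand-in for mtcars.

```python
import pandas as pd

# Hypothetical stand-in for mtcars, to keep the example self-contained.
cars = pd.DataFrame({"name": ["a", "b", "c"], "cyl": [6, 4, 8]})

# pandas.concat stacks the rows of both frames and keeps the original index labels.
doubled = pd.concat([cars, cars])

print(len(cars), len(doubled))
```

Note that the index labels repeat after concatenation unless the index is reset.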


unselect drops by column specification [2].


>>> dp(mtcars).unselect(lambda x: len(x) <= 3).pd.head(1)
        name   disp  drat   qsec  gear  carb
0  Mazda RX4  160.0   3.9  16.46     4     4
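
The callable in the example above receives each column name and returns True for columns to drop. A rough plain-pandas sketch of that behaviour, assuming nothing beyond what the output shows (the `cars` frame and the `is_short` helper are illustrative):

```python
import pandas as pd

# Hypothetical one-row stand-in for mtcars.
cars = pd.DataFrame(
    {"name": ["Mazda RX4"], "mpg": [21.0], "cyl": [6], "disp": [160.0]}
)

def is_short(column_name):
    # Same predicate as the lambda in the example: names of at most 3 characters.
    return len(column_name) <= 3

# Keep every column for which the predicate is False.
kept = [c for c in cars.columns if not is_short(c)]
result = cars.loc[:, kept]

print(list(result.columns))
```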


distinct selects unique rows, possibly only considering a column specification.


>>> dp(mtcars).distinct('cyl').pd
                name   mpg  cyl   disp   hp  drat    wt   qsec  vs  am  gear  carb
0          Mazda RX4  21.0    6  160.0  110  3.90  2.62  16.46   0   1     4     4
2         Datsun 710  22.8    4  108.0   93  3.85  2.32  18.61   1   1     4     1
4  Hornet Sportabout  18.7    8  360.0  175  3.15  3.44  17.02   0   0     3     2
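
The output above keeps the first row seen for each value of cyl. In plain pandas, a comparable result can be sketched with drop_duplicates; this is an assumed equivalent, not necessarily how the verb is implemented, and the `cars` frame is a made-up excerpt.

```python
import pandas as pd

# Hypothetical five-row excerpt in the spirit of mtcars.
cars = pd.DataFrame(
    {
        "name": ["Mazda RX4", "Mazda RX4 Wag", "Datsun 710",
                 "Hornet 4 Drive", "Hornet Sportabout"],
        "cyl": [6, 6, 4, 6, 8],
    }
)

# Consider only 'cyl' when deciding whether a row is a duplicate;
# the first occurrence of each value is kept.
unique_cyl = cars.drop_duplicates(subset=["cyl"])

print(unique_cyl)
```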


transassign creates a new DataFrame from this one, keeping only the newly assigned columns (and the index).


>>> dp(mtcars).head(5).set_index('name').transassign(kwh = X.hp * 0.74).pd
Mazda RX4           81.40
Mazda RX4 Wag       81.40
Datsun 710          68.82
Hornet 4 Drive      81.40
Hornet Sportabout  129.50
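
Judging by the output above, only the assigned column and the index survive. A plain-pandas sketch under that assumption (the `cars` frame is illustrative):

```python
import pandas as pd

# Hypothetical two-row stand-in for mtcars, indexed by name.
cars = pd.DataFrame(
    {"name": ["Mazda RX4", "Datsun 710"], "hp": [110, 93]}
).set_index("name")

# Build a fresh frame that holds only the derived column,
# reusing the index of the source frame.
kwh_only = pd.DataFrame({"kwh": cars["hp"] * 0.74}, index=cars.index)

print(kwh_only)
```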


add_count adds the group count to each row.

This is a good example verb to study before writing your own.


>>> dp(mtcars).groupby('cyl').add_count().ungroup().sort_index().head(5).select(['name','cyl','count']).pd
                name  cyl  count
0          Mazda RX4    6      7
1      Mazda RX4 Wag    6      7
2         Datsun 710    4     11
3     Hornet 4 Drive    6      7
4  Hornet Sportabout    8     14
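
The per-row group count can also be sketched in plain pandas with a grouped transform. This is an assumed equivalent for illustration, not the verb's actual implementation; the `cars` frame is a made-up excerpt.

```python
import pandas as pd

# Hypothetical five-row excerpt in the spirit of mtcars.
cars = pd.DataFrame(
    {
        "name": ["Mazda RX4", "Mazda RX4 Wag", "Datsun 710",
                 "Hornet 4 Drive", "Hornet Sportabout"],
        "cyl": [6, 6, 4, 6, 8],
    }
)

# transform("size") broadcasts each group's row count back onto its rows.
with_count = cars.assign(count=cars.groupby("cyl")["cyl"].transform("size"))

print(with_count)
```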


astype quickly converts the type of the columns selected by a column specification.


>>> dp(mtcars).astype(['-qsec', '-name'], int).pd.head()
                name  mpg  cyl  disp   hp  drat  wt   qsec  vs  am  gear  carb
0          Mazda RX4   21    6   160  110     3   2  16.46   0   1     4     4
1      Mazda RX4 Wag   21    6   160  110     3   2  17.02   0   1     4     4
2         Datsun 710   22    4   108   93     3   2  18.61   1   1     4     1
3     Hornet 4 Drive   21    6   258  110     3   3  19.44   1   0     3     1
4  Hornet Sportabout   18    8   360  175     3   3  17.02   0   0     3     2
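
In the example above, the leading minus signs appear to exclude qsec and name from the conversion. A plain-pandas sketch of the same effect, with the exclusion written out by hand (the `cars` frame is illustrative):

```python
import pandas as pd

# Hypothetical one-row stand-in for mtcars.
cars = pd.DataFrame(
    {"name": ["Mazda RX4"], "mpg": [21.0], "drat": [3.9], "qsec": [16.46]}
)

# Convert every column except the excluded ones.
excluded = {"qsec", "name"}
targets = [c for c in cars.columns if c not in excluded]

# Casting float to int truncates towards zero (3.9 becomes 3).
converted = cars.astype({c: int for c in targets})

print(converted)
```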

[1] Except for the deprecated pandas.DataFrame.select(), which is shadowed by our verb select.
[2] ‘drop’ is already a pandas method name: pandas.DataFrame.drop().


Turn columns into pandas.Categoricals. By default, the categories are the unique values in the order they appear in the DataFrame. Pass None to use sorted unique values (i.e. the pandas.Categorical default behaviour).
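
The difference between the two category orders can be seen with plain pandas. This sketch only illustrates the orderings described above and does not call the verb itself; the series `s` is made up.

```python
import pandas as pd

s = pd.Series(["b", "c", "a", "b"])

# Categories in order of first appearance.
in_order = pd.Categorical(s, categories=s.unique())

# pandas default: categories are the sorted unique values.
sorted_default = pd.Categorical(s)

print(list(in_order.categories))
print(list(sorted_default.categories))
```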


Does what it says on the tin.


Convert categorical columns into ‘regression columns’, i.e. X with values a,b,c becomes three binary columns X-a, X-b, X-c which are True exactly where X was a, etc.
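
A comparable transformation exists in plain pandas as get_dummies. The sketch below assumes the column naming described above (X-a, X-b, X-c) and is not the verb's own implementation.

```python
import pandas as pd

df = pd.DataFrame({"X": ["a", "b", "c", "a"]})

# One boolean column per distinct value, named "<column>-<value>".
dummies = pd.get_dummies(df["X"], prefix="X", prefix_sep="-", dtype=bool)

print(dummies)
```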

rename_columns / reset_columns

Wraps df.columns = … into an inline call. Accepts a list, a function or other callable (called once for each column with the old column name), or a string (for single-column DataFrames). Also accepts None, which resets the columns to list(X.columns) (useful to work around a categorical-columns-can’t-add-any bug).
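
For reference, this is what the list and callable forms look like when written out as plain df.columns assignments. The sketch shows the underlying pandas operation only; the frame and the `rename` callable are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2]})

# List form: replace all column names at once.
from_list = df.copy()
from_list.columns = ["x", "y"]

# Callable form: compute each new name from the old one.
rename = str.upper  # hypothetical renaming function
from_callable = df.copy()
from_callable.columns = [rename(name) for name in df.columns]

print(list(from_list.columns), list(from_callable.columns))
```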


Shows the head and the tail of the DataFrame at once.


Sort via the natsort package.
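
Natural sorting orders embedded numbers by value rather than character by character. The following standard-library sketch illustrates the idea only; it is not the natsort package, and `natural_key` is a hypothetical helper.

```python
import re

def natural_key(text):
    # Split into alternating non-digit and digit runs,
    # comparing digit runs as integers.
    parts = re.split(r"(\d+)", text)
    return [int(p) if p.isdigit() else p.lower() for p in parts]

names = ["chr10", "chr2", "chr1"]

plain = sorted(names)                     # lexicographic order
natural = sorted(names, key=natural_key)  # numeric-aware order

print(plain)
print(natural)
```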


Calls display(X) for inline display in Jupyter notebooks.