A safer pandas IndexSlice
Pandas's IndexSlice has one major paper cut that has cut me time and time again. Let's set up a small example dataframe with a hierarchical index that is missing 2 on the second level, and just enough data to display well.
import pandas as pd

df = pd.DataFrame(
    data=[["A", "B"], ["C", "D"]],
    index=pd.MultiIndex.from_tuples([(1, 1), (1, 3)]),
)
     0  1
1 1  A  B
  3  C  D
All good when accessing 3 on the second level.
idx = pd.IndexSlice # just one time setup
df.loc[idx[:, 3], :]
     0  1
1 3  C  D
But when accessing 2 on the second level...
df.loc[idx[:, 2], :]
KeyError: 2
...KeyError, that's not very good.
It makes sense, but now I have to ask for forgiveness. That is fine if it happens once, as in this example, but it only happens once here because I produced a single example; in real life it happened a lot more.
Asking for forgiveness on every df.loc[...], or adding a membership test for every level, is very tedious...
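To make the tedium concrete, here is a minimal sketch of both workarounds on the example dataframe. The names rows and rows_checked, and the choice of df.iloc[0:0] as the empty fallback, are my own assumptions for illustration, not something pandas prescribes.

```python
import pandas as pd

df = pd.DataFrame(
    data=[["A", "B"], ["C", "D"]],
    index=pd.MultiIndex.from_tuples([(1, 1), (1, 3)]),
)
idx = pd.IndexSlice

# Option 1: ask for forgiveness around the lookup.
try:
    rows = df.loc[idx[:, 2], :]
except KeyError:
    # Fall back to an empty frame that keeps the original columns.
    rows = df.iloc[0:0]

# Option 2: test membership on the relevant level before looking up.
if 2 in df.index.get_level_values(1):
    rows_checked = df.loc[idx[:, 2], :]
else:
    rows_checked = df.iloc[0:0]

print(rows.empty, rows_checked.empty)
```

Either version works, but it has to be repeated for every lookup and every level.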
However, indexing with a slice bound by the same value on each side will not raise anything and instead returns an empty dataframe.
df.loc[idx[:, 2:2], :]
Empty DataFrame
Columns: [0, 1]
Index: []
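One caveat worth checking against your own data: as far as I know, label slices on a MultiIndex need the index to be lexsorted, which the example index happens to be. The sketch below uses a hypothetical dataframe named unsorted, with the same rows in reverse order, to show the slice raising until the index is sorted.

```python
import pandas as pd

idx = pd.IndexSlice

# Hypothetical variant of the example with the rows out of order.
unsorted = pd.DataFrame(
    data=[["C", "D"], ["A", "B"]],
    index=pd.MultiIndex.from_tuples([(1, 3), (1, 1)]),
)

try:
    unsorted.loc[idx[:, 2:2], :]
    raised = False
except pd.errors.UnsortedIndexError:
    # Slicing on the second level is refused on an unsorted index.
    raised = True

# After sorting, the same slice returns an empty dataframe again.
empty = unsorted.sort_index().loc[idx[:, 2:2], :]

print(raised, empty.empty)
```

So the slice trick is best applied after a sort_index() if the index order is not guaranteed.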
Now, this is pretty easy to abuse with a class modeled on pandas's own IndexSlice, which replaces the scalar arguments with slices from and to those arguments.
class _SaferIndexSlice(object):
    """Reimplementation of pd.IndexSlice, used for indexing.

    Difference being it forces scalar indexes into slices bound by the scalar.
    This prevents indexing KeyError and instead returns empty DataFrame/Series.
    """

    def __getitem__(self, args):
        return tuple(
            slice(v, v) if not isinstance(v, slice) else v for v in args
        )


SaferIndexSlice = _SaferIndexSlice()
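To see what the class actually hands to df.loc, the sketch below prints the key it builds, with no pandas involved. It also shows one limitation: a single key such as sidx[2] arrives as a bare value rather than a tuple, so iterating over it fails. The _TupleSafeIndexSlice subclass is a hypothetical variant of mine that wraps such keys first; treat it as one possible fix, not part of the original recipe.

```python
class _SaferIndexSlice:
    """Same logic as the class above, repeated so this snippet runs alone."""

    def __getitem__(self, args):
        return tuple(
            slice(v, v) if not isinstance(v, slice) else v for v in args
        )


class _TupleSafeIndexSlice(_SaferIndexSlice):
    """Hypothetical variant that also accepts a single, non-tuple key."""

    def __getitem__(self, args):
        # Python passes a bare key (e.g. obj[2]) as-is, not as a 1-tuple.
        if not isinstance(args, tuple):
            args = (args,)
        return super().__getitem__(args)


sidx = _SaferIndexSlice()
print(sidx[:, 2])  # (slice(None, None, None), slice(2, 2, None))

try:
    sidx[2]
    single_key_failed = False
except TypeError:
    # A bare int is not iterable, so the generator expression fails.
    single_key_failed = True

tidx = _TupleSafeIndexSlice()
print(tidx[2])  # (slice(2, 2, None),)
```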
And if we let her rip, no more KeyError...
sidx = SaferIndexSlice
df.loc[sidx[:, 2], :]
Empty DataFrame
Columns: [0, 1]
Index: []
...while still working exactly the same for the other cases.
df.loc[sidx[:, 3], :]
     0  1
1 3  C  D
df.loc[sidx[:, 2:2], :]
Empty DataFrame
Columns: [0, 1]
Index: []