A safer pandas IndexSlice

2024-07-27

Pandas's IndexSlice has one major paper cut that has nicked me time and time again. Let's set up a small example DataFrame with a hierarchical index that is missing 2 on its second level, and just enough data to display well.

import pandas as pd

df = pd.DataFrame(
    data=[["A", "B"], ["C", "D"]],
    index=pd.MultiIndex.from_tuples([(1, 1), (1, 3)])
)

     0  1
1 1  A  B
  3  C  D

All good when accessing 3 on the second level.

idx = pd.IndexSlice  # just a one-time setup

df.loc[idx[:, 3], :]

     0  1
1 3  C  D

But when accessing 2 on the second level...

df.loc[idx[:, 2], :]

KeyError: 2

...a KeyError. That's not very good.

It makes sense, since there is no 2 on that level, but now I have to ask for forgiveness. That is fine if it happens once, as in this example, but it only happens once here because I wrote a single example; in real code it happened a lot more.

Asking for forgiveness on every df.loc[...], or adding a membership test for every level, is very tedious...
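
For reference, the two tedious workarounds look roughly like this on the example frame. This is a sketch: the `rows_*` names and the empty-frame fallback via `df.iloc[0:0]` are my own choices, not something pandas prescribes.

```python
import pandas as pd

df = pd.DataFrame(
    data=[["A", "B"], ["C", "D"]],
    index=pd.MultiIndex.from_tuples([(1, 1), (1, 3)]),
)
idx = pd.IndexSlice

# Ask for forgiveness: wrap the lookup and fall back to an empty frame.
try:
    rows_eafp = df.loc[idx[:, 2], :]
except KeyError:
    rows_eafp = df.iloc[0:0]  # same columns, no rows

# Look before you leap: test membership in the level first.
# get_level_values(1) holds only labels present in rows; index.levels[1] can keep unused ones.
if 2 in df.index.get_level_values(1):
    rows_lbyl = df.loc[idx[:, 2], :]
else:
    rows_lbyl = df.iloc[0:0]

print(rows_eafp.empty, rows_lbyl.empty)  # True True
```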

However, indexing with a slice bounded by the same value on both sides will not raise anything; it returns an empty DataFrame instead.

df.loc[idx[:, 2:2], :]

Empty DataFrame
Columns: [0, 1]
Index: []
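
The reason this works is that pd.IndexSlice does no lookup itself: it just hands back the key it was given, with `:` and `2:2` turned into `slice` objects, and it is `.loc` that raises for a missing scalar but not for an empty slice. A quick check, where the hand-written tuple is equivalent to `idx[:, 2:2]`:

```python
import pandas as pd

df = pd.DataFrame(
    data=[["A", "B"], ["C", "D"]],
    index=pd.MultiIndex.from_tuples([(1, 1), (1, 3)]),
)
idx = pd.IndexSlice

print(idx[:, 2])    # (slice(None, None, None), 2)
print(idx[:, 2:2])  # (slice(None, None, None), slice(2, 2, None))

# Passing the tuple by hand is the same as using idx[:, 2:2].
print(df.loc[(slice(None), slice(2, 2)), :].empty)  # True
```

One caveat I'd hedge on: label slicing on a MultiIndex level generally expects a lexsorted index (the example frame already is one), so calling `df.sort_index()` first is a cheap precaution for frames built in arbitrary order.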

Now, this is pretty easy to abuse with a small class modeled on pandas's own IndexSlice, which replaces every scalar argument v with the slice v:v (from v to v).

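class _SaferIndexSlice:
    """Reimplementation of pd.IndexSlice, used for indexing.

    Difference being it forces scalar indexes into slices bound by the scalar.
    This prevents indexing KeyError and instead returns empty DataFrame/Series.
    """

    def __getitem__(self, args):
        # sidx[:, 2] arrives as a tuple; a single key like sidx[2] does not.
        if isinstance(args, tuple):
            return tuple(self._bound(v) for v in args)
        return self._bound(args)

    @staticmethod
    def _bound(v):
        # Slices and list-likes (e.g. [1, 3]) pass through; a scalar v becomes v:v.
        if isinstance(v, slice) or pd.api.types.is_list_like(v):
            return v
        return slice(v, v)

SaferIndexSlice = _SaferIndexSlice()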
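
Going through `__getitem__` is what lets the class accept the familiar `[:, 2]` syntax. For comparison, a plain-function version of the same translation (the name `bounded` is hypothetical, not a pandas API) has to spell out `slice(None)` by hand, since `:` is only valid inside square brackets:

```python
import pandas as pd


def bounded(*keys):
    """Hypothetical helper: turn each scalar key v into slice(v, v).

    Slices and list-likes are passed through unchanged.
    """
    return tuple(
        k if isinstance(k, slice) or pd.api.types.is_list_like(k) else slice(k, k)
        for k in keys
    )


df = pd.DataFrame(
    data=[["A", "B"], ["C", "D"]],
    index=pd.MultiIndex.from_tuples([(1, 1), (1, 3)]),
)

key = bounded(slice(None), 2)  # no `:` shorthand outside brackets
print(key)                     # (slice(None, None, None), slice(2, 2, None))
print(df.loc[key, :].empty)    # True
```

Leaving list-likes untouched matters in either form: `slice([1, 3], [1, 3])` would not be a valid label slice.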

And if we let her rip, no more KeyError...

sidx = SaferIndexSlice

df.loc[sidx[:, 2], :]

Empty DataFrame
Columns: [0, 1]
Index: []

...while still behaving exactly the same in the other cases.

df.loc[sidx[:, 3], :]

     0  1
1 3  C  D

df.loc[sidx[:, 2:2], :]

Empty DataFrame
Columns: [0, 1]
Index: []
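
The payoff is that calling code can branch on `.empty` instead of catching KeyError. A sketch looping over a few second-level labels, with the key that `sidx[:, v]` produces written out by hand so the snippet runs on its own:

```python
import pandas as pd

df = pd.DataFrame(
    data=[["A", "B"], ["C", "D"]],
    index=pd.MultiIndex.from_tuples([(1, 1), (1, 3)]),
)

found = {}
for v in [1, 2, 3]:
    # Equivalent to sidx[:, v]: any first-level label, second level bounded at v.
    rows = df.loc[(slice(None), slice(v, v)), :]
    if rows.empty:
        continue  # no KeyError to catch, just nothing to do
    found[v] = rows[0].tolist()

print(found)  # {1: ['A'], 3: ['C']}
```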
