Consider that we have a dataset that represents some purchases. Products that have been bought together have the same basket ID.

When a purchased product is edited (e.g. because the wrong price was entered at first), the original record is not replaced. Instead, a new record is created for EVERY product in that basket, and a new basket ID is assigned to the purchase.

For example, consider a purchase of a bottle of milk and a chocolate:

    Product   Price BasketID   PreviousBasketID
0   Milk      2     1234       Null
1   Chocolate 3     1234       Null

Let's say that we'd like to edit the price of chocolate. Then the dataset would be:

    Product   Price BasketID   PreviousBasketID
0   Milk      2     1234       Null
1   Chocolate 3     1234       Null
2   Milk      2     5678       1234
3   Chocolate 4     5678       1234

Is there a way to keep only the latest version of the basket (i.e. BasketID = 5678) and get rid of any previous versions?

lifetea

1 Answer

You can remove any rows whose BasketID appears in the PreviousBasketID column, since those baskets have been superseded by a later version.

Something like:

df = df[~df["BasketID"].isin(df["PreviousBasketID"])]

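Here ~ is the bitwise NOT operator; applied to a boolean Series it inverts the mask element-wise, so only rows whose BasketID never appears in PreviousBasketID are kept.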
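To make the filter concrete, here is a minimal sketch that rebuilds the example table from the question and applies the same mask. The variable names superseded and latest are only illustrative, and the sketch assumes the Null entries are stored as missing values (None/NaN) rather than as the string "Null".

```python
import pandas as pd

# Rebuild the example from the question; None stands in for Null.
df = pd.DataFrame({
    "Product": ["Milk", "Chocolate", "Milk", "Chocolate"],
    "Price": [2, 3, 2, 4],
    "BasketID": [1234, 1234, 5678, 5678],
    "PreviousBasketID": [None, None, 1234, 1234],
})

# True for rows whose basket was later replaced by a newer basket.
superseded = df["BasketID"].isin(df["PreviousBasketID"])

# Keep only the rows that were never superseded.
latest = df[~superseded]
print(latest)
```

With this data only rows 2 and 3 (BasketID 5678) remain. The same mask should also handle a basket that was edited several times, because every intermediate BasketID shows up as the PreviousBasketID of its successor.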

Alex