May I check what this line does?

df = df[~df[runner].str.contains("[a-z]").fillna(False)]

Does this code remove all rows that contain a string starting with a letter? My second question: what is the purpose of ~? What does it do?

Thanks

hpaulj
vitalstrike82

1 Answer

This code filters a DataFrame with a boolean mask.

The regex "[a-z]" matches any lowercase character from 'a' to 'z' anywhere in the string (not 'starting with', as this would be "^[a-z]").
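To illustrate the difference between "contains" and "starts with", here is a minimal sketch. The sample values are made up for illustration and are not from the question's data.

```python
import pandas as pd

# Made-up sample values, assumed for illustration only.
s = pd.Series(["abc", "1a2", "123", "ABC"], dtype=object)

# "[a-z]" matches a lowercase letter anywhere in the string.
anywhere = s.str.contains("[a-z]")

# "^[a-z]" matches only when the string starts with a lowercase letter.
at_start = s.str.contains("^[a-z]")

print(anywhere.tolist())  # [True, True, False, False]
print(at_start.tolist())  # [True, False, False, False]
```

Note that "ABC" matches neither pattern, because the match is case-sensitive by default.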

The .fillna(False) means every NaN is treated as False in the mask.

The ~ inverts the mask, so the rows that did not match the pattern are returned.

Be aware that rows containing NaN are therefore kept in the result. If this is not intended, use .fillna(True) instead.
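A small sketch of both variants may make this clearer. The column name and values are assumptions for illustration. The `.astype(bool)` step is not in the original line; it is added here only so that `~` acts on a true boolean mask regardless of pandas version.

```python
import numpy as np
import pandas as pd

runner = "runner"  # hypothetical column name, not from the question
df = pd.DataFrame(
    {runner: pd.Series(["12", "a7", np.nan, "45b", "99"], dtype=object)}
)

# True where a lowercase letter is found, False where not, NaN where missing.
matches = df[runner].str.contains("[a-z]")

# NaN -> False, inverted to True: rows with NaN are kept.
keep_nan = ~matches.fillna(False).astype(bool)

# NaN -> True, inverted to False: rows with NaN are dropped.
drop_nan = ~matches.fillna(True).astype(bool)

print(df[keep_nan][runner].tolist())  # ['12', nan, '99']
print(df[drop_nan][runner].tolist())  # ['12', '99']
```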

tweini
Worth noting: it is possibly more idiomatic to use the `na` parameter, `~df[runner].str.contains("[a-z]", na=False)`, rather than calling `fillna` as a subsequent step. – jpp Oct 07 '19 at 00:29
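As a rough check of the comment's suggestion, the sketch below compares the two spellings on made-up data. The `.astype(bool)` on the `fillna` variant is an added assumption to keep the mask boolean across pandas versions.

```python
import numpy as np
import pandas as pd

# Made-up sample values, assumed for illustration only.
s = pd.Series(["12", "a7", np.nan, "45b"], dtype=object)

# Two-step version: fill missing results after matching.
mask_fillna = ~s.str.contains("[a-z]").fillna(False).astype(bool)

# One-step version: tell str.contains what to return for missing values.
mask_na = ~s.str.contains("[a-z]", na=False)

print(mask_fillna.tolist() == mask_na.tolist())  # True
```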