In certain cases, assigning to a DataFrame created with .dropna() emits the infamous SettingWithCopyWarning. So I followed what this SO answer suggests and inspected the ._is_view and ._is_copy attributes:

In [1]: import pandas as pd                                                                                                                                                                                                                   

In [2]: import numpy as np                                                                                                                                                                                                                    

In [3]: df1 = pd.DataFrame([[1.0, np.nan, 2, np.nan], 
   ...:                    [2.0, 3.0, 5, np.nan], 
   ...:                    [np.nan, 4.0, 6, np.nan]], columns=list("ABCD")); df1                                                                                                                                                              
Out[3]: 
     A    B  C   D
0  1.0  NaN  2 NaN
1  2.0  3.0  5 NaN
2  NaN  4.0  6 NaN

In [4]: df2 = df1.dropna(axis="columns", how="all")   

In [5]: df2._is_view, df2._is_copy                                                                                                                                                                                                            
Out[5]: (False, <weakref at 0x7f56e8ecfd60; to 'DataFrame' at 0x7f56e916f9d0>)

In [6]: df2["A"] = df2["A"].fillna(df2["B"])                                                                                                                                                                                                  
<ipython-input-6-9c99ac454f90>:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df2["A"] = df2["A"].fillna(df2["B"])                                                                                                                                                                                        

However, _is_copy is None if you actually call .copy() (misleading attribute name?), and doing so also removes the warning:

In [17]: df3 = df1.dropna(axis="columns", how="all").copy()                                                                                                                                                                                   

In [18]: df3._is_view, df3._is_copy                                                                                                                                                                                                           
Out[18]: (False, None)

In [19]: df3["A"] = df3["A"].fillna(df3["B"])                                                                                                                                                                                                 

But, finally, with a different df1, the .dropna() result does have _is_copy set to None, and there is no warning:

In [20]: df1 = pd.DataFrame([[1.0, np.nan], [2.0, 3.0]], columns=["A", "B"]); df1                                                                                                                                                             
Out[20]: 
     A    B
0  1.0  NaN
1  2.0  3.0

In [21]: df2 = df1.dropna(axis="columns", how="all")                                                                                                                                                                                          

In [22]: df2._is_view, df2._is_copy                                                                                                                                                                                                           
Out[22]: (False, None)

In [23]: df2["A"] = df2["A"].fillna(df2["B"])                                                                                                                                                                                                 

Can anybody explain what the SettingWithCopyWarning means in this context, and why I still get it even though _is_view is False in all cases?

I am particularly worried about this, in light of what the pandas docs say:

Sometimes a SettingWithCopy warning will arise at times when there’s no obvious chained indexing going on. These are the bugs that SettingWithCopy is designed to catch!

astrojuanlu

2 Answers

The SettingWithCopyWarning is emitted because the assignment is potentially ambiguous. When you assign to df2["A"] and pandas knows that df2 was derived from df1, it is not clear whether the change should also affect df1, so pandas warns and leaves it to the user to check. pandas tracks this with _is_copy, so _is_view has nothing to do with it. If you create an explicit copy with copy(), pandas knows you do not expect changes in df2 to affect df1, and therefore does not warn.
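As a minimal sketch of the explicit-copy route, the snippet below repeats the assignment from the question on a .copy() of the dropna result and records any warnings. The names original, caught and copy_warnings are introduced here only for illustration, and whether the non-copied variant warns at all depends on the pandas version, so only the copied case is checked.

```python
import warnings

import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    [[1.0, np.nan, 2, np.nan],
     [2.0, 3.0, 5, np.nan],
     [np.nan, 4.0, 6, np.nan]],
    columns=list("ABCD"),
)
original = df1.copy()  # snapshot used to confirm that df1 is never modified

# Explicit copy: df3 is declared to be independent of df1.
df3 = df1.dropna(axis="columns", how="all").copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df3["A"] = df3["A"].fillna(df3["B"])

# Match on the class name so the check does not depend on where a given
# pandas version exposes the warning class.
copy_warnings = [
    w for w in caught if w.category.__name__ == "SettingWithCopyWarning"
]

print(len(copy_warnings))    # number of SettingWithCopyWarning instances recorded
print(df3["A"].tolist())     # column A after filling its NaN from column B
print(df1.equals(original))  # whether df1 is still identical to the snapshot
```

The assignment lands only in df3, which is what calling copy() states explicitly.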

As for your observation that dropna is not consistent in setting _is_copy, it has to do with how the dataframes were created, and how dropna was implemented.

The following is the first df1:

# First df1
In [3]: df1 = pd.DataFrame([[1.0, np.nan, 2, np.nan],
   ...:                     [2.0, 3.0, 5, np.nan],
   ...:                     [np.nan, 4.0, 6, np.nan]], columns=list("ABCD"))

In [4]: df1
Out[4]:
     A    B  C   D
0  1.0  NaN  2 NaN
1  2.0  3.0  5 NaN
2  NaN  4.0  6 NaN

In [5]: a = df1.dropna(axis="columns", how="all")

In [6]: a._is_copy
Out[6]: <weakref at 0x7f94e78427c0; to 'DataFrame' at 0x7f94d5ad06d0>

In [7]: a
Out[7]:
     A    B  C
0  1.0  NaN  2
1  2.0  3.0  5
2  NaN  4.0  6

Notice that one column is dropped.

Following is the second df1:

# Second df1
In [8]: df1 = pd.DataFrame([[1.0, np.nan], [2.0, 3.0]], columns=["A", "B"])

In [9]: df1
Out[9]:
     A    B
0  1.0  NaN
1  2.0  3.0

In [10]: b = df1.dropna(axis="columns", how="all")

In [11]: b
Out[11]:
     A    B
0  1.0  NaN
1  2.0  3.0

In [12]: b._is_copy # No output, i.e. is None

Notice that none of the columns were all NaNs, thus no columns were dropped.

Following is an excerpt of the source code for dropna:

# permalink: https://github.com/pandas-dev/pandas/blob/951239398789b077689371d30906cab5b9248f4e/pandas/core/frame.py#L6014
...

    if np.all(mask):
        result = self.copy()
    else:
        result = self.loc(axis=axis)[mask]

...

Apparently, if nothing along the chosen axis meets the criteria to be dropped, a copy of the dataframe is returned. Otherwise, a dataframe indexed with .loc is returned, and that is the one that carries the _is_copy reference.
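One way to avoid depending on which branch dropna takes is to always copy. The sketch below is not pandas' implementation: drop_all_nan_columns is a hypothetical helper that mimics dropna(axis="columns", how="all") with a boolean mask and then copies unconditionally. _is_copy is a private attribute and may not exist in every pandas version, so it is read with getattr and a default.

```python
import numpy as np
import pandas as pd


def drop_all_nan_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop all-NaN columns and always return a copy."""
    keep = df.notna().any(axis=0)  # True for columns with at least one value
    return df.loc[:, keep].copy()


# First frame from the question: column D is entirely NaN and gets dropped.
dropped = drop_all_nan_columns(pd.DataFrame(
    [[1.0, np.nan, 2, np.nan],
     [2.0, 3.0, 5, np.nan],
     [np.nan, 4.0, 6, np.nan]],
    columns=list("ABCD"),
))

# Second frame from the question: no column is entirely NaN.
untouched = drop_all_nan_columns(
    pd.DataFrame([[1.0, np.nan], [2.0, 3.0]], columns=["A", "B"])
)

# Private attribute, read defensively because it may be absent.
print(list(dropped.columns), getattr(dropped, "_is_copy", None))
print(list(untouched.columns), getattr(untouched, "_is_copy", None))
```

Because the copy is unconditional, both results behave the same way regardless of whether a column was dropped.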

tnwei

It is interesting to see that with df2 = df1.dropna(axis="columns", how="all"):

  1. If something is dropped, df2._is_copy references df1, and a warning is emitted if you try to modify df2 in some way.
  2. If nothing is dropped, df2._is_copy is None and there is no warning.

This difference must come from what dropna() does under the hood.

See the comment in the post: pandas-still-getting-settingwithcopywarning-even-after-using-loc

Roy