0

I have a dataframe like that contains of 5 columns , I want to update one column based on the other 4 columns, the dataframe looks like that

from       via      to       x       y 
 3          2       13      in       out
 3          2       15      in       out
 3          2       21      in       out
13          2       3             
15          2       13     
21          2       13
1          12        2 
1          12        2
1          12        22
2          12        1      in
2          12        22     in      out
22         12        2    

the idea is to fill the column X depending on values on the other four columns, the sequence should be like that: i have to check if x and y have values , if yes, then i have to use the corresponding values of (from ,via) and compare it in all rows with the values of (to, via) if they are the same, so i have to assign the value of Y which is correponding to (from, via) to the column X at the row which has equal values of ( to, via) so at this example, I can see that (from=3, Via=2 has x and y values, so i will take the value of (from=3, Via=2) and compare it with the values of (to, via) in all rows, then I can assign the value of (y=out) at the rows which has (to=3, via=10)

the final result should be like that:

from       via      to       x       y 
 3          2       13      in       out
 3          2       15      in       out
 3          2       21      in      
13          2       3       out      
15          2       13      out
21          2       13
1          12        2      out 
1          12        2      out
1          12        22     out
2          12        1      in
2          12        22     in      out
22         12        2      out

how can i do that in pandas dataframe?

moataz ali
  • 21
  • 4
  • You should put in some code that sets up the data frame. Even pseudo-code of your algorithm would be helpful if you don't know how to write the code. You might read about iterating over rows in a data frame at https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas?rq=1 – David Gaertner Mar 11 '20 at 13:29
  • I am not sure I understand your question. The example you mentioned also talks about from=3, Via=10, but there is no row with via=10. As @DavidGaertner mentioned, can you post a algorithm or write the logic in a better way? – davidbilla Mar 11 '20 at 13:58
  • @davidbilla sorry, I wrote it wrongly, it should be via= 2, i edited it now, I think it will be more clear – moataz ali Mar 12 '20 at 13:02

1 Answers1

0

I cannot find exactly the same result, but I have used the described algo:

# identify the lines where a change will occur and store the index and the new  value
tmp = df.assign(origix=df.index).merge(df[~df['x'].isna() & ~df['y'].isna()], 
                                       left_on = ['from', 'via'], right_on = ['to', 'via'],
                                       suffixes=('_x', '')).set_index('origix')

# apply changes in dataframe:
df.loc[tmp.index, 'x'] = tmp['y']

it gives:

Serge Ballesta
  • 143,923
  • 11
  • 122
  • 252
  • Thanks Serge, it worked but it gives me the result as an object indexed with the original index, this object contains the values I need, but do you know how can I insert these values into the column of X? – moataz ali Mar 12 '20 at 12:59
  • The line `df.loc[tmp.index, 'x'] = tmp['y']` should to it. It does in my tests... – Serge Ballesta Mar 12 '20 at 13:27