1

I have a R DataFrame df with the following content:

Serial N         year         current
   B              10            14
   B              10            16
   C              12            11
   D              40            20
   B              11            15
   C              12             9

I would like to add a column which compute the average current if serial number and the year are the same serial number and years. I would like to have something like this

Serial N         year         current    Average 
   B              10            14          15
   B              10            16          15
   C              12            13          12
   D              40            20          20
   B              11            15          15
   C              12            11          12

I wrote this

for (i in unique(df$Serial_N))
   {
       for (j in unique (df$year))
         {  data=subset(df,Serial_N==i & year==j)   
            df$Average<-mean(data$current)  
          }
    }     

When I run it, I have the following error

Error in `$<-.data.frame`(`*tmp*`, "Average", value = NaN) : 
replacement has 1 row, data has 0 

What is problem? How can I fix it?

user3841581
  • 2,637
  • 11
  • 47
  • 72

1 Answers1

1

We can use data.table. Convert the 'data.frame' to 'data.table' (setDT(df)), grouped by 'SerialN', and 'year', we get the mean of 'current' and assign it to a new column 'Average'.

library(data.table)
setDT(df)[, Average := mean(current) ,.(SerialN, year)]
akrun
  • 874,273
  • 37
  • 540
  • 662
  • This is what I get :could not find function "setDT" – user3841581 Feb 23 '16 at 18:52
  • @user3841581 Sorry, I didn't specify the library. You have to call `library(data.table)` after installing it using `install.packages('data.table')` – akrun Feb 23 '16 at 18:55
  • I installed it and it seems to work. Thank you – user3841581 Feb 23 '16 at 21:21
  • Sorry, I thought it was marked as duplicate – user3841581 Feb 23 '16 at 21:43
  • Just by curiosity, what is I wanted the average row wise; for example if we had current 1 and current 2. and I wanted the average of the two row wise given that they have the same serial number? – user3841581 Feb 23 '16 at 21:56
  • @user3841581 You can use `rowMeans` i.e. `setDT(df)[, Average := rowMeans(.SD), .SDcols= current1:current2]` – akrun Feb 23 '16 at 22:01
  • still from data.table? wow; it works like the pandas dataFrames. Thanks again – user3841581 Feb 23 '16 at 22:03
  • @user3841581 Or use `Reduce` i.e. `setDT(df)[, Average := Reduce(`+`, .SD)/length(.SD), .SDcols= current1:current2]` – akrun Feb 23 '16 at 22:06
  • When I try using rowsMeans, I have the following: rowMeans(.SD) : 'x' must be numeric ? Does it not handle nan? – user3841581 Feb 23 '16 at 22:14
  • @user3841581 Because `.SD` selects all the columns . If there are non-numeric columns, we have to specify the numeric columns with `.SDcols=..` – akrun Feb 23 '16 at 22:18
  • But all the values I have are numeric or they have nan. My current 1 and current 2 are all numeric or have na – user3841581 Feb 23 '16 at 22:34
  • @user3841581 Please check the `str(df)` It could be that the column class is different. – akrun Feb 24 '16 at 03:40
  • when I tried the method you initially mentioned (setDT(df)[, Average := mean(current) ,.(SerialN, year)]) on another dataset, the resulting average contain only NA, even though current has numeric values. Any ideas on what is the problem? – user3841581 Mar 13 '16 at 15:24
  • @user3841581 It may be because there are missing values in the data.set. Use `na.rm=TRUE` i.e. `setDT(df)[, Average := mean(current, na.rm=TRUE), .(SerialN, year)]` – akrun Mar 13 '16 at 15:27
  • 1
    It works perfectly. Thanks a lot – user3841581 Mar 13 '16 at 15:50
  • just by curiousity again, what if I just wanted to apply the mean on only Serial N with value B only? – user3841581 Mar 16 '16 at 16:58
  • @user3841581 You may need to specify the 'i' i.e. `setDT(df)[SerialN=='B', mean(current)]` – akrun Mar 16 '16 at 18:39
  • Thank you. This cover the foundation on the usage of table. Thanks a lot – user3841581 Mar 16 '16 at 19:48