1

I have this dataframe:

df<-data.frame(a=c("a1","a2","a3","a4","b1","b2","b3","b4","a1","a2","a3","a4","b1","b2","b3","b4"), b=c("x1","x2","x3","total","x1","x2","x3","total", "x1","x2","x3","total","x1","x2","x3","total"), reg=c("A","A","A","A","A","A","A","A","B", "B","B","B","B","B","B","B"), c=c(1:16))

which looks like:

    a     b reg  c
1  a1    x1   A  1
2  a2    x2   A  2
3  a3    x3   A  3
4  a4 total   A  4
5  b1    x1   A  5
6  b2    x2   A  6
7  b3    x3   A  7
8  b4 total   A  8
9  a1    x1   B  9
10 a2    x2   B 10
11 a3    x3   B 11
12 a4 total   B 12
13 b1    x1   B 13
14 b2    x2   B 14
15 b3    x3   B 15
16 b4 total   B 16

columns 'a', 'b' and 'reg' are categorical variables. What I want to do is to create a new column which divides x(i), where i=1,2,3 with 'total' (x(i)/total) for each category in reg' and ina' columns.

Can someone help me with this ?

user17880
  • 183
  • 1
  • 11

2 Answers2

1

Assuming your df is ordered like your example .

library(zoo)
df$NEW=df$c
df$NEW[df$b!='total']=NA
df$NEW=na.locf(df$NEW,fromLast=T,na.rm=F)
df$NEW=df$c/df$NEW

df
    a     b reg  c       NEW
1  a1    x1   A  1 0.2500000
2  a2    x2   A  2 0.5000000
3  a3    x2   A  3 0.7500000
4  a4 total   A  4 1.0000000
5  b1    x1   A  5 0.6250000
6  b2    x2   A  6 0.7500000
7  b3    x2   A  7 0.8750000
8  b4 total   A  8 1.0000000
9  a1    x1   B  9 0.7500000
10 a2    x2   B 10 0.8333333
11 a3    x2   B 11 0.9166667
12 a4 total   B 12 1.0000000
13 b1    x1   B 13 0.8125000
14 b2    x2   B 14 0.8750000
15 b3    x2   B 15 0.9375000
16 b4 total   B 16 1.0000000

Base on Op's explanation ,Below are working for the real data of he/she.(From OP)

data1$shares<-NA 
id<-which(data1$Occupation=='Total') 
data1$shares[id]<-data1$2014[id]
data1$shares=na.locf(data1$shares,fromLast=T,na.rm=F) 
data1$shares=data1$2014/data1$shares
BENY
  • 317,841
  • 20
  • 164
  • 234
  • for some reason did not work. Is it because there are instances where `total` is `NAN`, whereas column 'c' might also include 'NAN' values ? – user17880 Oct 11 '17 at 15:14
  • @user17880 if total is NA , how you due with that – BENY Oct 11 '17 at 15:28
  • ideally, in this example, it should return 'NA'. – user17880 Oct 11 '17 at 15:33
  • @user17880 I will adding `df$NEW[df$b=='total'&is.na(df$NEW)]=-1` and at the end adding `df$NEW[df$NEW<0]=NA` – BENY Oct 11 '17 at 15:36
  • 1
    Still, did not work. But, I found this modification to your code to be working for my purpose: 'data1$shares<-NA id<-which(data1$Occupation=='Total') data1$shares[id]<-data1$`2014`[id] data1$shares=na.locf(data1$shares,fromLast=T,na.rm=F) data1$shares=data1$`2014`/data1$shares' – user17880 Oct 11 '17 at 15:44
  • @user17880 as long as problem is solved :) cheer , a kind reminder , nexttime provide a reproducible example, or You will have the same issue , *sample data solution dose not work for the real data* – BENY Oct 11 '17 at 15:46
  • @user17880 also If you want , you can mark this as answer.Or maybe modify your original post to the case you mentioned , I will either delete this answer or modify it – BENY Oct 11 '17 at 15:49
  • 1
    No, is better to modify your own answer. I think you deserve most of the credit :) – user17880 Oct 11 '17 at 15:50
  • @user17880 just edit and do mention this is from you :-) – BENY Oct 11 '17 at 15:53
0

Just using R base:

df<-data.frame(a=c("a1","a2","a3","a4","b1","b2","b3","b4","a1","a2","a3","a4","b1","b2","b3","b4"), b=c("x1","x2","x3","total","x1","x2","x3","total", "x1","x2","x3","total","x1","x2","x3","total"), reg=c("A","A","A","A","A","A","A","A","B", "B","B","B","B","B","B","B"), c=c(1:16))

totals <- data.frame(To=df[df$b=='total',4])
totals$from <- c(1, totals$To[1:nrow(totals)-1]+1)
df$NEW = df$c/totals[findInterval(x=df$c, vec=c(rbind(totals$from, totals$to))), 1]
df

Output:

    a     b reg  c       NEW
1  a1    x1   A  1 0.2500000
2  a2    x2   A  2 0.5000000
3  a3    x3   A  3 0.7500000
4  a4 total   A  4 1.0000000
5  b1    x1   A  5 0.6250000
6  b2    x2   A  6 0.7500000
7  b3    x3   A  7 0.8750000
8  b4 total   A  8 1.0000000
9  a1    x1   B  9 0.7500000
10 a2    x2   B 10 0.8333333
11 a3    x3   B 11 0.9166667
12 a4 total   B 12 1.0000000
13 b1    x1   B 13 0.8125000
14 b2    x2   B 14 0.8750000
15 b3    x3   B 15 0.9375000
16 b4 total   B 16 1.0000000
Patricio Moracho
  • 717
  • 11
  • 15