Edit: Here's an attempt to better communicate my question and provide a reproducible example. Would someone please (1) explain what's wrong with my approach, (2) offer any sensible solution?
My data look like this:
DF.PSoft <- structure(list(Last = structure(c(4L, 3L, 2L, 1L, 5L), .Label = c("Carruthers",
"Fester", "Mauger", "Schofield", "Vanhoy"), class = "factor"),
Salary = structure(c(5L, 3L, 1L, 2L, 4L), .Label = c("121991.0",
"142403.0", "47305.0", "47740.0", "49172.0"), class = "factor"),
Dept1 = structure(c(2L, 1L, 2L, 1L, 2L), .Label = c("215086",
"221230"), class = "factor"), Distrib1 = structure(c(2L,
1L, 4L, 1L, 3L), .Label = c("100.0", "50.0", "75.0", "90.0"
), class = "factor"), Dept2 = structure(c(2L, 1L, 3L, 1L,
4L), .Label = c("", "026112", "215086", "221704"), class = "factor"),
Distrib2 = structure(c(3L, 1L, 4L, 1L, 2L), .Label = c("0.0",
"15.0", "40.0", "5.0"), class = "factor"), Dept3 = structure(c(3L,
1L, 3L, 1L, 2L), .Label = c("", "215086", "221704"), class = "factor"),
Distrib3 = structure(c(2L, 1L, 3L, 1L, 2L), .Label = c("0.0",
"10.0", "5.0"), class = "factor")), .Names = c("Last", "Salary",
"Dept1", "Distrib1", "Dept2", "Distrib2", "Dept3", "Distrib3"
), row.names = c(NA, -5L), class = "data.frame")
>DF.PSoft
Last Salary Dept1 Distrib1 Dept2 Distrib2 Dept3 Distrib3
# 1 Schofield 49172.0 221230 50.0 026112 40.0 221704 10.0
# 2 Mauger 47305.0 215086 100.0 0.0 0.0
# 3 Fester 121991.0 221230 90.0 215086 5.0 221704 5.0
# 4 Carruthers 142403.0 215086 100.0 0.0 0.0
# 5 Vanhoy 47740.0 221230 75.0 221704 15.0 215086 10.0
The data describe people working in various departments. Schofield spends 50% of his time in Dept. 221230, 40% in 026112, and 10% in 221704. The real data set has 10 total Dept and Distrib columns; in this example, I'm working with 3 Dept/Distrib columns.
I want to reshape the data into a new frame that shows Last, Salary, the "Dept" column containing 215086 (if any "Dept" column matches), and the corresponding "Distrib" column:
>DF.Desired
Last Salary Dept Distrib
# 1 Mauger 47305.0 215086 100.0
# 2 Fester 121991.0 215086 5.0
# 3 Carruthers 142403.0 215086 100.0
# 4 Vanhoy 47740.0 215086 10.0
How can I do this? I've been frustrated in trying to figure this out. Here's what I have so far.
Show all indices where data == "215086".
test <- which(DF.PSoft=="215086", arr.in=TRUE) >test row col # [1,] 2 3 # [2,] 4 3 # [3,] 3 5 # [4,] 5 7Create an empty DF that will hold data I pull in next steps.
DF.blank <- data.frame(Last=character(dim(test)[1]), Salary=character(dim(test)[1]), Dept=character(dim(test)[1]), Distrib=character(dim(test)[1]), stringsAsFactors=FALSE) >DF.blank Last Salary Dept Distrib # 1 # 2 # 3 # 4Fill the empty DF with the data I want. Subset the entire data set (DF.PSoft) using indices from 'test', which searches for all entries == "215086". Get DF.PSoft row numbers as listed in 'test', get DF.PSoft columns 1, 2, whichever columns contain "215086" (this pulls in "215086" from the appropriate "Dept" column of DF.PSoft), and the column to the right of that positive search result (pulls in appropriate Distrib). If I'm thinking about this correctly, this method will work no matter how many "Dept" or "Distrib" columns I have in my file. I'd like to preserve that capability.
for(i in 1:dim(test)[1]){ DF.blank[i,] <- DF.PSoft[test[i,1], c(1,2, test[i,2], test[i,2]+1)] }Infuriatingly, I get this result:
>DF.blank Last Salary Dept Distrib # 1 3 3 1 1 # 2 1 2 1 1 # 3 2 1 3 4 # 4 5 4 2 2Interestingly, printing the DF.PSoft subset appears to work as expected:
for (i in 1:dim(test)[1]) { print(DF.PSoft[test[i, 1], c(1, 2, test[i, 2], test[i, 2] + 1)]) } # Last Salary Dept1 Distrib1 # 2 Mauger 47305.0 215086 100.0 # Last Salary Dept1 Distrib1 # 4 Carruthers 142403.0 215086 100.0 # Last Salary Dept2 Distrib2 # 3 Fester 121991.0 215086 5.0 # Last Salary Dept3 Distrib3 # 5 Vanhoy 47740.0 215086 10.0
Many thank for your advice, and sorry again that I started out with a muddled question.