I have a really large dataset that has some missing/bad data. I would like to recode the missing data using an if/else statement. Instead of assigning just one value to all of the missing/bad entries, I want to assign values based on a fraction.

For instance, for the df below, I want to assign 50% of the rows where df$col2 == "b" to BLUE and the other 50% to RED:

col1  col2
1     a
2     a
3     b
4     b

I know you can do:

ifelse(df$col2 == "b", "BLUE", df$col2)

but I want:

col1  col2
1     a
2     a
3     BLUE
4     RED

I'm looking to do the partitioning based on the condition.

Jaap
Alice Work
  • Adding a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) may make it easier for the StackOverflow community to help find an answer. – bouncyball May 31 '17 at 20:39

1 Answer

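You can do this by generating a random vector of "Red" and "Blue" values that is the same length as your data, and taking the replacement from it wherever a value is missing. Each replacement is drawn independently with prob=c(0.5,0.5), so the split is 50/50 on average rather than exactly.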

## Generate some random data with missing values
set.seed(2017)
a = sample(c("Red", "Blue"), 20, replace=TRUE)
a = ifelse(runif(20, 0, 1) < 0.12, NA, a)

## Now replace missing
a = ifelse(is.na(a), 
          sample(c("Red", "Blue"), length(a), replace=TRUE, prob=c(0.5,0.5)), a)
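
If you need the split to be exactly 50/50 among the matching rows (as in the question's expected output), one possible approach is to build the labels in fixed proportions and shuffle them, instead of sampling each one independently. This is only a sketch applied to the question's example data; the names idx, n, n_blue and labels are made up for illustration, and I am assuming that an odd count should give the extra row to RED.

```r
## Example data from the question
df <- data.frame(col1 = 1:4,
                 col2 = c("a", "a", "b", "b"),
                 stringsAsFactors = FALSE)

set.seed(2017)  # only so the shuffle is reproducible

## Positions of the rows that meet the condition
idx <- which(df$col2 == "b")
n   <- length(idx)

## Exactly floor(n/2) BLUE labels; the remaining rows get RED
## (assumption: with an odd n, the extra row goes to RED)
n_blue <- floor(n / 2)
labels <- rep(c("BLUE", "RED"), times = c(n_blue, n - n_blue))

## Shuffle the labels and write them back into the matching rows only
df$col2[idx] <- labels[sample.int(n)]
```

Which of the two "b" rows gets BLUE depends on the seed, but the counts are fixed: one BLUE and one RED here. For other fractions you could change how n_blue is computed, e.g. round(0.3 * n) for roughly 30% BLUE.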
G5W