
I have a very large dataframe, part of which looks like this:

col1 col2
A 3
A 4
B 5
B 7

I know that all rows with the same col1 value should have the same col2 value, and any deviation is due to measurement uncertainty. I want to replace each col2 value with the mean of col2 across all rows that share its col1 value, resulting in something like this:

col1 col2
A 3.5
A 3.5
B 6
B 6

The real dataset is too large to do this manually for each unique col1 value. Does anyone have an idea how to automate this? Thanks in advance.

Steven96

1 Answer


You can use group_by() and mutate() from dplyr (loaded as part of the tidyverse) to achieve this. First group by col1, then use mutate() to write the group means into col2 (assign the result back to df if you want to keep it):

library(tidyverse)
col1 <- c("A", "A", "B", "B")
col2 <- c(3,4,5,7)

df <- data.frame(col1, col2)

df %>%
  group_by(col1) %>%
  mutate(col2 = mean(col2))
#> # A tibble: 4 × 2
#> # Groups:   col1 [2]
#>   col1   col2
#>   <chr> <dbl>
#> 1 A       3.5
#> 2 A       3.5
#> 3 B       6  
#> 4 B       6

Created on 2022-09-06 with reprex v2.0.2
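If you would rather not load the tidyverse, base R's ave() does the same grouped replacement in one line. A minimal sketch, assuming the same column names and toy data as in the question:

```r
# Same toy data as in the question
df <- data.frame(col1 = c("A", "A", "B", "B"),
                 col2 = c(3, 4, 5, 7))

# ave() splits col2 by col1, applies FUN to each group, and returns
# a vector aligned with the original rows, so it can overwrite col2 directly
df$col2 <- ave(df$col2, df$col1, FUN = mean)

df$col2
# col2 is now 3.5, 3.5, 6, 6
```

Like the dplyr version, ave() keeps the original row order and works for any number of distinct col1 values; unlike it, the result stays a plain data.frame rather than a grouped tibble.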

Noah