R: assign incremental numbers to rows containing the same label

Question

Given a dataframe df as follows:

chrom   position    strand  value   label
 chr1      17432         -      0   romeo
 chr1      17433         -      0   romeo
 chr1      17434         -      0   romeo
 chr1      17435         -      0   romeo
 chr1      17409         -      1  juliet
 chr1      17410         -      1  juliet
 chr1      17411         -      1  juliet

For each label, I would like to number the rows that share that label, starting from 1, and put those numbers in a new column. (I don't just want to count them; the goal is to number them.) The output should look something like this:

chrom   position    strand  value   label  number
 chr1      17432         -      0   romeo       1
 chr1      17433         -      0   romeo       2
 chr1      17434         -      0   romeo       3
 chr1      17435         -      0   romeo       4
 chr1      17409         -      1  juliet       1
 chr1      17410         -      1  juliet       2
 chr1      17411         -      1  juliet       3

Is there a function or package that does the job?

Please don't use the dataframes tag. For R, you want data.frame. — Frank, Mar 14 '16 at 20:35

Vincent · Accepted Answer · 2014-02-09T23:50:00.190

dat <- read.table(header = TRUE, text = "chrom   position    strand  value   label
chr1       17432    -           0   romeo
chr1       17433    -           0   romeo
chr1       17434    -           0   romeo
chr1       17435    -           0   romeo
chr1       17409    -           1   juliet
chr1       17410    -           1   juliet
chr1       17411    -           1   juliet")

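#install.packages('dplyr')
library(dplyr)
# %.% was dplyr's early chaining operator; later versions use %>% instead
dat %.%
  group_by(label) %.%
  mutate(number = 1:n())   # 1..n() within each label group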

Source: local data frame [7 x 6]
Groups: label

  chrom position strand value  label number
1  chr1    17432      -     0  romeo      1
2  chr1    17433      -     0  romeo      2
3  chr1    17434      -     0  romeo      3
4  chr1    17435      -     0  romeo      4
5  chr1    17409      -     1 juliet      1
6  chr1    17410      -     1 juliet      2
7  chr1    17411      -     1 juliet      3
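For comparison, the same per-label numbering can be done in base R without extra packages. This is a minimal sketch, not part of the original answer: it rebuilds the example data with `data.frame()` and uses `ave()`, which applies `FUN` within each group of `label` and puts the results back in the original row order.

```r
# Rebuild the example data from the question
dat <- data.frame(
  chrom    = "chr1",
  position = c(17432, 17433, 17434, 17435, 17409, 17410, 17411),
  strand   = "-",
  value    = c(0, 0, 0, 0, 1, 1, 1),
  label    = c("romeo", "romeo", "romeo", "romeo", "juliet", "juliet", "juliet")
)

# ave() splits the row indices by label, runs seq_along() on each group
# (giving 1..k), and reassembles the result in the original row order
dat$number <- ave(seq_along(dat$label), dat$label, FUN = seq_along)
print(dat)
```

Unlike the dplyr version, this adds the column to `dat` directly instead of returning a new grouped object.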

I am sure there are many other possibilities in R. data.table is one (see example below). Not sure why I needed to add print() to show the result, however.

library(data.table)
dt <- data.table(dat)
# := adds the column to dt by reference and returns it invisibly, hence print()
print(dt[, number := 1:.N, by = label])

   chrom position strand value  label number
1:  chr1    17432      -     0  romeo      1
2:  chr1    17433      -     0  romeo      2
3:  chr1    17434      -     0  romeo      3
4:  chr1    17435      -     0  romeo      4
5:  chr1    17409      -     1 juliet      1
6:  chr1    17410      -     1 juliet      2
7:  chr1    17411      -     1 juliet      3
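Note that `group_by(label)` and `by = label` number rows per label wherever they occur in the table. If a label can reappear later and the count should instead restart at each consecutive run (an assumption; the question's example has only one run per label), `rle()` combined with `sequence()` is one base R way to do it. The small vector below is made up for illustration.

```r
# "romeo" appears in two separate runs, split by a run of "juliet"
label <- c("romeo", "romeo", "juliet", "juliet", "juliet", "romeo")

# Numbering per label (what group_by()/by = label do): the last romeo gets 3
by_label <- ave(seq_along(label), label, FUN = seq_along)
# 1 2 1 2 3 3

# Numbering per consecutive run: rle() gives each run's length,
# sequence() expands every length n into 1..n
by_run <- sequence(rle(label)$lengths)
# 1 2 1 2 3 1
```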

Wow, it's working like magic! But I can't figure out how to save the output to a new `dataframe`... — biohazard, Feb 09 '14 at 19:54
Never mind, I figured it out! Just needed to put the whole thing in between `(` `)` — biohazard, Feb 09 '14 at 19:59
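For the record, it is the assignment with `<-` that stores the result; wrapping an assignment in parentheses additionally makes R print the stored value. A minimal base R illustration, using `transform()` and `ave()` as a stand-in for the dplyr pipeline and a made-up three-row data frame:

```r
dat <- data.frame(label = c("romeo", "romeo", "juliet"))

# Plain assignment: the numbered copy is stored in `numbered`, nothing is printed
numbered <- transform(dat, number = ave(seq_along(label), label, FUN = seq_along))

# The same assignment wrapped in parentheses: stored *and* printed
(numbered <- transform(dat, number = ave(seq_along(label), label, FUN = seq_along)))
```

`transform()` returns a modified copy, so `dat` itself keeps only its original column.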

score 1 · Answer 2 · answered Jul 10 '17 at 11:41


Executing Vincent's solution resulted in an error for me:

could not find function "%.%"

However, replacing %.% with %>% did the trick:

library(dplyr)
dat %>%
    group_by(label) %>%
    mutate(number = 1:n())

Note: I'm using dplyr version 0.7.1.
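Since R 4.1.0 there is also a native pipe, `|>`, so a pipeline like the one above can be written without loading any package. A sketch with base R functions standing in for `group_by()`/`mutate()` (it assumes R >= 4.1.0, and the data frame is made up for illustration):

```r
dat <- data.frame(label = c("romeo", "romeo", "romeo", "juliet", "juliet"))

# |> passes `dat` as the first argument of transform();
# ave() numbers the rows 1..k within each label
numbered <- dat |>
  transform(number = ave(seq_along(label), label, FUN = seq_along))
numbered
```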

Sebastian Müller
