
I have an Excel file with a list of emails and the channel that collected each one. How can I count how many emails are duplicated per channel using R, and automate it (so that every time I import a different file I just have to run the script and get the results)?

Thank you!!

  • Possible duplicate of [Find duplicate values in R](https://stackoverflow.com/questions/16905425/find-duplicate-values-in-r) – user3640617 Aug 04 '17 at 19:45
  • Thank you for the answer. That post helps me count the number of repeated emails, but it doesn't show how to group the count by channel, am I right? :) Sorry for my basic knowledge! – user8419142 Aug 04 '17 at 20:18
  • It would be for the best to go through some R tutorials. There are many ways of doing this. You could split the data by channel and find duplicates, or perhaps calculate the difference between unique and full set for each channel... – Roman Luštrik Aug 04 '17 at 20:36

1 Answer


Assuming the "df" dataframe has the relevant variables under the names "channel" and "email", then:

To get the number of unique channel-email pairs:

dim(unique(df[c("channel", "email")]))[1]

To get the total number of channel-email observations:

sum(table(df$channel, df$email))

To get the number of duplicates, simply subtract the former from the latter:

sum(table(df$channel, df$email)) - dim(unique(df[c("channel", "email")]))[1]
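
The expressions above give one overall total. For the per-channel breakdown and the "run it on any file" part of the question, a minimal sketch follows. It assumes the same "channel" and "email" column names; the helper names `count_dups_by_channel` and `count_dups_in_file` are hypothetical, and reading the file through the `readxl` package is an assumption about how the Excel file is imported.

```r
# Hypothetical helper: count duplicated emails within each channel.
# Assumes columns named "channel" and "email".
count_dups_by_channel <- function(df) {
  # TRUE for every row whose channel-email pair already appeared earlier
  is_dup <- duplicated(df[c("channel", "email")])
  # Sum the TRUE flags within each channel
  out <- aggregate(is_dup, by = list(channel = df$channel), FUN = sum)
  names(out)[2] <- "duplicates"
  out
}

# Hypothetical wrapper: read an Excel file and report duplicates per channel.
# Assumes the readxl package is installed and the first sheet holds the data.
count_dups_in_file <- function(path) {
  df <- as.data.frame(readxl::read_excel(path))
  count_dups_by_channel(df)
}

# Small made-up example to show the shape of the result
df <- data.frame(
  channel = c("web", "web", "web", "store", "store"),
  email   = c("a@x.com", "a@x.com", "b@x.com", "a@x.com", "c@x.com"),
  stringsAsFactors = FALSE
)
res <- count_dups_by_channel(df)
print(res)
```

In this made-up data, "a@x.com" appears twice under "web", so that channel reports one duplicate, while the same address under "store" is not counted because it is a different channel-email pair. For a new file, calling `count_dups_in_file()` with its path would be enough, provided the column names match.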
Nicolás Velasquez