Find duplicate registers in R

Question

I have an Excel file with a list of emails and the channel that collected each one. How can I use R to find out how many emails are duplicated per channel, and automate it so that every time I import a different file I just run the script and get the results?

Thank you!!

Possible duplicate of [Find duplicate values in R](https://stackoverflow.com/questions/16905425/find-duplicate-values-in-r) — user3640617, Aug 04 '17 at 19:45
Thank you for the answer. That post helps me count the number of repeated emails, but it doesn't show how to group the count by channel. Am I right? :) Sorry for my basic knowledge! — user8419142, Aug 04 '17 at 20:18
It would be for the best to go through some R tutorials. There are many ways of doing this. You could split the data by channel and find duplicates, or perhaps calculate the difference between unique and full set for each channel... — Roman Luštrik, Aug 04 '17 at 20:36

score 0 · Accepted Answer · answered Aug 04 '17 at 20:57

Assuming the "df" dataframe has the relevant variables under the names "channel" and "email", then:

To get the number of unique channel-email pairs:

dim(unique(df[c("channel", "email")]))[1]

To get the sum of all channel-email observations:

sum(table(df$channel, df$email))

To get the number of duplicates, simply subtract the former from the latter:

sum(table(df$channel, df$email)) - dim(unique(df[c("channel", "email")]))[1]
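The expression above gives one total across all channels, while the question asks for a count per channel. A minimal sketch of one way to break it down, assuming the same "channel" and "email" column names; the toy data and the helper name count_dups_per_channel are hypothetical and only there for illustration:

```r
# Toy data standing in for the imported sheet (hypothetical values).
df <- data.frame(
  channel = c("web", "web", "web", "store", "store", "app"),
  email   = c("a@x.com", "a@x.com", "b@x.com", "c@x.com", "c@x.com", "d@x.com"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each channel, count the rows whose
# channel-email pair has already appeared earlier in the data.
count_dups_per_channel <- function(data) {
  pairs  <- data[c("channel", "email")]
  is_dup <- duplicated(pairs)         # TRUE for the 2nd, 3rd, ... occurrence of a pair
  tapply(is_dup, pairs$channel, sum)  # number of TRUE values within each channel
}

dups <- count_dups_per_channel(df)
print(dups)
```

To rerun this on a new file, load the sheet into df first and call the helper again. One option, assuming the readxl package is installed, is readxl::read_excel() with the path to the file. Summing the per-channel counts should give the same total as the subtraction above, as long as neither column contains missing values.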
