0

I'm having a dataset containing of Weekly Brand Sales data and I want to filter out the brands that contain 0 sales. I identify these brands through:

SynergyZerosalesbrands<- DatasetThesisSynergyClean %>% mutate(Sales, ZeroSales = ifelse(Sales == 0, 1, 0)) 

Now, it assigns a 1 to the brands in the weeks with zero sales, but only for the specific weeks it has zero sales (week 1:208 is full range). I would like to delete not only these weeks, but the whole brand. I have to find a command that makes all the other weeks of the brand that has indicated zero sales become a 1 as well. An example to work with!

structure(list(Week = 7:17, Category = c("2", "2", "2", "2", 
"2", "2", "2", "2", "2", "2", "2"), Brand = c("3", "3", "3", 
"3", "3", "3", "3", "3", "3", "3", "3"), Display = c(0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0), Sales = c(0, 0, 0, 0, 13.440948, 40.097397, 
32.01384, 382.169189, 2830.748779, 4524.460938, 1053.590576), 
    Price = c(0, 0, 0, 0, 5.949999, 5.95, 5.950003, 4.87759, 
    3.787015, 3.205987, 4.898724), Distribution = c(0, 0, 0, 
    0, 1.394019, 1.386989, 1.621416, 8.209759, 8.552915, 9.692097, 
    9.445554), Advertising = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0), lnSales = c(11.4945151554497, 11.633214247508, 11.5862944141137, 
    11.5412559646132, 11.4811122484454, 11.4775106999991, 11.6333660772506, 
    11.4859819773102, 11.5232680456161, 11.5572670584292, 11.5303686934256
    ), IntrayearCycles = c(4.15446534315765, 3.62757053512638, 
    2.92387946552647, 2.14946414386239, 1.40455011205262, 0.768856938870769, 
    0.291497141953598, -0.0131078404184544, -0.162984144025091, 
    -0.200882782749248, -0.182877633924882), `Competitor Advertising` = c(10584.87063, 
    224846.3243, 90657.72553, 0, 0, 0, 2396.54212, 0, 0, 0, 40343.49444
    ), `Competitor Display` = c(0.385629, 2.108133, 2.515806, 
    4.918288, 3.81749, 3.035847, 2.463194, 3.242594, 1.850399, 
    1.751096, 1.337943), `Competitor Prices` = c(5.30989, 5.372752, 
    5.3717245, 5.3295525, 5.298393, 5.319466, 5.1958415, 5.2941095, 
    5.296757, 5.294059, 5.273578), ZeroSales = c(1, 1, 1, 1, 
    0, 0, 0, 0, 0, 0, 0)), .Names = c("Week", "Category", "Brand", 
"Display", "Sales", "Price", "Distribution", "Advertising", "lnSales", 
"IntrayearCycles", "Competitor Advertising", "Competitor Display", 
"Competitor Prices", "ZeroSales"), row.names = 1255:1265, class = "data.frame")
PimM
  • 25
  • 6
  • Hello PimM and welcome to StackOverflow (SO). Could you provide a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? – nghauran Dec 14 '17 at 09:07
  • try `library(dplyr); df %>% filter(!(Brand %in% .[Sales==0,'Brand']))` – Prem Dec 14 '17 at 09:39
  • Thanks, but does not work as intended/wished, since this does not filters the data or produces a new dataframe with the right values.. – PimM Dec 14 '17 at 09:45
  • @PimM just assign the filtered solution to `df_new`. `df_new <- df %>% filter(!(Brand %in% .[Sales==0,'Brand']))` – Prem Dec 14 '17 at 10:04
  • Tried to do this, but then it returns an empty dataframe with only the variable names. Am I doing something wrong? I apply this after I did the command above. df_new <- SynergyZerosalesbrands %>% filter(!(Brand %in% .[Sales==0,'Brand'])) – PimM Dec 14 '17 at 10:11
  • Sample data has only one BrandID `3` and for this ID you have `0` sales in some week. As per your requirement (if I understood it correctly!) you want to filter out this BrandID so you are left with zero rows when you execute the code. Suggest you to run the code on complete set of data. – Prem Dec 14 '17 at 10:23
  • I did so, but it still remains brands that have zero sales OR it empties the full dataframe. I want to have a dataframe that only contains brands that have sales in ALL 208 weeks. If there is one 0 in one of the brand weeks, the brand can be fully deleted. – PimM Dec 14 '17 at 10:36

1 Answers1

0

Note: I have slightly modified your sample data to accommodate all possible cases (use it from below dput output).

In the below sample data (i.e. df) you can observe that Brand ID

  • 3 has zero sales in some weeks,
  • 4 has zero sales in all available weeks and
  • 5 has 'non-zero' sales in all available weeks.

So as per your UC, Brand ID that should be there in the final output will be 5 only.


library(dplyr)
df_new <- df %>% filter(!(Brand %in% unique(.[Sales==0,'Brand'])))
df_new

Output is:

  Week Category Brand Display    Sales    Price Distribution Advertising  lnSales IntrayearCycles
1   16        2     5       0 4524.461 3.205987     9.692097           0 11.55727      -0.2008828
2   17        2     5       0 1053.591 4.898724     9.445554           0 11.53037      -0.1828776
  Competitor Advertising Competitor Display Competitor Prices
1                   0.00           1.751096          5.294059
2               40343.49           1.337943          5.273578


#sample data
> dput(df)
structure(list(Week = 7:17, Category = c("2", "2", "2", "2", 
"2", "2", "2", "2", "2", "2", "2"), Brand = c("3", "3", "3", 
"3", "3", "3", "4", "4", "4", "5", "5"), Display = c(0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0), Sales = c(0, 0, 0, 0, 13.440948, 40.097397, 
0, 0, 0, 4524.460938, 1053.590576), Price = c(0, 0, 0, 0, 5.949999, 
5.95, 5.950003, 4.87759, 3.787015, 3.205987, 4.898724), Distribution = c(0, 
0, 0, 0, 1.394019, 1.386989, 1.621416, 8.209759, 8.552915, 9.692097, 
9.445554), Advertising = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
    lnSales = c(11.4945151554497, 11.633214247508, 11.5862944141137, 
    11.5412559646132, 11.4811122484454, 11.4775106999991, 11.6333660772506, 
    11.4859819773102, 11.5232680456161, 11.5572670584292, 11.5303686934256
    ), IntrayearCycles = c(4.15446534315765, 3.62757053512638, 
    2.92387946552647, 2.14946414386239, 1.40455011205262, 0.768856938870769, 
    0.291497141953598, -0.0131078404184544, -0.162984144025091, 
    -0.200882782749248, -0.182877633924882), `Competitor Advertising` = c(10584.87063, 
    224846.3243, 90657.72553, 0, 0, 0, 2396.54212, 0, 0, 0, 40343.49444
    ), `Competitor Display` = c(0.385629, 2.108133, 2.515806, 
    4.918288, 3.81749, 3.035847, 2.463194, 3.242594, 1.850399, 
    1.751096, 1.337943), `Competitor Prices` = c(5.30989, 5.372752, 
    5.3717245, 5.3295525, 5.298393, 5.319466, 5.1958415, 5.2941095, 
    5.296757, 5.294059, 5.273578)), .Names = c("Week", "Category", 
"Brand", "Display", "Sales", "Price", "Distribution", "Advertising", 
"lnSales", "IntrayearCycles", "Competitor Advertising", "Competitor Display", 
"Competitor Prices"), class = "data.frame", row.names = 1255:1265)

Hope this helps!

Prem
  • 11,775
  • 1
  • 19
  • 33
  • This is indeed exactly what I want. It is so weird. If I get my full dataframe and replicate your code, only replacing "df" with my full dataframe "DatasetThesisSynergy", it creates a new_df with 0 variables... What could possibly go wrong? – PimM Dec 14 '17 at 11:28
  • run `table(df$Brand, df$Sales==0)` and see if you get any Brand ID which has `TRUE = 0`. If not then you are rightly getting blank o/p as there are no Brand ID in your data whose sales are non-zero in all available weeks. – Prem Dec 14 '17 at 11:32
  • It gives me FALSE TRUE 2 79184 688 3 75915 3957 4 72229 7643 Maybe there is a misunderstanding. IF a brand has 0 sales in one of the 208 weeks, THEN the whole brand can be completely deleted. – PimM Dec 14 '17 at 11:42
  • `table` output says that you don't have any data which can fulfill your condition. Maybe you need to revisit your condition as well as data and then come back with a proper requirement. (playing around with `table()` might help you know more about your data) – Prem Dec 14 '17 at 11:47
  • I'll give it a shot and will come back to you if I found a solution! Thanks for the help! – PimM Dec 14 '17 at 12:24