
I'm an utter beginner in R, fumbling my way through it for my degree :)

I need to summarize a very large data set by site: there are currently multiple rows per site and around 70 columns of variables, both numeric and categorical. I'm looking at seedling regeneration at each site.

I have 45 study sites and am trying to summarize all my variables per site. Currently, each study site has between 5 and 30+ plant species, so I can have 30 or more rows for a site: each species at a site has its own row, with #trees, #saplings, #seedlings and other variables as columns.

I've tried this code:

i <- sapply(data.df, is.factor)  ### convert "factor" variables to "character" for dply analysis
data.df[i] <- lapply(data.df[i], as.character)

select(data.df,site,total_seedlings_m2,age,age_category,landuse_history, exotic_landcover_types,native_landcover_types,prcnt_light_transmittance,avg_canopy_height,prcnt_total_herb_cover,annual_rainfall_mm,annual_sunshine_hours,annual_temp_mean,annual_ground_frost_days,annual_rel_humidity,daily_air_rh_range,daily_air_temp_range,daily_soil_temp_range,total_trees_m2,total_basal_area_m2)
group_by_(site)

summarise_all(data.df)  

I want to summarise all columns (although I need a mixture of sum and mean for different variables).

I'm just trialling this method. When I try to group the data by site, which should give me 45 rows, I get an error:

Error in UseMethod("group_by_") : no applicable method for 'group_by_' applied to an object of class "character"

it says i'm using "group_by_" when im actually using "group_by"

Is there an easy fix? And is there a way to summarise all columns, either summing or averaging each one depending on the variable? (I would sum the seedling counts and take the mean of the micro-climate data.)

This is my first time asking for help online, so hopefully this makes a little bit of sense :)

neilfws
  • "im actually using group_by" - not according to the code in your question. It contains `group_by_`. Please make this question [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by including some or all of `data.df`. – neilfws Apr 08 '19 at 03:58
  • Try checking whether your site column is a character or a factor; maybe you need to turn it into a factor first. Also, maybe check out the `ddply` function in the `plyr` package. I find it a lot nicer. – morgan121 Apr 08 '19 at 04:06
  • Also, you're not passing a data frame to group_by. You either need to pass the data frame in as the first argument, or use a pipe `%>%`. The specific error occurs because you're trying to group some character vector called "site", not data.df as expected. – divibisan Apr 08 '19 at 04:37
  • Also, dplyr functions don't edit the data in place; they return the modified version. You need to assign that somewhere to accomplish anything, i.e. `dat <- select(dat, ...)` – divibisan Apr 08 '19 at 04:41

1 Answer


Try this; it should work:

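library(dplyr)

# Convert factor columns to character
i <- sapply(data.df, is.factor)
data.df[i] <- lapply(data.df[i], as.character)

# Pipe the data frame into group_by(), then count the rows (species) per site
data.df %>% group_by(site) %>% summarise(count = n())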
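
The count above only gives the number of rows per site. For the mixture of sum and mean asked about in the question, one possible approach is to name each column inside a single summarise() call and choose the function per column. The sketch below is only an illustration: `toy.df` and its values are made up to stand in for `data.df`, and taking the first value of a categorical column assumes that column is constant within a site, which may not hold for the real data.

```r
library(dplyr)

# Hypothetical toy data standing in for data.df (values invented for illustration)
toy.df <- data.frame(
  site = c("A", "A", "B", "B", "B"),
  total_seedlings_m2 = c(1, 2, 3, 4, 5),
  annual_temp_mean = c(10, 10, 12, 12, 12),
  landuse_history = c("pasture", "pasture", "forest", "forest", "forest"),
  stringsAsFactors = FALSE
)

# Pipe the data frame in, group by site, and assign the result to a new object
site_summary <- toy.df %>%
  group_by(site) %>%
  summarise(
    n_species = n(),                                             # rows (species) per site
    total_seedlings_m2 = sum(total_seedlings_m2, na.rm = TRUE),  # add up the seedling values
    annual_temp_mean = mean(annual_temp_mean, na.rm = TRUE),     # average the climate value
    landuse_history = first(landuse_history)                     # assumes one value per site
  )

print(site_summary)
```

This returns one row per site. The result has to be assigned (here to `site_summary`) because dplyr does not modify `data.df` in place.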
Rahul Varma