I have a data.frame keywordsCategory containing a set of phrases that I would like to categorize according to the words they contain.
For example, one of my "check terms" is test1, which corresponds to category cat1. Since the first observation of my data.frame is This is a test1, I need to add a new column category holding the corresponding category.
Because one observation can be assigned to more than one category, I thought the best option was to create independent subsets of my data.frame using grepl and later bind them all into a new data.frame:
library(data.table)
wordsToCheck <- c("test1", "test2", "This")
categoryToAssign <- c("cat1", "cat2", "cat3")
keywordsCategory <- data.frame(Keyword=c("This is a test1", "This is a test2"))
for (i in seq_along(wordsToCheck)) {
  myOriginal <- wordsToCheck[i]
  myCategory <- categoryToAssign[i]
  dfToCreate <- paste0("withCategory", i)
  assign(dfToCreate,
         data.table(keywordsCategory[grepl(paste0(".*", myOriginal, ".*"),
                                           keywordsCategory$Keyword) == TRUE, ]))
  # this won't work :(
  # dfToCreate[, category := myCategory]
}
# Create a list with all newly created data.tables
l.df <- lapply(ls(pattern="withCategory[0-9]+"), function(x) get(x))
# Create an aggregated dataframe with all Keywords data.tables
newdf <- do.call("rbind", l.df)
The subset and rbind steps work, but I am not able to assign the corresponding category to my newly created data.tables. If I uncomment the line, I get the following error:
Error in `:=`(category, myCategory) : Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(":=").
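The message appears to be literal: inside the loop, dfToCreate is only the character string holding the table's name, not the data.table itself, so := has no data.table to act on. A minimal sketch of that distinction follows, using get() to reach the object behind the name. The one-row table here is made up for illustration and is not the real data.

```r
library(data.table)

# dfToCreate holds a name (a character string), not a data.table
dfToCreate <- "withCategory1"
assign(dfToCreate, data.table(V1 = "This is a test1"))  # illustrative one-row table

is.data.table(dfToCreate)       # FALSE: just the string "withCategory1"
is.data.table(get(dfToCreate))  # TRUE: the object that the name refers to

# := updates by reference, so modifying the object returned by get()
# should also be visible through withCategory1
get(dfToCreate)[, category := "cat1"]
```

Whether this is preferable to avoiding assign() altogether is a separate question.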
However, if I add the column manually once the loop is done, for instance:
withCategory1[,category:=myCategory]
It works correctly and the table output is as expected:
> withCategory1
V1 category
1: This is a test1 cat2
tableOutput <- structure(list(V1 = structure(1L, .Label = c("This is a test1",
"This is a test2"), class = "factor"), category = "cat2"), .Names = c("V1",
"category"), row.names = c(NA, -1L), class = c("data.table",
"data.frame"), .internal.selfref = <pointer: 0x00000000001f0788>)
What is the best/safest way to add a new column to a data.table when it is created inside a loop using the assign function? The solution doesn't need to use data.table; I only use it because my real data has millions of observations and I thought data.table would be faster.
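For comparison, here is a minimal sketch of one possible alternative that avoids assign() and get() entirely: collect the subsets in a list and bind them with rbindlist(). It assumes keywordsCategory can be a data.table from the start and that the check terms are fixed strings rather than regular expressions; the names matches, hits and thisCategory are made up for the example.

```r
library(data.table)

wordsToCheck     <- c("test1", "test2", "This")
categoryToAssign <- c("cat1", "cat2", "cat3")
keywordsCategory <- data.table(Keyword = c("This is a test1", "This is a test2"))

# Build one data.table per check term and keep them in a list
matches <- lapply(seq_along(wordsToCheck), function(i) {
  thisCategory <- categoryToAssign[i]
  # Subsetting by rows returns a new data.table, so := below
  # does not touch keywordsCategory itself
  hits <- keywordsCategory[grepl(wordsToCheck[i], Keyword, fixed = TRUE)]
  hits[, category := thisCategory]
  hits
})

# Stack all per-term tables; a keyword matching several terms appears once per match
newdf <- rbindlist(matches)
```

With the sample data above, each keyword should appear once for its own test term and once more for "This", so newdf has four rows.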