I have trouble wrapping my head around the following example:

If I work with simple numbers, e.g.

library(dplyr)
x=5
y <- x %>% +1
y 
x
x==y

I get what I had in mind, that is, x is not changed:

> y 
[1] 6
> x
[1] 5
> x==y
[1] FALSE

Now if I do the same with a data.table:

library(data.table)
DT = data.table(id=c("A","B","C"),Value=c(1,2,3))
DT2 <- DT %>% .[,DoubleValue:=2*Value]
DT
DT2
DT==DT2

I get something I see as conceptually different:

> DT
   id Value DoubleValue
1:  A     1           2
2:  B     2           4
3:  C     3           6
> DT2
   id Value DoubleValue
1:  A     1           2
2:  B     2           4
3:  C     3           6
> DT==DT2
       id Value DoubleValue
[1,] TRUE  TRUE        TRUE
[2,] TRUE  TRUE        TRUE
[3,] TRUE  TRUE        TRUE

That is, DT is actually changed.

Most examples I have found do not use the <- part at all and are happy to modify their object in place.

Why is DT changed but not x?

Anthony Martin
  • I don't think this has anything to do with the pipe operator. You would get the same result with `DT2 <- DT[,DoubleValue:=2*Value]` – Allan Cameron Dec 18 '20 at 15:53
  • @AllanCameron Hmm I do tend to pipe everything, so I had not realized it is something even more fundamental I have not understood – Anthony Martin Dec 18 '20 at 15:56
  • Since you're using `data.table`, all changes to the table are made referentially. That is, even though you assign that operation to `DT2`, the original `DT` still has `DoubleValue` assigned as well. Now you have `DT` and `DT2` pointing to the same in-memory table, both with that new variable. – r2evans Dec 18 '20 at 15:58
  • The `<-` operator method for `data.table` objects creates a reference, not a copy. Check out `?data.table::copy` for details. `data.table` was not designed with interoperability with the `magrittr` pipe in mind. – bcarlsen Dec 18 '20 at 15:58
  • See also [this Q&A](https://stackoverflow.com/questions/10225098/understanding-exactly-when-a-data-table-is-a-reference-to-vs-a-copy-of-another) – Allan Cameron Dec 18 '20 at 15:59
  • @bcarlsen but it does work rather well with it, imo. Granted, the presumption that the `%>%` is a one-way check-valve of data flow is not correct, but I've (personally) never assumed that `%>%` was enforcing that; the preservation of the original data was due to the assignment or lack of it at the beginning or end of the function, and with any intermediate functions that might have side-effect. – r2evans Dec 18 '20 at 15:59
  • OK, and to have a copy the correct way is to use `copy()`... Never too late to learn the basics – Anthony Martin Dec 18 '20 at 16:04
  • @AllanCameron Interesting answer thanks – Anthony Martin Dec 18 '20 at 16:04
  • Btw since my question is wrong and the answer to the real question already elsewhere I suppose it is best to delete my question ? – Anthony Martin Dec 18 '20 at 16:10
  • If you think your question is unlikely to be useful to future readers, delete it. If you think that an answer that explains your confusion will be useful to future readers, you can go ahead and post an answer to your own question. – Ben Bolker Dec 18 '20 at 16:16

1 Answer

It is actually entirely related to data.table features, and not at all to piping.

As described in more depth in the Q&A linked in the comments above,

DT2 <- DT
DT2[,DoubleValue:=2*Value]

also changes DT.

This is by design: data.table is meant to handle large tables, so it avoids deep copies by default. DT2 <- DT only makes DT2 a second name for the same table, and := then modifies that table in place, so the change is visible through both names.
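One way to check this, as a minimal sketch on the table from the question: data.table exports `address()`, which reports where an object lives in memory. The variable names `same_before`, `same_after` and `dt_has_column` are only illustrative.

```r
library(data.table)

DT <- data.table(id = c("A", "B", "C"), Value = c(1, 2, 3))
DT2 <- DT                          # no copy: DT2 is a second name for the same table

# Both names refer to the same object in memory
same_before <- identical(address(DT), address(DT2))

DT2[, DoubleValue := 2 * Value]    # := adds the column in place

# Still one object, and the new column is visible through DT as well
same_after <- identical(address(DT), address(DT2))
dt_has_column <- "DoubleValue" %in% names(DT)
```

Here all three checks should come out TRUE, which matches the behaviour seen in the question.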

However, this no longer holds when the operation returns a new table, for example when subsetting rows:

DT2 <- DT[1:2]
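A small sketch of that case, again on the table from the question (the name `DT_sub` is just for illustration): a row subset is a new table, so adding a column to it by reference leaves the original alone.

```r
library(data.table)

DT <- data.table(id = c("A", "B", "C"), Value = c(1, 2, 3))

DT_sub <- DT[1:2]                     # row subset: a new, independent table
DT_sub[, DoubleValue := 2 * Value]    # modifies DT_sub only

names(DT)       # id, Value
names(DT_sub)   # id, Value, DoubleValue
```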

To leave DT untouched and work on an independent table instead, one way is to use the copy() function:

DT2 <- copy(DT)
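
As a quick check of this approach, here is the example from the question rewritten with `copy()`. This is a sketch rather than the only way to do it; the pipe is left out because it plays no role in the behaviour.

```r
library(data.table)

DT <- data.table(id = c("A", "B", "C"), Value = c(1, 2, 3))

DT2 <- copy(DT)                    # deep copy: DT2 has its own memory
DT2[, DoubleValue := 2 * Value]    # only DT2 gets the new column

names(DT)    # id, Value
names(DT2)   # id, Value, DoubleValue
```

The trade-off is memory: `copy()` duplicates the whole table, which is exactly what data.table avoids by default.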
Anthony Martin