1

I am creating plots with ggplot2 and for some reason the function is acting really weirdly.

I have a dataframe df, and I want to visualize several columns.

Any dataframe seems to work fine. I have generated this dummy dataframe.

df <- data.frame(Date = seq.Date(as.Date.character("2019-01-01"), by = 1, length.out = 10), 
                 Value = rnorm(10), 
                 Foo = rnorm(10))

So what I do is

  library(ggplot2)
  gg <- ggplot(df, aes(x = Date)) + geom_line(aes(y = Value, color = "Value", linetype = "Value"))
  gg <- gg + geom_line(aes(y = Foo, color = "SomeWord", linetype = "SomeWord"))
  gg <- gg + scale_color_manual(name="Legend", 
      breaks=c("Value", "SomeWord"), values=c("steelblue", "firebrick")) + 
    scale_linetype_manual(name="Legend", 
      breaks=c("Value", "SomeWord"), values=c("solid", "twodash"))
  gg

Normally, ggplot2 would now correctly assign color steelblue and linetype solid to the column Value, while assigning firebrick and twodash to the Foo column, which I assigned the name SomeWord to. However, depending on what I choose for the name, ggplot assigns the colours and linetypes in the wrong way. For example, using "Test1" as a name seems to work just fine, but "Einschritt" causes ggplot2 to throw my entire ruleset out of the window.

I have tried googling this, but have not found any clue as to why ggplot2 seems to reject some names while accepting others. I would also like to use hyphens in the color and linetype reference names, which I assume might be a problem.

Edit: As an example, I have just tried replicating this on my dummy data frame. Using the code posted above, when I use the following names, the linetype and colour are matched falsely:

  • "Value" for column "Value", anything for column Foo.
  • "Ein-Schritt-Prognose" for column "Value", anything for column Foo.
  • "SomeWord" for column "Value", anything for column Foo.

However, when I switch to something like:

  • "ABD" for column Value, anything for column Foo.

then they are matched correctly.

HongboZhu
  • some fun reading :-) [how to make a great r reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – HongboZhu Jul 04 '19 at 07:35
  • Hyphens are considered to be minus signs in R. They are replaced automatically by dots if you have them in column names in your input files. – HongboZhu Jul 04 '19 at 07:38
  • @J.Grünenwald: I think you still need to revise your code so that it is "runnable". i.e. move on to item 2 in the [fun reading post](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) :-) – HongboZhu Jul 04 '19 at 08:02
  • @HongboZhu what do you mean? I am sorry, but since there are several highly upvoted posts in the thread I am not sure which "item 2" you are talking about. – J. Grünenwald Jul 04 '19 at 08:05
  • @J.Grünenwald: your R code snippet does not run ("undefined symbols"). To help others help you, you'd better provide runnable code snippets. By "Item 2", I mean the 2nd item in the accepted answer to the question: [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – HongboZhu Jul 04 '19 at 08:11
  • @HongboZhu I have updated my code snippet. Sorry for the inconvenience. – J. Grünenwald Jul 04 '19 at 08:14
  • @J.Grünenwald: yes, you did update your code snippet, but not _completely_ :-). Now: what does the code snippet that causes the problem look like? – HongboZhu Jul 04 '19 at 08:24
  • Not sure if it helps but read in `?scale_*_manual` about the `values` argument: _a set of aesthetic values to map data values to. If this is a named vector, then the values will be matched based on the names. **If unnamed**, values will be matched in order (usually alphabetical) with the limits of the scale. Any data values that don't match will be given na.value._ – markus Jul 04 '19 at 08:24
  • @markus Thank you! This is exactly what I was looking for. It appears I have not fully understood how the `scale_*_manual` functions work. I thought the attributes given in *values* were matched according to the values given in the *breaks* argument. However, that appears to be utterly false. I will have to look that up. Thank you again! – J. Grünenwald Jul 04 '19 at 08:31
  • @J.Grünenwald: I am just trying to make sense of this: if you use "SomeWord" for "Value" or "Value" for "Value", they are sorted after "Foo" alphabetically, so the order is reversed compared to their order in `df` and the assignments get mixed up. "ABD" will be sorted before "Foo", so it is fine. "Ein-Schritt-Prognose" would have been fine if not for the hyphens. Is this the "pattern" you missed? – HongboZhu Jul 04 '19 at 08:42
  • Yes. When reading tutorials on ggplot2, I thought that the breaks argument was meant to be given names which would be matched to the values given in the *values* argument. Obviously that is wrong. I have now fixed my issue and accepted an answer. I am still left wondering what exactly "breaks" means, since the documentation doesn't really explain that as far as I can tell, but at least my issue is solved. Thank you very much to everyone who helped me. – J. Grünenwald Jul 04 '19 at 08:46
  • breaks are positions where you put your legend labels – HongboZhu Jul 04 '19 at 09:10

3 Answers

1

"Ein-Schritt-Prognose" does not work as a column name. Please see my comment below your question. In ggplot2, column names are not quoted, so a hyphen in a column name makes it look like Ein - Schritt - Prognose (an expression). Use hyphens with caution in R.

HongboZhu
  • Thank you. This makes sense, however, even when I'm not using hyphens in my names, the issue still arises depending on what names I choose. I find it especially puzzling since I see no pattern in what works and what doesn't, as can be seen in my edit. – J. Grünenwald Jul 04 '19 at 08:08
1

First of all, just to make it clear: hyphens have nothing to do with this.

The issue is that the breaks argument is not used to define the data-to-aesthetic mapping in scales, at all. breaks just controls which data values appear on the legend, and in which order. Nothing else.

Here’s a demonstration (simplified to only colours; the concepts are the same):

library(ggplot2)

set.seed(42)

mydf <- data.frame(
  Date = seq.Date(as.Date.character("2019-01-01"), by = 1, length.out = 10),
  Value = rnorm(10), Foo = rnorm(10)
)

p <- ggplot(mydf, aes(x = Date)) +
  geom_line(aes(y = Value, color = "Value")) +
  geom_line(aes(y = Foo, color = "SomeWord"))

p1 <- p + scale_color_manual(
  breaks = c("Value", "SomeWord"),
  values = c("steelblue", "firebrick")
)

p2 <- p + scale_color_manual(
  breaks = c("SomeWord", "Value"),
  values = c("steelblue", "firebrick")
)

egg::ggarrange(p1, p2)

As you can see, the aesthetic mapping remains the same: "Value" is still red, and "SomeWord" is still blue; only the order of the legend has changed. If you want to control the data-to-aesthetic mapping, you have two options:

First, as mentioned by @markus in the comments, you can set names for the vector given as the values argument:

p + scale_color_manual(
  values = c("Value" = "steelblue", "SomeWord" = "firebrick")
)
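One way to check the named-vector mapping without looking at the rendered plot is to inspect the colours ggplot2 computed for each layer. This is a minimal sketch; the object names `p_named`, `value_colour` and `someword_colour` are made up for illustration, and `layer_data()` is used only to read back the computed `colour` column.

```r
library(ggplot2)

set.seed(42)

mydf <- data.frame(
  Date = seq(as.Date("2019-01-01"), by = 1, length.out = 10),
  Value = rnorm(10), Foo = rnorm(10)
)

# Same plot as above, with a named values vector
p_named <- ggplot(mydf, aes(x = Date)) +
  geom_line(aes(y = Value, color = "Value")) +
  geom_line(aes(y = Foo, color = "SomeWord")) +
  scale_color_manual(
    values = c("Value" = "steelblue", "SomeWord" = "firebrick")
  )

# Layer 1 draws the Value column, layer 2 draws the Foo column
value_colour <- unique(layer_data(p_named, 1)$colour)
someword_colour <- unique(layer_data(p_named, 2)$colour)

print(value_colour)
print(someword_colour)
```

With named values, the colour follows the name regardless of how the labels sort alphabetically.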

Alternatively (although not recommended), you can rely on aesthetics being mapped in the order of the limits:

p + scale_color_manual(
  limits = c("Value", "SomeWord"),
  values = c("steelblue", "firebrick")
)

(Note that here the legend order changed, too: this is because if not given, breaks is set to limits.)

By default, the limits are sorted in alphabetical order, which is the cause of the behaviour that you saw: V comes after S, which is why (if you don’t set the limits) "Value" gets matched with the second colour, and "SomeWord" with the first.
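The alphabetical matching described above can be imitated in base R. This is only an illustration of the explanation, not ggplot2's internal code, and the variable names are hypothetical; the behaviour for unnamed values may also differ between ggplot2 versions.

```r
# Labels in the order they were written in the plot code
labels_used <- c("Value", "SomeWord")
colours <- c("steelblue", "firebrick")

# Assumption: default limits are the unique labels in sorted order
default_limits <- sort(unique(labels_used))

# Unnamed values are paired with the limits by position
mapping <- setNames(colours[seq_along(default_limits)], default_limits)
print(mapping)
```

Here "SomeWord" sorts before "Value", so it is paired with the first colour, which matches what the plots above show.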

And as to how limits differs from breaks: limits controls what data values are mapped. If we have a data value that’s not included in limits, the mapped aesthetic is set to NA:

p + scale_color_manual(
  limits = c("Value"),
  values = c("steelblue", "firebrick")
)
#> Warning: Removed 10 rows containing missing values (geom_path).

Whereas if you leave a value out from breaks, all the data are still mapped, but the omitted value is not shown on the legend:

p + scale_color_manual(
  breaks = c("Value"),
  values = c("steelblue", "firebrick")
)

Created on 2019-07-04 by the reprex package (v0.3.0)

Mikko Marttila
0

As @HongboZhu correctly said, the problem is the hyphens. Now, your real problem is that you want to use hyphens in the legend. There are plenty of ways to change the legend labels. One way is within your `scale_*_manual` function.

Note that I have slightly shortened your code and also changed the name of your data frame to mydf. df is a base R function and is not recommended (although very frequently used) as an example name on SO.

mydf <- data.frame(Date = seq.Date(as.Date.character("2019-01-01"), by = 1, length.out = 10),Value = rnorm(10), Foo = rnorm(10))

library(ggplot2)
ggplot(mydf, aes(x = Date)) +
  geom_line(aes(y = Value, color = "Value", linetype = "Value")) +
  geom_line(aes(y = Foo, color = "SomeWord", linetype = "SomeWord")) +
  # labels sets the text shown in the legend, so hyphens are fine here
  scale_color_manual(breaks = c("Value", "SomeWord"), values = c("steelblue", "firebrick"),
                     labels = c("value", "Ein-Schritt-Prognose")) +
  scale_linetype_manual(name = "Legend", breaks = c("Value", "SomeWord"), values = c("solid", "twodash"))

Created on 2019-07-04 by the reprex package (v0.2.1)

tjebo
  • Thank you kindly. I did not know you could also manually write the legend with the label argument in scale functions. That only solves one minor issue however, since there is still the question of why, for example, using "Value" or "SomeWord" as reference names for the first column causes ggplot to flip the colour and linetype assignments around. – J. Grünenwald Jul 04 '19 at 08:26
  • Maybe a very similar question of mine from the past might help you: https://stackoverflow.com/questions/53703989/change-order-of-legends-for-multiple-aesthetics – tjebo Jul 04 '19 at 08:31