68

I recently discovered the pandas `assign` method, which I find very elegant. My issue is that the new column's name is passed as a keyword argument, so it cannot contain spaces, dashes, or other characters that are not allowed in a Python identifier.

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': range(1, 11), 'B': np.random.randn(10)})
df.assign(ln_A=lambda x: np.log(x.A))
    A         B      ln_A
0   1  0.426905  0.000000
1   2 -0.780949  0.693147
2   3 -0.418711  1.098612
3   4 -0.269708  1.386294
4   5 -0.274002  1.609438
5   6 -0.500792  1.791759
6   7  1.649697  1.945910
7   8 -1.495604  2.079442
8   9  0.549296  2.197225
9  10 -0.758542  2.302585

But what if I want to name the new column "ln(A)", for example? Neither of these works:

df.assign(ln(A) = lambda x: np.log(x.A))
df.assign("ln(A)" = lambda x: np.log(x.A))


File "<ipython-input-7-de0da86dce68>", line 1
df.assign(ln(A) = lambda x: np.log(x.A))
SyntaxError: keyword can't be an expression

I know I could rename the column right after the .assign call, but I want to understand more about this method and its syntax.

Tomerikoo
FLab
  • Well, the parentheses make Python treat this as some kind of call, which is not a legal name for a variable: https://docs.python.org/3.2/reference/lexical_analysis.html#identifiers – EdChum Sep 29 '16 at 10:27
  • From the example above, I can still do df['log(A)'] = df.sum(axis = 1), but I understand why I get the error above (it was somewhat expected) – FLab Sep 29 '16 at 10:34
  • But in `df['log(A)']` the column name is a `str`, for which the variable name rules don't apply – EdChum Sep 29 '16 at 10:36

2 Answers

131

You can pass the keyword arguments to assign as a dictionary unpacked with `**`. Dictionary keys are ordinary strings, so they do not have to be valid identifiers:

kwargs = {"ln(A)": lambda x: np.log(x.A)}
df.assign(**kwargs)

    A         B     ln(A)
0   1  0.500033  0.000000
1   2 -0.392229  0.693147
2   3  0.385512  1.098612
3   4 -0.029816  1.386294
4   5 -2.386748  1.609438
5   6 -1.828487  1.791759
6   7  0.096117  1.945910
7   8 -2.867469  2.079442
8   9 -0.731787  2.197225
9  10 -0.686110  2.302585
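
As a minimal, self-contained sketch of the same idea (the names `new_columns` and `result`, and the extra "A squared" column, are only illustrative and not part of the original question), several awkwardly named columns can be added in one call:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": range(1, 11)})

# Keys of an unpacked dict only need to be strings,
# they do not have to be valid Python identifiers.
new_columns = {
    "ln(A)": lambda x: np.log(x["A"]),
    "A squared": lambda x: x["A"] ** 2,
}

# assign returns a new DataFrame; df itself is not modified.
result = df.assign(**new_columns)

print(list(result.columns))  # ['A', 'ln(A)', 'A squared']
print(list(df.columns))      # ['A']
```

Because the column names live in an ordinary dict, they can also be built at runtime, for example from a list of strings.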
Piotr
  • You can also do it in a single line with `df.assign(**{"ln(A)": lambda x: np.log(x.A)})` – EFraim Sep 27 '22 at 10:19
8

assign expects keyword arguments and, in turn, creates columns named after the keywords. That's handy, but a keyword has to be a valid identifier, so you can't write an expression such as ln(A) in its place. @EdChum spells this out in the comments, with a link to the Python reference on identifiers.
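
The identifier rule can be checked directly. The sketch below uses the built-in `str.isidentifier` and a hypothetical helper, `show_kwargs`, that exists only for this illustration; it suggests the restriction comes from Python's call syntax rather than from pandas:

```python
def show_kwargs(**kwargs):
    """Return the keyword names this function received."""
    return list(kwargs)

# A name written literally as keyword= must be a valid identifier.
print("ln_A".isidentifier())   # True
print("ln(A)".isidentifier())  # False

# Unpacking a dict bypasses the syntax check: any string key is accepted.
names = show_kwargs(**{"ln(A)": 1, "ln_A": 2})
print(names)  # ['ln(A)', 'ln_A']
```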

Use insert instead for an in-place transformation:

df.insert(2, 'ln(A)', np.log(df.A))
df

Use concat if you don't want to modify df in place:

pd.concat([df, np.log(df.A).rename('ln(A)')], axis=1)
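
To make the difference between the two approaches concrete, here is a small sketch on a fresh frame (the names `combined` and `returned` are just for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": range(1, 11)})

# concat returns a new DataFrame and leaves df untouched.
combined = pd.concat([df, np.log(df["A"]).rename("ln(A)")], axis=1)
print(list(df.columns))        # ['A']
print(list(combined.columns))  # ['A', 'ln(A)']

# insert modifies df in place and returns None.
returned = df.insert(1, "ln(A)", np.log(df["A"]))
print(returned)                # None
print(list(df.columns))        # ['A', 'ln(A)']
```

So insert fits when you want to change the existing frame at a given position, while concat (like assign) fits method chains that should not mutate their input.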

piRSquared
  • Thanks for your answer. There is a difference in behavior as insert only acts inplace – FLab Sep 29 '16 at 10:40