While looking for a way to split a column into multiple columns within a loop, I stumbled upon a list of useful Pandas snippets containing this expression:

# Split delimited values in a DataFrame column into two new columns
df['new_col1'], df['new_col2'] = zip(*df['original_col'].apply(lambda x: x.split(': ', 1)))

which works perfectly, but I do not understand how it operates, in particular the role of the * sign. Until now, I have seen asterisks only in function definitions, and I have not been able to find any documentation for this case.

Could anyone explain how it works?

PiZed

1 Answer

zip() in conjunction with the * operator can be used to unzip a list. First, here is what zip() does on its own:

x = [1, 2, 3]
y = [4, 5, 6]
# list() keeps this reusable on Python 3, where zip() returns an iterator
zipped = list(zip(x, y))
print(zipped)

Output:

[(1, 4), (2, 5), (3, 6)]

Explanation:

zip() took the values from the lists x and y position by position (column-wise) and stored each pair in a tuple.

And (here's the interesting part for you):

x2, y2 = zip(*zipped)
print(x2)
print(y2)

Output:

(1, 2, 3)
(4, 5, 6)

Explanation:

  1. The * operator unpacked the contents of zipped (took the tuples out of the list) and passed each tuple to zip() as a separate argument.
  2. zip() then took the values from every tuple, column by column, and stored each column in a new tuple.
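
To make step 1 concrete, here is a minimal sketch showing that the starred call and a call with the tuples written out by hand receive the same arguments. The names with_star and spelled_out are only illustrative.

```python
zipped = [(1, 4), (2, 5), (3, 6)]

# The * spreads the list into separate positional arguments,
# so both calls below pass three tuples to zip().
with_star = list(zip(*zipped))
spelled_out = list(zip((1, 4), (2, 5), (3, 6)))

print(with_star)                 # [(1, 2, 3), (4, 5, 6)]
print(with_star == spelled_out)  # True
```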

So if we stack those tuples one per line (before unpacking), they look like this:

[
    (1, 4),
    (2, 5),
    (3, 6)
]

Once unpacked they'll look like this:

(1, 4)
(2, 5)
(3, 6)

As you can see, the first column contains 1, 2 and 3, and the second column contains 4, 5 and 6.

So that's what zip does in conjunction with the * operator.
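
Applied to the expression in the question, the same idea can be sketched without pandas. The list original_col below is a hypothetical stand-in for df['original_col'], and the list comprehension plays the role of the .apply(lambda ...) call, which produces one [left, right] list per row.

```python
# Hypothetical sample data standing in for df['original_col']
original_col = ['name: Alice', 'city: Lima', 'note: a: b']

# One [left, right] pair per row; maxsplit=1 splits only at the first ': '
pairs = [value.split(': ', 1) for value in original_col]

# * passes each pair to zip() as its own argument, so zip() regroups
# all the left parts into one tuple and all the right parts into another.
new_col1, new_col2 = zip(*pairs)

print(new_col1)  # ('name', 'city', 'note')
print(new_col2)  # ('Alice', 'Lima', 'a: b')
```

Each resulting tuple has one entry per row, which is why the two tuples can be assigned directly to df['new_col1'] and df['new_col2'].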

Documentation: https://docs.python.org/2/library/functions.html#zip

Andrés Pérez-Albela H.