
I have an array s1 with type(s1) == numpy.ndarray. I want to build a dictionary that uses the first column of s1 as the key and the remaining columns as the values for that key. The first column contains repeated values. Here is the array:

s1 = np.array([[1L, 'R', 4],
       [1L, 'D', 3],
       [1L, 'I', 10],
       [1L, 'K', 0.0],
       [2L, 'R', 11],
       [2L, 'D', 13],
       [2L, 'I', 1],
       [2L, 'K', 6],
       [3L, 'R', 12],
       [3L, 'D', 17],
       [3L, 'I', 23],
       [3L, 'K', 10]], dtype=object)

I want to get the following:

{'1':[['R',4],['D',3],['I',10],['K',0]],
  '2':[['R',11],['D',13],['I',1],['K',6]],
  '3':[['R',12],['D',17],['I',23],['K',10]]}

This is what I tried and got:

In [18]: {x[0]:[x[1],x[2]] for x in s1}
Out[18]: {1L: ['K', 0.0], 2L: ['K', 6], 3L: ['K', 10]}

I can see that the problem is the repeated values in the grouping column: each later row overwrites the earlier one for the same key. But I can't work out how to append instead. What trick am I missing?

Stat-R
  • Aside: are you wedded to using pure numpy? When you're working with heterogeneously typed data, where it makes sense for columns to have names, and you're reimplementing groupby, it's hard to resist the temptation to recommend pandas instead. (And I couldn't.) – DSM Nov 03 '17 at 20:16

2 Answers


You can simply build it with collections.defaultdict:

import collections

d = collections.defaultdict(list)
# k, *v unpacking requires Python 3
for k, *v in s1:
    d[k].append(list(v))

which gives:

defaultdict(list,
            {1: [['R', 4], ['D', 3], ['I', 10], ['K', 0.0]],
             2: [['R', 11], ['D', 13], ['I', 1], ['K', 6]],
             3: [['R', 12], ['D', 17], ['I', 23], ['K', 10]]}) 
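
For reference, here is a self-contained Python 3 sketch of the same idea. `rows` is a plain-list stand-in for `s1` (an assumption, to keep the example free of dependencies; iterating over the 2-D object array yields its rows in the same way), and `grouped` is just an illustrative name. The keys are passed through `str` only because the question shows string keys; drop that call to keep the integers.

```python
from collections import defaultdict

# Stand-in for s1: the rows from the question as a plain list (assumption).
rows = [
    [1, 'R', 4], [1, 'D', 3], [1, 'I', 10], [1, 'K', 0.0],
    [2, 'R', 11], [2, 'D', 13], [2, 'I', 1], [2, 'K', 6],
    [3, 'R', 12], [3, 'D', 17], [3, 'I', 23], [3, 'K', 10],
]

grouped = defaultdict(list)
for key, *rest in rows:
    # First column becomes the (string) key; the rest of the row is appended.
    grouped[str(key)].append(list(rest))

# Convert to a plain dict so later lookups of missing keys raise KeyError.
grouped = dict(grouped)
print(grouped['2'])  # [['R', 11], ['D', 13], ['I', 1], ['K', 6]]
```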

EDIT

To turn the inner lists into dicts as well, you can nest dicts in dicts:

d = collections.defaultdict(dict)
for k1, k2, v in s1:
    d[k1][k2] = v

#defaultdict(dict,
#       {1: {'D': 3, 'I': 10, 'K': 0.0, 'R': 4},
#        2: {'D': 13, 'I': 1, 'K': 6, 'R': 11},
#        3: {'D': 17, 'I': 23, 'K': 10, 'R': 12}})

In [67]: d[2]['K']
Out[67]: 6
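
One thing to keep in mind with the nested version: looking up a missing outer key on a defaultdict silently creates an empty entry for it. If that is unwanted, the same structure can be built as a plain dict with `dict.setdefault`. This is a minimal sketch; `rows` is again a plain-list stand-in for `s1` and `nested` is an illustrative name, both assumptions.

```python
# Stand-in for s1: the rows from the question as a plain list (assumption).
rows = [
    [1, 'R', 4], [1, 'D', 3], [1, 'I', 10], [1, 'K', 0.0],
    [2, 'R', 11], [2, 'D', 13], [2, 'I', 1], [2, 'K', 6],
    [3, 'R', 12], [3, 'D', 17], [3, 'I', 23], [3, 'K', 10],
]

nested = {}
for k1, k2, v in rows:
    # setdefault returns the inner dict for k1, creating it on first use.
    nested.setdefault(k1, {})[k2] = v

print(nested[2]['K'])  # 6
print(4 in nested)     # False: a plain dict never gains keys on lookup
```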

See here for generalization.

B. M.
  • How do I make the `['R', 4], ['D', 3], ['I', 10], ['K', 0.0]` part a dict as well, like `'R':4,'D':3...`? – Stat-R Nov 03 '17 at 21:38

You might want to use itertools.groupby():

In [15]: {k: [list(x[1:]) for x in g]
   ....:  for k,g in itertools.groupby(s1, key=lambda x: x[0])}
Out[15]: 
{1L: [['R', 4], ['D', 3], ['I', 10], ['K', 0.0]],
 2L: [['R', 11], ['D', 13], ['I', 1], ['K', 6]],
 3L: [['R', 12], ['D', 17], ['I', 23], ['K', 10]]}
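
Note that itertools.groupby() only merges consecutive rows with equal keys. That works for the array in the question because equal keys are already adjacent. If they might not be, sorting by the key first is one way around it. A minimal sketch, using a hypothetical shuffled `rows` list as a stand-in for an unsorted `s1`:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical, deliberately interleaved rows (not the question's data order).
rows = [[2, 'R', 11], [1, 'R', 4], [2, 'D', 13], [1, 'D', 3]]

first = itemgetter(0)

# Without sorting, each run of equal keys becomes its own group.
unsorted_keys = [k for k, _ in groupby(rows, key=first)]
print(unsorted_keys)  # [2, 1, 2, 1]

# sorted() is stable, so rows keep their relative order within each key.
grouped = {
    k: [list(r[1:]) for r in g]
    for k, g in groupby(sorted(rows, key=first), key=first)
}
print(grouped)  # {1: [['R', 4], ['D', 3]], 2: [['R', 11], ['D', 13]]}
```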
Robᵩ