2

I have a large dictionary whose list values I want to sort, all in the order given by sorting one of the lists. For a simple dictionary I would do it like this:

d = {'a': [2, 3, 1], 'b': [103, 101, 102]}

d['a'], d['b'] = [list(i) for i in zip(*sorted(zip(d['a'], d['b'])))]

print(d)

Output:

{'a': [1, 2, 3], 'b': [102, 103, 101]}

My actual dictionary has many keys with list values, so unpacking the zip tuple like above becomes impractical. Is there any way to go over the keys and values without specifying them all? Something like:

d.values() = [list(i) for i in zip(*sorted(zip(d.values())))]

Using d.values() results in SyntaxError: can't assign function call, but I'm looking for something like this.

Timo
  • What do you mean by "based on one list"? The input and output data appear to indicate a straightforward sort of the values – DarkKnight Jun 10 '22 at 11:11
  • Yes, the input and output are what I want for this small dictionary, but I have multiple large dictionaries with many keys. So I am looking for a way to sort them without having to reassign every sorted list to its original key. So without doing: `d["a"], d["b"], d["c"], ... = [list(i) for i in zip(*sorted(zip(d["a"], d["b"], d["c"], ...)))]` – Timo Jun 10 '22 at 11:16
  • Look at my answer which demonstrates how you can sort the values (lists) in place – DarkKnight Jun 10 '22 at 11:17

3 Answers

2

As far as I understand your question, you could try simple looping:

order = sorted(range(len(d['a'])), key=lambda i: d['a'][i])  # positions that sort d['a']
for k in d:
    d[k] = [d[k][i] for i in order]  # apply the same order to every list

where d['a'] holds the list that determines the order of the others. However, using dicts this way seems slow and messy. Since every entry in your dictionary is presumably a list of the same length, a simple fix would be to store the data in a NumPy array and use argsort to sort the rows by the i-th column:

a = np.array( --your data here-- )
a[a[:, i].argsort()]
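
A minimal sketch of the NumPy approach, using the example data from the question. Stacking the lists as columns with np.column_stack and the names `keys`, `a_sorted` and `sorted_d` are assumptions made for illustration, and it assumes all lists have the same length and a common dtype:

```python
import numpy as np

d = {'a': [2, 3, 1], 'b': [103, 101, 102]}

keys = list(d)                              # column order: 'a' is column 0, 'b' is column 1
a = np.column_stack([d[k] for k in keys])   # shape (3, 2), one column per key

i = keys.index('a')                         # index of the column to sort by
a_sorted = a[a[:, i].argsort()]             # reorder whole rows by column i

# Convert back to a dict of plain Python lists
sorted_d = {k: a_sorted[:, j].tolist() for j, k in enumerate(keys)}
print(sorted_d)  # {'a': [1, 2, 3], 'b': [102, 103, 101]}
```

Because whole rows are reordered at once, every column follows the same permutation as the reference column.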

Finally, the clearest approach would be to use a pandas DataFrame, which is designed to store large amounts of data using a dict-like syntax. This way, you can simply sort by the contents of a named column 'a':

df = pd.DataFrame( --your data here-- )
df.sort_values(by='a')

For further reference, see "Sorting arrays in NumPy by column" and the pandas documentation for DataFrame.sort_values: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html

  • Thank you for your answer. The `df.sort_values(by='a')` option, combined with the comment of @Nick helped me past this problem. – Timo Jun 10 '22 at 21:07
  • Glad I could be of help. If you work with data a lot, pandas is a great library to learn. You can iterate through pandas rows and columns without converting back to lists, and it has a lot of built-in methods and operations which save a lot of time. – Daniel Tchoń Jun 11 '22 at 17:10
2

If you have many keys (and all of their list values have equal length), pandas sort_values is an efficient way to sort:

import pandas as pd
d = {'a': [2, 3, 1], 'b': [103, 101, 102], 'c' : [4, 5, 6]}
d = pd.DataFrame(d).sort_values(by='a').to_dict('list')

Output:

{'a': [1, 2, 3], 'b': [102, 103, 101], 'c': [6, 4, 5]}

If memory is an issue, you can sort in place. However, sort_values then returns None, so you can no longer chain the operations:

df = pd.DataFrame(d)
df.sort_values(by='a', inplace=True)
d = df.to_dict('list')

The output is the same as above.

Nick
  • This helped tremendously. In actuality, my dictionaries contain a few keys which have string values, but by making a dictionary with only the list values, sorting the list values and then use `dict.update()` on my original dictionary, it worked. The lists are sorted, and the string values remain untouched. – Timo Jun 10 '22 at 21:03
  • @Timo good to hear. I'm glad I could help – Nick Jun 10 '22 at 23:25
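
The workaround described in the comment above (sort only the list values, then merge them back with `dict.update()`) could be sketched as follows. The `'name'` key and its value are hypothetical stand-ins for the string entries mentioned there, and `lists_only` is a name chosen for illustration:

```python
import pandas as pd

# Hypothetical example: 'name' is a string value that must stay untouched
d = {'a': [2, 3, 1], 'b': [103, 101, 102], 'name': 'run-1'}

# Keep only the entries whose values are lists
lists_only = {k: v for k, v in d.items() if isinstance(v, list)}

# Sort every list by the order of 'a', then write the results back
d.update(pd.DataFrame(lists_only).sort_values(by='a').to_dict('list'))

print(d)  # {'a': [1, 2, 3], 'b': [102, 103, 101], 'name': 'run-1'}
```

Since `dict.update()` only overwrites the keys it is given, the string entry keeps its value and its position in the dictionary.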
1

For the given input data and the required output, this will suffice:

from operator import itemgetter
d = {'a': [2, 3, 1], 'b': [103, 101, 102]}

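def sort_dict(dict_, refkey):
    # Pair each reference value with its original position, then sort by value
    reflist = sorted([(v, i) for i, v in enumerate(dict_[refkey])], key=itemgetter(0))
    for v in dict_.values():
        v_ = v[:]  # copy, so reads are not affected by the in-place writes
        for i, (_, p) in enumerate(reflist):
            v[i] = v_[p]  # the element from original position p moves to position i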

sort_dict(d, 'a')

print(d)

Output:

{'a': [1, 2, 3], 'b': [102, 103, 101]}
DarkKnight
  • This does not work for me. You sort the lists individually, but I want the shuffling of 'b' to be the same as for 'a'. – Timo Jun 10 '22 at 11:21
  • @Timo Now I understand the question. See edit. Note that this will **only** work if all lists are of identical length – DarkKnight Jun 10 '22 at 11:38
  • Thank you for your updated answer. Unfortunately, because of certain module dependencies, I am working on Python 3.7 instead of 3.8. The `:=` is not useful for me, but this might be useful for others. Thanks anyway! I will update my question with my Python version too. – Timo Jun 10 '22 at 20:56
  • @Timo Updated for pre-3.8 – DarkKnight Jun 11 '22 at 06:26
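
The identical-length requirement noted in the comments can be checked explicitly. Below is a minimal sketch of a variant that validates the lengths first and returns a new dict instead of modifying the lists in place. The function name `sorted_by_key` is hypothetical, and the code avoids `:=` so it also runs on Python 3.7:

```python
def sorted_by_key(dict_, refkey):
    """Return a new dict with every list reordered by the sort order of dict_[refkey]."""
    ref = dict_[refkey]
    # Fail early if any list cannot follow the same permutation
    if any(len(v) != len(ref) for v in dict_.values()):
        raise ValueError("all lists must have the same length")
    # Positions of the reference list in ascending order of their values
    order = sorted(range(len(ref)), key=ref.__getitem__)
    return {k: [v[i] for i in order] for k, v in dict_.items()}


d = {'a': [2, 3, 1], 'b': [103, 101, 102]}
result = sorted_by_key(d, 'a')
print(result)  # {'a': [1, 2, 3], 'b': [102, 103, 101]}
```

The original dictionary is left unchanged here, which costs extra memory compared with the in-place version.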