Today I had to knock up a quick web app to return the newest available file to an http based update client based on various criteria. Now this is a pretty typical scenario which almost any software company has to deal with once they have software deployed in the field that has some type of online update functionality built in so I’m not exactly blazing any trails here and didn’t expect to have any major problems.
However, as any developer knows, there is always something which puts a kink in what should have been a walk in the park. In this case it was the fact that our software versions don’t have a fixed number of digits after each decimal point (Something we have in common with many other projects, including the Linux kernel). This particular kink means that a table full of version numbers will not be returned in the order you expect when you use django‘s ORM order_by() clause (Which relies on the underlying PostgreSQL‘s ORDER BY clause). Given the list ‘1.0.0’, ‘1.0.10’, ‘1.10.0’, ‘1.0.9’ and told to sort in ascending order it will return ‘1.0.0’, ‘1.0.10’, ‘1.0.9’, ‘1.10.0’ instead of the expected ‘1.0.0’, ‘1.0.9’, ‘1.0.10’, ‘1.10.0’.
Python’s sorted() function also has the same problem:
This caused me to do quite a bit of digging around on google which pulled up a whole bunch of different ways to do what is apparently called a “natural sort” as apposed to an ascii based sort on a list. In the end I settled on the sort_nicely() function from the article Sorting for Humans : Natural Sort Order only to have it pointed out by some of the guys on #python that it could end up comparing int objects with string objects. Thanks to a little bit of coaching I finally ended up with the following naturallysorted() function which should be a drop in replacement for the python sorted() function:
As a comparison here is the same list processed by sorted() and naturallysorted():
However, as any developer knows, there is always something which puts a kink in what should have been a walk in the park. In this case it was the fact that our software versions don’t have a fixed number of digits after each decimal point (Something we have in common with many other projects, including the Linux kernel). This particular kink means that a table full of version numbers will not be returned in the order you expect when you use django‘s ORM order_by() clause (Which relies on the underlying PostgreSQL‘s ORDER BY clause). Given the list ‘1.0.0’, ‘1.0.10’, ‘1.10.0’, ‘1.0.9’ and told to sort in ascending order it will return ‘1.0.0’, ‘1.0.10’, ‘1.0.9’, ‘1.10.0’ instead of the expected ‘1.0.0’, ‘1.0.9’, ‘1.0.10’, ‘1.10.0’.
Python’s sorted() function also has the same problem:
>>> a = [‘1.0.0’, ‘1.0.10’, ‘1.10.0’, ‘1.0.9’]
>>> print sorted(a)
[‘1.0.0’, ‘1.0.10’, ‘1.0.9’, ‘1.10.0’]
This caused me to do quite a bit of digging around on google which pulled up a whole bunch of different ways to do what is apparently called a “natural sort” as apposed to an ascii based sort on a list. In the end I settled on the sort_nicely() function from the article Sorting for Humans : Natural Sort Order only to have it pointed out by some of the guys on #python that it could end up comparing int objects with string objects. Thanks to a little bit of coaching I finally ended up with the following naturallysorted() function which should be a drop in replacement for the python sorted() function:
def naturallysorted(L, reverse=False):
“”” Similar functionality to sorted() except it does a natural text sort
which is what humans expect when they see a filename list.
“””
convert = lambda text: (”, int(text)) if text.isdigit() else (text, 0)
alphanum = lambda key: [ convert(c) for c in re.split(‘([0-9]+)’, key) ]
return sorted(L, key=alphanum, reverse=reverse)
As a comparison here is the same list processed by sorted() and naturallysorted():
>>> a = [‘1.0.0’, ‘1.0.10’, ‘1.10.0’, ‘1.0.9’]
>>> print sorted(a)
[‘1.0.0’, ‘1.0.10’, ‘1.0.9’, ‘1.10.0’]
>>> print sorted(a, reverse=True)
[‘1.10.0’, ‘1.0.9’, ‘1.0.10’, ‘1.0.0’]
>>> print naturallysorted(a)
[‘1.0.0’, ‘1.0.9’, ‘1.0.10’, ‘1.10.0’]
>>> print naturallysorted(a, reverse=True)
[‘1.10.0’, ‘1.0.10’, ‘1.0.9’, ‘1.0.0’]