
Here are some microbenchmarks:

In [63]: timeit dateutil.parser.parse('2013-05-11T21:23:58.970460+07:00')
10000 loops, best of 3: 89.5 µs per loop

In [64]: timeit arrow.get('2013-05-11T21:23:58.970460+07:00')
10000 loops, best of 3: 62.1 µs per loop

In [65]: timeit numpy.datetime64('2013-05-11T21:23:58.970460+07:00')
1000000 loops, best of 3: 714 ns per loop

In [66]: timeit iso8601.parse_date('2013-05-11T21:23:58.970460+07:00')
10000 loops, best of 3: 23.9 µs per loop
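
If you want to reproduce this outside IPython, something along these lines should work; a rough sketch using the stdlib timeit module, assuming all four packages are importable (numbers will of course vary by machine and library version):

    import timeit

    stamp = '2013-05-11T21:23:58.970460+07:00'
    setup = "import dateutil.parser, arrow, numpy, iso8601; stamp = %r" % stamp
    for name, stmt in [('dateutil', 'dateutil.parser.parse(stamp)'),
                       ('arrow',    'arrow.get(stamp)'),
                       ('numpy',    'numpy.datetime64(stamp)'),
                       ('iso8601',  'iso8601.parse_date(stamp)')]:
        n = 10000
        total = timeit.timeit(stmt, setup=setup, number=n)
        # report per-call time in microseconds, like the IPython output above
        print('%-8s %.1f us per call' % (name, total / n * 1e6))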

> Other parts that are always hot include split() and string concatenation. Java compilers can substitute StringBuffers when they see naive string concatenation, but in Python there's no easy way to build a string in a complex loop and you end up putting string fragments into a list and then finally join()ing them. Madness!

The Python solution you describe is the same as in Java. If you have `String a = b + c + d;` then the compiler may optimize this using a StringBuffer, as you say [1]. In Python it's also pretty cheap to do `a = b + c + d` to concatenate strings (or `''.join([b, c, d])`; run a little microbenchmark to see which works best). But if it's in a "complex loop", as you put it, Java will certainly not do this: you have to build a buffer with StringBuilder and then call toString(), which is basically the same process except it's spelled `builder.toString()` instead of `''.join(builder)`.
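
For the loop case, the Python side of that equivalence looks roughly like this (a sketch; which variant actually wins depends on the interpreter, string sizes, and iteration count, so measure it):

    # Append fragments to a list, then join them: the Python analogue of
    # StringBuilder.append() followed by toString().
    def build_with_join(parts):
        buf = []
        for p in parts:
            buf.append(p)
        return ''.join(buf)

    # The naive version. CPython special-cases += on str in many situations,
    # so benchmark both rather than assuming one is always faster.
    def build_with_concat(parts):
        s = ''
        for p in parts:
            s += p
        return s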

Unless, of course, you have some interesting insights into JVM internals about string concatenation optimizations.

[1] http://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html...




Of course you have a different machine, but the OP was getting 2.5 µs per parse in .NET versus your 89.5 µs in Python. I wouldn't have expected such a difference. No wonder it's a hot path.


Well, that's dateutil (installed from pip) and not datetime (stdlib). As part of log ingestion I would, of course, convert to UTC and drop the timezone distinctions, since Python slows down a lot when it has to worry about timezones. Working in a single set of units with no DST issues is much nicer/quicker.
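
For illustration, a minimal normalization step could look like this (a Python 3 sketch, assuming dateutil does the parsing; swap in whichever parser benchmarks best for you):

    import dateutil.parser
    from datetime import timezone

    def to_naive_utc(stamp):
        # Parse, shift to UTC, then drop the tzinfo so everything downstream
        # works in one set of units with no DST concerns.
        dt = dateutil.parser.parse(stamp)
        return dt.astimezone(timezone.utc).replace(tzinfo=None)

    print(to_naive_utc('2013-05-11T21:23:58.970460+07:00'))
    # 2013-05-11 14:23:58.970460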

Anyway, if you're installing packages from pip, you may as well just install iso8601 and get the best performance, possibly beating .NET (who knows? As you said, I have a different machine than the OP).


The numpy version seems to be about 30 times faster than the iso8601 version; note the result there is in nanoseconds, not microseconds like the others.


Yeah, but the OP is using PyPy and I don't know whether numpy fully works on PyPy. I think I read that it does, but I haven't tried it.


dateutil spends most of its time inferring the format; it's not really designed as a performance component, it's designed as an "if it looks like a date, we'll give you a datetime" style component.
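
For example, all of these go through the same guessing machinery (quick sketch with dateutil's default settings):

    import dateutil.parser

    for s in ('2013-05-11T21:23:58.970460+07:00',
              'May 11, 2013 9:23 PM',
              '11/05/2013'):
        # dateutil figures out each format on the fly; a strict ISO 8601 parser
        # (or datetime.strptime with a fixed format string) skips that work,
        # which is a big part of why it's so much faster.
        print(dateutil.parser.parse(s))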


Is there a particular reason that all of the loop counts are 10k except numpy's, which is 1M?


The timeit magic runs snippets for a variable number of iterations based on how long each snippet takes.


Because it can do so many more iterations in a similar amount of time, it just does them.


timeit does a trial run to decide how many loops to do; since the numpy version was much faster, it ran more times.
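
Since Python 3.6 that calibration step is exposed directly as timeit.Timer.autorange(), which keeps increasing the loop count until the total run time is long enough to measure reliably. A quick sketch:

    import timeit

    fast = timeit.Timer('x = 1 + 1')
    slow = timeit.Timer('sum(range(10000))')

    # Each call returns (number_of_loops, total_seconds); the fast snippet
    # gets a much larger loop count than the slow one.
    print(fast.autorange())
    print(slow.autorange())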



