In [63]: timeit dateutil.parser.parse('2013-05-11T21:23:58.970460+07:00')
10000 loops, best of 3: 89.5 µs per loop
In [64]: timeit arrow.get('2013-05-11T21:23:58.970460+07:00')
10000 loops, best of 3: 62.1 µs per loop
In [65]: timeit numpy.datetime64('2013-05-11T21:23:58.970460+07:00')
1000000 loops, best of 3: 714 ns per loop
In [66]: timeit iso8601.parse_date('2013-05-11T21:23:58.970460+07:00')
10000 loops, best of 3: 23.9 µs per loop
> Other parts that are always hot include split() and string concatenation. Java compilers can substitute StringBuffers when they see naive string concatenation, but in Python there's no easy way to build a string in a complex loop and you end up putting string fragments into a list and then finally join()ing them. Madness!
The Python solution you describe is the same as in Java. If you have `String a = b + c + d;` then the compiler may optimize this using a StringBuffer, as you say [1]. In Python it's also pretty cheap to do `a = b + c + d` to concatenate strings (or `''.join([b, c, d])`, but you should run a little microbenchmark to see which works best). But if it's in a "complex loop", as you put it, then Java will certainly not do this. So you have to build a buffer using StringBuilder and then call toString(), which is basically the exact same process except it's spelled `builder.toString()` instead of `''.join(builder)`.
Unless of course you have some interesting insights into the JVM internals about string concatenation optimizations.
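For what it's worth, here's a rough sketch of the "little microbenchmark" I mean. The function names and the loop body are made up purely for illustration, and the timings will depend on your interpreter and machine, so treat it as a template rather than a result.

```python
import timeit


def build_with_join(n):
    # Collect fragments in a list, then join once at the end
    # (the Python analogue of StringBuilder + toString()).
    parts = []
    for i in range(n):
        if i % 2 == 0:
            parts.append(str(i))
        else:
            parts.append("-")
    return "".join(parts)


def build_with_concat(n):
    # Naive repeated concatenation inside the loop.
    out = ""
    for i in range(n):
        if i % 2 == 0:
            out += str(i)
        else:
            out += "-"
    return out


# Time both variants; no expected numbers are given because they vary.
for fn in (build_with_join, build_with_concat):
    seconds = timeit.timeit(lambda: fn(1000), number=200)
    print(f"{fn.__name__}: {seconds:.4f}s for 200 runs")
```

Both functions return the same string, so whichever one measures faster for your loop is a drop-in replacement for the other.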
Of course you have a different machine, but the OP was getting 2.5 µs per parse in .NET versus your 89.5 µs in Python. I wouldn't have expected such a difference. No wonder it's a hot path.
Well, that's dateutil (installed from pip) and not datetime (std). As part of log ingestion I would, of course, convert to UTC and drop the timezone distinctions, since it does slow Python down a lot when it has to worry about timezones. Working within the same units, with no DST issues, is much nicer/quicker.
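A minimal sketch of that "convert to UTC and drop the timezone" step, using only the standard library. It assumes Python 3.7+ (for `datetime.fromisoformat`) and timestamps that carry an explicit offset; `to_naive_utc` is just a name I picked for the example.

```python
from datetime import datetime, timezone


def to_naive_utc(stamp):
    # Parse an ISO 8601 timestamp that includes a UTC offset.
    aware = datetime.fromisoformat(stamp)
    # Shift to UTC, then drop tzinfo so later comparisons and
    # arithmetic work on plain naive datetimes.
    # Note: a stamp WITHOUT an offset would be treated as local time
    # by astimezone(), which is probably not what you want for logs.
    return aware.astimezone(timezone.utc).replace(tzinfo=None)


print(to_naive_utc("2013-05-11T21:23:58.970460+07:00"))
# prints 2013-05-11 14:23:58.970460
```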
Anyway, if you're installing packages from pip, you may as well just install iso8601 and get the best performance, possibly beating .NET (who knows? As you said, I have a different machine than the OP).
dateutil spends most of its time inferring the format. It's not really designed as a performance component; it's designed as an "if it looks like a date we'll give you a datetime" style component.
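To illustrate the difference, here is a sketch of the opposite approach: telling the parser the exact format up front so nothing has to be inferred. It uses the standard library's `datetime.strptime` and assumes Python 3.7+, where `%z` accepts an offset written with a colon. `ISO_FORMAT` and `parse_fixed` are names invented for the example, and I'm not claiming any particular speed for it; measure on your own data.

```python
from datetime import datetime, timedelta

# The one format we expect; anything else raises ValueError
# instead of being guessed at.
ISO_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def parse_fixed(stamp):
    # Parse against the fixed format above, keeping the UTC offset.
    return datetime.strptime(stamp, ISO_FORMAT)


parsed = parse_fixed("2013-05-11T21:23:58.970460+07:00")
print(parsed.utcoffset() == timedelta(hours=7))  # prints True
```

The trade-off is the obvious one: a fixed format rejects input that a guessing parser like dateutil would happily accept.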
[1] http://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html...