Open Sourcing our new Duckling – Probabilistic Parser Rewritten in Haskell (wit.ai)
110 points by jimarcey on May 1, 2017 | 23 comments



> Duckling is now used at scale internally by Facebook.

I'm thrilled to hear that Facebook has a new toy for extracting structured data from vast quantities of text.


:(


> Duckling, our open-sourced probabilistic parser to detect entities like dates and times, numbers, and durations

Are there any benchmarks for how well this compares with something like HeidelTime[1] or SUTime[2]?

[1] https://github.com/HeidelTime

[2] https://nlp.stanford.edu/software/sutime.html


(disclaimer: I work on Duckling)

I don't know of any benchmarks. HeidelTime and SUTime are two solid projects... although the rule files [0][1] are a bit scary if you ask me :-).

Quick thoughts:

- SUTime relies on TokensRegex [2], which is similar to how Duckling parses sentences at a high level

- SUTime seems to only provide English rules

- I don't know of any production use cases of either
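
To illustrate the token-pattern idea from the first point, here is a rough sketch of matching a rule against a sequence of tokens. It is not code from Duckling, SUTime, or TokensRegex: the `Rule` type, `matchAt`, `parseAll`, and `durationRule` are all hypothetical names, plain predicates stand in for regexes, and there is no probabilistic ranking step.

```haskell
import Data.Char (isDigit, toLower)
import Data.List (tails)

-- A pattern item tests a single token (a stand-in for a regex).
type Item = String -> Bool

-- Hypothetical rule: a name, a sequence of token patterns, and a
-- production that turns the matched tokens into a result.
data Rule = Rule
  { ruleName    :: String
  , rulePattern :: [Item]
  , ruleProduce :: [String] -> Maybe String
  }

-- Try to match a rule against the start of a token list.
matchAt :: Rule -> [String] -> Maybe String
matchAt (Rule _ pat prod) toks
  | length toks >= n && and (zipWith ($) pat window) = prod window
  | otherwise = Nothing
  where
    n      = length pat
    window = take n toks

-- Apply every rule at every token position of a whitespace-split input.
parseAll :: [Rule] -> String -> [(String, String)]
parseAll rules input =
  [ (ruleName r, out)
  | suffix <- tails (words input)
  , r <- rules
  , Just out <- [matchAt r suffix]
  ]

-- Example rule (an assumption for illustration): digits followed by a unit.
durationRule :: Rule
durationRule =
  Rule "duration" [all isDigit, (`elem` units) . map toLower] produce
  where
    units = ["minutes", "hours", "days"]
    produce [n, unit] = Just (n ++ " " ++ map toLower unit)
    produce _         = Nothing

main :: IO ()
main = print (parseAll [durationRule] "remind me in 45 Minutes please")
```

Here each rule is just data, so supporting another language or entity type would mean adding rules rather than changing the matcher.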

[0] https://github.com/stanfordnlp/CoreNLP/tree/master/src/edu/s...

[1] https://github.com/HeidelTime/heideltime/blob/master/resourc...

[2] https://nlp.stanford.edu/pubs/tokensregex-tr-2014.pdf


Yes, the rules are horrible.

Yes, SUTime only has English rules

> I don't know of any production use cases of either

Hi... Using SUTime for English, HeidelTime for non-English. Not at FB scale, but running against millions of messages per day.


Hey, main developer here. Happy to answer any questions folks might have!


How much coding have you done in Haskell throughout your life? How comfortable are you with monads? Did you ever read the Baroque Cycle? Also, TYVM for doing this!


This is actually my first Haskell project. I feel pretty comfortable with monads, and I haven't read the Baroque Cycle yet. :-)


How did you get comfortable with Monads? I read this on Stack Overflow: http://stackoverflow.com/questions/44965/what-is-a-monad and it did clear things up a little, just interested in seeing people who have had success with Haskell weigh in.


I'm interested to know why you dropped Clojure. How was Clojure not meeting your needs?


Not the OP nor on their team, but I do work at a company using Haskell for gRPC services, internal CLI tools, cloud web services, and heavy-duty (and high-performance) parallel parsing.

I'm a polyglot programmer and appreciate the features of many different languages, but I couldn't imagine forgoing the benefits of Haskell's type system, reasoning about code algebraically, and leveraging the rock-solid GHC RTS for some of the mission-critical things we're doing. Rust would probably be my only other consideration for performance reasons, but its ecosystem is more immature than Haskell's, and some things that are very clear in Haskell are very clunky to express in Rust (even with its enlightened type system).

Haskell may not be the right choice for every team or project but it certainly has been for me for many years now on many (though not all) projects.


Clojure doesn't have strong adoption within Facebook.



In the article, do you mean "world class" instead of "word class"?


Yes.


Do you have problems with long build times? How do you mitigate them?


Build times for this library haven't cropped up as a first-order concern. Using GHCi and `stack test` for the dev workflow has been fast enough (though it could always be better).


What are the keys to writing "production" Haskell?


I'm assuming you're asking what's important in writing "production" Haskell rather than "toy example" Haskell.

Ixiaus's point about mechanics more than theory certainly rings true, though we did think a lot about whether to use GADTs for the Dimension type. Overall I see this as similar to writing "production" code in other languages: going through a couple of feedback loops using real use cases, profiling to find the bottlenecks, observing how APIs are used in practice compared to intent, and bringing the service to a stable equilibrium.
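
For anyone unfamiliar with the GADT question, here is a minimal sketch of the idea: each constructor of the dimension type fixes the type of data that dimension resolves to. The constructor and payload names below are hypothetical and chosen for illustration; this is not Duckling's actual definition.

```haskell
{-# LANGUAGE GADTs #-}

-- Hypothetical payload types, one per kind of entity.
newtype NumeralData = NumeralData Double
data DurationData = DurationData Int String

-- The type index `a` records which payload a dimension carries.
data Dimension a where
  Numeral  :: Dimension NumeralData
  Duration :: Dimension DurationData

-- Matching on the constructor refines `a`, so each branch can
-- take apart the matching payload without any runtime casts.
describe :: Dimension a -> a -> String
describe Numeral  (NumeralData v)    = "numeral " ++ show v
describe Duration (DurationData n u) = "duration " ++ show n ++ " " ++ u

main :: IO ()
main = do
  putStrLn (describe Numeral (NumeralData 3))
  putStrLn (describe Duration (DurationData 45 "minutes"))
```

The trade-off is that code handling "any dimension" needs an existential wrapper or similar, which is presumably part of why this kind of choice takes some thought.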


Not the OP nor on their project, but I do work at a company using Haskell to solve interesting theoretical problems and for mundane software like CLI tools, gRPC services, and web services.

Most people assume that production Haskell involves a lot of theory, when I think that's actually more the exception. My day-to-day use of Haskell (with the exception of a few projects) is as a strongly typed imperative language. To use it in production in that capacity, you should have an intermediate-level understanding of the type system so that you can wire library APIs up and use them.

The productivity gains from Haskell's type system are quite large and they compound, particularly when you have multiple people massaging the code base.


Can you share any benchmarks between the Clojure and Haskell Duckling? Do you have any between the two before you made algorithmic improvements to the Haskell implementation?

The post says that the main motivation was scaling/performance improvement, yet shares no data about how much of a performance improvement there was.


Does anyone know a probabilistic CSV parser that can map fields to a domain-specific structure?


So something written in Haskell is news now?



