What is there in the ETL space with bi-directional sync?
I don't usually run into problems where "transfer data from X to Y" is it. Usually it's "there's data in CRM X and data in Event system Y, merge the two keeping X as the master source"
There's Mulesoft et al but they seem overkill for small deployments, as well as being stupidly expensive [1].
1. I'm sure they're good value if you're an enterprise company. But if you don't need the UI builder and only need 2-3 sources kept in sync, they are expensive. And from my reading of the documentation, conflict resolution isn't great either.
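To make the "merge the two keeping X as the master source" case concrete, here is a minimal Python sketch of what I mean. Everything in it is an assumption for illustration: the function name `merge_with_master`, the `email` join key, and the idea that records are plain dicts where `None` means "no value". It is not how Mulesoft or any other tool does it, just the rule I usually want.

```python
def merge_with_master(master, secondary, key="email"):
    """Merge two lists of dict records on `key`; master values win on conflict.

    Hypothetical helper for illustration only. Returns (merged, conflicts),
    where merged maps key -> record and conflicts lists every field where
    the two sources disagreed.
    """
    # Start from the secondary source so master values can overwrite it.
    merged = {rec[key]: dict(rec) for rec in secondary}
    conflicts = []
    for rec in master:
        target = merged.setdefault(rec[key], {})
        for field, value in rec.items():
            if value is None:
                continue  # master has no value here; keep the secondary one
            old = target.get(field)
            if old is not None and old != value:
                # Record the disagreement as (key, field, master, secondary).
                conflicts.append((rec[key], field, value, old))
            target[field] = value
    return merged, conflicts


# Made-up sample data: CRM is the master, the event system is secondary.
crm = [{"email": "a@x.com", "name": "Ann", "phone": None}]
events = [
    {"email": "a@x.com", "name": "Annie", "phone": "555"},
    {"email": "b@x.com", "name": "Bob", "phone": None},
]
merged, conflicts = merge_with_master(crm, events)
```

The conflict list is the part I care about: the master still wins, but you get a record of every overwrite to review or log, rather than a silent last-write-wins.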
Kettle, the open source component of Hitachi Pentaho Data Integration, is worth looking at. It has some functionality for this (you can join sources and insert the joined data back into the master) and it's pretty easy to extend to meet requirements. It's Apache licensed, with great commercial support if you need it, and can be found here:
We are a small shop, but it is a key part of our data workflows. We also found Spoon, the UI workflow builder, very helpful, since it has allowed team members who are not strong in Java to build the workflows they need. Last but not least, and a key decider in our adoption, the community is super friendly and very helpful. Something we thought would take months got implemented in a weekend thanks to quick feedback. Perhaps worth a look.
As a long-time Kettle user (probably close to 10 years), I must warn potential users that the learning curve is steep and that, like any large body of code, it contains code that can sometimes behave unpredictably. I got good at diagnosing user-induced bugs in PDI transformations by reading the stack trace, but that is not to everyone's liking.
To me, a very strong regression is the "new" UI, which switched from meaningful icons to a blue & white scheme that makes reading or discovering a new transformation a real pain: everything is a blur of blue without the color cues you had learned in the past ("ok, this is the icon for a merge of a source file & a database sent to an ES cluster" became "some stuff is read from blue sources and sent to some blue output").
I recently learned about the capability to run transformations on a Spark cluster, which replaces the original engine with a new Spark implementation. That should bring obvious compute optimizations for large enough datasets, but I don't have enough experience with it to speak of it positively or negatively.
@karmbahh - good to know. I've used Kettle for about 9 months in production and so far it's been pretty solid, but we are not going that far off the beaten path for most things. It's a big app, but at least there is documentation and some great users who have been very helpful, and the codebase is by and large very logically laid out.
We do use the Adaptive Execution Layer, but so far not with Spark (we use it with our own processing engine). It's working well for us, and it's great that we can switch engines as needed.
re: UI. I like a lot of the new look and feel, but I can see how it lost some visual semantics, and I can imagine any long-term user would find the changes frustrating. I guess coming to the tool much later has made this less of an issue for me, and we tweak the presentation for our own workflows and plugins anyhow.
For us, Kettle/Pentaho PDI is a great open source project, but it will definitely be interesting to see how things evolve now that Hitachi has acquired Pentaho.
Try my EasyMorph (https://easymorph.com). It's pretty simple to use (we aim it at non-technical users) and has a decent free edition without limitations in time or data volume.
I assume you work for a non-profit or educational organisation?
Have you used Mule in anger? Would love to hear how you're using it at the moment. There's not a lot of information about it around HN and my regular communities, so it's nice to see something I work with every day mentioned.