I always liked the Yahoo Pipes concept... but it didn't seem to take off, and I personally found it too limited for everything I tried to do with it. Perhaps it's just another case of the old problem: visual programming languages are harder than they look.
I hope Huginn does better. I like their copywriting: "You always know who has your data. You do."
Agreed. I did a multipart Yahoo Pipes project to find my current apartment. It grabbed listings from two sites, tossed out the uninteresting ones, filtered the rest a bit, then texted me when a new apartment in my price/location range appeared.
Very useful, if a little awkward. The Huginn project sounds like a great alternative!
It's hard to tell -- I tweaked it quite a bit, and rewrote it from scratch 2-3x. I'd say 4-10 hours.
Was it worth it? As a programmer, no. I'm very familiar with scraping (raw) web/RSS feeds for data and then processing it. I was hoping Pipes would have enough intelligence that I could subscribe to (cooked) data sources, then split and refine the results.
In practice, Pipes worked, but the data always required further post-processing, which was awkward to do in Pipes. You have to be a dev to understand what your system is doing, but you don't have easy access to all the standard dev things.
I look forward to seeing Pipes take off, or another technology (Huginn? IFTTT?) replace it. It was a lot of fun to wire things up graphically and then, for example, get a text when someone's RSS feed changed.
> You have to be a dev to understand what your system is doing, but you don't have easy access to all the standard dev things.
Interesting. This mismatch may be a good description of the core problem with visual languages.
Curious: what do you think is the minimal subset of Unix tools needed to do this? I.e., instead of pretending the problem is simpler than it is, accept the complexity but minimize it.
I'm thinking of a tool like "jq" (sed for JSON) for JSON data sources... but I don't think its raw-text manipulation is up to the task (and of course you also need tools to monitor the feeds, etc.).
Python :-). There are libraries specifically for parsing malformed HTML. I'm happy using Unix tools for scraping and parsing, but you run into a brick wall rather quickly. Python is more reliable, more flexible, and easier to integrate.
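For concreteness, here's a minimal sketch of that apartment-watcher style pipeline in Python. Everything site-specific here is an assumption for illustration: the URLs, the `div.listing` selector, the `data-price` attribute, and the notify hook would all need adapting to the actual sites.

```python
# Hypothetical sketch: scrape two listing sites, filter by price,
# alert on anything new. BeautifulSoup copes with the malformed HTML
# that breaks naive sed/grep pipelines.
import requests
from bs4 import BeautifulSoup

SOURCES = [
    "https://example.com/apartments",   # placeholder URLs
    "https://example.org/listings",
]
MAX_PRICE = 1800
seen = set()  # persist between runs (a file or sqlite) in real use

def notify(message):
    print(message)  # swap in an SMS gateway or SMTP here

def check():
    for url in SOURCES:
        html = requests.get(url, timeout=30).text
        soup = BeautifulSoup(html, "html.parser")
        # "div.listing" and "data-price" are made-up hooks; every site differs.
        for listing in soup.select("div.listing"):
            key = listing.get("id", "")
            price = int(listing.get("data-price", "0"))
            if key and key not in seen and 0 < price <= MAX_PRICE:
                seen.add(key)
                notify(f"New listing under ${MAX_PRICE}: {url}#{key}")

if __name__ == "__main__":
    check()  # run from cron, or loop with time.sleep()
```

Which is the parent's point exactly: the fetch/parse step is trivial, and the post-processing that was awkward in Pipes is just ordinary code here.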
They're all quite simple. The most complex one uses the "parse location into lat/long" Pipes feature to automatically find me jobs in the Los Angeles area.
I don't know; the documentation is excellent. That is certainly one of the best READMEs I have ever seen. The organization of the setup section of the README is superb. The author clearly thought about the needs of new users as well as seasoned veterans. More projects need to adopt this general format:
# Getting Started
## Quick Start
If you are unsure of our project and just want to play around, you can
get things set up quickly by:
1. Clone this repository and…
2. Do something
3. Do the other thing
If you need more detailed instructions, have no fear. We are not going
to look down on you if you are not an expert. We took the time to write
a setup guide for newcomers: [Novice setup guide][novice-setup-guide].
Everybody has to start somewhere.
## Real Start
Follow these instructions if you wish to deploy your own version or
contribute back to the project. There is nothing we hate more than
READMEs that ignore all of the practical concerns of setting up a
long-term installation. Follow these steps and it will be easy for you
to keep up with updates to the project and still retain all the tweaks
you made to suit your idiosyncrasies.
## Odds and Ends
### Optional features
Not everybody needs an XYZ plugin or wants to share their every action
with PQR. You can enable these features by…
### Rare corner cases
In certain rare circumstances you may need to prevent X or implement Y.
Prevent X by…
If you need to implement Y…
This looks really awesome for managing an office. We're currently automating things using Google Apps Script and other custom glue to do things like order food, get feedback on lunch, and mail people weekly digests of activities. Sounds like Huginn could be a great solution for this.
Weekly digests :-) We're about 35 people split into 5 teams. Each week, every team gets a "weekly update" mail containing a Google document created off a template doc. The weekly update contains some questions that basically ask the team what they did that week. It's a shared Google doc, so the team can collaborate to fill it in. Those filled-in docs get aggregated into a single PDF that is sent to everyone on Monday morning, so everyone stays in the loop on the other teams' progress.
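Google Docs and PDF plumbing aside, the Monday aggregation step is simple enough to sketch. This is a hypothetical simplification: it assumes the filled-in updates land as per-team text files in an `updates/` directory and that a local mail relay is running; the team names and addresses are placeholders.

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path

TEAMS = ["platform", "mobile", "data", "design", "ops"]  # placeholder names

def build_digest(update_dir):
    # Concatenate each team's filled-in update into one digest body.
    sections = []
    for team in TEAMS:
        f = update_dir / f"{team}.txt"
        body = f.read_text() if f.exists() else "(no update this week)"
        sections.append(f"=== {team} ===\n\n{body}")
    return "\n\n".join(sections)

def send_digest(body):
    msg = EmailMessage()
    msg["Subject"] = "Weekly team digest"
    msg["From"] = "digest@example.com"
    msg["To"] = "everyone@example.com"
    msg.set_content(body)
    with smtplib.SMTP("localhost") as smtp:  # assumes a local mail relay
        smtp.send_message(msg)

if __name__ == "__main__":
    send_digest(build_digest(Path("updates")))  # e.g. run from Monday cron
```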
Zapier is also good, with lots of integrations, but it's a little pricey. Then again, if you calculate what your time is worth and include the hours spent making this work, plus customizations, Zapier probably costs less. Depends on whether Zapier can do what you want.
I'm not sure about the etymology of Hugin, but Huginn is more than likely a reference to Norse mythology (for Anchorman fans, you'd recognize it as "Great Odin's raven!"):
Yep -- I understand where the name comes from, I just personally find it very frustrating when two OSS projects are so closely named. It gets really hard to search for one or the other once both become successful.
(I also have this complaint about a lot of the single-word-named OS X applications... unless they have a LOT of traction, it's hard to find specific info on them.)
Had to debug a Cucumber problem involving recipes in a Chef cookbook. I was building up a TDD toolchain at the time. After wading through six Google pages of salad, I decided to use different tools.
I am working on a similar project called Taskflow.io that is aimed more at back-end, business-oriented tasks. It can do similar things through a flowchart-editor interface, where you build the actual flowchart that gets executed. I would still consider it a public beta. I would love your feedback.
Will this be provided As-A-Service, or will it be a downloadable product that can be deployed in-house? This is exactly what I have been looking for for a while, but there's absolutely zero chance we're going to send any of our business information to a remote service.
I've wondered about this quite a bit, since I run computationally intensive analysis on sensitive data, and some of the same thinking would apply in this context.
In brief, I could provide an appliance on something as trivial as a Raspi that updates itself over VPN, and would let you run the services on your own systems. Would that work for you if one of these providers did the same?
Obviously we could do better with a custom system deployed on-site, but the idea is to simplify the process and potentially eliminate the cost of getting started, similar to Square sending out card readers.
It depends. We've got pretty strict security requirements as we operate in the medical and government sector. A black box appliance or something that auto-updates outside of normal patching rounds is probably out of the question.
There are other companies doing workflow automation, but their products seem clunky and not aimed at the web services companies are increasingly relying on. I want to be the IFTTT of back-room office tasks. I want anyone in an organization to be able to create a workflow to automate some mundane process they have at their job. I would love any feedback you have on Taskflow.io so I can get my product to that level. Here is a link for any feedback you might have: http://eddie.taskflow.io/start_process/201?return_url=http%3...
Anyone know why this project encourages using a private fork to do contributing development?
> "Make a public fork of Huginn. [...] Make a private, empty GitHub repository called huginn-private. Duplicate your public fork into your new private repository[. ...] Checkout your new private repository. Add your Huginn public fork as a remote to your new private repository[. ...] When you want to contribute patches, do a remote push from your private repository to your public fork of the relevant commits, then make a pull request to this repository."
Ah. It seems unnecessarily complicated for people trying to get started. Perhaps preface it with a note saying something like "if you'd like to keep your commits private, follow this brief guide" so it doesn't seem required?
(For the record, I can't wait to try out Huginn; I've been using Yahoo Pipes for years... I've apparently got one pipe from before they started using only hex characters as pipe IDs.)
It will run, but the default Procfile spins up four processes, so Heroku might be expensive. If someone figures out how to get everything running easily in one process, that would make free Heroku hosting possible. I run it on a small VPS.
Exciting stuff. It would be amazing to build an AI layer on top of this that mines your browsing habits (depending on your paranoia settings) and automatically generates agents based on your interests.
Storm is a framework for coordinating computation. It's not really designed to "perform automated tasks for you online" - although of course you could make it do that.
If you want to run a machine learning algorithm on 100 machines then Storm is what you want. Want a service to check the weather for your location? Huginn looks good.
Storm doesn't naturally support dynamic topologies and is rather resource-hungry, which requires a bit of advance planning. I looked at Storm for my own pipelining product (bip.io) very early on and shied away; it was too high an opportunity cost for self-hosting users/devs to be bothered with. On a Raspberry Pi, for example, forget about it. Without the ability to create dynamic graphs, it otherwise just ends up being a simple message bus (an anti-pattern).
You are right that Storm is certainly more robust for large amounts of data. But Storm just provides underlying infrastructure; Huginn builds on top of something like that to add the different agents for Twitter, weather, etc. So afaict, Huginn is an app built on top of something like Storm.
I extended a Find My iDevice API lib about a year ago with the intention of creating a Huginn agent for exactly this, but life got in the way. I also wanted to add a geofencing agent, too.
With IPv6 there's no reason you couldn't do this, but what is the use case? I'm not seeing what you could do with individually addressable agents that you couldn't do otherwise.
Can you elaborate how one can do this with IPv6? What hosts have this? How many IP addresses can you get?
Basically for web scraping. If you had multiple threads, each with its own IP address, you'd have a better chance of avoiding rate limits and blocks than doing it all from one IP address.
Just about any host with IPv6 support will assign you a /64 block which is way more addresses than you'd need for this. Your case would then depend on the site you're scraping supporting IPv6, though.
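A minimal sketch of what that could look like in Python, rotating source addresses out of a /64. The prefix below is the IPv6 documentation prefix (substitute your own routed block), it assumes the kernel will let you bind those addresses (on Linux you may need to add them to the interface first, or use the IP_FREEBIND socket option), and the target must have an AAAA record.

```python
import ipaddress
import itertools
import socket

PREFIX = ipaddress.IPv6Network("2001:db8:1234:5678::/64")  # placeholder block

def fetch(host, source_ip):
    # Bind the outgoing connection to one specific address in the block.
    sock = socket.create_connection(
        (host, 80), timeout=30, source_address=(source_ip, 0)
    )
    with sock:
        sock.sendall(b"GET / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n")
        chunks = []
        while data := sock.recv(4096):
            chunks.append(data)
    return b"".join(chunks)

# Spread requests across the first hundred addresses in the /64.
for addr in itertools.islice(PREFIX.hosts(), 100):
    page = fetch("example.com", str(addr))
```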
Twitter is also just another rules engine, with pretty simple rules about which tweets you receive. And yet it's also so much more. It's a platform, and it's a social network. And it's something that many people love to use.
Who cares if it's just another rules engine under the hood?
Twitter isn't really a rules engine, as it doesn't have the ability to create workflows or manage rule priority, afaik, but let's ignore that for a second.
Rules engines are typically a terrible idea and I say this as someone who has worked at two large corporations, one a bank, where rules engines were heavily used so they could avoid having the larger development staff they really needed. Rules engines fail miserably every single time and eventually have to be replaced.
The problem is that as time goes on people who don't know any better end up writing larger and more complex rules and workflows without an understanding of the side effects those rules generate. The end result inevitably becomes a huge mess that is extremely fragile and nearly impossible to follow.
Yes, but for simple, personal rules that won't have any major repercussions if they fail (e.g. email me when I get 10 likes on my last Instagram photo), I can see how they'd be useful.
...not that my hypothetical rule is useful, but you understand what I'm getting at.
Absolutely. For simple, isolated, inconsequential tasks this works fine. The problem is rules engines always start very innocent and simple, then users request the ability to have rules call each other, then they want to store results, etc. In an ideal world this is a good thing but I have yet to hear of any place where rules engines, used at any significant scale, aren't a complete disaster.
Is it typically a disaster because business users are given access to create their own rules and they don't know what they're doing, or because the complexity of the system grows to the point where that complexity is better managed by other tools (version control systems, QA environments, rigorous testing, etc.)?
In my experience, both. It starts with business users creating rules without having a clue what they are doing and it ends with the system becoming so complex that it would have been better off being written by engineers using tools more appropriate for the job.
Typically what happens is that business discovers they can now implement every last feature they desire without getting any push back from engineering so they go wild implementing new features without realizing the consequences. There is no VCS, no QA, no testing. There is no one telling them they cannot do something because it won't scale, it isn't secure or it won't be maintainable.
Their only metric for success is that they get the result they want now and the long term consequences be damned. Worse yet, every single person using the rules engine is acting independently and not as a team. There is no code review, when the rule works to their satisfaction it gets pushed into production.
At first everything works fine and people get promoted for saving money on engineering costs but then the rules start getting more complex, start becoming composed of other rules, need to have more complex actions or need to integrate with third party systems. Eventually the simple rules engine turns into a bastardized programming language that everyone adds onto and never modifies because no one understands how a modification will affect the 4000 other rules in the engine. At that point you end up having to do a complete re-write, which is something I have had the displeasure of doing in the past.
I think businesses still need flexible, easy-to-use systems that allow end users to create solutions quickly. This may involve analysts, IT professionals, and devs working together on the same platform. For example, an IT pro writes the SQL queries, an analyst writes the regression algorithms, and the devs write the output adapters.
Typically, by the time you get the dev team to fully implement the solution, it has missed its mark and the analysts have moved on.
Players in the mashup landscape are "trying" to provide scalable and robust, yet flexible and easy-to-use systems.
Plug: flowreports.co is one of these... and it can be self-hosted.
Businesses have hundreds of flexible, easy systems that let end users create quick business-rule-based solutions (particularly for reporting), and have had them since the late '80s, maybe earlier; the corporate landscape is littered with them. I wish you the best of luck; that's a tough market to get into.
Thank you for the feedback. Perhaps if we add things like support for version control systems, play well with releasing code to multiple environments, and add UI elements that enforce cloning chunks of code, so that changes are isolated and the system can maintain its coherence even among disparate users, we can avoid some of those issues. Something to think on...
That's definitely a step in the right direction. The biggest hurdle you'll have to overcome is getting your application to enforce a process, that's a lot harder than you think it is because people tend to take the path of least resistance and process is rarely that path. You'll have to get buy-in from very high levels of any organization you work with, otherwise things will devolve quickly. Either way, good luck, I'll check out your product demo when you've completed it.
There are very few places where business rules change so quickly that a rules engine is needed. Rules engines are essentially a poor practice used by businesses who don't clearly define what their goals are and stick to them. The alternative is to have a highly modularized system that is flexible enough for engineers to make changes to the code base in a timely manner, but that requires business to sit down and define the problem(s) they are trying to solve with their software. Getting that sort of time investment is difficult.
Not only that, but the interface is always a huge part of any product that's simple under the hood. I'm going to check it out and evaluate it right now...
https://source.opennews.org/en-US/articles/open-source-bot-f...
> Most prominently, we used it during our Olympics coverage to monitor the results of the API we built and let us know if the data ingestion pipeline ever grew stale. To do that, we set up a pipeline