Meltano, spun out of GitLab, is working in this domain, and I think they are making progress.
https://meltano.com/
dbt are building the transformation engine that Meltano is using (I think), and are worth keeping an eye on.
https://www.getdbt.com/
In my experience the issue with this domain isn't the "one-off" analysis, but rather orchestrating the BI function across the business, maintaining the single source of truth, testing and deploying across the ETL/ELT layers. I can't speak to how well Ananas is managing these but Meltano and dbt are giving this area a lot of thought.
This looks good. One of your competitors charges about $4,000 per seat per year so this seems to be a good space. If you add the possibility of building user defined nodes with Python you’d have a solid product.
Will you have a server too? Another issue is scheduling, triggers, etc. You are in this strange space between Excel and “enterprise” ETL like Talend. We are so frustrated with the current poor state of data prep that we went to Python-based workflows with the idea of scheduling and triggering via Airflow. Seems to work well so far. But your IDE for data prep is a bit better positioned than what we use now (Jupyter).
Actually, we provide a command-line interface; you can do pretty much everything the UI does from the command line, and you can also run it in server mode from the command line. The current version doesn't cover scheduling yet. A possible solution would be scheduling through Airflow's BashOperator, or we could implement scheduling in the server itself.
It seems that Redash is a BI tool and closed source. Ananas is open source and can be used not only as a BI tool but also as an ETL tool. Moreover, you can run your pipeline on your own infrastructure, as Ananas can run on multiple execution engines.
Parent commenter linked to their GitHub repo. Redash is open source under the terms of the two-clause BSD license. (Maybe they edited to add link, maybe you overlooked it?)
Curious if anyone has used Metabase for serious work and can comment on it. I tried setting it up and got frustrated pretty quickly with the UX. It looks slick, but the mental model was confusing...
My startup seriously evaluated Metabase, but inevitably you'll want joins, and Metabase is fundamentally opposed to the idea: https://metabase.com/blog/Joins/ ... and if you're writing SQL for your views, you might as well be writing SQL.
We ended up shelling out for Tableau - it's pricey at $840/yr, but it supports joins out of the box (it even has a drag-and-drop interface to set up joins!), has practically every bell and whistle you could ask for, and allows you to do "exploratory analysis over screenshare in realtime with non-technical colleagues, without context switching to a coding mindset or needing to look up field names you may never have used before." I think it's intuitive and worth every penny, but YMMV. Would recommend everyone try the public version to get a feel for it.
EDIT: as others have said, Ananas is actually ETL + BI, whereas Metabase and Tableau are BI on top of a database. Tableau can stand in for good ETL due to its join support in certain scenarios. It's better than Metabase, but not necessarily comparable to Ananas.
(I work at Metabase, and am working on the joins feature as we speak)
As getsauce mentioned, we're adding joins to the next release, as well as essentially subqueries in what we're calling the "notebook". That should unlock a lot of power.
I've always been curious if there is a feature in the works for charting or displaying comparisons across defined timespans (e.g. total page views this week vs. last week)
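For what it's worth, the comparison itself is simple once events are bucketed by week. A minimal sketch in plain Python, not tied to Metabase or Ananas; the function names and the sample dates are made up for illustration, and weeks are assumed to start on Monday:

```python
from collections import Counter
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def week_over_week(view_dates: list[date], today: date) -> tuple[int, int]:
    """Count events in the week containing `today` and in the week before it."""
    per_week = Counter(week_start(d) for d in view_dates)
    this_week = week_start(today)
    last_week = this_week - timedelta(days=7)
    return per_week[this_week], per_week[last_week]


# Hypothetical page-view dates, one entry per view.
views = [
    date(2019, 9, 2),
    date(2019, 9, 3),
    date(2019, 9, 10),
    date(2019, 9, 11),
    date(2019, 9, 12),
]
current, previous = week_over_week(views, today=date(2019, 9, 12))
print(f"this week: {current}, last week: {previous}")
```

In a BI tool the same idea would normally be a GROUP BY on a truncated date, with the two buckets shown side by side.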
I really wanted to like Metabase but unfortunately it's way behind its promise. I made an attempt to use it in a customer project for creating very basic customizable dashboard-like statistics, with little to no success. Just to name a few pain points:
- using the docker image is easy; not so using the jar file especially due to very little documentation
- confusing UI paired with lack of extensive documentation
- this makes it far from "easy" to be used by "everyone in your company" (quotes from metabase marketing claim): you really have to know how to do things, even easy ones like changing labels
- the UI contains many minor bugs which sometimes lead to unsavable metrics, and you just have to start from scratch
- no built-in way to export dashboards, which makes it nearly impossible to test your new metrics on a different system before pushing them to production; if you really want to do this, you have to juggle database dumps
There might be some valid use cases for Metabase, but I don't think it's very usable for non-technical users. I strongly suggest evaluating it thoroughly before counting on it.
Nevertheless thanks for making it open source and free to use, so don't get me wrong.
Metabase is pretty good if you have a nicely configured data warehouse (Snowflake and BigQuery are good options). If you are connecting Metabase directly to your app database, then you will probably run into issues trying to integrate another data set (say, CRM data).
This is where it makes sense to ELT (extract, load, transform) everything into a data warehouse, integrate the data there and transform as much as possible, and do the "last-mile" analysis in Metabase.
This is at least the theory. I've had reasonable results with Metabase doing it this way. It's also nice in that the bulk of your logic sits in your data warehouse, so a BI tool migration is less painful, and it's also possible to run dual analytics tools.
I'm a fan of your work! We use Metabase pretty frequently.
The only nitpicks I have are around the concept of Metrics (still not quite sure what those are or how they're useful for me) and the initial download of the libraries, which takes quite a while (especially over unstable VPN links).
I'm wondering if there's a way to have an option where it tries an external CDN first and then falls back to loading from the hosting server.
Thanks for following up! Here are some of the things that tripped me up:
1. The lack of in-depth docs.
2. The setup and usage of metrics was confusing. This was the main use case I was hoping Metabase could help me with, and it felt like an add-on feature.
3. For whatever reason, managing dashboards was really confusing, and the UI [1] didn't seem to match the docs.
Why do charts with dates on the x-axis not show up correctly? For example the chart will show Jan 2019 on the label underneath a column that is actually Feb 2019. This is confusing to new users and drives experienced users up the wall. Currently the only thing that fixes it is to convert the axis to categorical.
Just FYI, about 1/3 of the way down your Getting Started page[0] it has a broken link[1] to the fifa2019.csv file. The first link on the page is valid[2], but the second one leads to a 404 due to pointing to .../raw/... rather than .../blob/...
I'm not super familiar with code signing, but if alternatives are expensive, could OP maintain a checksum value on their download page rather than go with DigiCert or alternative services? Or does code-signing solve a different problem?
No. Code signing is very different. A checksum would only work for developers on Linux. Without a code signing certificate, macOS would flat-out refuse to run the app and Windows would show an 'Unverified publisher' warning. Also, things like auto-updates do not work on either platform unless you code sign your binaries.
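To make the distinction concrete: a checksum only lets the downloader confirm that the file matches what the page says it should be. It says nothing about who published it, which is the part the OS checks through the signing certificate. A minimal sketch of the checksum side using Python's standard library; the function names are made up for illustration:

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large installers need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: str, published_hex: str) -> bool:
    """Compare the local file's SHA-256 with the value copied from a download page."""
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

If the download page itself is compromised, the attacker can swap the checksum along with the binary, so this is an integrity check and not a substitute for signing.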
I set up code signing for an electron app relatively recently. Best option I could find was Digicert. Really sucks that this stuff is necessary nowadays and not free, but it's not so bad.
That's for Windows - for Mac you'll also need an Apple developer account, afaik they're the only people who can issue certs.
EDIT: Woah, I take that back. Digicert has now gone up from $74/year to $474/year, which is crazy. I now also need a new certificate provider...
For Electron signing we use Tucows code signing certs (you need to register as a Tucows author, which is free), which are provided by Comodo for $140 for 2 years.
Didn't have any issues besides getting a proper CI/CD process running.
We chose the "Open Source Code Signing" option, with it being stored on a physical keyfob thing (eg not "in the cloud"). Total cost, including the new key fob and super expensive, week+ delay, mandatory postage (!) was around 135 Euro.
For my project (quarkjs.io), I went for https://comodosslstore.com . They have the cheapest certificates I could find (at ~75USD), also they are the only ones issuing certificates for individual developers.
This looks very good and a fit for my end users, who deal with Excel files all the time. Are there any plans to add Excel as a data source? I cannot convert to CSV without major pain, since the Excel files are exports from mainframe apps which are out of my control. Thanks
I would recommend distributing binaries from a dedicated release server combined with a CDN, possibly DigitalOcean Spaces. It really increases download speeds for end users compared to GitHub releases.
At what scale has this been tested? As in, are you aware of any data file size limits? I have a csv with ~6M rows and when paging through the docs the "Exploring your data source" gave me pause thinking this app might try and open all 6M rows at once. Will I be OK importing such a large source or will my computer turn into a space heater before refusing to respond?
Ananas has been tested on production processing terabyte data on a daily basis (with Google Dataflow, but you can achieve the same thing with your own spark cluster too).
In terms of exploring a large source file, the design principle is to paginate any kind of data that supports random access to records (for example CSV, logs, etc.). So when "exploring the data" of a CSV with 6M rows, Ananas will not load 6M rows at once, but will read a few rows at a time for each page. For example, this early demo video shows a 755M CSV file being explored in seconds: https://www.youtube.com/watch?v=GwqZlhmei78&t=01m00s
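To illustrate the general idea of paging through a CSV (this is not Ananas's actual implementation, just a sketch with Python's standard library; `read_page` and its parameters are made up for illustration):

```python
import csv
from itertools import islice


def read_page(path: str, page: int, page_size: int = 100) -> list[dict[str, str]]:
    """Return one page of rows (pages are 0-indexed) without loading the whole file."""
    start = page * page_size
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        # islice skips rows lazily, so only `page_size` rows are kept in memory.
        return list(islice(reader, start, start + page_size))
```

Note that this version still scans from the top of the file to reach a late page, so memory stays flat but deep pages get slower. True random access would need something extra, such as an index of byte offsets.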
Excellent idea. We've thought about a machine learning transformer, including NLP. This NLG is something which would definitely be nice to have. Please create an issue and we will prioritize it.
Does this have any sort of hinting for indexed queries at all? I would worry that a beginner would create a horrid mess of queries that could consume all available resources.
That's a good point. Actually, we think of this as a collaboration tool that lets non-technical users and data engineers share this visual DAG and work together. The Apache Beam runners we use behind the scenes have a query planner to optimize chained queries. However, you're totally right: this can't stop non-technical users from writing messy queries. The visual DAG should, however, help them split a complex query into simpler ones.
I was thinking the same thing, for like IoT monitoring sensors, etc - but it is open source; grab it (once GH is back online), add the new "source", and issue a PR - that'd be the way to do it I think...
The API data source is a great idea! You can create a feature request on our GitHub (otherwise we will create it). We will try to put that into upcoming releases.
My unprofessional professional opinion. The product looks great, but the name has to go. I can't imagine pronouncing that, let alone communicating it over a phone. Any simple word before analytics would be better.
Edit: pineapplytics is the obvious cute and available one, however may still be difficult to communicate.
Well, or they just want everyone to be able to access it? There is really no choice but to publish something like this in English. Just a guess, but I'd guess the number of people accessing it who are not native English speakers is larger than the number who are.
Delighted that this is one of the first comments! :)
The product was designed to make analytics easy. We found that the word "analytics" is not easy to pronounce either, so we decided to make the word easy too! But thanks for your comments, we will consider it.
I think the name and logo are nice (pineapple database right?) but agree that it's both difficult to spell and pronounce (particularly for people who refer to them as "pineapples").
If you want to be cheeky, CONCAT(SUBSTR('analyst', 1, 4), 'desktop') is available for a .com domain.
Edit: while I have you, could you please also stop posting unsubstantive comments to HN generally? You've done that a fair bit and we're trying for something a little better than that here. In addition to the site guidelines, you might also find these links helpful for getting the spirit of the site: