Show HN: Ananas – a hackable data tool for beginners (ananasanalytics.com)
359 points by millboh on July 22, 2019 | 104 comments



This is great.

Things to keep on your radar:

Meltano, spun out of GitLab, is working in this domain, and I think they are making progress. https://meltano.com/

dbt are building the transformation engine that Meltano is using (I think), and are worth keeping an eye on https://www.getdbt.com/

In my experience the issue with this domain isn't the "one-off" analysis, but rather orchestrating the BI function across the business, maintaining the single source of truth, testing and deploying across the ETL/ELT layers. I can't speak to how well Ananas is managing these but Meltano and dbt are giving this area a lot of thought.


This looks good. One of your competitors charges about $4,000 per seat per year so this seems to be a good space. If you add the possibility of building user defined nodes with Python you’d have a solid product.


We will release user-defined functions in JavaScript in the next release. Python is on the roadmap too.


Will you have a server too? Another issue is scheduling, triggers, etc.. You are in this strange space between Excel and “enterprise” ETL like Talend. We are so frustrated with the current poor state of data prep that we went to Python based workflows with the idea of scheduling, triggering, via Airflow. Seems to work well so far. But your IDE for data prep is a bit better positioned than what we use now (Jupyter).


Actually, we provide a command-line interface, and you can do pretty much everything the UI does from the command line. You can also run it in server mode from the command line. The current version doesn't cover scheduling yet. One possible solution would be scheduling through Airflow's BashOperator. Alternatively, we could implement scheduling in the server itself.


woah, that is steep. who is this competitor?


Probably Informatica or Talend which are two of the most popular Enterprise ETL tools.


Alteryx would be my guess.


Interestingly enough, both Ananas and Alteryx use a declarative approach: Alteryx uses XML and Ananas uses YAML.


The YAML approach seems far more appealing to me honestly, though I'm sure XML was a sane choice at the time.


Pretty similar to https://github.com/getredash/redash from a first look. What would you say are the main differences?


Redash is comparable to Superset, Tableau, etc. Ananas Analytics is comparable to Talend, Airflow, etc.


It seems that Redash is a BI tool and closed source. Ananas is open source and can be used not only as a BI tool but also as an ETL tool. Moreover, you can run your pipeline on your own infrastructure, as Ananas can run on multiple execution engines.


Parent commenter linked to their GitHub repo. Redash is open source under the terms of the two-clause BSD license. (Maybe they edited to add link, maybe you overlooked it?)


Oh, sorry about that, my bad. I only looked through the Redash home page and didn't find the open source link.


The Github link is in the footer of the site, but the icon is very small.


redash source <- search


An older, but very slick tool for wrangling data: http://vis.stanford.edu/wrangler/

Which is now a commercial tool: https://www.trifacta.com/products/wrangler-editions/#wrangle...


This looks a bit like https://metabase.com -- anyone used both and can make a comparison?


Curious if anyone has used Metabase for serious work and can comment on it. I tried setting it up and got frustrated pretty quickly with the UX. It looks slick, but the mental model was confusing...


My startup seriously evaluated Metabase, but inevitably you'll want joins, and Metabase is fundamentally opposed to the idea: https://metabase.com/blog/Joins/ ... and if you're writing SQL for your views, you might as well be writing SQL.

We ended up shelling out for Tableau - it's pricey at $840/yr, but it supports joins out of the box (it even has a drag-and-drop interface to set up joins!), has practically every bell and whistle you could ask for, and allows you to do "exploratory analysis over screenshare in realtime with non-technical colleagues, without context switching to a coding mindset or needing to look up field names you may never have used before." I think it's intuitive and worth every penny, but YMMV. Would recommend everyone try the public version to get a feel for it.

EDIT: as others have said, Ananas is actually ETL + BI, whereas Metabase and Tableau are BI on top of a database. Tableau can stand in for good ETL due to its join support in certain scenarios. It's better than Metabase, but not necessarily comparable to Ananas.


Metabase added joins to query builder in a recent release: https://github.com/metabase/metabase/releases/tag/v0.33.0-pr...


(I work at Metabase, and am working on the joins feature as we speak)

As getsauce mentioned, we're adding joins to the next release, as well as essentially subqueries in what we're calling the "notebook". That should unlock a lot of power.

AMA.


Big fan of Metabase, thanks for all your work!

I've always been curious if there is a feature in the works for charting or displaying comparisons across defined timespans (e.g. total page views this week vs. last week)


Thanks!

We have a "trend" visualization: https://www.dropbox.com/s/qg54b6owdatznmm/Screenshot%202019-... Is that like what you're looking for?


That certainly looks like it's in the right direction, I'll re-familiarize myself with the docs again since it's been a little while


I really wanted to like Metabase but unfortunately it's way behind its promise. I made an attempt to use it in a customer project for creating very basic customizable dashboard-like statistics, with little to no success. Just to name a few pain points:

- using the docker image is easy; not so using the jar file especially due to very little documentation

- confusing UI paired with lack of extensive documentation

- this makes it far from "easy" to be used by "everyone in your company" (quotes from metabase marketing claim): you really have to know how to do things, even easy ones like changing labels

- the UI contains many minor bugs which sometimes lead to unsavable metrics, and you just have to start from scratch

- no built-in way to export dashboards, which makes it nearly impossible to test your new metrics on a different system before pushing them to production; if you really want to do this, you have to juggle database dumps

There might be some valid use cases for Metabase, but I don't think it's very usable for non-technical users. I strongly suggest evaluating it thoroughly before counting on it.

Nevertheless thanks for making it open source and free to use, so don't get me wrong.


Metabase is pretty good if you have a nicely configured data warehouse (Snowflake and BigQuery are good options). If you are connecting Metabase directly to your app database, you will probably run into issues trying to integrate another data set (say, CRM data).

This is where it makes sense to ELT (extract, load, transform) everything into a data warehouse, integrate the data there and transform as much as possible, and do the "last-mile" analysis in Metabase.

This is at least the theory. I've had reasonable results with Metabase doing it this way. It's also nice that the bulk of your logic sits in your data warehouse, so a BI tool migration is less painful and it's possible to run two analytics tools side by side.

Check out https://www.getdbt.com/ for more on the process.
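
To make the ELT idea above concrete, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for a warehouse. The table names, the view name, and the sample rows are all made up for illustration; a real setup would load from the actual app database and CRM into something like Snowflake or BigQuery.

```python
import sqlite3

# Hypothetical rows standing in for extracts from an app database and a CRM.
APP_ORDERS = [
    (1, "alice@example.com", 120.0),
    (2, "bob@example.com", 80.0),
    (3, "alice@example.com", 40.0),
]
CRM_CONTACTS = [
    ("alice@example.com", "Enterprise"),
    ("bob@example.com", "SMB"),
]


def load_raw(conn):
    """Extract + load: copy the source rows into the warehouse unchanged."""
    conn.execute("CREATE TABLE raw_orders (id INTEGER, email TEXT, amount REAL)")
    conn.execute("CREATE TABLE raw_contacts (email TEXT, segment TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", APP_ORDERS)
    conn.executemany("INSERT INTO raw_contacts VALUES (?, ?)", CRM_CONTACTS)


def transform(conn):
    """Transform: integrate both sources inside the warehouse with plain SQL."""
    conn.execute(
        """
        CREATE VIEW revenue_by_segment AS
        SELECT c.segment AS segment, SUM(o.amount) AS revenue
        FROM raw_orders AS o
        JOIN raw_contacts AS c ON c.email = o.email
        GROUP BY c.segment
        """
    )


def build_warehouse():
    """Run load and transform against a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    load_raw(conn)
    transform(conn)
    return conn


def revenue_by_segment(conn):
    """The 'last-mile' query a BI tool would run against the modelled view."""
    query = "SELECT segment, revenue FROM revenue_by_segment ORDER BY segment"
    return conn.execute(query).fetchall()
```

The point of the split is that the join logic lives in the warehouse view, so the BI tool only has to read `revenue_by_segment` and could be swapped out without rewriting the integration.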


We use it live and love it. I also know a few companies that use it on fairly big data flows.

I am not sure why you are frustrated with UI, myself and colleagues find it quite good.


Sorry to hear that (I'm the UX guy at MB). Would appreciate your thoughts/feedback on any specifics you want to share. Thanks!


I'm a fan of your work! We use Metabase pretty frequently.

The only nitpicks I have are around the concept of Metrics (still not quite sure what those are or how they're useful for me) and the initial download size of the libraries takes quite a while (especially over unstable VPN links)

I'm wondering if there's a way to have an option where it tries an external CDN first and then falls back to loading from the hosting server.


Thanks for following up! Here are some of the things that tripped me up:

1. The lack of in-depth docs.

2. The setup and usage of metrics was confusing. This was the main use case I was hoping Metabase could help me with, and it felt like an add-on feature.

3. For whatever reason, managing dashboards was really confusing, and the UI [1] didn't seem to match the docs.

[1] I was using the mac version.


Thanks, I appreciate your feedback.


Why do charts with dates on the x-axis not show up correctly? For example the chart will show Jan 2019 on the label underneath a column that is actually Feb 2019. This is confusing to new users and drives experienced users up the wall. Currently the only thing that fixes it is to convert the axis to categorical.


Metabase is much more for data visualization. As far as I know it doesn't really have any ETL features.

I could actually see this being pretty useful as an ETL layer to go along with Metabase if someone were trying to build a free/open source BI stack.


Is the name inspired by Orange [1]?

[1] https://orange.biolab.si


Just FYI, about 1/3 of the way down your Getting Started page[0] it has a broken link[1] to the fifa2019.csv file. The first link on the page is valid[2], but the second one leads to a 404 due to pointing to .../raw/... rather than .../blob/...

[0] https://ananasanalytics.com/docs/user-guide/getting-started

[1] https://github.com/ananas-analytics/ananas-examples/raw/mast...

[2] https://github.com/ananas-analytics/ananas-examples/blob/mas...


Thanks for the reminder, the links are fixed now.


I would recommend code-signing the build before distributing.


I'm not super familiar with code signing, but if alternatives are expensive, could OP maintain a checksum value on their download page rather than go with DigiCert or alternative services? Or does code-signing solve a different problem?


No, code signing is very different. A checksum would only work for developers on Linux. Without a code-signing certificate, macOS will flatly refuse to run the app and Windows will show an 'Unverified publisher' warning. Also, things like auto-updates do not work on either platform unless you code-sign your binaries.
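
For reference, the checksum approach raised above only tells a user that the file they downloaded matches what was published; it says nothing to the operating system about who the publisher is. A minimal sketch of that check, assuming the project publishes a SHA-256 hex digest next to each download (the function names here are illustrative):

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large installers need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, published_hex):
    """Compare the local digest with the published one (case and whitespace tolerant)."""
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

This covers integrity only. It does not remove the macOS or Windows warnings described above, which is why it complements code signing rather than replacing it.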


Thanks for your feedback, we will look for some affordable code-signing certificates. Any suggestions? By the way, here is the issue link: https://github.com/ananas-analytics/ananas-desktop/issues/61


I set up code signing for an electron app relatively recently. Best option I could find was Digicert. Really sucks that this stuff is necessary nowadays and not free, but it's not so bad.

That's for Windows - for Mac you'll also need an Apple developer account, afaik they're the only people who can issue certs.

EDIT: Woah, I take that back. Digicert has now gone up from $74/year to $474/year, which is crazy. I now also need a new certificate provider...


For Electron signing we use Tucows code-signing certs (you need to register as a Tucows author for free), which are provided by Comodo for $140 for 2 years. Didn't have any issues besides getting a proper CI/CD process running.


There aren't any great options, but if it helps we (sqlitebrowser.org) went with Certum:

https://en.sklep.certum.pl/data-safety/code-signing-certific...

We chose the "Open Source Code Signing" option, with it being stored on a physical keyfob thing (eg not "in the cloud"). Total cost, including the new key fob and super expensive, week+ delay, mandatory postage (!) was around 135 Euro.


For my project (quarkjs.io), I went for https://comodosslstore.com . They have the cheapest certificates I could find (at ~75USD), also they are the only ones issuing certificates for individual developers.


I've submitted a response on GitHub.


This looks very good and a fit for my end users who deal with Excel files all the time. Are there any plans to add Excel as a data source? I cannot convert to CSV without major pain since the Excel files are exports from mainframe apps which are out of my control. Thanks


Excel source was one of our first supported data sources. See our early video demo: https://www.youtube.com/watch?v=GwqZlhmei78. We just created an issue on GitHub: https://github.com/ananas-analytics/ananas-desktop/issues/60 We will add this feature back in the following release.


The app icon is transparent on Mac, and therefore only clickable on the border

I would recommend distributing binaries from a dedicated release server combined with a CDN, possibly DigitalOcean Spaces. It really increases download speeds for end users compared to GitHub releases.

Unfortunately it's created by an unidentified developer


Be sure and write some special firewall rules before running this...


At what scale has this been tested? As in, are you aware of any data file size limits? I have a csv with ~6M rows and when paging through the docs the "Exploring your data source" gave me pause thinking this app might try and open all 6M rows at once. Will I be OK importing such a large source or will my computer turn into a space heater before refusing to respond?


Ananas has been tested in production, processing terabytes of data on a daily basis (with Google Dataflow, but you can achieve the same thing with your own Spark cluster too).

In terms of exploring a large source file, the design principle is to paginate any kind of data that supports random access to records (for example CSV, logs, etc.). So when "exploring the data" of a CSV with 6M rows, Ananas will not load 6M rows at once, but read a few rows at a time for each page. For example, this early demo video shows a 755M CSV file being explored in seconds: https://www.youtube.com/watch?v=GwqZlhmei78&t=01m00s
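
As a rough illustration of the page-at-a-time idea (not Ananas' actual implementation, which isn't shown in this thread), here is a small sketch that reads one page of a CSV without loading the whole file. The function name and the default page size are assumptions for the example.

```python
import csv
from itertools import islice


def read_page(path, page, page_size=100):
    """Return (header, rows) for one zero-based page of a CSV file.

    Rows are streamed from disk, so only `page_size` rows are held in memory.
    """
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # first line is treated as the header
        start = page * page_size
        rows = list(islice(reader, start, start + page_size))
    return header, rows
```

This version still scans past earlier rows to reach a later page. A tool that needs true random access would more likely keep an index of byte offsets per page and seek directly to them.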


Ok, but why did you name it after pineapples?


Ananas, Analytics made easy :) Pineapple was cool too. Will probably change it if we see more comments ;)


No, just keep it. It's fun, and why does the name have to be an existing English word?

Fun fact: if everybody starts using it, it will eventually become proper English.


Pomme-stylo-anana or name it after Pikotaro!


Thought about adding some words to the data output using natural language generation? Eg arria.com or other nlg vendor?


Excellent idea. We've thought about machine learning transformers, including NLP. NLG is something that would definitely be nice to have. Please create an issue and we will prioritize it.


I can't download it: https://github.com/ananas-analytics/ananas-desktop , I get a Github server error. Is it just me?

Edit: not just me, Github issues.


Looks like GitHub is experiencing errors, just had the same problem with an unrelated repo.


Thanks!


Yep, we expected anything but a GitHub issue ;)


Does this have any sort of hinting for indexed queries at all? I would worry that a beginner would create a horrid mess of queries that could consume all available resources.


That's a good point. We actually think of this tool as a collaboration tool that lets non-technical users and data engineers share this visual DAG and work together. The Apache Beam runners we use behind the scenes have a query planner to optimize chained queries. However, you're totally right: this can't stop non-technical users from writing messy queries. The visual DAG should, however, help them split a complex query into simpler ones.


Oh nice - thank you very much! :-D

I thought about writing my own app for exactly that task but when I see yours I think I don't need to do that anymore. Awesome! :-)


Can I use this if all I have is an odbc connection?


What kind of data source exactly do you need? We should be able to add a Microsoft data source such as MSSQL if you request it.


I have the opposite issue: is there a way to connect to a cloud database without an ODBC connection?


This looks great as a promise - I looked into visualizations provided, and we need much more than what's provided though


Can we hook this to an API GET request? I guess I could do API -> download JSON -> Ananas, but you know..:)


I was thinking the same thing, for like IoT monitoring sensors, etc - but it is open source; grab it (once GH is back online), add the new "source", and issue a PR - that'd be the way to do it I think...


The API data source is a great idea! You can create a feature request on our GitHub (otherwise we will create it). We will try to put that into the following releases.
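
Until an API source exists, the "API -> download JSON -> Ananas" workaround mentioned above can be sketched in a few lines. This assumes the endpoint returns a flat JSON array of objects and that the result is saved as a CSV file for import; the function names and the URL passed to them are placeholders, not part of Ananas.

```python
import csv
import io
import json
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """GET a URL and decode the body as JSON (assumes a UTF-8 response)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def records_to_csv(records):
    """Convert a flat list of dicts to CSV text.

    Columns are the union of all keys in first-seen order; missing values
    are written as empty cells.
    """
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Usage would be something like writing `records_to_csv(fetch_json(url))` to a file and pointing a CSV source at it. Nested JSON would need flattening first, which this sketch does not attempt.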


This looks really cool! Thanks for sharing, I can't wait to test this


Looks great to me. Nice job shipping!


Thanks


What is your business model?


What about bananas?


My unprofessional professional opinion. The product looks great, but the name has to go. I can't imagine pronouncing that, let alone communicating it over a phone. Any simple word before analytics would be better.

Edit: pineapplytics is the obvious cute and available one, however may still be difficult to communicate.


Fun fact: Only the English language calls the fruit "pineapple", almost every other language calls it "ananas" or similar.


The Guardian made a helpful infographic a little while back about exactly this: https://static.guim.co.uk/sys-images/Guardian/Pix/pictures/2...


We call pineapple juice 'ananassap' in Dutch and sometimes use it as if it would be an English word as a joke.


I looked it up, I get it, but this post and the site are targeting English speakers.


Well, or they just want everyone to be able to access it? There is really no choice other than to publish something like this in English. Just a guess, but I'd guess the number of people accessing it who are not native English speakers is larger than the number who are.


Now I really want the author to rename it pineapple in the English localization, and leave it Ananas everywhere else...


That's not true. In my native language it is called "pynappel".

EDIT: I guess that may be why you said "almost" but that, in turn, is almost impossible.


That's a bit of a bold statement... In Spanish is Piña and in Portuguese is Abacaxi, but a good number of languages do call it ananas


In Portugal Portuguese it's Ananás.


I think this is also true in the Spanish dialects spoken in many regions.


> In Spanish is Piña

equally as bold, and also incorrect: ananá is what you'll hear in Argentina, possibly elsewhere.


It's called "ananas" in Marathi as well. Marathi is a regional language in India.


Delighted that this is one of the first comments! :) The product was designed to make analytics easy. We found that the word Analytics is not easy to pronounce either. So we decided to make the word analytics easy too! But thanks for your comments, we will consider it.


I think the name and logo are nice (pineapple database right?) but agree that it's both difficult to spell and pronounce (particularly for people who refer to them as "pineapples").

If you want to be cheeky, CONCAT(SUBSTR('analyst', 1, 4), 'desktop') is available for a .com domain.


Love this CONCAT(SUBSTR('analyst', 1, 4), 'desktop') idea! ;)


People have the same opinion about Azure. Microsoft didn't change it. Ananas is still not that bad.


"Ananas" alone might not be that bad, but "ananasanalytics.com" certainly is.

Bob Loblaw's Law Blog, anyone?


I really loved the name, but there was a bit of disconnect because I thought something named pineapple would lead me to a more exciting, artsy page.


Who said it was made by an English speaker...?

Typical. Smh.


I appreciate the push for tolerance behind your comment, but being snide doesn't help. Please don't on HN.

https://news.ycombinator.com/newsguidelines.html

Edit: while I have you, could you please also stop posting unsubstantive comments to HN generally? You've done that a fair bit and we're trying for something a little better than that here. In addition to the site guidelines, you might also find these links helpful for getting the spirit of the site:

https://news.ycombinator.com/newswelcome.html

https://news.ycombinator.com/hackernews.html

http://www.paulgraham.com/trolls.html

http://www.paulgraham.com/hackernews.html


what advantages does this have over the ELK stack?


No Redshift support? Hmmm.


It is on our roadmap! We will continue adding more data sources in the following release.



