Show HN: Ananas – a hackable data tool for beginners

sails · on July 22, 2019

This is great.

Things to keep on your radar:

Meltano, spun out of GitLab, is working in this domain, and I think they are making progress. https://meltano.com/

dbt are building the transformation engine that Meltano is using (I think), and are worth keeping an eye on https://www.getdbt.com/

In my experience the issue with this domain isn't the "one-off" analysis, but rather orchestrating the BI function across the business, maintaining the single source of truth, testing and deploying across the ETL/ELT layers. I can't speak to how well Ananas is managing these but Meltano and dbt are giving this area a lot of thought.

kfk · on July 22, 2019

This looks good. One of your competitors charges about $4,000 per seat per year, so this seems to be a good space. If you add the possibility of building user-defined nodes with Python, you'd have a solid product.

bhou · on July 22, 2019

We will release JavaScript user-defined functions in the next release. Python is on the roadmap too.

kfk · on July 22, 2019

Will you have a server too? Another issue is scheduling, triggers, etc. You are in this strange space between Excel and “enterprise” ETL like Talend. We are so frustrated with the current poor state of data prep that we moved to Python-based workflows, scheduled and triggered via Airflow. It seems to work well so far. But your IDE for data prep is a bit better positioned than what we use now (Jupyter).

millboh · on July 22, 2019

Actually, we provide a command-line interface: you can do pretty much everything the UI does from the command line, and you can also run it in server mode from the command line. The current version doesn't cover scheduling yet. One possible solution would be scheduling through Airflow's BashOperator, or we could implement scheduling in the server itself.

justaguyhere · on July 22, 2019

woah, that is steep. who is this competitor?

alienreborn · on July 22, 2019

Probably Informatica or Talend which are two of the most popular Enterprise ETL tools.

ZeroCool2u · on July 22, 2019

Alteryx would be my guess.

kfk · on July 22, 2019

Interestingly enough, both Ananas and Alteryx use a declarative approach: Alteryx uses XML and Ananas uses YAML.

ZeroCool2u · on July 23, 2019

The YAML approach seems far more appealing to me honestly, though I'm sure XML was a sane choice at the time.

dewey · on July 22, 2019

Pretty similar to https://github.com/getredash/redash from a first look. What would you say are the main differences?

kfk · on July 22, 2019

Redash is comparable to Superset, Tableau, etc. Ananas Analytics is comparable to Talend, Airflow, etc.

millboh · on July 22, 2019

It seems that Redash is a BI tool and closed source. Ananas is open source, and can be used not only as a BI tool but also as an ETL tool. Moreover, you can run your pipeline on your own infrastructure, as Ananas can run on multiple execution engines.

codetrotter · on July 22, 2019

Parent commenter linked to their GitHub repo. Redash is open source under the terms of the two-clause BSD license. (Maybe they edited to add link, maybe you overlooked it?)

millboh · on July 22, 2019

Oh, sorry about that, my bad. I just looked through the Redash home page but didn't find the open source link.

dewey · on July 22, 2019

The Github link is in the footer of the site, but the icon is very small.

djmips · on July 22, 2019

redash source <- search

canada_dry · on July 22, 2019

An older, but very slick tool for wrangling data: http://vis.stanford.edu/wrangler/

Which is now a commercial tool: https://www.trifacta.com/products/wrangler-editions/#wrangle...

mtarnovan · on July 22, 2019

This looks a bit like https://metabase.com -- anyone used both and can make a comparison?

wiremine · on July 22, 2019

Curious if anyone has used Metabase for serious work and can comment on it. I tried setting it up and got frustrated pretty quickly with the UX. It looks slick, but the mental model was confusing...

btown · on July 22, 2019

My startup seriously evaluated Metabase, but inevitably you'll want joins, and Metabase is fundamentally opposed to the idea: https://metabase.com/blog/Joins/ ... and if you're writing SQL for your views, you might as well be writing SQL.

We ended up shelling out for Tableau - it's pricey at $840/yr, but it supports joins out of the box (it even has a drag-and-drop interface to set up joins!), has practically every bell and whistle you could ask for, and allows you to do "exploratory analysis over screenshare in realtime with non-technical colleagues, without context switching to a coding mindset or needing to look up field names you may never have used before." I think it's intuitive and worth every penny, but YMMV. Would recommend everyone try the public version to get a feel for it.

EDIT: as others have said, Ananas is actually ETL + BI, whereas Metabase and Tableau are BI on top of a database. Tableau can stand in for good ETL due to its join support in certain scenarios. It's better than Metabase, but not necessarily comparable to Ananas.

getsauce · on July 22, 2019

Metabase added joins to query builder in a recent release: https://github.com/metabase/metabase/releases/tag/v0.33.0-pr...

tlrobinson · on July 23, 2019

(I work at Metabase, and am working on the joins feature as we speak)

As getsauce mentioned, we're adding joins to the next release, as well as essentially subqueries in what we're calling the "notebook". That should unlock a lot of power.

AMA.

fegul · on July 23, 2019

Big fan of Metabase, thanks for all your work!

I've always been curious if there is a feature in the works for charting or displaying comparisons across defined timespans (e.g. total page views this week vs. last week)
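As an illustration of the kind of comparison fegul is asking about, here is a minimal Python sketch of a week-over-week count. It is not a Metabase feature or API; it assumes page views are available as a plain list of dates and that weeks start on Monday (the ISO convention). The function names are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Iterable, Optional, Tuple


def week_start(d: date) -> date:
    """Monday of the week containing d (ISO weeks start on Monday)."""
    return d - timedelta(days=d.weekday())


def week_over_week(view_dates: Iterable[date], today: date) -> Tuple[int, int, Optional[float]]:
    """Return (this week's views, last week's views, relative change).

    The change is None when last week had no views, to avoid dividing by zero.
    """
    this_week = week_start(today)
    last_week = this_week - timedelta(days=7)
    # Bucket every view by the Monday of its week, then read off the two buckets.
    counts = Counter(week_start(d) for d in view_dates)
    current, previous = counts[this_week], counts[last_week]
    change = (current - previous) / previous if previous else None
    return current, previous, change
```

For example, two views in the week of 2019-07-15 and three in the week of 2019-07-22 give `(3, 2, 0.5)`, i.e. a 50% increase. Bucketing by week start keeps the logic the same whether the "timespan" is a week or, with a different bucketing function, a month.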

tlrobinson · on July 23, 2019

Thanks!

We have a "trend" visualization: https://www.dropbox.com/s/qg54b6owdatznmm/Screenshot%202019-... Is that like what you're looking for?

fegul · on July 23, 2019

That certainly looks like it's in the right direction. I'll re-familiarize myself with the docs, since it's been a little while.

schlowmo · on July 23, 2019

I really wanted to like Metabase but unfortunately it's way behind its promise. I made an attempt to use it in a customer project for creating very basic customizable dashboard-like statistics, with little to no success. Just to name a few pain points:

- using the Docker image is easy; using the JAR file is not, mostly because there is very little documentation

- confusing UI paired with lack of extensive documentation

- this makes it far from "easy" to be used by "everyone in your company" (quotes from metabase marketing claim): you really have to know how to do things, even easy ones like changing labels

- the UI contains many minor bugs, which sometimes lead to unsavable metrics, and you just have to start from scratch

- no built-in way to export dashboards, which makes it nearly impossible to test your new metrics on a different system before pushing them to production; if you really want to do this, you have to juggle database dumps

There might be some valid use cases for Metabase, but I don't think it's very usable for non-technical users. I strongly suggest evaluating it thoroughly before counting on it.

Don't get me wrong, though: thanks for making it open source and free to use.

sails · on July 23, 2019

Metabase is pretty good if you have a nicely configured data warehouse (Snowflake and BigQuery are good options). If you are connecting Metabase directly to your app database, then you will probably run into issues trying to integrate another data set (say, CRM data).

This is where it makes sense to ELT (extract, load, transform) everything into a data warehouse, integrate the data there and transform as much as possible, and do the "last-mile" analysis in Metabase.

This is the theory, at least. I've had reasonable results with Metabase doing it this way. It's also nice in that the bulk of your logic sits in your data warehouse, so a BI tool migration is less painful, and it's possible to run two analytics tools side by side.
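To make the ELT flow above concrete, here is a minimal sketch using Python's built-in `sqlite3` as a stand-in for a warehouse such as Snowflake or BigQuery. The table names, columns, and sample rows are all hypothetical. Raw data from two systems (an app database and a CRM) is loaded untouched, the integration happens inside the "warehouse" as a SQL view (the kind of model a tool like dbt would manage), and the final `SELECT` plays the role of the last-mile query a BI tool would run.

```python
import sqlite3

# Hypothetical "extract" output from two separate systems.
app_users = [(1, "alice@example.com"), (2, "bob@example.com")]
crm_deals = [("alice@example.com", 1200.0), ("alice@example.com", 300.0), ("bob@example.com", 50.0)]


def build_warehouse() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
    # Load: copy the raw rows as-is into staging tables, no transformation yet.
    con.execute("CREATE TABLE raw_app_users (user_id INTEGER, email TEXT)")
    con.execute("CREATE TABLE raw_crm_deals (email TEXT, amount REAL)")
    con.executemany("INSERT INTO raw_app_users VALUES (?, ?)", app_users)
    con.executemany("INSERT INTO raw_crm_deals VALUES (?, ?)", crm_deals)
    # Transform: integrate both sources inside the warehouse with plain SQL.
    con.execute(
        """
        CREATE VIEW user_revenue AS
        SELECT u.user_id, u.email, COALESCE(SUM(d.amount), 0) AS revenue
        FROM raw_app_users AS u
        LEFT JOIN raw_crm_deals AS d ON d.email = u.email
        GROUP BY u.user_id, u.email
        """
    )
    return con


con = build_warehouse()
# "Last-mile" analysis: the BI tool only queries the already-integrated model.
rows = con.execute("SELECT email, revenue FROM user_revenue ORDER BY revenue DESC").fetchall()
```

Because the join and aggregation live in the view rather than in the BI tool, swapping or adding a BI tool only means pointing it at `user_revenue`.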

Check out https://www.getdbt.com/ for more on the process.

jesterson · on July 23, 2019

We use it in production and love it. I also know a few companies that use it on fairly large data flows.

I'm not sure why you're frustrated with the UI; my colleagues and I find it quite good.

mazameli · on July 23, 2019

Sorry to hear that (I'm the UX guy at MB). Would appreciate your thoughts/feedback on any specifics you want to share. Thanks!

fegul · on July 23, 2019

I'm a fan of your work! We use Metabase pretty frequently.

The only nitpicks I have are around the concept of Metrics (still not quite sure what those are or how they're useful for me) and the initial download of the libraries, which takes quite a while (especially over unstable VPN links).

I'm wondering if there's a way to have an option where it tries an external CDN first and then falls back to loading from the hosting server.

wiremine · on July 23, 2019

Thanks for following up! Here are some of the things that tripped me up:

1. The lack of in depth docs.

2. The setup and usage of metrics was confusing. This was the main use case I was hoping Metabase could help me with, and it felt like an add-on feature.

3. For whatever reason, managing dashboards was really confusing, and the UI [1] didn't seem to match the docs.

[1] I was using the mac version.

mazameli · on July 23, 2019

Thanks, I appreciate your feedback.

llampx · on July 23, 2019

Why do charts with dates on the x-axis not show up correctly? For example the chart will show Jan 2019 on the label underneath a column that is actually Feb 2019. This is confusing to new users and drives experienced users up the wall. Currently the only thing that fixes it is to convert the axis to categorical.

mjirv · on July 22, 2019

Metabase is much more for data visualization. As far as I know it doesn't really have any ETL features.

I could actually see this being pretty useful as an ETL layer to go along with Metabase if someone were trying to build a free/open source BI stack.

Jazgot · on July 22, 2019

Is the name inspired by Orange [1]?

[1] https://orange.biolab.si

programbreeding · on July 24, 2019

Just FYI, about 1/3 of the way down your Getting Started page[0] it has a broken link[1] to the fifa2019.csv file. The first link on the page is valid[2], but the second one leads to a 404 due to pointing to .../raw/... rather than .../blob/...

[0] https://ananasanalytics.com/docs/user-guide/getting-started

[1] https://github.com/ananas-analytics/ananas-examples/raw/mast...

[2] https://github.com/ananas-analytics/ananas-examples/blob/mas...

bhou · on July 24, 2019

Thanks for the reminder, the links are fixed now.

nishkalkashyap · on July 22, 2019

I would recommend code-signing the build before distributing.

yazan94 · on July 22, 2019

I'm not super familiar with code signing, but if alternatives are expensive, could OP maintain a checksum value on their download page rather than go with DigiCert or alternative services? Or does code-signing solve a different problem?
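For reference, the checksum approach yazan94 suggests amounts to something like the following Python sketch: hash the downloaded file with SHA-256 and compare it to the value published on the download page. The function names are hypothetical. It only helps if the published value comes from a channel the user already trusts, and, as the reply below points out, it does nothing to satisfy the macOS or Windows publisher checks that code signing addresses.

```python
import hashlib
from typing import Iterable, Iterator


def sha256_hex(chunks: Iterable[bytes]) -> str:
    """Hash data incrementally so a large installer never sits fully in memory."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def file_chunks(path: str, size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file's contents in 1 MiB pieces (requires Python 3.8+ for :=)."""
    with open(path, "rb") as f:
        while chunk := f.read(size):
            yield chunk


def checksum_matches(actual_hex: str, published_hex: str) -> bool:
    """Compare digests, tolerating case and whitespace copied from a web page."""
    return actual_hex.lower() == published_hex.strip().lower()


def verify_download(path: str, published_hex: str) -> bool:
    return checksum_matches(sha256_hex(file_chunks(path)), published_hex)
```

Hashing in chunks gives the same digest as hashing the whole file at once, so `verify_download("installer.dmg", "<value from the download page>")` (file name hypothetical) works for installers of any size.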

nishkalkashyap · on July 27, 2019

No. Code signing is very different. A checksum would only work for developers on Linux. Without a code signing certificate, macOS would flat out refuse to run the app, and Windows will show an 'Unverified publisher' warning. Also, things like auto-updates do not work on either platform unless you code sign your binaries.

millboh · on July 22, 2019

Thanks for your feedback, we will look for some affordable code-signing certificates. Any suggestions? By the way, here is the issue link: https://github.com/ananas-analytics/ananas-desktop/issues/61

pimterry · on July 22, 2019

I set up code signing for an Electron app relatively recently. The best option I could find was DigiCert. It really sucks that this stuff is necessary nowadays and not free, but it's not so bad.

That's for Windows - for Mac you'll also need an Apple developer account, afaik they're the only people who can issue certs.

EDIT: Woah, I take that back. Digicert has now gone up from $74/year to $474/year, which is crazy. I now also need a new certificate provider...

NewsAware · on July 22, 2019

For Electron signing we use Tucows code signing certs (you need to register as a Tucows author, for free), which are provided by Comodo for $140 for 2 years. We didn't have any issues besides getting a proper CI/CD process running.

justinclift · on July 23, 2019

There aren't any great options, but if it helps we (sqlitebrowser.org) went with Certum:

https://en.sklep.certum.pl/data-safety/code-signing-certific...

We chose the "Open Source Code Signing" option, with the key stored on a physical key fob (i.e. not "in the cloud"). The total cost, including the new key fob and the super expensive, week-plus, mandatory postage (!), was around 135 Euro.

nishkalkashyap · on July 27, 2019

For my project (quarkjs.io), I went with https://comodosslstore.com. They have the cheapest certificates I could find (~75 USD), and they are also the only ones issuing certificates to individual developers.

nishkalkashyap · on July 27, 2019

I've submitted a response on GitHub.

najarvg · on July 22, 2019

This looks very good and a fit for my end users, who deal with Excel files all the time. Are there any plans to add Excel as a data source? I can't convert to CSV without major pain, since the Excel files are exports from mainframe apps that are out of my control. Thanks

millboh · on July 22, 2019

Excel was one of our first supported data sources. See our early video demo: https://www.youtube.com/watch?v=GwqZlhmei78. We just created an issue on GitHub: https://github.com/ananas-analytics/ananas-desktop/issues/60. We will add this feature back in the next release.

jbverschoor · on July 22, 2019

The app icon is transparent on Mac, and therefore only clickable on the border.

nishkalkashyap · on July 27, 2019

I would recommend distributing binaries from a dedicated release server combined with a CDN, possibly DigitalOcean Spaces. It really increases download speeds for end users compared to GitHub releases.

jbverschoor · on July 22, 2019

Unfortunately it's created by an unidentified developer.

jessaustin · on July 22, 2019

Be sure and write some special firewall rules before running this...

chrsstrm · on July 22, 2019

At what scale has this been tested? As in, are you aware of any data file size limits? I have a csv with ~6M rows and when paging through the docs the "Exploring your data source" gave me pause thinking this app might try and open all 6M rows at once. Will I be OK importing such a large source or will my computer turn into a space heater before refusing to respond?

bhou · on July 23, 2019

Ananas has been tested in production, processing terabytes of data daily (with Google Dataflow, but you can achieve the same thing with your own Spark cluster too).

In terms of exploring a large source file, the design principle is to paginate any kind of data that supports random access to records (for example CSV, logs, etc.). So when "exploring the data" of a CSV with 6M rows, Ananas will not load all 6M rows at once, but will read a few rows at a time for each page. For example, this early demo video explores a 755M CSV file in seconds: https://www.youtube.com/watch?v=GwqZlhmei78&t=01m00s
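The paging idea can be sketched in Python roughly as follows. This is not Ananas's actual implementation (the thread doesn't show it); the `read_page` name is hypothetical. It streams rows through `csv.reader` and keeps only the requested page in memory. Skipping to a page this way still parses the earlier rows, so true random access would need something extra, such as a precomputed index of row byte offsets.

```python
import csv
from itertools import islice
from typing import Iterable, List


def read_page(lines: Iterable[str], page: int, page_size: int = 100) -> List[List[str]]:
    """Return the data rows for one zero-based page, skipping the header row.

    `lines` can be an open text file (opened with newline="") or any iterable
    of CSV lines. Rows before the page are parsed and discarded, and reading
    stops as soon as the page is full, so memory use is bounded by page_size.
    """
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    start = page * page_size
    return list(islice(reader, start, start + page_size))
```

With a 6M-row file opened via `open(path, newline="")`, asking for page 0 reads only the first `page_size` rows, which is why the first screen of a large file can appear quickly.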

eli_gottlieb · on July 22, 2019

Ok, but why did you name it after pineapples?

millboh · on July 22, 2019

Ananas, Analytics made easy :) Pineapple was cool too. Will probably change it if we see more comments ;)

hondadriver · on July 22, 2019

No, just keep it. It's fun, and why does the name have to be an existing English word?

Fun fact: if everybody starts using it, it will eventually become proper English.

SysINT · on July 22, 2019

Pomme-stylo-anana or name it after Pikotaro!

mingabunga · on July 22, 2019

Have you thought about adding some words to the data output using natural language generation? E.g. arria.com or another NLG vendor?

millboh · on July 22, 2019

Excellent idea. We've thought about a machine learning transformer, including NLP. NLG is something that would definitely be nice to have. Please create an issue and we will prioritize it.

robtherobber · on July 22, 2019

I can't download it: https://github.com/ananas-analytics/ananas-desktop , I get a Github server error. Is it just me?

Edit: not just me, Github issues.

notduncansmith · on July 22, 2019

Looks like GitHub is experiencing errors, just had the same problem with an unrelated repo.

robtherobber · on July 22, 2019

Thanks!

millboh · on July 22, 2019

Yep, we expected anything but a GitHub issue ;)

jugg1es · on July 22, 2019

Does this have any sort of hinting for indexed queries at all? I would worry that a beginner would create a horrid mess of queries that could consume all available resources.

millboh · on July 22, 2019

That's a good point. Actually, we think of this tool as a collaboration tool that enables non-technical users and data engineers to share this visual DAG and work together. The Apache Beam runners we use behind the scenes have a query planner to optimize chained queries. However, you're totally right: this can't stop non-technical users from writing messy queries. The visual DAG should, however, help them split a complex query into simpler ones.
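On jugg1es's question about index hinting: one way a tool could warn about unindexed queries is to inspect the database's query plan before running anything. The sketch below does this with SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3` module. Nothing in the thread says Ananas works this way, and the `orders` schema and helper names are hypothetical. Plan steps that begin with `SCAN` mean the whole table (or a whole index) is read, which is a hint that an index might be missing.

```python
import sqlite3
from typing import List, Sequence


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical schema: an orders table indexed on customer_id only."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
    con.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    return con


def plan_steps(con: sqlite3.Connection, sql: str, params: Sequence = ()) -> List[str]:
    """SQLite's plan for a query, obtained without executing the query itself."""
    rows = con.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the human-readable 'detail' is the last column


def full_scans(con: sqlite3.Connection, sql: str, params: Sequence = ()) -> List[str]:
    """Plan steps that read a whole table or index: a hint an index may be missing."""
    return [step for step in plan_steps(con, sql, params) if step.startswith("SCAN")]
```

Filtering on `customer_id` produces an index `SEARCH` step and no warnings, while filtering on `amount` produces a single `SCAN orders` step that a UI could surface as a hint. Other engines expose similar information (for example `EXPLAIN` in PostgreSQL), but the output format differs per database.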

VvR-Ox · on July 23, 2019

Oh nice - thank you very much! :-D

I thought about writing my own app for exactly that task but when I see yours I think I don't need to do that anymore. Awesome! :-)

richk449 · on July 22, 2019

Can I use this if all I have is an ODBC connection?

millboh · on July 22, 2019

What kind of data source exactly do you need? We should be able to add Microsoft data sources such as MSSQL if you request them.

rmbeard · on July 23, 2019

I have the opposite issue: is there a way to connect to a cloud database without an ODBC connection?

mtw · on July 23, 2019

This looks promising. I looked into the visualizations provided, though, and we need much more than what's there.

lucasverra · on July 22, 2019

Can we hook this up to an API GET request? I guess I could do API -> download JSON -> Ananas, but you know.. :)
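Until there is an API source, the API -> download JSON -> Ananas workaround could look like this Python sketch. It assumes the endpoint returns a JSON array of flat objects (nested objects would need flattening first); the function names and any URL you pass in are hypothetical, and nothing here is an Ananas API. The output is plain CSV text that could be saved and used as a CSV data source.

```python
import csv
import io
import json
import urllib.request
from typing import Any, Dict, List


def fetch(url: str, timeout: float = 30.0) -> str:
    """Plain HTTP GET returning the response body as text (assumes UTF-8)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def json_records_to_csv(payload: str) -> str:
    """Convert a JSON array of flat objects into CSV text."""
    records: List[Dict[str, Any]] = json.loads(payload)
    if not isinstance(records, list) or not all(isinstance(r, dict) for r in records):
        raise ValueError("expected a JSON array of objects")
    # Header is the union of keys in first-seen order, so sparse records still fit.
    fieldnames: List[str] = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # keys missing from a record become empty cells
    return out.getvalue()
```

Usage would be roughly `json_records_to_csv(fetch(url))`, written to a `.csv` file on whatever schedule suits the data, for example IoT sensor readings polled every few minutes.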

cr0sh · on July 22, 2019

I was thinking the same thing, for like IoT monitoring sensors, etc - but it is open source; grab it (once GH is back online), add the new "source", and issue a PR - that'd be the way to do it I think...

millboh · on July 22, 2019

The API data source is a great idea! You can create a feature request on our GitHub (otherwise we will create it). We will try to put that into upcoming releases.

yazan94 · on July 22, 2019

This looks really cool! Thanks for sharing, I can't wait to test this

thoughtpalette · on July 22, 2019

Looks great to me. Nice job shipping!

millboh · on July 22, 2019

Thanks

pplonski86 · on July 23, 2019

What is your business model?

towlinson · on July 25, 2019

What about bananas?

overcast · on July 22, 2019

My unprofessional professional opinion. The product looks great, but the name has to go. I can't imagine pronouncing that, let alone communicating it over a phone. Any simple word before analytics would be better.

Edit: pineapplytics is the obvious cute and available one, however may still be difficult to communicate.

henrikschroder · on July 22, 2019

Fun fact: Only the English language calls the fruit "pineapple", almost every other language calls it "ananas" or similar.

pimterry · on July 22, 2019

The Guardian made a helpful infographic a little while back about exactly this: https://static.guim.co.uk/sys-images/Guardian/Pix/pictures/2...

hondadriver · on July 22, 2019

We call pineapple juice 'ananassap' in Dutch and sometimes use it as if it were an English word, as a joke.

overcast · on July 22, 2019

I looked it up and I get it, but this post and the site are targeting English speakers.

_frkl · on July 22, 2019

Well, or they just want everyone to be able to access it? There is really no choice other than to publish something like this in English. Just a guess, but I'd say the number of people accessing it who are not native English speakers is larger than the number who are.

ygjb · on July 22, 2019

Now I really want the author to rename it Pineapple in the English localization, and leave it Ananas everywhere else...

mikorym · on July 22, 2019

That's not true. In my native language it is called "pynappel".

EDIT: I guess that may be why you said "almost" but that, in turn, is almost impossible.

pmelendez · on July 22, 2019

That's a bit of a bold statement... In Spanish it's Piña and in Portuguese it's Abacaxi, but a good number of languages do call it ananas.

caiocaiocaio · on July 22, 2019

In Portugal Portuguese it's Ananás.

michaelmior · on July 22, 2019

I think this is also true in the Spanish dialects spoken in many regions.

maximente · on July 22, 2019

> In Spanish is Piña

equally as bold, and also incorrect: ananá is what you'll hear in Argentina, possibly elsewhere.

techie128 · on July 22, 2019

It's called "ananas" in Marathi as well. Marathi is a regional language in India.

millboh · on July 22, 2019

Delighted that this is one of the first comments! :) The product was designed to make analytics easy. We found that the word "analytics" isn't easy to pronounce either, so we decided to make the word analytics easy too! But thanks for your comments, we will consider it.

koolba · on July 22, 2019

I think the name and logo are nice (pineapple database right?) but agree that it's both difficult to spell and pronounce (particularly for people who refer to them as "pineapples").

If you want to be cheeky, CONCAT(SUBSTR('analyst', 1, 4), 'desktop') is available for a .com domain.

bhou · on July 22, 2019

Love this CONCAT(SUBSTR('analyst', 1, 4), 'desktop') idea! ;)

dugluak · on July 22, 2019

People had the same opinion about Azure, and Microsoft didn't change it. Ananas is still not that bad.

tlrobinson · on July 22, 2019

"Ananas" alone might not be that bad, but "ananasanalytics.com" certainly is.

Bob Loblaw's Law Blog, anyone?

caiocaiocaio · on July 22, 2019

I really loved the name, but there was a bit of disconnect because I thought something named pineapple would lead me to a more exciting, artsy page.

aiisjustanif · on July 22, 2019

Who said it was made by an English speaker...?

Typical. Smh.

dang · on July 22, 2019

I appreciate the push for tolerance behind your comment, but being snide doesn't help. Please don't on HN.

https://news.ycombinator.com/newsguidelines.html

Edit: while I have you, could you please also stop posting unsubstantive comments to HN generally? You've done that a fair bit and we're trying for something a little better than that here. In addition to the site guidelines, you might also find these links helpful for getting the spirit of the site:

https://news.ycombinator.com/newswelcome.html

https://news.ycombinator.com/hackernews.html

http://www.paulgraham.com/trolls.html

http://www.paulgraham.com/hackernews.html

dlphn___xyz · on July 22, 2019

what advantages does this have over the ELK stack?

tracer4201 · on July 22, 2019

No Redshift support? Hmmm.

bhou · on July 22, 2019

It is on our roadmap! We will continue adding more data sources in the following release.