babs474's comments

In practice I find the bigger problem is from analysts/actuaries/statisticians who have a disdain for programming, which sometimes is viewed as a task for mere technicians.

Typically your excel model/analysis has not even solved half the problem of a datascience system. It needs to be repeatable, it needs to be open to change (source control!), it needs to be integratable with the wider system.

These things need to be considered upfront. There are plenty of reasonable software tools for this. Yes hadoop shouldn't be your first step, but taking 5 minutes to put something on a server in ec2 (omg, the cloud) is not unreasonable.

There is a swallowing abyss between excel and production. That is where datascience projects die; it's a shame.


I've never met a statistician who either uses excel or has a "disdain for programming". R or Matlab are basic tools of the trade.


I talk to a lot of people who've had trouble with "data scientists" who are strong in statistics and know some matlab or R or something like that, but know nothing about the craftsmanship of programming.

By that I mean skills like using version control, writing software that is maintainable, working with a team that uses project management software, things like that.

A common kind of workflow is that a data scientist develops an algorithm and makes tweaks to it, and that this gets baked into a production system.

If the data scientist throws something over the wall and it takes the developers a few weeks to get it ready for real use, the "real time" productivity of the team is going to be awful. The closer we come to the data scientist checking the changes in and that's that, the more valuable the data scientist is.


This is absolutely a fair comment, coders but not software engineers, and is the same problem that's permeated bioinformatics for the last decade or so. (As an aside, it's fun hearing grand claims about data science revolutionising medicine in 10 years [0], when the same claims were made about bioinformatics 10 years ago.)

[0] https://twitter.com/HanChenNZ/status/473825783874859008


R and matlab are better, but those tools also have issues integrating into production depending on what you are doing. It's not so much the exact tool you use, but just having a little forethought about how your creation is going to interact with a production system.

A lot of people feel programming is undervalued in academia. For instance, Hadley Wickham, creator of ggplot2, probably hasn't gotten the recognition he deserves. With a prevailing attitude such as that, is it any wonder academic code has such a poor reputation?

Wickham notes that he thinks the tide is changing. I agree that it is, as a part of the datascience phenomenon. As part of the change you are going to see a few more macbooks, some cloud servers, maybe a guy with glasses talking about version control and software design. It is not all garbage, I hope you keep an open mind.

Q:Do you feel that the academic culture has caught up with and supports non-traditional academic contributions (e.g. R packages instead of papers)?

A:It’s hard to tell. I think it’s getting better, but it’s still hard to get recognition that software development is an intellectual activity in the same way that developing a new mathematical theorem is.[1]

1. http://simplystatistics.org/2012/05/11/ha/


Integrating in production is a huge biggie. I hope to be spending a lot of my time this summer sharing / educating folks about some tech I've built to make putting interesting analytics into production easier.


I use matlab.

Then I use excellink to send everything in matlab to excel.


In the case of Om, I believe the immutability of clojure helps out and makes the diffing of databound elements a simple (non-deep) compare.

You can end up making an app with quite a number of elements flying around.

A neat pixel editor example: http://jackschaedler.github.io/goya/
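
The "simple (non-deep) compare" can be sketched roughly like this. It is a Python analogy only, not Om's or React's actual code; `Pixel`, `AppState` and `changed_fields` are made-up names for illustration. The assumption is that state is never mutated in place, so an unchanged sub-value is literally the same object and an identity check is enough.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Pixel:
    x: int
    y: int
    color: str


@dataclass(frozen=True)
class AppState:
    pixels: tuple  # tuple of Pixel, never mutated in place
    title: str


def changed_fields(old, new):
    """Names of top-level fields whose value is a different object.

    Uses identity (`is`) instead of walking the nested data. With
    immutable values this never misses a change; at worst it flags an
    equal-but-distinct object as changed, which only costs a re-render.
    """
    return [
        f.name
        for f in fields(old)
        if getattr(old, f.name) is not getattr(new, f.name)
    ]


state = AppState(pixels=(Pixel(0, 0, "red"),), title="demo")
# replace() builds a new AppState but reuses the untouched `pixels` tuple.
renamed = replace(state, title="demo 2")
print(changed_fields(state, renamed))
```

Only the parts bound to `title` would need to be redrawn here; the pixel grid is skipped without ever looking inside it.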


This randomly reminded me of an ancient kuro5hin article that I found interesting back in the day, where the proposal was to use brute force to implement comment search on kuro5hin.

http://www.kuro5hin.org/story/2004/5/1/154819/1324


Prediction markets that don't involve money have also had some success recently. http://www.npr.org/blogs/parallels/2014/04/02/297839429/-so-...

For an easy-to-join one that I work on (shameless plug), check out https://scicast.org


I wouldn't frame the question as "what did you do wrong". As other people in this thread have pointed out, you may have dodged a bullet. It could very well be that this company is in the "wrong".

A better question to ask is what can you do to make yourself more attractive and increase the probability of closing the deal with the next company. Think like a salesman.

The fact that you told him you currently make close to what their interns make jumps out at me. Perhaps you think that would make you more attractive, but it may have had the opposite effect.

The head of engineering may have been told by his team that you have a lot of expertise, skill and potential, and then all of a sudden he finds out you get paid at an intern level. He is going to ask himself, "What am I missing? If you are so good, why doesn't anybody else value you at that level?" People want to have a good feeling in their gut when they make a big purchase, and a large previous salary gap may have unsettled that.

A better head of engineering wouldn't be swayed by previous salary, but the truth of the matter is a lot of decision makers are not perfectly rational.

My advice to you is to google strategies for dealing with the "what is your current salary" question and be better prepared for that situation.


tptacek, we all want to hear Willem Pinckaers' take, it is really good stuff.

I also want to hear ideas from Akamai, even if they aren't perfect. Perhaps they can lead to good things.

Unfortunately Pinckaers' commentary is a little bit too hostile and calls for Akamai to cease sharing ideas[1].

I'm sure Akamai's developers are "adult" enough, as you say, to handle it. However, there is a trope in the software development community that if you share something, you should be fine with being open to no-holds-barred attacks. Wouldn't the more "adult" behavior be to criticize in a more professional tone that is open to refinement of ideas and could spark further collaboration? I'd like to see this type of communication more in the software world; I think it would encourage more participation.

[1] "they should not be sending out non-functional, bug ridden patches to the OpenSSL community"


I think there's a difference between sharing ideas and sharing code.

An idea or concept on its own can't really do much, at least until it's put into practice somehow. The potential for harm is quite minimal, if it even exists.

Code, on the other hand, can often be directly used with relative ease by people who may not fully understand the possible implications of using such code. The potential for harm exists, and could be significant.

In the context of security, it's important to avoid potentially-harmful code wherever possible. If somebody has concerns about some code, regardless of who wrote it, it is best to express those concerns in a very blunt and direct manner.

Security is just not something to fool around with. The hard questions and painful facts should be out in the open, especially when code is involved and capable of being used. It's just not the time or place for pussyfooting around.


I made this comment the other day in a thread about children accidentally clicking on google display ads, but I think it also applies here. The problem is measuring the effectiveness of early funnel ads from clicks.

Here is a good presentation from the quantcast guys about the "natural born clicker" problem [1]. The people clicking on your display ad are probably anything but actual potential customers.

Clicks are just an easy holdover metric from the paid search side of digital advertising. They don't make sense in the context of early funnel ads. You need to measure the effect your display ads are having on your purchasing endpoints, which is what the whole cross channel attribution industry is about.

It's quite possible you are getting good value from facebook ads; you've just inadvertently focused in on the worst subpopulation, the clickers.

[1] http://www.slideshare.net/hardnoyz/display-ad-clickers-are-n...


For what it's worth, that earlier Hacker News discussion regarding the Google case can be found at https://news.ycombinator.com/item?id=7524473.


This problem extends to more than just kids playing around on mobile games.

Here is a good presentation from the quantcast guys about the "natural born clicker" problem [1]. The people clicking on your display ad are probably anything but actual potential customers.

Clicks are just an easy holdover metric from the paid search side of digital advertising. They don't make sense in the context of early funnel ads. You need to measure the effect your display ads are having on your purchasing endpoints, which is what the whole cross channel attribution industry is about.

[1] http://www.slideshare.net/hardnoyz/display-ad-clickers-are-n...


For a prediction market anyone can join, check out https://scicast.org/

What's neat about SciCast is you can make predictions based on the assumption of how other questions will turn out.

E.g., I think that if the price of bitcoin exceeds 1000 by 2015, at least one presidential candidate will accept bitcoin donations. If btc ends up being below 1000, I make no prediction.
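
A conditional prediction like that can be sketched with a toy joint distribution over two yes/no questions. This is only an illustration and not SciCast's actual mechanism; the function names and all the probabilities are made up. A is "a candidate accepts bitcoin donations" and B is "btc exceeds 1000".

```python
def conditional(joint, b):
    """P(A is true | B == b) for a joint keyed by (a, b) booleans."""
    p_b = joint[(True, b)] + joint[(False, b)]
    return joint[(True, b)] / p_b


def set_conditional(joint, p_a_given_b):
    """Return a new joint with P(A | B is true) set to p_a_given_b.

    Only the two cells where B is true are touched, so P(B) and
    P(A | B is false) stay exactly as they were: "if B turns out
    false, I make no prediction".
    """
    p_b = joint[(True, True)] + joint[(False, True)]
    updated = dict(joint)
    updated[(True, True)] = p_b * p_a_given_b
    updated[(False, True)] = p_b * (1.0 - p_a_given_b)
    return updated


# Hypothetical starting beliefs, keyed by (A, B).
joint = {
    (True, True): 0.05,
    (False, True): 0.15,
    (True, False): 0.08,
    (False, False): 0.72,
}

# "If btc exceeds 1000, I think A is 60% likely."
updated = set_conditional(joint, 0.6)
print(conditional(updated, True), conditional(updated, False))
```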


Just to add, the SciCast project came out of the DAGGRE project, which was a competitor to Good Judgment in the IARPA prediction challenge.


Also check out minnedemo[1] for twin cities startup events.

And tech.mn[2] for news.

[1] http://minnestar.org/minnedemo/

[2] http://www.tech.mn

