Hacker News | jerzyt's comments

Of course, Wikipedia is the easiest website to scrape. The HTML is so clean and organized. I'd like to find some code to scrape Airbnb.
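To illustrate why clean, well-structured HTML is easy to scrape, here is a minimal sketch using only Python's standard-library `html.parser`. The `TableParser` class and the sample markup are hypothetical, made up for illustration; a real Wikipedia table has more nesting, and the values below are not real data.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up markup in the style of a simple wiki table (not real data).
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>City</th><th>Population</th></tr>
  <tr><td><a href="/wiki/Alpha">Alpha</a></td><td>1,000</td></tr>
  <tr><td><a href="/wiki/Beta">Beta</a></td><td>2,000</td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE_HTML)
rows = parser.rows
for row in rows:
    print(row)
```

Sites that build their pages with heavy client-side JavaScript generally need more than this, which is part of why they are harder to scrape.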


Yeah, I'd expand on the "API over DBs" based on my experience. I could make it nest much deeper: DBs over Excel spreadsheets, and Excel spreadsheets over PDFs or scraped HTMLs, yada, yada, yada. But before I waste any time creating an API, my client wants to know if the project is even viable.


Google's Search is dying because of thousands of self-inflicted cuts. A couple of years ago there were about 3 or so clearly marked ads on the first page, followed by generally useful results. Right now, I don't have the patience to scroll past the ads, which are almost indistinguishable from useful links.


The really messed up part for me is that the “real” results are getting so bad the ads are increasingly what I’m searching for.

It’s not a web search engine, it’s an ads search engine.


I don't have the stats, but it seems to me that the majority of purchases at convenience stores are alcohol. In CA, you cannot buy alcohol at a self-service POS. That spells doom for Amazon Go.


At one of my previous jobs, I did a project on employee retention. Our goal was to identify valuable employees who were most likely to leave. The results were surprising at first. The top two factors leading to an employee moving to a new job were a high performance review and a large increase in compensation (raise and bonus). On reflection, we realized that we had just reaffirmed their market value, so they could negotiate their next salary starting from the new base with us.

Bottom line, whether you're happy or not with your job, you should probably move.


Really curious about something: according to the Pareto principle you should identify the top performers, roughly the square root of the number of employees, and pay them significantly more than the rest (they produce several times more, ~9 times, so even if you pay them double you get more value than from the rest of the people). Was this a conclusion of your project on employee retention?
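The "~9 times" figure can be reproduced with a little arithmetic. The sketch below assumes the square-root rule in its usual form (often attributed to Price's law rather than Pareto): the top sqrt(N) people produce half of the total output. That assumption, the function name `output_ratio`, and the headcount of 100 are not from the comment itself.

```python
import math


def output_ratio(n_employees: int) -> float:
    """Per-person output of the top sqrt(N) group relative to everyone else,
    assuming (hypothetically) that the top group produces half of all output."""
    top = math.sqrt(n_employees)     # size of the top group
    rest = n_employees - top         # everyone else
    per_top = 0.5 / top              # share of total output per top performer
    per_rest = 0.5 / rest            # share of total output per other employee
    return per_top / per_rest


ratio = output_ratio(100)            # 10 top performers vs. 90 others
value_per_dollar = ratio / 2         # if top performers are paid double

print(f"output ratio: {ratio:.1f}x, output per dollar at double pay: {value_per_dollar:.1f}x")
```

Note that under this assumption the ratio grows with headcount: it works out to about 9 only for a company of roughly 100 people.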


This can change from quarter to quarter. There are companies like Cisco which pay bonuses sometimes exceeding 100% of salary to top employees. This way they’re not stuck paying that if the employee loses motivation.

There are some downsides. Engineers are far removed from the money and complex problems often require the work of many people. Some of that work is 100% necessary and sometimes not valued (writing tests). The typical mistake I see is companies rewarding people who delivered big greenfield projects before the projects have been proven to actually work. A few people on the lucky team get raises, bonuses, promotions, and then more than half the time, the project fails and the older standby that was being supported and slowly improved by the less-cool engineers continues to bring in all the money. Those engineers then get pissed and some fraction quit.

So a few major caveats:

* correctly identifying the top 3% is difficult for upper management

* identifying too early leads to false positives and backlash

* because it’s far from the money, many programmers are quasi-communist in terms of thinking about team deliveries, so highlighting the output of a single engineer makes one person happy but half a dozen mad. There are many companies that don’t announce promotions too far and wide for that reason


Maybe people who get raises are the talented people who can get offers elsewhere.

Maybe people leave after getting a bonus because they postpone leaving until they get paid for their work.

Did you check for confounders?


Google is not paying people to provide a service. They are paying a company. I sympathize with the people who are underpaid, but their beef should be with their employers not with Google, even though as another commenter stated: Google is outsourcing non-compliance. On a related note, if people working below the minimum wage are flying to protest at Google headquarters, someone is paying their airfare. Doesn't look like a spontaneous act to me.


We go after Apple when Foxconn mistreats its employees. When you go after the intermediary, they'll go down and another will rise up to take its place -- and thanks to limited liability, it'll often be the same executives running a different company. When you go to the top, you have a chance at improving the situation.


Ah, yes, everything is suddenly fine when you put a screen like a company in front of it. It's not tax evasion, sir, it's a business relationship with a company in the British Virgin Islands.

Also, how dare they organise for better pay? It must be a conspiracy. The riff raff really should know their place.

Let them eat cake.


> if people working below the minimum wage are flying to protest at Google headquarters, someone is paying their airfare. Doesn't look like a spontaneous act to me.

The raters who spoke and delivered the petition are members of Alphabet Workers Union.


This is really a great way to sabotage your competitor. Have a bunch of friends and family buy a product and flood Galaxus with bogus returns and warranty claims. We'll see how it works.


100% in agreement on Excel. Even when coding in Python I frequently save an intermediate file as xlsx to explore/debug, or even load into Tableau for viz.


110% agreement on Excel

The ability to create a relational database in Excel with vlookups and hlookups, then capture it all into a macro is amazing.

I've really enjoyed using Excel as a Postgres frontend, with a real Postgres DB instance handling data, and then using the report functionality to dump to Word.

While a pro reporting engine and cutting out MS Office altogether would be a better long-term solution, it is hard to beat for quick & dirty results.


> The ability to create a relational database in Excel with vlookups and hlookups,

Do yourself a favour and ditch vlookup and hlookup in favour of the recently introduced xlookup, which even obsoletes index/match!

I try to keep my exploratory joins out of Excel, but I admit that I often don't resist the immediacy of Excel's poor man's joins located right where I need them.


Took them long enough to add it.


I'm curious, how did you establish the Postgres connection?


Out of the big list of options [2], I used ODBC [1], which gets a bad rep but worked for my use cases.

The commercial devart plugin [3] looks pretty neat too, but I haven't used it yet.

I've also tried the JDBC connectivity option [4], but with some different use cases in mind for Postgres (not about Excel).

[1] https://datacornering.com/how-to-connect-to-postgresql-datab...

[2] https://www.postgresql.org/download/products/2-drivers-and-i...

[3] https://www.devart.com/excel-addins/postgresql/

[4] https://jdbc.postgresql.org/


You should learn R and dplyr ;)


Tell us more - in the context of them being replacements for Excel for his use case.


I love R but can't use it at work :(

With just the tidyverse library (which includes dplyr), R can be very useful in a data analysis pipeline. It is great for data cleaning and aggregation, especially when a process needs to be done multiple times. It is much faster than excel/power query. I am an accountant in SaaS and spend a lot of my day waiting for excel/powerbi automations to refresh. Similar solutions in R/sql/python would be nearly instant. Also excel/powerbi automations are a bitch to troubleshoot, and are unnecessarily complex.

When following tidy principles, a framework designed by the tidyverse dev Hadley Wickham, R code can be very easy to interpret, similar to SQL. Additionally the R community has made libraries for everything, and I consider R a great general purpose language as well.


Note that different flavours of R have very different performance. 'Base R' is quite slow. But R + data.table is blindingly fast. Power Query performance is awful, even compared to base R. Some benchmarks of these plus other data wrangling software (including my own product) at:

https://www.easydatatransform.com/data_wrangling_etl_tools.h...


"Manipulating hundreds of thousands of rows" is exactly what R, dplyr and data.table are great at. I do that on a daily basis.


I got to play around with Tableau when I was helping my wife in a college programming course, and though I don't have a current use to justify the significant cost, I must say that the tool was amazingly flexible and easy to use. I'd highly recommend it.


VMS OS from DEC had exactly the "unique password" stupidity in the late 80s. In a way, it was much worse than Twitter's, because where I worked there were only about 100 accounts, so you could find the other account by hand in less than an hour.


Is it really worse than Twitter's? The image has the website telling you exactly which account is already using that password (it's actually a joke from reddit).

It's interesting to know that this was actually a thing though...


It could also be argued that "medianism" is the perfect method to hide income inequality, because median is resilient to outliers, so the billionaires will not affect the median at all. Every metric can be "gamed" and so can the median.
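A quick way to see the resilience being described is to swap the top earner in a small sample for a billionaire and compare mean and median. This is a minimal sketch with made-up incomes, not real data.

```python
from statistics import mean, median

# Hypothetical incomes, chosen only for illustration.
incomes = [20_000, 30_000, 40_000, 50_000, 60_000]

# Same sample, but the top earner is replaced by a billionaire.
with_billionaire = [20_000, 30_000, 40_000, 50_000, 1_000_000_000]

print("before:", mean(incomes), median(incomes))
print("after: ", mean(with_billionaire), median(with_billionaire))
```

The mean jumps by several orders of magnitude while the median does not move, which is exactly why the median says nothing about how rich the top is.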


That’s not how the median works. If we had as many in poverty as billionaires, we’d have a pretty good situation.


If half of a country were impoverished and half were billionaires, I'd say they have a pretty serious income inequality issue. A median would randomly conclude that either the typical person is a billionaire or in poverty. Best case scenario, the median would equal the mean and give a result that reflects reality, but it still doesn't convey the issue at all.

Median and mean both compress n data points into one, which is only useful given some assumptions about the distribution. Quintiles (or similar) would work for that scenario, but I'm sure there are more complicated situations where they fall short.
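The toy scenario can be sketched with Python's `statistics` module. The population sizes and income values below are invented for illustration; `quantiles(..., n=5)` returns the four cut points between quintiles.

```python
from statistics import mean, median, quantiles

POOR, RICH = 5_000, 1_000_000_000   # hypothetical incomes

balanced = [POOR] * 50 + [RICH] * 50
one_more_poor = [POOR] * 51 + [RICH] * 50
one_more_rich = [POOR] * 50 + [RICH] * 51

# With an exact 50/50 split the median lands halfway and equals the mean.
print("balanced median:", median(balanced), "mean:", mean(balanced))

# One extra person on either side flips the median to one extreme.
print("one more poor:  ", median(one_more_poor))
print("one more rich:  ", median(one_more_rich))

# Quintile cut points expose the two-cluster shape that a single number hides.
cuts = quantiles(balanced, n=5)
print("quintile cuts:  ", cuts)
```

The quintile cut points sit at the poor income for the lower cuts and at the billionaire income for the upper cuts, so the split is visible even though the balanced median falls in a range where nobody actually lives.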


Societies in which the number of billionaires is within 100x of the number of poverty-stricken people don’t exist.


Sure, it's a toy example to tease out the underlying shortcomings of the measurement. Societies where the median does not reflect the typical experience do exist.


But resistance to outliers is the point because people don't take billionaires inflating the per capita GDP as "wow, this country has a problem with inequality", they take it as "wow, this country is getting richer".

With GDP per capita, a few oil billionaires and a very small class of well-paid oil workers make Equatorial Guinea look like a middle income country. Median income reveals that the average person there lives on less than $2 a day, because Equatorial Guinea is also the most unequal country in the world. But clearly it's more representative to class Equatorial Guinea as very poor (median income) than moderately wealthy (per capita GDP/mean income) or rich (per capita GDP PPP).

Median income isn't perfect, because it can hide things like one country having 5% living in extreme poverty and another having 25%. But with typical real-world income distributions, an average that is not massively skewed upwards by how rich the top 10% or 1% is gives you a much more accurate perspective on which country most people have higher incomes in.
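The 5% versus 25% caveat is easy to reproduce. In this sketch the two "countries", the income values, and the poverty line are all hypothetical.

```python
from statistics import median


def poverty_share(incomes, line):
    """Fraction of people whose income is below the given poverty line."""
    return sum(1 for income in incomes if income < line) / len(incomes)


POVERTY_LINE = 1_000   # hypothetical threshold

# Two made-up countries of 100 people each.
country_a = [500] * 5 + [10_000] * 95    # 5% below the line
country_b = [500] * 25 + [10_000] * 75   # 25% below the line

for name, incomes in (("A", country_a), ("B", country_b)):
    print(name, "median:", median(incomes),
          "poverty share:", poverty_share(incomes, POVERTY_LINE))
```

Both countries report the same median even though five times as many people in the second one fall below the line, so a poverty rate or lower quantile is needed alongside the median to tell them apart.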

