
Fun fact: they can sometimes narrow crime scene DNA down to a single person by having enough partial matches from (potentially distant) relatives. I can't remember which DNA database was used, but some cases were solved this way. IIRC it introduced a bunch of legal questions about whether you can search a database in that way.

I think this was the article that talked about this (apologies for the paywall): https://www.nytimes.com/2021/12/27/magazine/dna-test-crime-i...


There have also been a number of false positives, because many believe that DNA is infallible. What people tend to forget is that the DNA tests used by law enforcement use only a very small subset of DNA markers. This means that if you're already in a DNA database, you can get an unpleasant knock on the door just because you have 10 DNA markers in common with some random criminal.

Danish police only upgraded from 10 DNA markers to 16 in 2021, forcing them to review 12,000 cases and redo the DNA tests. This resulted in at least one person having their sentence reversed. No word on how many were falsely suspected, but I assume more than a few.


Seems like a steep price, but I can see this becoming a marketable skill at some point. It feels similar to knowing how to google well; we've all internalized some google/search concepts like 'unique words' -> 'narrow results' and 'full sentences' -> 'phrase matching'. There will probably be similar nuances to writing good gen-art prompts.


Does anyone pay people to exclusively perform Google searches? How much per search?


The job title is Junior Software Engineer. It's a joke but also true - the effectiveness of a software engineer and many other types of worker hangs on their ability to google effectively, and it makes its way unacknowledged into their pay I'm sure.


ChatGPT actually showed me how frustrated I am with Google nowadays. It gave me a glimpse of what real question answering can be - and it's not opening 10+ tabs and endlessly reformulating the query in hopes that something relevant, and not just popular, will come out.

I really want a tool like that - just ask a question and get a simple, straight answer. For now it's either Google's "here is a bunch of websites, search them yourself" or ChatGPT's "sure, here is the answer, but maybe I hallucinated the whole thing, lol".


The other day I typed "quick bite to eat" into my phone's "google search" bar and opened the maps app to look at the results. Among the results of a couple of fast food places, google suggested Fucking Lowe's hardware store. The entire company should be embarrassed at just how pathetic their machine is.


Actually this is revealing that you're racist (I'm mostly kidding).

Many, many Lowe's and Home Depots have random food carts or other types of food (usually Mexican) nearby or, in a lot of cases, literally right in front of the location. The reasons for this are an exercise left to the reader.


Which is why I used the word "exclusively", but I still thought this was funny and I was actually thinking the same when asking. :)


They aren't mutually exclusive, but their incentives are opposed, usually the incentives bringing in the money win.


> What's the best way to prepare for DP in interviews?

Do 100 of these problems: https://leetcode.com/tag/dynamic-programming/


It doesn't punish anyone. If you want solutions, then the book is not for you, that's all. The author is not obligated to accommodate every audience, or even a majority audience.


Ads optimizes for profit; all other content is broadly optimized for meaningful social interaction and against problematic content.

https://www.facebook.com/business/news/news-feed-fyi-bringin...

https://about.fb.com/news/2019/04/remove-reduce-inform-new-s...


Ads 100% does not optimise for profit.

Source: I had a bunch of long conversations with FB Ads engineers about what they optimise for. I believe that it's a weighted sum over conversions, which seems like a better metric for an ads system (FB could increase profit in the short term by implementing price floors, but this wouldn't lead to more long term revenue, because advertisers would stop using the platform).


I was oversimplifying, but I stand by my words.

It does optimize for profit, just with extra steps. For most FB ads products (the ones you see in feed), advertisers pay based on conversions (views, clicks, likes, joins, purchases, etc.), so revenue is directly tied to conversions. Then there are extra steps weighing in the fact that revenue != profit, plus advertiser retention, repetitiveness, long-term user value, etc.
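To make those "extra steps" concrete, here's a hypothetical ranking score of the kind such a system might use; this is a generic illustration, not FB's actual formula, and every name and weight below is made up:

```python
# Hypothetical ad-ranking score: expected advertiser payment (bid * predicted
# conversion probability) adjusted by terms for user experience / long-term value.
def ad_score(bid, p_conversion, user_value, repetition_penalty,
             w_value=0.3, w_repeat=0.5):
    expected_revenue = bid * p_conversion  # advertiser pays per conversion
    return expected_revenue + w_value * user_value - w_repeat * repetition_penalty

# Rank candidate ads for one feed slot (made-up numbers)
candidates = [
    {"name": "ad_a", "bid": 2.0, "p_conversion": 0.05, "user_value": 0.4, "repetition_penalty": 0.0},
    {"name": "ad_b", "bid": 1.0, "p_conversion": 0.12, "user_value": 0.1, "repetition_penalty": 0.8},
]
best = max(candidates, key=lambda c: ad_score(c["bid"], c["p_conversion"],
                                              c["user_value"], c["repetition_penalty"]))
print(best["name"])
```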


Machine Learning is a catch-all term for optimizing statistical models with data.

Simplest example that you are very likely already familiar with is that of a 'best fit line' to some xy scatter plot. This starts by making an assumption (model choice) that the relationship between `x` and `y` is linear, e.g. `y=mx + b`, then you can use data (xy points) to figure out the most likely values for `m` and `b`. You can then make predictions for new `x_new` values by plugging them into your known line to get `y_new`.
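A minimal sketch of that best-fit-line step (the data points here are made up for illustration; numpy's `polyfit` handles the least-squares fitting):

```python
import numpy as np

# Made-up (x, y) points that roughly follow y = 2x + 1 plus noise
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1, 11.0])

# Fit y = m*x + b by least squares (a degree-1 polynomial fit)
m, b = np.polyfit(x, y, 1)

# Predict for a new x by plugging it into the fitted line
x_new = 6.0
y_new = m * x_new + b
print(f"m={m:.2f}, b={b:.2f}, prediction at x={x_new}: {y_new:.2f}")
```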

Machine learning often manifests as a two-step process: first feature extraction, and then fitting features to a desired output. Deep learning combines these into an end-to-end process to eliminate the 'human in the loop' problems that arise from feature extraction.

Example: you want to predict who should win a chess game in a given board state

* Feature extraction (what information you think matters): what pieces does white have, what pieces does black have, is white in check, is black in check, how many valid squares can the white king move to, how many valid squares can the black king move to, etc.

* Fitting: make an assumption about the relationship between features and outcome (model choice), then fit the model using data (features, outcome). A rough sketch of both steps follows below.
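Here is that sketch; the features, numbers, and the choice of logistic regression are all just assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Feature extraction (hypothetical): each position is summarized as
# [white material, black material, white king mobility, black king mobility]
X = np.array([
    [39, 39, 3, 3],   # balanced position
    [30, 21, 5, 1],   # white up material, black king cramped
    [18, 27, 2, 6],   # black up material
    [35, 35, 4, 4],
    [25, 15, 6, 2],
    [12, 24, 1, 5],
])
# Outcomes for those positions: 1 = white went on to win, 0 = black did
y = np.array([1, 1, 0, 1, 1, 0])

# Fitting: assume a (logistic) relationship between features and outcome
model = LogisticRegression().fit(X, y)

# Predict for a new position's feature vector
new_position = np.array([[28, 20, 4, 2]])
print(model.predict_proba(new_position)[0, 1])  # estimated P(white wins)
```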

The second version of Deep Blue that played Kasparov used around 8,000 features (not sure if this is the feature vector size or the number of features). As you can imagine, feature extraction is highly dependent on expert knowledge of the problem and will often fail to cover unknown situations/cases.

Deep learning models aim to avoid the limitations of expert knowledge by taking raw data (e.g. the occupancy of each square on a chess board) and extracting features implicitly rather than relying on explicit human formulas. It has also opened up new possibilities in areas where expert knowledge has made little progress in the past (e.g. there is not much an expert can say about what pixel features might indicate that a dog/cat is contained in an image).


Though this end-to-end nature also makes it harder to spot biases and errors. There is some work that first trains with deep learning and then tries to infer a simpler model that is understandable for humans.


You'll likely be happy to hear that this has been (is being) addressed.

I watched the live broadcast of this announcement where they did a recap of all 10 previous matches (against TLO and Mana) and talked about this concern. During today's announcement they presented a new model that could not see the whole map and had to use camera movement to focus on parts of it. The DeepMind team said it took somewhat longer to train, but they were able to achieve the same level of performance according to their metrics and play-testing against the previous version.

However...

They did a live match vs LiquidMana (the 6th match against Mana) using the latest version (with camera movement), and LiquidMana won! LiquidMana was able to repeatedly do hit-and-run immortal drop harassment in AlphaStar's base, forcing it to bring troops back to defend, causing it to fall behind in production and supply over time and ultimately lose a major battle.


If by "do that" you mean mimic what a real driver would do for a specific set of sensor inputs, that is precisely ML tries to do.

To understand what the difficulty is, it's important to consider that the size of the sensor input is very large. Don't think of it as twenty range finders around the car; think of it as a 360-degree, medium-resolution color + depth image (about 0.5 million data points arriving at 30 fps).
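A back-of-envelope on that figure; the panorama resolution below is an assumed example, just to show the order of magnitude:

```python
# Assume a 360-degree panorama at roughly 1000 x 500 pixels (hypothetical resolution)
width, height = 1000, 500
points_per_frame = width * height            # 500,000 color+depth samples per frame
fps = 30
points_per_second = points_per_frame * fps   # 15 million data points every second
print(points_per_frame, points_per_second)
```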

It's difficult because you will never encounter the same set of sensor inputs twice, so you can't treat it like a search space problem. Once you've accepted that, you're in AI/ML territory, where you might try to reason about what the closest set of known sensor inputs and action would be (classical AI, expert system), but that is impractically difficult with a 0.5-million-dimensional search space, or you can train an ML model to 'reason' about the sensor space to make a decision about the appropriate action.

Approaches using a small number of sensors can do automatic braking and smarter cruise control, but haven't been successful at navigating and making strategic decisions. The current belief is that more can be done by using denser sensors and more data, and that seems to be the case. There are people working on reducing the sensor density requirement, but the main focus right now is building a successful and safe self-driving car, regardless of sensor and compute costs.


According to Waymo, most of the miles driven are simulated (2.7 billion miles in 2017). That's orders of magnitude more than actual miles (25K per day)[1]. And even the actual miles mostly don't have any user input.

Because of this I'm leaning towards thinking Waymo isn't trying to mimic actual human input.

[1] https://waymo.com/


Anyone interested in this should read the Atlantic feature about it:

https://www.theatlantic.com/technology/archive/2017/08/insid...


Wait.. 30fps?! That’s the speed of this data? I would have hoped it would be at around 90-120, or at the very least 60...


Movies are filmed at 24fps, so the reasoning is that humans have high confidence they aren't missing any significant information between frames; it should be possible to make a 'mental model' of a road scene at the same fps to human skill level.

In the future we'll likely have super-human spatial and temporal resolution; right now more improvements have been gained from the highest possible spatial resolution with the minimum plausible temporal resolution.


>> Movies are filmed at 24fps, so the reasoning is that humans have high confidence they aren't missing any significant information between frames; it should be possible to make a 'mental model' of a road scene at the same fps to human skill level.

I hope there is a better, more technical explanation that ML researchers are using, because as someone who is somewhat of an expert on human vision and building products around it, I find this foundation godawful if it is to be taken at face value. Which again, I am sure this is a simplification. Or at least, that's what I am telling myself.


This whole driverless car thing reeks of more vaporware: more public-yet-profitless unicorn companies promising to promise to change the world. Aside from surface-level consideration of edge-case accidents with pedestrians (hasn't one of them already killed or seriously injured a pedestrian?), there isn't much deep talk about less straightforward issues, such as (former) members of the trucking industry sabotaging these new fleets, or how liability and insurance are REALLY going to work out.

File the promises and the problems under fiction because it appears to be more important to keep the world order, its financial system and these ridiculous media darling fluff piece corporations alive while they bleed money.

And no, I'm not closed to the idea of successful work being done on automated driving, but 30 fps? WTF. There's too much going on in the larger context of the world; this shit isn't happening in 2020 or 2024 or whatever else many might say.


Look at it this way: human ability to act is (for elite gamers) 300 actions per minute. That's 5 actions per second. So with 30fps the AI could theoretically already have 1/6th the latency of the most-responsive human drivers.


They are not actually responding at 300 actions per minute to changing input, a large percentage of those clicks are constant selections of team shortcuts.


This is even worse. Humans do not process at 300 APM. They are merely limited physically to outputting 300 APM. You have no idea what the brain's capacity is to process and analyze the information that led to that 300 APM output. If you think 5 FPS is the capacity of the brain's ability to process vision... well, don't make a driverless car, please.


We move and react slowly but we respond to info which comes at a much higher rate. I can notice the individual frames at 5 FPS. I know I'm not getting enough info.


I was going to post that the impact of higher FPS is likely too low to justify needing 4x higher processing power. But then the difference in reaction time between 30fps and 120fps is O(30ms). At 60mph that translates to almost 1 meter in stopping distance. Tough call.
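Rough numbers behind that estimate (a quick back-of-envelope, nothing more):

```python
# Extra latency from the lower frame rate: on the order of one frame interval
interval_30fps = 1 / 30        # ~33.3 ms between frames
interval_120fps = 1 / 120      # ~8.3 ms between frames
extra_latency = interval_30fps - interval_120fps   # ~25 ms difference, i.e. O(30 ms)

# Distance covered during that extra latency at 60 mph
speed_mps = 60 * 1609.34 / 3600     # 60 mph ~= 26.8 m/s
print(speed_mps * extra_latency)    # ~0.67 m; using the full ~33 ms interval gives ~0.9 m
```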


It isn't just stopping distance. It is fidelity, confidence, and completeness of data. A camera quickly moving across a panorama, opening and closing its shutter every 1/30th of a second, loses a ton of data compared to one opening its shutter every 1/120th of a second.


Yea, but then you have to have 4x the processing power. It's pretty easy to scale frame rates once you have stuff figured out, but I can see very good reasons not to try to get up to 120+fps right away. In addition, I'd imagine the ML they're using probably has a much harder time distinguishing valid motion when the motion is 4x as small.

It's probably a very valid tradeoff.


The issue I've heard from a few people in hiring is that there is a surplus of junior data scientists from these camps and a shortage of senior data scientists to manage them. Problems not dissimilar to tech hiring in general, but companies need a lot more SWEs than data scientists.

