Hacker News | data_dan_'s comments

https://www.danliden.com/

At the moment, I post fairly infrequently. The whole site is written using Emacs Org mode. Most of the posts have to do with Emacs and data stuff (often doing data stuff in Emacs).


I wrote this article. For some background—

In March, we published an article on Stock Trades by members of congressional committees: https://innerjoin.bit.io/data-cant-tell-us-whether-congressi...

To conduct this research, we needed to know: (1) which members of Congress made which stock trades, and (2) which members of Congress belonged to which congressional committees. The data for (1) was available from the Senate/House Stock Watcher sites; the data for (2) came from the ProPublica Congress API. There was no primary key available for linking the two datasets: the best we had to work with were the names of the members of Congress.

This would be fine, if the names were represented uniquely and consistently. This was not the case. You can't join "Mitch McConnell" to "A. Mitchell McConnell, Jr." without a bit of work.

Manually matching every single name from the first data source to every single name in the second would be tedious, time-consuming, and error-prone. Instead, we used the Levenshtein distance to compute a similarity metric between each name in the first dataset and each name in the second. Simply using the best match according to this metric correctly matched more than 95% of the names, and made it simple to review the list and manually fix the few incorrect matches.

There's also an accompanying Deepnote dashboard where you can compare string distances between pairs of strings of your choosing: https://deepnote.com/@dliden-bitdotio/Whats-in-a-Name-28418c...
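
For anyone who wants to try the matching idea described above, here is a minimal sketch in plain Python. It is not the code from the article: the helper names (`similarity`, `best_match`), the lowercasing, and the normalization by the longer string's length are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalization: 1.0 for identical strings, 0.0 for no overlap."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def best_match(name: str, candidates: list[str]) -> tuple[str, float]:
    """Return the candidate with the highest similarity to name, and its score."""
    scored = [(c, similarity(name.lower(), c.lower())) for c in candidates]
    return max(scored, key=lambda pair: pair[1])


# Illustrative candidate list, not taken from either dataset.
match, score = best_match(
    "Mitch McConnell",
    ["A. Mitchell McConnell, Jr.", "Nancy Pelosi"],
)
print(match, round(score, 2))
```

The best match still needs a manual review pass, as noted above: a high score is a hint, not proof that two names refer to the same person.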


I wrote this article after a colleague pointed out that the Pandas DataFrame.to_sql() method uses row-by-row INSERTs. There are plenty of good reasons for this, and the to_sql method works great with many different SQL database flavors, but it's not fast.

This article compares the performance of different methods for writing a Pandas DataFrame to a PostgreSQL database using the to_sql method on DataFrames ranging from 100 rows to 10,000,000 rows.
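
As a rough illustration of one alternative to row-by-row INSERTs, here is a sketch of the COPY-based insertion callable pattern shown in the pandas documentation for the `method` argument of `to_sql`. Whether the article benchmarks exactly this variant is not stated here, so treat it as an assumption. It also assumes a SQLAlchemy connection backed by psycopg2; `rows_to_csv_buffer` is a helper name made up for this example.

```python
import csv
from io import StringIO


def rows_to_csv_buffer(data_iter):
    """Serialize an iterable of row tuples into an in-memory CSV buffer."""
    buf = StringIO()
    csv.writer(buf).writerows(data_iter)
    buf.seek(0)
    return buf


def psql_insert_copy(table, conn, keys, data_iter):
    """Insertion callable for DataFrame.to_sql(method=...).

    Sends all rows in one COPY ... FROM STDIN instead of one INSERT per row.
    Assumes conn is a SQLAlchemy connection whose DBAPI driver is psycopg2.
    """
    columns = ", ".join(f'"{k}"' for k in keys)
    name = f"{table.schema}.{table.name}" if table.schema else table.name
    sql = f"COPY {name} ({columns}) FROM STDIN WITH CSV"
    dbapi_conn = conn.connection  # underlying psycopg2 connection
    with dbapi_conn.cursor() as cur:
        cur.copy_expert(sql=sql, file=rows_to_csv_buffer(data_iter))


# Usage, assuming an existing DataFrame `df` and SQLAlchemy `engine`:
#   df.to_sql("my_table", engine, method=psql_insert_copy, index=False)
# A built-in middle ground is method="multi", which batches rows per INSERT.
```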


(I wrote this) I wish I had clearer answers to those questions! The conclusion is just a little unsatisfying from a writing perspective: it's just too early to tell exactly what's going on. The quits data through most of 2021 don't look all that different from what we'd expect based on the pre-pandemic trend, though there were definitely more than anticipated, especially later in the year. But this is following a massively disruptive pandemic that put a lot of people out of work and in general had a dramatic impact on the employment situation.

I think the most interesting part is the decrease in layoffs that coincided with the increase in quits. People aren't leaving that much more than before, but when they leave, they're doing so on their own terms.


Thank you for the numbers. To me it looks like there is not really a 'great resignation'.


To me it just looks like it's too early to tell. Did quits go up very quickly in 2021? Sure! But that comes on the heels of a massive spike in layoffs that occurred in 2020. It is at least a possibility that the current situation is a response to that.

One point I didn't go into is the fact that the labor force participation rate also dropped steeply in 2020 and hasn't recovered to pre-pandemic levels yet. So that could create labor shortages that are not necessarily represented in the quits rate.


Maybe I've been especially fortunate, or I'm just not understanding the question right, but I've felt respected pretty much from the beginning of my career (aside from grad school, which was not great, in many respects). Or at least I've felt "treated with respect" -- not sure if that's exactly the same thing. But I've always been given a fair amount of independence at work, and I've generally been able to solve whatever I'm assigned to solve, or clearly articulate the challenges, allowing those with different/more expertise to help out. And I've never been made to feel ashamed or inadequate because of either the work I've completed or the work I've needed to seek out additional help to complete.


This perspective always bothers me. It's the same with the recent Don't Look Up. The people who will watch and understand it aren't the people who actually need to get the message. They're both bland movies that present a point of view they know their viewers will agree with. They then get praised--undeservingly, in my opinion--for presenting a "bold" perspective.


Meanwhile, the Nevada state legislature decided against extending the vaccination requirement for the Nevada System of Higher Education (https://lasvegassun.com/news/2021/dec/21/college-students-in...) and the NSHE board of regents may be on the verge of reversing their staff vaccine mandate instead of firing those who refuse to comply (https://www.reviewjournal.com/local/education/regents-to-rea...).


I use a lot of U.S. government data sources (EPA climate data; BLS employment statistics; etc.). I also use a fair amount of international greenhouse gas emissions data, such as from the UNFCCC greenhouse gas inventory datasets.

Pain points: data disappearing, moving, or being updated without notice and without indication of a change. Numbers from the same API endpoints or URL changing unexpectedly and without explanation can be an unwelcome surprise.

I use bit.io (https://bit.io -- I work there) to deal with these problems. It's an online PostgreSQL database; very easy to use with e.g. psycopg2/SQLAlchemy in Python or DBI+dbplyr in R. Before any analysis, I copy the necessary data over to a repo/schema in bit.io, fill in the documentation with the dates on which I obtained the data, and use that as the source of "ground truth" for the analysis.
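
The snapshot-before-analysis habit described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the standard-library `sqlite3` module stands in for a PostgreSQL database, and the `snapshots` table, the `snapshot` function, and the example URL are hypothetical. Storing a checksum next to the retrieval date makes silent upstream changes visible.

```python
import hashlib
import sqlite3
from datetime import date


def snapshot(conn, source_url, payload, obtained_on):
    """Store raw bytes with provenance.

    Returns True if the content differs from the previous snapshot
    of the same source, False otherwise (including the first snapshot).
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snapshots ("
        "source_url TEXT, obtained_on TEXT, sha256 TEXT, payload BLOB)"
    )
    digest = hashlib.sha256(payload).hexdigest()
    previous = conn.execute(
        "SELECT sha256 FROM snapshots WHERE source_url = ? "
        "ORDER BY rowid DESC LIMIT 1",
        (source_url,),
    ).fetchone()
    conn.execute(
        "INSERT INTO snapshots VALUES (?, ?, ?, ?)",
        (source_url, obtained_on.isoformat(), digest, payload),
    )
    return previous is not None and previous[0] != digest


conn = sqlite3.connect(":memory:")
url = "https://example.invalid/emissions.csv"  # placeholder, not a real source
first = snapshot(conn, url, b"year,value\n2020,1\n", date(2022, 1, 3))
second = snapshot(conn, url, b"year,value\n2020,2\n", date(2022, 2, 3))
print(first, second)
```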


Testing and clinical trials!


It would have been much faster if regulators and ethicists hadn’t been so squeamish about challenge trials.

In order to save dozens of lives that might have been lost in challenge trials, we sacrificed hundreds of thousands of lives so that we could wait for more ethically-sound vaccine trials to complete.


The problem is the vaccine mandates.

If you (soft or hard) mandate a vaccine, and you kill someone with it, that's a lot of responsibility to take, if the vaccine didn't go through a full trial.

If the vaccines were as optional as e.g. flu vaccines are, then a simple waiver would solve most of the issues.

(A 20-year-old girl died in Slovenia due to the Janssen vaccine not that long ago, and she got vaccinated because she was soft-forced by the government mandates: 48-hour testing, far away from home, but unable to use the bus to get to the testing site without a test, 12 EUR/test, ...)


Your reply is totally off-topic for the parent you replied to.


I see why you're saying that, but I think you missed their point. Challenge trials might well allow us to make a safe vaccine available sooner to those who want it, but if we mandate a vaccine that has only undergone challenge trials and then that vaccine kills someone, it would certainly be politically disastrous for challenge trials.


Challenge trials are dangerous for the people who participate in the trials, but they aren’t any less rigorous.


I agree! But it would still be a political disaster if something does go wrong.


This isn’t about opinions!

In a normal trial, you give half of participants a vaccine, and half of participants a placebo. Then, you wait around and see how many people in each group catch COVID naturally and get sick. Your vaccine works if fewer people who received a vaccine get sick compared to the placebo.

In a challenge trial, you give half of participants a vaccine and half of participants a placebo, and then purposefully expose them all to COVID so you don’t have to wait around for them to catch the virus naturally. As before, your vaccine works if fewer people who received a vaccine get sick.

A challenge trial gives you data which is more, not less, robust, because you’re controlling for more variables between groups. And we’ve used challenge trials to test vaccines in the past—just, never with a disease that’s nearly as deadly as COVID.

Any firestorm would result from a trial participant dying from the COVID they were purposefully given (which could absolutely happen), not from the vaccine. This has nothing to do with vaccine mandates.
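
To make the "fewer people get sick in the vaccine group" comparison concrete, here is a small sketch of the standard efficacy calculation, one minus the ratio of attack rates in the two arms. It applies the same way to a field trial and a challenge trial. The counts are made up for illustration and do not come from any real trial.

```python
def attack_rate(cases: int, participants: int) -> float:
    """Fraction of participants in an arm who got sick."""
    return cases / participants


def vaccine_efficacy(cases_vaccine: int, n_vaccine: int,
                     cases_placebo: int, n_placebo: int) -> float:
    """Efficacy = 1 - (attack rate in vaccine arm / attack rate in placebo arm)."""
    return 1 - (attack_rate(cases_vaccine, n_vaccine)
                / attack_rate(cases_placebo, n_placebo))


# Hypothetical counts, for illustration only.
ve = vaccine_efficacy(cases_vaccine=5, n_vaccine=100,
                      cases_placebo=50, n_placebo=100)
print(f"efficacy = {ve:.0%}")
```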


Challenge trials have nothing to do with adverse side effects in the original clinical safety trial. We also have good proxies for challenge trials for COVID.


You just make the vaccine optional and only mandate it once enough data is available.


I think there were some edits after I replied.


Pointing to one person's death and claiming a specific cause... I assume you have very specific autopsy evidence to prove it?


Yes, I do.

Well, not me personally but a "five-member commission, namely, three doctors (neurologist, infectologist and vascular specialist), a pharmacologist and Zoran Simonovič, a representative of the epidemiological profession" has.

https://www.gov.si/en/news/2021-11-30-expert-commission-conf...

> Minister of Health, Janez Poklukar, the head of the regional unit of the Maribor National Institute of Public Health, Zoran Simonovič, professor Borut Štrukelj from the Faculty of Pharmacy, Ljubljana and Maja Bratuša held a press briefing on the current situation regarding Covid-19 disease.

...

> “The commission unanimously assessed that there was a direct link between the vaccination with Janssen Johnson & Johnson and the tragic complication, i.e. the onset of the syndrome”, said Simonovič.

...

> Moreover, he said that he is to propose to the vaccine advisory group to stop vaccinating with Janssen in Slovenia, or to enable vaccination with Janssen only at the explicit request of an individual, who must confirm this with signature. “This means that the currently valid provisional vaccination protocol with Janssen will become permanent”, said Minister Poklukar.


I think the clinical guidance for younger women (<50) is that they get a two dose mRNA vaccine and not Janssen or AZ for this very reason.


Back when I was vaccinated, all four vaccines (Pfizer, Moderna, Janssen and AstraZeneca) were safe and good for everyone, but there were availability issues with Pfizer on one side, and on the other, Janssen needed only one dose, and you'd get your covid certificate faster (so one month sooner than with Pfizer, and that also means one month less of paying for tests and waiting in long lines every two days when these measures were implemented).

Soon after, AstraZeneca was slowly pulled out due to a few deaths elsewhere (not in Slovenia), then the wife of one of our diplomats died in Belgium (Janssen), and the media talked a lot about the hospital procedures and how she could have been saved. Then this 20-year-old girl (from the report) died from Janssen, and we stopped using Janssen too. Then the Scandinavian countries stopped using Moderna due to heart issues in younger people, and we're down from 4 vaccines to 1, with huge mandates that indirectly force you to get vaccinated. And the antivaxxers are just waiting for something bad to happen with Pfizer, to show they were right about safety issues.


It’s more nuanced than that in terms of safety.


Were hundreds of thousands of lives lost because of a delayed vaccine? Considering the third world hasn't had access to the vaccine and death rates are lower with Delta, I'm not sure skipping trials would have done much good.


Vaccine challenge trials are faster not because they are any less rigorous. They are faster because you are actively infecting people, instead of just waiting for participants to be randomly infected as they go about their lives. You can have a much higher level of reliability with a trial that is orders-of-magnitude smaller, and you can have definitive results within weeks compared to months.
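
A back-of-the-envelope sketch of the size argument above: the number of placebo participants needed to expect a given number of infections is roughly the target divided by the attack rate. The attack rates below are invented for illustration, and `placebo_arm_size` is a hypothetical helper. A real trial design would use a proper power calculation rather than this simplification.

```python
import math


def placebo_arm_size(target_events: int, attack_rate: float) -> int:
    """Participants needed in the placebo arm to expect target_events infections."""
    if not 0 < attack_rate <= 1:
        raise ValueError("attack_rate must be in (0, 1]")
    return math.ceil(target_events / attack_rate)


# Hypothetical attack rates, for illustration only.
field = placebo_arm_size(target_events=100, attack_rate=0.01)     # natural exposure
challenge = placebo_arm_size(target_events=100, attack_rate=0.5)  # deliberate exposure
print(field, challenge)
```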


Throwing ethical standards in the bin "for the greater good" is never a good idea.


Challenge trials can be ethically designed, and they have been used to test vaccines for malaria, influenza, and other potentially-fatal diseases.

https://en.wikipedia.org/wiki/Human_challenge_study#Vaccines...


It is a good idea when the ethical standards are bad and get replaced by better ones.


We don’t do clinical trials for the yearly flu vaccine update, do we?


> We don’t do clinical trials for the yearly flu vaccine update, do we?

Yes, we do[1][2].

[1] https://www.cdc.gov/flu/vaccines-work/effectivenessqa.htm

[2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4947948/


We also have 80 years of experience with the flu vaccine, as opposed to 8 months.


> We also have 80 years of experience with the flu vaccine, as opposed to 8 months.

8 months is incorrect. Here's[1] an article from 17 months ago about results from Moderna's COVID-19 trials for vaccinations dating back to March 16, 2020. So it's been 21 months, not 8 (and that's ignoring trials of mRNA vaccines years earlier, as they weren't for this specific virus).

[1] https://www.cidrap.umn.edu/news-perspective/2020/07/hopeful-...


We can split hairs on what constitutes experience, or when it starts, but the point still stands: we have a lot more experience with the flu vaccine.

This is the answer for why not all annual flu vaccines need clinical trials.


„the flu vaccine“?

Which one?


The first vaccines in 1933 were against type A influenzas, followed by vaccines against type B influenzas in 1942.

Here is a pretty good summary of the history of vaccine development.

https://www.medscape.com/viewarticle/812621_1

While nearly all influenza vaccines generate similar antibodies, Covid and influenza vaccines generate significantly different antibodies.

Influenza vaccines generate antibodies against influenza hemagglutinin proteins targeting sialic acid receptors.

Covid vaccines generate antibodies for (S) glycoproteins targeting ACE-2 receptors.

In short, we have 80+ years of experience generating antibodies for hemagglutinin, and much less for (S) glycoproteins.

https://en.wikipedia.org/wiki/Coronavirus_spike_protein

https://en.wikipedia.org/wiki/Hemagglutinin_(influenza)


Yes, but it's a greatly abbreviated process for each year's new version of the flu vaccine.

Going forward, the FDA has already said that a reformulation of the mRNA Covid vaccine would face a similarly shortened approval process.

>FDA says Covid vaccines that target new variants won’t need large clinical trials to win approval

https://www.cnbc.com/2021/02/22/covid-vaccine-fda-says-shots...


This reminds me of the 737 MAX.


The Washington Post did some really great work on generating a variety of comparison datasets: https://www.washingtonpost.com/climate-environment/interacti.... You're right, though -- it's really hard to avoid the issue of political influence in climate data. None of the data can exist in a vacuum; it all has (geo)political implications.


Well, WaPo is really one of the most untrusted sources ever. It's merely a CIA propaganda outlet.

Scientific research would be worth a try, but propaganda outlets certainly are not.

