
Hmm... it seems like this would be a good thing for undergrads to do in a class setting or an internship. It would give them experience writing an actual paper, albeit with null results.


It would require a lot of babysitting. Some objects have a few slightly different definitions (for example, one book uses one definition, and another book's definition is 1/2 of that). Sometimes the programs that run the calculations don't use the same variables as the paper (perhaps the team changed its mind, or the main reference, and the graph must show x+y vs y). Sometimes the work that should be included in the paper is underdocumented, ...

Another difficult part is selecting what to publish, for example cutting the dead branches and adding a bit more data about the interesting part. It is not usual to get a bunch of data and just publish it without some additional work.


Yep, and it’s not really how authorship is supposed to work.

The hard part here is that communicating any idea clearly to an audience takes massive effort, and usually null results, unless quite interesting in a specific context, are naturally a lower priority.

I’m currently working on a paper built around what I believe to be a fascinating null result though...


> It is not usual to get a bunch of data and just publish it without some additional work.

I thought that's what data lakes and event sourcing are supposed to solve.


I'm not sure what that means, but we are not using it.

In medicine some studies are preregistered, but one of the lessons of Covid-19 is that every week there is a new study that is clearly unregistered, without a control group or with a definition of the control group that makes me cry (like "an unrelated bunch of guys in another city").

I think people in particle physics have a clear process to "register" what they are going to measure and exactly how they are going to process it. (The measurements are too expensive and too noisy, so it is very easy to cheat involuntarily if you don't have a clear predefined method.) Anyway, I don't expect them to have the paper prewritten with a placeholder for {hint, evidence, discovery}.

In most areas you just put in the blender whatever your heart says and hope for the best. Or run a custom 5K LOC Fortran 77 program (Fortran 90 is for hipsters).

If you get an interesting result for X+A, Y+A and Y+B, you probably try X+B before publishing because the referee may ask, or try more of B because B looks promising.

If you run a simulation for N=10, 20, 30 and get something interesting, you try to run it for N=40 and N=50 if the interesting part is when N is big, or for N=15 and N=25 if the program is too slow and the range is interesting enough.

And it is even more difficult in math. You can't preregister something like "... and on page 25, if we are desperate, we will try integration by parts ...".


That’s an option.

What would be useful is a low-effort way to translate what's written in a notebook (or electronic notebook) into a nice summary that can be shared.



