Hacker News
How to Read a Research Paper [pdf] (eecs.harvard.edu)
174 points by yankoff on July 17, 2014 | hide | past | favorite | 48 comments


My father (PhD in malacology) taught me when I was a kid that a research paper should be read in the following steps:

1. Read the title and abstract.
2. Read the introduction.
3. Read the conclusions.
4. If the content of the three previous steps makes sense, and the paper is relevant:
   - Read the body of the paper.
   - Archive it and write a bibliographic record card (yup... that old school).

Worked wonders for me during my PhD.


I've spent a lot of time recently doing a deep dive into all the academic papers in a particular field of neurobiology, and I have a different approach:

1. Read the title and abstract.
2. Read the methodology.
3. If the methodology isn't totally asinine, then read the rest.

I discovered after reading hundreds of papers that most of them are total nonsense.

I found myself reading them and updating my "knowledge," then getting to the methodology section and realizing it was bullshit. I felt like I couldn't fully "un-update" my model when that happened, the damage was sort of done.

They make claims and they have "evidence" that sounds compelling enough to slip by some filter, but then the methodology is totally bunk. The n is way too small or limited in some other fundamental way, or the experiment design is idiotic cargo-cult stuff. Obviously just going through the motions of publishing because they have to, rather than having some valuable insight. Then papers like that cite each other, and build this whole wobbly network, full of sound and fury, signifying nothing.

I've learned to totally ignore any paper that I haven't read the methodology for first.


Critiquing the methodology section is the best part of reading scientific papers. It becomes bittersweet, though, once you implement your own experiments and become aware of how difficult it is to design a good one. That said, if journalists (or anyone, really) read methodology sections, the world would be a much better (and less sensationalised) place.


Those steps don't quite map well to computer science academic papers - or, at least, to systems papers, which are what I tend to read and write.

Conclusions in CS papers are generally worthless. The contribution of the work is typically not new knowledge, but a thing - a system, a technique, an algorithm, a language, a framework, etc. So the conclusions in CS papers tend to be "We presented a foo for a blah blah blah." I joke that conclusions in CS papers are mostly vestigial, and exist only because they're expected to exist. (Please note that this bothers me, and I try to include actual conclusions in my papers, which will take the form of general principles we can learn from this new technique or system that we're presenting. But such discussion is not needed, and is often cut due to space limitations.) So, skip the conclusions sections in CS papers.

The introduction can also usually be skipped, if it's an area that you are already familiar with. CS papers tend to have to tell a "story", and the introduction sets the scene for why the work is relevant and who cares about this sort of thing. These introductions can start out very broad and general, trying to motivate the work from larger trends in industry and society. (Yes, even CS systems papers will appeal to larger trends in society to motivate their work. There are a lot of CS systems papers which will appeal to the prevalence of, say, online social networks to motivate their work on the systems required to support such things. Such as graph processing or real time data management.)

If you are familiar with the field, you can typically skip all this. You already know it. Please note that introductions are great if it's a field you're not as familiar with.

A CS systems paper typically has a structure similar to: 1. Introduction; 2. Background; 3. Design; 4. Results; 5. Related Work; 6. Conclusions.

If I want to quickly assess if a CS paper is worth my time, I typically read the title, the abstract and then skim the Design section and skim the Results section. (Of course, if it's my field, I often can't resist just skipping to the Related Work to see if they cited me.)

If the paper passes my first look, then I'll typically start by reading the Design. If I hit any difficulty in understanding the Design section, then I'll jump back to the beginning to get the proper context. I don't bother reading the Results section until I understand what they did, and think it's reasonable. (If your Design section tells me how you implemented the best 3-wheel car in the world, I'm not going to bother reading your Results. I don't care to try to reason about the performance of your 3-wheel car, because 4-wheel cars exist - unless you can convince me of situations where 4-wheel cars are not an option. That's a hard sell, though.)


I suppose quantum computing presents one of the few exceptions? And on the other end, distributed systems? Where else is new ground being explored? Cross-functional work?


This is, by far, the most appealing approach.

I have to dig through a lot of research papers for my blog. Most of the papers are dead boring, but I feel obligated to read them with a fine-tooth comb to better understand what's being reported.

Filtering irrelevant or poor documentation by following your process seems like a no-brainer. Thanks for sharing!


When we were taught how to write papers in 6th grade or so, it seemed very repetitive. I have to summarize the research first, then go into detail, then another summary? But it's really more like making an easy-to-scan index of your paper.


On a related note, I just realized this week that there is an entire category of software dedicated to something I've been painstakingly doing manually for years: organizing my collection of papers.

They are called reference managers, and they will extract the title/year/author/abstract so you can quickly glance at that obscure "iccp2012.pdf" you downloaded last month and know whether it's relevant to your current task. They also provide full-text search on your entire collection, synchronization between home and work, etc.
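The metadata-extraction part can be sketched in a few lines. This is a toy illustration, not how any real reference manager works: the FAKE_PDF bytes and the extract_metadata helper are invented for the example, and real PDFs need a proper parser rather than a regex scan.

```python
import re

# A reference manager's first job: pull title/author/date out of a PDF so
# that "iccp2012.pdf" can be shown as something readable. This stdlib-only
# sketch regex-scans the document Info dictionary of a tiny hand-made
# stand-in (the metadata values below are made up for illustration).
FAKE_PDF = (
    b"%PDF-1.4\n"
    b"1 0 obj << /Title (A Study of Widgets) /Author (A. Researcher)"
    b" /CreationDate (D:20120615) >> endobj\n"
    b"%%EOF\n"
)

def extract_metadata(raw: bytes) -> dict:
    """Grab literal-string values from the Info dictionary.

    Simplified: ignores escape sequences, hex strings, and the indirect
    object references that real PDFs often use.
    """
    meta = {}
    for key in ("Title", "Author", "CreationDate"):
        m = re.search(b"/" + key.encode() + rb"\s*\(([^)]*)\)", raw)
        if m:
            meta[key] = m.group(1).decode("latin-1")
    return meta

meta = extract_metadata(FAKE_PDF)
print(meta["Title"])   # A Study of Widgets
print(meta["Author"])  # A. Researcher
```

Real tools fall back on online databases (DOI lookup, publisher sites) when the embedded metadata is missing or wrong, which it frequently is.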

Coming from an engineering background I had no idea these existed, but it sure will save me a lot of time and frustration.


Those capabilities are remarkable. There is so much software on the internet now that you can perform a kind of magic trick:

1. Envision what you would like software to do to help you with some task.

2. Figure out what software like that would be called, and what search terms/search engine it would be found on.

3. Enter terms into correct search engine, find that the software exists, install it and start using it.


"In a recent xkcd's alt text, Randall Munroe suggested stacksort, a sort that searches StackOverflow for sorting functions and runs them until it returns the correct answer. So, I made it. If you like running arbitrary code in your browser, try it out."

https://gkoberger.github.io/stacksort/


That idea might approach something useful if it was combined with memory for how various algorithms have done and the ability to optimize over time by choosing better algorithms. It would be sort of like a meta-algorithm.


Zotero is the open-source choice. The best feature is that it keeps a BibTeX file of your entire bibliography up to date.
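For anyone who hasn't used it: the exported file is just a plain-text list of entries like the one below. The fields are taken from the Keshav paper linked elsewhere in this thread; the citation key is arbitrary and the exact fields a tool emits will vary.

```bibtex
@article{keshav2007read,
  author  = {Keshav, S.},
  title   = {How to Read a Paper},
  journal = {ACM SIGCOMM Computer Communication Review},
  year    = {2007}
}
```

Because Zotero, Mendeley, BibDesk and friends all read and write this format, a single auto-exported .bib file works as the bridge between your reference library and your LaTeX documents.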


Can you tell me how to do this? I used to use Mendeley before it was bought by Elsevier, and it had a useful function of exporting your entire library to a single BibTeX file automatically each time you added a new reference. I've not found a way of replicating this behaviour in Zotero (I can manually export to BibTeX, which is great, but I'm lazy and if the program can do it automatically, I'm all for it!).


It might be different in the browser, but in Zotero Standalone (which is what I use), right clicking the name of the collection in the left sidebar will give you an option to Export Collection. BibTeX will be one of the options. It works pretty well. I occasionally had issues with special characters and had to make changes every now and then for biblatex.


See the autozotbib Zotero plugin. It may be close to what you are looking for.

https://www.zotero.org/support/plugins


Papers is a cool product / company based in the UK (London, IIRC).

edit: http://www.papersapp.com/


I love Papers but they've been ridiculously slow with retina support; I wouldn't mention it, except it's the easy things that still haven't been done, such as coloured circles for tags, while more complicated things (many icons/artwork etc.) were done ages ago.

They released a major new version (Papers 3) just under a year ago. One of those 'fresh start' kinda releases; a re-think of what a reference app should be like, and substantially different from Papers 2. I still haven't upgraded because it doesn't do much more than Papers 2 did, but if you're coming to the product for the first time, it really is a great release. Highly recommended. Definitely the best Mac client out there, and they have a good iOS app too, along with solid Dropbox syncing. (Haven't tried their Windows version.)


The Netherlands, I think.

I agree, Papers is great. I have been using it since the earliest version, both on my Mac and on my iPad. Neither has ever given me trouble.


Not sure about that [0]. Can someone clarify?

0: http://www.papersapp.com/about/


Ah, they've moved since they started. Their new offices are indeed in London, though if you look at the copyright notice on the footer of that page you'll see it has a Netherlands address. Their careers page [1] does say their new offices are in London.

When they started, the two founders were both still mainly post-docs at the Netherlands Cancer Institute [2], and that's what I remembered, reinforced by the copyright notice I mentioned above.

[1] http://www.papersapp.com/careers/

[2] https://web.archive.org/web/20070213053257/http://mekentosj....


Those terrible meaningless PDF names, along with wanting to automate grabbing BibTeX from online databases (mostly ACM and IEEE for me), were the main drivers for writing BibDesk, a Mac OS X reference manager: http://bibdesk.sourceforge.net/

I started it back in ~2002 or so, and it's been kept running by a small group of contributors ever since.

Not a multi-user or web-based solution, but it has accumulated quite a few features (including searching many databases) that can make keeping a personal BibTeX file up to date much less of a pain.


Flow, while not open source, is a modern and social version of these types of tools.

https://flow.proquest.com/


Your OS doesn't support full text search of PDFs?


Does yours detect which part of the full-text is an author name, which is journal and which is publication date (mind the reference section present in practically every paper)? Does it index the notes you made on them?


To badly paraphrase Star Wars, "your confidence in your friends is your greatest weakness". PDF sucks; it's little more than a glorified 2D graphics API that happens to be used mostly for content that should be represented as text. Except that there are dozens of ways to make stuff look like text, but without the advantages.


This is true for some PDFs but not for academic ones produced by PDFLaTeX.


If you're really suggesting what I'm reading into your post, which is that 'academic pdfs' are somehow some sort of glorious special species because everybody in academia uses latex and all publications are typeset using that, then I have a bridge in Brooklyn you might be interested in.


PDFs don't necessarily support that. Some PDFs draw a lowercase i by drawing two filled shapes, more draw it by calling a function that draws two filled shapes. And, yes, some draw it by invoking 'i' in a font.
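The difference is easy to demonstrate. Below is a stdlib-only Python sketch; the two content streams and the naive_extract helper are made up for illustration. Both streams would render an "i" on the page, but only the one that uses the text-showing operator Tj leaves anything for a text extractor to recover.

```python
import re

# Why full-text search on PDFs is unreliable: text only comes out if the
# page's content stream actually uses text-showing operators (Tj/TJ).
# Glyphs drawn as filled paths leave nothing behind.
TEXT_STREAM = b"BT /F1 12 Tf 72 720 Td (i) Tj ET"   # 'i' shown via a font
PATH_STREAM = b"72 712 1 6 re f 72 720 1 1 re f"    # 'i' as two filled rectangles

def naive_extract(stream: bytes) -> str:
    """Pull literal strings shown with Tj.

    Simplified: ignores TJ arrays, escape sequences, and font encodings
    that remap character codes, all of which real extractors must handle.
    """
    return "".join(
        m.group(1).decode("latin-1")
        for m in re.finditer(rb"\(([^)]*)\)\s*Tj", stream)
    )

print(repr(naive_extract(TEXT_STREAM)))  # 'i'
print(repr(naive_extract(PATH_STREAM)))  # ''  -- looks like text, extracts as nothing
```

And even in the Tj case, the bytes inside the parentheses are character codes in whatever encoding the font declares, so a real extractor still has to map them back to Unicode before search works.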


Damn - I wish I had this during university!



Looks good; I'll be doing the same thing and passing this on to the students.


I think the overall advice here is pretty solid. I might add a couple comments:

--- The time budget is extremely sensitive to how much you already know about the subject. If you're generally up on the literature in an area, it is often much easier to isolate the chunks that are genuinely new to you and thereby grok the paper very quickly. In a new field, even several hours to really read a paper might not be enough, depending on how much backtracking you need to do to understand core ideas.

--- In physics, the conventional wisdom (which I agree with) is that the main thing to do when checking out a paper is to look at the figures. These are generally chosen to highlight the most important points of the paper and will often quickly convey what was measured/calculated, how the effect scales, etc.


Then there's how to read a research paper for hackers.

I see each article as a patch. Our minds are both running operating systems and decentralized repos. Each of us is specialized and running a different custom OS, so we may require different sets of patches.

Science is specifically concerned with our mental models of how the world works. Different fields in science are like different levels of abstraction in programming.

My specialty leans more towards higher levels, so subjects like psychology, philosophy, biology, are more relevant to my mental model than lower level mathematics, chemistry, and physics.

When I read research articles, I'm primarily interested in extracting an abstract high-level idea, which I can apply to my repo. I start with the end of the abstract, then discussion, conclusion, results, to find the main point, then only look at details if I can use them.

Once I have a simple idea, I merge it by connecting it to related ideas in a functional way. Each idea then becomes like a code snippet of a function, and my process of learning is like coding, where although you may copy/paste a snippet off the web, you still need to reason with logic to connect it into your code so it actually works.

If I can't make something work right now in my repo, it doesn't commit, so is temporarily stashed, where I may either forget about it, or come back to it if I find the missing pieces to fit it into the working directory.


I like this analogy; I find myself falling into a similar pattern, particularly when I can't understand something because I lack its mathematical basis, for example. I put it aside and have found myself returning to those papers once other things I have coincidentally read allow me a better understanding.


I'm afraid the one-page review assignment part at the end of the article won't give much satisfaction to the instructor who came up with it. A lazy student could simply paraphrase the paper abstract to address the first two bullets (summary, arguments, conclusions).

I think asking for a one-page detailed example and illustration of the paper would force a more careful read, and would be more interesting for sharing with peers.


The author mentioned that there will be a follow-up on how to skim a paper. Does anyone know the link to that, or to a similar guide on how to write a paper?


Here are some references from "How to Read a Paper" by S. Keshav [1] (found in a comment by tnhh above):

Writing Technical Articles http://www.cs.columbia.edu/~hgs/etc/writing-style.html

Whitesides' Group: Writing a Paper: http://www.ee.ucr.edu/~rlake/Whitesides_writing_res_paper.pd...

[1] http://ccr.sigcomm.org/online/files/p83-keshavA.pdf


Look on Simon Peyton Jones's website; he has a set of slides on how to write a paper.


> But to really gauge the scientific merit, you must compare the paper to other works in the area.

Know what you read!


This is probably true in the hard sciences, but not necessarily in mainstream CS which is a mixture of empirical evidence, engineering and current best practices.


> Read critically

Surely that means accepting that the bulk of research papers are written exceptionally poorly, with excessive wordiness and jargon, a lack of good paragraph structure, and not very much real content.

If you have to tell people who can read how to read something, it's probably written badly.


No change in writing style will let the researcher who reads it get out of thinking about how the ideas might be useful in their own work, what the authors might have done wrong without realizing it, etc.


s/paper/code/g and you get a nice read about code review


Or it can be generalized to any persistent, human-readable, compressed form of complex ideas. Good papers and good working source code are both in this category.


The person who needs instructions for reading a research paper probably shouldn't even bother. I've seen way too many graduate (including doctoral) students who need to be spoonfed how to do things. In my experience, these people never turn into original thinkers capable of even slightly meaningful output (contributions to their field).


I tend to disagree. Everyone has to do even the simplest of tasks for the first time, and reinventing the wheel is often a waste of effort.

If you think that everyone just knows how to do this, take a close look at your last set of reviewer comments; were all of them pithy, accurate and demonstrating a close understanding of your work?


That may be true for researchers, but there are a lot of implementors who take original research and try to apply it to problems.


I hope you have never TA'd a class.



