Already counting the days until this inevitably gets killed. I've been burned too many times to rely on Google for anything, except tracking me and pushing ads, which they indeed do better every day.
I wonder if their real intent is to gather training data on which parts of papers are considered important by readers, and which topics are related to each other.
Capturing and visualizing research knowledge is an exciting space to me personally. I feel that deep reading and absorbing content continues to be challenging due to the ever-increasing amount of published research, rudimentary reading apps (the Google PDF reader finally addresses the issue of easily looking up references), and somewhat disconnected tools for reading and note-taking. Similar to the readers piggy-backing on the PDFjs library, I've developed an app that helps me capture and organize personal research knowledge [1]. Additionally, visualizations and customizable contexts for notes help to recall and link information.
As a daily Zotero user, not really. The nicest thing I can say about it is, it has plugins and is FOSS. Maybe the new 7.0 release will blow me away, but I've been waiting for it to get out of beta forever.
More fundamentally, we need to stop disseminating scholarly work as PDFs, a format primarily designed for print. Plain HTML would be an improvement. Even better than HTML would be an extended variant with scholarly-specific semantic markup and universal, animated, explorable figures. Embedded notebooks would be cool, too, but disseminating data would still be a major challenge. (And I don't just mean storage/transfer; a lot of researchers are reluctant to share source data to the world.)
So I'm a researcher that almost always uses pdfs... Does HTML have the reproducibility that PDF promises? My feeling is that if I store a PDF, it'll look the same in a decade. But is HTML the same way? It seems like it relies on the web browser and many other things... How would one manage things like images and gifs? Is there a way to keep everything in one HTML file that's easily shareable and feels secure?
The potential to freeze an HTML page in time with minimal changes at render time is already there. [0] Such an ability can even be baked directly into the rendered HTML page so the viewer would be able to download a copy of the page as it is seen at a given time. Other archiving facilities, such as archive.org, take static snapshots of accessible pages if allowed by the publisher of the page and requested by anyone who wants to make that snapshot.
My point is that it is possible to achieve in principle and in practice, albeit that might not be practiced as often as one would like to see.
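To make the "everything in one HTML file" idea concrete, here is a minimal sketch in Python that embeds local images and gifs into the page as base64 data URIs. The helper name `inline_images` is made up for illustration, and the regex only handles quoted `src` attributes on `<img>` tags; CSS, fonts, and scripts are out of scope, so treat it as a starting point rather than a complete archiver.

```python
import base64
import mimetypes
import re
from pathlib import Path

# Matches <img ... src="..."> with a quoted src attribute (an assumption:
# unquoted attributes and srcset are not handled in this sketch).
IMG_SRC = re.compile(r'(<img\b[^>]*?\ssrc=)(["\'])(.*?)\2', re.IGNORECASE | re.DOTALL)


def inline_images(html: str, base_dir: Path) -> str:
    """Replace local <img src="..."> paths with base64 data URIs."""

    def replace(match: re.Match) -> str:
        prefix, quote, src = match.groups()
        # Leave remote and already-inlined images untouched.
        if src.startswith(("http://", "https://", "data:")):
            return match.group(0)
        path = base_dir / src
        if not path.is_file():
            return match.group(0)
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        payload = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"{prefix}{quote}data:{mime};base64,{payload}{quote}"

    return IMG_SRC.sub(replace, html)
```

Running it over a saved page and writing the result back out gives one file that can be shared or archived without a folder of assets next to it, at the cost of a larger file (base64 adds roughly a third to the image size).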
I like SingleFile, but it's not perfect. It usually works just fine, but will occasionally drop the ball depending on the type of JavaScript on the page.
For example, I once backed up a page using it, and while it got all the content, it did not grab the JavaScript necessary for the images to display correctly.
> Does HTML have the reproducibility that PDF promises? My feeling is that if I store a PDF, it'll look the same in a decade.
Feelings and promises are each one thing. Reality is another. PDF doesn't even look "the same" today. I have serious questions about how often folks who think that PDF is reliably consistent from system to system step outside their bubble and just how diverse their setups are that they're testing on.
> is HTML the same way?
Well the status-quo for copy-and-paste in HTML isn't dogshit, it's comparatively trivial to find and use tools that can thoroughly and exhaustively search your collection (or even write your own), and HTML is a dead simple plain-text format that if worst comes to worst you can read with your eyes (unlike needing to run a bunch of inscrutable code from a PostScript subset through an interpreter before you can do anything with it). So, no, I wouldn't call them the same.
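As a rough illustration of the "write your own" part, full-text search over a folder of saved HTML papers fits in a few lines of standard-library Python. The names `TextExtractor`, `page_text`, and `search_collection` are invented for this sketch, and it assumes the files are UTF-8 and end in `.html`.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def page_text(html):
    """Return the page's text with whitespace collapsed.

    Text nodes are joined with a space, so a word split across inline
    tags would be broken up; good enough for a sketch.
    """
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def search_collection(root, query):
    """Return the .html files under root whose text contains query."""
    needle = query.lower()
    hits = []
    for path in sorted(Path(root).rglob("*.html")):
        text = page_text(path.read_text(encoding="utf-8", errors="replace"))
        if needle in text.lower():
            hits.append(path)
    return hits
```

Something like `search_collection("papers", "theorem 3.3")` would then list every saved page mentioning that phrase, where `papers` is whatever folder you keep your collection in.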
Machines and humans can both easily use HTML/XML. Extracting information from PDFs is so much harder that there are deep learning products dedicated to doing it. They still make mistakes, too.
I’d much rather have something akin to the CHM files where everything I need is in one file, easy to analyze, and has good readers.
I explored tools to export/interchange PDF to HTML in the KnowledgeGarden app, but the results were not optimal, suffering from non-standard layout and poor typesetting of equations. Publishers of scholarly articles generate web pages of papers, but they're not replicas of PDF files.
Re. self-contained HTML (and slightly off-topic), look at TiddlyWiki, which contains data/code/layout all in one interactive, local or hosted HTML. Extensibility, plugins, and community of contributors are some key highlights, among others.
> As a daily Zotero user, not really. The nicest thing I can say about it is, it has plugins and is FOSS. Maybe the new 7.0 release will blow me away, but I've been waiting for it to get out of beta forever.
Can you elaborate where you think Zotero drops the ball?
one major issue with zotero is the lack of android support. they have been working on an android version or app or something since forever.
then there is the way you store the pdfs. if you want to sync between multiple computers you have to either know how to work with webdav or know how to point zotero at the location where you have your pdfs or (what they most certainly love) pay a lot of money for not so much storage space on their system. that last thing is what i don't like because i just don't trust anyone these days. you get invested in a system, build your routine around it only for them to shut it down, sell it, whatever, and then puff you have to start over.
people keep calling zotero foss but if they were truly foss they would have a much more transparent way for people to roll their own selfhosted zotero server. instead, what they have is a dump of an old version, with next to zero documentation and a bunch of stubborn people that have managed to get something working but not quite.
I get that they are trying to make money but I am sure they could do that and be more transparent.
The other thing is the reliance on so many plugins. While zotero itself may last a while, who can say anything about the many devs of the many plugins that you end up relying on in order to make zotero bow to your routine? I like zotfile and a few others, but how long are they going to last? Also, reinstalling my system is a huge pain to get back to my routine because I have to remember all the settings for each and every plugin I install. They should come up with a way to save all these settings and restore them, and no don't do it through another plugin!
Here's a guide I found useful to set up zotero storage. In brief, it relies on zotfile to flatten the storage (keep all pdfs in one directory) and better bibtex.
I realized that it helped me to get rid of exactly the pain with fresh installs that you mention. I realized that the two plugins give me most of the functionality that I want.
Does anyone have a research paper reading tool they're happy with? Zotero is what meets most of my needs but I wish I could organize the papers faster and I wish the annotation tools were better. AI-assisted reading is a plus too.
I was also unhappy with how reference managers handle annotations. So I rolled my own app (https://getcahier.com), with highlight management integrated in the application. This enables me to extract highlights according to topics, organize them in notes using document elements (like collapsible notes and outlines) and use them to plan more complex arguments. This makes it much easier to read actively.
On the paper organization side, I would also like to find a better way of doing it. What helps me a lot, from a more methodological perspective, is to categorize books according to time period, school of thought, or perspective.
evince recently added some features that I now find it hard to live without. The biggest is that if I mouseover an internal link in a pdf, it shows a little preview of where that link would take me - so if I'm reading a long document and the author says "by Theorem 3.3 it holds..." and if the pdf is recent enough for "Theorem 3.3" to be an internal link, then I can mouseover to remind myself of the statement of Theorem 3.3!
I use a combination of Zotero, Locally Linked PDFs/Folder Structure, and SumatraPDF (Comments etc.):
folders:
- for every literature search, create a folder with date and name
  - e.g. 2024-03-21_Quantum_Entanglement
- use CTRL-SHIFT-DRAG to drop files into Zotero as Links, see [#77](https://github.com/zotero/zotero/issues/77)
- You _can_ organize in Zotero, but you don't have to. Files can be linked to multiple Zotero folders (simply copy library entries in Zotero)
- sync literature folder and zotero database with nextcloud to somewhere, for backup

zotero:
- disable sync
- set “Base directory” (Preferences > Advanced > Files and Folders) to local literature folder
- set PDF View to “System default” (Preferences > General > “Open PDFs using..”)
- Enable recursive quick search in folders: go to Preferences > Advanced > Config Editor, search for `recursiveCollections`, double click (set to True)
- use CTRL-Shift-C to copy bibliography to clipboard
- Dark Theme:
  - https://github.com/Rosmaninho/Zotero-Dark-Theme
  - Go to `%AppData%\Zotero\Zotero\Profiles\` (`XXXXXXXX.default`)
  - Create `chrome` folder
  - Place the `userChrome.css`
  - Start Zotero
- Add-Ons:
  - zotero-pdfkit
    - https://github.com/sharpevo/zotero-pdfkit/
    - allows to modify/select a “default” PDF attachment to be opened
  - ZoteroDuplicatesMerger
    - https://github.com/frangoud/ZoteroDuplicatesMerger
    - easier merging of duplicates
  - zotero-folder-import
    - https://github.com/retorquere/zotero-folder-import
    - bulk import PDFs from a folder
  - zotero-tag
    - https://github.com/windingwind/zotero-tag
    - allows to add stars to items (Num Key `1`, `2`, `3` etc.)
- PDF Tools:
  - qpdf
    - removing passwords, unlocking PDFs, conversion
    - install in WSL with `apt-get install qpdf`
    - remove password with `qpdf --decrypt --password="" input.pdf output.pdf`
  - `SumatraPDF`
    - _Really_ fast Viewing of PDFs and adding annotations (highlight, comment etc.)
    - Highlight Text: `A`, Save to file: `CTRL+SHIFT+S`
    - it is much faster than Adobe Acrobat
  - [pdfplumber](https://github.com/jsvine/pdfplumber)
    - Awesome python package to extract tables from PDFs into data pipelines. Use with Jupyter Lab.
  - [PDF X-Change viewer](https://www.tracker-software.com/product/pdf-xchange-editor), `choco install pdfxchangeviewer`
    - for manual OCR of pages/PDFs
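
The qpdf step in the list above could be applied to a whole literature folder with a small wrapper. This is only a sketch: it assumes `qpdf` is on the PATH (for example inside WSL), reuses the same `--decrypt --password=` invocation shown in the list, and the function names `build_decrypt_command` and `decrypt_folder` are made up here.

```python
import subprocess
from pathlib import Path


def build_decrypt_command(src, dst, password=""):
    """Build the same qpdf call as in the list, as an argument vector."""
    return ["qpdf", "--decrypt", f"--password={password}", str(src), str(dst)]


def decrypt_folder(folder, out_dir, password=""):
    """Write a decrypted copy of every PDF in folder into out_dir.

    Originals are left untouched. check=True raises on any non-zero
    exit status, which includes qpdf finishing with warnings.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(folder).glob("*.pdf")):
        dst = out / src.name
        subprocess.run(build_decrypt_command(src, dst, password), check=True)
        written.append(dst)
    return written
```

Called as `decrypt_folder("2024-03-21_Quantum_Entanglement", "unlocked")`, it would leave the unlocked copies in a separate `unlocked` folder (a hypothetical name) so that linked files in Zotero keep pointing at the originals.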
Readwise Reader has a nice pdf reader with highlighting, notes, and an AI reader tool. I organize sources using tags. It's very new and in active development. Academic research is not its main focus, though, so it probably won't add mindblowing academic tools (like citation support/backlinks, although it does have internet backlinks that tell you what articles link to the one you're reading).
Readwise Reader is a poor PDF reader, unfortunately. Where it shines is making readable text documents out of PDFs, so it depends on the type that you’re reading.
Nothing quite beats a simple google docs file where I can take notes and put links to sci-hub. Very often, legal download links expire after some time or they force the browser to download the pdf.
I have a google docs for each research project and thus I can share them with my collaborators. Each person has their own section within the doc so we can also easily share information with each other!
Those "chat with a PDF" apps get me halfway there, but I'm more imagining something that can explain certain terms in the context of the paper, or automatically dive into the citations and pull explanations from them too.
Ever since the big UX app overhaul there is not much to like regarding functionality. They removed the pdf renaming and some other major automated file mgmt features from mendeley.
Other than that the commenting and note reader UI was pretty good. And overall UI/UX felt more modern than Zotero, also free (as in beer) cloud backup.
Today I had to do some literature review, and I reinstalled Zotero 7beta because I am not happy with the removed functionality from Mendeley.
I suggest you try new devices to read papers. Often the perception that paper is a better support is due to a lack of more convenient devices. Paper is better than a 15'' screen for sure, for many reasons including size and posture while reading. But have you tried larger screens (> 27''), large tablets (>= A4) or as large as possible E-Ink readers? Depending on your preferences, you might find that some of these work actually better than paper also for you :-)
There is no way I can perceive reading on an expensive device as more comfortable than paper. Paper is fairly cheap, lightweight and resilient; I can carry it around, fold it, toss it aside, sit on it by accident while thinking, annotate it with scribbles, and pour coffee on it with aplomb and finesse. I can flip it, half-tear it in anger, drool on it when I reach my brain capacity. I can take it hiking with me without fear of breaking or losing it. In other words, paper is a tool that gets out of my way.
I did try all the devices you listed above, even had my department pay serious money, and ended up barely using them for all those reasons. I am a mathematician, I am clumsy and I want to focus on my problem-solving; I want to think, and babysitting devices and tools is not what I want to spend my brainspace on.
In my experience those people don't even talk about ink, it's all about paper. They must think that you get just a few sheets of paper for a single tree or something like that, when in reality you get like 10 000 sheets of paper for one averagish tree. And those trees are not rare or anything like that, and the process of making paper is nowhere near as bad as electronics industry. Using paper is as ecological as it gets.
With the typical reading volume of an academic and the amount of plastic in my toner cartridges, I'm not sure paper comes out ahead in that comparison.
A high yield toner cartridge can print between 3000 and 8000 pages of text [1]. Average number of pages in a scientific manuscript is 10 [2]. This means that it would take 300 to 800 printed scientific papers to deplete one cartridge. I would have to assume that a single toner cart is not the same amount of waste as a reading device just due to the recyclability of toner carts, but it is up to you how to count them. If I was going to pull a number out of my ass, I would say 10 carts would be equal to one reading device with battery. Let's go low-end and pick 300 papers, which means you would need to print 3000 full scientific manuscripts to equal the waste of one reading device. How many do you read in two years?
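The back-of-the-envelope numbers above are easy to re-run with your own guesses. A small sketch, using only the figures from the comment (the 10-cartridges-per-device ratio is the comment's own admitted guess, not a measured value):

```python
PAGES_PER_CARTRIDGE = (3000, 8000)  # low and high yield, from the comment
PAGES_PER_PAPER = 10                # average manuscript length, from the comment
CARTRIDGES_PER_DEVICE = 10          # the comment's guess, adjust to taste


def papers_per_cartridge(pages, pages_per_paper=PAGES_PER_PAPER):
    """How many printed papers one cartridge lasts for."""
    return pages // pages_per_paper


def break_even_papers(pages, cartridges=CARTRIDGES_PER_DEVICE):
    """Printed papers whose toner waste equals one reading device."""
    return papers_per_cartridge(pages) * cartridges


low, high = (break_even_papers(p) for p in PAGES_PER_CARTRIDGE)
print(low, high)  # prints: 3000 8000
```

At the low end that is 3000 papers, which over two years works out to roughly four papers printed every single day.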
Paper enables non-vision-based (scroll bar) random access to content. When you keep going back and forth between two pages, it is very annoying on all current devices except paper. Vision Pro/VR/AR or a particular multi-screen set-up can achieve that, but so far all alternatives are not as good.
I’m typing this comment on an iPad ;-) Which I love using for marking up papers and other documents for quick feedback. But doing research is another story.
This looks great! Since they link it all to one's Gmail account, I wonder if they implement saving annotations to these PDFs and have them live on your Drive or elsewhere.
Edit: Also, Chrome now defaults to this extension for rendering any PDFs you load.
Looks great, but can you imagine Google pulling the rug out from under an academic's document/citation database?
I don't even want to imagine having to migrate all annotations and citations to something else when they inevitably pull the plug on it some years down the road.
Does anyone have any recommendations for good local PDF readers for Windows? I've been reading a lot of various papers recently, and clicking on a citation in Acrobat reader is very frustrating. The document scrolls to show the citation in view, but doesn't clearly show it in the long list that most papers have, and then I have to scroll up to where I was since it doesn't seem to have a working back feature.
I've been using Sumatra PDF on Windows to read papers (and as my default PDF reader) for more than a decade. Clicking on a citation takes you to the bibliography page and lands the cited paper at the top of the screen. Then Alt-leftarrow brings you back.
Sioyek seems awesome, especially vim inspired features.
Too bad u (undo) doesn't work and there doesn't seem to be a way to undo. Am I missing something or is it lacking it?
DrawboardPDF if you want something more full featured and like to annotate, highlight, bookmark and whatnot, particularly if there's any chance you'll also use a stylus
Just so you know: normally it scrolls down so that the reference is on top of the page.
But most importantly.. ALT+'left arrow' allows you to go back before you clicked on the citation! It doesn't work all the time, but usually it does after some left arrows ;)
Also, in Android: you can click on the 'scrolling sign' on the right of the pdf and specify the page, or see the link to 'jump back' to before you clicked on a link!
Mostly too slow for a lot of content, not every content is supported, not easy to keep it open at the right page, no comments, not easy to find the right tab, etc.
What you might consider is finding an ebook reader app and using that. I had a similar issue but on Android (for ebooks not in kindle format). I ended up with Librera but there are several. Turns out it's also equally great at academic or work PDFs.
Unfortunately, Preview has been the best reader in my experience. I say "unfortunately" not because it is inherently bad, but because it is a sad state of affairs when nobody can build something better than the barebones native tool
I've never seen so many light/dark modes before. There's Device Mode, Light Mode, Dark Mode, and Night Mode. AFAICT Device Mode follows the browser/device's current setting, Dark Mode makes the sidebar dark but doesn't change the PDF, and Night Mode darkens both the sidebar and the PDF. I wonder how they decided to have so many modes?
Automatically applying dark mode to documents tends to have poor results, especially when images are involved, but some people are masochists and/or can't be bothered to turn on the light, so they made a separate setting for them. Although I think a toggle would be better instead of dark mode/night mode.
Did you put "Very Datk" in quotes because it was spelled like that where you saw it? Or is it just "Very Dark" and it's in quotes because it doesn't fully imply what you get?
That's nice and all, but google scholar recently removed all the 'cited by' 'related articles' and other links from the HTML pages of google scholar. It was like this for about two months before they restored the functionality. It is likely they will remove it again soon. Google scholar is getting worse, not better. The google devs have no idea what a typical academic's computer is like around the world. They dev for their lived experience and it's just not applicable. A javascript (slow, computationally expensive) pdf reader is just another aspect of this ignorance.
Alongside this, I have found that Google Scholar's search has become noticeably worse in the last year or so. I can search for an author's name and a few keywords from a paper title and it won't show up, even if the paper has like 5000 citations.
Wow so I am not going mad! I had the same experience and it almost feels like Google is trying to recommend me papers based on my past searches. I hope they revert to their earlier algorithm
A js pdf reader they control has monetization possibilities. Slip in an interstitial page for Naturally Fun Arkansas with an article from Nature. You don't want scholar going the way of reader do you?
Yes, I confirmed it with 3 other people on IRC a couple months ago. I didn't know google scholar had restored it until I checked right before I wrote the above post. I thought the links were still gone. They had been the last time I'd used google scholar about 3 weeks ago. Back then I also confirmed it myself first using 3 different computers, 4 browsers (with JS disabled), coming from 3 different IP addresses, both logged in to google and logged out. I probably wouldn't have started writing the post at all if I didn't think they were still gone.
I figure in addition to the feedback they received from me (and presumably others) at the time they saw a drop in usage and restored the functional version. But they'll try again.
Check at 2024 Feb 08 09:54:00 (am) CST. I definitely didn't confabulate the memory because the initial conversation about it (with others) is in my IRC logs. Sorry I don't have any screenshots of my own. Perhaps it was A/B testing or something.
I use an extension called histre, https://histre.com/ for annotation and keeping up with _notes_ / _thoughts_ inline. I found that using tools like Fermat's Library, which provides side bar annotation, histre for inline highlight, annotation and multi-references, and ChatGPT to understand complex terms, all helped with understanding recent papers. Even a medical journal paper, https://senthil.learntosolveit.com/posts/2023/10/21/medical-... for me, in one instance.
I tried Google Scholar after this comment. Google Scholar PDF is a valuable addition to the research community. There isn't any tool that can give direct hyperlink to the references in the research paper, and this does it, as inline hover, and it is incredibly useful to navigate and follow the references.
Ha ha lol. Is that really the best they can think of in an age of AI? Instead of turning PDFs into web pages how about some actually useful tools:
* Summarisation
* Succinctly placing the research in context of the broader field
* Highlighting limitations or flaws in research methods, etc.
* An outline view to summarise each paragraph/section and then drill down into the ones you actually want to read in more detail
* Rephrasing into plain English. A lot of academics enjoy sounding clever and using long words so it'd be nice to be able to switch off "ego mode" and just read stuff in plain English instead of having to wade through their word-soup.
With more effort maybe Google could create a PDF reader that is actually innovative.
The friends I have in academia say that their PhD professors tell them to use "ego mode" or papers won't pass review and be accepted. I'm with you though. And it's not about specific jargon of a field, it's just wankery. Most lawyers do the same thing and you need to get extremely good ones to write good contracts in clear language.
Can someone recommend an app for ipad that can read PDFs? I want to be able to bookmark using my browser but read it on my ipad. Sort of like "Save to pocket" extension.
Primarily I've been using Zotero and Notability. They each have "save to" on mobile. Zotero has a chrome plugin that requires the desktop app to be running. They both optionally support a dark mode for reading in the dark.
I like the experience of reading in Muse.app on the iPad. It's a nested whiteboarding thing, but also can act as a PDF reader. (It'll let you pull out chunks of the PDF and put it on your canvas with a link back into your document, if that fits your flow.) I often read on my phone, so this is not an option for me.
Apple Notes and Muse slow down with a lot of ink. For taking a page full of notes I'm using Notability.
I've heard good things about GoodReader, but haven't played with it in years.
I use Readdle Documents to sync PDF folders with my server PC via FTP. Free version supports PDF highlighting & simple annotations, basic file management, and automatically syncs back everything.
In theory the built in files app will work for this. However, I like goodnotes, which has good highlighting and library support. I’ve used it since grad school for reading papers.
Does anyone know of a library (or reading material) that can render a pdf (mostly architectural drawings) on to webgl canvas as actual vectors not image?
WebGL is inherently a vector-to-raster technology, it’s always backed by a pixel buffer. One might argue that even PDFjs works this way with its calls to the canvas API.
What are you trying to do? Why is webgl the key here?
Basically I am looking to render pdf of architectural drawings onto webgl (because 2d context is too slow), and maintain the vector information in the drawings (ie lines, arcs). I know webgl is eventually raster, but I want to pan/zoom while retaining the crispness of vector lines.
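One common way to get that crispness (not a ready-made library, and not something the posters above proposed) is to keep the drawing as lines and arcs and only turn arcs into straight segments at draw time, with the segment count driven by the current zoom. The sketch below is in Python to show the geometry only; the function names and the 0.25 pixel tolerance are assumptions, and the resulting vertices would still have to be uploaded to a WebGL buffer by your own rendering code.

```python
import math


def arc_segments(radius, sweep, zoom, tolerance_px=0.25):
    """Segments needed so the chord deviates from the true arc by at most
    tolerance_px on screen. Uses the sagitta relation
    s = r * (1 - cos(step / 2)) solved for the angular step."""
    r_px = radius * zoom  # radius as it appears on screen
    if r_px <= tolerance_px:
        return 1
    step = 2 * math.acos(1 - tolerance_px / r_px)
    return max(1, math.ceil(abs(sweep) / step))


def tessellate_arc(cx, cy, radius, start, sweep, zoom, tolerance_px=0.25):
    """Return polyline vertices (in drawing units) approximating the arc."""
    n = arc_segments(radius, sweep, zoom, tolerance_px)
    return [
        (
            cx + radius * math.cos(start + sweep * i / n),
            cy + radius * math.sin(start + sweep * i / n),
        )
        for i in range(n + 1)
    ]
```

Because the source geometry stays vector, zooming in just means re-running the tessellation with a larger `zoom`, so arcs gain segments instead of turning into visibly faceted or blurry curves. Extracting the lines and arcs from the PDF in the first place is a separate problem this sketch does not address.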
Can someone recommend lightweight alternatives to Paperpile or EndNote that have two essential features:
1. Rename a PDF file to a consistent (Author Year Journal) format.
2. Online sync (Mac, iOS and web access) - including via say iCloud or Dropbox.
Maybe this just needs a script? I just paid $100 for EndNote 21 yesterday and don’t think these needs justify that cost.
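For the renaming half, a script really might be enough. Here is a minimal sketch, assuming you already have the author, year, and journal for each file (pulling that metadata out of the PDF is the hard part and is not attempted here); `consistent_name` and `rename_pdf` are made-up names.

```python
import re
from pathlib import Path


def consistent_name(author, year, journal, suffix=".pdf"):
    """Build an 'Author Year Journal.pdf' style file name."""
    parts = [author, str(year), journal]
    # Swap characters that are unsafe in file names for spaces,
    # then collapse runs of whitespace.
    cleaned = [re.sub(r'[\\/:*?"<>|]', " ", p) for p in parts]
    cleaned = [re.sub(r"\s+", " ", p).strip() for p in cleaned]
    return " ".join(p for p in cleaned if p) + suffix


def rename_pdf(path, author, year, journal):
    """Rename path in place, refusing to overwrite an existing file."""
    path = Path(path)
    target = path.with_name(consistent_name(author, year, journal, path.suffix))
    if target.exists():
        raise FileExistsError(target)
    return path.rename(target)
```

If the folder being renamed lives inside iCloud Drive or Dropbox, the sync half comes along without any extra tooling, though you lose the web interface and citation features a full reference manager gives you.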
Has anyone tried installing this? It says "PDFs on all sites will have a new look in Chrome."
This makes me nervous. I'm often looking at PDFs that are embedded in a page (either grad school software for commenting on PDFs, or publishers' sites). Is it going to play nicely with those? Is this only for navigating directly to a PDF?
My guess (as someone whose company makes a PDF extension for Chrome) is that it may intercept embedded PDFs as well. Sometimes sites use iframes or the like, and those get intercepted. But if the PDF is displayed through some sort of third party tool then it would be unaffected. Just my 2¢!
Alt+<- brings you back to where you were after clicking on a reference. You can skip around pretty easily to see the referenced object, and while this overlay does seem kinda interesting, it’s not something crucial.
I would also be interested to know how they decide to pick up a site. I was very surprised to learn that a technical note posted only to my website was picked up somehow. (I am a mathematician and so there are other things on my site, but it’s some custom static site generator thing and I’m still astounded).