A practical, scalable, distributed data store

wdr1 · on June 14, 2008

Is it just me or is scribd.com the worst thing to happen to the Internet since MIME-based email?

As far as I can tell, all it does is strip me of basic functionality (e.g., saving the thing so I can read it offline) and introduce confusing functionality (TWO scrollbars on the left!? WTF?), and all for what? So some engineers could do some Flash-based masturbation & feel web 2.0?

Scribd.com developers: you are not Web 2.0. You are not 1.0. You are web -0.5. You are what people did when they had bullshit internal doc apps, before HTML.

Now please stop pissing off the Internet & go under already.

bayareaguy · on June 14, 2008

I suspect that Scribd may be good for precisely the kinds of proprietary format documents you don't see as YC news items very often (Word, Excel, PowerPoint etc).

However, for PDF, Scribd really isn't a good fit, since there are already so many good PDF viewers out there. In particular, the PDF support my OS X laptop comes with blows Scribd out of the water, so a Scribd link to a PDF is effectively a downgrade for me.

Too bad there isn't some info in the HTTP protocol Scribd could use to decide to deliver me the PDF instead. Oh wait...
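For anyone wondering what that "info" might look like: one candidate is the HTTP `Accept` request header, which lets a client state which media types it prefers. The sketch below is only an illustration under that assumption. The function names (`parse_accept`, `quality`, `wants_raw_pdf`) are made up for this example, it has nothing to do with how Scribd actually works, and real browsers do not necessarily list `application/pdf` in `Accept`, so a real site might have to look at other request headers too.

```python
def parse_accept(header):
    """Parse an HTTP Accept header into a list of (media_type, q) pairs."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs.append((media_type, q))
    return prefs


def quality(prefs, media_type):
    """Return the q-value for media_type, using the most specific matching range."""
    main = media_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for pattern, q in prefs:
        if pattern == media_type:
            specificity = 2          # exact match, e.g. application/pdf
        elif pattern == main + "/*":
            specificity = 1          # type wildcard, e.g. application/*
        elif pattern == "*/*":
            specificity = 0          # catch-all
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def wants_raw_pdf(accept_header):
    """Hypothetical policy: serve the PDF itself only if the client ranks it above HTML."""
    prefs = parse_accept(accept_header)
    return quality(prefs, "application/pdf") > quality(prefs, "text/html")
```

With this policy, a client sending `Accept: application/pdf, text/html;q=0.5` would get the original file, while a client that ranks `text/html` highest would get the HTML viewer page.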

jfarmer · on June 14, 2008

To me Scribd was always useful for sharing documents online.

For documents that are already online it sort of defeats the purpose.

I suppose for some OS/browser combinations the experience is better, but on Mac/Safari I can display the PDF right in the browser where I can scroll and search without any trouble at all.

jsn · on June 14, 2008

insulting the scribd product is somewhat unfair, imho. it's not that the product is inherently bad or something. maybe someone finds it useful (ycombinator owners apparently think so). i certainly don't, so i use my greasemonkey script to route around it.

all that said, i'm somewhat puzzled by the decision to enforce the scribdification here on HN. isn't it bad PR for scribd? does it really generate enough positive effects to justify the obvious negative effects? not trying to troll or something, just really curious.

smanek · on June 14, 2008

Here's the real PDF: http://www.hpl.hp.com/techreports/2007/HPL-2007-193.pdf

thaumaturgy · on June 14, 2008

I read the paper, and I must be missing something. (I'm not a computer scientist, so there's probably something important here that's flying right over my head.)

The paper mentions "fault tolerance" a number of times, and I'm thinking, "fantastic! They've got some magic in there so that if a node goes down, there are automatically some number of nodes that can instantly take over its place." Except they don't ... their fault tolerance sounds like they just expect to use servers with really good data recovery. It does mention a primary backup server that's supposed to shadow each data server, but now we're talking about an effective 2n nodes. Also, it doesn't sound like it would be all that fault tolerant in the event of network problems (broken pipe between nodes, denial of service attack, etc.), or if the root node goes down.

Since they're only storing actual data in leaf nodes, they're not outperforming a classic balanced binary tree in search time.

It does sound like they've done some good work in making sure that they keep data integrity in the event of client-server communication problems, but that's also been solved by a number of database and file systems already.

So ... can someone explain the novelty here?