Hacker News | bkalman's comments

It'd be entirely up to the distributor of the data, so perhaps the answer to your question is "all of the above".

For example, (1) our command line tools use URL-like paths, which imply "use this hostname" (to copy-paste into a terminal), and (2) we have some in-browser visualisations like http://splore.noms.io/?db=http://demo.noms.io/cli-tour, which imply more of a "click here" type of UI.


Keep in mind (if this wasn't clear) that the chunks are only probabilistically 4K: https://github.com/attic-labs/noms/blob/master/go/types/roll.... In other words, what's "fixed" here is the chunk size we're aiming for; the chunks themselves can be of any size.

In any case, that's a good question; we might want to do something about it down the line. But if we did change that constant, the structure of the trees would change, and all[1] the hashes would change.

[1] A small number would stay the same.
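
To illustrate both points (the 4K is only a target, and changing it reshuffles the hashes), here is a minimal content-defined chunking sketch. It is not noms' actual rolling hasher (see the linked source for that): the Rabin-Karp style polynomial hash, the 48-byte window, SHA-256 as the chunk hash, and the helper names are all assumptions for illustration, and it only models the leaf level, not the whole tree.

```python
import hashlib
import random
from typing import List

WINDOW = 48                    # bytes covered by the rolling hash (assumed value)
BASE = 257                     # polynomial base
MOD = (1 << 61) - 1            # Mersenne prime modulus, so h is well mixed
OUT = pow(BASE, WINDOW, MOD)   # weight of the byte that slides out of the window


def chunk(data: bytes, target: int) -> List[bytes]:
    """Cut wherever the rolling hash hits a pattern seen ~once per `target` bytes."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = (h * BASE + b) % MOD
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * OUT) % MOD  # drop the oldest byte
        if h % target == target - 1:                # boundary depends only on local content
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])                 # trailing bytes form the last chunk
    return chunks


def hashes(chunks: List[bytes]) -> set:
    """Content hashes of the chunks (SHA-256 stands in for the real hash)."""
    return {hashlib.sha256(c).hexdigest() for c in chunks}


rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(1 << 18))  # 256 KiB of test input

c4k = chunk(data, 4096)   # aim for ~4 KiB chunks
c5k = chunk(data, 5000)   # change the target "constant"
sizes = [len(c) for c in c4k]
print(len(c4k), "chunks, sizes from", min(sizes), "to", max(sizes))
print(len(hashes(c4k) & hashes(c5k)), "chunk hashes shared after changing the target")
```

The chunk sizes vary widely around the target, and because a boundary now has to satisfy a different condition, almost none of the chunk boundaries (and therefore chunk hashes) line up between the two runs.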


See https://github.com/attic-labs/noms/blob/master/doc/spelling....

In this case, we need to be able to address either a database or a dataset. The presence of a :: makes it unambiguous.
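
A minimal sketch of how a client might split such a spec, assuming (hypothetically) that dataset names never contain `::` or `]`. `split_spec` is a made-up helper, not noms' own parser. It starts searching after any bracketed IPv6 host such as `[::1]`, whose colons are part of the address rather than a separator.

```python
from typing import Optional, Tuple


def split_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Split '<database>::<dataset>' (hypothetical helper, not noms' own parser)."""
    # Skip past any bracketed IPv6 literal like [::1]; its colons are not separators.
    start = spec.rfind("]") + 1
    sep = spec.find("::", start)
    if sep == -1:
        return spec, None  # no "::": the whole spec names a database
    return spec[:sep], spec[sep + 2:]


print(split_spec("http://demo.noms.io/cli-tour::sf-fire-inspections/raw"))
# ('http://demo.noms.io/cli-tour', 'sf-fire-inspections/raw')
```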


But isn't `<database>/<dataset>` more or less similar to `<database>::<dataset>`? The only difference is the choice of a delimiter to disambiguate between a database and a dataset. For me, the first scheme is much more familiar.


Say we did just do <database>/<dataset>. What does the path "http://demo.noms.io/cli-tour/sf-fire-inspections/raw" refer to? Is the database "http://demo.noms.io" and the dataset "cli-tour/sf-fire-inspections/raw"? Is the database "http://demo.noms.io/cli-tour/sf-fire-inspections" and the dataset "raw"?

In our sample data (see https://github.com/attic-labs/noms/blob/master/doc/cli-tour.... for example) we actually have this exact path, and the database is "http://demo.noms.io/cli-tour" and the dataset is "sf-fire-inspections/raw". We need the "::".

Allowing "/" in a dataset name is very convenient (it's common in git branches). Allowing "/" in database names is essential for URLs.
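
To make the ambiguity concrete, this sketch lists every way a `/`-only path could be read as `<database>/<dataset>`, assuming the database always includes at least the scheme and host and the dataset is non-empty. `candidate_splits` is a hypothetical illustration built on Python's standard `urllib.parse.urlsplit`.

```python
from urllib.parse import urlsplit


def candidate_splits(url):
    """Every (database, dataset) reading of url if "/" were the only delimiter."""
    parts = urlsplit(url)
    root = f"{parts.scheme}://{parts.netloc}"
    segments = [s for s in parts.path.split("/") if s]
    # Cut before each segment; the dataset keeps everything after the cut.
    return [
        ("/".join([root] + segments[:i]), "/".join(segments[i:]))
        for i in range(len(segments))
    ]


for database, dataset in candidate_splits(
        "http://demo.noms.io/cli-tour/sf-fire-inspections/raw"):
    print(database, "|", dataset)
```

All three readings are syntactically valid; only an explicit delimiter such as `::` picks the intended one.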


You're just trading one arbitrary thing for another, IMO, but what's worse is you are now abusing the URL specification for the HTTP(S) protocol, so nobody can use existing HTTP URL libraries.

You could easily say that everything before either `?` or `;` always refers to a database, and use a query parameter or a semicolon to delimit a dataset. Or use resource paths:

Address a dataset:

    http://demo.noms.io/?dataset=cli-tour/sf-fire-inspections/raw
    http://demo.noms.io/;cli-tour/sf-fire-inspections/raw
    http://demo.noms.io/dataset/cli-tour/sf-fire-inspections/raw
Address database (catalog):

    http://demo.noms.io/database/cli-tour/sf-fire-inspections
    http://demo.noms.io/catalog/cli-tour/sf-fire-inspections
Address dataset in that database:

    http://demo.noms.io/database/cli-tour/sf-fire-inspections;raw
    http://demo.noms.io/database/cli-tour/sf-fire-inspections?dataset=raw
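
Part of the appeal of these forms is that stock parsers already understand them. A sketch with Python's `urllib.parse` (the two function names are hypothetical) reads the query-string and `;`-parameter forms back without custom splitting. One caveat: `urlparse` only treats a `;` in the last path segment as parameters, so the `http://demo.noms.io/;cli-tour/...` form would still need custom handling.

```python
from urllib.parse import parse_qs, urlparse


def dataset_from_query(url):
    """Return the ?dataset=... value, or None if it is absent."""
    return parse_qs(urlparse(url).query).get("dataset", [None])[0]


def split_path_params(url):
    """Return (path, ;params) where params come from the last path segment."""
    parts = urlparse(url)
    return parts.path, (parts.params or None)


print(dataset_from_query("http://demo.noms.io/?dataset=cli-tour/sf-fire-inspections/raw"))
print(split_path_params("http://demo.noms.io/database/cli-tour/sf-fire-inspections;raw"))
```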


Why not have the dataset name as a fragment in the URL? For instance:

    http://demo.noms.io/cli-tour#sf-fire-inspections/raw
Glancing over RFC3986 [1], fragment identifiers seem to be pretty much made for what you're trying to communicate with :: - separating a subresource (the dataset) from a primary resource (the database). Unless I'm misunderstanding something?

[1]: https://tools.ietf.org/html/rfc3986#section-3.5
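
Standard libraries already split on `#`; in Python, `urllib.parse.urldefrag` separates the two parts. One trade-off worth weighing: HTTP clients do not send the fragment to the server, so this only works if the dataset name is needed on the client side.

```python
from urllib.parse import urldefrag

database, dataset = urldefrag("http://demo.noms.io/cli-tour#sf-fire-inspections/raw")
print(database)  # http://demo.noms.io/cli-tour
print(dataset)   # sf-fire-inspections/raw
```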


There are issues with using `:` in a URL if you want it to work with the existing software out there. Two examples I remember:

- The Rails community once tried to use `;`, which broke Mongrel 1, whose parser was generated from the RFC. There was a huge flame war about that back in the day. The Rails core team at the time thought Mongrel should make an exception for a reserved character. (After all was said and done, it was changed back to `/` for that particular use case.)

- When working on IPv6 support about 3 years ago, one of the things I added to an open source Ruby project was support for IPv6 literals in URLs, which is a case of using `:`. Even though this syntax is defined in the RFC that specifies the literal, I found out that the Ruby standard library was written on the assumption that `:` would never appear in a URL except to delimit the port. I ended up having to work around that.

That's with Ruby. I wouldn't be surprised if many other URL-parsing libraries broke too, at least unless those characters were escaped.
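
For comparison, RFC 3986 lets an IPv6 literal contain colons only inside square brackets, and RFC-following parsers rely on those brackets to tell the address apart from the port. A quick check with Python's `urllib.parse` (the address and path are just examples):

```python
from urllib.parse import urlsplit

parts = urlsplit("http://[::1]:8000/people")
print(parts.hostname)  # ::1
print(parts.port)      # 8000
print(parts.path)      # /people
```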

See: https://perishablepress.com/stop-using-unsafe-characters-in-...

You don't NEED "::". You NEED some sort of delimiter that clearly distinguishes between database and dataset; you happened to pick "::" for that. There might be a different delimiter that works better.

The other option is to stop pretending it's a URL and call it something else.

Post-script: I think this project is a great idea. I'm looking forward to seeing how it turns out.


And just to be clear on this: the `::` might not be a big deal if it comes after the `/` that ends the host part.

So:

    http://localhost:8000::dataset

may break code that tries to determine the host name. However,

    http://localhost:8000/::dataset

might not. You could also reserve `_` in your scheme to refer to the default database:

    http://localhost:8000/_::dataset

But as I mentioned in my previous reply, there may be unintended consequences. If this is something you want to do while keeping HTTP/HTTPS URL compatibility, test it on different languages and platforms to see whether your scheme breaks things. (And definitely check whether Windows libraries make assumptions here; Windows file paths use `:` as a reserved character.)
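
A quick check of the `/::` variant against one parser, Python's `urllib.parse`; it says nothing about other languages, so testing elsewhere still applies. With `::` after the `/`, the host and port parse normally and the `::` stays in the path.

```python
from urllib.parse import urlsplit

for url in ("http://localhost:8000/::dataset", "http://localhost:8000/_::dataset"):
    parts = urlsplit(url)
    # The authority ends at the first "/", so the "::" lands in the path.
    print(parts.hostname, parts.port, parts.path)
```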


Why break something that's already been solved a gazillion times? Go with open standards; don't create your own.


Java breaks:

    groovy -e "new URL('http://localhost:8000::people')"
    Caught: java.net.MalformedURLException

Python breaks:

    >>> urlparse('http://localhost:8000::people')
    ParseResult(scheme='http', netloc='localhost:8000::people', path='', params='', query='', fragment='')
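
The Python call above does not raise, but `::people` has been absorbed into `netloc`. The failure surfaces as soon as code asks for the port, as this check with the standard `urllib.parse` shows (the exact error message varies by Python version):

```python
from urllib.parse import urlsplit

parts = urlsplit("http://localhost:8000::people")
print(parts.netloc)    # localhost:8000::people
print(parts.hostname)  # localhost
try:
    parts.port         # "8000::people" cannot be read as a port number
except ValueError as err:
    print("port failed:", err)
```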


`::` breaks the URL for clients and isn't supported by the URL specs. Use the fragment or the query.

