We Need A Standard Layered Image Format (shapeof.com)
246 points by ingve on April 30, 2013 | 142 comments


TIFF (http://en.wikipedia.org/wiki/Tagged_Image_File_Format) has been around since 1986, with the current 6.0 spec dating to 1992. It supports layers. It supports compression. It supports paged(!) images. It has well-defined supported image encoding formats. It's already supported by all the image editing software you've ever used (from this millennium).

We have a Standard Layered Image Format. HURRAY. NEXT PROBLEM, PLEASE.


Image formats have come a long way. I think the author wants an image format that can replace .psd in a sensible way, and TIFF sadly cannot do that. While it can contain layered raster data, it lacks a lot of other capabilities. The author explicitly mentions vector layers, but if you want parity with .PSD you're looking at also having to support adjustment layers (hue/curves/levels/etc), blending modes, opacity/fill, raster masks, clipping masks, vector masks, locks, annotations, guides, grids, gradients, tags, text, comps, channels, and countless other things I can't even think to list (I won't even dive into smart objects).

Keep in mind that's just the layer data; there's tons of other metadata in the format as well. It is a behemoth of a file format, and the years of backwards compatibility inside of it have multiplied that complexity to levels of insanity. I have .psd files that, if I enable backwards compatibility when saving, will increase in file size by half a gig. Granted, those are enormous source files to begin with, but with higher-def screens on the horizon file sizes are increasing dramatically. Having something more sensible than .PSD is something we're going to have to deal with sooner or later.


Anything that TIFF lacks can be added using custom tags. All people need to do is agree on names and types for the custom tags that store the new information. Any old app will still be able to work with such TIFF files and will just ignore the information it doesn't understand. So the problem isn't solved, but the answer is not coming up with yet another file format.

A good example of extending TIFF with tags is Pixar's PhotoRealistic RenderMan. It uses a TIFF flavor as the base for its mip-mapped textures. Such a TIFF contains a pyramid of layers of the image at different pre-downsampled resolutions, plus a bunch of custom tags that the renderer interprets when it reads the file.

However, I can use 'Open As...' in Photoshop and open this as a TIFF. I will simply get the highest resolution layer from the file. That's a simple and battle-proven example of why TIFF is indeed a great answer to the problem.
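To make the "readers just skip unknown tags" behavior concrete, here is a minimal sketch of a TIFF IFD walk - stdlib only, and simplified to a single little-endian IFD with inline values (a real reader handles byte order, value types, and out-of-line data):

```python
import struct

def read_ifd_tags(data):
    """Return {tag: value} for the first IFD of a little-endian TIFF.

    Simplified sketch: assumes every value fits inline in the 4-byte field.
    """
    assert data[:2] == b'II'                     # little-endian byte-order mark
    magic, ifd_off = struct.unpack('<HI', data[2:8])
    assert magic == 42                           # TIFF magic number
    count, = struct.unpack('<H', data[ifd_off:ifd_off + 2])
    tags = {}
    for i in range(count):
        entry = data[ifd_off + 2 + 12 * i : ifd_off + 14 + 12 * i]
        tag, _type, _n, value = struct.unpack('<HHII', entry)
        tags[tag] = value                        # unknown tags parse identically
    return tags

# Baseline width/height tags plus a made-up private tag 65000
# (tag codes >= 32768 are set aside for private/custom use).
entries = [(256, 3, 1, 8), (257, 3, 1, 8), (65000, 4, 1, 1234)]
ifd = struct.pack('<H', len(entries))
for tag, typ, n, value in entries:
    ifd += struct.pack('<HHII', tag, typ, n, value)
ifd += struct.pack('<I', 0)                      # offset of next IFD: none
tiff = b'II' + struct.pack('<HI', 42, 8) + ifd   # header points at IFD at byte 8

tags = read_ifd_tags(tiff)
```

A baseline reader keeps tags 256/257 and ignores 65000; an extended reader interprets 65000 as its custom payload. Same file either way, which is the whole point.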


To provide a better answer than the GP:

.PDF .SVG .DXF/.DWG

All of these are open standards, or effectively open standards, that are already widely implemented and allow the use cases mentioned (vector and raster combinations, and layers), as well as many of the ones you have thrown in.

Of course the real problem is that they are far too complex for most people's needs so why bother.

People don't really often need to share the 'layers' of images combined with vector data (usually you use the layers yourself and then just share the final product - a flattened JPG or similar). But if they do, the above formats are there for them.


.XCF


Not really; it is still as good/bad as .PSD because of the nature of what the file is used for. They are both primarily containers for unfinished works, designed to hold a lot of detail without much regard for size and portability. The .XCF format is open, of course, but not good enough for a simple viewer or even a web browser. Some file format based on SQLite will probably be more viable as a cross-platform image format that is extensible beyond any one software developer limiting it. Kind of like HTML.


It meets all of the requirements of the parent post I replied to, and it even has a pretty decent open source editor. Its internal structure may not be a SQLite database, but it is an extensible tag-based chunk format. You shouldn't dismiss it so easily.



Fun fact: even though Adobe has sat on the TIFF standard for more than 20 years, Photoshop doesn't support layered TIFF images.


Actually, it does: http://cl.ly/image/0j383w2d0t2v

Open this test image if you don't believe me: https://dl.dropboxusercontent.com/u/824493/layertest.tif


This one doesn't work with PS, but can be opened with GIMP as two separate images:

https://sites.google.com/site/elsamuko/new.tif

tiffinfo of this image:

http://pastebin.com/raw.php?i=85aUFjeS


Came here to say almost exactly this. Of course, people being people, TIFF won't be "good enough", because "good enough" means either 1) invented in-house (NIH syndrome), or 2) monetizable (TIFF is too open/not proprietary enough).


Or if that's not good enough: http://www.openexr.com/

It has a Photoshop plugin as well. No real support in the GIMP though (CinePaint possibly has it).


FXG could have solved the single-file problem by grouping all the files up into a zip file and picking a different extension. For example, this is how Android (.apk) and iOS (.ipa) applications are distributed.

It also has the advantage that if you extract the zip you can now access whatever pieces are inside as separate files even by tools that don't understand the containing format. (eg a bunch of jpg/png inside the zip are perfectly viewable in any image viewer). Trying to do the same with SQLite is problematic as you have to write queries to grab the blobs and then dump them to disk.
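A sketch of that container pattern with nothing but the stdlib (the index.xml name and layout here are hypothetical, and the PNG payload is placeholder bytes):

```python
import io
import zipfile

# Hypothetical layered-image container: one index file plus raster assets.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED) as z:
    z.writestr('index.xml', '<document><layer src="layers/bg.png"/></document>')
    z.writestr('layers/bg.png', b'\x89PNG\r\n\x1a\n...')  # placeholder bytes

# Any zip-aware tool can list and extract one member without
# understanding the containing format at all:
with zipfile.ZipFile(buf) as z:
    names = z.namelist()
    bg = z.read('layers/bg.png')
```

Once extracted, `layers/bg.png` is an ordinary file any image viewer can open, which is exactly the interoperability point above.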

Where SQLite as a format does shine is if you need to keep historical information such as an undo/redo log. That makes it good for an application format, but not for an interchange format.


> For example this is how Android (.apk) and iOS (.ipa) applications are distributed.

A more relevant example: this is how .docx files work, for very much the same reasons (one XML file at the root, and then a bunch of referenced resources.)


Apple also uses zip for Keynote, Pages and Numbers documents, again for very much the same reasons, and Java's .jar, .war, .ear, .whatevar files also use zip.

.zip seems to be the universal 'combine some stuff in a single file' format, probably because it is old enough to be guaranteed patent free (OTOH: http://broadcast.oreilly.com/2010/06/is-zip-in-the-public-do... and http://www.pkware.com/support/zip-app-note. http://www.pkware.com/documents/casestudies/APPNOTE.TXT only mentions encryption and patching as patented)


It also, unlike tar.gz, provides an index for easily seeking to and extracting just one file from the archive. My understanding is that doing this with a .tar.gz means decompressing and processing the whole tarball as a stream.


Indeed--squashfs was basically created to deal with the inadequacies of random-access extraction from compressed tarballs.


Does the zip format let you replace an arbitrary file inside an archive without rewriting the whole file? It's not uncommon to see Photoshop files that are 400+ MB, and rewriting the whole file on save would have some pretty poor performance. The other examples of files that use the zip approach are either distribution-only (ipa/apk/jar) or generally don't get very big (word processing).


You can easily add new files at the end, then rewrite the central directory header (which is at the end of the file for exactly this kind of reason: not have to rewrite everything to add new files). And because the CDH specifies offsets to files, you can replace the old entry and put the new one there (replace old offset by new offset, done). This does however mean the old entry will still be in the file until you "garbage collect". That's if the new version of the file is bigger than the old one of course, if it's smaller or the same size you can just overwrite the old entry and not touch anything.

One could even imagine padding files in the archive so there's some leeway for the compressed files to grow; whether that's worthwhile would likely depend on how the format is used.
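Python's zipfile demonstrates this append-style "replace" directly: mode 'a' appends a new entry plus a fresh central directory, readers follow the directory to the newest copy, and the old bytes linger until you garbage collect:

```python
import io
import warnings
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as z:
    z.writestr('layer1.png', b'old pixels')

# "Replace" by appending: the old entry's bytes stay in the archive,
# but the rewritten central directory points at the new one.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # zipfile warns about the duplicate name
    with zipfile.ZipFile(buf, 'a') as z:
        z.writestr('layer1.png', b'new pixels')

with zipfile.ZipFile(buf) as z:
    latest = z.read('layer1.png')  # the directory resolves to the newest entry
```

Compacting (the "garbage collect" step) then means writing a fresh archive containing only the live entries.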


This is however the sort of thing SQLite already does for you.


This is an interesting problem; I wonder if Photoshop or the GIMP actually do something fancy here. There shouldn't be much of a problem replacing files when the files never get bigger and uncompressed ZIPs are used (compression wouldn't make much sense for PNGs and JPGs anyway). Small variations in file size could be compensated for with some kind of padding, but it seems to me that we are approaching a point of complexity where we should just use another kind of filesystem, one that supports fragmented data.


You can append just fine, and SQLite doesn't let you replace arbitrary content either. If the new content fits within the space of what it is overwriting then all is fine (as with zip), but if there is insufficient space then the new data is written at the end (as with zip). SQLite has a vacuum command which creates a new database only with used content.


Also done with CBR / CBZ files (comic book archives) -- basically a renamed .zip of images with filenames numbered in ascending order.


The file extension actually makes a difference here, CBRs are RAR compressed archives.


Hence "basically" =P


Exactly. An interchange format should be editable with the standard tools included with Windows, and it should be easy to write all the code needed to extract part of the file yourself. SQLite has many benefits, but realistically the only way to interact with it is through someone else's code. That rules it out as a suitable interchange format.

Any performance benefits SQLite might have are easily outweighed by the ease of interacting with the zip-based format.


> Last summer, Adobe killed their image exchange format "FXG".

Edit: previously I had balked at this claim, but I'm wrong. While I can't find any official notice of its death, and it's still in use in Scene7 and somewhat supported in the open-sourced version of Flex, trawling message boards does indicate Adobe shying away from it in favor of SVG[1][2] and it's not supported in CS6 without an extension.

Still, his problem with FXG (or a format like it) seems superficial. Yes, FXG defines links to other assets instead of combining them into one file, but to me that's a feature, not a problem: I can then programmatically swap out assets just by manipulating the references within the file. That makes it really useful for generating customized assets on demand. In practice, it's like HTML for layered documents. We don't complain that HTML is broken because it links to images instead of including them, do we?

If you don't like that you have to send a folder: zip it up and give it a custom extension if need be, a la browser extensions and Microsoft Office formats.

[1]: http://mail-archives.apache.org/mod_mbox/flex-dev/201303.mbo... [2]: http://mail-archives.apache.org/mod_mbox/flex-dev/201303.mbo...


"We don't complain that HTML is broken because it links to images instead of including them, do we?"

Actually, we do.

Saving a webpage and getting tons of files is a nightmare, while the IE approach of lumping them all together is so sensible it's hard to imagine a universe where this isn't standard.

Replacing resources is the special case. Sharing, moving or just storing an image isn't. Just get or write a tool that can replace the resources for you and you have completely satisfied the 0.001% as well.


Truth be told, you're the first person I've ever seen argue that HTML is a broken file format, particularly because it doesn't contain everything it links to. Imagine what it would mean if HTML were an encapsulated format: you couldn't pre-fetch or cache static assets at all, because you'd have to get everything any time the structure of the document (the HTML proper) changed. That's not a 0.001% problem.

As I and others have mentioned throughout the comments here, the single index file - be it XML like FXG, HTML, or otherwise - linking to the assets within it is a pretty common and standard approach to complex documents. Combining them by zipping them up is a trivial addition and, again, a common way to solve the "but I don't like having multiple files to transfer" problem.


"Combining them by zipping them up is a trivial addition..."

That, seriously, 99% of the population will never, ever, accomplish in their entire lifetime. Instead, you get files without the resources. Instead, you get files with hard coded local links to C:\xxx.

This of course works for HTML, since most "regular people" never have to deal with either HTML files or their resources (but every time they do, it is sure to be a disaster).

Images, on the other hand, are something that most people deal with in some way or another. They must be easy to share or the format is pointless.

It would be nice if you could actually add a caption to an image - without having to "destroy" the original by converting it to a png just so that you'd be sure that the recipient could handle it.

Everything has its place. Resources work fine for HTML files if you are the creator; if you want to store a page for later use, it is a disaster.

I love the fact that the resources are separate from the document when I use InDesign or LaTeX - such a relief from having to deal with Word documents etc. At the same time, those solutions are completely inaccessible to most people I know; they are great for their isolated environment but disastrous for anything else.

So, an image with separate resources... it might work well, terrific even, in your studio, but if it is ever going to be used by the masses it is a dealbreaker.

What if PDF only used linked resources? No one would even know what PDF was by now if that were the case.

"Imagine what it would mean if HTML was an encapsulated file format: you couldn't pre-fetch or cache static assets at all, because you'd have to get everything any time the structure of the document (the HTML part proper) changes. That's not a 0.001% problem."

You could of course do all that with a single encapsulated file as well...


> That, seriously, 99% of the population will never, ever, accomplish in their entire lifetime.

Yes, they do. All the Microsoft Office formats (DOCX, XLSX, etc.), all the OpenOffice formats, all the iWork formats, ePub, Safari extensions, Chrome extensions, JAR files, and countless other general public-facing formats are exactly that: loose collections of files that contain an index file (or multiple index files) and are combined using zip. People by and large don't seem to have a problem dealing with them: transfers, edits, and other operations are seamless to regular users.

> What if PDF only used linked resources, no one would even know what PDF was by now if that were the case.

Many PDFs do, in fact, rely on this capability.[1] It's generally transparent to the user.

> You could of course do all that with single encapsulated file as well...

No, you couldn't. You'd have to transfer the entire file to determine what's changed. That's the definition and principal benefit of encapsulation: one file gets transferred at once instead of many piecemeal.

[1]: http://en.wikipedia.org/wiki/Portable_Document_Format#Conten...


"Yes, they do. All the Microsoft Office formats (DOCX, XLSX, etc.), all the OpenOffice formats, all the iWork formats, ePub, Safari extensions, Chrome extensions, JAR files, and countless other general public-facing formats are exactly that: loose collections of files that contain an index file (or multiple index files) and are combined using zip. "

In other words, a single encapsulating file... Which was my point?

The exact implementation is hardly relevant, is it?

"No, you couldn't."

Of course you can. Just include hashes of the different resources within the file, and only the parts of the file that contain changes need to be downloaded. Or implement something more general along the lines of the rsync algorithm.

Taking html as an example you could also do everything transparently on the server if you wanted to. Or do everything transparently on the client instead if you wanted to keep a snapshot of each visit as a single file.

But no, I'm not arguing that we should encapsulate all web pages. But on the client side, if the user chose to save a webpage, the result should in most cases be a single encapsulating file.

"Many PDFs do, in fact, rely on this capability.[1] It's generally transparent to the user."

The key phrase was *only* used linked resources. I think my point was, and is, rather obvious: PDF would never be where it is today if it weren't for its ability to encapsulate resources.

EDIT: My point isn't that all files should be encapsulating all the linked content. There must be a point to it. Web pages on a server, hardly beneficial... Image files? Absolutely.


We also don't typically use HTML as an interchange format between editors.


At its core, a layered document is a set of declarations about the constituent parts the document is made up of. They link to other objects, like images and fonts. Whether you include everything the document links to in one container or let it hang out in a folder, it doesn't change how layered files are created or interacted with, within editors or elsewhere.

So to transmit web pages? Sure, we use HTML as an interchange format. You receive everything about the page to manipulate it, including the locations to where the images and other related files are.

In other words, a layered document is not the same in kind as a JPEG or a GIF, which are flattened images. You need to be able to manipulate a layered document in the same way you could manipulate a Word document or an HTML file.

Whether you use a plain text format like XML or a SQLite database, the end result is going to be the same: in a SQLite database, you could place the component images/fonts into the database as blobs, just as you could technically place the component images/fonts in an XML file as data URIs.

But at least with XML, it's editable as plain text and doesn't require the component assets to be transmitted with the layout-defining file. I much prefer that over a format that dictates everything must be encapsulated within.
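The equivalence above is easy to see in code: a data URI in the markup is just base64 around the same bytes a SQLite blob column would hold (placeholder PNG bytes, hypothetical layer element):

```python
import base64

png = b'\x89PNG\r\n\x1a\n...'  # placeholder image bytes

# XML-style embedding: the asset becomes a data URI in the markup.
data_uri = 'data:image/png;base64,' + base64.b64encode(png).decode('ascii')
layer_xml = '<layer src="%s"/>' % data_uri

# Round-tripping recovers the exact bytes a SQLite blob would store.
recovered = base64.b64decode(data_uri.split(',', 1)[1])
```

Either way the raster payload is opaque to the container; the difference is only whether the wrapper is text or a database page.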


Well, ePub is a zip file of HTML documents. So some people do.


Not really. Not only is EPUB not what I would consider an editor interchange document, but HTML isn't even the format used for the "core" document file. An XML file with the .opf extension is the "core" file that describes how the EPUB works, and the HTML files are actually just content, analogous to the layer data in Gus's SQLite.


But that's because HTML has complicated and ambiguous rendering rules, not because everyone hates structured, human-readable index documents.


"I want a filesystem in a single file. I'll pick the one I'm most comfortable with and pretend that it's much better than separate files".

Meanwhile, we'll ignore all the semantic issues around the data. What exactly _is_ a layer? Any given image file? What if some tools only handle e.g. alpha-layers? What if a layer is actually a filter kernel?

How are the layers composited? How are they ordered? Is it a linear order, or is there actually a tree of layers?

The reason PSD works so well is only partially that things are all in one file. It's also that the semantics of its contents are extremely well defined. (OK, if you're willing to consider PSD documentation "extremely well defined")

And Gus is almost completely punting on that part - define _that_ well enough, and it might make sense. Until then, it's just another VFS with a blob of assets that the recipient probably can't parse exactly as intended.


But a layered image SHOULD be a single file and not a filesystem. No non-technical user would expect otherwise.

And why all the spite? Gus isn't proposing that Acorn's database schema BE the standard. He says that clearly. He simply wrote a blog post giving an example of how he has used SQLite for storing layered images. (Also, he did happen to describe how he does layer sequencing - did you read the full article?) Any final spec would of course need to define additional semantics. That wasn't the point.


> But a layered image SHOULD be a single file and not a filesystem.

But if all (or most) of what you want is a VFS, SQLite isn't necessarily a good way to make one. As has been mentioned elsewhere in the thread, it's fairly common for modern file formats to be zips of a structured filesystem; although this has upsides and downsides compared to SQLite, it seems much easier to run unzip than to look up how to get SQLite to output the binary content of a table row to a file.
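For comparison, the SQLite side of that extraction isn't much code either; a sketch against a hypothetical Acorn-like layers schema (not the real one):

```python
import sqlite3

# Hypothetical schema: each layer row carries its raster data as a blob.
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE layers (id INTEGER PRIMARY KEY, name TEXT, data BLOB)')
con.execute('INSERT INTO layers (name, data) VALUES (?, ?)',
            ('background', b'\x89PNG\r\n\x1a\n...'))  # placeholder PNG bytes

# The moral equivalent of "unzip one member": one query, then write
# the bytes wherever you want them.
blob, = con.execute('SELECT data FROM layers WHERE name = ?',
                    ('background',)).fetchone()
```

The real difference is tooling: every OS ships an unzip, while the SQLite route assumes a library or the sqlite3 shell is at hand.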


Couldn't someone come up with a relatively sane schema for storing the same information that a PSD would contain in a SQLite database?

That way, you could standardize on the semantics that are already documented and familiar to other developers, but avoid the difficulties of the file format, which seem to be the issue here.

* disclaimer: I've never had to write a PSD-parser, I'm just going on hearsay.


Photoshop layers have all the nastiness of threaded comments plus the added complexity of "blending". I frankly can't comprehend the complexity of modeling Photoshop's layers as SQL tables.

Converting PSDs to JSON seems like a reasonable thing to do, but converting them to normalized SQL seems like a never-ending shit show.


How about just stripping PSD of all its crud? Adobe specifies it openly and gives it to the commons. I have a dream..


Keep dreaming: PSD is not a single file format, but two decades of cruft upon cruft:

http://www.jwz.org/blog/2012/11/psd/


I found this article, linked in the comments, to be insightful: http://www.joelonsoftware.com/items/2008/02/19.html


If what you want is a filesystem in a file, then it's probably best to just start with .iso files and work from there. You'd still have to standardize the directory structure and "internal" formats for actual data, but that would get you the filesystem-like structure you're looking for in a format that is already optimized for use as a filesystem.


It's not going to happen, for the same reason a standard 3D format will never happen (and no, neither FBX nor Collada fits).

The problem is that, at least in the case of image editors (and 3D editors), standard formats arguably stifle innovation, because you have to break the standard to add anything.

Maybe you decide it would be better if all vector colors were stored as HSVA instead of RGBA, except now no one can read your files. Maybe you'd prefer to store floating point colors with values greater than 1.0 to represent light emission. Maybe the standard only defines circles but you want ellipses. Maybe the standard says circles are defined as rounded rects with the maximum roundness, but you'd like them to be based on a center point and radius. Maybe the standard doesn't support text on a path but you want text on a path. Maybe the standard doesn't support linking paths so that if the text doesn't fit on one path it bleeds into another. Maybe you'd like to justify text across multiple paths, but that's not in the standard. Maybe you like columns, but those aren't in the standard. Or you want to be able to define areas to be cut into PNGs, each area with a specifiable filename. Or you want some of the settings to be per animation frame, but the spec never thought about animation, so you're SOL. Or you want layer fx. Or you want to add a new layer effect. Or you want to be able to embed a PDF as a layer. Etc., etc., etc.

Photoshop's basic chunky format is well known. It's not that hard to write some code to read all the chunks you care about. Putting those chunks in a SQLite format will not make it any easier to deal with new chunks your code does not understand.


> The problem is, at least in the case of image editors (and 3d editors), standard formats arguably stifle innovation because you have to break the standard to add anything.

Indeed: that's exactly why Adobe created FXG instead of adopting SVG in the first place[1] (though they are now circling back to SVG).

[1]: http://www.mikechambers.com/blog/2008/09/30/why-adobe-chose-...


Even if all vendors deviate, it's still easier to interoperate, since at least some of the structure is transparent.


Wasn't metadata supposed to solve this issue?

If the parsers are only looking for data relevant to them, I see no reason why a standard couldn't be developed.

I might be oversimplifying the problem...


Interestingly, in the GIS/Mapping space Mapbox has faced a similar kind of problem, albeit with image tiles rather than just layers. Their solution was to come up with an image format called mbtiles (http://mapbox.com/developers/mbtiles), which actually uses SQLite. It's not the same problem, but perhaps it throws some credibility to the use of SQLite as an on disk format for structured images.
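The MBTiles schema itself is tiny: per the published spec, a metadata key/value table plus a tiles table keyed by zoom/column/row, all in one SQLite file. A minimal in-memory sketch (placeholder tile bytes):

```python
import sqlite3

# The two tables the MBTiles spec requires, in one SQLite database.
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE metadata (name TEXT, value TEXT)')
db.execute('CREATE TABLE tiles (zoom_level INTEGER, tile_column INTEGER, '
           'tile_row INTEGER, tile_data BLOB)')
db.execute("INSERT INTO metadata VALUES ('name', 'demo'), ('format', 'png')")
db.execute('INSERT INTO tiles VALUES (0, 0, 0, ?)', (b'\x89PNG\r\n\x1a\n...',))

# A map viewer fetches one tile with a single keyed lookup:
tile, = db.execute('SELECT tile_data FROM tiles WHERE zoom_level = 0 '
                   'AND tile_column = 0 AND tile_row = 0').fetchone()
```

That keyed-blob pattern is essentially what a layered-image format in SQLite would look like, with layers in place of tiles.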


ESRI's Shapefile format includes a DBF file, which is a dBase-format database.


And I still dream of spatialite becoming the one interchange format to rule them all.


I just used Spatialite to create a tool for managing large stacks of images on the pixel level; i.e., each image gets its own database file, each pixel is stored as an XYZ coordinate and an RGB value.

I can use Spatialite's built-in functions to rotate, translate, scale, etc. images; and use SQL to pull out sub-volumes of the stack, edit and composite them.

My biggest stack is almost 2000 images, over 90GB uncompressed data. Working with subvolumes is pretty snappy up to a few hundred megabytes, which is good enough for my purposes. For bigger jobs it should be possible to parallelize some tasks for better performance.

Not entirely on-topic, but the takeaway is I'm thumbs-up for using SQLite to process image data.


Do you do anything in particular to improve access speed to that image data? I've been working with big matrices that get spit out of traffic assignment software that we use in travel modeling. Every vendor seems to have their own proprietary format. We ended up using HDF5 as a container due to its somewhat awesome speed characteristics. I'd initially tried SQLite for that matrix data, but couldn't squeeze the same kinds of performance out of it. That could have just been my own brain fail though.


Nothing beyond the standard advice for SQLite performance. I found that keeping the data in individual per-image database files worked a lot better than trying to create one big database for the whole stack. With that done, I expect the big performance win to come with parallelization of the image transformation tasks.


This whole idea seems totally mad from a distance but as a developer I'd love to be able to open complex files as DBs.

What would be the downfalls of passing around a DB as an image file? Does the compression suck? Does the performance suck? What makes this a terrible idea?

EDIT: to clarify, by compression I'm referring to the non-lossy type - I'm assuming that within the db you'd have already processed assets.


One downside would be that tools like file(1) that try to guess a file's format by examining its contents would return "SQLite database" for every image file, which isn't the most helpful result.

Since SQLite aims to be a generic way for applications to store data, it'd be nice if its "open database" and "create database" functions allowed you to supply, say, a 4- or 8-byte magic value that would be stored at a fixed location in the file's header so tools could distinguish your SQLite-based file-format from everybody else's, without having to load the thing up and examine the schema.


There are currently 24 contiguous bytes of unused space in the SQLite header. If need be, and if SQLite catches on for use as a portable image format, I will be willing to allocate some or all of those 24 bytes to an identifier string for file(1).


This is surely one of the greatest responses I've ever seen on the internet. What was the intention with the unused space originally? Did you foresee this moment coming?


When designing a file format, it is always a good idea to plan for enhancements. There were originally 36 bytes of unused space in the 100-byte header of the SQLite 3.0.0 file format, back in 2004 - bytes set aside specifically to deal with unforeseen needs. Over the years, 12 of those bytes have been allocated to various improvements. 24 bytes remain. (I'd prefer not to use them up all at once or on a whim, obviously.)


Thank heaven for people like you.


First, thanks so much for making what is one of the most awesome database tools around.

If you would be willing, what's your opinion on using SQLite as an image format like this? The "When to Use" page specifically cites application file formats as a good use, but would your header byte allocation be done begrudgingly or contentedly?


Who are you to be able to speak on behalf of the SQLite developers? Anyone could be behind that username.


What an awesome response.


I'm not convinced this is actually a good idea overall, but...

There's no reason `file` couldn't be made to recognize special cases of SQLite databases. It certainly handles far uglier cases already.

And SQLite already provides a place to store meta-data about a table: http://www.sqlite.org/pragma.html#pragma_user_version Though the bummer is user_version is only 32 bits.
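As a sketch of that idea, the 32-bit user_version field can hold a packed four-character code (the 'ACRN' tag here is made up), which a reader - or a smarter file(1) - can check before going any further:

```python
import sqlite3

MAGIC = int.from_bytes(b'ACRN', 'big')  # hypothetical four-char format tag

db = sqlite3.connect(':memory:')
db.execute('PRAGMA user_version = %d' % MAGIC)  # pragmas can't be parameterized

ver, = db.execute('PRAGMA user_version').fetchone()
# A format-aware reader refuses the file unless ver == MAGIC.
```

Since user_version lives at a fixed offset in the database header, a tool can also peek at it without opening the database through SQLite at all.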


>One downside would be that tools like file(1) that try to guess a file's format by examining its contents would return "SQLite database" for every image file

If the format catches on, they won't. For one, it would have its own fixed file extension, and second, file(1) would be updated to perform an extra check even if something initially looks like a common SQLite database file.


There's already a similar situation for files like ODT, which are actually just a zip file with specific contents. But maybe the zip format has more flexibility to stick magic values in than sqlite does.


Why not reserve a specially named table with the format identifier in it, which file(1) could poke into and take a look at?


There are already formats that let you keep n layers of images. OpenEXR, for example, lets you keep multi-resolution, multi-bit-depth images all in the same file, with lots of metadata.

It also has options for byte order to allow faster reads/writes.


> and Adobe, if you are listening you should really give SQLite a serious look

I'm not in the Photoshop group, but I do make liberal use of SQLite in the projects that I'm involved in at Adobe.


I'm interested in why Acorn ties the concept of a layer so tightly to a bitmap or shape blob rather than segregating the two. It seems that having a library table that the layer table links to would allow for better reuse of images.

Library table <- pixel data

Layer table <- hierarchy/duplication of library items

Attributes table <- attributes of layers

Also, is the concept of pages accounted for by the ordering of parentless layers, or does Acorn not support the notion of pages? Would that be its own table, or a modification of the Layer table?


OpenEXR solved that problem years ago, and it's supported by virtually all content creation tools: an open source implementation with a well-designed C++ API and bindings to several other languages.

For reasons that I don't understand, its adoption is weak outside of the VFX industry.


Yeah, OpenEXR (http://www.openexr.com/) is a very nice format, well designed and documented, and with an open-source implementation.

Apart from VFX and post-production, OpenEXR has also become the standard file format in HDR applications.

IMHO it isn't mainstream yet because most people don't know what HDR imaging is and why current image formats are inadequate for HDR. For still image editing Photoshop is overwhelmingly dominant, whereas VFX pipelines involve a lot of custom tools, so there you need an open format.


Won't this format leave behind traces of what was in the file before? I'm not sure that this is something that we want in a standard interchange format. Users will assume what they are sending is only what they see in an application. This is reasonable. I am concerned that using this format will lead to data leaks.


Not if you set "PRAGMA secure_delete=ON;" See http://www.sqlite.org/pragma.html#pragma_secure_delete for additional information.


It could be defined in the format spec that applications should VACUUM the database after every save.
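A minimal sketch of that save-time scrubbing, combining it with the secure_delete pragma mentioned above (the file name and table layout are illustrative only):

```python
import os
import sqlite3
import tempfile

# secure_delete zero-fills pages as rows are deleted; VACUUM then
# rebuilds the file from scratch so no slack space from old layer
# data survives in the saved document.
path = os.path.join(tempfile.mkdtemp(), "image.db")
conn = sqlite3.connect(path)
conn.execute("PRAGMA secure_delete=ON")
conn.execute("CREATE TABLE layers (id TEXT, data BLOB)")
conn.execute("INSERT INTO layers VALUES ('l1', ?)", (b"SECRET" * 1000,))
conn.commit()

# User deletes the layer; the app scrubs on save.
conn.execute("DELETE FROM layers WHERE id = 'l1'")
conn.commit()
conn.execute("VACUUM")
conn.close()

# Check the raw file: the old blob should be gone.
leaked = b"SECRET" in open(path, "rb").read()
print(leaked)
```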


That ship has pretty much sailed with EXIF and embedded metadata anyways, right?

This would seem to be more targeted at digital content creation (and toolchain application interop) than a normal-use file format.


It's up to your editor what metadata it chooses to save. It is not up to your editor what data was in the sqlite row that it just modified or deleted but may still be forensically available. There is a fundamental difference here.

It is possible to clean it up of course - perhaps by vacuuming, or definitely by copying to a new file at database level. But it seems dangerous to me to start with a native format with this behavior and attempt to clean it up later.


>But it seems dangerous to me to start with a native format with this behavior and attempt to clean it up later.

File formats don't have behaviors. Programs do.

Lots of binary formats, including PNG, will ignore any extra junk after the end of the file. So if a change makes a PNG file smaller, there's nothing stopping a program from leaving data from the old version at the end. But that's not normal behavior for programs that write PNG, so it's not an issue.

As long as programs adopted the behavior of VACUUMing the database, this wouldn't be an issue either.


How is that different from a psd or editable fireworks png file? I can have several groups with various visibility toggles that any normal user wouldn't see unless they dig into the layers.


What's wrong with SVG? Doesn't it already accomplish what the OP wants, and isn't it already a standard supported by a lot of applications? I must be missing something here.


XML is a really bad format for containing large chunks of binary data (bitmap)


The XML can store references to the binary chunks in standard formats though, alongside the vector shapes and filters.

  <svg viewBox="## ## ## ##">
    <image xlink:href="layer1.png" x="##" y="##" width="##" height="##"/>
    <image xlink:href="layer2.jpg" x="##" y="##" width="##" height="##"/>
    <rect x="##" y="##" width="##" height="##" fill="red"/>
  </svg>
All that's left is to standardize a way to package it.
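For instance, something loosely following the ODF approach (this layout is my own invention, not a spec): zip up the SVG manifest plus the raster layers it references.

```python
import io
import zipfile

# A zip containing the SVG "manifest" and the raster layers it refers to.
svg = (b'<svg xmlns="http://www.w3.org/2000/svg">'
       b'<image href="layer1.png" width="10" height="10"/></svg>')
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("image.svg", svg)
    z.writestr("layer1.png", b"\x89PNG\r\n\x1a\n")  # placeholder bytes, not a real PNG

# Any unzip tool (or a reader) can pull the pieces back out.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as z:
    names = sorted(z.namelist())
print(names)
```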


That's actually a good idea, along with the fake .zip extension.


And why are we storing binary data? The whole point of SVG is that everything is in XML, unless I'm very mistaken.


The author is looking for a container format capable of storing multiple image layers, and one that is capable of storing both vector and raster images.

So you could possibly argue that XML (a la SVG) is the right format to represent the vector files within that container, but it wouldn't be appropriate for the container file itself since binary data really bloats XML.


How else would you propose to store a bitmap?


  <points>
    <point x='2px' y='3px' color='rgba(255,255,255,0)' />
    <point x='2px' y='4px' color='rgba(100,255,255,0)' />
    <!--...-->
  </points>

heh


Zip archive containing an SVG file embedding multiple png/jpg/jpeg2000 images.


I would like to suggest this minor change:

  create table layers (id text, parent_id text, sequence integer, uti text, name text, data blob);
become (Vec2 used for notation's sake to make the declaration simpler--just two reals):

  create table layers (id text, parent_id text, sequence integer, uti text, name text, data blob, offset Vec2, bounds Vec2, rotation real);
This would make it really easy to ignore layers which have no component in the view space, or to compactly represent raster subimages.

"bounds" is the size of the axis-aligned bounding-box of the image. "offset" i a translation from the center of the axis-aligned bounding-box of the layer image data to the origin, and "rotation" is the rotation (in radians, CCW) of the layer. Order of application is rotation, then translation.

EDIT: Fixed missing def for "bounds".
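A rough sketch of the culling query this enables, with the Vec2 columns split into scalars for real SQLite (the offset_x/offset_y, bounds_w/bounds_h names are mine) and rotation ignored for brevity; a safe variant would test against the layer's bounding circle instead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE layers (
    id TEXT, parent_id TEXT, sequence INTEGER, uti TEXT, name TEXT,
    data BLOB, offset_x REAL, offset_y REAL,
    bounds_w REAL, bounds_h REAL, rotation REAL)""")
conn.executemany(
    "INSERT INTO layers VALUES (?,?,?,?,?,?,?,?,?,?,?)",
    [("a", None, 0, "public.png", "visible",   b"", 0,    0,    100, 100, 0.0),
     ("b", None, 1, "public.png", "offscreen", b"", 5000, 5000, 10,  10,  0.0)])

# Load only layers whose axis-aligned box intersects the view rectangle,
# treating (offset_x, offset_y) as the box center.
view_x, view_y, view_w, view_h = 0, 0, 640, 480
rows = conn.execute("""
    SELECT id FROM layers
    WHERE offset_x - bounds_w / 2 < ? + ?
      AND offset_x + bounds_w / 2 > ?
      AND offset_y - bounds_h / 2 < ? + ?
      AND offset_y + bounds_h / 2 > ?
    ORDER BY sequence""",
    (view_x, view_w, view_x, view_y, view_h, view_y)).fetchall()
print([r[0] for r in rows])
```

The offscreen layer's blob is never even read from disk.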


I would argue you're not gaining much by complicating that specific table structure, since that information is going to be stored in the layer_attributes table where it naturally belongs. There may be several other attributes that can affect whether or not to load the resulting binary blob into memory.


I think you're missing something in your comment. There's no change between the two lines, and no reference to offset or rotation.


mouse over the declaration line and scroll left.


Oh, wow. HN really truncates that early. Those code blocks were narrower than the surrounding text, with no indication whatsoever that they were independently scrolling divs.

I get that HN is awfully bare-bones in its appearance, but this I think is a bit too far.


That is a problem with your browser not clearly indicating scrollable areas. Hacker News doesn't do anything special to prevent browsers from showing scrollbars.


But it doesn't do anything to force the scrollbars either, or to provide a visual treatment that indicates that this is an independent scrollable area. We've had iOS around for 6 years now, with its hidden-scrollbars approach, and OS X has followed the hidden-scrollbars approach for some time now as well. It is no longer acceptable to assume that scrollbars are necessarily going to be visible.


Hacker News does everything it is supposed to according to the CSS standard to indicate to your browser that the area is scrollable. In fact, it is obvious that the site is doing enough because the browser does in fact render the area as scrollable. The fact that your browser doesn't do anything to visually indicate when a region is an independent scrollable area is entirely Apple's fault.

You can argue that websites should be written to conform to the standards-violating behavior in Apple's browser rather than actual Web standards, but we already went through that with Microsoft in the '90s and early 2000s, and I don't think you'll find a lot of people who are eager to return to those days. Browser-makers are expected to conform to the standard, not vice-versa.

I am sympathetic to the fact that this is inconvenient, because I use Apple's software too. But it's Apple that's responsible for the inconvenience, not all the web designers who are correctly following the standard.


Hacker News does everything correctly to _functionally_ indicate that the area is scrollable. It does absolutely nothing to _visually_ indicate this. And the CSS spec does not require that the browser visually display scrollbars.

So no, Hacker News is in the wrong here. They're relying on the assumption that the user is using a browser that renders scrollbars, and providing a sub-par experience for any browser that does not. This is somewhat analogous to a site that only renders correctly in IE.


Huh. So do we build sites for users or to satisfy specs?


Do "we" make browsers to be usable, or to be pretty? If any random hack came up with that idea, it would be a random, hackish idea. But because it's Apple we have to abide? I dare say nahhhh...


HN's semantic markup isn't always great, but in this case it simply uses <pre> styled with overflow: auto.

> The behavior of the 'auto' value is user agent-dependent, but should cause a scrolling mechanism to be provided for overflowing boxes. — http://www.w3.org/TR/CSS2/visufx.html#overflow

Apple's rendering is overly pretty to the point of being useless. They dropped support for a perfectly reasonable semantic requirement (make all the content available) which was in the spec literally before their browser existed.


What about OpenRaster? It's used in open source apps like MyPaint and Gimp.

http://www.freedesktop.org/wiki/Specifications/OpenRaster


If I can just use sqlite, then I don't have to open an archive, parse XML, and otherwise get into a bad mood (see their file layout here: http://www.freedesktop.org/wiki/Specifications/OpenRaster/Dr...).


> OpenRaster is an open exchange format for layered raster based graphics.

How's that going to work for vector images?


You can have SVG layers in OpenRaster. See: http://freedesktop.org/wiki/Specifications/OpenRaster/Draft/...


One nice thing about using SQLite is that various compressed versions of the flattened image could just be stored as blobs in another table in the same file. For example, my camera shoots with a RAW+JPG mode that produces two files, why not store the RAW data as described here and the JPG as an easy to grab/display item for quick image viewers etc. Hell store various common versions of thumbnails and icons in there as well.
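A sketch of how that could look (the table and column names here are illustrative, not from any spec):

```python
import sqlite3

# Alongside the editable layer data, keep ready-made flattened
# renditions (full preview, thumbnail, RAW original) as blobs in
# their own table, keyed by kind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE renditions (kind TEXT PRIMARY KEY, uti TEXT, data BLOB)")
fake_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 16  # placeholder bytes, not a real JPEG
conn.execute("INSERT INTO renditions VALUES ('preview', 'public.jpeg', ?)", (fake_jpeg,))
conn.commit()

# A quick-look viewer grabs just the preview without touching layer data.
data, = conn.execute("SELECT data FROM renditions WHERE kind='preview'").fetchone()
print(len(data))
```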


This is kind of like saying "We don't need PNG/JPEG/ETC, we just need to put image data in an XML file! It can do everything! Standardization at last!"

Which is true. Except for that little "implementation" detail.

I like the idea of using SQLite as a complex-data-in-single-file format. Hadn't really thought of that for cases like this. But it does nothing to solve the problem of standardization.


Standardizing a format/API on a tool like SQLite was proposed in the web world as WebSQL, but it was scrapped because nobody wanted an entire standard to rely on a specific code base; the standard should be reimplementable by others, which led to the IndexedDB standard instead.

I'm not proposing that an image format like this one should use IndexedDB, but if relying on SQLite means being reliant on a specific code base that has to be used by anyone that wants to read the file - then it's probably not such a good idea.

When discussing this I think one should also keep in mind, e.g., the move to web apps - a reliance on SQLite will make it hard for them to read the format. A standardized simple base format that can easily be supported in new languages would be preferable as I see it - but I'm no expert in the image processing area.


> When you want to send someone an image you want to pass them a single file, not an XML file with a folder of assets. While there are technical benefits to this, it's an incredible burden on the customer.

Or, you could just zip that whole thing up and everyone who doesn't have to care thinks it is "just one file", like .war's.


Other common examples: .app .docx


Some of your blobs seem to contain a lot of structured data: I would further specify a format for them somehow: perhaps as further tables that join to the main ones, perhaps as JSON, perhaps as protobufs. Burying application- or format-specific blobs in SQL fields just pushes the problem down a level.


The author didn't put the punchline in the title. Why did you put it in the title, HN submitter or editor?


On the contrary, in my opinion, this isn't a joke, and since the "punchline" was likely to draw attention to the article from people that might be more interested in a debate over the wisdom of using SQLite "trivially" like this than a generic discussion of image formats, it was useful and shouldn't have been removed from the title.


Whatever happened to the debate around how in/efficient the blob storage format in SQLite was? Did that get changed in v3?

EDIT:

Found the official examination[0] on the performance of blob storage.

[0] http://www.sqlite.org/intern-v-extern-blob.html


Correct me if I am wrong, but FXG files are archives with an XML file and assets, just as xlsx files are.

How is this a problem? How could it possibly be easier to open a file in sqlite than it is to just extract the file into its folder structure with any standard decompression software?

I don't know about you, but last time I checked there is no easy way to open a file contained in a sqlite database in my editor directly. I for one would much prefer the archive method to using a sqlite database.

Also, I would rather edit xml, than edit a sqlite database. XML can be edited by any text editor. I'd say text editors are a bit more universal and widespread than sqlite editors.


And why not TIFF?


According to Wikipedia, Adobe Systems holds the copyright to and has control over the TIFF specification.


Alias used TIFF as the native format for layered images in Sketchbook Pro 1.0 in the early 00s.



That page says: "The use of XCF as a data interchange format is not recommended by the GIMP developers, since the format reflects the GIMP's internal data structures, and there may be minor format changes in future versions."


It still seems like a better basis to fork for a format than something new and untested.


Something you don't control — and which the people who do control it say "don't use this because we'll probably change it often" — is somehow better than something you do control?


> fork


This is an interesting idea, though I cringe a little at using a relational database as a key/value store (as in the case of layer_attributes).

I wonder whether it would be more useful to just represent the whole thing as a big JSON structure with base64 blobs. Each layer can be an object in the "layers" array with the attributes as keys.
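Something like this toy illustration (the structure is invented for the example); note that every raster blob must be base64 encoded, roughly a third larger, and the whole document parsed into memory at once:

```python
import base64
import json

pixels = bytes(range(256))  # stand-in for raster layer data
doc = {
    "layers": [
        {"id": "background", "sequence": 0,
         "attributes": {"opacity": 1.0, "visible": True},
         "data": base64.b64encode(pixels).decode("ascii")}
    ]
}
text = json.dumps(doc)

# Round-trip check: the blob survives encode/decode.
back = base64.b64decode(json.loads(text)["layers"][0]["data"])
print(back == pixels)
```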


Layered images can easily grow to gigabytes in size. You do not want to store that much information in a format like JSON that must be opened into memory, parsed and stored in native objects. SQLite is RAM efficient and quick to open, even at large file sizes.


Fair enough - I rarely work much with images, so didn't even think of size.


What I'd like is the ability for two editors to work on a file at the same time, so layer locks might need to be built in. A db seems to me, a novice in everything relevant(!), to be a good way to allow multiple applications access whilst controlling for collisions.


Don't think the discussion is complete without http://en.wikipedia.org/wiki/OpenRaster which is already supported in a number of programs.


IMHO SQLite is a horrible format. Using Matroska (.mkv) would be suitable for anything: vectors, bitmaps, audio, and so on. It's essentially a container format like XML, but for binary data.

WebM is based on mkv as well.


Could you elaborate on what specific benefits using Matroska would have? One downside I can think of is that there are no programming environments (that I can think of) that ship a Matroska parser, whereas sqlite readers/writers are found in many standard libraries.


I thought about this (actually, an ISO based format like MPEG-4, but same idea). You could even represent edits as samples over time, and then play back the file as a movie!


not a huge conspiracy theorist, but isn't this the kind of thing that would make us not rely on purchasing Adobe CS7 exclusively (for fear of incompatibility) in the future?

Standards wise, open-source wise, and generally it's a good move (who wouldn't want to get a standard complex image format that would make sense to the common programmer?), but I don't think the finance department over at Adobe is going to like it very much?


.ora? (OpenRaster)

http://en.wikipedia.org/wiki/OpenRaster

gimp (plugin), krita, and mypaint


If someone does this right, it should go over pretty well for EDA (Electronics Design) file formats. EDA is a wasteland.


Good luck opening and modifying that 150MB psd file.


xkcd.com/927 :)


A limited subset of HTML would be nice.


How about an additional frames table?


this is like saying "i've invented a standard file format for all types of files. i call it binary. it's made up of zeroes and ones. i've solved interoperability forever".


Amen!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: