Why does Numbers hide a huge PNG file in exported Excel sheets? (apple.stackexchange.com)
118 points by rogual on Aug 18, 2014 | 25 comments


I'm not sure why it's including this in the export, but this is one of the default "Image Fills".

http://f.cl.ly/items/1X0M2o0V0f3A463c0a2H/Untitled_numbers.p...


If you ever dissect an uncompressed Pages file, there's a lot of junk in there you really don't need. My guess would be that this is a case of that: Apple including the default fill pattern when it need not be included.

Uncompressed, an empty Pages file includes about a megabyte of XML for a 3x3 sheet, with tons of absolutely unnecessary default metadata about the sheet as well as each individual cell. Pages really isn't very good at knowing what it actually needs to include.

I spent a few days dissecting the format a few years back at a previous job for a data processing tool. We decided it wasn't worth our time.

As for the size, some things simply don't compress well in PNG.


I guess the other color image fills are just variations of the original blue image.


Pages does it too. Export a document in DOCX format and you get the same PNG file. Export a document in Word 97-2003 format and you get three of these PNG files!

http://mcmillan.cx/blog/2014/05/31/bloated-exports-from-page...


From the looks of the OOXML format (http://web.mit.edu/~stevenj/www/ECMA-376-new-merged.pdf), the image there is just used as a resource: xl/theme/_rels/theme1.xml.rels refers to image1.png. I'd say it's a default background of some description. Still, it would be cool if there was something in there. Maybe taking a look at the frequency of the bytes would help?
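For anyone who wants to check the reference themselves, here is a rough Python sketch that opens an exported .xlsx as a zip and lists what a .rels part points at, with file sizes. The function name is made up for illustration, and it assumes the targets are internal package parts; only the xl/theme/_rels/theme1.xml.rels path comes from the comment above.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by OPC relationship (.rels) parts.
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def list_relationship_targets(xlsx_bytes, rels_path="xl/theme/_rels/theme1.xml.rels"):
    """Return (part_name, size_in_bytes) for each internal target in rels_path."""
    results = []
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as archive:
        root = ET.fromstring(archive.read(rels_path))
        # Targets are relative to the part that owns the .rels file,
        # i.e. the directory one level above "_rels".
        base_dir = posixpath.dirname(posixpath.dirname(rels_path))
        for rel in root.iter(REL_NS + "Relationship"):
            if rel.get("TargetMode") == "External":
                continue  # external links are not stored in the archive
            joined = posixpath.join(base_dir, rel.get("Target"))
            part_name = posixpath.normpath(joined).lstrip("/")
            results.append((part_name, archive.getinfo(part_name).file_size))
    return results
```

Calling it with the bytes of the exported file, e.g. `list_relationship_targets(open("export.xlsx", "rb").read())` (the filename is a placeholder), should show whether the theme really references the PNG and how large it is.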


Someone should run that image through a steganalysis tool.


Or more simply, just MD5 them from two machines and see if the hashes (and sizes) match. It is actually quite hard to find hidden text in "random" images. It is quite easy, however, to tell whether they're always identical.
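A minimal sketch of that comparison in Python, using only hashlib. The helper names are made up; the idea is just to compute a (size, MD5) pair on each machine and compare the pairs.

```python
import hashlib


def fingerprint(data):
    """Return (size_in_bytes, md5_hex_digest) for a bytes object."""
    return len(data), hashlib.md5(data).hexdigest()


def same_file(data_a, data_b):
    """True when both size and MD5 digest match."""
    return fingerprint(data_a) == fingerprint(data_b)
```

On each machine you would run something like `fingerprint(open("image1.png", "rb").read())` on the PNG pulled out of the export and compare the printed pairs. MD5 is fine for spotting accidental differences like this, though it is not collision-resistant against a deliberate attacker.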


Most steganography is trivially easy to detect. Good steganography needs a lot of cover data and better systems than manipulating the LSB.
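For reference, the naive LSB approach being dismissed here looks roughly like the sketch below: overwrite the lowest bit of each cover byte with one payload bit. This is a toy on a raw byte buffer (function names are made up), not a claim about how any real tool works.

```python
def embed_lsb(cover, payload_bits):
    """Hide a list of 0/1 bits in the least significant bits of cover bytes."""
    if len(payload_bits) > len(cover):
        raise ValueError("cover is too small for the payload")
    out = bytearray(cover)
    for i, bit in enumerate(payload_bits):
        # Clear the lowest bit, then set it to the payload bit.
        out[i] = (out[i] & 0xFE) | (bit & 1)
    return bytes(out)


def extract_lsb(stego, n_bits):
    """Read back the first n_bits least significant bits."""
    return [byte & 1 for byte in stego[:n_bits]]
```

Each cover byte changes by at most one, which is invisible to the eye but shifts the statistics of the low bits, and that shift is what simple detectors look for.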


If the distribution of bits in an encrypted stream is indistinguishable from normal image dithering, wouldn't encrypted steganography be hard to detect?

It wouldn't survive any lossy image transformations, though.


That applies to steganography on normal pictures. If the picture is just random pixels, then it would not be detectable.


Randomness is something that cryptographers have spent a lot of time studying.

Introducing non-random data into a random data stream is easy to find unless you're very careful about the amounts of data-to-be-hidden and cover-data.


If you encrypt the data first, then you're inserting "random" data as far as analysis goes.
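A quick way to see the point about encrypted data looking random is to compare byte entropy. The sketch below computes Shannon entropy in bits per byte; it uses os.urandom as a stand-in for ciphertext, which is an assumption, since no actual cipher is involved. Entropy is also only one crude statistic, so passing this check says little about more careful steganalysis.

```python
import math
import os
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy of a bytes object, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    plaintext = b"the quick brown fox jumps over the lazy dog " * 1000
    random_like = os.urandom(len(plaintext))  # stand-in for encrypted data
    print("plaintext:", shannon_entropy(plaintext))
    print("random-like:", shannon_entropy(random_like))
```

Plain text lands well below 8 bits per byte, while the random-like buffer sits very close to it, so by this measure alone it would blend into a cover that is itself noise.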


salam


It doesn't seem to be explicitly associated with Numbers, but with Excel: http://www.it-volpers.net/2011/03/21/disassemble-xlsx/


The file in the OP's article was generated by Numbers, not Excel. The png file in your link is there for a reason, since an image (a logo) was added in an Excel spreadsheet.

So you are mistaken.


That article doesn't support your conclusion. Did you draw that conclusion because the file is named "image1" in both the OP's article and that one?


According to one of the answers, it is an image fill used by Excel. So its purpose is the same, even if the picture is different.


The image1.png file included by Excel is definitely an image that the author used in the spreadsheet, not a piece of cruft. Since he provides a copy of the xlsx file, it's trivial to check:

http://i.imgur.com/j1A5vvC.png


It doesn't happen to me. No strange image gets included when saving the spreadsheet.


Maybe it's a default that the user's set (say as part of a user style)?


Here's something interesting. The image only has 47 unique RGB values in it. 47 is such an arbitrary amount though. It isn't a clean power of two.


The colours go from 16708e and increase all RGB values by one. So the next ones are 17718f, 187290, 197391 etc.
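The pattern described can be written down in a few lines of Python. This sketch assumes the +1 per channel step continues for all 47 values mentioned in the parent comment, which nobody in the thread has confirmed; the function name is made up.

```python
def color_ramp(start_hex="16708e", count=47):
    """List hex colors where each step adds one to every RGB channel."""
    r, g, b = (int(start_hex[i:i + 2], 16) for i in (0, 2, 4))
    if max(r, g, b) + count - 1 > 255:
        raise ValueError("ramp would overflow an 8-bit channel")
    return ["%02x%02x%02x" % (r + k, g + k, b + k) for k in range(count)]
```

`color_ramp()[:4]` reproduces the four values quoted above, and comparing the full list against the set of colors actually present in the PNG would show whether the ramp really is that regular.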


It can't be just a placeholder. If it were, it would be a single color for better compression. Hence the white noise: it keeps the compression ratio low.
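To put rough numbers on the compression point: PNG compresses pixel data with DEFLATE, so zlib gives a reasonable feel for the two extremes. The sketch below compares a solid-color buffer with pure noise of the same size. It skips PNG's per-row filtering, and the 256x256 RGB size is an arbitrary choice for illustration, not the size of the actual image.

```python
import os
import zlib


def deflated_size(raw):
    """Size in bytes after zlib compression at the highest level."""
    return len(zlib.compress(raw, 9))


WIDTH, HEIGHT = 256, 256  # arbitrary illustrative dimensions
solid = bytes([0x16, 0x70, 0x8E]) * (WIDTH * HEIGHT)  # one repeated RGB color
noise = os.urandom(len(solid))  # every byte random

if __name__ == "__main__":
    print("raw bytes:  ", len(solid))
    print("solid color:", deflated_size(solid))
    print("noise:      ", deflated_size(noise))
```

The solid buffer shrinks to a tiny fraction of its raw size, while the noise does not shrink at all. The real image, with only 47 distinct colors, presumably falls somewhere between the two.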


What is that image? Any answers from anyone on HN?


The first comment on the question itself makes the most sense to me.

Edit: Sure, downvote me ;)



