Free, yes, but they come with an energy cost attached (backup and redundant systems, etc.). I wonder how much energy it takes to store it like this compared to a local hard drive and a simple PC or Raspi.
Only if you don't mind them being recompressed and the metadata being deleted. Of course, if you don't mind that, Google Photos would surely be a better choice, since it's specifically designed for this purpose, has unlimited storage for lossy-compressed photos, and leaves photos under a certain size untouched.
I’ve been wondering how Creative Commons applies in ‘big data’-ish use cases. Can a dataset distributed under CC BY-SA be analyzed, possibly used as part of the training input for an ML model? What if a product is built on top of a model that learned from a CC-licensed dataset? Products are rarely distributed under CC; how far do ShareAlike & Attribution reach, by letter and by spirit?
Should there be (or does there already exist) a type of license for data, different from the ones typically used for software source code (MIT, GPL) and the ones typically used for creative work (CC), that encourages innovation but gives something back to the dataset creator or maintainer?
Those are reasonable questions. At work, we release lots of data under OGL (Open Government Licence), which is CC-compatible.
For my personal stuff, if you'd like a different licence, I'm happy for you to pay me for a more restrictive one. But if you build an ML model using my open data, I expect that model to be released under a similar licence.
Didn’t know about OGL; it does look suitable for this purpose.
To (partially) answer myself: contrary to what I implied, CC BY does cover this base if (for example) the creator of the dataset accepts a note in the product’s “About” documentation as sufficient attribution.
It looks like a great dataset for associating power generation with pictures of the sky.
Perhaps it could help decide the best location to place the solar panels? One big picture of the sky and you would get a power-generation estimate for each location based solely on the image.
Perhaps taking several large pictures over the year would help decide the best location on average. Or the location with the best worst-case scenario. Hmmm
I think a machine learning algorithm wouldn't care about that: with a large enough training data set, it would learn to account for those factors and be able to accurately predict energy output from the image alone.
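To make that concrete, here's a minimal sketch of what such a model could look like: a small convolutional regressor that maps one sky image straight to a single power-output number. Everything here (the input size, the layer widths, the choice of PyTorch) is an assumption for illustration, not anything taken from the dataset itself.

```python
# Minimal sketch (assumptions: 224x224 RGB input, target is one
# power-output value, e.g. in kW; PyTorch chosen only for illustration).
import torch
import torch.nn as nn

class SkyToPowerRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),      # collapse to one 64-dim vector per image
        )
        self.head = nn.Linear(64, 1)      # regress a single scalar: predicted output

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = SkyToPowerRegressor()
dummy_batch = torch.randn(8, 3, 224, 224)   # stand-in for 8 sky photos
print(model(dummy_batch).shape)             # torch.Size([8, 1])
```

The adaptive pooling keeps it independent of the exact input resolution; whether a model this simple could actually separate cloudiness from camera behaviour is a different question.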
Regardless of how big the dataset is, the image recognition algorithm is bound to get confused by the large differences in colour that exposure and sensitivity result in. It will likely look for the overall gray-to-blue gradient and estimate results from that; on the gray end of things alone, the camera settings make a very, very big difference. You can't really tell the algorithm to ignore these and only determine the level of 'cloudiness.'
Another issue with this dataset is the overlay changing over time in text content, font, and colour. The algorithm might overfit and conclude that, say, the presence of a yellow font means higher output, simply because output happened to be higher during the period that font was used. You could strip away the text, but then you're introducing potential errors into the dataset yourself.
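Stripping the overlay is mechanically simple; the risky part is deciding what to put in its place. A hypothetical sketch (the crop box coordinates below are invented; the real overlay position and size would have to be measured from the actual frames):

```python
# Hypothetical sketch: blank out the status-text overlay before training.
# The box coordinates are made up; measure the real overlay region from
# the actual images before using anything like this.
from PIL import Image, ImageDraw

def strip_overlay(path, box=(0, 0, 640, 40)):
    img = Image.open(path).convert("RGB")
    ImageDraw.Draw(img).rectangle(box, fill=(127, 127, 127))  # neutral gray patch
    return img
```

Filling with flat gray (or cropping the strip out entirely) removes the font/colour cue, but the patch itself is still a hand-introduced artefact present in every image.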
If each of the 1.2 million tweets includes a ~150 KB image, that’s 180 GB of images hosted on Twitter for free.
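Back-of-the-envelope check, using decimal units and assuming Twitter really keeps roughly 150 KB per image rather than recompressing further:

```python
tweets = 1_200_000
image_kb = 150                              # assumed average size per image
total_gb = tweets * image_kb / 1_000_000    # KB -> GB, decimal units
print(f"{total_gb:.0f} GB")                 # 180 GB
```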