Tangentially, there are even larger public datasets coming out of astronomy. PanSTARRS' imaging survey will make 2 PB available online. They even have a picture [0] of the completed dataset in transit, if you wondered what 2 PB of HDDs on a flatbed looks like.
We're a CERN-connected site and originally 'only' had 10x10G feeds. When the first data started to come in, the network guy was looking a bit worried and said "the plane's coming in to land, and the runway isn't nearly long enough..."
>“Once we’ve exhausted our exploration of the data, we see no reason not to make them available publicly,” says Kati Lassila-Perini, a CMS physicist who leads these data preservation efforts.
1. It's only fair that the collaborations get first dibs on producing results out of their blood, sweat and tears.
2. The collaborations don't want to waste time shooting down the large number of false claims that would inevitably happen if the data were made public immediately.
It's cool seeing technology developed at CERN in the spotlight. There are a lot of interesting tools developed there that can solve real problems outside CERN and academia.
One such technology, featured in the article, is the CernVM File System, which is used to distribute terabytes of scientific software to hundreds of datacenters all over the world.
Given that cheap and disposable trainees — PhD students and postdocs — fuel the entire scientific research enterprise, it is not surprising that few inside the system seem interested in change. A system complicit in this sort of exploitation is at best indifferent and at worst cruel.
Potential missing staff in some areas is a separate issue, and educational programmes are not designed to make up for it. On-the-job learning and training are not separated but dynamically linked together, benefiting both parties. In my three years of operation, I have unfortunately witnessed cases where CERN duties and educational training became contradictory and even conflicting.
- the Management does not propose to align the level of basic CERN salaries with those chosen as the basis for comparison;
- in the new career system a large fraction of the staff will have their advancement prospects, and consequently the level of their pension, reduced with respect to the current MARS system;
- the overall reduction of the advancement budget will have a negative impact on the contributions to the CERN Health Insurance System (CHIS);
"The cost [...] has been evaluated, taking into account realistic labor prices in different countries. The total cost is X (with a western equivalent value of Y) [where Y>X]
Public relations pioneer Edward Bernays refined the creation and use of press releases.
Propaganda was used by the United States, the United Kingdom, Germany and others to rally for domestic support and demonize enemies during the World Wars, which led to more sophisticated commercial publicity efforts as public relations talent entered the private sector. Most historians believe public relations became established first in the US by Ivy Lee or Edward Bernays (he felt this manipulation was necessary in society), then spread internationally. Many American companies with PR departments spread the practice to Europe when they created European subsidiaries as a result of the Marshall plan.
Well, you can "run out" and buy a 180TB 4U Backblaze storage pod assembled for about $10,500. For $21,000 you can buy two and have 60TB to spare. $8,500/$17,000 if you want to DIY. Not too bad:
Yev from Backblaze here -> http://www.backuppods.com/ check those guys out; they'll build one for you, or you can DIY if you're handy. And then you can choose which drives you want to toss in it.
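For anyone who wants to rerun the pod arithmetic above with their own numbers, here is a minimal sketch. The 300 TB dataset size is an assumption inferred from "two pods with 60TB to spare", the function name is made up for illustration, and the calculation uses raw capacity only (no RAID or filesystem overhead).

```python
import math

POD_CAPACITY_TB = 180          # per pod, as quoted in the comment above
POD_PRICE_ASSEMBLED = 10_500   # USD, pre-assembled
POD_PRICE_DIY = 8_500          # USD, build it yourself
DATASET_TB = 300               # assumption: inferred from "60TB to spare"


def pods_needed(dataset_tb, pod_capacity_tb=POD_CAPACITY_TB):
    """Smallest whole number of pods whose raw capacity covers the dataset."""
    return math.ceil(dataset_tb / pod_capacity_tb)


n = pods_needed(DATASET_TB)
spare_tb = n * POD_CAPACITY_TB - DATASET_TB
cost_assembled = n * POD_PRICE_ASSEMBLED
cost_diy = n * POD_PRICE_DIY

print(f"pods: {n}, spare: {spare_tb} TB")
print(f"assembled: ${cost_assembled:,}  DIY: ${cost_diy:,}")
print(f"assembled cost per raw TB: ${POD_PRICE_ASSEMBLED / POD_CAPACITY_TB:.2f}")
```

With those inputs it comes out to the same two pods, 60 TB spare, and $21,000 / $17,000 quoted above. Any real deployment would lose some of that headroom to redundancy.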
The LHC experiments should be sensitive to a wide range of factors. I wonder if randomly correlating every variation in the results from the same conditions could show some unexpected correlations, like between particle path variation and earthquakes (just speculating here, not putting a theory forward).
In keeping with this spirit, here is a reminder of how we monitor (your) CERN activities. We monitor all network traffic coming into and going out of CERN.
Our new analysis infrastructure will be able to cope with the automatic live analysis of about one terabyte of data every day. All this data is stored for one year.
This [1] is apparently the data released. I am no physicist but that page doesn't exactly inspire awe among the curious minded.
They do explain how a couple of undergrads were able to use the data to create something meaningful in the original release article, but that specific site can definitely use a UX designer, or two.
I don't mean it has to be pretty, but that is not even pleasant to look at. I can provide all the useful data in the world, but if its accessibility is low then its value is greatly reduced.
I doubt it makes that much difference in reality. The value is in the data, and since this data is unique and from a single source, I can't see it mattering.
Not arguing against the value of accessibility, but in this case it's a nice-to-have rather than an essential.
The site is clearly designed for people who work in the field, and even then it only took me a moment to find a download link for some data. It even has a workable search function.
I don't think they need to care about UX. The "conversion rate" is probably absurdly low given the need for storage, RAM and CPUs to store and process the data...
"Physicists must sift through the 30 petabytes or so of data produced annually to determine if the collisions have thrown up any interesting physics.
…
The Data Centre processes about one petabyte of data every day - the equivalent of around 210,000 DVDs. The centre hosts 11,000 servers with 100,000 processor cores. Some 6000 changes in the database are performed every second.
The Grid runs more than two million jobs per day. At peak rates, 10 gigabytes of data may be transferred from its servers every second."
Yeah, I don't think it's feasible to release data that can be used to do deep physics.
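To put the quoted figures in perspective, here is a rough sanity check. It is a sketch under stated assumptions: decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes), a 365-day year, and helper names that are purely illustrative.

```python
PB = 10**15              # assumption: decimal petabyte
GB = 10**9               # assumption: decimal gigabyte
SECONDS_PER_DAY = 86_400


def implied_dvd_size_gb(total_bytes=PB, dvd_count=210_000):
    """Size per DVD implied by '1 PB = around 210,000 DVDs'."""
    return total_bytes / dvd_count / GB


def average_rate_gb_per_s(bytes_per_year=30 * PB, days=365):
    """Average sustained rate needed to produce 30 PB in one year."""
    return bytes_per_year / (days * SECONDS_PER_DAY) / GB


def transfer_time_hours(total_bytes, rate_bytes_per_s):
    """Hours needed to move total_bytes at a constant rate."""
    return total_bytes / rate_bytes_per_s / 3600


print(f"implied DVD size: {implied_dvd_size_gb():.2f} GB")
print(f"30 PB/year average: {average_rate_gb_per_s():.2f} GB/s")
print(f"1 PB at the 10 GB/s peak: {transfer_time_hours(PB, 10 * GB):.1f} hours")
```

The DVD comparison works out to roughly 4.76 GB per disc, which matches a single-layer DVD. Even at the quoted 10 GB/s peak, shipping a single petabyte takes more than a day, which is why pulling the full dataset over the public internet is not realistic for most people.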
The logical thing to do is replicate the dataset to the 3 major cloud providers (Amazon, Google and Microsoft) so that anybody can attach their VMs to it with local data access speeds.
[0] https://archive.stsci.edu/mug/mug_2016/PS1_MUG_2016jan14.pdf...