The proteins are chopped into smaller fragments using an enzyme called trypsin...
That's known as bottom-up analysis. It is currently the most popular way to do it, and IMHO it's crap.
You can do intact whole-protein analysis, and that's much better. Higher-end instruments will do both: they start with the intact protein, measure its mass, then break it down and measure the masses of the peptides.
The breaking-down part is not deterministic, but with the mass of the whole protein, the genetic sequence, and a few runs, you can figure out exactly what's what.
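For anyone who wants to see the bookkeeping, here is a rough Python sketch of an in-silico trypsin digest and the mass check described above. It assumes the common simplified rule (cut after K or R unless the next residue is P), uses standard monoisotopic residue masses, and ignores missed cleavages and modifications. The sequence and function names are made up for illustration; this is not what any particular instrument software does.

```python
# Monoisotopic residue masses in daltons (standard values, 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide (H on N-term, OH on C-term)


def tryptic_digest(seq):
    """Split after K or R, except when the next residue is P (simplified rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        if aa in "KR" and i + 1 < len(seq) and seq[i + 1] != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal remainder
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide or protein."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


if __name__ == "__main__":
    protein = "AKRPGKLR"  # hypothetical toy sequence, not a real protein
    peptides = tryptic_digest(protein)
    for p in peptides:
        print(p, round(peptide_mass(p), 4))

    # Each cleavage adds one water, so the peptide masses should add back up
    # to the intact mass once the extra waters are subtracted.
    intact = peptide_mass(protein)
    rebuilt = sum(peptide_mass(p) for p in peptides) - (len(peptides) - 1) * WATER
    print("intact:", round(intact, 4), "rebuilt from peptides:", round(rebuilt, 4))
```

The toy sequence splits into AK, RPGK and LR (the R before P is not cut). The last two numbers agreeing is the consistency check that having the intact mass gives you; a real sample with modifications or missing peptides would not add up this neatly.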
None of that will tell you what the 3D shape of the protein is. That's a whole other ball of wax.
--EDIT--
the initially deposited data had several problems, including incomplete files, proprietary software formats, and screenshots of data displays in software rather than actual data files
I like how Ars is able to make a complex topic more accessible. MS-based proteomics is still young, and it will take a few years for things to stabilize. RNA expression arrays were in much the same state several years ago. The key is to make sure that the variation comes from the biological samples, and perhaps the instrumentation, and not from the methods. Two things the field needs to do are (a) make raw data available so that experiments can be reproduced and (b) use consistent data formats. Standards can/will follow.
One experiment I would like to see is the same lab doing the same experiment multiple times on the same sample on different instruments (of the same make and model). Results should be interesting.
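One simple way to score an experiment like that would be the overlap between the sets of proteins each run identifies. A minimal sketch, assuming each run boils down to a set of protein IDs; the instrument names and IDs below are invented, and Jaccard overlap is just one possible metric:

```python
from itertools import combinations


def jaccard(a, b):
    """Shared identifications divided by total distinct identifications."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(runs):
    """Jaccard overlap for every pair of runs, keyed by (run_a, run_b)."""
    return {
        (x, y): jaccard(runs[x], runs[y])
        for x, y in combinations(sorted(runs), 2)
    }


if __name__ == "__main__":
    # Hypothetical identifications from the same sample on three instruments.
    runs = {
        "instrument_1": {"P1", "P2", "P3", "P4"},
        "instrument_2": {"P2", "P3", "P4", "P5"},
        "instrument_3": {"P1", "P2", "P3", "P4"},
    }
    for pair, score in pairwise_overlap(runs).items():
        print(pair, round(score, 2))
```

If the method were perfectly reproducible, every pair would score 1.0. How far below that the same-sample, same-model runs land is the number I'd want to see.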
The advantage that RNA expression arrays have over mass-spec is that the techniques used are just a highly scaled-up version of the northern and/or southern (depending on the method) hybridization blots that have been in use forever. The corresponding protein technique (western blots) requires individual antibodies, and as such is just not scalable to the same degree.
So, instead we are left with mass-spec solutions that miss a ton of data. Not all peptides can be read on a mass-spec, and if you do get a peptide, it's up to the software to figure out what it was.
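To give a feel for the "up to the software" part, here is a toy sketch of matching an observed peptide mass against a list of candidates within a ppm tolerance. The candidate list, the 10 ppm default, and the function name are assumptions for illustration; real search engines also score fragment spectra, which this ignores.

```python
# Theoretical neutral monoisotopic masses (Da) for a few hypothetical
# candidate peptides. LGK and IGK have identical masses because leucine
# and isoleucine are isomers.
CANDIDATES = {
    "AGK": 274.16409,
    "LGK": 316.21104,
    "IGK": 316.21104,
}


def match_mass(observed, candidates, ppm=10.0):
    """Return every candidate whose mass is within `ppm` of the observed mass."""
    hits = []
    for peptide, theoretical in candidates.items():
        error_ppm = abs(observed - theoretical) / theoretical * 1e6
        if error_ppm <= ppm:
            hits.append(peptide)
    return sorted(hits)


if __name__ == "__main__":
    print(match_mass(316.2112, CANDIDATES))  # two candidates fit equally well
    print(match_mass(300.0000, CANDIDATES))  # nothing in the list fits
```

Even in this tiny case the mass alone cannot tell LGK from IGK, and a peptide that is not in the candidate list simply goes unidentified.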
Also, data analysis in proteomics has been a known issue for many years. There are few to no standards, since it takes so long to run samples. At the same point in the microarray field, the statistical and data analysis techniques were largely worked out. But again, this is due to the 'simplicity' of the measurements. It is just plain easier to measure known RNA or DNA sequences.
That's not to say mass-spec isn't useful, because it is. But I highly doubt it will ever be able to give us a complete picture of the proteome.
It's kinda funny, because we are seeing a shift in the transcriptome world that is now moving toward rapid sequencing approaches that mirror mass-spec a great deal.
I love Ars Technica for articles like this. I just wish they'd stop the front-page crapflooding they've been doing since the acquisition, since it makes the gems so much harder to pick out.
Could anyone please tell me the names of some companies/non-profits researching proteomic sequencing? Preferably ones trying to work out the kinks that the article references? I live in the Boston area and used to analyze genomic data for accuracy and consistency and would love to help with this problem.
Thank you! Their site gives a better idea of what's actually going on in the world of proteomics, and is a great start for finding affiliate companies.
Yep, there are companies that can help you with that: http://www.bioanalyte.com/