Automatic file encoding detection is only a heuristic and cannot, in principle, be exact. But I do not want to criticize the import functionality. I just wanted to say that CSV is an obsolete format and should not be used for data interchange.
Chartio actually will accept TSV files because it will detect most of these settings. It'll detect the delimiter, the presence of a header line, file encoding, and newline character(s).
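For a rough idea of what this kind of detection looks like, here is a minimal sketch using the `csv.Sniffer` class from Python's standard library. It is not Chartio's implementation, which isn't described here; the sample data and the `guess_format` helper are made up for illustration, and the sniffer is itself a heuristic that can guess wrong on small or unusual samples.

```python
import csv

# Made-up sample: tab-delimited, with a header line.
SAMPLE = "name\tcity\tage\nAnn\tLeeds\t34\nBob\tYork\t41\n"


def guess_format(sample):
    """Guess the delimiter and whether the first line is a header.

    Both answers are heuristics from csv.Sniffer, not guarantees.
    """
    sniffer = csv.Sniffer()
    # Restrict the candidates to delimiters we are willing to accept.
    dialect = sniffer.sniff(sample, delimiters=",\t;|")
    return dialect.delimiter, sniffer.has_header(sample)


delimiter, has_header = guess_format(SAMPLE)
print(repr(delimiter), has_header)
```

Encoding has to be settled before this step, since the sniffer works on already-decoded text.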
I agree TSV is a lot nicer, and has the bonus that Excel will open a TSV file with an .xls extension without any problems (great for sharing!).
Back in the '80s, TSV was the standard way to work with data for us - and for many years Lotus, Quattro Pro and Excel (when it came out) could easily save directly to TSV and open TSV without problems.
Don't know why Excel and Windows no longer recognize the TSV extension as a unique file format, but it is easily fixed without having to go through the .xls extension route [which can be a bit of a pain since it requires identifying delimiters every time one opens a file].
The quickest solution I found is a two-line batch file for Windows described here [1]. I've used this solution without issues on multiple computers. TSV is my preferred file format for data work. [I generally analyze data in Python and R and use Excel for looking at results or formatting a pretty version to send to others who prefer Excel.]
Once your CSV (or TSV) files start having quoted fields, they become very tricky to parse using standard multi-purpose tools like sort, awk, & uniq.
It's hard enough when you have delimiters in quoted fields, but dealing with quoted newlines starts to become unreasonable, especially for line-based tools.
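To make the quoted-newline problem concrete, here is a small Python sketch with made-up data. It compares what a line-based approach sees (roughly what sort or awk would work with) against what a CSV-aware parser sees. The variable names are just for illustration.

```python
import csv
import io

# Made-up data: one field contains a comma, another contains a newline.
RAW = 'id,comment\n1,"fine, thanks"\n2,"line one\nline two"\n'

# Line-based view: split on newlines, then on commas.
naive_rows = [line.split(",") for line in RAW.splitlines()]

# CSV-aware view: the parser tracks quoting across commas and line breaks.
parsed_rows = list(csv.reader(io.StringIO(RAW, newline="")))

print(len(naive_rows), "naive rows;", len(parsed_rows), "parsed rows")
print(naive_rows[1])   # the quoted comma splits one field into two
print(parsed_rows[2])  # the quoted newline stays inside one field
```

The line-based view gets both the record count and the field count wrong, which is why this class of file really wants a proper parser rather than generic text tools.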
CSV files, as you say, are absolutely wonderful to create. Problems come up when you try to parse files other people write. Not everyone follows RFC 4180.
Plus you've got encodings. If you're accepting CSVs from users, they'll generally come from Excel, which will produce different encoding in different circumstances.
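One common way to cope is a fallback chain: trust a byte order mark if there is one, then try UTF-8, then fall back to a legacy code page. A minimal Python sketch follows. The `decode_upload` name is hypothetical, and the choice of cp1252 as the last resort is an assumption that suits Western-European Windows installs, not a universal answer.

```python
import codecs
from typing import Tuple


def decode_upload(data: bytes) -> Tuple[str, str]:
    """Return (text, encoding_used) for an uploaded file. Heuristic only."""
    # A BOM is the only strong signal available; check it first.
    if data.startswith(codecs.BOM_UTF8):
        return data.decode("utf-8-sig"), "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16"), "utf-16"
    # Strict UTF-8 rarely succeeds by accident on non-ASCII legacy text.
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumed fallback: Windows-1252. This may be wrong for the user's
        # locale, so undecodable bytes are replaced rather than raising.
        return data.decode("cp1252", errors="replace"), "cp1252"


text, used = decode_upload("café".encode("cp1252"))
print(text, used)
```

The last branch is a guess by construction, so it is worth showing the user a preview before importing.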
The judgement he claims at the start of the essay you point to is probably this: <http://newswire.xbiz.com/view.php?id=136832>. It's worth noting that this is the "pay up or we'll sue you for downloading gay porn" extortion racket, and the $10,401 judgement is a settlement. So his negligence theory probably hasn't been tested in court.
I like how he tosses that out there as something other lawyers have said, and then says that he would only compare it to leaving your keys in your car. As though that negates his sensationalism.
I also wonder what his opinion is on wireless networks that are "secured" by not broadcasting an SSID, or by using WEP, which even the slowest computer can crack in less time than it takes to torrent a movie. Is that still negligent?
Furthermore, if he's in the business of actually making contributory negligence claims, why can't he cite a successful example, instead of relying on analogous case law from the 1930s, involving physical property instead of intellectual property? (The opposing viewpoint cites the Supreme Court on the issues of contributory and vicarious infringement, and the 9th Circuit on a more recent case.)
But this is how arguing about the law so often works. It's not clear what the law should say in a new case, so we make analogies to cases where there's precedent.
The question falls into the general class of problems "If you provide the tools for someone else to break the law, do you have any responsibility when they do?" And that's a huge class of problems, with precedent both ways for various subcases. You are partially responsible if you leave a loaded gun lying on a park bench and somebody shoots someone else with it. You aren't partially responsible if someone breaks your window, picks up a piece of broken glass and goes on a stabbing spree.
It's further complicated, here, by the fact that it's possible to claim "it wasn't me, it was someone else stealing my wifi" when it was, in fact, you.
I know that's how a common law system works, and I'm not saying it's a bad thing. I'm just saying that, for someone who's been involved in hundreds of these suits, it shouldn't be hard to provide a more relevant or more recent analogy, preferably one involving intellectual property or computer crimes, or both. He should also be able to cite the statutes or rulings that provide the legal groundwork for his theory of negligent contributory infringement. (If the SCOTUS has already said that contributory infringement must be willful, what provides the cause of action against an unknowing enabler of infringement?)
His example of negligence in a case where there was a contractual business relationship that was not satisfactorily fulfilled doesn't seem very convincing up against a Supreme Court ruling that contributory infringement must be willful. It sounds more like a way to be on the losing side of a summary judgement.
The fact that the defendant can perjure himself is really never a good reason to invent a new offense.
The question here, though, is not whether people can spell things correctly or correctly identify what state they live in, but whether you can write a computer program that, given what someone claims is their address, reliably breaks it down into parts so that you can, for instance, extract what state they live in.
If you give someone separate fields for 'city' and 'state' you have some chance of figuring out where they are located if they type 'Washington' into one of those fields. If you ask them just for 'city, state', some people will understand that as 'city or state' and just type 'Washington', and you have no idea what they mean. The example I gave before, LA, is an example of a city which shares its name with a state abbreviation.
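As a toy illustration of that ambiguity, here is a Python sketch with tiny hypothetical lookup tables. The tables and the `interpretations` helper are made up for this example and are nowhere near a real gazetteer. The point is only that a single free-text token can legitimately match more than one reading.

```python
# Hypothetical, deliberately tiny lookup tables.
STATE_ABBREVIATIONS = {"LA": "Louisiana", "WA": "Washington", "NE": "Nebraska"}
STATE_NAMES = {"louisiana", "washington", "nebraska"}
CITY_ALIASES = {"la": "Los Angeles", "washington": "Washington, DC"}


def interpretations(text):
    """Return every (kind, value) reading of a single free-text token."""
    token = text.strip()
    found = []
    if token.upper() in STATE_ABBREVIATIONS:
        found.append(("state", STATE_ABBREVIATIONS[token.upper()]))
    if token.lower() in STATE_NAMES:
        found.append(("state", token.title()))
    if token.lower() in CITY_ALIASES:
        found.append(("city", CITY_ALIASES[token.lower()]))
    return found


for value in ("LA", "Washington"):
    print(value, "->", interpretations(value))
```

With separate 'city' and 'state' fields, the field itself tells you which table to consult, and the second reading never comes up.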
An anecdote for you: I was once working with a UK callcentre system which had a form for callcentre staff to update customer contact addresses. UK postcodes are combinations of letters and numbers, and certain letters are excluded from certain positions to prevent ambiguous handwritten letterforms causing confusion. I naively implemented strict postcode validation, thinking it would help catch data entry errors. What actually happened was that we ended up, in a significant number of cases, telling people that the postcode they had been using as part of their address for decades was not, in fact, their postcode: it couldn't be, because it was an invalid postcode. This made the customers quite cross, and we had to change it.
Do you want your validation message to be the one to break it to some eighty year old guy that Nebraska is actually NE, not NB any more?
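One softer approach is to normalize what you recognize and merely flag what you don't, instead of rejecting it. A minimal Python sketch, assuming a hypothetical `normalize_state` helper and deliberately incomplete code tables (only the NB to NE mapping comes from the example above):

```python
# Hypothetical, incomplete tables for illustration only.
LEGACY_STATE_CODES = {"NB": "NE"}      # old abbreviation -> current one
KNOWN_STATE_CODES = {"NE", "LA", "WA"}


def normalize_state(code):
    """Return (value_to_store, warning_or_None). Never rejects input."""
    cleaned = code.strip().upper()
    # Quietly map legacy codes to their current equivalents.
    cleaned = LEGACY_STATE_CODES.get(cleaned, cleaned)
    if cleaned in KNOWN_STATE_CODES:
        return cleaned, None
    # Unknown input is kept exactly as typed and flagged for later review.
    return code, "unrecognized state code; saved as typed"


print(normalize_state("nb"))
print(normalize_state("ZZ"))
```

The customer never sees an error either way. The legacy code is fixed up silently and anything odd is left for staff to review.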
Looking at the supported browsers on http://www.google.com/tools/feedback/intl/en/index.html and some of the clearer bugs in their screenshots (enlarge the picture, check out for example the text underlining and compare with FF/Chrome), combined with what ElliottZ mentioned in this Twitter message https://twitter.com/#!/ElliottZ/status/89520809147772929, I can say with fairly high certainty that they have a very similar approach to doing this. Having worked with this for some time already, I can see why they haven't, for example, supported Opera, although the problem (presumably the reason they don't) can easily be fixed by wrapping text in temporary nodes. One major advantage they have is that they use the script on pages they control, whereas my approach is trying to get this working on any page, regardless of who created it and however bad its CSS/HTML is. If you aren't gonna be using z-index positioning, letter-spacing, CSS3 properties, HTML5 form elements etc., it can be very easy to make a screenshot that matches the page.
For IE<9 the Flash option won't have to be the only option either: a server could render the elements gathered from the user's browser (in other words, a canvas proxy for the canvas renderer).
In terms of my script, the aim is to try and get the compatibility down to FF3, IE6 (with non-canvas support through flashcanvas or server canvas), Opera (haven't looked much yet at how old a version would be able to support it), and Chrome.
The title is a bit misleading at first glance: each company is setting records against previous quarters of the company itself, not compared to every other company in the US.
Honestly, this doesn't seem like big news: each company is undergoing growth and it makes sense that they're spending more on a lot of things, including lobbying.
Now at $2M/quarter and with nearly 500 congress critters, that means they are investing about $4,000 per member each quarter. I've always felt the PAC expense would be naturally limited by how much money could be moved into the system. Given total spending by PACs [1] on various causes, I expect any member of congress can populate their re-election fund with anywhere from $1M to $8M/year by 'listening' to the PACs' concerns.
On paper, universities disallow helping others cheat. In reality, it doesn't seem to be enforced.
If the helper student has already passed the class, the professor can't take disciplinary action against them without making a major case about it. It's easy to simply fail the cheating student or give them a zero on the assignment without making a record.
Additionally, if it is brought to a committee, it'd be much harder to prove.
So although it's officially not allowed, there's not much universities are doing to combat the providers.
They are not publishing their own source.
If you're running a small, open source project, this seems like an easy way to manage deployments. Kudos to them for a free plan.