Knowing the command line is one of those unsung critical skills of being a software engineer. It doesn't really matter so much which flavor of command line you're using (though bash is great for obvious reasons), but the ability to glue disparate commands together through pipes is one of the most efficient ways of getting throwaway work done. Heck, just knowing IFS=$'\n' and how to write a for loop will get you a long way.
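A minimal sketch of what I mean (the grep pattern and paths are just placeholders):

```bash
#!/usr/bin/env bash
# Split only on newlines, so lines containing spaces stay intact.
IFS=$'\n'

# Placeholder example: loop over grep hits one whole line at a time.
for line in $(grep -r 'TODO' --include='*.sh' .); do
    echo "found: $line"
done

# Put IFS back to its default when you're done.
unset IFS
```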
> Knowing the command line is one of those unsung critical skills of being a software engineer.
Yep, the command line is what lets you solve "We have 178 CSV dumps of tables totaling ~60 GB of data and we want them imported into a SQL database. There's no previous DB schema info, here's a zip file of questionably named CSVs, can you have this done in 2 days?". Meanwhile there are 8,000+ columns of data that are strings, booleans, datetimes, etc., and some of the files are 15 GB each.
It didn't take too much shell scripting to solve that problem in a way where you can run it against a directory of CSV files and have it produce SQL files with the table schemas to create, plus the SQL to efficiently import each CSV. Basically a little bit of shell scripting and tools like find, head, sed, grep, wc and friends. It took 4 hours to solve the problem in a way that was testable.
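Roughly the shape of that pipeline, as a sketch (paths and the PostgreSQL-style \copy are my assumptions, and everything defaults to TEXT where real type inference would sample rows):

```bash
#!/usr/bin/env bash
# For every CSV in a directory, read the header row and emit a CREATE TABLE
# plus an import statement into a per-table SQL file.
for csv in /data/dumps/*.csv; do
    table=$(basename "$csv" .csv | tr -cd '[:alnum:]_')

    # First line = column names; strip CRs and sanitise the names.
    cols=$(head -n 1 "$csv" | tr -d '\r' | tr ',' '\n' | sed 's/[^A-Za-z0-9_]/_/g')

    {
        echo "CREATE TABLE ${table} ("
        echo "$cols" | sed -e 's/$/ TEXT,/' -e '$ s/,$//'
        echo ");"
        echo "\\copy ${table} FROM '${csv}' WITH (FORMAT csv, HEADER true);"
    } > "schema_${table}.sql"
done
```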
I would have reached for Python/Perl at that point; just because something can be done with bash and sed doesn't mean it should (not a diss, you got it done, so you were right).
Yep, I recently did something similar with data that wasn't updated properly and programmatically generated thousands of targeted SQL statements to fix our database. A little bit of Python and a lot of shell scripting.
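The general pattern looks something like this (the table, column and file names here are invented):

```bash
#!/usr/bin/env bash
# Hypothetical fix-up: a tab-separated file of (id, corrected_value) pairs
# gets turned into one targeted UPDATE per row, written to a reviewable file.
q="'"
while IFS=$'\t' read -r id value; do
    # Escape single quotes by doubling them so the generated SQL stays valid.
    safe=${value//$q/$q$q}
    printf "UPDATE orders SET status = '%s' WHERE id = %s;\n" "$safe" "$id"
done < corrections.tsv > fix_orders.sql
```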
One of my biggest moments was figuring out you could run SQL from the command line, using environment variables in the query string.
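Something along these lines, assuming psql (the database, table and variable names are made up):

```bash
#!/usr/bin/env bash
export REGION='EMEA'

# Quick and dirty: let the shell expand the variable inside the query string.
psql -d sales -c "SELECT count(*) FROM orders WHERE region = '${REGION}';"

# A bit safer with psql specifically: hand it over as a psql variable, which
# :'region' then quotes for you.
psql -d sales -v region="$REGION" <<'SQL'
SELECT count(*) FROM orders WHERE region = :'region';
SQL
```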
Long time unix sysadmin here, now working as a dba.
It amuses me quite a lot to see how my code changes when I'm thinking more with my "dba hat" or my "unix hat".
The unix hat will make much more use of embedded environment variables and shell constructs for loops/repetition, while the DBA hat leans towards tally tables, sometimes two-phase SQL generation, or stored procedures.
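A made-up reindex chore shows the difference between the two hats (assuming psql; the table names are invented):

```bash
#!/usr/bin/env bash
# Unix hat: a shell loop with an embedded variable per iteration.
for t in orders customers invoices; do
    psql -d sales -c "REINDEX TABLE ${t};"
done

# DBA hat: two-phase SQL generation, where a query against the catalog writes
# the statements and a second pass executes them.
psql -d sales -At -c \
  "SELECT format('REINDEX TABLE %I;', tablename)
     FROM pg_tables WHERE schemaname = 'public';" > reindex.sql
psql -d sales -f reindex.sql
```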
It does make my scripts inconsistent as all hell though, which is irritating, but the shell approaches usually flow more easily from my fingers, while the pure SQL ones will usually have the edge in runtime performance, shifting back to shell if I need to chunk up a job for parallel processing.
A recent fun one for me was realising that I could lean on the POSIX shell's printf feature of repeatedly reusing the format string to consume parameters when there are more parameters than format placeholders. I can actually pass thousands of lines of SQL output to printf and have printf reformat it. Some relatively easy scripting with "split" to chunk up the db output leaves me with something that only runs about 2x slower than raw output from the db (in ksh), but gives lots of flexibility on things like adding thousands separators to numbers etc, something our db platform is poor at.
Given that most of the line-by-line iteration is happening inside printf, not in a script for/while loop, it's really quite quick. Haven't benchmarked it against anything yet though, it's still a toy I wrote while waiting on something else, but it's got potential.
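Roughly the trick, with made-up values (the thousands grouping via the ' flag needs a locale that defines a grouping character, e.g. en_US.UTF-8):

```bash
#!/usr/bin/env bash
# printf recycles its format string until all arguments are consumed, so one
# format can reshape a whole column of output.

# Stand-in for db output: alternating "name amount" values.
set -- widgets 1234567 gadgets 7654321 gizmos 999

# Two placeholders, six arguments: the format is applied three times.
printf "%-10s %'15d\n" "$@"

# Feeding a real output file the same way would look like:
#   printf "%-10s %'15d\n" $(cat report.txt)
```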
Judicious use of the old ascii separator codes rather than commas/tabs/pipes/whatever makes the field splitting much easier to deal with too.
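For example, with the unit separator (0x1f) and invented data:

```bash
#!/usr/bin/env bash
# With the ASCII unit separator between fields, commas, tabs and pipes inside
# the data no longer matter for splitting.
US=$'\x1f'

# Produce a record whose fields contain awkward characters.
printf 'Smith, John%sacme|corp%s1,234\n' "$US" "$US" > record.txt

# Split it back apart on the unit separator alone.
while IFS="$US" read -r name company amount; do
    echo "name=[$name] company=[$company] amount=[$amount]"
done < record.txt
```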
Embedded newlines remain a pain though. For now each chunk of lines gets passed through a perl fragment to turn them into literal \r or \n strings.
You have to quote and unquote parts of the awk command appropriately, so that some parts of the command are processed directly by awk, and others by the shell (first).
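For example (the column number and file name are made up):

```bash
#!/usr/bin/env bash
# The single-quoted parts reach awk untouched; the double-quoted part in the
# middle is expanded by the shell first, so awk ends up seeing "print $3".
col=3
awk -F, '{ print $'"$col"' }' data.csv

# Often cleaner: hand awk the value as a variable instead of splicing quotes.
awk -F, -v c="$col" '{ print $c }' data.csv
```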
Also, CommandLineFu [1] has some halfway decent examples for those who learn better from examples. It may be best to test the examples in a throwaway VM, as things change and environments may differ.