One thing that was not obvious to me for a long time: the stricter your language’s formatting is, the easier it is to grep the source code.
I work a lot with Go, where all code in our repository is gofmt'ed. You can get quite far with regular expressions for finding/analyzing Go code.
(And when regexps don’t cut it anymore, Go has excellent infrastructure for working with it programmatically. http://golang.org/s/types-tutorial is a great introduction!)
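To make that concrete, here is a small sketch of my own (the name FooBar, the flags, and the assumption of GNU grep are all mine, not from the comment): because gofmt puts every top-level declaration at the start of a line, a plain regex reliably finds both function and method definitions.

    # gofmt guarantees "func " starts in column 0 for top-level declarations,
    # so this finds functions and methods named FooBar (GNU grep assumed):
    grep -rn --include='*.go' -E '^func (\([^)]+\) )?FooBar\(' .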
I think it's because there are fewer potential substrings to check for matches, since most of the characters you add to a regex also add to the minimum length of the strings it can match.
That's not true. They work just fine. Typically, substring search algorithms are implemented at the encoding level, e.g., on UTF-8 directly. If you just treat that as an alphabet of size 256, then algorithms like Boyer-Moore work out of the box.
But the skip-ahead stuff isn't the most important thing nowadays. The key is staying in the fast vectorized skip loop as long as possible.
I noticed severe slowdowns when passing in the /u flag on my regexes, even with big fixed ASCII strings in the middle of the patterns. They were taking 10 times as long to complete.
That doesn't imply that things like Boyer-Moore suddenly stop being effective. Without more details (which regex engine? what regex? what corpus? which programming language?) it's impossible to state the cause, but it could be as simple as the regex engine not being smart enough to use a literal searcher in that case.
Related to this, it is generally a very good idea to be strict when naming functions, parameters, variables, etc. so that each concept has exactly one name throughout the codebase.
I encountered a similar problem in a C++ codebase and the debugging logs it produced. There was an error being reported in the logs as "weak ptr expired" or something like that. I grepped the whole source code (a gigantic project) for it. No results. Going back and forth several times. Feeling stupid beyond imagination. Then I copy-pasted what was actually printed in the logs into my grep query (previously I had been typing it in manually). It quickly found a match. Turns out someone had written "weak ptr" as "week ptr". Everyone on the team had a good laugh.
But how do you effectively organize/enforce this for a code base of several million LOC where geographically distributed teams are working on different ends of the system all the time?
The amount of cross team coordination is staggering.
Fail verification in CI if the change doesn’t pass your checks. You can check anything, such as whether it has a duplicate name already in the codebase.
It wouldn’t be something provided by the CI tool, you’d have to write the test yourself. At the end of the day it’s just another test, albeit a more complex one than a standard unit test.
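As a hedged sketch of what such a check could look like (the "userId"/"userID" names and the *.go glob are made up for illustration, not anything from this thread):

    #!/bin/sh
    # Fail the build if a second spelling of an already-named concept appears.
    # Here the team is assumed to have standardized on "userID".
    if grep -rn --include='*.go' 'userId' . ; then
      echo "Found 'userId'; use 'userID' for consistency." >&2
      exit 1
    fi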
Don't use grep. Use ag[0], which is specifically designed for searching code. It's much faster, honors .gitignore, and the output can be piped back through grep if you like.
ag FooBar | grep -v Baz
It's in brew/apt/yum etc as `the_silver_searcher` (although brew install ag works fine too).
Same experience here, started with ack and switched to ag and then to rg for speed. I've found them roughly equivalent in functionality, but for those who need specific features here's a link to a feature comparison table:
Thanks for ripgrep, I use it daily and was recently going through the source code to learn how to build production quality Rust apps! (https://github.com/BurntSushi/ripgrep)
Yes, this is true, although ripgrep is more of a hybrid than ag is. ag has numerous problems with being treated as a normal `grep` tool, whereas ripgrep does not. (Although, to be clear, ripgrep is not POSIX compatible.)
It's not that much faster in raw search speed (as in, not "over 50% faster"). It's faster to invoke, since you have much less to type to scan recursively with an ignore list. However, I find that in many projects .gitignore is too extensive, because it includes generated code, which is often quite informative. Then it's still nice to use those alternative grep-likes, but not by much. Besides, when you can't install things easily, it's hard to beat something that's already there and everywhere else.
ag provides sane defaults and settings for developers. grep is ubiquitous and great, but to do what most developers want it requires some guidance, whereas ag focuses on being what you want most of the time.
What I mean by that is that I enjoy the smart case sensitivity (as in, if there are no caps in my pattern it defaults to case-insensitive, but if there are any caps it matches case-sensitively), the fast filename searching with -g, or both at once with -G.
I've tried rg and while it was faster, it also didn't provide as good support for filename searching. Ag is still what I consider "so fast I almost don't believe it"
It's like the 'tldr' command (https://tldr.sh/). Of course I still use man pages, but having something that gets me what I really need very quickly is important.
Steps:
1. Use ag
2. If ag isn't present, try to install ag
3. If I can't install ag, then use grep or find, no big deal.
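In script form, that fallback might look roughly like this (a sketch; the grep flags are just my own defaults):

    # Prefer ag when it's installed, otherwise fall back to plain grep.
    search() {
      if command -v ag >/dev/null 2>&1; then
        ag "$@"
      else
        grep -rn "$@"
      fi
    }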
> it also didn't provide as good support for filename searching
Could you elaborate on this? Is it because you need to type more? If so, I'd suggest one of two things. 1) use `fd` for searching for files, which is dedicated to that purpose. 2) define `alias rgf="rg --files | rg"` (or similar) and use `rgf foo` just like you would use `ag -g foo`.
I have stopped using the AST-integrated searching in VS Code for Go. Instead of clicking through to a declaration, it is now faster in my larger codebase (especially one that uses interfaces a lot) to just search for substrings. The AST search still works most of the time, but sometimes it fails, and it is usually just plain slow.
The ability to easily grep for functions in C-like code is why I've come to appreciate projects defining their functions like:
int
foo_func(void) {
You can grep for `^foo_func\b` to get to a declaration or definition, or `^foo_func\b.* {$` to get to a definition or `^foo_func\b.* ;` to get to a declaration. This is instead of using something like `^\w.* \bfoo_func\(`, which is what you'd need for:
int foo_func(void) {
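For example, against a tree laid out in the first style, these searches work (GNU grep assumed for \b; 'src/' is a placeholder, and the declarations-only pattern assumes declarations also put the name at the start of the line):

    grep -rn '^foo_func\b' src/            # declarations and definitions
    grep -rn '^foo_func\b.*[{]$' src/      # definitions only
    grep -rn '^foo_func\b.*;' src/         # declarations only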
By the way, anyone know of a way to insert a literal asterisk here without having to follow it up with a space?
> If not, the reviewer can quickly dismiss it as a false positive
This is where you could be wrong. We would need to give a reason for dismissing it, and then the risk officer would need to approve it (or reject it). False positives can be a real pain in the ass.
The post's core message seems to be lost on HN. It's about screening sources for supposedly insecure and/or injection-prone funcs using simple text scanning (such as strcat, which is still worth flagging in iOS apps even though it is a C std API func); supposedly, greppability is also about quickly finding the code locations of messages and variables. But the comments are all about Rust or Go superiority, irrelevant grep implementation details, and AST-based code analysis tools, when those are specifically dismissed in TFA as producing too many false positives. Talk about bubbles and echo chambers.
I do a few VERY SIMPLE greps. The most useful is a pre-commit hook to check that no blacklisted env vars exist in the commit diff. So, useful.
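Something along these lines, presumably (a sketch with made-up variable names; the actual hook isn't shown here):

    #!/bin/sh
    # .git/hooks/pre-commit: reject the commit if the staged diff adds a
    # blacklisted env var (the names below are examples).
    blacklist='AWS_SECRET_ACCESS_KEY|INTERNAL_API_TOKEN'
    if git diff --cached -U0 | grep -E "^\+.*($blacklist)"; then
      echo "Commit rejected: blacklisted env var found in the diff." >&2
      exit 1
    fi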
Grepping leans into shell. Though if you have other environments available (Python, JavaScript, etc.), it makes sense to lean into them, e.g. I use JavaScript to examine my package.json to ensure my dependency SemVers are "exact".
That said, I rarely write static-analysis scripts: In JavaScript-world there is already a plethora of easily configurable linting & type-checking tools. If I wanted to focus in on static-analysis etc I'd probably reach for https://danger.systems/js/
Side note: My CI generates a metrics.csv file, which serves as a "metric catch-all" for any script I might write, e.g. grep to count "// TODO" and "test.skip" strings, plus my JavaScript tests generate performance metrics (via monkey-patching React).
I don't actually DO ANYTHING with these metrics, but I'm quite happy knowing the CI is chugging away at its little metric diary. One day I'll plug it into something.
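For what it's worth, the grep side of such a step can be as small as this (a sketch; 'src/' and the metric names are assumptions):

    # Append a couple of grep-derived numbers to the catch-all metrics file.
    todo_count=$(grep -rn '// TODO' src/ | wc -l)
    skip_count=$(grep -rn 'test\.skip' src/ | wc -l)
    echo "todos,$todo_count" >> metrics.csv
    echo "skipped_tests,$skip_count" >> metrics.csv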
While there's nothing to install and initial results appear quickly, the false positives that grep or any string search tool generates will make the cynics shoot down this simple attempt to find problems in the source code.
Problems that arose:
- what about use of those questionable APIs/constants in strings (perhaps for logging) or in comments?
- some of the APIs listed in the article were only questionable when certain values were used - sometimes you can get grep/search tool of choice to play along, but if the API call spans multiple lines or the constant has been assigned to a variable that is used instead, then a plain string search won't help.
- it's hard to ignore previously flagged but accepted uses of the API/constants (one possible workaround is sketched after this list).
- so there's a possible bug reported, but devs usually want to see the context of the problem (the code that contains the problem) quickly/easily. Some text editors can grok the grep output and place the cursor at the particular line/character with the problem, some can't.
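On the third point, one crude workaround is a baseline file of already-reviewed hits, so only new occurrences are reported (the file names and patterns here are assumptions, and line numbers are deliberately omitted so the baseline stays stable across edits):

    # Collect current hits, then show only those not already in the baseline.
    grep -rHoE 'strcat\(|sprintf\(' src/ > findings.txt
    grep -vxFf baseline.txt findings.txt || echo "no new findings"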
If you go down that road to try and reduce false positives, you'll end up with a parser for your development language of choice.
I haven't tried this approach, but having spent years using one of the best commercial SAST tools, I'm reluctant to dismiss it too quickly.
My SAST generates tons of false positives and is unforgivably slow. If this is orders of magnitude faster, it might be worth the extra false positives.
As a side note, my dream is a SAST that comments directly in the PR like a human reviewer would. Maybe that exists?
The SAST program is probably doing a lot more than a string search tool does.
If the SAST has to process C/C++ source code, then the SAST will parse all the #include'd header files. The SAST may track values to determine if illegal/uninitialized values are used.
A string search tool will skip doing all of that.
If the class of problems you're looking for contains only bad functions/constants, then a string search tool may be fine.
But as I mentioned before, the string search tool may get confused if these bad strings occur in strings/comments/irrelevant #if/#else/#elif sections.
There is another class of bugs, dealing with data values, which a string search tool can't handle easily.
As an example, PC-Lint lists the type of problems the program may flag - https://www.gimpel.com/html/lintchks.htm.
A string search tool won't know about classes and virtual destructors or other concepts relevant to the programming language in question.
With a string search tool, you'd either invoke it several times with different search strings over the same source code or, slightly more efficiently, use one long pattern containing all your search strings as alternate search targets.
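For instance, with grep the two variants look like this (the function names and paths are placeholders):

    # One long pattern with alternates:
    grep -rnE 'strcat\(|strcpy\(|sprintf\(|gets\(' src/
    # ...or keep the search strings in a file, one per line, and pass it with -f:
    grep -rnf banned-patterns.txt src/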
In either case, when the string search tool spits out a positive result, it won't explain why there is a problem. The dev will have to know, or look up, the problem associated with that search result.
When I worked in this area, C/C++ compilers stopped at syntax errors. Most have since gotten better at flagging popular problems like variable assignments within if statements, operator precedence bugs, and printf format string bugs.
Some divisions at Microsoft required devs to run a lightweight SAST before committing changes to locate possible problems ASAP.
It's relatively easy to integrate an SAST into your build system to scan the modified source code just before you're ready to commit the changes.
I tend to work a lot in Lisp and XML, both of which are more or less trees if you squint (Lisp syntax famously being the AST, thanks to homoiconicity), and it always makes me wonder whether there are better command-line tree search or tree diff algorithms out there (extra awesome if they work with git merge strategies). I mean, whitespace preference is fine and all, but sometimes you just don’t care :p
They don't \0-terminate the target on overflow, so you still need to test for that condition. So most people will have a wrapper around those to ensure the \0 is there.
strlcpy has the braindamage that it returns the length of the source buffer, which means it has to traverse the entire buffer to figure out the length.
If you want to copy out the first line from a buffer that happens to be a 10TB mapped file, that strlcpy call will take a long time to finish. If you are using strncpy/strlcpy because you don't trust the src buffer is properly null terminated but you still want to stop the copy at the first null or when the buffer is full, well, you're out of luck because strlcpy is going to blast past the end of the source buffer regardless.
I would have been much happier if it had just returned a flag indicating either successful copy (0), buffer was truncated (1), or an error occurred and errno was set (-1). Possible errors could be that the src or dest was NULL or the size was 0 (ERR_BAD_ARGUMENT).
In addition to what unilynx mentions about strncpy(), the size arguments are also, effectively, the remaining space in the destination buffer, not the entire space in the destination buffer.
So, you have to figure that out. It isn't hard (hell, it's trivial), but I think you're either going to be aware of the pitfalls (and then these functions are mostly not going to help you) or you're not, in which case you're just as likely to pass the wrong value for the size (dest's size vs. src's size) and overflow the buffer anyway.
Honestly, if I had to do more than a trivial amount of string manipulation in C, I'd be wrapping it in a mini library to manage some sort of stronger string type, or finding such a library (glib? ICU?) very quickly, depending on needs. std::string was one of the things in C++ that made me question why anyone was still using C, given how much less error-prone it is, comparatively. (std::string is not without problems; I only mean as compared to char * in C.)
Random idea: maybe you could supercharge this by introducing to grep some constructs from programming languages. Right now you have things like "word character", "whitespace", and "start of line"; in the supercharged version you would also have "function", "identifier", and so on.
It's very very fast / almost instant even with hundreds of source code files and millions of lines of code.
I hit Ctrl+T and can then search everything; this gives me a dropdown that filters as I type. Select the item in the dropdown and it goes to that source file.
I can also type:
/t and search just types
/m members
/mm methods
/u unit tests
/f file
/fp project
/e event
/mp property
/mf field
/ff project folder
e.g.
/t Foo
will find all the Foos
/mm SavePhoto
will find any methods called SavePhoto
Same works in JetBrains Rider for C# stuff.
I couldn't dev without this now, and it's all built into my IDE.
Just a small note that I would highly recommend ripgrep[0] over standard grep. It's another modern tool created by leveraging Rust, and it's from BurntSushi[1], who is excellent.
GNU grep is fast as well, but by default it doesn't ignore anything. Granted, it's handy to have this configured out of the box, but I prefer to know and use the flags, and perhaps write a shell script wrapper, and I find that in practice it feels just as fast as rg or the others. My main point in doing this is to avoid the situation where I'm on a machine other than my laptop and want to get going right away without having to install anything. There's a trade-off in all things; I just prefer it this way.
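Such a wrapper can be as small as this (a sketch; the excluded directories are just examples):

    #!/bin/sh
    # A thin grep wrapper: recursive, line numbers, skip binary files and a
    # couple of directories that are rarely worth searching.
    exec grep -rn --binary-files=without-match \
         --exclude-dir=.git --exclude-dir=node_modules "$@"

Saved somewhere on PATH as, say, `g`, running `g foo .` gives roughly the defaults the smarter tools ship with.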
Portability is key. There’s a reason these gnu tools have such staying power. It’s not necessarily because they’re the best, but because they’re ubiquitous.
Grep is really fast at the actual search (GNU grep at least); the gain there is mostly that "smarter" tools will ignore e.g. VCS data or binary files by default, whereas grep will trawl through your PNGs and git packfiles.
Excerpt: "The result of this is that, in the limit, GNU grep averages fewer than 3 x86 instructions executed for each input byte it actually looks at (and it skips many bytes entirely)."
> much of Mike Haertel’s advice in this post is still good. The bits about literal scanning, avoiding searching line-by-line, and paying attention to your input handling are on the money.
Terrible, terrible advice. Cumbersome, error-prone, and slow as molasses. A quick test: Searching for 'asdfadsgf' in the Linux kernel repository takes 0.25 s using rg, 12 s using GNU fgrep -f, and 228 s (!) using your command.
You know, when your ideology results in the worst results of all, you should really reconsider your ideology.
.. which falls over as soon as you have a file with a space in the name.
Edit: this highlights the big weakness in the "UNIX philosophy": the only record delimiter conventionally recognized in pipelines is the newline, yet the shell treats characters that are perfectly legal in filenames as filename delimiters, causing a cascade of delimiter bugs. Sometimes you really do need a bit more structure to your data.
(The UNIX philosophy is best understood in contrast to what went before - the COBOL or JCL style where files had fixed records, in turn based on fixed-column punchcard layouts.)
It's easy to unsafely handle filenames. I've seen it at my job where we do a lot of bash scripting. However, there are good guides on doing it the right way. [0]
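One standard technique (whether or not it's the one in that guide) is null-delimited filenames, roughly like this (bash shown; the paths and pattern are placeholders):

    # find -print0 / xargs -0 keep filenames intact even with spaces or newlines:
    find src/ -type f -name '*.c' -print0 | xargs -0 grep -n 'foo_func'
    # The same idea for a shell-side loop over files:
    find src/ -type f -print0 | while IFS= read -r -d '' f; do
        printf '%s\n' "$f"
    done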
Powershell falls over in the other direction: the objects flowing down the pipeline are "magic" and can't be serialised, or even necessarily inspected with normal tools. For most unix operations you can replace
foo | sort
with
foo > file
sort < file
I like the idea of powershell, but every time I try to do something complicated with it I'm disappointed.
By the way, PowerShell serialization is orders of magnitude better than anything *nix has to offer, as you can use objects from another machine's shell just as if they existed on your local one.
> I like the idea of powershell, but every time I try to do something complicated with it I'm disappointed.
I did some very complicated things in PowerShell. For example, check out the script that keeps ~300 mainstream packages on Chocolatey up to date, all in a few minutes, with a bunch of self-maintenance features.
I suppose this is a good demonstration of the best way of getting a right answer being to post a wrong one, as I spent a long time trying to work out how to do this last time I needed it.
rg searches my repo in under a second. Your command takes over a minute, with a warm cache. It also takes thirty seconds to type. And it doesn't work for files with a space in them. Even on single files rg is faster for me than fgrep. For example my tags file is 2GB large. rg takes 0.4 seconds to search it, fgrep takes 0.6 seconds.
Both grep and rg will do literal search with all the whizbang optimisations they could think of if they're given a literal (a string with no metacharacters) or the -F option (to not interpret the input as a regex).
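For example (a pattern chosen so the difference is visible; 'src/' is a placeholder):

    # -F forces literal matching, so '.' and '(' are not treated as regex syntax:
    grep -rnF 'foo.bar(' src/
    rg -nF 'foo.bar(' src/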
Like a few other tools (ack, ag, and pt), it's specialized for searching source code; in addition, it's really fast. The repo contains detailed comparisons with grep and an FAQ.
In general I think that's very good advice. In this particular instance, however, a dumb old grep might be superior, because it could catch potential security vulnerabilities that are not explicitly hardcoded in the source code, for instance by grepping through the compilation artifacts. Sure, you'll get a bunch of false positives that way, but at least you know that nothing is slipping through the cracks.
To be clear, you can disable all smart filtering in ripgrep. e.g., `rg -uuu foo` should be equivalent to `grep -r foo`. And if you want to exhaustively search binary files, then you need to add the `-a` flag to both commands.
It "reinvented" it and made it dramatically faster. Rust is also available on FreeBSD. I don't think you actually need Rust to run ripgrep. It's not like it's an interpreted language.
As for the dramatically faster, the ripgrep author doesn't claim this. What he claims (and supports with benchmarks) is the obverse, that there are no other tools dramatically faster than ripgrep.
Basically the stated goal is to be "fancy" like ack etc. and yet remain as fast as good ole' grep.
You obviously haven't tried it on either of those. They are "second tier", which means one is completely on one's own.
What in your opinion would have to be the size of the source code to warrant jumping through the hoops to get this software running, as opposed to a combination of find + xargs + egrep,fgrep,awk?
Except tools like ripgrep aren't equivalent to find + grep. There is no simple invocation of find + grep that does what ripgrep does automatically. `git grep` would be closer.
You talk about being antagonized, but many of your comments in this thread have stated either outright incorrect things, or moved the goalposts, without acknowledging either one even when others point it out. Talk about infuriating.
Just because a misguided person such as myself wrote a piece of software doesn't mean you get to be rude to everyone who talks about it or suggests it.
Sorry, replying to you while responding to the parent because their post is already flagged.
Annatar, both the Rust compiler and ripgrep are available as packages in pkgsrc. The number of hoops one needs to jump in order to use this tool on your niche platform is exactly one. And that hoop is not even on fire.
Keep moving those goal posts though. Hopefully you can move them far enough to keep the Venn diagram of your mistruths and people who recognize them completely disjoint.