
I did an extensive comparison a while ago: https://blog.burntsushi.net/ripgrep/ --- It should still largely be pretty accurate (and ripgrep has only gotten faster).

More generally, if someone can find a non-trivial example of ag being faster than ripgrep, then I'd love to have a bug report. (Where non-trivial probably means something like "not I/O bound" and "not so short that the differences are human imperceptible noise.")




I'm currently working on a searcher on my own: https://github.com/elsamuko/fsrc

When I started, I didn't know about ripgrep; now I use it as a reference. Of course fsrc is still slower for regex searches and has fewer options, but in some cases (e.g. simple string matching), it is faster than rg (PM_RESUME in 160-170ms), mostly thanks to mischasan's fast strstr: https://mischasan.wordpress.com/2011/07/16/convergence-sse2-...

If you want, let me know what you think about it.


I don't see any build instructions, so I don't know how to try it. Sorry. I did run `./scripts/build_boost.sh`, but that didn't produce any `fsrc` binary that I could use.

I would also caution you to make sure you're benchmarking equivalent workloads.
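
As a rough illustration of what checking for equivalent workloads could look like, here is a minimal Python sketch that runs a command several times, keeps the fastest wall-clock time, and counts the output lines the way `| wc -l` would. The function name `best_time_and_lines` and the default of five runs are assumptions made for this example; nothing here comes from either tool's own benchmark scripts.

```python
import subprocess
import time


def best_time_and_lines(argv, runs=5, cwd=None):
    """Run argv `runs` times; return (fastest wall time in seconds, output line count).

    stdout is captured through a pipe, similar to piping into `wc -l`,
    so terminal rendering does not skew the timing.
    """
    best = float("inf")
    lines = 0
    for _ in range(runs):
        start = time.perf_counter()
        result = subprocess.run(
            argv,
            cwd=cwd,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            check=False,  # grep-like tools exit non-zero when nothing matches
        )
        best = min(best, time.perf_counter() - start)
        # Count newlines, which is what `wc -l` reports.
        lines = result.stdout.count(b"\n")
    return best, lines
```

Calling it with `["rg", "-uuu", "PM_RESUME"]` and `["/tmp/fsrc", "PM_RESUME"]` from the same directory would give one timing and one line count per tool. If the line counts differ, the tools are probably not searching the same set of files, and the timings are not directly comparable.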


There are no build instructions yet; you need to build boost with build_boost.sh and then open qmake/fsrc.pro with Qt Creator. There are binaries available here, too: https://github.com/elsamuko/fsrc/releases

And I know that benchmarking is hard; a coarse comparison is in scripts/compare.sh. More detailed performance tests are in test/TestPerformance.


I don't know what Qt Creator is. Please provide tools to build your code from the command line.

I did some playing around with your binary, but it's pretty hard to benchmark because I don't know what your tool is doing with respect to .gitignore, hidden files and binary files. Your output format is also non-standard and doesn't revert to a line-by-line format when piped into another tool, so it's exceptionally difficult to determine whether the match counts are correct. Either way, I don't see any evidence that fsrc is faster. That you're using a fast SIMD algorithm is somewhat irrelevant; ripgrep uses SIMD too.

On my copy of the Linux checkout (note the `-u` flags passed to ripgrep):

    $ time /tmp/fsrc PM_RESUME | wc -l
    41

    real    0.143
    user    0.330
    sys     0.474
    maxmem  67 MB
    faults  0

    $ time rg -uuu PM_RESUME | wc -l
    17

    real    0.149
    user    0.564
    sys     0.690
    maxmem  13 MB
    faults  0

    $ time rg -uu PM_RESUME | wc -l
    17

    real    0.112
    user    0.481
    sys     0.675
    maxmem  13 MB
    faults  0

    $ time rg -u PM_RESUME | wc -l
    17

    real    0.118
    user    0.507
    sys     0.701
    maxmem  13 MB
    faults  0

    $ time rg PM_RESUME | wc -l
    17

    real    0.142
    user    0.749
    sys     0.726
    maxmem  21 MB
    faults  0

I originally tried to run `fsrc` on a single file (in order to better control the benchmark), but I got an error:

    $ time /tmp/fsrc 'Sherlock Holmes' /data/benchsuite/subtitles/2018/OpenSubtitles2018.raw.sample.en
    Error  : option '--term' cannot be specified more than once
    Usage  : fsrc [options] term
    Options:
      -h [ --help ]         Help
      -d [ --dir ] arg      Search folder
      -i [ --ignore-case ]  Case insensitive search
      -r [ --regex ]        Regex search (slower)
      --no-git              Disable search with 'git ls-files'
      --no-colors           Disable colorized output
      -q [ --quiet ]        only print status


    Build : v0.9 from Jul  5 2019
    Web   : https://github.com/elsamuko/fsrc

    real    0.005
    user    0.002
    sys     0.002
    maxmem  9 MB
    faults  0


I included qmake and added a `deploy.sh` in the main source folder, which generates the deployed zip file. Let me know if this doesn't build.

  * gitignore behaviour: If there is a .git folder in the search folder, it uses git ls-files to get all files to search in
  * a .git folder itself is never searched
  * hidden folders and files are searched
  * binaries are ['detected'](https://github.com/elsamuko/fsrc/blob/f1e29a3e24e5dbe87908c4ca84775116f39f8cfe/src/utils.cpp#L93) if they contain two binary 0's within the first 100 bytes or are PDF or PostScript files.
  * pipe behaviour is not implemented yet
  * it supports only one option-less argument as search term
  * folders are set with -d
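
The binary check described in the list above can be sketched roughly as follows. This is a Python illustration and not the linked C++ code. The function name `looks_binary` is made up for the example, and it assumes that "two binary 0's" means at least two NUL bytes anywhere in the first 100 bytes (the real implementation may instead require them to be adjacent). The PDF and PostScript checks use the standard `%PDF` and `%!PS` file signatures.

```python
PDF_MAGIC = b"%PDF"         # standard PDF file signature
POSTSCRIPT_MAGIC = b"%!PS"  # standard PostScript file signature


def looks_binary(head: bytes) -> bool:
    """Guess from a file's leading bytes whether it should be skipped as binary."""
    if head.startswith(PDF_MAGIC) or head.startswith(POSTSCRIPT_MAGIC):
        return True
    # Assumption: two or more NUL bytes anywhere in the first 100 bytes
    # count as binary; they do not have to be adjacent.
    return head[:100].count(b"\x00") >= 2
```

A caller would read the first 100 bytes of each candidate file and skip the file when `looks_binary` returns `True`. A heuristic like this changes which files are searched, so match counts may differ from a tool that uses a different binary check.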



