Songkick | Full Stack Software & Platform Engineers | London, UK | Fulltime | ONSITE
Songkick is on a mission to bring the magic of live music to fans everywhere. Since 2007, we've set about making it as easy, fun and fair as possible for you to see your favorite artists live. Right now, more than 15 million music fans across the globe use Songkick to track their favorite artists, discover awesome concerts and never ever miss out.
We're looking for developers with the ability to take on a range of challenges: from developing our highly scalable website and mobile apps, to integrating with other platforms (streaming services, social networks), to large-scale data acquisition and processing.
We do our best work when we're happy, respectful and relaxed. Our values and work ethic have got us far, and as we grow we'll never shake that small startup feel. Earlier this year we became part of the Warner Music Group family, opening up epic new realms of opportunities to bring fans and artists closer together.
If you're interested, there's more info on our jobs page at songkick.com/jobs, or reach out to me directly on smudge [at] songkick.com
> the Varnish HTTP cache has been used very successfully to speed up WordPress. But Varnish doesn’t help a lot with logged-in traffic
> This is caching that Varnish and other “normal” HTTP caches (including CloudFlare) could not have done
Varnish supports the ESI (Edge-Side Includes) standard, which allows it to cache individual fragments of a page and assemble them on the cache server. It also allows you to completely bypass the cache for certain fragments. This is supported by a number of CDNs as well (Fastly, Akamai). I've used the ESI technique several times and have been able to achieve a >98% cache hit rate on Fastly for a site with dynamic per-user content. Even the cache misses are only responsible for rendering a small component of the page.
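As a rough illustration of what this looks like in practice (the fragment path here is hypothetical), the cached page shell references its per-user pieces with `<esi:include>` tags, which the edge resolves on every request:

```
<!-- Page shell: cached with a long TTL -->
<body>
  <h1>Page title</h1>
  <!-- Fetched per request; the fragment itself is served
       with Cache-Control: private so it is never cached -->
  <esi:include src="/fragments/user-nav" />
  ... static, cacheable content ...
</body>
```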
Good to know. Using Edge-Side Includes may be easier than trying to turn the app into a semi-single-page app. But that only solves half of the problem. The other half is varying the response based on the value of a specific cookie.
I've updated the blog post with information regarding Edge-Side Includes.
I couldn't (quickly) find documentation on how to get the value of a specific cookie, but the server could send a user ID in a header or something Varnish can easily access to be used in the above function.
Agreed – it's possible, and for simple tasks such as stripping an analytics cookie it's workable, but for anything more serious you'd want something like https://github.com/lkarsten/libvmod-cookie
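For the simple case, varying the cache key on one cookie can be sketched in plain VCL without a vmod. The cookie name `user_id` and the regex are illustrative only; the naive match (it would also hit e.g. `other_user_id=`) is exactly the kind of thing libvmod-cookie handles properly:

```
sub vcl_recv {
    # Pull a single cookie value out of the Cookie header.
    if (req.http.Cookie ~ "user_id=") {
        set req.http.X-User-Id =
            regsub(req.http.Cookie, ".*user_id=([^;]*).*", "\1");
    }
}

sub vcl_hash {
    # Add the value to the default hash key (host + URL) so each
    # user gets their own cached variant of the page.
    if (req.http.X-User-Id) {
        hash_data(req.http.X-User-Id);
    }
}
```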
I understand that you want to offer something that 'beats' varnish, and it shines through in the article.
But I don't think it matters if your cache is better than every other cache. Rather, as long as you offer a convenient, easily implemented cache, built into the webserver, that's great in itself. We're using Passenger on all our production servers and are most satisfied, because of its ease of use.
Perhaps you could just write "this could be accomplished with Varnish, which has a lot of benefits for advanced cases, but we think our cache will be useful for those that prefer not to manage a separate caching tier."
> I understand that you want to offer something that 'beats' varnish, and it shines through in the article.
I am the author of the article. No, the point is not to "beat" Varnish. It is an article describing various ideas and a call for help. See https://news.ycombinator.com/item?id=8844905
Perhaps the writing style gave a competitive impression, so I've updated the article to mention that we're not out to beat Varnish, but to research the possibilities.
Knowing that Varnish can accomplish some of the things is good, because that way we can draw from an existing pool of experience.
Everything described in the article is covered by Varnish. You can hash on Vary headers, on individual cookie values, on the sum of the digits in the user's IP address if you want. ESI lets you provide partial caching of pages as the article describes - it's actually a separate standard that's existed since 2001 (http://en.wikipedia.org/wiki/Edge_Side_Includes).
Varnish also gives us things like ACLs for managing access to various resources, on-demand content purges, multiple routable backends with different cache/grace rules, and, more powerfully, request pre-processing. One thing we do is inspect the request to determine whether the agent is capable of accepting WebP images; if it is, we add that to the hash key along with a corresponding header for the app to key on when deciding whether to serve JPEG or WebP. This lets us serve WebP images to modern agents for faster downloads, while gracefully falling back to JPEG for anything we're not sure of.
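The WebP negotiation described above can be sketched in VCL roughly like this (the `X-Accept-WebP` header name is an assumption for illustration; the real setup will differ):

```
sub vcl_recv {
    # Normalise WebP capability into a single header so the cache
    # doesn't fragment on every distinct Accept header value.
    if (req.http.Accept ~ "image/webp") {
        set req.http.X-Accept-WebP = "yes";
    } else {
        set req.http.X-Accept-WebP = "no";
    }
}

sub vcl_hash {
    # Cache the WebP and JPEG variants separately; the backend
    # reads X-Accept-WebP to decide which format to render.
    hash_data(req.http.X-Accept-WebP);
}
```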
Varnish is way more than a "make WordPress not destroy your server" cache.
Varnish also supports plugins for extreme flexibility. For example, I wrote a plugin for our Varnish install which performs HMAC validation of a specific signed cookie and then sets a header which is used downstream in the caching rules.
Varnish is mature, powerful, and fast as hell. It would take a lot of work to reach a point where I'd swap it out for something else.
The HTTP library (requests) will by default pull a gzipped version of the resolvers list from GitHub (the Content-Length returned for the current version is 59029, compared to 239359 with gzip disabled). Compressing with `gzip -9` gives me a file size of 49648, so I don't think the added complexity of having to consciously deal with the compression in the application outweighs the small gain over the standard HTTP compression GitHub and requests provide by default.
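For anyone who wants to reproduce this kind of comparison, here's a minimal sketch. The payload is a stand-in for the real resolvers file (roughly 234 KB raw / 58 KB with transport gzip), so the exact numbers will differ:

```python
import gzip

# Stand-in for the resolvers list fetched from GitHub.
raw = b"8.8.8.8\n8.8.4.4\n208.67.222.222\n" * 2000

# Equivalent of running `gzip -9` on the file.
compressed = gzip.compress(raw, compresslevel=9)

print("raw bytes:    ", len(raw))
print("gzip -9 bytes:", len(compressed))

# requests does the transport-level version of this transparently:
# it sends Accept-Encoding: gzip by default and hands back the
# decompressed body, so no application-side handling is needed.
# r = requests.get(url)                                            # gzipped on the wire
# r = requests.get(url, headers={"Accept-Encoding": "identity"})   # raw
```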
I had looked at pre-downloading the resolvers file in the setup script; unfortunately there doesn't seem to be a decent, reliable way to do it. If people download the source and run `setup.py install` it's easy, but I'd imagine most people will just install with `pip` or `easy_install`, which makes things a bit more complicated since neither of them seems to run post-install actions.
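For what it's worth, the `setup.py install` path can be hooked with a custom command class. The `download_resolvers` helper here is hypothetical, and as noted above this won't fire for pip installs from wheels:

```python
from setuptools.command.install import install


class InstallWithResolvers(install):
    """Run a resolver download after the normal install steps.

    Wire it in with: setup(..., cmdclass={"install": InstallWithResolvers})
    Note: pip installing from a wheel bypasses setup.py entirely,
    so this hook only helps for source installs.
    """

    def run(self):
        install.run(self)
        # download_resolvers()  # hypothetical post-install helper
```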
Both good suggestions though, I'll keep them on my todo list.
I'm happy to submit a pull request if you'll accept them. Already have forked the repo.
EDIT: You're right that GitHub serves it gzipped. You can disregard my comment on that; I was testing with wget without specifying --header='Accept-Encoding: gzip'.
> CentOS is one of the reasons that the RHEL ecosystem is the default. It helps to give us an ubiquity that RHEL might otherwise not have if we forced everyone to pay to use Linux. So, in a micro sense we lose some revenue, but in a broader sense, CentOS plays a very valuable role in helping to make Red Hat the de facto Linux.[0]
That's somewhat why Adobe didn't care if you pirated Photoshop. If you weren't making money with it, they'd rather you knew their tool than something else (maybe GIMP?). Having mindshare (uggh, I hate that word) definitely helps when it comes time to get the credit card out for tools you'll need for a paying project.
I don't work for a big company, so this is purely speculation, but there could be many reasons why it's not done.
* Documentation Time
If you've got a big project, it could take weeks, even months, to properly document the software to a point where it could be used by someone outside the company. Sure, you have your internal documentation, but it can often be incomplete, or assume the reader knows about other parts of the company.
* Deployment
Big projects will often use very specialized hardware, software and environments, to the point where it could be nearly impossible to deploy outside the company. They can depend on internal services that can't be open-sourced because they're still in use, or are an important part of the business. Take Google Reader: yes, it would be nice if it were open-sourced, but internally it probably uses services, databases and APIs specialized just for Google, and it's probably been optimized to work on Google's hardware, with their web server, with their OS build, etc.
Reddit is another example of this: Reddit's code is open source, and while it can be deployed, it's not easy. That seems to be mostly because it was built against a very specific set of software versions, in a very specific environment. Larger open source projects tend to be tested in a multitude of environments; for applications only deployed or built internally there's no point, because you can very accurately control your environment.
* Some of the code is still used
Some, or even big chunks, of the code may still be in use in current software. If you've got a library that's particularly useful, you might keep using it; if it works, there's no point rewriting it just for a new project.
* The code is very bad
We all know it happens: a project contains terrible code, bad bugs and maybe even security issues that never got fixed because they were never noticed. Given the opportunity to look through the code, people might pick up on these issues, and that would look bad for the company.
* Open source is complicated
Open source seems to come with a whole host of fun things to deal with: GitHub issues, ranty blog posts, forks, copyright and licensing can all get a bit complicated. Even if it's old software that isn't used anymore, you'd probably need some degree of management before things got out of hand. A single tweet can have a big impact on a company's, or a product's, reputation, so larger companies in particular would probably want it managed in some way.
Dependencies on commercial software (e.g., that package for the sound system, purchased as source and modified, without which the product won't even compile).
Dependencies on specialized build tools; porting to something free would not be easy.
Exposure of security holes in existing deployments by revealing bad security practices.
People tend to believe the source code is more useful than it actually is.
A good example was when Netscape open-sourced Navigator 4. People couldn't get it to build, and it was missing some proprietary components. So even though the open source world was desperate for a web engine, nothing much was done with it; in the end it was decided to start over from scratch with Mozilla.
You can't really open-source a project if it includes, say, a movie-playback component, 3D engine or audio code with royalty-based licensing; the code won't even compile without it, and it may be customized/integrated to an extent where it would be a lot of work even to identify which of your source files are "contaminated" by licensed code that isn't yours to publish.
It validates emails with a regex. I'm not sure how many times it's been discussed that you shouldn't validate emails with a regex, but please stop validating emails with a regex.
Right. But not including it will inevitably lead to a new issue or PR to add it. I'm not sure it's validatorjs's role to educate users about their validation, rather than just providing the best possible tools to do it right, no?
I guess it depends on one's perspective. You're probably right that someone will send a pull request to add it if it's not there, but I think authors of tools should always do their best to encourage users to do things correctly, especially for issues like this, which can easily be discovered without even knowing they're an issue (Googling "Email Regular Expression" brings up first-page results for me recommending against it).
Certainly when it comes to security, authors should do their best to ensure users of the tool are educated. When there's a security issue with any large software product, particularly an open source one, that's mostly down to poor configuration or ill-informed users, the authors are instantly criticized; I think the same standard should apply to any general feature in a tool that targets a specific piece of functionality. This particular best practice is quite easy to find, and the tool is very specifically targeted at validation, yet it includes something that goes against best practice and could cause frustration both for people who use the tool and for people who use things built with it.
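For context, the commonly recommended alternative is a minimal structural check plus a confirmation email, rather than a strict regex. A sketch (in Python rather than validatorjs, purely for illustration):

```python
def plausible_email(address: str) -> bool:
    """Minimal sanity check: one "@" with non-empty parts.

    Deliberately permissive (it still rejects some exotic but
    RFC-valid addresses, e.g. quoted local parts containing "@");
    the only reliable validation is sending a confirmation message.
    """
    local, sep, domain = address.partition("@")
    return bool(sep) and bool(local) and bool(domain) and "@" not in domain


print(plausible_email("fan@example.com"))  # True
print(plausible_email("not-an-email"))     # False
```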
Got this exact same thing. Tried it in the Play Store on the device (a Nexus 4): it lets me view the app, hit install and accept the permissions, then says "The item that you were attempting to purchase could not be found" =/