Songkick | Full Stack Software & Platform Engineers | London, UK | Fulltime | ONSITE
Songkick is on a mission to bring the magic of live music to fans everywhere. Since 2007, we've set about making it as easy, fun and fair as possible for you to see your favorite artists live. Right now, more than 15 million music fans across the globe use Songkick to track their favorite artists, discover awesome concerts and never ever miss out.
We're looking for developers with the ability to take on a range of challenges: from developing our highly scalable website and mobile apps, to integrating with other platforms (streaming services, social networks), to large-scale data acquisition and processing.
We do our best work when we're happy, respectful and relaxed. Our values and work ethic have got us far, and as we grow we'll never shake that small startup feel. Earlier this year we became part of the Warner Music Group family, opening up epic new realms of opportunities to bring fans and artists closer together.
If you're interested, there's more info on our jobs page at songkick.com/jobs, or reach out to me directly on smudge [at] songkick.com
> the Varnish HTTP cache has been used very successfully to speed up WordPress. But Varnish doesn’t help a lot with logged-in traffic
> This is caching that Varnish and other “normal” HTTP caches (including CloudFlare) could not have done
Varnish supports the ESI (Edge-Side Includes) standard, which allows it to cache individual fragments of a page and assemble them on the cache server. It also allows you to completely bypass the cache for certain fragments. This is supported by a number of CDNs as well (Fastly, Akamai). I've used the ESI technique several times and have been able to achieve a >98% cache hit rate on Fastly for a site with dynamic per-user content. Even the cache misses are only responsible for rendering a small component of the page.
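As a rough illustration of what this looks like in practice (the fragment path here is hypothetical), the cached page shell references its per-user pieces with `<esi:include>` tags, which the edge resolves on every request:

```
<!-- Page shell: cached with a long TTL -->
<body>
  <h1>Page title</h1>
  <!-- Fetched per request; the fragment itself is served
       with Cache-Control: private so it is never cached -->
  <esi:include src="/fragments/user-nav" />
  ... static, cacheable content ...
</body>
```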
Good to know. Using Edge-Side Includes may be easier than trying to turn the app into a semi-single-page app. But that only solves half of the problem. The other half is varying the response based on the value of a specific cookie.
I've updated the blog post with information regarding Edge-Side Includes.
I couldn't (quickly) find documentation on how to get the value of a specific cookie, but the server could send a user ID in a header or something Varnish can easily access to be used in the above function.
Agreed – it's possible, and for simple tasks such as stripping an analytics cookie it's workable, but for anything more serious you'd want something like https://github.com/lkarsten/libvmod-cookie
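For the simple case, varying the cache key on one cookie can be sketched in plain VCL without a vmod. The cookie name `user_id` and the regex are illustrative only; the naive match (it would also hit e.g. `other_user_id=`) is exactly the kind of thing libvmod-cookie handles properly:

```
sub vcl_recv {
    # Pull a single cookie value out of the Cookie header.
    if (req.http.Cookie ~ "user_id=") {
        set req.http.X-User-Id =
            regsub(req.http.Cookie, ".*user_id=([^;]*).*", "\1");
    }
}

sub vcl_hash {
    # Add the value to the default hash key (host + URL) so each
    # user gets their own cached variant of the page.
    if (req.http.X-User-Id) {
        hash_data(req.http.X-User-Id);
    }
}
```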
I understand that you want to offer something that 'beats' varnish, and it shines through in the article.
But I don't think it matters if your cache is better than every other cache. Rather, as long as you offer a convenient, easily implemented cache, built into the webserver, that's great in itself. We're using Passenger on all our production servers and are most satisfied, because of its ease of use.
Perhaps you could just write "this could be accomplished with Varnish, which has a lot of benefits for advanced cases, but we think our cache will be useful for those that prefer not to manage a separate caching tier."
> I understand that you want to offer something that 'beats' varnish, and it shines through in the article.
I am the author of the article. No, the point is not to "beat" Varnish. It is an article describing various ideas and a call for help. See https://news.ycombinator.com/item?id=8844905
Perhaps the writing style gave a competitive impression, so I've updated the article to mention that we're not out to beat Varnish, but to research the possibilities.
Knowing that Varnish can accomplish some of the things is good, because that way we can draw from an existing pool of experience.
Everything described in the article is covered by Varnish. You can hash on Vary headers, on individual cookie values, on the sum of the digits in the user's IP address if you want. ESI lets you provide partial caching of pages as the article describes - it's actually a separate standard that's existed since 2001 (http://en.wikipedia.org/wiki/Edge_Side_Includes).
Varnish also gives us things like ACLs for managing access to various resources, on-demand content purges, multiple routable backends with different cache/grace rules, and, more powerfully, request pre-processing. One thing we do is inspect the request to determine whether the agent is capable of accepting WebP images; if it is, we add that to the hash key along with a corresponding header for the app to key on when deciding whether to serve JPEG or WebP. This lets us serve WebP images to modern agents for faster downloads, while gracefully falling back to JPEG for anything we're not sure of.
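The WebP negotiation described above can be sketched in VCL roughly like this (the `X-Accept-WebP` header name is an assumption for illustration; the real setup will differ):

```
sub vcl_recv {
    # Normalise WebP capability into a single header so the cache
    # doesn't fragment on every distinct Accept header value.
    if (req.http.Accept ~ "image/webp") {
        set req.http.X-Accept-WebP = "yes";
    } else {
        set req.http.X-Accept-WebP = "no";
    }
}

sub vcl_hash {
    # Cache the WebP and JPEG variants separately; the backend
    # reads X-Accept-WebP to decide which format to render.
    hash_data(req.http.X-Accept-WebP);
}
```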
Varnish is way more than a "make WordPress not destroy your server" cache.
Varnish also supports plugins for extreme flexibility. For example, I wrote a plugin for our Varnish install which performs HMAC validation of a specific signed cookie and then sets a header which is used downstream in the caching rules.
Varnish is mature, powerful, and fast as hell. It would take a lot of work to reach a point where I'd swap it out for something else.
The HTTP library (requests) will by default pull a gzipped version of the resolvers list from GitHub (the Content-Length returned for the current version is 59029, compared to 239359 with gzip disabled). Compressing with `gzip -9` gives me a file size of 49648, so I don't think the added complexity of having to consciously deal with the compression in the application outweighs the small gain over the standard HTTP compression GitHub and requests provide by default.
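For anyone who wants to reproduce this kind of comparison, here's a minimal sketch. The payload is a stand-in for the real resolvers file (roughly 234 KB raw / 58 KB with transport gzip), so the exact numbers will differ:

```python
import gzip

# Stand-in for the resolvers list fetched from GitHub.
raw = b"8.8.8.8\n8.8.4.4\n208.67.222.222\n" * 2000

# Equivalent of running `gzip -9` on the file.
compressed = gzip.compress(raw, compresslevel=9)

print("raw bytes:    ", len(raw))
print("gzip -9 bytes:", len(compressed))

# requests does the transport-level version of this transparently:
# it sends Accept-Encoding: gzip by default and hands back the
# decompressed body, so no application-side handling is needed.
# r = requests.get(url)                                            # gzipped on the wire
# r = requests.get(url, headers={"Accept-Encoding": "identity"})   # raw
```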
I had looked at pre-downloading the resolvers file in the setup script; unfortunately there doesn't seem to be a decent, reliable way to do it. If people download the source and run `setup.py install` it's easy, but I'd imagine most people will just install with `pip` or `easy_install`, which makes things a bit more complicated since neither of them seems to run post-install actions.
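For what it's worth, the `setup.py install` path can be hooked with a custom command class. The `download_resolvers` helper here is hypothetical, and as noted above this won't fire for pip installs from wheels:

```python
from setuptools.command.install import install


class InstallWithResolvers(install):
    """Run a resolver download after the normal install steps.

    Wire it in with: setup(..., cmdclass={"install": InstallWithResolvers})
    Note: pip installing from a wheel bypasses setup.py entirely,
    so this hook only helps for source installs.
    """

    def run(self):
        install.run(self)
        # download_resolvers()  # hypothetical post-install helper
```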
Both good suggestions though, I'll keep them on my todo list.
I'm happy to submit a pull request if you'll accept them. Already have forked the repo.
EDIT: You're right that GitHub serves it gzipped. You can disregard my comment on that; I was testing with wget without specifying --header='Accept-Encoding: gzip'.
> CentOS is one of the reasons that the RHEL ecosystem is the default. It helps to give us an ubiquity that RHEL might otherwise not have if we forced everyone to pay to use Linux. So, in a micro sense we lose some revenue, but in a broader sense, CentOS plays a very valuable role in helping to make Red Hat the de facto Linux.[0]
That's somewhat why Adobe didn't care if you pirated Photoshop. If you weren't making money with it, they'd rather you knew their tool than something else (maybe GIMP?). Having mindshare (uggh, I hate that word) definitely helps when it comes time to get the credit card out for tools you'll need for a paying project.
I don't work for a big company, so this is purely speculation, but there could be many reasons why it's not done.
* Documentation Time
If you've got a big project, it could take weeks, even months, to properly document the software to a point where it could be used by someone outside the company. Sure, you have your internal documentation, but it can often be incomplete, or assume the reader knows about other parts of the company.
* Deployment
Big projects will often use very specialized hardware, software and environments, to the point where it could be nearly impossible to deploy outside the company. They can depend on internal services that can't be open-sourced because they're still in use, or are an important part of the business. Take Google Reader: yes, it would be nice if it were open-sourced, but internally it probably uses services, databases and APIs specialized just for Google, and it's probably been optimized to work on Google's hardware, with their web server, with their OS build, etc.
Reddit is another example of this: Reddit's code is open source, and while it can be deployed, it's not easy. That seems to be mostly because it was built against a very specific set of software versions, in a very specific environment. Larger open source projects tend to be tested in a multitude of environments; for applications only deployed or built internally there's no point, because you can very accurately control your environment.
* Some of the code is still used
Some, or even big chunks, of the code may still be in use in current software. If you've got a library that's particularly useful, you might keep using it; if it works, there's no point rewriting it just for a new project.
* The code is very bad
We all know it happens: a project contains terrible code, bad bugs and maybe even security issues that never got fixed because they were never noticed. Given the opportunity to look through the code, people might pick up on these issues, and that would look bad for the company.
* Open source is complicated
Open source seems to come with a whole host of fun things to deal with: GitHub issues, ranty blog posts, forks, copyright and licensing can all get a bit complicated. Even if it's old software that isn't used anymore, you'd probably need some degree of management before things got out of hand. A single tweet can have a big impact on a company's, or a product's, reputation, so larger companies in particular would probably want it managed in some way.
Dependencies on commercial software (e.g., that package for the sound system, purchased as source and modified, without which the product won't even compile).
Dependencies on specialized build tools; porting to something free would not be easy.
Exposure of security holes in existing deployments by revealing bad security practices.
People tend to believe the source code is more useful than it actually is.
A good example was when Netscape open-sourced Navigator 4. People couldn't get it to build, and it was missing some proprietary components. So even though the open source world was desperate for a web engine, nothing much was done with it; in the end it was decided to start over from scratch with Mozilla.
You can't really open-source a project if it includes, say, a movie-playback component, 3D engine or audio code with royalty-based licensing; the code won't even compile without it, and it may be customized/integrated to an extent where it would be a lot of work even to identify which of your source files are "contaminated" by licensed code that isn't yours to publish.
It validates emails with a regex. I'm not sure how many times it's been discussed that you shouldn't validate emails with a regex, but please stop validating emails with a regex.
Right. But not including it will inevitably lead to a new issue or PR to add it. I'm not sure it's validatorjs's role to educate users about their validation, rather than just providing the best possible tools to do it right, no?
I guess it depends on one's perspective. You're probably right that someone will send a pull request to add it if it's not there, but I think authors of tools should always do their best to encourage users to do things correctly, especially for issues like this, which can easily be discovered without even knowing they're an issue (Googling "Email Regular Expression" brings up first-page results for me recommending against it).
Certainly when it comes to security, authors should do their best to ensure users of the tool are educated. When there's a security issue with any large software product, particularly an open source one, that's mostly down to poor configuration or ill-informed users, the authors are instantly criticized; I think the same standard should apply to any general feature in a tool that targets a specific piece of functionality. This particular best practice is quite easy to find, and the tool is very specifically targeted at validation, yet it includes something that goes against best practice and could cause frustration both for people who use the tool and for people who use things built with it.
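For context, the commonly recommended alternative is a minimal structural check plus a confirmation email, rather than a strict regex. A sketch (in Python rather than validatorjs, purely for illustration):

```python
def plausible_email(address: str) -> bool:
    """Minimal sanity check: one "@" with non-empty parts.

    Deliberately permissive (it still rejects some exotic but
    RFC-valid addresses, e.g. quoted local parts containing "@");
    the only reliable validation is sending a confirmation message.
    """
    local, sep, domain = address.partition("@")
    return bool(sep) and bool(local) and bool(domain) and "@" not in domain


print(plausible_email("fan@example.com"))  # True
print(plausible_email("not-an-email"))     # False
```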
Got this exact same thing. Tried it in the Play Store on the device (a Nexus 4): it lets me view the app, hit install and accept the permissions, then says "The item that you were attempting to purchase could not be found" =/