
You're basically dumping a database into the web browser, including all of the internal metadata that's likely irrelevant to rendering the HTML.

For example, user role memberships:

    {
        "id": "c80b68c5-09ae-4a50-a447-df7c5a4a6d01",
        "type": "user",
        "attributes": {
            "username": "kinshiki",
            "roles": [
                "ROLE_MEMBER",
                "ROLE_GROUP_MEMBER",
                "ROLE_POWER_UPLOADER"
            ],
            "version": 1
        }
    }

Also record timestamps like created/updated, along with contact details that may reveal sensitive info:

    "attributes": {
        "name": "SENPAI TEAM",
        "locked": true,
        "website": "https:\/\/discord.gg\/84e3j9b",
        "ircServer": null,
        "ircChannel": null,
        "discord": "84e3j9b",
        "contactEmail": "senpai.info@gmail.com",
        "description": null,
        "official": false,
        "verified": false,
        "createdAt": "2021-04-19T21:45:59+00:00",
        "updatedAt": "2021-04-19T21:45:59+00:00",
        "version": 1
    }
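
One way to act on this is an explicit allowlist applied at serialization time, so only the attributes a page actually renders leave the server. A minimal Python sketch, assuming JSON:API-style resources like the ones above; `PUBLIC_ATTRIBUTES`, `to_public` and the `group` type name are hypothetical, and which fields count as public is a guess:

```python
import json

# Hypothetical allowlist: the attributes each resource type may expose.
# Anything not listed (roles, version, timestamps, contact details) is dropped.
PUBLIC_ATTRIBUTES = {
    "user": {"username"},
    "group": {"name", "website", "description", "official", "verified"},
}


def to_public(resource: dict) -> dict:
    """Return a copy of a JSON:API-style resource keeping only allowlisted attributes."""
    allowed = PUBLIC_ATTRIBUTES.get(resource["type"], set())  # unknown type: expose nothing
    attributes = resource.get("attributes", {})
    return {
        "id": resource["id"],
        "type": resource["type"],
        "attributes": {k: v for k, v in attributes.items() if k in allowed},
    }


user = {
    "id": "c80b68c5-09ae-4a50-a447-df7c5a4a6d01",
    "type": "user",
    "attributes": {
        "username": "kinshiki",
        "roles": ["ROLE_MEMBER", "ROLE_GROUP_MEMBER", "ROLE_POWER_UPLOADER"],
        "version": 1,
    },
}

print(json.dumps(to_public(user)))
```

An allowlist fails closed: a field added to the model later stays private until someone deliberately lists it, whereas a blocklist leaks it by default.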

But let's just go back to your response:

> Most of it is page filenames which indeed could be made optional

Do that! If you strip them out, the 529 kB document shrinks to 280 kB, which hardly seems worth the hassle, but gzipped, it's a minuscule 13 kB! That's because those strings are hashes, and random hashes compress far worse than typical JSON, which usually compresses very well.
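
A rough way to see the effect yourself: gzip the same listing with and without hash-named page files. A minimal sketch using only the Python standard library; the chapter shape, counts and field names are made up for illustration and are not the site's real payload. Random hex uses only 16 symbols, so gzip can squeeze it to roughly half size at best and has no repeats to back-reference, while the repetitive metadata shrinks far more:

```python
import gzip
import json
import secrets

# Illustrative sizes only, not real data.
N_CHAPTERS = 200
PAGES_PER_CHAPTER = 20


def chapter(i: int, with_pages: bool) -> dict:
    attributes = {
        "volume": "1",
        "chapter": str(i),
        "translatedLanguage": "en",
        "version": 1,
    }
    if with_pages:
        # Hash-named page files: random hex, so nothing repeats for gzip to exploit.
        attributes["data"] = [f"{secrets.token_hex(32)}.png" for _ in range(PAGES_PER_CHAPTER)]
    return {"id": f"chapter-{i}", "type": "chapter", "attributes": attributes}


def sizes(with_pages: bool) -> tuple[int, int]:
    """Return (raw bytes, gzipped bytes) for a listing of N_CHAPTERS chapters."""
    body = json.dumps({"data": [chapter(i, with_pages) for i in range(N_CHAPTERS)]}).encode()
    return len(body), len(gzip.compress(body))


full_raw, full_gz = sizes(with_pages=True)
slim_raw, slim_gz = sizes(with_pages=False)
print(f"with page hashes:    {full_raw} B raw, {full_gz} B gzipped")
print(f"without page hashes: {slim_raw} B raw, {slim_gz} B gzipped")
```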

It's basic stuff like this that can make a website absolutely fly.

Avoid giving computers unnecessary, mandatory work: https://blog.jooq.org/many-sql-performance-problems-stem-fro...



As I said, it's not so much that we ask for that data to be fetched; it's there in the first place, and it's pulled from Elasticsearch, not a SQL database.

Because of this model, we also make sure that Elasticsearch works merely as a search cache, not as an authoritative content database (hence everything we put in there is considered public, on purpose, and anything that isn't meant to be public simply isn't indexed in ES).

However, the gzip efficiency improvements would be really neat, for sure.

FWIW, I also don't work on the backend, and there might be good reasons not to explicitly filter out data (yet, anyway; perhaps it will end up as a separate entity behind an include parameter).
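
For illustration, here is roughly what "a separate entity behind an include parameter" could look like in JSON:API terms: the chapter only carries a relationship pointer, and the page list is embedded under `included` only when the client asks for `?include=pages`. A minimal Python sketch; the URL shape, the `page_list` type, the ids and the in-memory dicts are all hypothetical stand-ins for the real backend:

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical data: page filenames live in their own resource type,
# linked from the chapter by a relationship instead of being inlined.
CHAPTERS = {
    "ch-1": {
        "id": "ch-1",
        "type": "chapter",
        "attributes": {"chapter": "1"},
        "relationships": {"pages": {"data": {"type": "page_list", "id": "pl-1"}}},
    },
}
PAGE_LISTS = {
    "pl-1": {"id": "pl-1", "type": "page_list", "attributes": {"files": ["3f9a.png", "b07c.png"]}},
}


def get_chapter(url: str) -> dict:
    """Build a JSON:API-style document, embedding related resources only on ?include=."""
    parsed = urlparse(url)
    chapter_id = parsed.path.rstrip("/").split("/")[-1]
    chapter = CHAPTERS[chapter_id]
    # JSON:API's include parameter is a comma-separated list of relationship paths.
    includes = set()
    for value in parse_qs(parsed.query).get("include", []):
        includes.update(part for part in value.split(",") if part)
    document = {"data": chapter}
    if "pages" in includes:
        ref = chapter["relationships"]["pages"]["data"]
        document["included"] = [PAGE_LISTS[ref["id"]]]
    return document


print(get_chapter("/chapter/ch-1"))                # no "included" key
print(get_chapter("/chapter/ch-1?include=pages"))  # page list embedded
```

Listing and search pages would then never pay for the filenames; only the reader view would request them.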


I have to say I'm glad this is being talked about in a public forum. Outsiders rarely get to see brainstorming, troubleshooting & group discussion of technological issues like this.

Someone focused on the performance aspect & someone focused on stack stability, discussing the real-world inputs & outputs of a business system and showing why performance & UX are not the only metrics that matter: that's a good thing for us to see.


You can query Elastic for specific fields only: https://www.elastic.co/guide/en/elasticsearch/reference/curr...
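
For example, a search request body can carry a `_source` filter so that each hit returns only the listed fields. A minimal Python sketch that just builds and prints the body; the index name (`groups`), the `name` match field and the chosen public fields are hypothetical, and the body would be sent as `POST /groups/_search` with whatever client the backend already uses:

```python
import json


def build_search_body(name_query: str, public_fields: list[str]) -> dict:
    """Search body asking Elasticsearch to return only `public_fields` of each hit's _source."""
    return {
        "query": {"match": {"name": name_query}},
        # Source filtering: hits include only these fields of the stored _source.
        "_source": {"includes": public_fields},
    }


body = build_search_body("senpai", ["name", "website", "description"])
print(json.dumps(body, indent=2))
```

Source filtering only trims what comes back over the wire; the other fields are still stored in the index, so it complements, rather than replaces, keeping private data out of ES entirely.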

Edit: As you said, there may be backend reasons not to filter things out of the query. Still, it seems likely that the web response could be trimmed down.
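
One standard way to trim the web response without touching the query is JSON:API sparse fieldsets, where the client sends e.g. `?fields[user]=username` and the server drops every other attribute of that type. A minimal Python sketch; `sparse_fieldsets` and `trim` are hypothetical helpers, not part of any existing backend:

```python
from urllib.parse import parse_qs


def sparse_fieldsets(query_string: str) -> dict[str, set[str]]:
    """Parse JSON:API-style `fields[TYPE]=a,b` parameters into {type: {fields}}."""
    result = {}
    for key, values in parse_qs(query_string).items():  # parse_qs also decodes %5B/%5D
        if key.startswith("fields[") and key.endswith("]"):
            resource_type = key[len("fields["):-1]
            result[resource_type] = {f for v in values for f in v.split(",") if f}
    return result


def trim(resource: dict, fieldsets: dict[str, set[str]]) -> dict:
    """Keep only the requested attributes; types without a fieldset pass through unchanged."""
    wanted = fieldsets.get(resource["type"])
    if wanted is None:
        return resource
    attributes = {k: v for k, v in resource["attributes"].items() if k in wanted}
    return {**resource, "attributes": attributes}


user = {"id": "u1", "type": "user", "attributes": {"username": "kinshiki", "version": 1}}
print(trim(user, sparse_fieldsets("fields[user]=username")))
```

Note that sparse fieldsets are a size optimization chosen by the client, not access control: anything that must stay private still has to be excluded on the server.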


This seems less like a performance problem and more like a security issue, especially considering that this is a website that hosts unlicensed translations. How much of this information is actually intended to be public?



