
From the perspective of a technical person, such as a software developer or AI scientist, it's easy to think of what we do as working with data and algorithms. While we might step back and look at a system as a whole, a lot of technical people can't communicate the bird's-eye view of what they're creating. They might have the capacity, but it's easier for them to rely on the technical description: yeah, we can just make sure the data is good and the problem is fixed. However, that doesn't fully communicate the larger picture to another person, or even show that you recognize it yourself.

As an example, consider the architectural view of a system where you need availability, performance, reliability, and scalability. Over time we've learned to add security. In recent years we've begun to learn how important it is to include accessibility and privacy; we're learning and don't always get it right. All these things are often implied, and we expect people to believe that we already know to include them. Now, let's look at more dimensions that affect what we build in the areas of equality, diversity, and inclusion, which are part of our discussion, especially when we're talking about AI, where we lack explainability.

So, it's true that we have to look at the data, and that's part of the data science associated with AI work. But doing data science is more than just munging data to chase a validation-set match. We have to do the same thing we do with all other software and look at the domain of the problem, what its purpose is, who the users are, and what we're trying to accomplish. When we examine the data set, these things inform us about the appropriateness of the data being used; that takes analysis, just like other software problems. A few basic checks are sketched below.
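To make "takes analysis" concrete, here's a minimal sketch of the kind of pre-training checks meant above. The file name and column names (loan_applications.csv, race, location, approved) are hypothetical, purely for illustration:

    import pandas as pd

    # Hypothetical dataset and column names -- a sketch, not a complete audit.
    df = pd.read_csv("loan_applications.csv")

    # 1. Who is actually represented in the data?
    print(df["race"].value_counts(normalize=True))

    # 2. Do the historical outcomes already differ by group?
    print(df.groupby("race")["approved"].mean())

    # 3. Does an innocuous-looking feature track a protected one? (proxy risk)
    print(pd.crosstab(df["location"], df["race"], normalize="index"))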

It's simple to say: let's just make sure we have an equal amount of data from each represented group. While that isn't bad, it isn't enough. Imagine a hiring or loan-application program. We have demographics like name, location, occupation, sex, race, etc. Some obvious discriminants are sex and race. However, think about things like location: is it possible for someone to be denied a loan because they live in a part of town considered high-risk? Maybe a human wouldn't make that connection, but a machine learning algorithm trained on historical data that discriminates will use that data to automate the discrimination.
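To see why balancing alone falls short, here's a minimal sketch with synthetic data (all names and numbers are made up for illustration): the two groups are equally represented, the protected attribute is never given to the model, yet location reproduces the historical bias anyway.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 10_000

    # Equal representation of two groups -- the "just balance the data" fix.
    group = rng.integers(0, 2, n)

    # Location correlates with group membership (a redlining-style history).
    location = np.where(rng.random(n) < 0.9, group, 1 - group)

    # Historical approvals favored group 0 regardless of merit.
    merit = rng.normal(size=n)
    approved = ((merit + 1.5 * (group == 0)) > 1.0).astype(int)

    # Train WITHOUT the protected attribute -- only merit and location.
    X = np.column_stack([merit, location])
    model = LogisticRegression().fit(X, approved)

    # The model still approves group 0 far more often: location did the work.
    pred = model.predict(X)
    for g in (0, 1):
        print(f"group {g}: predicted approval rate {pred[group == g].mean():.2f}")

Dropping sex and race from the inputs doesn't help here, because location carries nearly the same information.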

We have to look at the history, current status, and goals of what we want to do to ensure we're thinking the problem through, rather than treating it as mechanical input-process-output steps. That's why simplifying the discussion with "just change the data" doesn't communicate the source of the problem or how it should be fixed. It would have been nice for both people to ask "BTW, what did you mean by ...?" to open the conversation so we all could learn.


