
No. It is a combination of structured and unstructured data. The data just isn't in a consolidated, coherent schema that the business can use directly.

Hadoop and NoSQL systems are critical for this role because it is often extremely time-consuming to (a) design the final end-state schema and (b) build the ETL processes to populate it. So the idea is to fill the data lake as quickly as possible and work out later how to use it.
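As a rough illustration of this "load first, decide the schema later" approach (often called schema-on-read), here is a minimal Python sketch. The in-memory `lake` list and the `ingest` / `read_with_schema` helpers are hypothetical stand-ins: a real lake would keep files on HDFS or object storage and query them with an engine such as Hive or Spark.

```python
import json
from typing import Any

# Hypothetical in-memory stand-in for a data lake. Records are stored as raw
# JSON text exactly as they arrived; no schema is enforced at write time.
lake: list[str] = []


def ingest(record: Any) -> None:
    """Append a record as-is: no ETL, no validation, so loading is fast."""
    lake.append(json.dumps(record))


def read_with_schema(fields: dict[str, type]) -> list[dict[str, Any]]:
    """Apply a schema at read time: keep only records that have every
    requested field with the expected type, and project just those fields."""
    rows = []
    for line in lake:
        raw = json.loads(line)
        if not isinstance(raw, dict):
            continue  # unstructured payloads are simply ignored by this view
        if all(isinstance(raw.get(name), typ) for name, typ in fields.items()):
            rows.append({name: raw[name] for name in fields})
    return rows


# Dump heterogeneous data in quickly...
ingest({"order_id": 1, "amount": 9.5, "region": "EU"})
ingest({"order_id": 2, "amount": 20.0})               # no region field
ingest("free-text support ticket: printer is on fire")  # unstructured
ingest({"user": "alice", "clicks": 42})               # a different shape entirely

# ...and only decide later which view of it the business needs.
orders = read_with_schema({"order_id": int, "amount": float})
print(orders)
```

The trade-off is that the cost of interpreting the data moves from load time to query time: every reader has to cope with missing fields and unexpected shapes, which is exactly the work a hand-designed end-state schema and ETL pipeline would otherwise do up front.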

Data lakes are a concept that applies mainly to enterprises, so we are talking about big data and complex, multidisciplinary, multi-functional schemas.




A key point is that an HDFS-like system has compute resources that scale with the storage: each node scans the blocks on its own disks in parallel, so as you add nodes (and the data that comes with them), the time to do a "full scan" of the system stays roughly constant.
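A back-of-the-envelope Python sketch of that argument. The per-node capacity and scan rate are made-up round numbers, `full_scan_seconds` is a hypothetical helper, and the model is idealized: it ignores job startup, network shuffles, stragglers and uneven block placement. For contrast it also computes a setup where the data grows but the number of machines doing the scan stays fixed.

```python
# Idealized model (assumption): each node reads only the blocks on its own
# disks, all nodes run in parallel, and all coordination overhead is ignored.
PER_NODE_BYTES = 10e12       # hypothetical: 10 TB stored on each node
PER_NODE_THROUGHPUT = 1e9    # hypothetical: 1 GB/s local scan rate per node


def full_scan_seconds(total_bytes: float, scanning_nodes: int,
                      throughput: float = PER_NODE_THROUGHPUT) -> float:
    """Time for `scanning_nodes` nodes to read `total_bytes` in parallel."""
    return total_bytes / (scanning_nodes * throughput)


results = {}
for n in (10, 100, 1000):
    data = n * PER_NODE_BYTES              # storage grows with the cluster
    coupled = full_scan_seconds(data, n)   # compute grows with storage (HDFS-like)
    fixed = full_scan_seconds(data, 10)    # compute stuck at 10 nodes
    results[n] = (coupled, fixed)
    print(f"{n:>4} nodes: coupled {coupled:,.0f} s, fixed compute {fixed:,.0f} s")
```

Under these assumptions the coupled column stays at 10,000 s (about 2.8 hours) at every cluster size, while the fixed-compute column grows tenfold with each tenfold increase in data.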




