
Hmm. I can see some ways for this to work.

A while ago I started a project converting government voting records (both elections and congressional) into a database. Would that be interesting to you?

Here's an idea: You could also host a data bounty program, and/or start a grant program for the production of these data sets.

Still missing an answer to "what are your requirements". How do you verify data quality? What format(s)?



We support arbitrary data formats, in that Quilt falls back to a raw copy if it can't parse the file. On the columnar side (things we convert to Parquet) we support XLS, CSV, TSV, and in fact anything that `pandas.read_csv` can parse. We use pandas and pyarrow for column type inference. We want to add a "data linter" that checks data against user-provided rules, and we welcome such feature requests on GitHub or in our Slack channel.


Currently, we support two "targets": a Pandas DataFrame and a file. Files can be any format. The Quilt build logic uses Pandas to read files into DataFrames, so any format Pandas can read should work in Quilt to create a DataFrame node.
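The two targets plus the raw-copy fallback mentioned earlier can be sketched roughly like this (`build_node` is a hypothetical helper, not Quilt's actual API):

```python
import pandas as pd


def build_node(path):
    """Try to build a DataFrame node; fall back to a raw file node."""
    try:
        # Any format pandas can read could be handled here; CSV shown
        # as the simplest case.
        return ("dataframe", pd.read_csv(path))
    except Exception:
        # Unparseable input becomes a raw copy (the "file" target).
        with open(path, "rb") as f:
            return ("file", f.read())
```

The point of the sketch is the dispatch: parseable tabular data becomes a typed DataFrame node, and anything else is preserved byte-for-byte.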



