meteo-jeff's comments | Hacker News

In case someone is looking for historical weather data for ML training and prediction, I created an open-source weather API which continuously archives weather data.

Past and forecast data from multiple numerical weather models can be combined using ML to achieve better forecast skill than any individual model. Because each underlying model is physically constrained, the resulting ML model should be stable.

See: https://open-meteo.com
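
As a toy illustration of the idea (Python, made-up numbers, hypothetical model names), blending two model forecasts against observations with a least-squares fit:

    import numpy as np

    # Hypothetical aligned hourly series: 2m temperature forecasts from
    # two NWP models plus the observed values for the same hours (degC).
    model_a = np.array([2.1, 3.0, 4.2, 5.1, 4.8])
    model_b = np.array([1.8, 2.9, 4.5, 5.4, 4.6])
    observed = np.array([2.0, 3.1, 4.4, 5.2, 4.7])

    # Fit blend weights plus a bias term by ordinary least squares.
    X = np.column_stack([model_a, model_b, np.ones_like(model_a)])
    weights, *_ = np.linalg.lstsq(X, observed, rcond=None)

    # The blended forecast typically scores better than either model alone,
    # and because both inputs obey physics, it stays within plausible bounds.
    blended = X @ weights
    print(weights, blended)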


Is there somewhere to see historical forecasts?

So not "the weather on 25 December 2022 was such and such" but rather "on 20 December 2022 the forecast for 25 December 2022 was such and such"


Not yet, but I am working towards it: https://github.com/open-meteo/open-meteo/issues/206


I’ve always wanted to see something like that. I often wonder if forecasts are a coin flip beyond a window of a few hours.


I just quit photographing weddings (and other stuff) this year. It's a job where the forecast really impacts you, so you tend to pay attention.

The number of brides I've had to calm down when rain was forecast for their day is pretty high. In my experience, in my region, precipitation forecasts more than 3 days out are worthless except when it's supposed to rain for several days straight. Temperature/wind is better, but it can still swing one way or the other significantly.

For other types of shoots I'd tell people that ideally we'd postpone on the day of, and only to start worrying about it the day before the shoot.

I'm in Minnesota, so our weather is quite a bit more dynamic than many regions, for what it's worth.


I know at a minimum that hurricane forecasts have gotten significantly better over time. You can see it in the NHC verification data:

https://www.nhc.noaa.gov/verification/verify5.shtml

Our 96-hour projections are as accurate today as 24-hour projections were in 1990.


Looks like https://sites.research.google/weatherbench/ attempts to "benchmark" different forecast models/systems.

They're very cautious about naming a "best" model though!

> Weather forecasting is a multi-faceted problem with a variety of use cases. No single metric fits all those use cases. Therefore, it is important to look at a number of different metrics and consider how the forecast will be applied.


That last paragraph sounds like something ChatGPT would write.


Are you thinking something like https://www.forecastadvisor.com/?


I would like to see an independent forecast comparison tool similar to Forecast Advisor, which evaluates numerical weather models. However, getting reliable ground truth data on a global scale can be a challenge.

Since Open-Meteo continuously downloads every weather model run, the resulting time series closely resembles assimilated gridded data. GraphCast relies on the same data to initialize each weather model run. By comparing past forecasts to future assimilated data, we can assess how much a weather model deviates from the "truth," eliminating the need for weather station data for comparison. This same principle is also applied to validate GraphCast.
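
As a small sketch of that verification idea (all numbers hypothetical), scoring a past forecast against the later assimilated analysis:

    import numpy as np

    # A forecast issued several days earlier vs. the assimilated analysis
    # ("truth") that became available afterwards, for the same hours (degC).
    forecast = np.array([21.5, 22.0, 23.4, 24.1])
    analysis = np.array([20.9, 22.3, 23.0, 25.0])

    rmse = np.sqrt(np.mean((forecast - analysis) ** 2))
    bias = np.mean(forecast - analysis)
    print(f"RMSE={rmse:.2f} degC, bias={bias:+.2f} degC")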

Moreover, storing past weather model runs can enhance forecasts. For instance, if a weather model consistently predicts high temperatures for a specific large-scale weather pattern, a machine learning model (or a simple multilinear regression) can be trained to mitigate such biases. This improvement can be done for a single location with minimal computational effort.
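
And a minimal sketch of such a bias correction for one location, with made-up data and mean sea-level pressure as a crude stand-in for the large-scale pattern:

    import numpy as np

    # Past model output for one location, plus the analysis that later
    # became available for the same hours (all values hypothetical).
    fc_temp = np.array([24.0, 26.5, 30.1, 22.3, 28.8])           # forecast 2m temp
    fc_mslp = np.array([1018.0, 1021.0, 1024.0, 1010.0, 1022.0])  # pressure, hPa
    an_temp = np.array([22.9, 25.2, 28.4, 22.0, 27.1])           # analysis "truth"

    # Multilinear regression: predict the forecast error from the model
    # state, so pattern-dependent biases get learned and removed.
    X = np.column_stack([fc_temp, fc_mslp, np.ones_like(fc_temp)])
    coef, *_ = np.linalg.lstsq(X, an_temp - fc_temp, rcond=None)

    # Correct a new forecast of 29.5 degC at 1023 hPa:
    corrected = 29.5 + np.array([29.5, 1023.0, 1.0]) @ coef
    print(corrected)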


How did you handle missing data? I’ve used NOAA data a few times and I’m always surprised at how many days of historical data are missing. They have also stopped recording in certain locations and then start in new locations over time making it hard to get solid historical weather information.


Open-Meteo has a great API too. I used it to build my iOS weather app Frej (open source and free: https://github.com/boxed/frej)

It was super easy and the responses are very fast.


That’s awesome! I’ve hooked something similar up to my service, https://dropory.com, which predicts which day it will rain the least for any location, based on historical data!


Yikes, after completing three steps I was asked for my email. No to your bait-and-switch, thanks!


It can take up to 10 min to generate a report - I had a spinner before, but people just left the page. So I implemented a way to send it to them instead. I've never used the emails for anything other than that. Try it with a 10-minute disposable email address if you like. Thanks for your feedback!


Ok, seems like your UI is not coming from a place of malice. However, pulling out an email input form at the final step is a very widespread UI dark pattern, so if nothing else please let people know that you will ask for their email before they start interacting with your forms.


Hi Jeff, Great work, Respect!

I just hit the daily limit on the second request at https://climate-api.open-meteo.com/v1/climate

I see the limit for non-commercial use should be "less than 10.000 daily API calls". Technically 2 is less than 10.000, I know, but still I decided to drop you a comment. :)


10.000 requests / (24 hours * 60 minutes * 60 seconds) = 0.11 requests / second

or 1 request every ~9 seconds.

Maybe you just didn't space them enough.


Maybe, that would be funny. ~7 requests per minute would be a more dev-friendly way of enforcing the same quota.


I confirm, open-meteo is awesome and has a great API (and API playground!). And it's the only source I know of that offers 2 weeks of hourly forecasts (I understand at that point they are more likely to just show a general trend, but it still looks spectacular).

It's a pleasure being able to use it in https://weathergraph.app


> And it's the only source I know of that offers 2 weeks of hourly forecasts

Enjoy the data directly from the source producing them.

American weather agency: https://www.nco.ncep.noaa.gov/pmb/products/gfs/

European weather agency: https://www.ecmwf.int/en/forecasts/datasets/open-data

The data’s not necessarily easy to work with, but it's all there, and you get all the forecast ensembles (potential forecasted weather paths) too.


Thank you, I didn't know! I'd love to, but I'd need another 24 hours in a day to also process the data - I'm glad I can build on a work of others and use the friendly APIs :).


This is awesome. I was trying to do a weather project a while ago but couldn't find an API to suit my needs for the life of me. It looks like yours still doesn't have exactly everything I'd want, but it has plenty. Mainly, UV index is something I've been trying to find broad historical data for, but it seems like it just might not be out there. I do see you have solar radiation, so I wonder if I could calculate it from that data. But I believe UV index also takes into account things like local air pollution and ozone forecasts.


How about https://pirateweather.net/en/latest/ ?

Does anyone have a comparison of this API with the one discussed here?


Both APIs use weather models from NOAA GFS and HRRR, providing accurate forecasts in North America. HRRR updates every hour, capturing recent showers and storms in the upcoming hours. PirateWeather gained popularity last year as a replacement for the Dark Sky API when Dark Sky servers were shut down.

With Open-Meteo, I'm working to integrate more weather models, offering access not only to current forecasts but also past data. For Europe and South-East Asia, high-resolution models from 7 different weather services improve forecast accuracy compared to global models. The data covers not only common weather variables like temperature, wind, and precipitation but also includes information on wind at higher altitudes, solar radiation forecasts, and soil properties.

Using custom compression methods, large historical weather datasets like ERA5 are compressed from 20 TB to 4 TB, making them accessible through a time-series API. All data is stored in local files; no database set-up required. If you're interested in creating your own weather API, Docker images are provided, and you can download open data from NOAA GFS or other weather models.
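
For example, a month of hourly temperatures from the historical archive endpoint looks roughly like this (parameter names follow the public docs, but verify them at https://open-meteo.com/en/docs before relying on this sketch):

    import requests

    resp = requests.get(
        "https://archive-api.open-meteo.com/v1/archive",
        params={
            "latitude": 52.52,            # Berlin, as an example
            "longitude": 13.41,
            "start_date": "2022-01-01",
            "end_date": "2022-01-31",
            "hourly": "temperature_2m",
        },
        timeout=30,
    )
    hourly = resp.json()["hourly"]
    print(hourly["time"][:3], hourly["temperature_2m"][:3])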


This is great. I am very curious about the architectural decisions you've taken here. Is there a blog post / article about them? 80 yrs of historical data -- are you storing that somewhere in PG and the APIs are just fetching it? If so, what indices have you set up to make APIs fetch faster etc. I just fetched 1960 to 2022 in about 12 secs.


Traditional database systems struggle to handle gridded data efficiently. Using PG with time-based indices is memory- and storage-intensive. It works well for a limited number of locations, but global weather models at 9-12 km resolution have 4 to 6 million grid-cells.

I am exploiting the homogeneity of gridded data. In a 2D field, calculating the data position for a geographical coordinate is straightforward. Once you add time as a third dimension, you can pick any timestamp at any point on earth. To optimize read speed, all time steps are stored sequentially on disk in a rotated/transposed OLAP cube.

Although the data now consists of millions of floating-point values without accompanying attributes like timestamps or geographical coordinates, the storage requirements are still high. Open-Meteo chunks data into small portions, each covering 10 locations and 2 weeks of data. Each block is individually compressed using an optimized compression scheme.
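
A rough sketch of how such a chunked layout turns a (location, time) lookup into plain arithmetic, with no index needed (chunk sizes are illustrative, not Open-Meteo's exact layout):

    LOCATIONS_PER_CHUNK = 10
    HOURS_PER_CHUNK = 14 * 24  # two weeks of hourly time steps

    def chunk_address(location: int, hour: int):
        """Map a (location, hour) pair to a chunk id and an offset in it."""
        chunk_loc, off_loc = divmod(location, LOCATIONS_PER_CHUNK)
        chunk_time, off_time = divmod(hour, HOURS_PER_CHUNK)
        # Inside a chunk, each location's time steps are contiguous,
        # which is what makes long time-series reads cheap.
        offset = off_loc * HOURS_PER_CHUNK + off_time
        return (chunk_loc, chunk_time), offset

    print(chunk_address(1_234_567, 42_000))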

While this process isn't groundbreaking and is supported by file formats like NetCDF, Zarr, or HDF5, the challenge lies in efficiently working with multiple weather models and updating data with each new weather model run every few hours.

You can find more information here: https://openmeteo.substack.com/i/64601201/how-data-are-store...


I always suspect that they don't tell me the actual temperature. Maybe I am totally wrong, but I suspect it. I need to get my own physical thermometer (not the digital one) in my room and outside my house and point a camera at it, so that later I can speed up the video and see how much the temperature varied the previous night.


What? Why?


There is also https://github.com/google-research/weatherbench2 which has baselines of numerical weather models.


This is really cool. I've been looking for good snow-related weather APIs for my business. I tried looking on the site, but how does it work, being coordinate-based?

I'm used to working with different weather stations, e.g. seeing different snowfall predictions at the bottom of a mountain, halfway up, and at the top, where the coordinates are quite similar.


You'll need a local weather expert to assist, as terrain, geography and other hyper-local factors create forecasting unpredictability. For example, Jay Peak in VT has its own weather; the road in has no snow, but it's a raging snowstorm on the mountain.


Is it able to provide data on extreme events? Say, the current and potential path of a hurricane, similar to the .kml files that NOAA provides?


Extreme weather is predicted by numerical weather models. Correctly representing hurricanes has driven development on the NOAA GFS model for centuries.

Open-Meteo focuses on providing access to weather data for single locations or small areas. If you look at data for coastal areas, forecast and past weather data will show severe winds. Storm tracks or maps are not available, but might be implemented in the future.


I would love to hear about this centuries-old NOAA GFS model. The one I know about definitely doesn't have that kind of history behind it.


Some of the oldest data may come from ships' logs going back to 1836:

https://www.reuters.com/graphics/CLIMATE-CHANGE-ICE-SHIPLOGS...


Sorry, decades.

KML files for storm tracks are still the best way to go. You could calculate storm tracks yourself for other weather models like DWD ICON, ECMWF IFS or MeteoFrance ARPEGE, but storm tracks based on GFS ensembles are easy to use with sufficient accuracy.


Appreciate the response. Do you know of any services that provide what I described in the previous comments? I'm specifically interested in extreme weather conditions and their visual representation (hurricanes, tornados, hails etc.) with API capabilities


Go to nhc.noaa.gov/gis. There's a list of data and products with KMLs and KMZs and GeoJSONs and all sorts of stuff. I haven't actually used the API for retrieving these, but NOAA has a pretty solid track record with data dissemination.


I was going to ask about air quality, but just opened the site and you have air quality as well! Thanks!


Are multiple data sources supported?


I have heard the same regarding 5xx errors in the past couple of months. I am also working on an open-source weather API: https://open-meteo.com/. It covers most of WeatherKit's features and offers more flexibility. You can either use the public API endpoint or even consider hosting your own API endpoint.

Forecast quality should be comparable, as the API uses open-data weather forecasts from the American weather service NOAA (GFS and HRRR models) with hourly updates. Depending on the region, weather models from other national weather services are used. These open-data weather models are commonly used among the most popular weather APIs, although without any attribution.

If you have any questions, let me know!


Wrote you an email. I've followed the project for a long time!



That’s exactly what I want to change with my open-source weather API: https://open-meteo.com

It collects raw weather model data and redistributes weather forecasts with simple APIs.

Briefsky is also using it :)


Seriously, thanks a ton for providing an API-key free weather API! That's why it's the default for briefsky. Sorry if there's a load spike today ;)


Hi, creator of open-meteo.com here! I am using a wider range of weather models to better cover Europe, Northern Africa and Asia. North America is covered as well with GFS+HRRR and even weather models from the Canadian weather service.

In contrast to PirateWeather, I am using compressed local files to more easily run API nodes without getting a huge AWS bill. Compression is especially important for large historical weather datasets like ERA5 or the 10 km version, ERA5-Land.

Let me know if you have any questions!


open-meteo.com looks awesome. I've been messing around writing a snow forecast app for skiing/snowboarding for a while now and the main thing I'm missing is historical snowfall data. Do these data sources exist in a machine readable format and I've just not been able to find them? If so, would you ever consider adding precip + kind of precip to your historical API?


Snowfall is already available in the historical weather API. Because the resolution is fairly limited for long-term weather reanalysis data, snow analysis for single mountain slopes/peaks may not be that accurate.

If you only want to analyse the past few weeks to get the date of the last snowfall and how much powder might be there, use the forecast API and the "past_days" parameter to get a continuous time-series of past high-resolution weather forecasts.
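
A quick sketch of that (made-up resort coordinates; check variable names against the docs):

    import requests

    resp = requests.get(
        "https://api.open-meteo.com/v1/forecast",
        params={
            "latitude": 46.5,       # hypothetical ski resort
            "longitude": 12.1,
            "hourly": "snowfall",
            "past_days": 14,        # include the previous two weeks
        },
        timeout=30,
    )
    hourly = resp.json()["hourly"]
    last_snow = max(
        (t for t, s in zip(hourly["time"], hourly["snowfall"]) if s),
        default="no snowfall in window",
    )
    print("last snowfall:", last_snow)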


I am working on an open source weather api: https://open-meteo.com

If you are looking for raw weather forecast data, it could be a good start


Thanks for the info. Snowfall was recently added. I am afraid there could be a bug with that particular variable.

Temperature, clouds, etc, seem fine

EDIT: Issue identified and will be fixed in the next few days! Thanks!


Sure. What do you think about "&start_date=20220701" and "&end_date=20220714"?

If end_date is not specified, it would return start_date with 7 days forecast


Some technical background:

Open-Meteo has offered free weather APIs for a while now. Archiving data was not an option, because forecast data alone required 300 GB of storage.

In the past couple of weeks, I started to look at fast and efficient compression algorithms like zstd, brotli and lz4. All of them performed rather poorly with time-series weather data.

After a lot of trial and error, I found a couple of pre-processing steps that improve the compression ratio a lot:

1) Scaling data to reasonable values. Temperature has an accuracy of 0.1° at best. I simply round everything to 0.05° instead of keeping the highest possible floating-point precision.

2) A temperature time-series increases and decreases by small values: 0.4° warmer, then 0.2° colder. Only storing deltas improves compression performance. (Steps 1 and 2 are sketched in code below.)

3) Data are highly spatially correlated. If the temperature is rising in one "grid-cell", it is rising in the neighbouring grid-cells as well. Simply subtract the time-series of one grid-cell from the next grid-cell's. This in particular yielded a large boost.

4) Although zstd performs quite well with this encoded data, other integer compression algorithms have far better compression and decompression speeds. Namely, I am using FastPFor.

With that compression approach, an archive became possible. One week of weather forecast data should be around 10 GB compressed. With that, I can easily maintain a very long archive.
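
For illustration, here are steps 1 and 2 in a few lines of Python:

    import numpy as np

    temps = np.array([12.31, 12.52, 12.74, 12.69, 12.41])  # degC

    # Step 1: quantize to 0.05 deg steps instead of full float precision.
    quantized = np.round(temps / 0.05).astype(np.int32)

    # Step 2: store only the change from one time step to the next.
    deltas = np.diff(quantized, prepend=quantized[0])

    print(quantized)  # [246 250 255 254 248]
    print(deltas)     # [ 0  4  5 -1 -6]  <- tiny integers, easy to pack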


Amazing you were able to get the data from 300 GB to 10 GB, impressive!

Radar data: I find the most obvious gaps in predicting short-term weather are related to radar data. Obviously radar datasets would require massive storage space, but I'm curious if you have run across any free sources for archival radar data or APIs for real-time streams, or open-source code for scraping existing services' radar feeds.


`300 GB` to `10 GB` was a bit over-optimistic ;-) 300 GB already included 3 weeks of data. `100 GB` to `10 GB` is a more realistic number.

Many weather variables like precipitation or pressure are very easy to compress. Variables like solar radiation are more dynamic and therefore less efficient to compress.

Getting radar data is horrible... In some countries like the US or Germany it is easy, but many other countries do not offer open-data radar access. For the time being, I will integrate more open datasets first.


I wonder if you could achieve an even higher ratio with bit-packing. Each temperature has 3 significant digits, and with a reasonable range of -51.2 to 51.2, a signed 9-bit value would suffice, so you could store 3 temperatures per 32-bit integer. Deltas might need even less range, though perhaps with extra bit tweaking; AFAIK FastPFor also does something similar with SIMD. But from what I understand, time is not your main concern.

Edit: just read the part about the 0.4/0.2 deltas. If I understand right, storing one real temperature and then deltas could fit 12 values in the first int and 16 in the following ints? Quick napkin math, could be wrong :)


Yes, it is a combination of delta coding, zigzag, bitpacking and outlier detection.

It only works well for integer compression. For text-based data, the results are not useful.

SIMD and decompression speed are important aspects. All forecast APIs use the compressed files as well. Previously I was using mmap'ed float16 grids, which were faster, but took significantly more space.
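
For reference, the zigzag step is a small bit trick that maps signed deltas to small unsigned integers (0, -1, 1, -2, 2 become 0, 1, 2, 3, 4), which is exactly what a bit-packer wants; a sketch:

    def zigzag_encode(n: int) -> int:
        return (n << 1) ^ (n >> 31)   # assumes values fit in 32 bits

    def zigzag_decode(z: int) -> int:
        return (z >> 1) ^ -(z & 1)

    deltas = [0, 4, 5, -1, -6]
    encoded = [zigzag_encode(d) for d in deltas]
    print(encoded)  # [0, 8, 10, 1, 11]
    assert [zigzag_decode(z) for z in encoded] == deltas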


Would flac work for compression? Given the weather data is a time series of numbers it could be represented as audio. It would then automatically do the difference encoding thing you’re doing.

If you encoded nearby grid cells as audio channels, flac would even handle the correlation like it does for stereo audio.


> 3) Data are highly spatially correlated. If the temperature is rising in one "grid-cell", it is rising in the neighbouring grid-cells as well. Simply subtract the time-series of one grid-cell from the next grid-cell's. This in particular yielded a large boost.

Can you expand on this?


Sure. I bundle a small rectangle of neighbouring locations, like 5x5 (= 25 locations). The actual weather model may have a grid of 2878x1441 cells (~4 million).

Inside the 5x5 chunk, I subtract all grid-cells from the center grid-cell. The borders will then contain only the difference to the center grid-cell.

Because the values of neighbouring grid-cells are similar, the resulting deltas are very small and better compressible.
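
A toy version of that step on a 3x3 block and a single time step (real chunks are larger, e.g. 5x5):

    import numpy as np

    # One time step of a 3x3 neighbourhood of grid-cells (degC).
    block = np.array([
        [12.4, 12.5, 12.6],
        [12.3, 12.5, 12.7],   # center cell at [1, 1]
        [12.2, 12.4, 12.6],
    ])

    center = block[1, 1]
    deltas = block - center   # neighbours now hold only tiny differences
    deltas[1, 1] = center     # keep one absolute value for reconstruction
    print(deltas)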


Hello, good work!

Please check out this company if you might like to work for them: https://www.energymeteo.de/ueber_uns/jobs.php


Feedback:

(1) Maybe it’s just me, but the “current jobs” are only available in German; if you switch to English, Spanish or French, the page gets translated, but the three “current jobs” drop-down lists get removed. Super confusing, since it gets reset to German if you click “current jobs” from any of the other pages;

(2) HN is an English-language site; it would be nice if you linked to the English page, not the German one;

(3) if you’re affiliated with the company, which I believe you are, you should say so; noting it in your profile with contact information would be nice too.

(4) Reminder that HN has free job postings every month if you are affiliated with the company:

https://news.ycombinator.com/submitted?id=whoishiring


Thank you for your feedback.


[flagged]


I think these are reasonable suggestions from a perspective of someone who wants this site to become better.


Interesting. Have you come across TileDB before?

https://tiledb.com/


I have not. It looks promising, as it seems to offer multi-dimensional data storage and some compression aspects.

I must also admit that I like my simple approach of just keeping data in compressed local files. With fast SSDs it is super easy to scale and fault tolerant. Nodes can just `rsync` data to keep up to date.

In the past I used InfluxDB, TimescaleDB and ClickHouse. They also offer good solutions for time-series data, but add a lot of maintenance overhead.


Thanks for your response. I have no affiliation, it just piqued my interest.

> I must also admit, that I like my simple approach of just keeping data in compressed local files. With fast SSDs it is super easy to scale and fault tolerant. Nodes can just `rsync` data to keep up to date.

I also like this approach :-)


Actually, they are historical weather forecasts, but assembled into a continuous time-series.

Storing each weather forecast individually, to evaluate e.g. "how good is a forecast 5 days out", would require a lot of storage. Some local weather models update every 6 hours.

But even with a continuous time-series, you can already tell how good or bad a forecast is compared to measurements. Assuming your measurements are correct ;-)


This would still be a remarkable dataset for learning, and worth the storage. Though it might need other inputs as well (like pressure zones, etc.) to escape potential biases.

