Data Parallel Extensions for Python (intelpython.github.io)
109 points by nateb2022 on Nov 25, 2023 | hide | past | favorite | 9 comments


Despite sounding very broadly useful, this documentation seems to go out of its way to avoid listing which hardware you can actually target with this framework, aside from mentioning some of the other middleware it can interface with. I'd much rather see documentation that prioritizes the user's needs over Intel's corporate strategy.


Considering how poorly it seems to support CUDA as a backend [0], I wouldn't hold my breath for non-Intel vendor support (AMD CPUs or GPUs). If you want to try anyway, here's the benchmark suite [1]; I couldn't find any published sample results.

As for less common GPUs, there really is no good support in any library. If you ever want to go down a fun rabbit hole, try to use the GPU in a Raspberry Pi for something. You'll eventually find one person who reverse-engineered the drivers to make a compiler, but that's it. That's also the story for most uncommon chips with GPUs (Amlogic, Broadcom, etc.), where the vendor gives you a driver blob for Linux and no documentation. On the other hand, Rockchip has some good things available [2].

[0] https://github.com/IntelPython/dpctl/discussions/1124

[1] https://github.com/IntelPython/dpbench/blob/main/README.md

[2] https://github.com/rockchip-linux/rknn-toolkit2


The documentation talks about OpenCL & Level Zero, so probably Intel, AMD, and NVIDIA.

(https://intelpython.github.io/dpctl/latest/docfiles/dpctl/dp...)
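An easy way to settle this is to ask dpctl itself. A minimal, untested sketch, assuming dpctl is installed; non-Intel devices only show up if the underlying DPC++ runtime was built with the matching plugin (e.g. the CUDA one):

    import dpctl

    # Enumerate every SYCL device the runtime exposes, with its backend
    # (opencl, level_zero, cuda, ...) and device type (cpu, gpu, ...).
    for dev in dpctl.get_devices():
        print(dev.name, dev.backend, dev.device_type)

On a stock Intel-only install you'd typically expect to see only OpenCL and Level Zero entries here.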


I think it's just Intel, whatever's compatible with oneAPI.

- Intel® Processor Graphics Gen9 and above

- Xe Architecture


Nice.

Too bad it's not a thread about OpenAI and/or Rust. I'm sure we'd see an order of magnitude more experts here.


Funnily enough, if they had included the Polars library in the list, they could have added Rust to the title.


Do you have some benchmarks?

Last month I had a problem because I don't know enough Python to be sure when something will be broadcast and parallelized, so I had to make a few attempts before it got faster. Do you have an example of an easy case where this is much faster?
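For reference, roughly the kind of easy case these extensions are aimed at is a big broadcasted elementwise expression, written once with NumPy and once with dpnp (which mirrors the NumPy API but allocates arrays on a SYCL device). An untested sketch, assuming dpnp is installed and a device is visible; whether it's actually faster depends entirely on your hardware, the array size, and transfer costs:

    import time
    import numpy as np
    import dpnp  # Data Parallel Extension for NumPy; assumes it is installed

    n = 50_000_000
    a = np.random.rand(n).astype(np.float32)

    # Plain NumPy on the CPU: the scalars 2.0 and 1.0 are broadcast over the array.
    t0 = time.perf_counter()
    b = np.sin(a) * 2.0 + 1.0
    print("numpy:", time.perf_counter() - t0, "s")

    # dpnp: same expression, but the array lives on the default SYCL device,
    # so the broadcasted elementwise work runs there.
    a_dev = dpnp.asarray(a)              # copies the data to the device
    t0 = time.perf_counter()
    b_dev = dpnp.sin(a_dev) * 2.0 + 1.0
    print("dpnp: ", time.perf_counter() - t0, "s")  # naive timing

    print("match:", np.allclose(b, dpnp.asnumpy(b_dev), atol=1e-5))

The first dpnp call usually includes warm-up and the host-to-device copy, so time a second run (or keep the data resident on the device) before drawing conclusions.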



Are the results of the benchmarks posted somewhere?



