guites's comments

I'm working on something similar for Brazilian blogs. For now it's a collection of ~500 blogs, and I plan to expand it by following external links found in these blogs and then somehow finding out which are links to other br blogs.
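The "follow external links" step could look roughly like the sketch below. It is not the brcrawl code: the function names are made up for illustration, and the `.br` check is an assumed placeholder heuristic that would miss Brazilian blogs hosted on `.com` or `.dev` domains.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def external_links(page_url, html):
    """Return the set of hosts linked from the page, excluding the page's own host."""
    collector = LinkCollector()
    collector.feed(html)
    own_host = urlparse(page_url).netloc.lower()
    hosts = set()
    for href in collector.hrefs:
        parsed = urlparse(urljoin(page_url, href))  # resolve relative links
        host = parsed.netloc.lower()
        if parsed.scheme in ("http", "https") and host != own_host:
            hosts.add(host)
    return hosts


def looks_brazilian(host):
    # Hypothetical heuristic: only checks the .br TLD.
    return host.endswith(".br")
```

Hosts that pass the filter would go back into the crawl queue; the rest would need some other signal, such as language detection on the page text.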

Not really a curated list, but I would welcome projects that filter the existing blogs into categories, etc.

https://guilhermegarcia.dev/brcrawl


Oh neat! I went and shared this with some Brazilian friends of mine. They tend to like stuff like this so I think it will make them happy.

I know I enjoyed checking out some sites you have in that list (using translation) so thanks!

I love browsing outside the anglosphere, but it can be challenging to find curators who regularly surface new sites or posts to check out when other languages aren't part of your regular browsing routine.


Requesting https://github.com/lobsters/lobsters as I'm going through that codebase and would be able to provide feedback. cheers

ps. just gonna second everyone else who's saying being able to edit out incorrect data is very important, otherwise people are gonna be wary of reading repos they aren't already familiar with.



On it!


Would anyone with the time care to explain how voting is handled on the server? I imagine that the image which is generated on the client is somehow tied to an endpoint that runs the necessary SQL or something.


https://guilhermegarcia.dev

Coming in a little late, I've got a blog where I write about my tech findings in Brazilian Portuguese. Mostly short write-ups about how I research a new technology or topic (though there is some non-tech-related stuff here and there).


Hey! Glad to see flower getting attention on hn.

I've been working on a project for over a year that uses flower to train cv models on medical data.

One aspect that we see being brought up again and again is how we can prove to our clients that no unnecessary data is being shared over the network.

Do you have any tips on solving that particular problem? I.e. proving that no data apart from model weights is being transferred to the centralized server?

Thanks a lot for the project.

edit: Just to clarify I am aware of differential privacy, I'm talking more on a "how to convince a medical institution that we are not sending its images over the network" level.


If you're concerned about data leakage, it's worth noting that model weights can very easily be used to reconstruct the data that the model was trained on, so it could be misleading to claim that user data isn't being shared over the network. To avoid this, you'd need to look into techniques like Secure Aggregation or local differential privacy. Flower does provide some of this, FWIW.


This doesn’t sound right, if they don’t know the structure of the NN how can they reconstruct from the weights alone? (Perhaps the structure is communicated within the weights?)


Every agent training the model on their proprietary data has to have access to the model form in some way (otherwise how would they train it?)

For this reason, one must assume that the model form is known to the adversary.

With this, the question becomes: is it possible to reconstruct training data from a trained model? We already know that, at least for some image models, the answer to that question is "yes": https://arxiv.org/pdf/2301.13188.pdf


That must only be true if there isn’t a one way compression step occurring, or any approximation in the whole model.


I don't think lossy compression is sufficient. The very first example in the paper I linked to is clearly not identical to the original image (=lossily compressed) yet leaks a training image in a way that would be highly problematic in certain domains, e.g. medical imaging.


I see what you are saying. Agree. Seems we need some set patterns in NN models that will reliably remove reversibility without affecting loss too drastically.


Hi guites, thank you! That is definitely a relatable concern. We have it on our radar and plan to provide material and presentations to help convince stakeholders. If you are up for a call to share the specific challenges, we could ideate with you.


Would love to! You can grab my email on my profile. Could you ping me over there? Thanks


Thanks, glad you like it!

One approach to increase the transparency on the client side (and build trust with the organization where the Flower client is deployed) is to integrate a review step that asks someone to confirm the update that gets sent back to the server.

On top of that, you should definitely use differential privacy. To quote Andrew Trask here: "friends don't let friends use FL without DP". Other approaches like Secure Aggregation can also help, depending on what kind of exposure your clients are concerned about.
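For readers unfamiliar with what DP does to an update before it leaves the client, here is a minimal, library-agnostic sketch of the usual clip-then-add-Gaussian-noise step. It does not use Flower's API; `clip_norm` and `noise_multiplier` are illustrative parameter names, and the privacy accounting that turns them into an actual (epsilon, delta) guarantee is not shown.

```python
import math
import random


def clip_and_noise(update, clip_norm, noise_multiplier, rng=None):
    """Clip an update to L2 norm <= clip_norm, then add Gaussian noise.

    The noise standard deviation is noise_multiplier * clip_norm, so the
    noise scales with the maximum influence a single client can have.
    """
    rng = rng or random.Random()
    norm = math.sqrt(sum(x * x for x in update))
    # Only shrink updates that exceed the bound; never scale small ones up.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    sigma = noise_multiplier * clip_norm
    return [x * scale + rng.gauss(0.0, sigma) for x in update]
```

Clipping bounds how much any one client can move the model; the noise then hides what remains of an individual contribution.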

My general take is that the best way to solve for transparency and trust is to tackle it on multiple layers of the stack.


A review step sounds like a good idea. Our implementation involves very little interaction on the client side, besides setting up the datasets etc, so maybe a way to log information sent for later inspection would help.
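A rough sketch of what such a log entry could contain, assuming the update can be flattened to a list of floats. Function and field names are hypothetical, and nothing here is tied to Flower's serialization format.

```python
import hashlib
import json
import struct


def audit_record(round_id, weights, expected_param_count):
    """Describe exactly what is about to leave the client.

    `weights` is a flat list of floats standing in for the serialized update.
    Returns the log record and the payload bytes it describes.
    """
    payload = struct.pack(f"<{len(weights)}d", *weights)  # little-endian float64
    record = {
        "round": round_id,
        "param_count": len(weights),
        "payload_bytes": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),
        "matches_model_size": len(weights) == expected_param_count,
    }
    return record, payload


def append_to_log(path, record):
    # One JSON object per line, so the log can be inspected or diffed later.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
```

An auditor at the institution could compare the logged byte counts and hashes against captured network traffic. This shows that the payload has the size and shape of a model update; it does not by itself rule out leakage through the weights.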

I'll be looking into secure aggregation as I'm not fully aware of how it works. As of now we rely on differential privacy only.
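For anyone else new to it, the core idea of secure aggregation can be sketched with pairwise masks that cancel in the sum. This toy version is not Flower's implementation: it assumes integer (already quantized) updates, derives the masks from a shared seed instead of a real key agreement, and ignores client dropouts.

```python
import random

MODULUS = 2 ** 32  # all arithmetic is done modulo this value


def pair_mask(i, j, dim, round_seed):
    """Mask shared by clients i and j (a real protocol derives it from a key agreement)."""
    rng = random.Random(f"{round_seed}:{min(i, j)}:{max(i, j)}")
    return [rng.randrange(MODULUS) for _ in range(dim)]


def masked_update(client_id, update, all_ids, round_seed):
    """Add the mask for every peer with a larger id, subtract it for every smaller id."""
    out = list(update)
    for peer in all_ids:
        if peer == client_id:
            continue
        mask = pair_mask(client_id, peer, len(update), round_seed)
        sign = 1 if client_id < peer else -1
        out = [(x + sign * m) % MODULUS for x, m in zip(out, mask)]
    return out


def aggregate(masked_updates):
    """Server-side sum; the pairwise masks cancel, leaving only the sum of the updates."""
    dim = len(masked_updates[0])
    return [sum(u[k] for u in masked_updates) % MODULUS for k in range(dim)]


ids = [1, 2, 3]
updates = {1: [10, 20], 2: [1, 2], 3: [100, 200]}
masked = [masked_update(c, updates[c], ids, round_seed=7) for c in ids]
total = aggregate(masked)  # equals the element-wise sum of the raw updates
```

The server only ever sees the masked vectors, which look random individually, yet their sum is the true aggregate.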

Thanks!


Cool. I saw a proposal to use TEEs for secure aggregation. OpenFL uses Gramine for that. Not sure if that provides sufficient protection, really, but worth having on the radar.

https://arxiv.org/abs/2105.06413 https://openfl.readthedocs.io/en/latest/index.html https://gramineproject.io/


Flower has an agreement to develop interoperable components with OpenFL. This is part of the broader plan by Intel to work with a consortium of players (that includes Flower Labs) and have the output code sit with the Linux Foundation. Making TEE support for SA within OpenFL accessible to Flower users is precisely the type of opportunity we seek to make possible by working with Intel on this.

This is the official press release for those who are interested: https://www.intel.com/content/www/us/en/newsroom/news/transi...

More broadly, in regard to your comment -- our current SA support does not require hardware support, which is what we targeted first, so that it can be broadly adopted across many potential hosts of FL aggregation servers. It is suitable for most applications in need of privacy, although it still requires certain assumptions to be met, such as the number of nodes within a round, among other factors.


What about MPC + DP? Are you planning to integrate any SMPC algorithms into Flower, or do you see any limitations that prevent it?

I'm trying to apply federated learning to the medical domain too, and I'm trying to define the best "stack" that guarantees privacy and compliance with regulations like the GDPR.


I can’t speak for Flower’s core dev roadmap, but PySyft is in the process of integrating Flower and some Secure Enclave options which would let you do this.

Congrats on the launch Flower team!


Thanks! We're huge fans of the work that PySyft is doing, and we're very supportive of the Flower PySyft integration.


Agreed that this is an interesting direction. The core Flower abstractions are "federated learning agnostic", which means that they can be used for different kinds of distributed/federated workloads, not just federated learning. We'll add examples for more approaches (like SMPC) in the future; we just don't have the bandwidth to do it immediately.


I would generalize and say that tinkerers are better programmers, because you tend to get a better grasp of how your tooling works.

People that create vscode extensions and customize their workflows also share that spirit.


This has strong Tibia (the mmo) vibes. Would also love it, although the maps would have to be a bit bigger, or maybe the drop rates smaller.

This would also make it harder for the first adventurers and gradually easier for late comers (more bodies/more loot)


Woah, such a coincidence, I've been trying yabai[1] on my Mac mini for the past few days.

I wonder how they compare?

Installing yabai involves some fishy tinkering with System Integrity Protection and granting screen recording permission for some features, which doesn't bother me on my personal PC but is kinda troublesome for a work machine.

Also, does it include some sort of workspace style virtual monitors? Like i3 does for linux.

In this case I would love if it allowed me to load custom workspaces on boot, which is such a pain to get working on i3[2]. (I'm not sure if yabai handles workspaces).

1. https://github.com/koekeishiya/yabai
2. https://wiki.archlinux.org/title/I3 (section 5.3.2, for example)


Sadly, Apple Silicon is still quite difficult to configure properly for some areas of ML while leveraging the GPU (object detection, for example), so having that nvidia card really makes the setup smoother.


This is a great example of how someone with little programming knowledge could leverage an AI to build simple scripts.

Lately I've been encouraging my friends to try just that.

If the poster wanted, for example, to save all current tabs when switching context (going from dev to marketing, say), this would quickly turn into a more involved debugging/prompting exercise.


That would be a great follow-on. I posted the repo if you want to leave any feedback or help me continue building it out: https://github.com/brevdev/dev-browser

