Ask HN: What problem are you close to solving and how can we help?
263 points by zachrip on Aug 29, 2021 | 472 comments
Please don't list things that just need more bodies - specifically looking for intellectual blockers that can be answered in this thread.



I want to bring back old school distributed forum communities but modernise them in a way that respects attention and isn’t a notification factory.

Mastodon is a pretty inspirational project, but the Twitter influence shows; I miss the long-form writing that was encouraged before our attention spans were eroded.

Not at all close to solving it, but it’s been on my mind for a long time. Would love to hear if there are others like me out there and what you imagine such a community to look like.


Is this a software problem? There are a lot of open-source platforms (with their differences), from old forums to more modern ones. I think the problem is that people don't want to, or don't know how to, use a forum.

This is based on my experience:

- Old people: They started using the internet recently, so they are used to social networks (Facebook, Instagram) and newspaper websites.

- Young people: Hard to get them to use the browser; if there isn't an app (Instagram, TikTok), you are lost. If they want to discuss a topic, it's mostly Twitter through hashtags, YouTubers or Discord.

- Adults: This is where some of them may use a forum, but you have to be lucky enough to find adults who have used the internet for many years and know what a forum is. If you find a 30- or 40-year-old who started using the internet 5 years ago (which happens), you are lost.

And on top of that, you need to compete against Reddit and their own subreddits.

(Edited to format)


The value of a forum is not to get everyone. It is actually in limiting a community to the most mature voices and thoughtful people.

There's little upside to having teenagers in a forum, for example, unless you're looking to monetize.


Younger people can help evolve topics by providing a set of fresh eyes. It's also important to pass knowledge on, otherwise it just ends up dying with you.

For example: Virtual reality headsets were fairly stagnant until a young guy in his early 20s tried something new.

Old people often become stuck in their ways and it requires someone new to show up and ask how the sausage is made.


"fresh eyes" can still be the people new to the form who have that year just turned 35 (eg).


I was moderating forums when I was twelve. While I experience the usual “I was so embarrassing back then” when thinking back to those times, I do think my contributions were positive and well-received.


I used many old school forums in the pre-social media days. "Mature voices and thoughtful people" were quite rare.


If you think young people won't use forums, you clearly haven't seen Scratch's. (https://scratch.mit.edu/discuss/)


>- Young people: Hard to get them to use the browser; if there isn't an app (Instagram, TikTok), you are lost. If they want to discuss a topic, it's mostly Twitter through hashtags, YouTubers or Discord.

I disagree on this take.

GenZ can do long-form discussion and they use forums frequently, but for specific discussion(s).

Content consumption is done via forever-scroll apps because it's good to kill time; Tiktok, Instagram, etc match this well because it's just a stream of things to have fun with.

But GenZ is making great long-form content on many traditional platforms, even blogging. The difference, as I understand it, is their approach to engagement. GenZ has seen Facebook arguments and said "no thank you". Twitter discourse also isn't really a thing -- people disagree and tweet at people, but Twitter isn't like a forum thread and there's no way to ensure your content is associated with the content you want to respond to, so it's not effective to communicate in responses. Reddit is kind of a mystery for me as I just don't frequent it at all, but I don't get the impression that GenZ is posting frequently.

GenZ has platforms that work best when you make a statement, not ones where you open a discourse; Twitter is too fast/broad to respond to all comments and find real content to respond to, and video content just isn't great for back-and-forth and becomes time-consuming for lighter topics. Viewers will make whatever statement they want, but the validity of the video is judged by how far the concept spreads; I'd actually posit that GenZ is very good at concisely expressing an idea in a simple and condensed format, and responses are aimed not at an individual, but at an idea.

But they will go to Tumblr/Medium/other long-form posts when the medium is appropriate. This is one thing I like about a lot of GenZ content: they tend to be VERY good about choosing the right format for their argument. That many don't have much more to express besides tweets/tiktoks isn't an indictment of GenZ, it's praise.

Think of the forums you maybe still lurk around and how many comments are just complete garbage/non-sequiturs; on forums such posts are enough to derail a topic or distract because we feel obligated to a degree to respond, but filtering noise is part of the skill of using more modern platforms.

Forums are kind of contrary to this, and are also bogged down by the aforementioned Facebook-argument issue and by the preferred platforms not really being strongest for direct rebuttals. Again, there are times when you'll see GenZ use forums or other long-form posts, but it tends to be more controlled or on 'forums-but-not-really' forums like Tumblr.

Forums have their purpose and use; but we have __many__ alternatives that make forums defunct, as some topics/ideas are far better expressed on Tiktok or Twitter, and whatever followup we end up with.


this vaguely aligns with my experiences


I am building a project (https://linklonk.com) that does information discovery in a way that respects your attention. In short, when you upvote content you connect to other people who upvoted that content and to feeds that posted that content. So to get your attention, other users need to prove to be good curators of content for you.
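Roughly, the idea in code (a toy sketch only; the real system weights things more carefully):

    # Toy sketch of the connection model: per-user trust in curators,
    # grown by shared upvotes, used to rank new items.
    from collections import defaultdict

    trust = defaultdict(lambda: defaultdict(float))  # trust[user][curator]

    def upvote(user, prior_upvoters, source_feed):
        for curator in prior_upvoters:    # connect to everyone who upvoted it before you
            trust[user][curator] += 1.0
        trust[user][source_feed] += 1.0   # and to the feed that posted it

    def score(user, endorsers):
        # An item gets your attention in proportion to how well its
        # endorsers have curated for you in the past.
        return sum(trust[user][e] for e in endorsers)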

I'm planning to do a "Show HN" post next week for it and would appreciate any feedback that I could address before it. We have about 4 active users and a couple more would be great.


Really like this idea, something I've been thinking about for a while. Will join and give the tyres a kick :-)


Thanks! It looks like 13 people signed up, which is really encouraging.

I wrote about the performance tuning I did in preparation for the Show HN post: https://linklonk.com/item/277645707356438528

If you have any feedback please add a comment to that post.


Hey, site down?


Yes, I managed to delete the "A" DNS record last night when I was adding records for mail hosting linklonk.com. Sorry! It is up now.



Hmm... did you happen to work at Google? The concept and the UI seems very familiar.


Bug report: the UI on interacting with posts in the “From feeds and users that recommended this” section is broken.


Thank you! Indeed, the upvote button on those item-based recommendations didn't work. It is fixed now.


Seems good but the UI can be improved IMO.

I have two questions please:

- Is there a list of all the feeds used by LinkLonk?

- How many users are there today?


Do you have any specific suggestions on how to improve the UI? Small tweaks would be most appreciated, as I could implement them before the upcoming Show HN post.

To answer your questions:

1. The list of all feeds used by LinkLonk is not publicly accessible through the website. They are feeds that users explicitly submitted through https://linklonk.com/submit or feeds that LinkLonk parsed from the meta tags of the links that users submitted.

2. The number of active users has been about 4 for the last few months. I'm hoping to get it to 10 this year.


When and how are you planning to change your Terms of Use / Privacy Policy? That kind of information should be in the documents. Your privacy policy is currently insufficient.

> We only collect your information for the purpose of providing this service.

Okay, but what information do you collect? “Your information” is too broad; if you're collecting my retinal scans or matching me to a behavioural analytics profile, you need to justify it. If you're not (which you're not), say so!

> You can delete your account and all your data at any time (see Profile).

But can I take all my data out? Currently, no; that's not a GDPR violation, so long as you provide it on request, but it's certainly a feature I like to have!

The 30-day deletion threshold for anonymous accounts is (IANALaTINLA) GDPR-compliant, since you have to delete personal information (on request) within 30 days, and that happens even if you can't figure out whose account details you should be deleting. Good job.


I share your concerns about user privacy. The information LinkLonk collects is what you explicitly provide (ratings, etc) and the regular server request logs (which include your IP address and user agent). I clarified this in https://linklonk.com/privacy

I do want to add functionality to download your ratings. I'm thinking of exporting the ratings data in either the bookmarks format (ie, the format that browsers use to export bookmarks: https://support.mozilla.org/en-US/questions/1319392), csv or json. Please let me know what format would be most useful.


Yeah, that's great now. (You're collecting even less than I expected!) Thanks for making LinkLonk.

JSON would probably be easiest to start with, because it's easy to generate, easy to read and well-defined. Bookmarks would be a nice extra, though.


Looking forward to see where you go. I used forums primarily for my automotive hobbies, but they all seemed to have died around 2013-2015 as Facebook groups took over. Still, forums are often the best place to find good information. I worry about the day that Facebook solves the search and weekly "repeat topic" problems that are the only thing holding it back.

It's a shame phpBB, vBulletin and the other big players in the space were too slow to adapt to mobile.


> It's a shame phpBB, vBulletin and the other big players in the space were too slow to adapt to mobile.

Was that a problem? My memory is that everything used 'Tapatalk' for mobile, perhaps before the first iPhone even (I recall using it on an iPod Touch).


I used to volunteer my time to a very major forum software project. Tapatalk at the time had a very strange business model: a plug-in/add-on that was free to the forum owner, while charging for the app that the end user used. This was, even at the time, backwards from the prevailing user-interaction model of offering the path of least resistance to engagement. It was unpopular among forum admins, who would rather have bought a license for it the way you would with vBulletin, Xenforo, or Invision Power Board - including the people who ran open-source ones like phpBB, SMF, et al.

While I understand Tapatalk has changed their business model since then, the damage was already done as Facebook started to wholesale eat forums' lunch in terms of userbase. The biggest problem is that we never turned forum interactions into a protocol like we did at the application level (SMTP/POP, HTTP, IRC, XMPP) or on top of HTTP like RSS, podcasts, or just plain, standardized REST APIs. This could have enabled multiple clients (like browsers) to appear and might have prevented Facebook's swift dominance of online communities.

Everyone wanted to own their forum's experience, but this stubbornness caused the friction for users to sign up to grow greater and greater. Platforms like Disqus attempted to solve this by creating an embeddable service to just drop comments into a context like a blog post, but this ultimately gave users almost no value if they were just in a shouting match against bots with generic messages laden with spam.

Facebook unified the experience for users, where the user could, with an account+app that they already had, browse and join groups, engage in discussions and become a part of communities in a way that forums could not possibly compete with.


Oh yes! I'd forgotten that aspect. Was there not anything you could do for free?

I wasn't involved with the hosting/software/ops etc. side of it at all, but I moderated 'The Computer Forum', latterly 'Computer Juice', and used it mainly with that. I fondly remember wasting an awful lot of time helping people solve Windows problems (haven't used it since.. not saying that's related..) and spec new builds.

I suppose that's all happening on StackExchange and probably some DIY custom pc Discord server or whatever these days.


It was, and it was pretty terrible.


I have spent a good bit of time thinking here too, on two primary parts.

First, in my mind the difference between Reddit, FB, Twitter, HN, forums, etc. is really just configuration. Abstracting just a tad higher, you can include Slack and other realtime options. I just want a curated gRPC API that implements it with pluggable auth, and let others figure out discoverability and network (not an activity API, and not with an already-built network; just persistence and auth). End-to-end encryption is important IMO too (even for large groups) so the host can have plausible deniability.
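To make the "just configuration" point concrete, a hypothetical sketch (every name here is invented):

    # Hypothetical: the same persistence layer, different knobs per product.
    from dataclasses import dataclass

    @dataclass
    class CommunityConfig:
        max_post_length: int    # tweet vs long-form
        threaded: bool          # nested replies?
        realtime: bool          # chat-style delivery?
        votes: bool             # rank by score?
        bump_on_reply: bool     # classic forum ordering?

    TWITTER_LIKE = CommunityConfig(280, threaded=False, realtime=True, votes=True, bump_on_reply=False)
    HN_LIKE = CommunityConfig(80_000, threaded=True, realtime=False, votes=True, bump_on_reply=False)
    FORUM_LIKE = CommunityConfig(65_535, threaded=False, realtime=False, votes=False, bump_on_reply=True)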

Second, you have to solve for hosting/network in distributed fashion without over-complicating server democratization and discoverability/naming like p2p often does. I see 2 options: 1) self hosted on at-home workstation using Tor onion service (get NAT busting for free) knowing you need an offline friendly implementation or 2) one-click easy reselling of cloud instances and domains from inside the app (this also provides a funding model).

I know of many p2p options for solving these problems, but I think we don't need to complicate things at that level. As for the quality of the communities themselves, a self-hosted megaphone instead of the perverse share/like broadcast incentives of today's companies will automatically improve discourse (at a potential cost of creating echo chambers).


> First, in my mind the difference between reddit, FB, twitter, HN, forums, etc is really just configuration.

The big boys are dopamine reinforced network effects incorporated by means of software tech. So forget about aiming for Homer Simpson.

You'll have to be happy with a small, but productive minority. Enough valuable people would rather die than use FB for niche purpose X. Start by convincing them...


A bit late to the party but I've been thinking about this too.

One of the main problems with platforms like FB and Reddit is that the posts/discussions are shortlived. They bubble up to the top in the feed when they're fresh and active but then die off and are replaced by the next thing that craves your attention.

Forum posts are sorted in chronological order, grouped into categories. Browsing through the feed you can see what topics are being actively discussed, or you can search for past discussions on a topic you like, resurrect an old thread if you find a good one, and it gets a new life. I like this model.

One concept I've thought about is something like Reddit, where anyone can create and manage subs, but which wouldn't have the same kind of karma whoring and short attention span issues, i.e. posts/threads would live forever and not get locked after 6 months or whatever Reddit does, and posts would be sorted by activity rather than ADD points. I've found so many good X years old Reddit threads with super interesting discussions which I would've liked to jump in and resurrect but can't.

Of course an immediate problem that comes to mind is spam. If posts with new comments are lifted to the top it invites spam bumps and thus moderation. And/or it could be combined with some sort of karma system (reputation, account age etc).

And, since this wouldn't be a specialized forum you'd need to make sure you could cater to various different kinds of communities, i.e. have good multimedia sharing capabilities (for those communities only wanting to share images/videos/memes), code formatting and syntax highlighting, maybe LaTeX etc...


I saw a great blog one time where the author was writing a book in the first post. All following postings were dressed-up changelogs with many interesting and useful comments.

You don't want the notification factory, but if it's a single post that gets updated you don't get any feed updates at all. Reading the same book again and again looking for the updated sections is also not much fun. Dropping a comment in the long list of comments under it doesn't really create a discussion (especially not without feed updates).

You could design the publishing tools so that it "forces" the user into that pattern. People could work on multiple books or long reads but start with a crappy draft or just a bunch of links.


I'm writing a crappy book, and was thinking of using Pijul (https://pijul.org/) to do exactly this! I'd like a better solution, though.


The people I want to talk to are on facebook groups (my hobbies seem to be "old people" hobbies). I think you're suffering from network effects.

Sooooo..... StackOverflow solved it by starting with a vertical the founders had a lot of social juice in, and spreading in to other verticals. Possibly also by focusing very tightly on "questions and answers".

So my suggestion is "overfocus". If the big platforms have a weakness, it's that they're generic one-size-fits-all solutions. Solve one problem (Q&A, show off my project, discussion, news aggregator) for one vertical really well, then expand. An example off the top of my head might be collaborative note-taking for a class. 6510's "write a book in public" platform is also a fantastic idea at first glance.

(But skip the distributed bit - customers don't care about your architecture, and whatever USP a distributed platform has can be emulated by a centralised platform. Centralised always wins).


Great point about starting with a specific vertical. Creator communities (youtubers etc) is an area I had been thinking to focus on, though this space is mostly dominated by Discord at the moment.

> Centralised always wins

My dream isn’t necessarily to win in a financial/monopolistic sense, but rather to build a compelling enough alternative to the centralised systems that have lost their way thanks to incentives that aren’t aligned with the community.

Facebook, reddit, disqus all started out with good intentions to connect people, but have been slowly eroded by incentives to suck user attention.

So it may not be the best business strategy, but I think such software should live or die on whether the community enjoys using it and is willing to (financially) support its continued existence, rather than how much attention can be siphoned into ads.

In other words, small niche communities where a few members don’t mind contributing financially rather than huge communities that rely on network effects and centralisation.


I've thought about this for a couple of days, and I want you to understand I'm coming from a place of kindness - I want you to succeed.

Ok, so. The problem you're trying to solve is "build a community that is prepared to contribute financially to the running of the site" (correct me if I'm wrong).

Distributed is one possible solution to this problem. You're in love with that solution and it's time to murder your darlings. Sit down, brainstorm five other possible solutions, and honestly assess which one solves your problem best.

I think you'll have a hard time beating a subscription model.


I worked in forums for 4 years. It is the worst, most unprofitable business you can get into. The best you can do is show low-quality ads or use crappy affiliate programs. The more ads used, the less the users like the forum. Forum users love being anonymous, so you provide no value to advertisers that want age, sex, location. Advertisers also HATE forums due to their ad being shown on user-generated content.

Drama: there are always personal attacks on moderators, posting of illegal material, threats, police involvement and constant human curation.

If you build centralized forum software, every forum owner works day and night to use something cheaper and get away from your control. Don't even bother offering host-your-own software (non-centralized). "OH I know React, I can build a vBulletin clone." No. No you cannot. vBulletin has been working on this software for 20 years and they offer it for $300.


One idea is to use some more obscure/techie protocol like Gemini [1] to create a self-selecting group of bloggers that are drawn to and choose to participate in the community, and at the same time keep spammers, commercial interests, and other unwanted influences out.

[1] https://gemini.circumlunar.space/ Earlier discussion at: https://news.ycombinator.com/item?id=23042424


I'm very interested in this space too. I want to start a project that explores different ways to communicate on the web. The current state of Facebook and Twitter are not the best way in my opinion.


You may be interested in my open source forum:

https://github.com/ferg1e/peaches-n-stink

It's basically an experimental communication platform. Right now I am building Internet forum style communication but I want to expand to other communication mechanisms later.


I tried this, got quite far. Will go a little further when I have spare cycles for it.

Example: https://www.lfgss.com based on code you'll find on GitHub under microcosm-cc


Reddit is building something like this https://www.reddit.com/community-points/


From your summary I'm not sure I understand what exactly your project sets out to do. People are still able to run their own independent blogs, after all. Are you thinking of blogging federation of some sort?


I think Discord is the modern form of this. It's really a great product.


Isn't Discord just chat (aka IRC)? Every time I try to get on Discord, it seems chaotic and confusing. Like chat, I guess it depends on who the users are at the moment you happen to use it. Forums are a much better approach to information sharing IMO.


Discord is chat in a sense. But community forums are also a form of chat. You shouldn’t confuse a technical implementation (e.g. Discord or IRC) with the end product (e.g. fostering community discussion).


Be sure to add cryptographic signatures on the postings and up/downvotes (also with signatures). Then many people can develop content ranking and blocking, and who knows, someone may get it right.
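A sketch of what signing a post could look like, using Ed25519 via the Python "cryptography" package (the payload shape here is just an assumption):

    # Sign a post so any client can verify authorship and build its own
    # ranking/blocking on top. Requires the "cryptography" package.
    import json
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    key = Ed25519PrivateKey.generate()
    post = json.dumps({"author": "alice", "body": "hello", "ts": 1630000000},
                      sort_keys=True).encode()    # canonical form before signing
    signature = key.sign(post)

    try:
        key.public_key().verify(signature, post)  # raises if tampered with
        print("post verified")
    except InvalidSignature:
        print("reject")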


Did you try Diaspora?


I have posted this here before: hexafarms.com. I am trying to use ML to discover the optimal phenotype for growing plants in vertical indoor farms, to (a) have the highest quality produce and (b) lower the cost of producing leafy green/med plants, etc. within cities themselves.

Basically, every leafy green (and herbs, and even mushrooms) can grow in a range of climatic conditions (phenotype, roughly), i.e. temperature, humidity, water, CO2 level, pH, light (spectrum, duration and intensity), etc. As you might have seen, around the world there is a rise in indoor vertical farms, but the truth is that 50% of those are not even profitable. My startup wants to discover the optimal parameters for each plant grown in our indoor vertical farm, and eventually I would let our AI system control everything (something like AlphaGo, but for growing plant X: lettuce, kale, chard, ...). Think of it as reinforcement learning with live plants! I am betting that our startup will discover the 'plant recipes' and figure out the optimal parameters for the produce that we would grow. Then, the goal is that cities can grow food more cheaply, in a more secure and sustainable way, than with our 'outsourced' approach in the countryside or far-away lands.

So now I have secured some funding to be able to start working on optimizations, but I realized that *hardware* startups are such a different kind of beast (I am a good software product dev though, I think). Honestly, if anyone with experience in hardware-related startups (or experience in the kind of venture I am in) would just want to meet me and advise me, I would take it any day. Being the star of the show, it's hard for me to handle market segmentation, tech dev, team, the next round of funding, the European tech landscape, etc. I am foreseeing so many ways that our decisions can kill my startup; all I need is advice from someone qualified/experienced enough. My email: david[at]hexafarms.com


Reminder to focus on nutritive content, flavour, and crop diversity, not just yield. The past 100 years of industrial scale agriculture, with the singular goal of maximizing yields, has done incredible harm. (This has come up on HN repeatedly, so I trust you've seen it, but it's worth championing)


> incredible harm

I agree that micronutrient content has decreased in the past century. Some might be because of scale, some might be that yield gains are mostly driven by macronutrients and water, not micronutrients, it could be selecting varieties that taste better, or it could be depleting the soil.

That said, the US has an obesity epidemic, so there's no shortage of macronutrients. Micronutrient shortages also seem rare: scurvy and rickets aren't exactly problems.


This isn’t an answer to your ML question, but it is an answer to your problem.

I heard about a greenhouse company that has programmed their climate control to match “best growing conditions historical weather”. So, they ask local experts what year / location had the best X and then they use that region’s historical weather and replay it in their greenhouse. I thought that was brilliant!

(Just realized this was Kimbal Musk that mentioned this)


When I studied farming back in 1998-1999 we once visited a greenhouse, and one interesting thing I picked up was that, through observation, some gardeners had realized that by lowering the temperature a bit extra an hour or two before sunrise, they could get their flowers to be more compact instead of stretching.

This had replaced shortening hormones in modern gardening (or at least at that greenhouse, but my understanding is that they were just doing the same thing as everyone else).

I guess there is a lot more to learn for those who have scale enough to experiment and patience to follow through.


Hmm,

Sounds similar to what I read a long time ago about a big tomato farm in the Netherlands... Have you tried talking to actual farmers of that produce? Universities? Agricultural faculties do a lot of research in that direction.

Expensive, quickly perishable produce might be able to compete, otherwise I guess free water and energy from above in the "remote" classical farming will be hard to beat.

And then my naive guess would be that generating enough data for an "ML" approach that is ML in more than just name might be somewhat expensive.

This sounds so negative, but this is not my intention... I wish you all the best and hopefully will stumble upon a success story in the future :-)


I know this isn't going to sound as sexy as AlphaGo for plants, but I really think this is a classic multilinear optimization problem once you've properly labeled the data and defined the dynamics between the plants / other organisms (e.g., aquaponics). You're looking to optimize multiple variables across a set of known constraints, and I think if you properly defined these constraints you could save a lot of headache / buildout by leveraging a pre-existing toolset like Excel with the Excel Solver add-in and a couple hundred user-defined functions. We're talking 1% of the work to get something usable and product-market-fitable with automatic output of graphs, etc, that clients could tune and play with locally without you needing to actually share the source sauce. Eventually you could switch to Python for something more dynamic / web based.
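For instance, something in this spirit, with invented coefficients just to show the shape of the problem:

    # Toy linear program: maximize predicted yield under a resource budget.
    # All numbers are made up; real coefficients come from your data.
    from scipy.optimize import linprog

    # x = [light_hours, water_liters, co2_units]
    yield_per_unit = [-0.8, -0.5, -0.3]      # negated because linprog minimizes
    A_ub = [[1.2, 0.9, 0.4]]                 # cost per unit of each input
    b_ub = [100.0]                           # daily budget
    bounds = [(0, 24), (0, 50), (0, 10)]     # physical limits per variable

    res = linprog(yield_per_unit, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    print(res.x, -res.fun)                   # optimal settings, predicted yield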


Yeah, the description made me think "simulated annealing" not "AI". I mean, even genetic algorithms might be overkill here.


I'm not able to help, but you don't have any contact details listed against your profile or in this post. How is anyone able to contact you?

At the very least what's a link to your startup's website?


Sorry I thought my email was on my HN profile. I am sitting behind david[at]hexafarms.com


If you’ve listed it in the email field, that’s accessible to HN admins, but not users.

If you want users to have it from your profile, put it in the “about” field.


It sounds like an interesting project, good luck and I hope someone reaches out!


There's some great research on using evolutionary computation to explore plant growing recipes (light strength, how long to leave the lights on, etc). In one experiment, researchers discovered that basil doesn't need to sleep - it grows best with 24 hours of light per day. Risto Miikkulainen shared the experiment on Lex Fridman's podcast: https://youtu.be/CY_LEa9xQtg?t=27m7s I believe this is the paper describing that experiment: https://journals.plos.org/plosone/article?id=10.1371/journal...


This sort of ML problem is characterized by relatively expensive data labeling. Hence, hiring an expert or a mixture of experts, and modeling the crop responses to their choices, will save you a lot of hill-climbing in the wrong part of the decision space.


That sounds awesome. I’d love to work in this field. Any tips on where you learn this stuff? Currently a software dev in crypto.


I think you'd be better off using a Gaussian process than reinforcement learning
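i.e., fit a surrogate model of yield over the growing parameters and pick the next trial where the model is promising but uncertain. A sketch with scikit-learn, on made-up data:

    # GP surrogate over growing parameters; next experiment chosen by
    # upper confidence bound. Data is invented for illustration.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    X = np.array([[18, 12], [22, 16], [26, 20]])  # [temp C, light hours] tried so far
    y = np.array([0.9, 1.4, 1.1])                 # measured yield (kg)

    gp = GaussianProcessRegressor(kernel=RBF(length_scale=5.0), normalize_y=True)
    gp.fit(X, y)

    grid = np.array([[t, h] for t in range(15, 31) for h in range(8, 25)])
    mu, std = gp.predict(grid, return_std=True)
    print(grid[np.argmax(mu + 1.5 * std)])        # explore/exploit trade-off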


You need to sequence the plants otherwise you will waste too much time on tuning hyperparameters.


I'm not sure if this is in the spirit of the thread but I've been working on a way to allow reviews of gameplay in video games. In short, you upload a video of you playing the game and someone who's an expert can review it.

I currently have a UI with the comments down the side of the screen which looks like this:

https://www.volt.school/videos/c980297a-417b-416f-947b-58a70...

This is good because you can easily:

- See all the comments

- Navigate between them

- See replies, etc.

However, it has a huge problem: you are constantly trying to balance watching the video with reading the comments.

I also have an alternative UI I've been working on which only shows one comment at a time:

https://www.volt.school/videos-v2/c980297a-417b-416f-947b-58...

However, the downside of this is that you can't see all the comments at once. I'm not a UI/UX designer AT ALL, so I'd really appreciate some pointers on how to think about making this better! The original post mentions "close to solving"; I think I am pretty close, but it's still not quite right, and while I'm not out of ideas yet, I'd appreciate feedback if the solution is obvious to someone else.


How about showing the comments directly on the video, at a specific time, and a specific place.

Something like Soundcloud comments but for video.

Asian video platforms used to do that. Here's an example: https://www.youtube.com/watch?v=hOMMQmYwd4I

It's totally crazy but can be made much more coherent. Would be useful to have comments at a specific time and place on the screen just for very accurate pointers/comments.


So maybe the problem with showing all the comments at once is that there are too many, and when showing one at a time they are not shown for long enough.

How about breaking the play into chapters/zones/rooms/segments (whatever makes sense for the game) then showing all the comments for that segment. Once the segment ends, there would be a replay button if they missed anything on the first play while reading comments, and a next segment button to carry on.

Interesting time spans could be marked for slow motion, boring bits played at double time.

There would be high level navigation between segments with thumbnails and comment counts. Buttons to skip between “pivotal moments”, maybe with voting to highlight them.


The default should be to show one comment at a time, because that's convenient and quick to get into, but also with an option (maybe just a scroll down) to view all comments. One, that helps the reviewer get an overall idea of what kind of things the submitter is looking for, if they want that, and two, some submitters are inevitably gonna screw up, posting at the wrong times or asking overall summary questions that should be asked at the end right at the beginning or somesuch. So an All Questions button or similar should be there as an escape hatch, but not the primary UI.


I find that my normal model for reading comments on videos across platforms is to not read them much of the time, but if it's a really interesting video go look at the comments and it's ok if the video is for example, fully minimized or off screen etc while I read them.

I don't know how normal my use is though or if that's at all helpful.


Don't have anything to add right now but I like the idea of this thread and would support it becoming a regular thing.


We are having atrocious read/write latency with our PG database (the API layer is Django REST Framework). The problem table consists of multiple JSON blob fields with quite a bit of data. I am convinced these need to be broken out into their own relational tables; I believe it is the deserialization of large nested JSON blobs in these fields that is causing the latency. Note: this database architecture was created by a contractor. There is no indexing and there are no relations in the current schema, just a single "Videos" table with all metadata stored as Postgres JSON-type blobs.

EDIT: rebuilding the schema from the ground up with 5-6 GB of data in the production database (not much, but still at the production level) is a hard sell, but I think it is necessary as we will be scaling enormously very soon. When I say rebuild, I mean a proper relational table layout with indexing, FKs, etc.

EDIT2: to further describe the current table architecture, we have 3-4 other tables with minimal fields (3-4 Boolean/Char fields) that are relationally linked back to the Videos table with a char field 'video_id', which is unique on the Videos table. Again, not a proper foreign key, so no indexing.


Are you just doing primary key lookups? If so, a new index won’t do much as Postgres already has you covered there.

If you have any foreign key columns, add indexes on them. And if you’re doing any joins, make sure the criteria have indexes.

Similarly, if you’re filtering on any of the nested JSON fields, index them directly.

This alone may be sufficient for your perf problems.

If it isn’t, then here’s some tips for the blobs.

The JSON blobs are likely already being stored in TOAST storage, so moving them to a new table might help (e.g. if you’re blindly selecting all the columns on the table) but won’t do much if you actually need to return the JSON with every query.

If you don’t need to index into the JSON, I’d consider storing them in a blob store (like S3). There are trade offs here, such as your API layer will need to read from multiple data sources, but you’ll get some nice scaling benefits here and your DB will just need to store a reference to the blob.

If your JSON blobs have a schema that you control, deprecate the blobs and break them out into explicit tables with explicit types and a proper normalized schema. Once you’ve got a properly normalized schema, you can opt-in to denormalization as needed (leveraging triggers to invalidate and update them, if needed), but I’m betting you won’t need to do any denorm’ing if you have the correct indexes here.

And since you have an API layer, ideally you’ve also already considered a caching layer in front of your DB calls, if you don’t have one yet.


This is super interesting stuff.

First of all, I think the caching layer (which we currently don’t have) is going to be a necessity in the coming weeks as we scale for an additional project (that will be relying on this architecture)

Second of all, it is just PK lookups. We don’t actually have a single fk (contractor did not set up any relations), which makes me think moving all of this replicated JSON data from fields to tables may help.

The queries that are currently causing issues are not filtering out any data but returning entire records. In ORM terms, it is Video.objects.all(), with a URL param in our GET to the API limiting the number of entries returned. What's interesting is that this latency scales linearly, and at the point we ask for ~50 records we hit the maximum raw memory allocation for PG (1 GB), causing the entire app to crash.
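(A band-aid I've been meaning to try is deferring the heavy columns so the blobs aren't deserialized on list queries; field names here are approximate:)

    # Skip the heavy JSON columns on list endpoints (field names approximate).
    videos = (Video.objects
              .defer("sensor_blob", "gps_blob")   # blobs load lazily, per object
              .order_by("id")[:50])               # and always paginate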

The solution you propose for s3 blob store is enormously fascinating. The one thing I’d mention is these JSON fields on the Video table have a defined schema that is replicated for each Video record (this is video/sensor metadata, including stuff like gps coords, temperature, and a lot more).

So retrieving a Video record will retrieve those JSON fields, but not just the values: the entire nested BLOB. And does so for each and every record if we are fetching >1

Would defining this schema with something like Marshmallow/JSON-Schema be a good idea when you mention JSON schemas we control? As well as explicitly migrating those JSON fields to their own tables, replaced with an FK on the Video table?


I do want to emphasize that the S3 approach has a lot of trade offs worth considering. There is something really nice about having all of your data in one place (transactions, backups, indexing, etc... all become trivial), and you lose that with the S3 approach. BUT in a lot of cases, splitting out blobs is fine. Just treat them as immutable, and write them to S3 first before committing your DB transaction to help ensure consistency.

Regarding JSON schema, if you have a Marshmallow schema or similar, yes that’s a wonderful starting point. This should map pretty closely to your DB schema (but may not be 1-to-1, as not every field in your DB will be needed in your API).

I’d suggest avoiding storing JSON at all in the DB unless you’re storing JSON that you don’t control.

For example, if the JSON you’re storing today has a nested object of GPS coords, temperature, etc.. make that an explicit table (or tables) as needed. The benefits are many: indexing the data becomes easier, the data is stored more efficiently, the table will take up less storage, the columns are validated for you, you can choose to return a subset of the data, etc… You will not regret it.
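Concretely, the explicit-table version might look like this in Django (a sketch only; I'm guessing names from your description):

    # Sketch of a normalized layout (names guessed from the description).
    from django.db import models

    class Video(models.Model):
        video_id = models.CharField(max_length=64, unique=True)  # real key, indexed

    class SensorReading(models.Model):
        video = models.ForeignKey(Video, on_delete=models.CASCADE,
                                  related_name="readings")       # FK columns get indexes
        recorded_at = models.DateTimeField(db_index=True)
        latitude = models.FloatField()
        longitude = models.FloatField()
        temperature_c = models.FloatField()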


Unrelated to post, but as you seem well informed in the field, would you agree that if a schema is not likely to change and is controlled as you put it, there is no reason to attempt to store that data as denormalized document?

Or at least, as you suggest, if required for performance, the data would still be stored denormalized / materialized as documents where needed?

At my current company, there seems to be a belief that everything should be moved to Mongo / Cosmos (as a document store) for performance reasons and moved away from SQL Server. But really I think the issue is that the code is using an in-house ORM that requires code generation for schema changes, and probably generates less-than-ideal queries.

But then I am also aware of the ease of horizontal scaling with the more NoSQL-oriented products, and I'm trying to be aware of my bias as someone who did not write the original code base.


> would you agree that if a schema is not likely to change and is controlled as you put it, there is no reason to attempt to store that data as denormalized document

As a general rule of thumb, yes. Starting with denormalization often opens you up to all sorts of data consistency issues and data anomalies.

I like how the first sentence of the Wikipedia page on denormalization frames it (https://en.wikipedia.org/wiki/Denormalization):

> Denormalization is a strategy used on a previously-normalized database to increase performance.

The nice thing about starting with a normalized schema and then materializing denormalized views from it is that you always have a reliable source of truth to fall back on (and you'll appreciate that, on a long enough timeline).

You also tend to get better data validation, reference consistency, type checking, and data compactness with a lot less effort. That is, it comes built into the DB rather than introducing some additional framework or serialization library into your application layer.

I guess it's worth noting that denormalized data and document-oriented data aren't strictly the same, but they tend to be used in similar contexts with similar patterns and trade-offs (you could, however, have normalized data stored as documents).

Typically I suggest you start by caching your API responses. Possibly breaking up one API response into multiple cache entries, along what would be document boundaries. Denormalized documents are, in a certain lens, basically cache entries with an infinite TTL... so it's good to just start by thinking of it as a cache. And if you give them a TTL, then at least when you get inconsistencies, or need to make a massive migration, you just have to wait a little bit and the data corrects itself for "free".

Also, there are really great horizontally scalable caching solutions out there and they have very simple interfaces.


Thanks for your response. The comparison between infinite ttl cache entries and a denormalized doc is an insight I can't say I've had before and makes intuitive sense


Doesn't Postgres have a way to index JSONB if needed?


You can index on fields in JSONB, but I don’t believe that’s what the op is solving for here.

In either scenario, I’d still generally encourage avoiding storing JSON(B) unless there isn’t a better alternative. There are a lot of maintenance, size, I/O, and validation disadvantages to using JSON in the DB.


IMO, the JSON datatype should be an intermediary step in a relational DB's structure, never the final one. Once you know and have stable columns, unravel the JSON into proper columns with indexing; it should improve the situation.

If you're having issues with 5 GB, you will face exponentially worse problems as it grows, due to the lack of indexing.


Cheers for the response (and affirmation). After some latency profiling I am convinced proper cols with indexing will vastly improve our situation since the queries themselves are very simple.


Depending on how much of the data in your json payload is required, extract data into their own table/cols. And store the full payload in a file system/cloud storage.


Also, there's a way to profile which queries take longest via the DB itself, and then just run EXPLAIN ANALYZE to figure out what's wrong.


You can take an incremental approach and build a proof of concept with the data that you have, so you can justify your move too!


I don't think the latency issues are necessarily related to the poor schema. I'd say to dig into the query planning for your current queries and figure out what's actually slow, since it may not be what you expect.

Rearchitecting the schema might be worth doing. From the technical side, PG is pretty nice about doing transactional schema changes. I'd be more worried about the data though. Are you sure that every single row's Json columns have the keys and value types that you expect? Usually in this type of database, some records will be weird in unexpected ways. You'll need to find and account for them all before you can migrate over to a stricter schema. And do any of them have extra unexpected data?
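A one-off scan can surface those weird rows before you commit to a stricter schema. A sketch, assuming Django and a single JSON field named "metadata" (both assumptions):

    # Count the distinct "shapes" of the JSON blobs to find anomalies
    # (assumes a Django model with a JSON field named "metadata").
    from collections import Counter

    expected = {"gps", "temperature", "recorded_at"}   # hypothetical keys
    shapes = Counter()
    for pk, blob in Video.objects.values_list("pk", "metadata").iterator():
        keys = set(blob or {})
        shapes[(frozenset(expected - keys), frozenset(keys - expected))] += 1
    for (missing, extra), n in shapes.most_common():
        print(n, "rows, missing:", set(missing), "extra:", set(extra))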


I had to migrate from MongoDB to a PG database at my new job (an old contractor created the MVP; I was hired to be the "CTO" of this new startup) and I had some problems at first, but after I created the related models and added indexes, everything worked fine.

As someone said, indexes are the best way to do lookups. Remember, your DB engine does lookups internally even if you are not aware of it (joins, for example), so add indexes to join fields.

Another thing that worked for me (and I don't know if it's your case) was to add trigram text indexes, which make full-text search faster. Remember, anyway, that adding an index makes searches faster but inserts slower, so be careful if you are inserting a lot of data.


Other tips:

- Change the field type from JSON to JSONB (better storage and the rest) https://www.postgresql.org/docs/13/datatype-json.html

- Learn the built-in JSON functions and see if one of them can replace one you made ad hoc

- Seriously, replace JSON with normal tables for the most common stuff. That alone will speed things up massively. Maybe keep the old JSON around just in case, but remove it when it becomes stale(?)

- Use views. Views let you abstract over your database and change the internals

- If a big thing is searching and that searching is kind of complex/flexible, add FTS with proper indexing to your JSON, then use it as a first filter layer:

    SELECT .. FROM table WHERE id IN (SELECT .. FROM search_table WHERE FTS_query) AND ...other filters
This speeds things up beautifully! (I get sub-second queries!)

- If your queries do heavy calculations and your query planner shows it, consider moving them into a trigger and writing the solved result into a table, then query that table instead. I need to do loan calculations that require sub-second answers, and this is how I solved it.

And for your query planner investigation, this handy tool is great:

https://tatiyants.com/pev/#/plans


Questions first:

1. What are the CRUD patterns for the "blobby data"?

2. What are the read patterns, and how much data needs to be read?

Until read/write patterns are properly understood, the following solutions should be considered as general guidelines only.

If staying in PG: JSON can be indexed in Postgres. You could also support a hybrid JSON/relational model, giving the best of both worlds.

Read:

Create views into the JSON schema that model your READ access patterns and expose them as IMMUTABLE relational entities. (Clearly they should be as lightweight as possible.)

Modify:

You can split the JSON blobs into their own skinny tables. This should keep your current semantics and facilitate faster targeted updating.

Big blobby resources such as video/audio should be managed as resources and not junk up your DB

Warning:

Abstracting the model into multiple tables may cause its own issues depending on how you ORM map your entities.

Outside the Box Thinking:

- Extract and transform the data for optimized reading.

- Move to MongoDB or a key-value store.

Conclusion:

What are the update patterns? Is only one field being updated? What are the inter-dependencies of the data being updated? How are "update anomalies" minimized?

You will need to create a migration strategy to a more optimal solution and would do well to start abstracting with views. As the data model is improved this will be a continuous process and the data model can be "optimized" without disturbing the core infrastructure requiring rewrites.


I had this issue at a previous job where we would query an API (AWS actually) and store the entire response payload. As we started out we would query into the JSONB fields using the JSON operators, however at some point we started to run into performance issues and ended up "lifting" the data we cared about to columns on the same record that stored the JSON.


Bit hard to tell without some idea of the structure of the data, but my experience has been storing blobs in the database is only a good idea if those objects are completely self contained i.e. entire files.

If you write a small program to check the integrity of your blobs, i.e. that the structure of the JSON didn't change over time, you may be able to infer a relational table schema that isolates those bits that really need to be blobs. Leaving it too long invites long-term compatibility issues if somebody changes the structure of your JSON objects.


I think your heart shouldn't quail at the thought of re-schemaing 5-6GB! I'm going to claim that the actual migration will be very quick.


This is an affirmation I've been longing to hear, lol!

I’ve already done the legwork, cloning to the current prod DB locally and playing around with migrations, but the fear of applying anything potentially-production breaking is scary to a dev who has never had to work on a “critical” production system!


I would recommend setting up a staging app with a copy of the production database, testing a migration script there, then running the same script on production once you're confident.


Large blobs are not the use case of relational databases - this is the starting point for any such discussion. I have 2 current projects where I am convincing the app builders (external companies, industry-wide used apps) to change this, keep relational data in the database and take out the blobs; so far it is going better than expected.


I don't know about PG, but with MariaDB, a nice way to find bottlenecks is to run SHOW FULL PROCESSLIST in a loop and log the output. So you see which queries are actually taking up the most time on the production server.

If you post those queries here, we can probably give tips on how to improve the situation.


Interesting. I believe I noted a similar function in the Postgres docs I was scouring through Friday. I’ll give it a look and see what I can find.

Tangentially related for those who have experience, I am using Django-silk for latency profiling.
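For what it's worth, the Postgres analogue seems to be the pg_stat_statements extension (it has to be enabled first). Something like this, if I'm reading the docs right:

    # Top queries by total execution time via pg_stat_statements.
    # Column is total_exec_time on PG 13+; older versions call it total_time.
    import psycopg2

    conn = psycopg2.connect("dbname=videos")   # connection details assumed
    with conn.cursor() as cur:
        cur.execute("""
            SELECT calls, total_exec_time, left(query, 80)
            FROM pg_stat_statements
            ORDER BY total_exec_time DESC
            LIMIT 10
        """)
        for row in cur.fetchall():
            print(row)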


Also, never trust ORMs. They make it easier to query, but they do not output the most optimized queries.


Examine the slow queries with the query planner, don’t spend a bunch of time re-architecting on a hunch until you know for sure why it’s slow!

An hour with the query planner can save you days or weeks of wasted work!


This may already be solved, but one of the last pieces remaining in my quest to be Google-free is an interoperable way to sync map bookmarks (and routes, etc) between different open source mapping apps. I can manually import/export kmz files from Organic Maps and OsmAnd, and store them in a directory synced between different devices with Nextcloud, but there's no automatic way to keep them updated in the apps, and so far I haven't found a great desktop app for managing them either. The holy grail would be to also have them sync in the background to my Garmin Fenix, but I am not aware of a way to sync POIs to a Garmin watch in the background.

Related: I'd love to have an Android app with a shortcut that allows me to quickly translate Google Maps links into coordinates, OSM links or other map links. There is a browser extension that does this on desktop, so if anyone is looking for a low hanging fruit idea for an Android app, this might be a fun idea (if I don't get around to it first).
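(The core parsing looks simple enough; a rough sketch that only handles the common "@lat,lng,zoom" URL form:)

    # Pull coordinates out of a Google Maps URL and build an OSM link.
    # Only handles the "@lat,lng,12z" form; share links need more work.
    import re

    def gmaps_to_osm(url):
        m = re.search(r"@(-?\d+\.\d+),(-?\d+\.\d+)(?:,(\d+(?:\.\d+)?)z)?", url)
        if not m:
            return None
        lat, lon = m.group(1), m.group(2)
        zoom = int(float(m.group(3) or 15))
        return f"https://www.openstreetmap.org/?mlat={lat}&mlon={lon}#map={zoom}/{lat}/{lon}"

    print(gmaps_to_osm("https://www.google.com/maps/@52.5200,13.4050,12z"))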


Have you documented your replacements for various Google technologies somewhere? I'm particularly interested in a good calendar.


I'm using Nextcloud to host my calendar. On my work Mac, I connect to it using Fantastical. On my personal Ubuntu machine I use GNOME Calendar, and on Android I use https://github.com/Etar-Group/Etar-Calendar

Everything is seamless for me, though admittedly I'm not a super heavy calendar user.

I plan to do a write up on my whole Google-free setup, but I haven't done it yet, unfortunately.


Thanks! Do you self-host Nextcloud or did you get an account at one of the providers?


I got a VPS and installed Nextcloud with Docker. I would self-host on my own server, but I'm too nomadic for that at the moment. I think the /e/ foundation has a decent managed Nextcloud setup.


I built a community that aims to keep FOSS projects alive. It's meant to solve the kitchen and egg problem by having as many people and projects sign up as possible, so that any developer who is interested can just automatically get commit permissions to any project.

It's called Code Shelter:

https://www.codeshelter.co/

It's stalled for a while, so I don't know how viable it is, but I'd appreciate any help.


One thing you could try to solve is coordinated revival of abandoned projects - i.e. extending your model to support unsolicited takeover of projects, in the case of a maintainer's having walked away.

For example, I use a javascript library that's best in class for what it does, and yet hasn't had any real commits from its maintainer since 2016. There are 50 pull requests open, some of which fix significant bugs, or add good new features. There are literally 2000 forks of the library, some of which are published on npm but are themselves unmaintained, and almost none of which link back to the actual fork's code from npm. It's a mess, and I bet it's a situation repeated hundreds of times over.

If you were to figure out a workflow by which a maintenance team could form on your platform, and then a) the existing maintainer is pinged to request that they add the team, falling back to b) making it easy for the new team to fork and adopt existing pull requests while supporting them through initial team-forming by laying out a workflow for assigning needed-roles, then I think you'd have a valuable platform.

The key thing is ensuring there's a large enough team to start, so that yet another fork doesn't die on the vine, so maybe think about a(n old) reddit link type interface where people can link to, vote on, and volunteer for projects, with no work needed until there's critical mass and the platform moves the project forward.


Hmm, that's an interesting idea, thanks. Given that finding one maintainer is already hard, though, I think finding a team would be almost impossible...


With the voting mechanism, you wouldn't necessarily need to form a team all at once. Maybe it takes 6 months for enough people to click the "I'd participate" button on a popular project. Granted, half of them might drop out when the project graduates...but if you can try to stake the ground of "the place to suggest and coordinate forks" then at least people who were interested might find it over time.


Oh hmm, I see how you mean, that's interesting... I'll think about that, thanks!


> Given the high level of trust users and project owners are putting in us, we need our maintainers to already have demonstrated their trustworthiness in the community. As such, we'd like to see any popular project you are an owner/maintainer of, as it would make it easier for us to accept you.

I certainly understand the rationale but doesn't this narrow down the universe of possible maintainers while putting even more load on existing maintainers by expecting them to take on more work?


> kitchen and egg problem

Hadn't heard that malapropism before.


Oof, must have been hungry when I wrote that.


Json diffing.

I haven't found any implementations I'd consider good. The problem as I see it is that there are tree based algorithms like https://webspace.science.uu.nl/~swier004/publications/2019-i... and array based algorithms like, well, text diffing, but nothing good that does both. The tree based approach struggles because there's no obvious way to choose the "good" tree encoding of an array.

I've currently settled on flattening into an array, containing things like String(...) or ArrayStart, and using an array based diffing algorithm on those, but it seems like one could do better.


At the risk of not being helpful: I have some JSON files that are updated weekly that I keep under source control in git. The week-to-week updates are often fairly simple, but git was showing some crazy diffs that I knew were way more complicated than the update. I soon realized that the data provider was not consistently sorting the JSON arrays; when I began sorting the JSON arrays by rowid every time before writing, the diffs were as straightforward as expected. I think I don't understand the problem you're encountering, because this solution seems too obvious.
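i.e., roughly:

    # Canonicalize before committing so git diffs track real changes.
    import json

    with open("data.json") as f:
        rows = json.load(f)
    rows.sort(key=lambda r: r["rowid"])   # the provider's ordering is unstable
    with open("data.json", "w") as f:
        json.dump(rows, f, indent=2, sort_keys=True)
        f.write("\n")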


The problem is in the syntax of JSON itself. Use JSONL instead


I want to improve parts of online professional networking, specifically to be more about self-mentoring/shared learning, as opposed to sales connections.

This is ever more important with the onset of remote hiring, remote work, and the isolation/depersonalization it brings to newcomers to the industry.

There's also an "evil" momentum in remote hiring -- some companies _need_ asynchronous interviews to support their scaling and operations, and the general perception is that it's impersonal and dehumanizing.

This made me think that if we preemptively answered interview questions publicly, then it'd empower the job seekers to have a better profile/fight back a dehumanizing step, while allowing non-job-seekers to share the lessons that were important to them.

I've been getting decent feedback on my attempt at the solution, HumblePage (https://humble.page), but the reality is that there's a mental hurdle to putting your honest thoughts out there.


This is a nice idea, to talk about and get thinking about these "soft" questions that people often struggle with.

One piece of feedback about the homepage: show a few examples of how people have answered questions, below the prompt. That's more helpful to get us thinking about our own answers, compared to a blank field. (Also, it's not clear what the percentages are meant to represent there. And I'm guessing the number next to the edit icon shows how many people have answered the question already? May need some UI tweaks on these.)


Your guess is close! The number is the total, and the green/blue ratio indicates how many people answered the prompt publicly vs privately.

My intention was to show the general comfort level of answering the prompt in public. Looking back, I wonder if I was being too quirky.

I’m thinking the same on needing UI tweaks, I’m planning for major rearrangements.

Thank you for your interest. Please feel free to reach out via the contacts if you’d like an invite.


Economically sustainable and ethical monetization of user-generated-content games.

The closest most known example of this kind of game nowadays is Roblox, but I'm thinking of things more like Mario Maker or the older-generation Atmosphir/GameGlobe-likes.

Unlike "modding platforms" or simulators/sandboxes/platforms such as Flight Simulator, VRChat or ARMA, these games' content are created by regular players with no technical skill, which means the game needs to provide both the raw assets from which to build the content, as well as the tool to build that content.

Previous titles tried the premium model (Mario Maker), user-created microtransactions (Roblox) and plain old freemium (Atmosphir and GameGlobe).

I suspect Mario Maker only works because of the immense weight and presence of the franchise.

Roblox's user-created microtransactions (in addition to first-party ones) seem to be working, but they generate strange incentives for creators, which I personally feel taints a lot of the games within it. (The user-generated content basically tends to become the equivalent of shovelware)

GameGlobe failed miserably by applying the microtransaction model to creator assets: to make cool content, creators had to pay on top of spending lots of their time actually building the thing, so most levels actually published ended up using the same default bunch of free assets and limited mechanics.

Atmosphir is a bit closer to me so I find some more nuance in its demise, but long story short, they essentially restricted microtransactions to player customization, which didn't seem to be enough to cover the cost of developing the whole game/toolset. They eventually added microtransactions to unlock some player mechanics, which meant that some levels were not functional without owning a specific item.

---

In short, the only things one can effectively monetize are the game itself (premium model) or purely-cosmetic content for players. Therefore, to incentivize the cosmetics, the game needs to be primarily multiplayer, which implies a lot more investment in the creator tooling UX, as well as the infrastructure itself. But this also restricts the possibilities for the base game somewhat.


My favorite Microtransaction systems are in Counter-Strike Global Offensive and Planetside 2.

Planetside 2 has very slight pay to win mechanics in the form of subscriptions for more xp, but it doesn't feel bad to play without pay.

Counter-Strike, on the other hand (and I think some other Valve games too), has just about the perfect model in my mind. There are no advantages you get by paying, only status. The skins that you can buy and sell look cool but are purely cosmetic.

Even with this in mind people spend quite a lot of money (we're talking hundreds of dollars for one gun skin in some cases). It always seemed like a great way to generate revenue ethically.

One thing I will note with that model is they still have the gambling mechanics with the "crates" that open random skins. You could probably crank it up one more ethical notch by getting rid of those or trying to make them less addictive.


That is indeed my current conclusion of the least-ethically-bad viable monetization. Purely cosmetic.

However, paid status is not really a big hook for single-player games; you need to be able to show it off! This means the game must be designed primarily around multiplayer interaction, which is fine but greatly limits the kinds of games you can implement with this monetization.


A different but similar in topic problem is running and playing tabletop roleplaying games like Dungeons and Dragons.

The solution is a general-purpose distributed computing platform designed for end-user development.

The three closest things that exist are Google Sheets, replit.com and dndbeyond.com. Replit is too low-level, dndbeyond is not powerful enough, and Sheets is stuck with the grid and too clunky for everything else.

Here's a few things the user should be able to do:

1. Design a tabletop roleplaying game system from scratch and automate all the math

2. Write content designed to be used with a system

3. Use systems and content designed by other people, without copy-pasting

4. Modify the system and the content designed by other people for your own purposes

5. Share access to the content in a granular way

Tabletop roleplaying games are unique: they thrive on content that must be created and shared quickly, but that includes simple yet fully general programming capabilities. Seems like a great place to start making programming as commonplace as literacy.


I'm surprised you didn't mention Core.

https://www.coregames.com/


Yeah the post was getting a bit long, there are a bunch of similar games to the ones listed, I just used one of each to exemplify each strategy.

Core is very similar to Roblox in that the creation tools are rather involved, it tends more to a platform with distinct creator/consumer roles.

There's also the PS4 game Dreams, as well as other integrated-modding initiatives like in Krunker.


It's a Roblox copy, and as such is already mentioned.


These are statistics/math problems that 2 medical professionals I'm seeing are working on, not my own work. But they got me curious. FWIW I worked in "data science" as a software engineer for many years, and did some machine learning, so I have some adjacent experience, but I could use help conceptualizing the problems.

Does anyone know of any books or surveys about statistics and medicine, or specifically mechanics of the human body?

- One therapist is taking measurements of, say, your arm motion and making inferences about the motion of other muscles. He does this very intuitively but wants to encode the knowledge in software.

- The other one has an oral appliance that has a few parameters that need to be adjusted for different geometries of the mouth and airway.

The problems aren't that well posed, which is why I'm looking for pointers to related materials rather than specific solutions (although any ideas are welcome). I appreciate replies here, and my e-mail is in my profile. I asked a former colleague with a Ph.D. in biostats and he didn't really know. Although I guess biostats is often specifically related to genetics? Or epidemiology?

I guess the other thing this makes me think of is software for Invisalign or Lasik, etc. In fact I remember a former co-worker 10 years ago had actually worked on Lasik control software. If anyone has pointers to knowledge in these areas I'm interested.


> One therapist is taking measurements of say your arm motion and making inferences about the motion of other muscles.

This seems like a sequential Bayesian filtering problem. Probably high enough dimension that you should just use a particle filter. The big seminal background text in this area is Bishop: Pattern Recognition and Machine Learning.

If the "motion of other muscles" is inferring pose, you could also look into what computer graphics calls inverse kinematics (a typical IK model has a number of dimensions that could fit into a particle filter). There's some more in-depth stuff in motion planning that actually takes into account muscle capability. But I wouldn't know where to find info on that, short of watching the last several years of Siggraph Technical Papers Trailers, grabbing all the motion planning ones, then reading everything they cite.


Thanks, I will follow these references.

I've heard of inverse kinematics but I think it's more focused on "modeling" than statistics/probability? That is, you would have to model each muscle?

I think he is doing something that is more "invariant" across human variation? (strength, body dimensions, age, etc.) I'm not sure which is why my question was vague, but this is helpful.


Yeah, IK is about pose and motion modeling. But you can put any state+motion model inside sequential Bayes, and get the probability that the model is in a particular configuration out.

Hard to know whether that's relevant without knowing what he's trying to predict though.


My research specialty is in orthopedic biomechanics. For the arm motion thing, it sounds like you might want inverse kinematics or inverse dynamics. Take a look at OpenSim: https://simtk.org/projects/opensim

For the oral appliance adjustment, I'm not sure what your output measures of interest are. If they're mechanical maybe you want to do a sensitivity analysis using FEA. Maybe look at FEBio: https://febio.org/

As for books or surveys, biomechanics is a huge topic so I'm not sure what to recommend without wasting your time. If you're still defining the problem, maybe run some searches on Pubmed with the "review" and "free full text" boxes checked, and browse the results until you find which sub-sub-topic is relevant to you?

https://pubmed.ncbi.nlm.nih.gov/?term=biomechanics&filter=si...

If no one on the team knows statics, dynamics, and (if you're considering internal strain and stress) continuum mechanics, consider finding a mechanical engineer to help.


Thank you for the references, I will follow these!

I think the basic idea is that when you're doing physical therapy that targets certain muscles, you have to find the muscle(s) that are limiting the motion! This is not obvious because they all interact.

Like if you have a back problem, you can try to exercise your back all you want, and that may not actually fix the problem. Because the real issue could be with your leg, which causes 16 hours a day of "bad" motion against your pelvis, which in turn messes up your back.

All the muscles in the body are interlinked and they often compensate for each other. When people have a problem in one area, they compensate in other ways.

So I have the same question as above: I think inverse kinematics is more about "modeling"? You would need to model every muscle, which is hard, and it is specific to a person?

I think his intuition is partly based on a mental model, but it's also probabilistic. I think the model has to capture the things that are "invariant" across humans (i.e. basic knowledge of anatomy), and the variation between humans is the probabilistic part. It's also based on variation in your personal health history / observed behavior, e.g. how you walk, how often you're sitting at a computer, etc.

So it does feel like an "inference" problem in that sense -- many factors/observations that result in multiple weighted guesses of the cause / effective therapies.


Inverse kinematics is about reconstructing body motion from position marker data, not really about modeling. For example, glue some tennis balls to a person's arms and legs, track their position from video of the person walking around, and use inverse kinematics to reconstruct their joint angles (their skeletal pose) across time. It's also possible to do this with marker-free methods.

Inverse dynamics takes the kinematics data from above and, in combination with ground reaction forces measured from a force plate (or instrumented footwear, etc.), calculates the forces and moments on each joint. Since control of the human musculoskeletal system is over-determined (the same motion, forces, and moments can be produced by multiple muscle activation patterns), EMG data or even ultrasound elastography is sometimes used to better constrain estimates of muscle activation patterns.

In your example the usual approach would be to use (elements of) the above methods to find out if a patient had unusual motion patterns, like the suspected abnormal leg motion in your back pain patient. Statistics comes into play once you have population data to classify as "good" or "bad", and when you're trying to determine if the hypothesized relationships between symptoms and particular motion / muscle activation patterns genuinely exist. Of course, it's fine to try different approaches (but don't forget to obtain IRB review and comply with the various regulations on human subjects research).


I can't help you conceptualize these specific problems, but having worked on similar problems in the past I'd advise you to look into ordinary differential equations applied to those systems. They're used a lot for modelling in medical science, and even if you're not interested in the dynamics they might lead you to the relevant literature for your problems and will address the parameters you're interested in and how they relate to each other.


It sounds vaguely related to "system identification"?


I am blocked on finding a good (defined below) way to determine whether a product description A and product description B refer to the same product.

Imagine that a product description is a n-dimensional vector like:

  ( manufacturerName, modelName, width, height, length, color, ...)
Now imagine you have a file with m such vectors (where m is in the millions), and that not all fields in the vectors are reliable info (typos, missing info, plain wrong, etc).

What is a good way to determine which product descriptions refer to the same product?

Is this even a good approach? What is state of the art? Are there simpler ways?

Here is what I mean by good:

  - robust to typos, missing info, wrong info
  - efficient since both m and n are large
  - updateable (e.g. if classification was done, and 10k new descriptions are added, how to efficiently update and avoid full recomputation)


I have worked on this problem many times, at many companies. I am working on it again, actually. Usually some combination of scoring and persisting results in CSVs for human review.

(edit: I am at a desktop now and I can say a bit more)

Here is the process in a nutshell:

1. Create a fast hashing algorithm to find rows that might be dups. It needs to be fast because you have lots of rows. This is where SimHash, MinHash, etc. come into play. I've had good luck using simhash(name) and persisting it. Unfortunately you need to measure the hamming distance between simhashes to calculate a similarity score. This can be slow depending on your approach.

2. Create a slower scoring algorithm that measures the similarity between two rows. Think about a weighted average of diffs, where you pick the weights based on your intuition about the fields. In your case you have handy discrete fields, so this won't be too hard. The hardest field is name. Start with something simple and improve it over time. Blank fields can be scored as 0.5, meaning "unknown". Hashing photos can help here too.

3. Use (1) to find things that might be dups, then score them with (2). Dump your potential dups to a CSV for human review. As another poster indicated, I've found human review to be essential. It's easy for a human to see that "Super Mario 2" and "Super Mario 3" are very different.

4. Parse your CSV to resolve the dups as you see fit.

Have fun!
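To make step 1 concrete, here's a rough sketch of a token-based SimHash; real implementations weight tokens and use n-grams, but the shape is the same:

    import hashlib

    def simhash(text, bits=64):
        # Each token votes +1/-1 per bit position; the sign of the tally sets the bit.
        v = [0] * bits
        for token in text.lower().split():
            h = int(hashlib.md5(token.encode()).hexdigest(), 16)
            for i in range(bits):
                v[i] += 1 if (h >> i) & 1 else -1
        return sum(1 << i for i in range(bits) if v[i] > 0)

    def hamming(a, b):
        return bin(a ^ b).count("1")  # same trick as the sibling comment

    # Rows whose simhash(name) differ by only a few bits are candidate dups for step 2.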


With regards to 1, I wonder: why would calculating the Hamming distance be slow? In python you can easily do it like this:

    hamming_dist = bin(a^b).count("1")
It relies on string operations, but takes ~1 microsecond on an old i5 7200U to compare 32-bit numbers. In Python 3.10 we'll get int.bit_count() to get the same result without having to do this kind of thing (and a ~6x speedup on the operation, but I suspect the XOR and integer handling of Python might already be a large part of the running time for this calculation).

If you need to go faster, you can basically pull hamming distance with just two assembly instructions: XOR and POPCNT. I haven't gone so low level for a long time, but you should be able to get into the nanosecond speed range using those.


What's your cost matrix? How much does a false positive hurt? False negative?

I built a commercial system like that for Thermo Fisher, except their descriptions were encoded as natural language text on input, not vectors (for an extra complication).

Some observations:

1. Crude methods based on vector embeddings, cosine similarity, Levenshtein, etc. don't work if you care at all about false positives.

I see sibling comments recommend this, but it's clear this cannot work if you think about it. Values like "black" and "white", or "I" and "II" (part numbers), "with" and "without", are typically close together in such crude representations, but may lead to products that are not interchangeable.

2. A hybrid approach worked. The software produced suggestions for which products might be duplicates (along with a soft confidence score), then let a human domain expert accept / reject these suggestions. It also learned from these expert decisions as it went, to save human time.

What I quickly learned is that even as a human (programmer with a PhD in ML), I could not look at two product descriptions and make the decision myself. Are these the same product or not? One word, even one letter, could be absolutely vital. Or absolutely irrelevant. Sometimes even the same attribute / word, depending on the product category.

Hence the final interactive solution with a domain expert in the middle. It worked well and saved time; rather clever, but not in the "hooray NN training" way. A lot of work went into normalizing the surface features intelligently based on context: units, hyphens / tokenization, typos…, because that's a mess in product sheets. The "fancy" downstream ML and clustering part was relatively simple by comparison.

But YMMV, the Thermo Fisher products were fairly specialized and sophisticated (in their millions).


Usually, I do this sort of thing somewhat manually, building up an algorithm (mostly classical, with a little ML as a treat) that can deal with the problem.

I'd start by detecting common typos. Typos are similar to un-typo'd data, so I'd do a frequency analysis on the textual representations of manufacturer name and model name, and a Levenshtein distance calculation, then synonymise the obvious synonyms (looking things up when I wasn't sure). The key idea is that you have access to more information than just this dataset: Tony and Tomy are different manufacturers, but Sony and Somy aren't (even though somy is in the dictionary and tomy isn't).

Once the manufacturer and model fields are mostly typo-free (after typo replacement – don't modify the original file, if you can help it!), you can start looking at dimensions and colour. Sort by manufacturer, and start de-duping entries. Once you get a feel for the process you're doing (e.g. under what circumstances do you check whether there's a 102mm Phillips screwthread?), you can start automating bits of it. There will always be special-cases, but your job is to get the data processed correctly, not to get the computer to process the data.

Accidentally aliasing two different products is much worse than leaving the same product described twice, so err on the side of “these are different”. (Keep in mind that manufacturers of some things, e.g. SD cards, often pretend two different products are the same – so you can't always win!) Remember, humans exist: bothering them a few million times is a problem, but bothering them a few hundred would be okay.

When new data comes in, I'd run all the code I used to come up with my system, and see if the output was notably different. If it was, I'd get the computer to let me know.

I'd also add some way for users to flag duplicates. Many humans make light work.


You could generate word embeddings for all natural language text fields and then do cosine similarity?


Use a Minhash-LSH ensemble with pre-processing on the words to fix typos via Levenshtein. Tune parameters to get the best distance


Definitely some clustering method based on similarity of the vectors (there are many, pick a simple one to start)


I want to make technical recruiters better at their job.

Many sourcers and recruiters don't have a technical background and find it very difficult to hire software engineers, especially in the current labor market which is very tight.

I'm starting off simple: writing recruiting guides from a software engineer's perspective that are easy to understand.

Are there other ways we can make technical recruiters better?


> Are there other ways we can make technical recruiters better?

- list salary range for positions

- emphasize tech stack

- emphasize number of rounds

I’ve wasted time in the past going through the interview process only to find out, after getting the offer, that the company’s budget for the position was only up to X; a valuable lesson I’ve learned to avoid since then, of course.

I also see some recruiters only talking about what the business does … leaving out the tech stack.

If these points are clear and easily visible to recruiting leads they might get higher quality candidates.

Just my two cents. ¯\_(ツ)_/¯


I’ve also been thinking about this problem space. My approach is to help candidates build skills, demonstrate proficiency, in a loop with the recruiters.

Basically, take the whole “how I learned my data science skills” into something that can be done in public.

The recruiters can then see a wide range of examples, and can be better at picking up where people’s strengths and talents are.

(This is focused on analytics)


Frustrated by the degree of manual programming in production metal machining; the industry runs largely on inertia. I would like to resolve this by applying standard optimization algorithms to a set of known machining strategies plus machine, work-holding, material, part, and tool inputs.

I have already analyzed the problem space to some extent and will be touring a huge production facility next week to better understand best-in-class processes from large established players. I need someone to either wrap existing simulation algorithms (any CAM system) or write enough of one to make it feasible; that's not that hard, since the solution space is extremely multivariate but well understood and well documented, and it's not too hard for 2.5D machining.

You can get as intellectual as you like in the solution, but remember perfect is the enemy of done. The value is huge; happy to split equity on a new entity if a workable solution for the easier subset of parts emerges in the next few weeks.


We run about 40-50 CNC machines. A lot of our engineering time goes into planning how to machine a component step by step so that it reaches the specified tolerances. Sometimes the required tolerances are at or below the machine's accuracy. Are you going to solve this as well?


I am looking at the low-hanging fruit, the 80/20, right now.


How to make PNG encoding much faster? I'm working with large medical images, and after a bit of work we can do all the needed processing in under a second (numpy/scipy methods). But then the encoding to PNG is taking 9-15 secs. As a result we have to pre-render all possible configurations and put them on S3 b/c we can't do the processing on demand in a web request.

Is there a way to use multiple threads or a GPU to encode PNGs? I haven't been able to find anything. The images are 3500x3500px and compress from roughly 50MB to 15MB with maximum compression (so don't say to use lower compression).


I've spent some time on this problem -- classic space vs. time tradeoff. Usually if you're spending a lot of time on PNG encoding, you're spending it compressing the image content. PNG compression uses the DEFLATE format, and many software stacks leverage zlib here. It sounds like you're not simply looking to adjust the compression level (space vs. time balance), so we'll skip that.

Now zlib specifically is focused on correctness and stability, to the point of ignoring some fairly obvious opportunities to improve performance. This has led to frustration, and this frustration has led to performance-focused zlib forks. The guys at AWS published a performance-focused survey [1] of the zlib fork landscape fairly recently. If your stack uses zlib, you may be able to find a way to swap in a different (faster) fork. If your stack does not use zlib, you may at least be able to find a few ideas for next steps.

[1] https://aws.amazon.com/blogs/opensource/improving-zlib-cloud...


I have no experience in PNG encoding, but found https://github.com/brion/mtpng The author mentions "It takes about 1.25s to save a 7680×2160 desktop screenshot PNG on this machine; 0.75s on my faster laptop." which makes me think your slower performance on smaller images comes either from using the max compression setting or from hardware with worse single-threaded performance.

Although these don't directly solve the PNG encoding performance problem, maybe some of these ideas could help?

* if users will be using the app in an environment with plenty of bandwidth and you don't mind paying for server bandwidth, could you serve up PNGs with less compression? Max compression takes 15s and saves 35MB. If the users have 50Mbit internet, then it only takes 5.6s to transmit the extra 35MB, so you could come out ~10s ahead by not compressing. (yes, I see your comment about "don't say to use lower compression", but no reason to be killed by compression CPU cost if the bandwidth is available).

* initially show the user a lossy image (could be a downsized PNG) that can be quickly generated. You could then upgrade to full quality once you finish encoding the PNG, or, if server bandwidth/CPU usage is an issue, only upgrade if the user clicks a "high-quality" button or something. If server CPU usage is an issue, the low-then-high-quality approach could let you turn down the compression setting and save some CPU at the cost of bandwidth and user latency.


Are you required to use PNG or could you save the files in an alternative lossless format like TIFF [1]? If you're stuck with PNG, mtpng [2] mentioned earlier seems to be significantly faster with multithreading (>40% reduction in encoding times). If you're publishing for web, TIFF or cwebp might also be possibilities with -mt (multithreading) and -q 25 (lower compression and larger filesize but faster) flags, or an experimental GPU implementation [3].

[1] https://blender.stackexchange.com/questions/148231/what-imag...

[2] https://github.com/brion/mtpng

[3] https://emmaliu.info/15418-Final-Project/


GPGPU is the way to go.

Not terribly hard if you only need 1-2 formats supported, e.g. RGBA8 only. You don't need to port the complete codec, only some initial portion of the pipeline, then stream data back from the GPU; the last steps with lossless compression of the stream ain't a good fit for GPUs.

If you want the code to run on a web server, after you debug the encoder your next problem is where to deploy. NVidia Teslas are frickin expensive. If you wanna run on public clouds, I'd consider their VMs with AMD GPUs.


Thanks, I hadn't heard of that and I will look into it. This is a research setting with plenty of hardware we can request and not a huge number of users so that part doesn't worry me.


> This is a research setting with plenty of hardware we can request and not a huge number of users

If you don’t care about cost of ownership, use CUDA. It only runs on nVidia GPUs, but the API is nice. I like it better than vendor-agnostic equivalents like DirectCompute, OpenCL, or Vulkan Compute.


I solved a similar problem last year. As others have said, your bottleneck is the compression scheme that PNG uses. Turning down the level of compression will help. If you can build a custom intermediate format, you'll see huge gains.

Here's what that custom format might look like.

(I'm guessing these images are gray scale, so the "raw" format is uint16 or uint32)

First, take the raw data and delta encode it. This is similar to PNG's concept of "filters" -- little processors that massage the data a bit to make it more compressible. Then, since most of the compression algorithms operate on unsigned ints, you'll need to apply zigzag encoding (this is superior to allowing integer underflow, as benchmarks will show).
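As a sketch, delta plus zigzag looks like this in numpy, assuming 1-D flattened grayscale pixel data (uint16, per the guess above):

    import numpy as np

    def delta_zigzag_encode(raw):
        # Delta encode (store differences), then zigzag-map signed -> unsigned.
        d = np.diff(raw.astype(np.int64), prepend=np.int64(0))
        return ((d << 1) ^ (d >> 63)).astype(np.uint64)

    def delta_zigzag_decode(enc, dtype=np.uint16):
        e = enc.astype(np.int64)
        d = (e >> 1) ^ (-(e & 1))          # undo zigzag
        return np.cumsum(d).astype(dtype)  # undo delta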

Then, take a look at some of the dedicated integer compression algorithms. Examples: FastPFor (or TurboPFor), BP32, snappy, simple8b, and good ol' run length encoding. These are blazing fast compared to gzip.

In my use case, I didn't care how slow compression was, so I wrote an adaptive compressor that would try all compression profiles and select the smallest one.

Of course, benchmark everything.


> Is there a way to use multiple threads or GPU

Maybe you could write the png without compression, compress chunks of the image in parallel using 7z, then reconstitute and decompress on the client side.
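A rough sketch of that idea, with stdlib zlib standing in for 7z (the client would decompress the chunks and concatenate them back into the uncompressed PNG):

    import zlib
    from io import BytesIO
    from multiprocessing import Pool
    from PIL import Image

    def compress_chunk(chunk):
        return zlib.compress(chunk, level=9)

    def parallel_compress(img, chunk_size=4 << 20, workers=8):
        # Write a fast, uncompressed PNG, then compress fixed-size chunks in parallel.
        buf = BytesIO()
        img.save(buf, format="PNG", compress_level=0)
        raw = buf.getvalue()
        chunks = [raw[i:i + chunk_size] for i in range(0, len(raw), chunk_size)]
        with Pool(workers) as pool:
            return pool.map(compress_chunk, chunks)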


This is on our list of possibilities. It would take a little more time than I'd like to spend on this problem but it would work.


I would also be interested in knowing the answer to this. Currently we use OpenSeadragon to generate a map tiling of whole slide images (~4 GB per image), then stitch together and crop tiles of a particular zoom layer to produce PNGs of the desired resolution.


I'm unsure if this will help, but the new image format JPEG XL (.jxl) is coming soon to replace JPEG. It has both lossless and lossy modes. It claims to be faster than JPEG.

Another neat feature is that it's designed to be progressive, so you could host a single 10mb original file, and the client can download just the first 1mb (up to the quality they are comfortable with).

Take a look: https://jpegxl.info/


This is a research university that moves very slow, so waiting two years for something better is actually a possibility (and prerendering to S3 works ok for now). I'll keep this bookmarked.


Since this is Python, which encoder are you using? I'd make sure it's in C, not Python. You might also be spending a lot of time converting numpy arrays to Python arrays.


Also check FPGA cards (ask Xilinx, Altera/Intel, ...)


I'm trying to find an agile project management tool that works for us. We run on what many would call Scrum (it’s not actually Scrum).

We are on JIRA now, and it’s … JIRA. We tried basically any other tool, including Excel (yes, that is somewhat possible).

My problem generally is that tools are slow, planning is cumbersome, visibility is limited and reporting for clients is often even more limited.

Heck, I’d even write my own tool if I knew it would help others, but I am concerned it’s too close to what we already have for anyone to actually migrate.

You could help me by sharing your thoughts!


I've recently started using ClickUp for managing my helpdesk and development work and I like it a lot. I don't do scrum myself but the product claims to be useful for that kind of work, as well as many other approaches and use cases.

https://clickup.com https://clickup.com/on-demand-demo

ClickUp for Agile Workflows https://www.youtube.com/watch?v=H9hZRwivnL8


Depends on workflow and team size. For small teams, a good fit could be kanban-based tools, for example Trello or GitHub Projects.

You could also try modern agile tools, for example Linear. JIRA is good for 100+ person teams and complex architectures.


We use Restyaboard for Agile marketing; you can manage all your projects, teams, and clients from one single space. https://restya.com/board/demo


Is linear.app in the realm of what you're looking for? Have you tried that?

Not affiliated, but I've had a positive experience with it in a small team. I would describe it as an IDE for issues.


Asana works really well for us. Really good UI/UX and fast, which I think is the best feature they have :)


have you tried https://tara.ai/?


Try Asana


I'm working on a different type of compression (for all file types). I am able to get in the 10-20% range, but many times the speed to compress is too slow, or at other times the compression doesn't complete (I've been working on this for years). My personal website: http://danclark.org

I'm also working on a conversational search engine (using NLP) at http://supersmart.ai


Have you looked into Middle Out compression?


Funny, I've actually had a lot of fun working on this compression software. It's a weird mix of needing it to be fast and hitting a compression threshold of it being useful. One of the best projects I've embarked on


We are experiencing very high CPU load caused by tinc [0], which we use to ensure all communication between cloud VMs is encrypted. This is primarily affecting the highest traffic VMs, including the one hosting the master DB.

I am starting to consider alternative tools such as WireGuard to reduce load, but I am concerned about adding too much complexity. Tinc's mesh network makes setup and maintenance easy. The WireGuard ecosystem seems to be growing very quickly, and it's possible to find tools that aim to simplify its deployment, but it's hard to see which of these tools are here to stay and which will be replaced in a few months.

What is the best practice, in 2021, to ensure all communication between cloud VMs (even in a private network) is encrypted?

[0] https://www.tinc-vpn.org/


Apart from some smaller projects building on top of WireGuard, there's Tailscale [1]. One of the founders is Brad Fitzpatrick who worked on the Go team at Google before and built memcached and perkeep in the past.

Outside of the WireGuard ecosystem there's ZeroTier [2] which has been around for a while and they're working on a new version; and Nebula [3] from Slack, which is likely to be maintained as long as Slack uses it.

There might be others, but with tinc these four are the ones I've seen referred to most often.

[1] https://tailscale.com

[2] https://www.zerotier.com

[3] https://github.com/slackhq/nebula


+1 for Tailscale, the product is great. I've used it in a very limited scale but can vouch for quality and performance. No CPU issues at all (even on rPi).


Similar to Tailscale is the Innernet project, which has similar goals but is fully open source (also built on Wireguard). I've heard that set-up is a bit more painful, but for those who are interested in FOSS or self-hosting, it might be worth looking into.

[1] https://github.com/tonarino/innernet


NoCode: fly.io with its 6pn (out-of-the-box private networking among clusters in the same org).

DIY: envoyproxy.io / HashiCorp Consul for app-space private networking over public interfaces.

LowCode: Mesh P2P VPN network among your clusters with FOSS/SaaS like WireTrustee / tailscale.io / Slack Nebula.


What kind of loads are we talking about here? How many requests per second? Or is each request response large?

Have you noticed whether it is worse for lots of small requests vs large data transfers?

I use a very similar setup, but haven't seen tinc CPU usage matter yet, though for very low traffic.


There is a juxtaposition in the UK job market. We have millions of people working in low-paid precarious jobs in retail, food service, warehousing etc. while simultaneously companies complain that they cannot recruit into highly-paid, skilled roles due to a lack of candidates.

Given that you can study Introduction to Computer Science from Harvard University, online, for free and in your own time, it seems like the barriers to building skills are lower than ever.

However, many people are put off or intimidated by the idea of studying such a course. My solution to this is some kind of mentoring, either 1-to-1 or more likely in small groups. That said, this is very resource-intensive, which makes the idea hard to scale. I'd be very interested to hear how others might approach this, both the mentoring and the underlying encouragement to study.


How to find motivation/energy to do a long-term creative project when you have a full-time job + other responsibilities?


The brain hates to start things and loves to finish things. This can be hacked in that a working session should always leave something unfinished.

Say you're writing a novel. Every writing session (but the first, obviously) should cover the end of the last scene and the beginning of a new one, AND THEN STOP, i.e. not finish the new scene.

Your brain will want to come back to the work to finish it, which overcomes the friction of "starting" something new every time.

It's easier said than done. It's surprisingly difficult to leave something unfinished at the end of each work session. But that's the trick.


The brain you describe is not the brain I possess.

Starting is easy.


"starting" is ambiguous. I really meant "getting to work".

Thinking about new things and maybe throwing down a few ideas is indeed easy and pleasant.

But deciding to spend a few hours to move a project (instead of not) is what the brain hates. It hates commitment, and is very afraid of the opportunity cost.


Nah, I'm with parent commenter on this. When I'm excited about something I have no problem diving into it for hours on end. But when I know that something is 90% done and it just needs to be tidied up, I will do anything other than working on it. Either everything from solving the hardest problem to being completely done happens in one sitting, or it never gets finished.


I've adopted this sort of trick: ending my programming sessions with a failing unit test. It works quite well (when I remember to do it).


Wake up at 0430, exercise, take a shower and then work till you need to get ready for your full-time job. If possible, also dedicate half of your lunch hour to your project, together with half a weekend day.

It's quiet so early in the morning, so your productivity will skyrocket. I've coded my Paras this way while working a full-time, heavy blue-collar job.


Does it not cause adverse impact on the day job?


Quite the opposite in my experience


How close are you to solving that ? And please would you share your progress :)


My current strat is to channel my scant motivation into maximizing my sleep and well-being, expecting to squeeze out more motivation from that.


I could use some help with some heuristics for machine learning, like how much data I need to make a workable model, and what framework/approach makes more sense given my ultimate goals.

Here's an example: there's a lot of ML tutorials on doing image identification. Like you have a series of images: picture one might have an apple and a pear in it, picture 2 might have an apple, orange, and a banana in it.

Where I'm struggling is putting this into my domain. I have 100k images and from that around 1k distinct labels (individual images can have between 1 and 7 of these different labels), with between 100 and 13,000 example images per label.

Is that enough data? Should I just start working on gathering more? Is this a bad fit for a ML solution?


Hey, I've been an ML practitioner for over 6 years and I'm glad to help.

1k distinct labels with a long-tail distribution, which is what you are describing, is definitely a challenging problem. It's called an imbalanced classification problem.

I'd first test how well your model can predict these classes by doing stratified cross-validation (stratification controlled by the label class), and measure the F-score, weighted accuracy, and ROC AUC. Check also the precision and recall for each class. You'll definitely see that the model predicts better for the labels with more samples. You'll be able to reuse the code you write here later on, so keep it organized and easy to follow.
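As a minimal sketch with scikit-learn, assuming X (features), y (labels) and a model already exist; this stratifies on a single label, and the multi-label case would need iterative stratification (e.g. scikit-multilearn):

    from sklearn.model_selection import StratifiedKFold
    from sklearn.metrics import classification_report, f1_score, balanced_accuracy_score

    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    for train_idx, test_idx in skf.split(X, y):
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        print("macro F1:", f1_score(y[test_idx], pred, average="macro"))
        print("balanced accuracy:", balanced_accuracy_score(y[test_idx], pred))
        print(classification_report(y[test_idx], pred))  # per-class precision/recall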

Then you have a couple options, focus on gathering more examples for the labels with small sample size, or try to oversample your dataset. This article is a good place to start https://towardsdatascience.com/4-ways-to-improve-class-imbal...

Considering this problem of image classification is normally solved with deep learning, the more data you have, the better your results will be.


This was very helpful, thank you so much!


Computer Vision is the one domain where ML (and neural networks in particular) is the undisputed king. Unless you're in an embedded application where you can't use neural networks, in which case you might want to go with handcrafted features to trade off accuracy for speed/compute-efficiency.

With regards to the size of your dataset, there are no hard rules; it depends on the complexity of the task. Problems with a high number of classes are among the more difficult ones; 100 samples per class might be enough, or it might not. The only way to know for sure is to try and see if you reach a performance that's acceptable for your application.

I recommend the PyTorch framework: it's coherent, easy to use and well documented (both the API reference and the examples available on SO and throughout the net). Your problem is similar to ImageNet (assuming you want to detect the presence of a class and not its position, which would be a different problem), so you can try to run one of the PyTorch tutorials and see how well it does. The only difference is you want to detect multiple classes in one single image, so you'll potentially have to adjust your output layer and loss, but the network itself could remain the same. You also might want to look into doing transfer learning with a pretrained ImageNet network to speed up the training.
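A minimal sketch of that setup, adapting a pretrained ResNet for multi-label output (NUM_LABELS and the training-loop names are placeholders for your own):

    import torch
    import torch.nn as nn
    from torchvision import models

    NUM_LABELS = 1000  # assumption: your ~1k distinct labels

    model = models.resnet50(pretrained=True)                  # transfer learning start
    model.fc = nn.Linear(model.fc.in_features, NUM_LABELS)    # new multi-label head

    criterion = nn.BCEWithLogitsLoss()  # one independent sigmoid per label
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    # In the training loop, `targets` is a float tensor of shape (batch, NUM_LABELS)
    # with 1s for the labels present in each image:
    #   loss = criterion(model(images), targets)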


Define workable model. Do you care more about recall (how many of the images with label X will be labeled by your model with label X) or precision (how many of the images where the model says have label X are actually with label X)?

It is a good fit for ML, but you need to be clear on what the results will be. If you expect 100% accuracy, that won't happen. Even 90%+ accuracy would require a lot of effort.


If you have not at least skimmed through fast.ai, you should definitely do that, as the course itself addresses some of this and the people on their forums are among the most helpful I have ever seen!

Second, this could be more than enough! Especially if you are doing transfer learning.

Third, you can "inflate" the number of images you have now with "Image Data Augmentation"
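A sketch of what augmentation can look like with torchvision (one possible pipeline; fast.ai has its own augmentation API, and which ops are safe depends on your image domain):

    from torchvision import transforms

    # Each transform produces a randomized variant at load time,
    # effectively multiplying your training set.
    train_tfms = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ColorJitter(brightness=0.2, contrast=0.2),
        transforms.ToTensor(),
    ])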


The answer is, "it depends". My recommendation would be to just start trying. What you describe sounds like you could train it in a couple of days. Grab an off-the-shelf resnet and just see what happens. This is a well-studied problem, you can just look at papers that train on imagenet, and then tweak their approaches.


Sounds like a problem which YOLO can solve pretty easily, i.e. object detection and classification, transfer learning, etc. Try downloading a pre-trained model and playing around with it. (The other replies have outlined what you need to focus on.)


How do we scale social accountability and knowing?

This is expected to enable us to solve distributed coordination problems. It should also facilitate richer, more meaningful relationships between people.

Expected outcomes include increased thriving and economic productivity.

[edit: consider the limit on how many people you can know and the relationship between how deeply you come into relationship with that population and the size of that number]


I have spent quite a lot of time thinking about coordination in general. Indeed, knowledge is a vital part of it. The problem that I see is that knowledge is too vague and lossy and changing and incomplete [as I mentioned in this comment https://news.ycombinator.com/item?id=26203718].

A hypothetical solution would be a system that spoke a language similar to plain English, but that was deterministic. You let people write their problems and views to the system, and the system determines what the widest available consensus is within a given scope and which problems people perceive as highest priority. This has a lot of problems, but it's a good way to think about the topic. Even with such a system, would you really be solving the problems you want to solve?

If it does, then this is basically symbolic AI. You can try to relax requirements... but you kinda need an "automatic coordinator". If you go with a manual coordinator instead, then I doubt you will be able to scale anything that's not extremely rigid and hierarchical, at which point you are re-introducing many of the same problems you were trying to fight in the first place.


A combination of "all categories are fuzzy" and "all models are wrong but some are useful"? I too doubt the effectiveness of a symbolic AI approach. Although I studied that and other approaches in the field, you may note that my background is in biologically plausible methods for pursuing artificial intelligence.

I think the direct human input method is given too much focus, although it and related interactions have their place. Fallible sensors directly reporting readings from reality already have enough noise-related issues. I suspect more richly informing people will yield better results.

I am inspired by stories such as the fish farm pollution problem [0]. Consider how a reality based game theoretic analysis of agent choices might guide your selection of future work mates (or lakes) and facilitate a different friction in finding your next contribution to the world.

[0] search "3. the fish" on https://www.lesswrong.com/posts/TxcRbCYHaeL59aY7E/meditation...


I find your comment quite confusing.

>> A combination of "all categories are fuzzy" and "all models are wrong but some are useful"?

Are you talking about my first paragraph or symbolic AI?

>> The fallible sensors directly reporting readings from reality already has sufficient noise related issues.

I assume here you are trying to say that human input is not reliable.

I don't understand what your approach with AI is here. You seem to want to use it to better inform people? How? You are going to say that human input is not reliable, but then train an AI that can't explain itself and expect people to take its advice? Either noise can be palliated at scale in both places or in neither.

Finally, I'm very familiar with meditations on moloch. But you seem to be betting on an "education-based" solution, which doesn't fit very well with the scenario that meditations on moloch exposes, which is not that some people couldn't make better choices (for society, the collective), but rather that the "questionable" choices of a few can deeply compromise the game for everyone else. I mean, we all probably agree that it would be great to educate people on these concepts, but I doubt that will be enough to stop the dynamics that cause it.


I apologize for the unintended confusion. I don't find all expression safe in this context, so I've held some of it back, as well as limited the amount of work I could put into describing what amounts to a ~36-year life obsession for me.

> Are you talking about my first paragraph or symbolic AI?

In the link you provided and the second paragraph of your first reply you seem, to my reading, to suggest using a system to facilitate discovering agreement on specific actions, knowledge, and tactical choices. Stated differently, agreement within groups, perhaps large groups. You discussed in both comments the challenge of being specific and static, which is, in my opinion, the downfall of many symbolic systems: the presumption that our ability to discretely describe reality is sufficient. To me, the "fuzzy categories" and "useful broken models" lines comment on that finding. The systems you are describing sound useful but seem to solve a different problem than I mean to target.

> I assume here you are trying to say that human input is not reliable.

Yes, I find human output to be unreliable, and I believe it is well understood to be so. An example of a system that has elements of scaling social knowing is Facebook. I believe it is well understood that people often (and, statistically speaking, prevalently) present a facsimile of themselves there when they are presenting anything more than superficially adjacent to themselves at all. This introduces varying amounts of noise into the signal and displaces participation in life, perhaps in exchange for reduced communication overhead. Humans additionally make errors on the regular, whether through "fat fingers", an unexamined self, "bias", or whatever. See also "Nosedive" [0].

> I don't understand what's your approach with AI here

I haven't really described it - the ask was literally for the problem, not for solutions. There is a certain level of vaporware in my latest notion for exactly how to solve it. As stated obliquely however, there are aspects of the solution that I don't really want to be dragged through a discussion on here on HN.

> an AI that can't explain itself

I haven't specified unexplainable AI. I actually see evidence based explainability as a key feature of my current best formulation of a concrete solution. That, in context presents quite a few nuts to crack.

> Finally, I'm very familiar with meditations on moloch

I only meant to link the fish story but the link in MoM was broken and I failed to find a backup on archive.org, not putting a whole ton of effort into looking.

Consider how the described "games" change if those willing to cooperate and achieve the maximal outcomes could preselect to only play with those who are inclined to act similarly? If you grouped the defectors and cooperators to play within their chosen strategies based on prior action? Iterated games have different solutions, and I find those indicative of life, except that social accountability doesn't scale. In real life such specificity is impossible and no guarantees exist. Yet, I believe that the right systemic support structures could solve a number of problems, including a small movement of the needle towards greater game theoretic affinity and thereby a shift in the local maxima to which we have access.

[0] https://en.wikipedia.org/wiki/Nosedive_(Black_Mirror)


Thanks, that was much clearer. Well, there are indeed many options and paths we could take in the space, so good luck with whatever you end up trying. Only one final note: I'm a very secretive person myself, and even beyond that I understand your reticence to share more details about some of your specific ideas... but I think that sharing more openly would align better with that shift in the local maxima you aspire to achieve. For example, I'm sure at least some of us would be interested in reading a submission or blog post about many of these ideas.


The question is too far up in fuzzy space. Narrow it down to several use cases and specific problems within those, and the search field will be more manageable. Examples: Social workers want to be able to handle more cases appropriately. How many cases can they handle without diminishing quality scores? Politicians want to appear caring to the needs of as many constituents as possible. How do they group needs into buckets to find what is most relevant? Find the overlap and dig into it with more cases and then questions.


Like automating the analysis of a recorded argument according to Gottman Institute and other social heuristics, to augment marriage counseling services?

[edit: i.e. count positive and negative sentiment statements assigned to speakers and compare the per-speaker ratio to the experimentally determined minimum "healthy" ratios (which have not yet been replicated)]

You're right that there needs to be a tractable starting place. This is not lost on me. I may have used a flexible definition of "close to solving" but one's interpretation also fits into the scope of the effort. I'm at least 10% into it! ;P


A way to preserve and link factual data sets.

Most references to Wikipedia are dead links.

Many legacy media will stealth edit articles or outright delete them.

Original media files can be lost, and after strange eons their authenticity will no longer be verifiable.

It will soon be impossible to distinguish deep fakes from actual, genuine media.

Some regimes such as Maoist China wanted to rewrite their past from scratch and erased all historical artifacts from their territory.

There is strong pressure to create an Orwellian Doublespeak and erase certain words entirely from speech, books and records. With e-books now the norm, it has become a legitimate question to ask whether the books are the same as when the author published them.

Collaborative content websites have shown that they were not immune to subversive large and organized influence operations.

I have set my mind to multiple solutions (even bought a catchy-sounding *.org domain name!). Obviously it will have to be distributed so as to build a consensus, and thus it will have to rely on hashes. But hashes alone are meaningless, so some form of information will have to come along with them, which in itself is information to authenticate with other hashes. I was thinking that the authentication value would come from individually recognized signatories. Those would form a mesh of statements of record. For example, you might not trust your government, but you might trust your grandparents and your old neighbors, who all agree that there was a statue on the corner of the street, and they all link to each other and maybe link to hashes of pictures and 3D scans. Future generations can then confirm those links with other functional URIs.
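As a toy sketch of such a statement of record (hashing only; a real system would sign these with something like pynacl, and the names here are placeholders):

    import hashlib
    import json

    def record_hash(record):
        # Canonical serialization so everyone derives the same hash.
        canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode()).hexdigest()

    photo_hash = hashlib.sha256(b"<raw bytes of statue.jpg>").hexdigest()
    statement = {
        "claim": "There was a statue on the corner of the street.",
        "evidence": [photo_hash],           # hashes of media files
        "corroborates": [],                 # hashes of other statements
        "signatory": "grandparent-pubkey",  # placeholder for a real public key
    }
    print(record_hash(statement))  # the ID other statements can link to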

Something like blockchain technology seems an obvious choice, but I have no experience with that (for now). There is also the problem that it needs to be easily usable; therefore there is a need for a bit of centralization (catchy domain name yay!), although anyone could set up his/her own service for certain specialized subjects.

Thoughts?


One solution might be to collaborate with, build upon, and donate to an existing reputable organization like archive.org or Wikipedia which takes snapshots of websites.

Given these snapshots, you could write manual extractors or build a machine learning system [1] to extract the main content of each page as plaintext. Then load these timestamped text file snapshots into git, which will give you a hash of the content and let you easily track changes.

Push the git repo to a few places like github, bitbucket, and maybe IPFS where people can mirror it.

[1] https://joyboseroy.medium.com/an-overview-of-web-page-conten...

Alternatively you could use the built-in reader mode in Firefox or Chrome which do this automatically, but then you'd have to figure out how to maintain stability of the extraction algorithm between new browser releases.


I'm working on a prototype that uses compositional game theory [1] and adapts it to be able to reliably predict the order complexity of functors and their differences between states.

A huge bonus there would be when the order difference can be represented in a graph, so that tessellation or other approaches like a hypercube representation can be used for quick estimations. (that's what I'm aiming for right now)

If successful, the next step would be to integrate it into my web browser so that I can try out whether the equilibrium works as expected on some niche topics or forums.

[1] https://arxiv.org/abs/1603.04641


Yeah, this week I've restarted my scanning tunneling microscope, which I've been failing to make work for years... The current one is a standard pair of long metal bars with the piezoelectric component on one end, with 2 screws, and a 3rd screw on the other end.

My problem is that no matter how I design the thing, either the screws offer too little precision, so I can't help but crush the tip into the sample every time, or too little travel distance, so I can't help but crush the tip into the sample when adjusting the coarser screws near the tip. This is the kind of thing that looks like a non-problem on the web, because everybody just ignores this step.


It sounds like you need to add a "stage" to help position your sample. Flexures are systems that bend to perform motion, and can do surprising things that you can't do with joined together machined pieces.

Here's an open source stage project using flexures that will likely help

https://openflexure.org/projects/blockstage/

Also, see Dan Gelbart's 18 video series about building prototypes

https://www.youtube.com/watch?v=xMP_AfiNlX4


I'm a fan of this project, but flexures are antithetical to an STM assembly. An STM needs very rigid components; the smallest vibration can interact with the height adjustment and push the tip into the sample.

But it's a great assembly for anything that doesn't have a feedback on the positioning.


Your STM is missing a proper approach mechanism. The vertical range of the piezo that is used for scanning will only be a few hundred nanometers. A screw is too coarse for that! Stick-slip mechanisms with a ramp (Besocke or beetle-type STM, https://www.researchgate.net/figure/1-Diagram-of-the-Besocke... or page 25 of https://www.bgc-jena.mpg.de/bgc-mdi/uploads/Main/MoffatDiplo...) are one solution. Even with such a mechanism, the 'approach' phase takes many minutes!


Adding another step (not a problem, just maybe a solution): a quick estimate says that if I place a few-kHz signal at the sample, it will induce enough current at the tip to be detected by the preamp once the distance reaches the micrometer range. That's the same range where you want to stop the approach, so it may be a nice proximity signal.

I've got to try this on my next attempt.


Sounds good!

In the future, if you'd like even more precise measurements, theoretically you could use 2 different frequencies or a reflected source, and look at the interference or superposition of the waves.

I'm by no means an expert in this, but I've heard that optical measurement (eg: laser + Michelson interferometer) could theoretically take you down to the nanometer range.

But it's easy to go overboard with this, haha.

https://www.osapublishing.org/oe/fulltext.cfm?uri=oe-20-5-56...

https://iopscience.iop.org/article/10.1088/0957-0233/9/7/004


Wow, this sounds very ambitious!

Perhaps you could somehow attach the piezoelectric component or bars to a micrometer [1] which is designed for accurate and repeatable measurement?

[1] https://en.wikipedia.org/wiki/Micrometer


Yes, I could. I believe the most straightforward design would be 3 micrometers directly supporting the piezoelectric component, with no levers.

Yet, they are a bit expensive. I'm still not willing to budget all that, but I'm starting to consider it.


Digital micrometers are expensive, but you can get analog ones on AliExpress [1] or other places for around $10. Of course, the precision may not be as good as name-brand (eg: Mitutoyo) tools.

[1] https://www.aliexpress.com/wholesale?catId=0&initiative_id=S...


Yes! Thanks a lot.

Looks like the exact thing I need are micrometer heads, and some even come with nice threaded mounts.


One more idea, there exist worm drive micrometers which allow you to step down the linear movement per revolution even more:

https://www.global-optosigma.com/en_jp/Catalogs/pno/?from=pa...

If you have machining/fabrication skills, it might also be possible to buy a few worm gear sets and modify your micrometer to move really slowly but precisely.


Excellent, glad that you found this to be helpful. Good luck with your project!


We are working on a totally new way to do cold fusion, our only problem is getting enough new fuel into the reactor without disturbing the running process.

Any help would be greatly appreciated.


Could you use tiny pellets fed in by a linear actuator and gravity, rotary loader, or some kind of a conveyor belt? If you want an off-the-shelf solution that's easy to reload, perhaps you could repurpose the loading mechanism of a machine gun to dispense the pellets.


What does your reactor look like? What fuel are you using? How are you getting to ignition?


Disclaimer : I know nothing about anything.

Hawking radiation ?

Laser tunnel ?

Magnetic canon ?

Centrifugal launcher ?

Vacuum diffusion ?

Electrical beam lensing ?


> Hawking radiation ?

That actually does make for a fairly efficient (better than fusion in energy per unit fuel mass) reactor design in principle, but you need a sub-solar-mass black hole (in the 1 billion - 100 billion ton range), and there's no known practical way to produce one.


Tesla valve?


Maybe the solution is in your fridge?

... Not being flippant; I find that these kind of prompts can help in thinking from a new approach.


A search engine that prioritizes ad-free, tracker-free sites.

Of course Google can't do it. But this is ripe for someone to step in.



Stateful, exactly-once event processing without the operational capacity to run a proper Flink cluster. This thing needs to be dead simple, pragmatic, and cheap/simple to operate and update. The only stateful part in our infra at the moment is a PG database.

We are going to start work on this in a few weeks, so I'm looking for some insights/shortcuts/existing projects that will make our lives easier.

The goal is to process events from students during exams (max 2,500 students/exam = ~100k-150k events) and generate notifications for teachers. No fancy ML/AI, just logic. Latency of max 1 min.

Our current plan is to let a worker pool lock onto exams (PG lock) and pull new events every few seconds for those exams, where (time > last pull & time < now - 10s). All the notifications that are generated are committed together with a serialized state of the state machine and the ID of the last processed event. Events would just be stored in PG.

This solution is meant to be simple, be implemented in a really short timeframe, and serve as a case study for a more "proper & large scale" architecture later on.

Any tips, tricks or past experiences are much appreciated. Also, if you think our current plan sucks, please let me know.


I think you could leverage SKIP LOCKED for this - this blog post https://www.2ndquadrant.com/en/blog/what-is-select-skip-lock... explains it nicely.
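
Roughly, from Python (a sketch only; psycopg2, and the exams table/columns, are stand-ins for your actual schema):

    import psycopg2

    conn = psycopg2.connect("dbname=exams")  # hypothetical DSN
    with conn, conn.cursor() as cur:
        # Each worker claims one exam; concurrent workers skip rows that
        # another transaction has already row-locked.
        cur.execute("""
            SELECT id FROM exams
            WHERE active
            ORDER BY last_pulled_at
            FOR UPDATE SKIP LOCKED
            LIMIT 1
        """)
        row = cur.fetchone()
        if row:
            exam_id = row[0]
            # ... pull new events for exam_id, run the state machine, and
            # commit notifications + serialized state in this same transaction.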


I've had good experiences with PQ (https://pypi.org/project/pq/). Any event that generates a notification adds an entry to the queue. Worker processes get entries from the queue. The queue is stored as another table in your database whose structure and content is managed by PQ, though you can always read/write to it if you want. PQ handles the concurrency.
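
Basic usage looks roughly like this (queue name and payload invented):

    from psycopg2 import connect
    from pq import PQ

    conn = connect("dbname=example")  # hypothetical DSN
    pq = PQ(conn)
    pq.create()  # one-time setup: creates the queue table

    queue = pq["notifications"]
    queue.put({"exam_id": 42, "event": "late_submission"})  # producer

    task = queue.get()  # consumer; PQ handles locking between workers
    if task is not None:
        print(task.data)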


(EDIT: just realised that you specifically mentioned stateful event processing, while what I describe below are two approaches for stateless, exactly-once event processing)

Having had a few cracks at this problem, in my opinion using locks is the wrong approach.

What you will want is:

* split all input data in batches (eg batches of 10k records, or periodic heartbeats every X seconds, etc)

* assign each batch a unique identifier

* when writing data to the output store, store the batch id along with the data;

* when retransmitting a batch for whatever reason, reuse the same batch id and overwrite any data in the output store that matches this batch id.

Obviously this becomes more tricky when you’re dealing with eg window functions or more complex aggregations.

In this situation, I believe that an approach such as “asynchronous barrier snapshotting” works best. Every X seconds, you increment an epoch. While incrementing, you stop ingestion. Then you first tell the output source to create a checkpoint, then the input source to create a checkpoint, and once both have been checkpointed, you can continue streaming data again.

Anyway, these are two approaches I’ve used over the years that work well. Explicit locks don’t work well in distributed processing, imho.
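
A sketch of the batch-id overwrite (table/column names hypothetical): the write is idempotent because a retransmitted batch simply replaces its own rows.

    def write_batch(cur, batch_id, rows):
        # Remove anything a previous (possibly partial) attempt wrote for
        # this batch, then rewrite it in full.
        cur.execute("DELETE FROM output WHERE batch_id = %s", (batch_id,))
        cur.executemany(
            "INSERT INTO output (batch_id, payload) VALUES (%s, %s)",
            [(batch_id, row) for row in rows],
        )
        # Committing makes the whole batch visible atomically, so readers
        # see exactly-once results even though delivery is at-least-once.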


Sounds like a change data capture problem. Consider using Debezium, my team was able to use the standalone java engine to connect to a Postgres DB and stream (within the context of the Java app, not an external kafka stream) insert/update/delete events. You could filter those events and apply your notification and other logic to the filtered events.


PG is great for this. Should handle ~100 or more events per second without much work (but set up a retention policy, and watch out for tables growing to > ~1M rows, as that will kill you during autovacuum).

You can use txid_current_snapshot() and friends to track the last "timestamp". Proper use of locks will help you avoid the complexity associated with long-lived transactions.

Exactly-once semantics can be tricky to guarantee if you do it at the wrong layer of abstraction. Sometimes building exactly-once semantics on top of at-least-once semantics is the way to go.

Kafka and rabbit MQ are both overkill under 100 events/sec. The extra ops overhead isn't worth it. Besides, with PG it'll be nice to be able to always query a couple tables to completely discern the state of the system.


A message queue (e.g. RabbitMQ) sounds like a more natural fit for your problem.

What is the peak and avg QPS you need to support? High peak QPS might force you to introduce distributed workers and makes locking impractical.

Another consideration is how much you care about data integrity. Would it be a problem if a few messages were lost? What if a message is processed twice? What if servers lose connection to the db for a few seconds? What if a whole server/db goes down?


Does Kafka not get you halfway there? It will guarantee exactly once semantics. Use MSK or Confluent cloud if you can use managed services.

It's more future-proof than building this on top of Postgres.


I'm not sure I'm close to solving it, but I have an approach that I'd like some feedback on.

I have a corpus of text in many Indian languages, which I'd like to index and search. The twist is that I'd like to support searches in English. The problem is that there are many phonetic transliterations of the same word (e.g. the Hindi word for law can be written as either "qanoon" or "kanun"), and traditional spelling correction methods don't work because of excessive edit distance.

My approach is this: use some sequence-to-sequence ML technique (LSTM, GRU, ..., attention) to map a query in English to the most probable transliteration, and then use that to look it up with a standard document indexing toolkit like Lucene. (I can put together a training dataset of English transliterations of sentences to their original text.)

The problem is that I'd like the corpus, the index, and the model to all be on a mobile device. I suspect that the above method won't straightforwardly fit on a mobile (for a few gigs of corpus text), and that the inference time may be long. Is this assumption wrong?

How would you solve the problem? Would TinyML be a better approach for the inferencing part?


I'm not sure I understand the problem specification. You want to be able to search "law", and find documents containing "qanoon" or "kanun", right? How does your proposed solution handle that? It seems like the approach with ML TL -> Lucene would still only find one of the two, unless your model is written to return a set of possible transliterations. Or are you saying your approach doesn't currently solve this part of the problem, and that's one of the things you'd like input on?

Is the corpus the only data you have, i.e. do you need to use it for training and validation as well?

In terms of the size of the data, if you want to store the corpus on the phone anyway, won't the index and model be relatively small in comparison?


No, sorry for the confusion. I want to be able to type “kanun” or “qanoon”, and have it infer the Hindi word “कानून”, which is an indexed word.

It is not necessary that there is a one to one correspondence between words. Sometimes two english words may represent one hindi word, or vice versa.

I believe I can build up a decent sized training/validation set, for example from Bollywood song lyric databases written in English, and mapping them to the Hindi equivalent (or Tamil, Bengali, etc).

As for your last question, I don't know, since I haven't implemented an ML model in practice. I saw a tutorial on BERT this morning, where a word has 768 features. That itself sounds huge, let alone the model itself.


A non-ML way to approach this is to use phonetic distance, e.g. qanoon and kanun sound the same, so they are close.

There is an algorithm called Soundex with python implementations you can try.
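
For example, with the jellyfish library - though note that Soundex keeps the first letter verbatim, so this particular q/k pair needs something like Metaphone instead:

    import jellyfish

    print(jellyfish.soundex("qanoon"), jellyfish.soundex("kanun"))
    # Q550 K550 - the codes differ only in the kept first letter, so a miss

    print(jellyfish.metaphone("qanoon"), jellyfish.metaphone("kanun"))
    # KNN KNN - q and k both map to K and vowels are dropped, so a match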


Refer to what I said about edit distance. Traditional methods don't work well at all.

The fundamental difference between my problem and traditional spelling correction algos is that in the latter, there is a canonical correct spelling to be used as a reference. In my problem, there isn't. There are different approximate ways of spelling out most hindi words ... there is no one correct way. There are common patterns, sure, but it is too tedious to encode all the variations.


Working to enable users of https://www.DreamList.com to record audio of any length and see it transcribed, ideally at the same time as the recording, while the recording is also saved. The goal is for grandparents to save stories for loved ones and not worry about the quality of the transcription - just talk. When the recording is saved, the transcription can be redone or tweaked later if needed, but the memory is not lost.

DreamList is web and soon native apps, so WebRTC connected to a cloud transcription service is my first instinct, but there are benefits to native iOS APIs as well - especially being able to keep sharing stories while listening to other streams, also on iOS (families talking and digging into stories together). What architecture/transcription approaches would you suggest? Any gotchas you've seen dealing with similar problems (accuracy given accents, do we train our own transcription based on gathered data, etc.)?


I worked on this for a couple years during a previous startup attempt.

I designed a custom STT model via Kaldi [0] and hosted it using a modified version of this server [1]. I deployed it to a 4GB EC2 instance with configurable docker layers (one for core utils, one for speech utils, one for the model) so we could spin up as many servers as we needed for each language.

I would recommend the WebRTC or Gstreamer approach, but I wouldn't recommend trying to build your own model. It's really hard. Google's Cloud API [2] works well across lots of accents and the price is honestly about the same as running your own server. If you want to host your own STT (for privacy or whatever), I'd recommend using Coqui [3] (from the guys that ran Mozilla's OpenSpeech program). Note that this will likely be much, much worse on accents than Google's model.

[0]: https://kaldi-asr.org/

[1]: https://github.com/alumae/kaldi-gstreamer-server

[2]: https://cloud.google.com/speech-to-text

[3]: https://coqui.ai/code

Edit: Forgot to mention, there's also a YC company called Deepgram that provides ASR/STT as a service, you could give them a shot: https://deepgram.com/
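
If you go with Google's API, the basic non-streaming call is small; a sketch assuming credentials are configured (the bucket path is a placeholder):

    from google.cloud import speech

    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(uri="gs://my-bucket/story.wav")  # placeholder
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
        enable_automatic_punctuation=True,
    )
    response = client.recognize(config=config, audio=audio)
    for result in response.results:
        print(result.alternatives[0].transcript)

For the live "transcribe while recording" case you'd use the streaming_recognize variant instead, feeding it chunks from the WebRTC audio track.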


In my experience, Google's API completely fails when any slightly unusual vocabulary is involved (e.g. in this instance, grandparents talking about their past jobs), and tends to just silently skip over things. Amazon's wasn't much better with vocab., but at least didn't leave things out, so you could see problems. I don't have experience with any of these others, but I think for my purposes (subtitles for maths education videos) no one will have made an appropriate model yet.


I too am missing my 1990s forum experience. This feeling, and a particularly frustrating few minutes spent on LinkedIn, prompted me to write something about it.

I discuss some intellectual problems and solutions.

https://blog.eutopian.io/building-a-better-linkedin/


How do you visually implement a process in an app? I.e., how do you guide users through a complex process they need to complete in the app to achieve success?

The process might span different media (write an email, do something in the app, check Twitter, etc.) and different activities across multiple days. How do you make sure they know what they should do next? Checklist? Emails? Slack? Wizard?


This would probably require a lot of front-loaded work in your case, but if you need to train a boatload of people with very few (or zero) trainers, my favorite way to do it is Atlassian's Atlaskit Onboarding/Spotlight components: https://atlaskit.atlassian.com/packages/design-system/onboar...


Interesting... My go-to solution for this would be a detailed wiki page with screenshots, and linking to that from a bunch of places. But I guess that's not really an ideal solution.


What do you use for the wiki?


World of Warcraft


We use a little floating checklist from userpilot.com


I'm trying to (re)sell cheap bulk object storage by renting cheap dedicated servers (e.g. Hetzner), connecting them using 10GbE, and putting them into a big Ceph cluster.

My problem is how to properly bill people for consuming object storage. Do you do it retrospectively and take the fraud risk? Are there any pre-existing platforms that do Ceph billing?


This is what DigitalOcean did to create their block/object storage. If you're interested in doing this type of thing in a career capacity the storage team is hiring a lot of people right now. Feel free to reach out to me as I'm on that team. :)

https://www.digitalocean.com/blog/why-we-chose-ceph-to-build...


Can you explain how the economics of what you are doing would be competitive with something like S3 or B2? I feel like there could be a market/margin here but there are a lot of numbers involved to figure out the specifics.


Just take a look at the egress pricing of B2 (10 USD/TB) and S3 (92 USD/TB), and then look at Hetzner's 1 EUR/TB. There is quite a margin there - same with the storage costs (23, 5, and 1.5 respectively).


I'm currently thinking about starting a very similar project, would you like to talk about it and exchange some learnings?


Sure, how can I reach you?


claudioreiter on telegram or via email hn@dagobert.pw


I find it hard to find communities for my ever-changing niche interests.

I’m working on community discussion boards which exist at the intersection of interests.

Eg. Mountain Biking/New Zealand, Propulsion/Submarine, Machine Learning/Aquaponics/Tomato, etc.

The search terms for interests are supplied from Wikipedia articles which avoids duplicate interests and allows for any level of granularity.

I find that keyword functionality in search engines has degraded to the point that finding good content for niche interests is difficult. I'm hoping that with this system I can view historical (and current) discussions around my many niche interest combos.

I’ve got the foundation done, I just need some feedback/advice on whether I’m reinventing the wheel here, or if others share this problem?


This kind of sounds like Reddit, to be honest. Is Reddit lacking something you wish were there?


Reddit is great for finding general topics - science, news, hobbies, etc.

And the goal of this isn’t to replicate those more sweeping discussion boards because they’re great!

My issue is that once things get more niche, the subreddits are tough to find - they could have obscure names, and even just learning about them often involves another user recommending. Plus creating a subreddit for every niche intersection isn’t ideal.

With this I could zone in on exactly what I’m after, without sifting through all the unrelated stuff. With the Aquaponics/AI/Tomatoes, I’d be dealing with only that intersection point. Not the peripheral stuff.


The problem: correct string matching at scale. I am aware of fuzzy string matching. The problem is that two strings can be > 90% similar even if the difference is, for example, one digit in the year of manufacture. My current solution is to represent the 2 strings as similarly as I can, based on the available information, by transforming (wrangling) the data to match as closely as possible, and then applying constraints based on make, model, and year (they should be the same). It works pretty well, but I am looking for a more interactive (human-in-the-loop) solution.
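
To make the current approach concrete, a minimal sketch (rapidfuzz and the field names stand in for the real pipeline) - hard constraints gate the fuzzy score, and a middle band of scores could be routed to a human reviewer:

    from rapidfuzz import fuzz

    def match_score(a: dict, b: dict) -> float:
        # Hard constraints first: a 90%-similar string with a different
        # year of manufacture must never match.
        if (a["make"], a["model"], a["year"]) != (b["make"], b["model"], b["year"]):
            return 0.0
        return fuzz.token_sort_ratio(a["text"], b["text"])

    def triage(score: float) -> str:
        # Route ambiguous scores to a person instead of guessing.
        if score >= 90:
            return "auto-accept"
        if score >= 70:
            return "human review"
        return "auto-reject"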


I'd just slap a GUI / audit logs on top. Show the intermediate data (the “wrangling”), show the computed similarities, show the conclusion (this met that threshold, and the other was equal, so it's category seven).


Can you elaborate on the technical details: which language, library or framework would you use?


Tkinter, probably. Or a web interface. Depends on what I'm doing, honestly – the answer will always be “whatever's currently being used”.


I’m facing an issue where I store small binary data blobs within a Postgres column in order to benefit from delete cascades.

I'm considering moving the binary data into S3 and then doing the sync layer on the server (meaning the front end requests the data from the backend and gets it back as a JSON object with base64 values).

Doing this manually via code isn’t impossible, just API intensive, so I’m wondering if this is a solved issue for anyone.

The why: The JSON blobs are recordings of words and sentences that can be copied between articles.


It's hard to give any advice on this until you detail the problem you're trying to solve. For example:

Where/why is your current system failing/inadequate/cumbersome?

Why do you want to move the data to s3?

Why are delete cascades important?


I've been thinking for a while about building a tool to generate ETL/ELT jobs for data warehousing. Yes, there are lots of such tools already, but I've become frustrated in one way or another with all that I've used so far (mainly with their bloated size, clunky "repositories", and inscrutable runtime engines). This "while" that I've been thinking has stretched out -- the other day I stumbled across my earliest vague musings on the subject, and noted that they are a couple of years old already, so I'm beginning to think it's time to stop thinking and start building...

For various reasons -- mainly familiarity, the FOSS ecosystem, and cross-platform compatibility -- I'm going to try to implement this in Free Pascal / Lazarus. There is one kind of component I'm definitely going to need, and if there were a ready-made one I could use instead of building one from scratch, it would save me a lot of time and effort. I've looked around online, but so far haven't found "the perfect one". So, my question is:

Can anyone recommend a good FOSS graphical SQL query builder component -- i.e., one which presents tables and their columns to the end user, so they can specify joins and filters by clicking and dragging, etc. -- for Lazarus (or, in a pinch, Delphi, to port to FP/L)?


I'm looking for a way to integrate a React app with an existing Vue... thing. I don't really need any communication between the two; just displaying it would be fine. My issue is: the Vue code just throws <script> tags into the html and expects global variables (location instead of window.location), while the React code uses ES6 imports. The only partly working way I found is including <script> tags with a useEffect, but that doesn't play nicely with apex-charts for some reason, and involves forcing the html with a dangerouslySetInnerHTML after importing the existing file as a long string. Sub-par, obviously. In addition, I'll probably need to include different Vue apps in a couple of different instances. Any suggestions? I think I might just keep them separate and open a new tab for the explanatory session. Thanks!

Reasoning: helping with the code behind a paper on explanatory AI systems.

Related code on github if curious https://github.com/pollomarzo/map-generation/tree/main/graph

EDIT: thanks for suggestions will give them a spin throughout tomorrow :)


Have you tried SingleSPA? I've used it in the past to get React apps inside an old AngularJS app to replace parts of it over time, and it worked pretty well. The docs say it works with Vue as well, but I don't have any direct experience with Vue at all, far less for this kind of task, so I'm not really sure it will work, but it is worth looking at: https://single-spa.js.org/


Have you tried using `useRef` and attaching the Vue component to a parent React node? I haven't done this with Vue, but I have with a vanilla JS chart library.


>just displaying it would be fine

Iframes with postMessage() where needed (like dynamic window size changes) aren't pretty, but they're easy to do.


How do you create a good app store for a smartphone OS?

Users should be able to install whatever software they want. Similar for developers, they should be free to publish whatever software they made.

The Apple/Google approach is suboptimal because of the centralized point of failure. And they censor their stores, both politically and arbitrarily.

The Linux approach is suboptimal because users don't have keyboards to create these sources.list text files. Even if they had QWERTY keyboards, I don't like the UX; it's too hard to use.

Traditional P2P like BitTorrent + DHT is suboptimal because these are smartphones; it would use too much electricity + bandwidth to be practical.

So far, I'm thinking about developer-hosted binary packages, and existing code-signing infrastructure for authenticity and integrity (Verisign, Comodo, Digicert, those guys; up to developers to choose one). The configuration issue from Linux should be solvable with QR codes scanned by the camera, plus a custom URI handler for the web browser on the phone.

The main thing I don't like about that approach: a store app on the device is good UX from the end users' perspective, yet it seems impossible to make one with this approach.

I’m very far from being blocked on that yet, but I will face that problem eventually.

P.S. I'm not going to solve security at that level. As the Android store shows us, it's borderline impossible even with Google's resources. Modern mobile SoCs have enough juice to solve that properly, at the lower levels of the stack. Most of them support hardware-assisted virtualization. All of them are fast enough to run a proper multi-user Linux, with security permissions and the SELinux kernel module.


How to get my kids to bed.


By no means am I an expert, but just another parent on HN.

1. Ritualize

I notice a pattern with my kids: going to bed is a ritual, and any deviation is reason enough for them to leave the bed.

2. Slow down in advance

Going to bed right after playtime is impossible. So cut screens and playtime, play/listen to some quiet music, and read books or a newspaper (again, no tablet/reader), at least 1h before bedtime.

3. Recap the day

Remind your kids of their day and activities, and make them aware of their fatigue. Works better when in bed, with mine.

4. Stay with them if they're afraid

Learn why they're afraid, teach them why there's no reason to be afraid. I've had to hang a sock on the door every night for months to scare tigers away :D It just works ^^

Every parent knows the pain and every kid has their own back story and the relationship with the parent(s) is key to finding a way into bed.

Eventually, they will sleep.

In our case, we settled that, except in exceptional situations, our kids had to fall asleep in their own beds, because we wanted/needed our intimacy. To get there, I had to stay in my kids' room for as long as 2hrs for months, but I didn't let go. Today, going to bed is thankfully not a situation anymore.

I'm not sure this is in any way helpful, but here's my shared experience and learnings. YMMV.


Best trick I've learned (and I have more than twice as many kids as the average citizen here :-) is to:

1. make sure they don't fall asleep with something they cannot keep all night (i.e. while you are singing to them, rocking them, sitting next to them or when they are drinking a bottle of milk etc)

2. make sure they understand that even if you leave the room, it is just temporary. Small kids are - for good reasons - very afraid of being forgotten or left alone.

2.1 Using a timer to remember to visit the room regularly and often as they learn to sleep alone can help a lot

2.2. Increase the interval each day. I increased it by two minutes each day.

2.3 If the kids are happy in their bed, continue to visit their room at the scheduled time: you don't want them to think that you forget them if they don't cry.

Using this method I've gotten my last few kids to enjoy going to bed and sleep better, in less than a week for each of them.


For mine: A reading light with a remote-control timer. We read a story together, brush teeth, then they get 10 minutes of independent time with their own storybook. But kids vary, so good luck.


This book has a lot about parenting, including how to get your kids to go to bed:

https://www.amazon.com/Bringing-Up-B%C3%A9b%C3%A9-Discovers-...


In general, aim for repetition and stringing activities together.

So for us, when I first started to do this: each night they get a 'treat', but to get that treat they first need to be ready for bed - e.g. bedroom ready to sleep in, correctly dressed/washed, etc.

Then after the treat they must choose a calming activity - ideally in their bedroom, e.g. reading (nothing that gets their heart rate up) - for 30-60 mins, then they must brush their teeth.

At this point we say it's time for bed, but we allow them to carry on reading for another 30-60 min, then it's lights off.

If they don't do the activities/actions after the treat, then we warn them that they'll not get one tomorrow, etc. (And really do what you say.)

Also, you may need to be flexible on the activities until they get into the swing of it.


If they are young (< 3 years), just run a hairdryer song from YT. I give it 5 mins max. You are welcome.


Uh, raise your voice


Try the 28hour day. Once your kids are sufficiently tired, they'll go to bed by themselves.

https://xkcd.com/320/


... not a parent?

(The only thing worse than trying to get a child who isn't tired to go to sleep is a child who is too tired to go to sleep.)


... I've got two! (0.3y, 2.7y)

Apparently I also still have a sense of humor. Or maybe I don't, because perhaps pretending one doesn't get the joke of doubling down on xkcd silliness is perhaps a joke in itself, which I didn't get.


Benadryl


Benadryl, no. But melatonin, sometimes, yes. My rule is to have a hard cutoff time after which it's better to take melatonin than to continue the cycle of whining, sleep deprivation, and next-day misery. The cutoff is late enough to have plenty of time to try all the other things involving wind-down rituals. It is not an every day thing. I found that having the consistency actually helps establish the rituals too.

Also, bedtime trouble usually means not enough outside time and physical activity during the day. Or that the kids want more of the parent's time.


Look into the research behind melatonin use. I'm not a doctor, and certainly long-term use of diphenhydramine is associated with neurological problems in old age, but I'm not sure melatonin should be used as a simple hypnotic as you are suggesting. It's natural, but so is testosterone. Hormones may not be good to tinker with. I say that as a long-time user of melatonin. At the very least you may want to stick with lower dosages - nothing over 1mg, which is about as low as is easy to find.


Kids versions don't come in anything higher than 1mg and one can make it a half-dose quite easily. It's really more of a last resort thing, and definitely not for every night's bedtime. How last resort? Maybe once or twice a month. Now that my kids are a bit older and bedtime rituals are established it's even more rare.

I realize that some parents reach for it every night and this is not something I'm suggesting.


I could really use some help with adversarial attacks.

If there's someone trying to use CV to recognize stuff and we are trying to prevent it (basically, a black-box situation) - is it viable to use adversarial attacks at all?

Will it work long term? Can they overcome our AA by downsampling and adding noise? Can we make another AA that would still prevent them using CV? If it's better to chat elsewhere - my Twitter is in my profile.


If you have access to their model it's possible, and adversarial attacks can be made to survive different sorts of processing. But like all piracy prevention tools it's a game of cat and mouse so if you're working against an intelligent adversary it'll come down to how many resources each of you can contribute to this fight. (Btw your twitter is not in your profile. Not that I have anything to add in private, just thought you should know if you're relying on it.)


No, the case is that we don't have their model. I understand it's a game of cat and mouse, but this case is going to have a very quick iteration cycle, so I'm not sure if it's worth trying at all; it's not the same thing as testing an adversarial attack on a self-driving car once.

Thanks, it didn't update my profile for some reason, fixed!


Every week I read about autonomous-driving adversarial attacks, so I would say it is beyond feasible.


I've been working on applications of network flow theory and ecological measures to macroeconomics, to determine mesoeconomic-scale measures for use by decision makers and businesses. These measures use certain properties to determine both stability under sudden shocks to the economic system, and what effect various policy actions will have on the overall health.

This includes entirely different ways of visualizing free trade, etc.

I have made some significant progress in the past year, but I am running into the headache that the papers I need are not easily accessible (or cheap), since I'm not an academic and I'm literally doing this as a for-fun side project. I've been trying to find a university or college willing to take me on in some capacity, simply to let me get access to academic catalogues, but I am not having much luck unfortunately.

The best results recently, though, include some really fascinating stuff that comes from statistical measures that give clarity to how a region would respond to being isolated for any period of time.


What are the resources to look at for designing the architecture of a scheduling system (for coroutines/threads)? Would I first look at how operating systems implement it? Or how VMs like Erlang's implement it?

Same question for effects systems, where would I look to understand how they're designed and the trade offs for their design decisions?


It indeed makes sense to look at an OS. You might want to start with a simple one, like FreeRTOS - which is more or less just a task scheduler at its core.

I would recommend not starting with coroutines if your main focus is scheduling. In the end, coroutines and async/await are about building a userspace scheduler on top of a scheduler that already exists in the OS, so you just get twice the amount of logic. However, the schedulers used in userspace are often a lot more trivial than the OS ones, since they don't support preemption or priorities. Erlang might be the exception, and an interesting thing to look into.


So don't start with coroutines, and just look at an OS scheduler and possibly the Erlang VM?


Everyone starts with callbacks, moves on to coroutines, and eventually ends up writing polling loops. Can take decades for each person to work through it.

Erlang's messaging may have potential as another alternative


By polling loop, do you mean like a while loop that checks a queue for work at a preset frame rate?


Go is a modern take on coroutines (M:N threading), when everyone else seems to be going the async/await route. Though, I don't know how simple that scheduler implementation is (or if there are books about the internals).

Since the Linux kernel has pluggable schedulers, that code might be well-structured for reading. Again, I don't know the specifics.


Need help deciding the tools to be used for the below problem:

The system is a bunch of batch jobs that are scheduled to run at different intervals. These jobs can be modelled as a directed acyclic graph of steps. They basically download files from vendors and map the rows inside them into a generic format (for generating reports). There are a lot of vendors, and each vendor can have a different file format containing different fields -- hence requiring custom business logic to populate (map) the corresponding generic file (like aggregating fields, fetching values from the DB, etc.). Also, these vendors' files sometimes contain errors, or are dropped late for download, etc. -- failures can happen, and these failed job instances should be able to be rerun.

Existing system is built using Spring Batch and Spring Integration. The problems with the existing system are:

1. there are more than 200 jobs and most of them have their own custom logic during mapping -- cannot be generified

2. lot of manual work needed to onboard new vendors

3. jobs are synchronous and run only on one node, typically for lots of hours

4. rerunning jobs is a nightmare

Dream state for this system:

1. Dynamically add jobs to the runtime using generic components that can be reused -- maybe through an API / UI

2. Preferably, multiple records from a single file be processed across distributed nodes to generate a single output generic file

3. Rerunning should be easier

I am a noob to CS. I did a good bit of research over the past month. I found a few data-science tools in Python -- which is a no-no for a production system. Also, I know that the steps cannot be made generic beyond some extent, since custom mapping logic is required for almost every vendor. But I'm asking to see what is possible. Any help pointing to prospective tools and technologies to solve the above will be much appreciated.

Thanks


Use Airflow maybe?
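
A minimal Airflow 2.x sketch of one vendor job (the vendor callables are hypothetical). Retries are declarative, and failed runs can be cleared and re-run from the UI:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    from vendor_acme import download_file, map_to_generic  # hypothetical helpers

    with DAG(
        dag_id="vendor_acme_ingest",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args={"retries": 3},
    ) as dag:
        download = PythonOperator(task_id="download", python_callable=download_file)
        map_rows = PythonOperator(task_id="map_rows", python_callable=map_to_generic)
        download >> map_rows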


Looks very promising. Can I add new jobs (DAGs in Airflow's jargon) reusing my custom steps (operators in Airflow's jargon) during runtime? Also, is there something similar in Java, Go, etc.?


I'm trying to make social media moderation more democratic, using crowd votes to decide fuzzy questions like "should this post be censored" or "is this misleading" [0]. While the crowd's answer won't be perfect, it will help sort through a lot of the noise, and feels better than the decision of whatever mod happened to create the subreddit.

The problem: how can I make decisions based on a sample of answers to a binary question? I think the central limit theorem applies, and I need to account for various priors and missing votes. Is there an existing solution to this problem? The server is written in nodejs, if that matters.

[0] - https://efficientdemocracy.com/about/what-is-this
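
For reference, the naive approach I know of looks like this (a Wilson score interval over the sampled yes-votes; it ignores the priors and missing votes I mentioned, which is exactly the part I'm unsure about). Python for brevity, though the server is nodejs:

    from math import sqrt

    def wilson_interval(yes: int, n: int, z: float = 1.96):
        # 95% confidence interval for the true yes-fraction, given
        # `yes` yes-votes out of `n` votes sampled.
        if n == 0:
            return (0.0, 1.0)
        p = yes / n
        denom = 1 + z * z / n
        center = (p + z * z / (2 * n)) / denom
        margin = (z / denom) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
        return (center - margin, center + margin)

    lo, hi = wilson_interval(yes=64, n=100)
    if lo > 0.5:
        print("mark as misleading")   # confidently 'yes'
    elif hi < 0.5:
        print("leave as-is")          # confidently 'no'
    else:
        print("collect more votes")   # interval straddles 50%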


I worked on a similar idea last year. What I did was take URLs to content, scrape the content, and pipe it through a machine learning evaluator to apply various labels and warnings to the content. Lastly, add some nice embeddable UI to surface the report.

I got it to a decent state, but didn't know how to propagate it or inject it into social communities. I wanted people to be able to tag it on Facebook, and have it reply with an informational card with the analysis and summary.

https://github.com/dino-dna/informed-citizen


Cool! Was it supervised learning?

I feel like machine learning isn't at the level where it can tell if something is misleading, unless it's from a known sketchy source.


This might be a good use-case for the "bayesian truth serum" http://economics.mit.edu/files/1966

This applies when our questions are not just trying to learn about the world (e.g. 'Our survey discovered that 10% of posts are considered misleading'); we are going to use their answers to decide on actions, e.g. removing posts, attaching warning labels, etc.

Those answering the questions know this, and (if they have a preference over which action will be taken) are incentivised to give more extreme answers. A classic example is an ice cream company surveying shoppers about the flavours they like: if I truthfully answer that I like chocolate slightly more than strawberry, this will have a small effect on the survey result, and hence the company's new product flavours. However, if I falsely say that chocolate is the best flavour I've ever encountered, and that strawberry makes me vomit, that will have a much stronger effect on the survey result, and make it more likely that the company will make the chocolate ice cream that I prefer.

The "bayesian truth serum" counteracts this by asking each question in two parts: there's the initial question we want answered, as well as an additional question: "how do you think others will answer?". For example:

- "I find this misleading" and "I think 80% of respondents will find this misleading"

- "I rate strawberry as 4/5" and "I think 10% of respondents will give strawberry 1/5; 20% 2/5; 50% 3/5; 15% 4/5; and 5% 5/5"

The first answers (the ones we care about) are weighted based on two conditions: how closely the estimated distribution matched the real answers, and how 'surprisingly popular' the first answer is.

To see why this cancels-out the incentives to lie: our best chance of affecting the result is to choose a 'surprisingly popular' answer, since this will contribute more weight to the result. However, these two constraints exactly cancel out:

- The answers we predict are popular, will also be those we predict are unsurprising (after all, we could predict them!)

- The answers we predict will be surprising, will also be those we predict are unpopular (that's why it would be surprising if they were popular!)

It turns out that the rational strategy, for swaying decisions as much as possible towards the outcomes we want, is to answer the first part truthfully.

A similar analysis applies to answering the second question (the estimates) truthfully. In that case there are two things to consider:

- We want our estimates to be as close as possible to the true distribution, in order to maximise our response's weight.

- We want to engineer our estimates such that the answers we disagree with get a high estimate, and hence appear 'unsurprising' (reducing the weight of those responses). Our estimates must sum to 100%, so decreasing the 'surprisingness' of one answer must increase the 'surprisingness' of the others. The effect we have on each answer's weight will be small, but it will affect every response which chooses that answer. Hence to have the largest impact, we need to decrease the 'surprisingness' of those answers we think will get the most responses. Yet that's exactly what we've been asked for (an estimate of how popular we think each answer will be!)


That's a very interesting system! I've been reading about it but I'm not sure how it applies.

> (if they have a preference over which action will be taken) are incentivised to give more extreme answers.

With yes and no answers, how can answers become more extreme? If you are asked "Is this misleading? Yes/No" and it's only marked as misleading when "Yes" is the majority, then you are incentivized to answer with your true opinion. If you want the post to be marked misleading, then answering yes increases the chance that it is marked as such.

The Bayesian truth serum information score makes sense when you are trying to reward people for truthfully answering your questions, for example by paying them [0]. When asking "Is this misleading?", how do you use the information score to compute who won?

[0] - http://www.eecs.harvard.edu/cs286r/courses/fall10/papers/DW0...


For yes/no questions I think (but haven't checked the math) that the incentive is to shift the distribution closer to my opinion. If my opinion is, say, 75% that it's misleading, then the truthful response would be a coin toss with bias 75%.

However, if I know my answer will affect censorship, etc. then I may try to predict the resulting distribution, and vote yes if I predict it's less than 75%, and no if I predict it's more than 75%.

For example, I may be more "trigger happy" if I think people are more likely to believe something uncritically; I may be more of a "devil's advocate" if I think something is under-represented, or less likely to be taken seriously.


> how closely the estimated distribution matched the real answers

> A similar analysis applies to answering the second question (the estimates) truthfully.

How does this avoid (or compensate for) downweighting the preferences of people who are legitimately ignorant about what everyone else thinks (and consequently give estimated distributions that hardly match the real answers at all)?


The weights can incorporate a factor (0 < α ≤ 1 in that link) which adjusts the contribution of the prediction's accuracy. When α = 1, we get the zero-sum, purely competitive situation; we can make accurate prediction less important by choosing α < 1.

Although truth-telling is a Nash equilibrium of this setup, it's not the only one. However, as α → 0 the truth-telling equilibrium becomes dominant (i.e. achieves a higher expected payoff).


TLDR: What is the state of the art for one-shot or few-shot longitudinal (time-series) machine vision tracking of object boundaries with ~ pixel (~ 10 μm) precision?

Specifics: I'm tracking the edges of the knee meniscus from time lapse video (~ 1000 frames) to measure its deformation under load. This is in the context of research to prevent osteoarthritis. Due to material rotation and irregular geometry, background edges that started off occluded come into and out of view over time. This tends to confuse both machine vision algorithms and human labelers. Because the tracking is for strain measurements, the tracked edge must be the same in all frames; therefore, the tracked edge has to be a foreground vs. slightly-less-foreground edge of the same material (low contrast), not foreground vs. background (much easier). Only few-shot approaches are likely to save time, because only ~ 20 specimens are needed to accomplish the immediate objective, and follow-up experiments will probably differ enough to require re-training.

The current plan is to Google "few-shot image segmentation" and try things until something works or the manual labeling effort finishes first, but maybe one of you knows a shortcut. Work is also ongoing to bypass the problem by enhancing edge contrast or using 3D imaging, but machine vision would be the most cost-effective solution.


This is more of a tool suggestion request, because I haven't been able to find a solution via Google.

I'm looking for a more advanced duplicate file finder for Linux, especially one that can handle folders.

Most tools just return a list of duplicate files, but what I need is to also know whether whole folders are duplicates, or subsets of others, and have it presented in an elegant way to resolve conflicts. Of course, everything is in one big folder and everything is a mess; a bit of a hoarding collection of files that accumulated over the years.

I could probably code some stupid script myself, if I ever got the time (spoiler: I probably won't), but I have no idea how to present the result elegantly. So it would be nice if such a tool already existed.


https://meldmerge.org/ might be what you are looking for. But if there are tools which can find duplicated files and directories which are named differently, I'd be interested in learning more about these as well.


meld can compare two directories if they have the same structure and names for files, but it can't really tell that, inside my root folder, I have a folder root/X which is also present as root/category/b/X (except all the files in it have been prepended with "foo" somehow).

For tools which can find differently named files, you have for example: fslint (GUI) or fdupes (CLI).


That's true, meld is designed for comparing directories which already have similar structures.

It seems like what we're looking for is content-addressable storage [1]. The theory behind it appears to be based on Merkle trees and cryptographic signatures [2].

IPFS already has an implementation [3] of this, and there are other implementations (borgbackup, restic, zpaq) listed in that link and in the content-addressable storage article. Disclaimer: I haven't used any of these yet, just found them a few minutes ago.

[1] https://en.wikipedia.org/wiki/Content-addressable_storage

[2] https://gist.github.com/mafintosh/464bb8f1451f22c9e5c5

[3] https://discuss.ipfs.io/t/ipfs-and-file-deduplication/4674
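
As a quick sketch of the Merkle-tree idea applied directly to this problem (untested; file names are ignored so renamed copies still match, and all empty directories will collide):

    import hashlib
    import os

    def file_hash(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def dir_hash(path, index):
        # A directory's hash is the hash of its sorted child hashes, so
        # identical subtrees get identical digests wherever they live.
        child_hashes = []
        for name in sorted(os.listdir(path)):
            p = os.path.join(path, name)
            child_hashes.append(dir_hash(p, index) if os.path.isdir(p) else file_hash(p))
        digest = hashlib.sha256("\n".join(child_hashes).encode()).hexdigest()
        index.setdefault(digest, []).append(path)
        return digest

    index = {}
    dir_hash("/path/to/mess", index)  # hypothetical root
    dupes = {h: paths for h, paths in index.items() if len(paths) > 1}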


Problem: Create image with text. Solution: https://img.bruzu.com?a.t=Text

You can help by trying the API and giving honest feedback.

https://bruzu.com


The API docs' example URLs can't be copied and pasted, as "%20" gets inserted. I'm probably not the target audience for the API, but it's a neat idea. I think people will end up with images described with tons of magic numbers that relate to each other in invisible ways and become unmodifiable and unmaintainable. Variables might help, as might relative positioning and sizing of elements.

Some feedback on the Designer:

The size setting dropdown is quite strange. Choices of unfamiliar destinations don't seem to make sense, and some of the things that do seem familiar come out an unexpected size and shape (e.g. "infographic"). The pixel sizes are clearer, but better would be handles on the canvas that can be dragged. There's also a typo: "Choose form a list of sizes".

Circles don't get resized? I can drag handles to make the apparent bounding box bigger, but the circle doesn't change: https://imgur.com/a/KhYluKj. Other shapes seem okay. Chrome 89 on Linux.

The tutorial walkthrough pops up every time I go to the designer, even though I've been right through it.


Thanks a lot for taking the time to try it.

Fixed most of it.


Seems like it would be a lot easier and a lot more powerful to use SVG instead of a giant string of URL parameters.

Your service could provide pre-made templates and an editor, and expose textfields, images, fonts, etc options via URL parameters. Then your service just has to render the SVG and return it as an image with the requested dimensions/format.

Example:

`https://img.bruzu.com?s=<TEMPLATE ID>&title=Hello&font=arial&width=800&height=480&fmt=png`

You could also pass the raw SVG source as a parameter as well, maybe with base 64 encoding or something like that.


The problem with SVGs is that the text doesn't auto-scale.


Thanks, yes, templates are in the future plans; will think about SVGs.



It seems kind of interesting, what are some use cases you've seen for it?


Use cases:

1. Image generation automation: e.g. posting tweets as images to Instagram.

2. Image generation at scale: create multiple images with just variable text, like greeting messages, product images, or Open Graph images.


I am a bit ashamed to ask about such a trivial topic on HN, but I am not really sure how AJAX in Wordpress plugins works.

I have a plugin that exports some WooCommerce orders into XLS. I would like to add a progress bar via AJAX, because the export may take very long for thousands of orders. But I am not really sure how to use AJAX in context of Wordpress specifically.

I would love to see a minimal functional example, a simple plugin that does something similar. So far, all the plugins I saw were pretty convoluted and I lost my track around the code.

(On a related note: a library of elementary examples for Wordpress plugin development would be nice. Like "This is how you create a menu entry.")


First of all, you'll need to make PHP display "progress"; probably you'll need to override ob_start() or something like that, and find a format that lets you append new progress to the response on the fly.

I guess you already have an URL on your Wordpress setup that triggers this export. Let's call it {url}/export.

Wordpress already has jQuery included by default. So you'll need to call that URL using jQuery's $.post and then, according to the response, update your progress bar.

There is nothing specific to Wordpress in this, besides the fact that you need to set up your own URL on Wordpress to do this, and then include your own JS after jQuery. That's all.

If you find this too complicated, a quick hack is to create a page in WP Admin called Export Tool, and then in your theme create page-export-tool.php. That .php will be called when the Export Tool page is visited.


In my experience, accurately reflecting the progress of an AJAX request is difficult, so I've seen a lot of people (myself included) take the lazy way out and just show an indeterminate spinner or bar, just to show that stuff is happening.


It takes a bit of code but if you know the length of the response you're expecting then you can use XHR's "progress" event. Just be aware that the event will happen frequently and contain all the data so far, to avoid inefficient parsing and substring related memory leaks. I think his problem might be more about using JS in WordPress though.


The idea entered my mind, but I am not happy with such a cop-out, especially on my own site... I would at the very least like to see iteration progress, which, while not 1:1 with time, is at least informative.


https://free-visit.net : Like Matterport but with an FPS game engine.

I am looking for my first client: Ideally someone in charge of a Museum/gallery or other grandiose indoor space.


Looks pretty slick. I am not a gamer, and the controls feel very backwards to me. You need to hold the mouse button to look in different directions, which makes it feel like you're dragging, but the direction you move it isn't the direction you're dragging the view. I don't think mouse capture is a good idea in the browser, and if you're aiming at museums and galleries, maybe reversing the mouse direction to make it like familiar dragging would be better.

Edit: There's a bug where if you start dragging with the mouse and let go the mouse button outside the 3D view, it acts like the button is still held down (a bit like mouse capture) which was easier but quite confusing.

It seems unusable with a touch screen on desktop.

The fact that you can fly is useful but non-obvious. I ended up down at floor level and wondering how to see the pictures on the walls.

The demo is in a /fr/ path, but is in English (Chrome offers to translate it to English, because it somehow thinks the English words are French), but then some parts of the interface like "Share your place" are in French.


Thanks for the useful and very professional feedback.


I reiterate the sibling comment about mouse control, either use the actual mouse locking APIs of browsers, or make drag feel like drag.

Additionally, I think it would be best if by default you were stuck to standing head height, then you can either provide buttons to actually move up or down, or lean more into the game aspect and allow the user to jump. Right now it feels like you are floating around with a little drone or something.

In the vein of controls, please please please support WASD too. I understand if you instruct with arrow keys, since for non-gamer users it might be more obvious, but support WASD (or equivalent of what WASD is in QWERTY keyboards) anyways, for 2 reasons: it is much more ergonomic for people who use a mouse on their right hand (I myself am left-handed but use the right hand for mouse anyways), and is more ergonomic for some laptop users, since many laptops have half-size vertical arrow keys which are uncomfortable to press all but momentarily.


Yes, thanks for the long feedback. The control keys are not perfect, yes. WASD in addition to arrows? Yes, if not too hard to dev.

As for the head, yes, you are right: I should add 'something' that tilts the head up/down a bit when needed.


Agree with another commenter about camera height. Couple more things.

1. Your demo level suffers from Z-fighting in a few places on the floors and walls. https://en.wikipedia.org/wiki/Z-fighting

2. When viewed full-screen on a 4k monitor, textures are too low resolution. A handwritten note on the wall is unreadable.

3. Lighting is too simple. Because it's not an FPS shooter you probably don't need dynamic lighting or a day/night cycle, but it's still hard. Ideally you'd need multiple PBR textures everywhere, and correspondingly complicated pixel shaders.


1: yes, improving it.

2: yes, but it is a tradeoff between texture quality and minimizing loading time.

3: Yes, ideally. But KISS is my priority: we have an editor that aims to be simple enough for all, thus no PBR & no shaders.


2 – can you possibly replace them with higher-resolution ones after the scene is already running? Ideally gradually, with a blend over ~1 second.

3 — I see. Still, you could pre-compute local illumination automatically in the editor, and bake it somewhere. Maybe into vertex attributes, maybe into another lower-resolution R8_UNORM set of textures.


2- --> The person who builds the space in 3D with the free-visit 'builder' decides how much to squeeze down the texture quality. The choice depends on the target: smartphone (low-quality), computer with a big screen (high-quality textures).

3- --> I will see with client feedback. I do not want to over-engineer free-visit at this point. First I must find my market.


Hangs when I click "Try the example".


Just refresh the page: it will successfully play (known bug).


Now it works.

I wish AirBnBs and hotel rooms would offer this type of preview of their premises.


I've been working on a distributed Layer 2/4 load balancer (like Katran, but no C++ involved) that's mostly complete but now needs some testing with large workloads (1m+ concurrents, >50Gbps). Guinea pigs sought.


I’m working on OCR to recognize the scores on pinball machines. I have about a quarter million photos encompassing every model of pinball machine in the world but I just don’t have the know-how to accommodate all the font styles.


Have you tried an off-the-shelf solution like Tesseract? It works quite well if you do the recommended preprocessing.


The preprocessing suggestions I see are to crop out everything except the numbers, and I don't know how to do that programmatically. There are many kinds of displays: rollers, 7-segment, dot matrix, and LCD.

The preprocessing to increase DPI to 300 did not help when I tried Tesseract, unfortunately. It's hard to achieve good contrast between the numbers and the backdrop.


There are a lot of other options and preprocessing methods you can use to get better results. It's hard to tell without seeing the picture but thresholding/binarization might help with the contrast. In order to isolate the text, the mode option also makes a lot of difference: https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html#...

If that doesn't work you'll have to add a text localization model to your pipeline.
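
A minimal sketch of that kind of preprocessing (untested; assumes OpenCV and pytesseract, and that you've already cropped to roughly the score area):

    import cv2
    import pytesseract

    img = cv2.imread("score.jpg")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Otsu picks the binarization threshold automatically; invert first if
    # the display is light-on-dark so Tesseract sees dark text on white
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # psm 7 = treat the image as a single text line; whitelist digits
    score = pytesseract.image_to_string(
        binary, config="--psm 7 -c tessedit_char_whitelist=0123456789,"
    )
    print(score.strip())

For dot-matrix displays you may also need a slight blur (e.g. cv2.GaussianBlur) before thresholding so the individual dots merge into strokes.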


Thank you for your guidance. I will investigate further


I’m planning to implement a transactional (ACID) key-value store on modern hardware, i.e. SSD and large RAM. The technical choice in question is the type of index storage to use: b+tree, extendible hashing, or linear hashing.

There will be an append-only WAL for batching the writes and for transaction support. A single checkpoint worker will apply the updates from the WAL to the index storage periodically. The updates will be batched, to minimize random seeks.

Please help me evaluate the three indexing approaches, on the criteria of fast read, fast write, cache friendliness and ease of supporting atomic update during checkpoint.
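
For concreteness, the write path I have in mind looks roughly like this (a minimal single-threaded sketch; the plain dict stands in for whichever of the three index structures wins, and there's no group commit, recovery, or concurrency):

    import json
    import os

    class KVStore:
        def __init__(self, wal_path):
            self.wal = open(wal_path, "a+")
            self.pending = {}  # updates logged but not yet checkpointed
            self.index = {}    # stand-in for the b+tree / hash index

        def put(self, key, value):
            # append to the WAL and make it durable before acknowledging
            self.wal.write(json.dumps([key, value]) + "\n")
            self.wal.flush()
            os.fsync(self.wal.fileno())
            self.pending[key] = value

        def get(self, key):
            # pending updates shadow the checkpointed index
            return self.pending.get(key, self.index.get(key))

        def checkpoint(self):
            # batch-apply buffered updates, then truncate the log
            self.index.update(self.pending)
            self.pending.clear()
            self.wal.truncate(0)

The interesting question is what replaces that second dict, and how checkpoint() can swap updates in atomically without blocking reads.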


Ok, I'm double dipping. Another problem I'm trying to solve is: I've got a database of several hundred interesting conversation questions that I've collected over the years. Essentially just strings, though I've attempted to categorize them, rank them, and add other metadata. I'd like to figure out a way to sort them or dedupe them based on semantic similarity, but I'm not sure how to determine semantic similarity without painstakingly going through and manually looking for similar questions. Any suggestions on how to solve this would be welcome.


Put the questions in some semantic embedding space. Now you’ll have a vector representing each question. Then for each question, you can sort all the other questions by the Euclidean distance between their vectors. Or use some clustering algorithm like k-means to find clusters.

By web search I found this tutorial for putting sentences in an embedding space: https://github.com/BramVanroy/bert-for-inference/blob/master...

I did not read this and am not endorsing it, but it looks like it’s doing roughly what I’m suggesting.


Yep, exactly this. Check out sentence-transformers https://pypi.org/project/sentence-transformers/0.3.0/, they have some great pre-trained models. Once you have the embeddings you can just compute the cosine similarity.
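
A minimal sketch, assuming the commonly used all-MiniLM-L6-v2 pre-trained model and a hand-tuned similarity cutoff (util.cos_sim is in recent versions of the library):

    from sentence_transformers import SentenceTransformer, util

    questions = [
        "What would you do with a year off?",
        "How would you spend a free year?",
        "What's your favorite book?",
    ]

    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode(questions, convert_to_tensor=True)

    # pairwise cosine similarity; near-duplicates score close to 1.0
    scores = util.cos_sim(embeddings, embeddings)
    for i in range(len(questions)):
        for j in range(i + 1, len(questions)):
            if scores[i][j] > 0.85:
                print(f"likely duplicates: {questions[i]!r} / {questions[j]!r}")

For a few hundred questions the brute-force pairwise comparison is instant; no need for approximate nearest-neighbor indexes.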


You could use BERT or Word2Vec or GloVe. They are very simple to use with HuggingFace's library.


I've been thinking about two interesting problems.

First, differentiating code to build a client side predictor with privacy as a consideration. I have code that describes how to translate domain messages into state changes, and I'm trying to figure out how to predict the effects of sending a message on a client even though the client has imperfect knowledge.

Second, AI for games with imperfect information. Specifically, how to build an AI for Battlestar Galactica.

These are in the context of http://www.adama-lang.org/


https://share.securrr.app

A secure client side encrypted document (Passport / Id) sharing service

Needs a DSGVO (GDPR) lawyer as part of the team.


I like this. I don't know if it is your goal, but I'd like to see you (or an alternative) succeed and be used by big companies, a la Stripe. Need to share a document? No need to re-upload; it's already part of Securr. Share with the company for a limited period of time, and Securr makes it hard (and illegal) to copy and to store, etc.


I love that idea, you should submit that on frontpage.


Can't. As the legal framework does not exist yet. That's why I need a lawyer in the team.


You could potentially submit it as a Show HN and make it clear you very much need a lawyer to take the next step.*

Your call, of course. I'm just a random internet stranger spitballing ideas here for how to take that next step and I know next to nothing about this problem space.

Best of luck, whatever you choose to do.

* Make sure you read the rules for Show HN and use your own judgement there as to whether this qualifies.

https://news.ycombinator.com/showhn.html


Frontpage?


I meant submit it on HN as its own post.


I am trying to design a solution to something I encountered a lot during my hardware development days: keeping track of inventory of small and large parts alike. Basically a shelving system with an app: a user searches for a part and the shelf illuminates the correct bin.

My problem is, would this be a viable product for companies? It was always a hassle when I worked in labs but I am not sure if it is a big enough problem to design an entire IoT product around.


Sheesh, there are so many things I'm miles from solving that this was a tough one.

It took a while but finally came up with something where it was actually closer than I thought.

Thanks for the inspiration.

Solved it.


I’m working on a MEMS problem. I need to be able to 3D print micron (or even sub-micron) scale features for a very high aspect ratio MEMS device which uses electromagnets to actuate. The only problem is that the leading candidate technology, EFAB (electrochemical fabrication), only works with conductive materials.

Does anyone know of a technology which provides similar abilities to EFAB but can also print features with non-conductive materials?


Would something like Nanoscribe [1] work for you? The resulting structures are non-conductive. As part of our research we use it mostly for 3d-printing nanoscale optical elements, but we have also successfully used it for some mechanical support structures.

[1] https://www.nanoscribe.com


Not sure it's going to help, but some DLP 3D printers are very accurate, and most resins are non-conductive. I know researchers who were using one for microfluidics applications. Not sure they can print at sub-micron precision, but 10μm is very realistic.


Could you use a conductive material, then deposit an insulating coating on top, or oxidize the material on top, making it an insulator?

Aluminum can be anodized, then sealed. As can a number of other metals.


I have an interesting challenge:

I want to add IoT tools in several apartments and allow my guests to control them through an app. Currently I already have an app, but no IoT integrations (basically just reservation).

How to do it safely? Also, which automations would be nice to have but not so obvious?

Thinking of things like light control, but also checking the apartment's energy consumption in order to detect appliances that need maintenance before they break.


Website-building complexity. For now I've started with a simple static site generator, https://mkws.sh/. Right now I'm not using any package manager, no config files, only one language for templates (sh), and obviously HTML, CSS, Js. I eventually plan to develop a simple CMS based on the same ideas. Ideas and code are welcome!


Hey there, how can I download/install this? I've been meaning to but I couldn't find another way besides https://mkws.sh/mkws@4.0.11.tgz which currently throws a 404 not found error. Any ideas?


> (sh), and obviously HTML, CSS, Js

Cut out the sh dependency and just use the "obvious" tech; make the site capable of generating itself, without reliance on any other tools.


What do you suggest for templating?


ES6 backtick strings.


I believe you misunderstand; I'd rather distance myself from the Node.js ecosystem and use standard UNIX tools for developing websites. I believe sh is great as a templating language.


I didn't say anything about NodeJS. (In fact, that would be adding a different dependency, not reducing the count by one.)


So how would you interpret the ES6 backtick strings?


The "obvious" way. (The same runtime that you're planning to use for the JS you mentioned in your original comment. <https://news.ycombinator.com/item?id=28349504>)


And do the generation on the client side?


If by "client side" you mean in the web browser of you, the author of a new piece of content, then yes—a site that is "capable of generating itself". (If you mean templates that are evaluated in the browser of every site visitor every time they refresh the page, then no.)


Ah, yeah, finally got your "capable of generating itself" idea. Pretty cool, interesting to experiment with. I guess it would be something like a site that also downloads itself.


But then you would have a Js dependency.


Have a look at stuff like Netlify CMS, or any of the existing website builders like Wix, Weebly, Squarespace, etc.


Any headless CMS would work well with my `mkws`; my idea is to build a smaller, simpler WordPress that also comes with a tiny webserver, zero config, no database, content stored as plain text files, just download and run. https://getkirby.com/ is closer in concept, but mine would be without the PHP dependency.


I have two app/website ideas. One is a way to track your food and alert you when it can spoil (and suggest recipes with what you have in your house). The other is a learning-helper website/app that will give you paths to learn things and register, track, and help you through the process. The first one is at the planning stage; the second one already has some pages written in Python/Django.


I have a graph with weighted edges. I want to remove edges to make the graph colorable with N colors (e.g. N=40) such that the total weight of removed edges is minimized. If I'm able to solve this problem, that will complete a project I've been working on for years now to make a working keyboard for a person I know that has cerebral palsy.


Here is an algorithm that could work and run reasonably quickly, although it might not be optimal:

1. Find all vertices with degree N > 40 (eg: find all points in the graph with more than 40 outgoing edges).

2. For each pair (a, b) of these degree N > 40 vertices, find the set of common points (c) connected to both a and b by edges, eg: there exists 2 edges (a - c) and (b - c). In essence, you're forming V shapes (a - c - b) where the two tips (a, b) of the V have at least N > 40 outgoing edges.

3. Identify the pairs of vertices (a, b) that are connected by 40 or more Vs (eg: there are at least 40 pairs of (a - c) and (b - c) edges).

4. Remove the (a - c) or (b - c) edge with the lowest weight, since the goal is to minimize the total weight removed. If (a - c) and (b - c) have equal weight, remove the edge which is connected to whichever of a or b has more outgoing edges.

5. Repeat steps 3 and 4 until each (a, b) pair has at most 39 Vs connecting them together.

At this point, I think you can color the graph with N=40 colors (one color for a and b, then different colors for each in-between point of the 39 Vs between them).

There might be a way to improve the criteria for which edge to remove in step 4 (maybe using the backtracking approach mentioned by other commenters), but this should be a decent starting point.


Similar to what another commenter said about time, have you tried a backtracking approach: (1) color the whole graph (let's say you end up with 42 colors), (2) start with the highest-colored nodes (e.g. color 42, which exceeds your threshold of 40), and (3) greedily remove the lowest-weight edges (or maybe, edges to other high-colored nodes) until either everything is colored with 40 colors or you reach an invalid state (i.e. a node no longer in the graph) -- with the latter you can add an edge back and try again by removing the next-lowest edge.
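
A rough sketch of that loop using networkx's greedy_color as the coloring step (no optimality guarantee, and recoloring from scratch after every removal will be slow on a dense 10k-node graph, so you'd likely want to batch removals or recolor locally):

    import networkx as nx

    def prune_to_n_colors(G, n_colors=40):
        """Greedily remove cheap edges until a greedy coloring fits."""
        G = G.copy()
        removed = 0
        while True:
            coloring = nx.greedy_color(G, strategy="largest_first")
            over = [v for v, c in coloring.items() if c >= n_colors]
            if not over:
                return G, coloring, removed
            # drop the lowest-weight edge touching an over-budget vertex
            worst = min(
                (e for v in over for e in G.edges(v)),
                key=lambda e: G.edges[e].get("weight", 1),
            )
            removed += G.edges[worst].get("weight", 1)
            G.remove_edge(*worst)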


That's what I'm doing now, but the results aren't great. If there's a way to estimate a lower bound on the number of edges to remove, I can figure out if the results aren't great because of the approximation, or because of the nature of the graph...


Graph coloring can mean vertex coloring, edge coloring, or total coloring. These are 3 different problems.

Regardless of the answer, I think this is going to be way too slow for your use case, especially if you want the global optimum. Wikipedia says the current state of the art is randomized algorithms, and I don't think those algorithms look for global optima.

I don’t recommend solving that, I think you’ll waste your time. How’s that related to the keyboard anyway?


Will the graph ever change? Is this a one-time computation?


One time.


Also... is it possible for you to add nodes? I.e., split an N>40 node into two nodes.


No, though nodes can be deleted -- at a cost equal to the total cost of all the edges that contain it.


Could you put the graph up in a gist/pastebin?


What are the time constraints?


Can take as long as needed as it only needs to be colored one time. About 10,000 nodes, reasonably dense (about half the nodes will have 1,000+ edges).


What did you try to color the graph? CSP? Classic backtracking? How many colors must it be colored with?


I have a custom TLV encoding standard. It has a 1- or 2-byte header depending on type, with the required info in the header. I have written an encoder and decoder for it, but now I want to do it via ASN.1. However, I am not able to understand how to define the header bytes and what rule each bit in the header denotes. Do I need to use ECN?


I have to wake up and pee 2-3 times a night. I put a pillow under the mattress on the leg side to raise my legs. It is working so far: every 4-5 hours instead of every 3 hours. I am also trying Kegel exercises. I can't stop drinking water before bed because I have acid reflux too and I can't sleep with a dry throat :)


I have a pile of mp3s and want to splice them together with a single ffmpeg operation. Essentially injecting multiple small audio files into a large one using time codes. I know there's got to be a way to do it, but I have yet to find a way to do it in a single operation instead of multiple passes.


Some god-awful combination in a complex filter, using atrim to pull out the pieces, adelay to set their positions in the output, and amix to put everything back together, could probably do it. What that command actually looks like is definitely open, but that's probably the only way that would work: this is really two separate operations (chopping, then merging), so there isn't going to be a single premade flag for it.
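
For a single insert at, say, the 10-second mark, an untested sketch of that shape, though using the concat filter for the final join instead of adelay/amix (which sidesteps overlap and volume-normalization bookkeeping; asplit is needed because one input stream can't feed two filter chains):

    ffmpeg -i main.mp3 -i insert.mp3 -filter_complex \
      "[0:a]asplit[a][b];
       [a]atrim=end=10,asetpts=PTS-STARTPTS[pre];
       [b]atrim=start=10,asetpts=PTS-STARTPTS[post];
       [pre][1:a][post]concat=n=3:v=0:a=1[out]" \
      -map "[out]" spliced.mp3

Each extra insert adds one more atrim pair and one more stream into concat. The output is re-encoded once at the end, but nothing round-trips through WAV.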

On another note if the goal is just to avoid files/writing to disk then a bunch of ffmpeg splices to named pipes as inputs to another ffmpeg command to merge them could do the same without the command soup.


I'm very fine with command soup; the plan was to generate it with a script and run it directly. The input files are in a lossy format so I'd rather avoid having intermediate WAVs or double-encoding or something like that.


ffmpeg -f concat -i mylist.txt -c copy output.mp3

https://superuser.com/questions/587511/concatenate-multiple-...


It's not quite so simple; I need to insert smaller audio files into a larger file at specific time codes.


Splice your parent file at the timestamps, interleave the segments, then concat.


I'm trying to avoid reencoding multiple times, and saving WAV to disk puts serious pressure on storage constraints


You could use FLAC as an intermediate lossless codec which should reduce storage costs by about 1/2 to 2/3 compared to WAV.


Are they sequential, or are you splicing at arbitrary times?


They might be spliced in at arbitrary times, and multiple audio files may be spliced in back-to-back at an arbitrary time. I'm not going to be upset if there's a millisecond of gap between them from splicing at approximate time codes to make them sequential.


Context: not from an investment banking or trading background. I have been an investor/trader with modest gains.

How does one solve for risk in the markets? As in, mathematically. How does one do short-term predictions of prices, with a day or two as the prediction range, with a success probability > 0.5?


I’m looking for methods (NLP or otherwise) that map/generate a short description (a few words or less) with a given format (e.g. a 3-word tag with each word from a list of choices) from a paragraph or a sentence. Results would be stored and clustered based on their distance.


I am not there yet, but I am trying to make education loan-free and based on equity. The details of the project are written here: https://loan-free-ed.neocities.org


> when that student starts generating income then a small percentage from that income is auto-deducted and distributed to everyone who was involved in that student's education.

Your idea, if implemented well, may end up being a net positive for society, but I can't help imagining a future where every child, from the moment they are born, has a biometric ID connecting them to a consortium of companies which provide their education, health care, housing, energy, internet connectivity, transport, media access, and so on.

It would be like living in a company town, being paid in company scrip, except you wouldn't notice the restrictions (as long as you kept earning). If you ever increased your income, your consortium might let you choose whether you want to upgrade your housing or your health care plan, but if you lost your job, they'd force you to take one of their choosing and downgrade your plans if it had a lower salary.

In this dystopia, all consumables from food to toilet paper would presumably be sold by Amazon, and other items like furniture and electronics would be provided as a service so that you rent them from your consortium. The only question is why people wouldn't try to undo this system through the political process, but then we might ask that about the current system.


> Your idea, if implemented well, may end up being a net positive for society

Thank you!

> but I can't help imagining a future where every child, from the moment they are born, has a biometric ID connecting them to a consortium of companies which provide their education, health care, housing, energy, internet connectivity, transport, media access, and so on.

My idea will not have that side-effect because not only is the project non-profit and open source, but it is also decentralised. And if we keep thinking of a dystopian future then we won't be able to do anything positive unless we become some sort of social revolutionaries. I don't have those skills. But I can think of small ideas to make a positive impact that benefits everyone. The idea I listed in my original post above is very simple: it helps teachers, lecturers, or anyone who contributes via education to a person's financially decent life get appropriately paid for their efforts, and everyone (i.e. businesses) who benefits from an educated person should contribute towards that.

It is a simple idea but notoriously difficult to deploy because there is a possibility of this getting caught up in a political slugfest.


I want to leverage data provided by the healthcare system and create a machine learning algorithm that monitors the patient at all times by leveraging IoT data, vitals, location, and all possible identifiable data, without breaching HIPAA etc...


Some ZK way to prove that a piece of data was derived from some source, for example, proving a human fingerprint is unique identifying biometric data without showing the data itself or the person it is from and with no trusted authority.


Just thinking out loud here: I think it would require trusted sources (not necessarily a centralized authority) that have validated it and that you trust.

You can prove two balls are different colors to a colorblind person by having them show you two balls of X color and proving to them that you can differentiate them (a watered-down example), but it requires validated externalities (e.g., you can see colors they can't).

Defining the external validators is the hard part.


If it did require a trusted source, do you think there would be a way for said source to 1) involve no human administration, 2) behave deterministically and 3) be resistant to attacks that could break either of the above 2? That would also solve the problem.


I'm searching for a (reliable) open-source OCR tool for Arabic text.

The best option I tried is Google Cloud Vision, but it's still not accurate enough, and it could get quite expensive for large tasks.

Does anybody know of good software for that?


Trying to find line-item-based medical bills and the laws that govern what information patients can obtain about a medical procedure, pre- and post-procedure.

Goal is to bring transparency to medical bills and remove the unknown.


I've been dealing with needing to integrate an arbitrary area of a brane representing a density map.

It's been slow going getting all the maths done.


How can we make CTOs better? How does a CTO learn? How can a community of CTOs help create value for one another?


Why should CTOs only learn from CTOs?

The average CTO is not as knowledgeable as you might think.

For example, there are CTOs today building applications using no-code platforms, without any academic or technical background. Those people would have much to learn from a software engineering intern at any company.

CTO is a job title, and each company can grant that title at their discretion. Or Technoking, or any other title you might think of.


That's exactly the point, and mostly CTOs don't have many people internally to ask for help.

I am in a community with some CTOs in Latin America and I see this struggle happen every time.


Latin America is a mess in many ways.

VCs there suck, and do not understand that VCs are about inherently risky investments. They want the guarantees of a low risk business with the profitability of a high risk business.

Leadership in Latin American companies also sucks. As soon as the company has any revenue, the leadership will go all out and spend it all on themselves on extravagant lifestyle upgrades rather than reinvesting it in the company. And because of this, many companies stay small and mediocre without fulfilling their true potential. They waste their money on MBAs, learning things they are unwilling to apply.

Then there is nepotism. Hiring from your family means creating conflicts of interest, creating situations where relatives of key people do not need to comply with HR, do not need to be competent, cannot be fired, and create an unprofessional atmosphere.

Compensation in most Latin American companies sucks. They aspire to be like Silicon Valley startups and hang framed posters of Steve Jobs on their walls, but when it's time to create compensation packages they grant zero stock and award zero bonuses, all while having the same work-life balance as a startup. They do not understand the key role of employee stock in company growth.

Why the fuck would you work for a wannabe startup that doesn't give you any stock or bonuses? Or work for some entitled aggrandized clown that doesn't understand the simplest technical concepts? And the answer is: because business owners in Latin America have never had to care much about their workforce. Because of exploitation and their informal caste system, having a happy workforce has never been a requirement.

That's why Latin America is undergoing a massive brain drain that will only get worse in the years to come.


This applies 1:1 to Italy, I wonder if there's a shared cultural/religious/economic reason. Doesn't seem to apply, for example, to Portugal.


I don't think your take is wrong, I live here and agree with most stuff.

Yeah, Latin America sucks, I already know that.

Now back to my problem...


How memes could be another Bitcoin.


I am trying to load in raster layers from file in OpenLayers, but have failed so far.


This isn't a technical question, but it's a problem I'm going to have to solve soon!

I just accepted an EM role at a FAANG. This is a career "boomerang" for me - I was an engineer in the past, but then moved into technical support management. I'm coming back to engineering, but this is the first time I will have done it at a large company. All of my engineering experience has been at small scrappy startups where we just did everything, did it fast, and prioritized by whatever was most on fire. I don't think I've ever actually done a proper "sprint". While I have written a lot of code, the workstyle on my new team will be almost entirely foreign to me.

Who's got pro tips for leading an engineering team in a large organization? What makes a high-powered team? What are the easy mistakes that will drive us into the ground?


I've been an EM at a startup and an F500. Don't be scared to have fewer meetings than your peers. I basically only do one 1-hour sprint meeting each sprint, to review the last sprint and plan the next one. It may help to have standups daily at first, but you can often make them less frequent over time.

Also, always avoid meetings to ideate. These are some of the most common and they are a huge waste of time compared to listing out ideas in a doc and having people review that asynchronously. And yet these meetings have a tendency to get called all the time. For example "there was a fire, let's get all the leads/EMs/directors together for 1 hour to figure out how to avoid this next time".

Your two most important duties are 1) making sure your developers are given space to implement what's important and 2) building relationships throughout the company to better anticipate future needs, which helps you do #1.

Happy to chat anytime as well, email in profile.


Take a look at the book "The Manager's Path" by Camille Fournier. It has surprisingly concrete advice for people in your exact position. One of the few "business books" I actually recommend to people.


(Caveat: not an EM myself)

This guy made the switch from dev to EM and has written articles on running effective teams (e.g. one on how to help devs act as project leads), and even has resources like the docs he sends to tech leads when they start a new project, with lists of responsibilities etc: https://blog.pragmaticengineer.com/things-ive-learned-transi... (just linking to the most directly applicable article, but definitely browse around)


I'm looking for the best way to implement multitenancy and sub-multitenancy impersonation with JWT tokens and IdentityServer4 (.NET).

I'm curious how other people solved it (cookies, subdomain, ...) and whether you used a JWT token for it.


Looking for something like authzed.com?


I've got the database and query part covered.

But I haven't decided yet on the actual flow: where I'd identify the current tenant or impersonate them, plus the influence of impersonation on that flow.


Community owned and operated FOSS alternative to MetaMask.

Deciding which features to ship and what to skip is a bear — it's a big product design space.


Please make this a monthly thread!


Totally! Maybe if no one does we should do it every 1st of the month.


Adapting the Sunday variant of the Boyer-Moore algorithm to search backwards from point.
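
My current thinking is that the mirror image of the forward version should work: inspect the character just before the window, and key the shift table off each character's leftmost position in the pattern. A rough Python sketch of my own derivation (untested against a reference implementation):

    def rsearch(text, pattern, point=None):
        """Rightmost occurrence of pattern ending at or before point, else -1."""
        n, m = len(text), len(pattern)
        if point is None:
            point = n
        # leftmost index of each pattern char, +1 = how far left to jump;
        # iterating high-to-low lets lower indices overwrite higher ones
        shift = {}
        for i in range(m - 1, -1, -1):
            shift[pattern[i]] = i + 1
        start = point - m  # window is text[start:start + m]
        while start >= 0:
            if text[start:start + m] == pattern:
                return start
            if start == 0:
                break
            # Sunday, mirrored: look at the char just before the window
            start -= shift.get(text[start - 1], m + 1)
        return -1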


Short version: I'm close to figuring out how to encourage more prototyping of software by making tests super easy to write during the prototyping process, and so de-risking rewrites. But one problem I've been stymied by is how to represent expectations of screens when they contain graphics.

Long version: My Mu project (https://github.com/akkartik/mu) is building a computing stack up from machine code. The goal is partly to figure out why people don't do real prototyping more often. We all know we should throw the first one away (https://wiki.c2.com/?PlanToThrowOneAway), but we rarely do so. The hypothesis is that we'd be better about throwing the first one away if rewriting was less risky. By the time the prototyping phase ends, a prototype often tacitly encodes lots of information that is risky to rewrite.

To falsify this hypothesis, I want to make it super easy to turn any manual run into a reproducible automated test. If all the tacit knowledge had tests (stuff you naturally did as you built features), rewriting would become a non-risky activity, even if it still requires some effort.

Turning manual tests into automated ones requires carefully tracking dependencies and outlawing side-effects. For example, in Mu, functions that modify the screen always take a screen object. That way I can start out with a manual test on the real screen, and easily swap in a fake screen to automate the test. Hence my problem:

How do you represent screens in a test?

Currently I represent screens as 2D arrays of characters. That is doable a lot of the time, but complicates many scenarios:

* Text mode character attributes. If I want to check the foreground or background color, I currently use primitives like `check-screen-in-bg`, which ignores spaces in a 2D array but checks that non-spaces match the given background attribute. In practice this frequently leads to tests that first check the character content on a screen and then make more passes to check colors and other attributes.

* Non-text. Checking pixels scales poorly, either line at a time or pixel at a time. A good test should seem self-evident based on the name, but drawing ASCII art where each character is a pixel results in really long lines or stanzas. So far I maintain separate buffers for text vs pixels, so that at least text continues to test easily.

* Proportional fonts. Treating the screen as a grid of characters works only when each character is the same width. If widths differ I end up having to go back to treating characters as collections of pixels. So Mu currently doesn't support arbitrary proportional fonts.

* Unicode. Mu currently uses a single font, GNU Unifont (http://unifoundry.com/unifont/index.html). Unifont is mostly fixed-width, but lots of graphemes (e.g. Chinese, Japanese, Korean, Indian) require double-width to render. That takes us back to the problems of proportional fonts. Currently I permit just variable width in multiples of some underlying grid resolution, but it feels hacky.

Can people think of solutions to any of these bullets in a text-based language? Or a more powerful non-text representation?


Your project is quite impressive. And welcome to the world of computer graphics!

I'd consider taking inspiration from the following sources:

1. GUI toolkits like Qt QML [1] or Android [2]. These typically build a hierarchical tree of different components (eg: start with a root window, which contains panes, which in turn contain text and buttons). Each component may contain different properties (eg: font, color), and properties may be inherited from the parent component.

Advantages:

+ preserves semantics of component properties and how they are linked to each other (eg: the caption is below the image)

Disadvantages:

- complexity: building a layout/constraint engine can be difficult, or alternatively you can use absolute positioning with relative offsets which can be tedious to use (in this case the layer-based approach below might make more sense).

[1] https://en.wikipedia.org/wiki/QML

[2] https://developer.android.com/guide/topics/ui/declaring-layo...

2. Graphical editor programs like Gimp or Photoshop, or Adobe Flash.

These build up a screen as a collection of vertically stacked layers or assets (eg: graphics, text, etc) with attached properties and optionally bounding boxes. Higher layers/assets occlude the content of the layers below them, so you would need to implement some kind of visibility logic.

Advantages:

+ simplicity

+ you can use identifiers for assets, and therefore don't need to perform pixel-by-pixel comparisons.

Disadvantages:

- may lose some information about how different components are related to each other

Also, rather than a raster pixel-based representation, it might make sense to use a vector representation internally [3]. The most popular vector representation is SVG. The full spec is very verbose, so you probably only want to implement a small subset of it. This would permit you to specify properties like line thickness, color, striped/dotted patterns. At render time, you could convert the (proportional) fonts to vectors as well for consistency, and then rasterize the entire scene when rendering to a display surface. But for testing, it would be better to use the scene graph / vector format which is easier for users to reason about.

[3] https://en.wikipedia.org/wiki/Vector_graphics

[4] http://blog.leahhanson.us/post/recursecenter2016/haiku_icons...
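
As a concrete (entirely hypothetical) sketch of the identifier-based comparison: the fake screen records a display list of drawing operations, and tests assert on that instead of on pixels. All the names here are made up:

    # fake screen records what was drawn, not the resulting pixels
    class FakeScreen:
        def __init__(self):
            self.ops = []

        def draw_text(self, x, y, s, fg, bg):
            self.ops.append(("text", x, y, s, fg, bg))

        def draw_rect(self, x, y, w, h, color):
            self.ops.append(("rect", x, y, w, h, color))

    def test_render_prompt():
        screen = FakeScreen()
        render_prompt(screen)  # hypothetical code under test
        assert ("text", 0, 0, "> ", "white", "black") in screen.ops

Pixel-exactness then only needs to be tested once, in the rasterizer that turns ops into pixels, rather than in every screen test.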

But perhaps this is over-complicating things.


Thank you for those suggestions! Do you know if any of the tools following either approach have automated tests? My immediate problem is how to manage the complexity of implementing a layout engine or editor. Somewhere I need something checking that a given asset identifier results in specific pixels, and I'd like the tests for _that_ to be nice to read. It's a bit of a chicken-and-egg problem...


Glad that you found this to be useful!

Android includes the Espresso UI testing framework [1]. Essentially, you can specify matchers that compare your expected values or predicates against an actual object identified by an R.id identifier. It's very powerful (since you can write your own custom matchers) but can be cumbersome to use [2].

[1] https://developer.android.com/training/testing/espresso/basi...

[2] Example Espresso Test: https://github.com/android/testing-samples/blob/main/ui/espr...

https://github.com/android/testing-samples

Alternatively, Squish [3] is a very polished and more elegant commercial testing tool that lets you record test-cases using a GUI tool and convert them into (ideally modularized) methods that verify object properties or compare (masked) screenshots of the GUI:

[3] https://www.froglogic.com/squish/features/

Demo video (starting at 14:24): https://youtu.be/ElH-3MVHPRw?t=864

They abstract away a lot of the functionality using the Gherkin [4] domain-specific language so that tests are easier to read at a high level (but you can still dig down into the underlying programmatic implementation).

[4] https://cucumber.io/docs/guides/overview/

This is probably too much complexity for your use-case, but may provide some ideas or inspiration for what is possible. Perhaps a simplified matcher-style system might be a good starting point though.


I'm working on generating code across the stack and languages from source of truth data models.

Low code for devs. https://github.com/hofstadter-io/hof

Trying to reduce redundant tasks and simplify changes with minimal effort.


Who are "we"?


I suppose it is just him and the driver.


Probably the HN participants ("Quickfire problems, quickfire suggestions")


I'm working on destroying proof-of-work blockchains. I have a plan for BTC, but I'm not sure how to approach ETH. Advice would be appreciated.


If you're interested in eliminating proof-of-work for ETH, you should really take a look at the proof-of-stake network in progress. Keywords: "Eth2", "proof of stake" and "the merge".

When proof-of-stake takes over, there won't be any miners. The block proposal process is done by stakers instead. Some of the incentive issues with miners still exist with stakers, but raw competitive power consumption isn't one of them.

It's true that proof-of-stake has been talked about for years, but it has picked up momentum since late last year, as the staking network was actually launched.

The proof-of-stake network has been staking real ETH since end of last year, but does not yet handle mainnet ETH contract transactions. It's called Eth2, but that's caused some confusion, because it's not really a second version to run alongside the first, it is the R&D branch into proof-of-stake and other technical improvements, with mainnet ETH expected to adopt it in due course.

So, the Eth1 components have been renamed "execution layer", Eth2 components renamed "consensus layer", and through a series of testnets and API developments which have been quite active this year, a big change called "The Merge" is being worked on by multiple funded groups (for client diversity) of core Eth developers at the moment.

The Eth2 staking network that already exists has demonstrated the viability, and the investment of real ETH in serious quantities has built up some cryptoeconomic stability prior to its deployment as the ETH consensus layer. The time lag is intentional - you don't want to suddenly switch all ETH over to a network with too few invested stakers.


The proposed PoS scheme for ETH will run afoul of regulators as soon as a big-enough crime is financed on ETH. I do not need to put effort into destroying those things; they are ticking time bombs. PoS separates participants into pigs and chickens, and the pigs could find themselves liable as money handlers.


> I do not need to put effort into destroying those things [proof-of-stake blockchains]

You originally said you want to destroy proof-of-work blockchains, and were looking for advice on how to do that with ETH. It's already being destroyed on ETH by proof-of-stake. You asked for advice, that's the advice. You don't need to do anything except wait.

Now you are saying you don't need to put effort into destroying proof-of-stake. Why is that relevant here? It suggests to me your goal is different from what you originally stated. Are you looking to see the destruction of more than just proof-of-work? The destruction of BTC and ETH, even if they switch away to another consensus mechanism?


Interesting, what approach are you taking with BTC and what's the reason that approach won't work for ETH?


I imagine overwhelming miners with legitimate but expensive work. The way to generate this work is relatively sensitive to low-level blockchain parameters. BTC happens to fit the approach, but ETH doesn't. This isn't surprising because the approach was originally invented for BTC alone, prior to the proliferation of many different blockchains.


for eth - just wait?


I'm trying to be the Amazon of real estate. If anyone is interested DM me


That sounds interesting. Sadly I couldn't find a way to message you on here; do you have another way to be contacted?


Please provide a way to contact you, ideally email!


LOL

Plenty of problems - none of them technical - all people problems!


I think these can be discussed as well. In the end, those are the ones we suffer from the most. (edit: typo)


Yep, lots of problems are people problems.


I am working on two problems:

1. I want to create a way to generate electrical power without pollution. Basically, a closed-cycle process that releases no pollutants or electronic waste.

2. I want to do everything I can to eliminate gender bias in the world.


Regarding 1, if you can accept some pollution at the beginning, hydroelectric can be a solution, albeit probably not at a global scale.

We have a small hydroelectric plant on a river near my house, and really it's no big deal; it fits very nicely into the surrounding environment and produces clean energy.

It's also educational: since the river is near the city, school classes can visit it and learn about it.


Regarding hydro, you can go micro to power a house or some small community. There are lots of books on micro-hydropower, but take a look at this fantastic post at ludens.cl:

http://ludens.cl/paradise/turbine/turbine.html


Rather than generating electricity directly, it might be more practical to reduce electricity consumption using other approaches:

Geothermal can be a solution for generating electricity directly, but if you'd like to minimize electronic waste perhaps it would be easier to use it to replace alternative energy sources for HVAC purposes.

Biofuels (eg: plant bamboo, grow it, then burn it) can also technically be closed cycle energy sources.

Solar water heaters can also reduce electrical or fossil-fuel-based energy consumed for generating hot water.


I love hydroelectric power and I want the circuits I design for solar to be capable of utilizing the raw power from a small turbine as well without any hardware modifications.


You sound like a wonderful person!

What progress have you made in your work on either front? What sort of work do you do to solve these problems?


I worked for years studying how to use microcontrollers, and after a lot of determination I now have a $750,000 grant to build solar systems that are fireproof so you can install them anywhere. It will be a few years of work to get something suitable made, but I have full confidence it can be done.

I am also spending a lot of time lately in the SF kink community to build a fundamental understanding of the biases people have experienced in life with respect to their gender identity, and am strongly considering HRT so I can live life on the other side and experience the prejudice first-hand.



