
How did you get access to HTGAA? I’ve really wanted to take this course but all of the videos and resources seem locked


Are you able to access last year's: https://howtogrowalmostanything.notion.site/

Otherwise, look out in Jan / Feb next year for applications to be a 'committed listener'.


Yes! This one works. Thank you so much!


Awesome! This is a huge opportunity to help a lot of people (clients, subcontractors, and builders). A lot of money and time is wasted by the current inefficiencies. We gave takeoff construction plan parsing a go in 2022-2023 but couldn't get the AI part to work well enough (and still haven't been able to, even with the latest ViT/CLIP models). There was a lot of interest though!

- You’re right, data is very hard to come by. I’m curious, how do you plan to get around this? Outsourcing human labeling? We found it to be a very difficult task.

- The subcontractors and local construction companies we talked to were overwhelmingly excited about the idea.

- There are people whose entire job is to get this done and done correctly. They sit on site holding the pdfs in their hands, manually counting and calculating. You can bet a lot of mistakes occur. They would absolutely love to have a digital assistant for this.

- Some of them (especially managers and owners) are quite technical and are using software such as Bluebeam and other CAD software to make these calculations. It's quite manual currently, but gives great insight into a better solution. This led us to having the user manually select the symbol they wanted counted, which ML struggled to get right (a rough sketch of that approach follows this list). Just getting the part counts (and highlighting them in the pdf) was a huge help!

- Impressive that you got square footage calculations correct! In our experience, there was way too much variation between architects (and multistep dimension labeling), which made it hard to get right (even for humans). How has your model generalized OOD (out of distribution) thus far?

- Are you planning to integrate voice? Many of the subcontractors we worked with are very low tech. They usually talk with their clients in person, on the phone, or maybe over text, but they don't use email or their smartphones for much.
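
The rough sketch mentioned above: the select-a-symbol counting is essentially template matching. A minimal sketch assuming OpenCV; the image paths and the 0.8 match threshold are placeholders:

  import cv2

  # Placeholder inputs: a rasterized plan page and the user's selected symbol crop
  page = cv2.imread("plan_page.png", cv2.IMREAD_GRAYSCALE)
  symbol = cv2.imread("selected_symbol.png", cv2.IMREAD_GRAYSCALE)
  h, w = symbol.shape

  # Normalized cross-correlation of the symbol against the whole page
  scores = cv2.matchTemplate(page, symbol, cv2.TM_CCOEFF_NORMED)

  count = 0
  while True:
      _, max_val, _, (x, y) = cv2.minMaxLoc(scores)
      if max_val < 0.8:                                   # tunable threshold
          break
      count += 1
      cv2.rectangle(page, (x, y), (x + w, y + h), 0, 2)   # highlight the hit
      scores[max(0, y - h // 2):y + h // 2 + 1,
             max(0, x - w // 2):x + w // 2 + 1] = -1      # suppress overlapping hits

  print(f"found {count} instances of the selected symbol")

Note it only finds symbols at the same scale and rotation as the selection, which hints at why the general problem is so hard.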

I will be following your work! I have friends who would love to use this once it passes the human threshold.


I think parsing a whole blueprint with monolithic models is really difficult, but the constrained object detection/semantic segmentation problems are significantly more tractable. You can chain those CV models with VLMs to do things like get scale right. I'm always interested in novel HCI paradigms like voice!


There's also the field theory of electricity: https://youtube.com/watch?v=bHIhgxav9LY

It's an eye-opening alternative to the electrons-flowing-like-a-chain explanation in this article.


Check if your site has any manual actions against it. https://support.google.com/webmasters/answer/9044175?sjid=11....

They might be trying to create toxic backlinks to their domains, and if those domains 301-redirect to your domain, I believe this can negatively impact your domain's SEO (from what I've read). If so, you can try to disavow them: https://support.google.com/webmasters/answer/2648487?hl=en


> So I click around, using the UI over and over, until I finally cannot give myself any more splinters.

I'd take this with a grain of salt (pun intended). There are a lot of bugs that you cannot reproduce without certain permissions or a particular environment, let alone the race conditions or user setup. In my experience, most bugs would not have been uncovered using this brute force approach. A few tests using your understanding of the code and critical thinking go a lot further, in my opinion.


"Sanding" shouldn't be the only approach to testing an app. Developers should test using a variety of techniques: some bugs are discovered through unit or integration tests, others by brute force, others still by end users.


Come to Siem Reap, Cambodia. Start your day with a sunrise walk to the gym ($40/month). Grab an omelette and coffee for $3. Head home by 9am before it gets too hot. Work/learn/read/nap in the AC until 5pm. Take a sunset bike ride through the temples of Angkor Wat. Grab dinner downtown at the open-breeze restaurants while people watching ($2-3 for a healthy meat and vegetable stir fry, $1 for pancakes, $2 for fried rice). Grab some drinks ($1) and play billiards with some friends. Head home to your modest 1 bedroom apartment for the night ($300/month). Not to mention the locals are really friendly here, and if you're into helping out in some of your free time, you will be greatly valued and appreciated!

There are a few other expenses and some cons of living here, but some research and YouTube videos will help you figure out if it's right for you. And of course you can ask me :)


Email? Mine is sandeeptech8@gmail.com


I would say “hey, I’ve proven to be a critical part of this company since the beginning and I’d like a seat at the table to take this company to the finish line.” Then ask for 5-10% equity in addition to your salary.

I’d also say £40k was way too low. I’d guess the founders have significantly higher upside. It would be worth asking for transparency in order to determine fair compensation.


IMO close to zero chance of them saying yes to either of these


Agree 5-10% is too much of an ask. I'm not sure what is suitable for an early employee, but that's not going to fly.


AFAIK, 0.5% or even 1% is normal in those cases.


Thanks for the link! I’ve been trying to figure out how to buy/make a Raman spectrometer for cheap (currently in a third world country too!). Have you built one yet? I’m having trouble finding the lenses needed (mostly because of my lack of knowledge). Any chance you know what to use?

Laser ($40): https://a.co/d/0wjNGBz

Diffraction grating ($12): https://a.co/d/6bpO8xm

Laser focusing lens: Not found

Fluorescence collection lens: Not found

Focusing lens: Not found

Collimating lenses: Not found


I haven't found the lenses yet, although supposedly https://www.thorlabs.com/ has them. Someone else also recommended https://www.edmundoptics.com/ to me for finding higher quality diffraction gratings than the 'rave goggles' quality you can find on Amazon.


Thanks for sharing this! I didn't know about these terms before. Ever consider writing a blog post/tutorial on your knowledge of human speech in spectrograms? This is much more digestible than most of what's out there.


This is a pretty good primer: https://medium.com/analytics-vidhya/understanding-the-mel-sp...

1. STFT (get frequencies from the audio signal)

2. Log scale/decibel scale (since we hear on a log scale)

3. Optionally convert to the mel scale (a filter bank matched to how humans perceive pitch)
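
In code, those three steps look roughly like this (a minimal sketch assuming librosa; "speech.wav" is a placeholder path):

  import numpy as np
  import librosa

  y, sr = librosa.load("speech.wav", sr=None)   # placeholder path, mono

  # 1. STFT: magnitude per (frequency, time) bin
  stft = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))

  # 2. Convert amplitudes to decibels (we perceive loudness on a log scale)
  spec_db = librosa.amplitude_to_db(stft, ref=np.max)

  # 3. Optionally warp frequencies onto the mel scale (closer to how we hear pitch)
  mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048, hop_length=512)
  mel_db = librosa.power_to_db(mel, ref=np.max)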

Happy to answer any questions


Thanks for your effort in sharing the link. I'm kind of comfortable with most of the theoretical aspects of STFT/FFT/mel scale etc., but when I look at a spectrogram I still feel I'm missing something. I want to know how clear the speech in the audio is: Is there background noise? Is there reverb? Is there loss anywhere? I have a feeling these can be learned from analyzing spectrograms, but I'm not sure how to do it. Hence the question.


I would recommend constructing some spectrograms from specific sounds, especially simulated ones, to help you connect the visual with the audible.

For example:

- Sine sweeps (a sine wave that starts at a low frequency and sweeps up to a high one) - to learn to associate the frequencies you hear with the Y-axis

- Sine pulses at various frequencies - to better understand the time axis

- Different types of noise (e.g. white noise)

Perhaps move on to your own voice as well, and try different scales (log or mel spectrograms, which are commonly used).

With this, I think you can develop a familiarity quickly!
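
For example, here's one way to generate a noisy sine sweep and its spectrogram (a minimal sketch assuming numpy/scipy; the sample rate and durations are arbitrary):

  import numpy as np
  from scipy.signal import chirp, spectrogram

  sr = 16000
  t = np.linspace(0, 3.0, 3 * sr, endpoint=False)

  # A sine sweep rising from 100 Hz to 8 kHz over 3 seconds,
  # plus a little white noise for comparison
  sweep = chirp(t, f0=100, t1=3.0, f1=8000, method="logarithmic")
  noisy = sweep + 0.05 * np.random.randn(len(t))

  # The sweep shows up as a single rising line; the noise as a faint haze
  freqs, times, Sxx = spectrogram(noisy, fs=sr, nperseg=1024)
  # e.g. plt.pcolormesh(times, freqs, 10 * np.log10(Sxx))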


Look for clear and distinct frequency bands corresponding to the vocal range of human speech (generally around 100 Hz to 8 kHz). If the frequency bands are well defined and distinct, the speech is likely clear and intelligible. If they are blurred or fuzzy, the speech may be muffled or distorted.

Note that speech, like any audio source, consists of multiple frequencies: a fundamental frequency and its harmonics.

Background noise can be identified as distinct frequency bands that are not part of the vocal range of human speech. E.g. if you see lots of bright lines below or above the human vocal range, there's lots of background noise. Low frequencies especially can have a big impact on the perceived clarity of a recording, whereas high frequencies come off as more annoying.

Noise within the frequency range of human speech is harder to spot and you should always use your ears to decide whether it's noise or not.

You can also use a spectrogram to check for plosives (e.g. "p", "t", "k" sounds) and sibilants (e.g. "s" sounds), as they too can make a recording sound bad/harsh.
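
One crude way to turn that visual check into a number is to compare the energy inside and outside the vocal range (a minimal sketch assuming librosa; the file path and band edges are placeholders):

  import numpy as np
  import librosa

  y, sr = librosa.load("recording.wav", sr=None)  # placeholder path
  S = np.abs(librosa.stft(y, n_fft=2048)) ** 2    # power spectrogram
  freqs = librosa.fft_frequencies(sr=sr, n_fft=2048)

  in_band = (freqs >= 100) & (freqs <= 8000)      # rough vocal range
  in_energy = S[in_band].sum()
  out_energy = S[~in_band].sum()

  print(f"energy outside the vocal range: {out_energy / (in_energy + out_energy):.1%}")

As noted, this says nothing about noise that overlaps the vocal range; your ears are still the judge there.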


Unfortunately, I think the answer is "we don't know." We have loads of techniques (e.g. band-pass filters) and hypotheses (e.g. harmonic frequencies and timbre), but we haven't been able to implement them perfectly, which seems to be why deep learning has worked so well.

Personally, I hypothesize that the reason it's so hard is that the sources are intermixed, sharing frequencies, so isolating certain frequencies doesn't isolate a speaker. We'd need something like beamforming to know how much amplitude of each frequency to extract. I'd also hypothesize that humans, while able to focus on a directional source, cannot "extract" a clean signal either (imagine someone talking while a pan crashes on the floor - it completely drowns out what the person said).
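
To make the first point concrete, here's roughly what the band-pass approach looks like (a minimal sketch assuming scipy; the file path and band edges are placeholders). It removes energy outside the band, but any interfering source that shares frequencies with the speaker passes straight through:

  import numpy as np
  from scipy.io import wavfile
  from scipy.signal import butter, sosfiltfilt

  sr, y = wavfile.read("mixture.wav")        # placeholder path, mono assumed
  y = y.astype(np.float64)

  # 4th-order Butterworth band-pass over a rough voice band (300-3400 Hz)
  sos = butter(4, [300, 3400], btype="bandpass", fs=sr, output="sos")
  voice_band = sosfiltfilt(sos, y)

  # Anything outside the band is attenuated, but a pan crash or a second
  # speaker with energy inside 300-3400 Hz stays in the output.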


Speech is pretty well understood - there are two complementary aspects to it: speech production (synthesis) and speech recognition (via the changing frequency components as they show up in the spectrogram).

When we recognize speech, it's almost as if we're hearing the way the speaker is articulating words, since what we're recognizing is the changing resonant frequencies ("formants") of the vocal tract corresponding to articulation, as well as other articulation cues such as the sudden energy onset of plosives or the high frequencies of fricatives (see my other post in this topic for a bit more info).

High quality (that is, highly intelligible) speech synthesis has been available for a long time based on this understanding of speech production/recognition. One of the earliest speech synthesizers was DECtalk (from Digital Equipment), introduced in 1984 - a formant-based synthesizer built on the work of linguist Dennis Klatt.

The fact that most of the information in speech comes from the formants can be demonstrated by generating synthetic formant-only speech consisting of just sine waves at the changing formant frequencies. It doesn't sound at all natural, but it's nonetheless very easy to recognize.
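
That demonstration is easy to reproduce (a minimal sketch assuming numpy; the formant values are rough textbook figures gliding from /a/ toward /i/, whereas real sine-wave speech would track measured formant trajectories):

  import numpy as np

  sr = 16000
  dur = 0.6
  t = np.linspace(0, dur, int(dur * sr), endpoint=False)

  # Rough formants gliding from /a/ (700, 1200 Hz) to /i/ (300, 2300 Hz)
  f1 = np.linspace(700, 300, len(t))
  f2 = np.linspace(1200, 2300, len(t))

  def sine_track(freqs):
      # Integrate instantaneous frequency to get a smoothly varying phase
      return np.sin(2 * np.pi * np.cumsum(freqs) / sr)

  speech_like = sine_track(f1) + 0.5 * sine_track(f2)
  speech_like *= np.hanning(len(t))   # fade in/out to avoid clicks
  # e.g. soundfile.write("glide.wav", speech_like / np.abs(speech_like).max(), sr)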

The starting point for human speech recognition is similar to a spectrogram - it's a frequency analysis (cf. the FFT) done by the ear: hair cells along the inner ear's basilar membrane respond to the frequencies present, thereby picking up the dominant formant frequencies.


Agreed in theory. However, if I gave you two spectrograms, would you be able to tell which one is clear speech and which one is garbled? I'd bet we could come up with one that wouldn't pass the sniff test.

If you know of any implementations that can look at a spectrogram and say "hey, there are peaks at 150 Hz, 220 Hz, and 300 Hz with standard deviations of 5 Hz, 7 Hz, and 10 Hz, decreasing in frequency over time, thus this is a deep voice saying 'ay'" and get it right every time, I'd be really interested in seeing it (besides neural networks).
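
For what it's worth, the non-neural version of that is basically peak picking on spectral frames (a minimal sketch assuming scipy/librosa; the file path and prominence threshold are placeholders). The hard part is exactly the next step: reliably mapping the peaks to phones:

  import numpy as np
  import librosa
  from scipy.signal import find_peaks

  y, sr = librosa.load("recording.wav", sr=None)  # placeholder path
  S = np.abs(librosa.stft(y, n_fft=4096))
  freqs = librosa.fft_frequencies(sr=sr, n_fft=4096)

  frame = S[:, S.shape[1] // 2]                   # one time slice
  peaks, _ = find_peaks(frame, prominence=frame.max() * 0.1)
  print("spectral peaks (Hz):", np.round(freqs[peaks], 1))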


Maybe an expert linguist (not me) could do a pretty good job of distinguishing noisy speech in most cases, but a neural net should certainly be able to be super-human at this.

Some sources of noise, like a constant background hum (e.g. a computer fan), are easy to spot though.

