In my case, loading the entire file is loading a tiny bit more data, so this fallback doesn't hurt, but it's still annoying, and broke any hope I had of doing something more clever with the dataset.
I suspect this is where Windows backwards compatibility bites them a bit. I've got a very old tool [1] that uses WriteProcessMemory and CreateRemoteThread to create a thread in the command process that launched it to remotely change the directory in that process.
It works to this day, despite looking exactly like what malware would do. My tool is nothing in the grand scheme, but I suspect I'm not the only one doing this sort of shenanigans, and no doubt some big important app is doing it and can't be bothered to fix itself, so MS is stuck supporting it.
I ended up using the same basic layout for the database behind a little IP lookup tool I wrote to make lookups somewhat responsive from JavaScript [1]. It ends up working out pretty well.
My internship project involved a tool to look up individual IP addresses as well as check a range of address blocks. I ended up implementing a trie, which performed well in terms of storage and run-time efficiency.
My initial approach was to store IP addresses in a hashmap, but if the average query was to check address blocks like a /24, it would have been a very poor data structure to search through.
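For anyone curious what that looks like, here is a minimal sketch of a binary prefix trie with longest-prefix matching. It isn't the code from that project. The class name `PrefixTrie`, the method names, and the example labels are made up for illustration, and it assumes IPv4 only.

```python
import ipaddress


class PrefixTrie:
    """Binary trie keyed on the leading bits of IPv4 prefixes (illustrative sketch)."""

    def __init__(self):
        # Each node is a list: [child for bit 0, child for bit 1, stored value]
        self.root = [None, None, None]

    def insert(self, cidr, value):
        """Store a value under a CIDR block such as '10.1.2.0/24'."""
        net = ipaddress.ip_network(cidr)
        bits = int(net.network_address)
        node = self.root
        # Walk one level per prefix bit, most significant bit first.
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1
            if node[bit] is None:
                node[bit] = [None, None, None]
            node = node[bit]
        node[2] = value

    def lookup(self, address):
        """Return the value of the longest matching prefix, or None."""
        bits = int(ipaddress.ip_address(address))
        node = self.root
        best = node[2]
        for i in range(32):
            node = node[(bits >> (31 - i)) & 1]
            if node is None:
                break
            if node[2] is not None:
                best = node[2]  # a more specific block also covers this address
        return best


trie = PrefixTrie()
trie.insert("10.0.0.0/8", "corp")    # hypothetical labels
trie.insert("10.1.2.0/24", "lab")
inside_lab = trie.lookup("10.1.2.77")
inside_corp = trie.lookup("10.9.9.9")
unknown = trie.lookup("192.0.2.1")
```

A lookup walks at most 32 nodes no matter how many blocks are stored, whereas a hashmap of individual addresses has no key to match a /24 against without trying every prefix length or enumerating the block.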
WhisperX, along with whisper-diarization, runs at around 20x real time on audio with a modern GPU, so for that part you're looking at around $1 per twenty hours of content on a g5.xlarge, not counting time to build up a node (or around half that at Spot prices, assuming you're much luckier than I am at getting stable spot instances these days).
You can short circuit that time to build up a node a bit with a prebaked AMI on AWS, but there's still some amount of time before a new node can start running at speed, around 10 minutes in my experience.
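As a rough sketch of that arithmetic: the $1.00/hour rate below is a rounded placeholder implied by the "$1 per twenty hours" figure, not a quoted AWS price, and the function and parameter names are made up for illustration.

```python
def transcription_cost(audio_hours, hourly_rate_usd=1.00,
                       realtime_factor=20.0, startup_minutes=0.0):
    """Estimate the cost in USD to transcribe `audio_hours` of audio.

    hourly_rate_usd is an assumed, rounded instance price; realtime_factor
    is how many hours of audio are processed per hour of compute.
    """
    compute_hours = audio_hours / realtime_factor + startup_minutes / 60.0
    return hourly_rate_usd * compute_hours


# Twenty hours of audio at 20x real time is one hour of compute.
steady_state = transcription_cost(20)
# Same job, but also paying for a ten-minute node warm-up.
with_startup = transcription_cost(20, startup_minutes=10)
```

The warm-up time matters most for small batches: the same ten minutes spread over a much longer job barely moves the per-hour cost.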
I haven't looked at this particular solution yet, but I really find the LLMs to be hit or miss at summarizing transcripts. Sometimes it's impressive, sometimes it's literally "informal conversation between multiple people about various topics".
The current directory is managed with SetCurrentDirectory/GetCurrentDirectory; however, the cmd.exe command-line shell also stores the current directory for each drive in an environment variable like "=C:", and the CRT and shell hide all environment variables that start with a "=".
It gets mightily confused if these two concepts of current directory ever diverge.
Who is still using cmd.exe? I understand that there are system processes that still need it, but if you see a human using cmd in the year of our lord 2023, that’s a cry for help!
On my Windows 10 with no profile it takes 1-2 seconds (Ryzen 3600/M2/32GB RAM). Like, what is it doing? I get annoyed if bash on Linux takes like 250ms.
It's installed everywhere on any version of Windows and works fine for interactive tasks (personally I wouldn't write anything but the simplest scripts for it; anything with for loops is a big no-no).
I do. I don't like PowerShell (and it took me years to realize it wasn't a diagnostic tool for power management), and I find bash for Windows to be ill fitting. I don't do a lot of stuff in the command line on Windows, so working like it has for decades is a plus.
I wanted to see if I could do the lookup work client-side, and also include some more metadata about cloud providers' IPs (region, service, etc.), not that it's really better, just a toy idea I had.
Extra scandalous too since stuff like load balancers will use a bunch of IPv4 addresses that they can now charge extra for, with no way to avoid the fees since you can't turn off IPv4.
I asked it to summarize the transcript for a podcast about an episode of Star Trek. One minor issue in grammar, but otherwise, it does a remarkable job of making a summary:
> In the Star Trek episode "The Cage", Captain Pike is held captive by a race of advanced televisions. He is offered a life of luxury in a cage, but he realizes that this is not the life he wants and manages to escape. The televisions watch everything on TV and have become too voyeuristic and passive. At the end of the episode, Pike makes the right choice and is able to see the difference between the falsehood of the fantasy and the difficult but more acceptable reality. This episode highlights the importance of exploration and being better than oneself, and is a reminder of the dangers of becoming too passive and voyeuristic. It also shows how technology can be used to create illusions and how these illusions can be used to manipulate people.
https://github.com/seligman/podcast_to_text/blob/master/sear...