Hacker News | csdvrx's comments

Will the offline mode work on laptops?


I use both: they are extremely fast - faster than WordPad (on a native Windows installation), much faster than LibreOffice (on Linux with Wine).


Read https://blogs.windows.com/windowsexperience/2024/12/06/phi-s...

I submitted it, as it gives a better picture of what Microsoft is trying to do: both the hardware, and the software.

Phi is small not just for show, but also so that it can run locally on the hardware they are planning for it: the Copilot-branded devices.


Yes: they are building both the software and the hardware for that: https://blogs.windows.com/windowsexperience/2024/12/06/phi-s...


Is anyone here using phi-4 multimodal for image-to-text tasks?

The phi models often punch above their weight, and I got curious about the vision models after reading the finetuning stories at https://unsloth.ai/blog/phi4

Since lmarena.ai only has the phi-4 text model, I've tried "phi-4 multimodal instruct" from openrouter.ai.

However, the results I get are far below what I would have expected.

Is there any "Microsoft validated" source (like https://chat.qwen.ai/c/guest for qwen) to easily try phi4 vision?


Sometimes you can't find the average because it's undefined: this happens with the Cauchy and a few other statistical distributions. The Wikipedia page has a nice plot of how the first 2 moments don't converge: https://en.wikipedia.org/wiki/Cauchy_distribution#History

When in doubt, don't use the mean: prefer more robust estimates, as even with degenerate statistical distributions, there are still some "good numbers to report" like the mode or the median.

And if you don't know statistics, just use a plot!
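
To see it numerically, here's a minimal Python sketch (using the standard inverse-CDF trick to draw Cauchy variates): the running mean never settles, while the median does.

```python
import math
import random
import statistics

random.seed(42)

def cauchy_sample(n):
    # Standard Cauchy variates via inverse CDF: tan(pi * (U - 1/2))
    return [math.tan(math.pi * (random.random() - 0.5)) for _ in range(n)]

# No law of large numbers here: the sample mean keeps jumping around
# as n grows (the Cauchy mean is undefined), while the sample median
# converges to the true median, 0.
for n in (100, 10_000, 1_000_000):
    xs = cauchy_sample(n)
    print(f"n={n:>9}: mean={statistics.fmean(xs):12.3f}  "
          f"median={statistics.median(xs):8.3f}")
```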


Indeed, the best averaging method depends on the underlying probability distribution from which the data is drawn. The arithmetic mean is best for a normal distribution, whereas the geometric mean is better suited for a lognormal distribution, and as the above comment suggests, the average is meaningless for most power-law distributions where the exponent is less than 2.

However, when all else fails, define your own Von Neumann entropy: figure out how often you compile GCC, run FFTs, or compress video, then compute the probabilities (ratios) and multiply each by the logarithm of the speedup for that use case. Sum them up, report the result as machine/architecture entropy, and you'll win every argument about it.
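
That scheme boils down to a weighted geometric mean of per-workload speedups. A sketch, with invented workload frequencies and speedups:

```python
import math

# Hypothetical workload mix (the weights are invented for
# illustration; they must sum to 1).
weights  = {"gcc_build": 0.5, "fft": 0.3, "video_encode": 0.2}
speedups = {"gcc_build": 1.2, "fft": 3.0, "video_encode": 0.9}

# "Entropy-style" score: sum of p_i * ln(speedup_i); exponentiating
# it gives the weighted geometric mean of the speedups.
score = sum(p * math.log(speedups[w]) for w, p in weights.items())
geo_mean = math.exp(score)
print(f"score = {score:.4f}, weighted geometric mean speedup = {geo_mean:.3f}")
```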


I agree with your point, but it is funny to think about true consumer workloads: I... mostly JIT and run Javascript, layout algorithms, and whatever compositing cannot be offloaded to the GPU.


> it's like systemd trading off non-determinism for boot speed, when it takes 5 minutes to get through the POST

That's a bad analogy: if a given deterministic service ordering is needed for a service to start correctly (say, because it doesn't start correctly with just its systemd unit), it means the non-deterministic systemd service units are not properly encoding the dependency tree in their Before= and After= directives.

When done properly, both solutions should work the same. However, the solution that properly encodes the dependency graph (instead of projecting it onto a 1-dimensional sequence of numbers) is the better one, because it gives you more speed but also more flexibility: you can see the branches any leaf depends on, remove leaves as needed, then cull the useless branches. You could add determinism if you want, but why bother?

It's like using the dependencies of linux packages, and leaving the job of resolving them to package managers (apt, pacman...): you can then remove the useless packages which are no longer required.

Compare that to doing a `make install` of everything to /usr/local in a specific order, as specified by a script: when done properly, both solutions will work, but one is clearly better than the other, as it encodes the existing dependencies more finely instead of projecting them onto a sequence.

You can add determinism if you want to follow a sequence (ex: `apt-get install make` before adding gcc, then add cuda...), or you can use a meta-package like build-essential, but being restricted to a sequence gains you nothing.
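
Concretely, the graph is declared per unit rather than as a global sequence number; a hypothetical unit (all names invented for illustration) might look like:

```ini
# /etc/systemd/system/myapp.service (hypothetical example)
[Unit]
Description=My app, started once its database is up
# Ordering only: start after these units, if they are queued too
After=network-online.target postgresql.service
# Hard dependency: pull postgresql.service in; don't start without it
Requires=postgresql.service
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/myapp
```

After= only orders; Requires=/Wants= pull dependencies in. Everything not constrained by the graph can then start in parallel, which is exactly the flexibility a fixed sequence gives up.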


I don't think it is a bad analogy

given how complicated the boot process is ([1]), and that it occurs once a month, I'd rather it was as deterministic as possible

vs. shaving 1% off the boot time

[1]: distros continue to ship subtly broken unit files, because the model is too complicated


Most systems do not have 5 minute POST times. That’s an extreme outlier.

Linux runs all over, including embedded systems where boot time is important.

Optimizing for edge cases on outliers isn’t a priority. If you need specific boot ordering, configure it that way. It doesn’t make sense for the entire Linux world to sacrifice boot speed.


I don't even think my Pentium 166 took 5 minutes to POST. Did computers ever take that long to POST??


Look at enterprise servers.

Completing POST in under 2 minutes is not guaranteed.

Especially the 4 socket beasts with lots of DIMMs.


Old machines probably didn't, no, but I have absolutely seen machines (Enterprise™ Servers) that took longer than that to get to the bootloader. IIRC it was mostly a combination of hardware RAID controllers and RAM... something. Testing?


It takes a while to enumerate a couple of TB worth of RAM DIMMs and 20+ disks.


Yeah, it was somewhat understandable. I also suspect the firmware was... let's say underoptimized, but I agree that the task is truly not trivial.


One thing I ran across when trying to figure this out previously: while some firmware is undoubtedly dumb, a decent amount of the slowness comes from it doing a lot more than typical PC firmware does.

For instance, the slow RAM-check POST I was experiencing was because it was also doing a quick single-pass memory test. Consumer firmware goes ‘meh, whatever’.

Disk spin-up: it was also staggering the disk power-ups so that it didn’t kill the PSU - not a concern if you have 3-4 drives, but definitely a concern if you have 20.

Also, the raid controller was running basic SMART tests and the like. Which consumer stuff typically doesn’t.

Now how much any of this is worthwhile depends on the use case, of course. In ‘farm of cheap PCs’ cloud hosting environments, most of these conditions get handled by software, and it doesn’t matter much if any single box is half broken.

If you have one big box serving a bunch of key infra, and reboot it periodically as part of ‘scheduled maintenance’ (aka old school on prem), then it does.


Physical servers do. It's always astounding to me how long it takes to initialise all that hardware.


See: the comment above and its folkloric concept of systemd as some kind of constraint solver

Unfortunately no one has actually bothered to write down how systemd really works; the closest to a real writeup out there is https://blog.darknedgy.net/technology/2020/05/02/0/


Oh? What's an example of a common way for unit files to be subtly broken?


In the last discussion about such issues, someone linked https://www.abc.net.au/listen/programs/rearvision/the-dark-s... which is worth a read for past abuses:

> Military authorities in California requested census data to identify the Japanese-American population. Then in 1942, president Franklin Roosevelt issued an executive order to authorise their removal.


This is obviously how this is going to be used and anyone thinking otherwise is naive. They are building a system to track dissenters. First it will be used to track down illegal immigrants to justify its existence. We are pretty much going down the "first they came for.." list.


I don't think they even need to be illegal immigrants based on what we are seeing so far with detentions and deportations.

For example Rümeysa Öztürk (the topic popped up on a now-flagged submission, I'm not sure the etiquette about linking to that at this point). Mahmoud Khalil is another one (permanent resident).


why is there not equal scrutiny about a system of "undocumented, illegal" people who are obviously real people?


What system are you talking about?


Very interesting link!

Submitted!


For long term storage, prefer hard drives (careful about CMR vs SMR)

If you have specific random IO high performance needs, you can either

- get a SLC drive like https://news.solidigm.com/en-WW/230095-introducing-the-solid...

- make one yourself by hacking the firmware: https://news.ycombinator.com/item?id=40405578

Be careful when you use something "exotic", and do not trust drives that are too recent to be fully tested: I learned my lesson with M2 2230 drives https://www.reddit.com/r/zfs/comments/17pztue/warning_you_ma... which seems validated by the large number of similar experiences like https://github.com/openzfs/zfs/discussions/14793


> - make one yourself by hacking the firmware: https://news.ycombinator.com/item?id=40405578 Be careful when you use something "exotic", and do not trust drives that are too recent to be fully tested

Do you realize the irony of cautioning about buying off the shelf hardware but recommending hacking firmware yourself?


That "firmware hack" is just enabling an option that manufacturers have always had (effectively 100% "SLC cache") but almost never use, for reasons likely to do with planned obsolescence.


Converting a QLC chip into an SLC one is not planned obsolescence. It’s a legitimate tradeoff after analyzing the marketplace: existing write-endurance lifetimes are within acceptable consumer limits, and consumers would rather have more storage.

Edit: and to preempt the “but make it an option”. That requires support software they may not want to build and support requests from users complaining that toggling SLC mode lost all the data or toggling QLC mode back on did similarly. It’s a valid business decision to not support that kind of product feature.


And for the vast majority of use cases, even if QLC wears out, TLC would be fine indefinitely. Limiting it to SLC capacity would be ridiculous.


I have USB drives with good old SLC flash, whose data is still intact after several decades (rated for retention of 10 years at 55C after 100K cycles - and they have not been cycled anywhere near that much.)

> and consumers would rather have more storage

No one from the manufacturers tells them that the "more storage" - multiplicatively more - lasts exponentially less.

For the same price, would you rather have a 1TB drive that will retain data for 10 years after having written 100PB, or a 4TB one that will only hold that data for 3 months after having written 2PB?

> That requires support software they may not want to build

The software is already there if you know where to look.

> and support requests from users complaining that toggling SLC mode lost all the data or toggling QLC mode back on did similarly

Do they also get support requests from users complaining that they lost all data after reformatting the drive?

> It’s a valid business decision to not support that kind of product feature.

The only "valid business decision" is to make things that don't last as long, so recurring revenue is guaranteed.

Finally, the "smoking gun" of planned obsolescence: SLC flash requires nowhere near as much ECC, and thus controller/firmware complexity, as MLC/TLC/QLC. It is also naturally faster. The NRE costs of controllers supporting SLC flash are a fraction of those for >1 bit per cell flash. QLC in particular, according to one datasheet I could find, requires ECC that can handle a bit error rate of 1E-2. One in a hundred bits read will be incorrect in normal operation of a QLC storage device. That's how idiotic it is --- they're operating at the very limits of error correction, just so they can have a measly 4x capacity increase over SLC, which is nearly perfect and needs very minimal ECC. All this energy and resource usage dedicated to making things more complex and shorter-lasting can't be considered anything other than planned obsolescence.

Contrast this with SmartMedia, the original NAND flash memory card format, rated for 100K-1M cycles, using ECC that only needs to correct at most 1 bit in 2048, and with such high endurance that it doesn't even need wear leveling.

Also consider that SLC drives should cost a little less than 4x the price of QLC ones of the same capacity, given the lower costs of developing controllers and firmware, and the same price of NAND die, yet those rare SLC drives which are sold cost much more --- they're trying to price them out of reach of most people, given how much better they actually are.


No you’re right. You’ve uncovered a massive conspiracy where they’re out to get you.

> No one from the manufacturers tells them that the "more storage" - multiplicatively more - lasts exponentially less. For the same price, would you rather have a 1TB drive that will retain data for 10 years after having written 100PB, or a 4TB one that will only hold that data for 3 months after having written 2PB?

These numbers seem completely made up since these come with a 1 year warranty and such a product would be a money loser.

> Also consider that SLC drives should cost a little less than 4x the price of QLC ones of the same capacity, given the lower costs of developing controllers and firmware, and the same price of NAND die, yet those rare SLC drives which are sold cost much more --- they're trying to price them out of reach of most people, given how much better they actually are.

You have demonstrated a fundamental lack of understanding of economics. When there’s less supply (ie these products aren’t getting made), things cost more. You are arguing that it’s because these products are secretly too good, whereas the simpler explanation is just that the demand isn’t there.


> When there’s less supply (ie these products aren’t getting made), things cost more.

SLC and QLC are literally the same silicon these days, just controlled by an option in the firmware; the former doesn't even need the more complex sense and program/erase circuitry of the latter, and yields of die which can function acceptably in TLC or QLC mode are lower. If anything, SLC can be made from reject or worn MLC/TLC/QLC, something that AFAIK only the Chinese are attempting. Yet virgin SLC die are priced many times higher, and drives using them are nearly impossible to find.

> such a product would be a money loser.

You just admitted it yourself: they don't want to make products that last too long, despite them actually costing less.

Intel's Optane is also worth mentioning as another "too good" technology.


I think you’re casually dismissing the business costs associated with maintaining a SKU and assuming manufacturing cost is the only thing that drives the final cost which isn't strictly true. The lower volumes specifically are why costs are higher regardless of it “just” being a firmware difference.


They did not recommend. They listed.


Tape is extremely cheap now. I booted up a couple of laptops that had been sitting unpowered for over 7 years, and the SATA SSD in one of them has missing sectors. It had zero issues when it was shut down.


Is tape actually cheap? Tape drives seem quite expensive to me, unless I don't have the right references.


Tapes are cheap, tape drives are expensive. Using tape for backups only starts making economic sense when you have enough data to fill dozens or hundreds of tapes. For smaller data sets, hard drives are cheaper.
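
A back-of-the-envelope sketch of that break-even point, with invented round-number prices (illustrative assumptions, not quotes):

```python
# All prices are illustrative assumptions, not real quotes.
DRIVE_COST = 1500.0   # one-off cost of a (used) tape drive, USD
TAPE_PER_TB = 5.0     # tape media, USD per TB
HDD_PER_TB = 20.0     # hard drives, USD per TB

def tape_total(tb: float) -> float:
    return DRIVE_COST + TAPE_PER_TB * tb

def hdd_total(tb: float) -> float:
    return HDD_PER_TB * tb

# Tape wins only once the media savings have repaid the drive:
breakeven_tb = DRIVE_COST / (HDD_PER_TB - TAPE_PER_TB)
print(f"tape cheaper beyond ~{breakeven_tb:.0f} TB")
```

With these numbers the crossover is around 100 TB, i.e. many dozens of LTO-5 tapes, which matches the comment above.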


Used LTO5+ drives are incredibly cheap, you can get a whole tape library with two drives and many tape slots for under 1k.

Tapes are also way more reliable than hard drives.


HDDs are a pragmatic choice for “backup” or offline storage. You’ll still need to power them up, just for testing, and also so the “grease” liquefies and they don’t stick.

Up through 2019 or so, I was relying on BD-XL discs, sized at 100GB each. The drives that created them could also write out M-DISC archival media, which was fearsomely expensive as a home user, but could make sense to a small business.

100GB, spread over one or more discs, was plenty of capacity to save the critical data, if I were judiciously excluding disposable stuff, such as ripped CD audio.


If you don’t have a massive amount of data to back up, used LTO5/6 drives are quite cheap; software and drivers are another issue, however, with a lot of enterprise kit.

The problem ofc is that with a tape you need to also have a backup tape drive on hand.

Overall if you get a good deal you can have a reliable backup setup for less than $1000 with 2 drives and a bunch of tape.

But this is only good if you have single digits of TBs or low double digits of TBs to back up, since it’s slow, and with a single tape drive you’ll have to swap tapes manually.

LTO5 is 1.5TB and LTO6 is 2.5TB (more with compression); it should be enough for most people.


> But this is only good if you have single digit of TBs or low double digit of TBs

That's not so enticing when I could get 3 16TB hard drives for half the price, with a full copy on each drive plus some par3 files in case of bad sectors.


You could, it’s really a question of what your needs are and what your backup strategy is.

Most people don’t have that much data to back up. I don’t back up movies and shows I download because I can always rebuild the library from scratch; I only back up stuff I create, so personal photos, videos, etc.

I’m not using a tape backup either; cloud backup is enough for me. It’s cheap as long as you focus your backups on what matters the most.


I have used LTO5 drives under FreeBSD and Linux. Under Linux I used both LTFS and tar. There were zero issues with the software.


Older drives are a bit better but still ymmv. Had quite a few issues with Ethernet based drives on Linux in the past.


The issue with tape is that you have to store it in a temperature controlled environment.


Tape sucks unless you've got massive amounts of money to burn. Not only are tape drives expensive, they only read the last two tape generations. It's entirely possible to end up in a future where your tapes are unreadable.


There's a lot of LTO drives around. I strongly doubt there will be any point in the reasonable lifetime of LTO tapes (let's say 30 years) where you wouldn't be able to get a correct-generation drive pretty easily.


While the tape is relatively cheap, the tape drives are not. The new ones typically start at 4K USD, although sometimes for older models the prices can drop below 2K.


You can get LTO5+ drives on ebay for $100-400. Buying new doesn't make sense for homelab.


If you care about long term storage, make a NAS and run ZFS scrub (or equivalent) every 6 months. That will check for errors and fix them as they come up.

All error correction has a limit: if too many errors build up, they become unrecoverable. But as long as you reread and fix them while they are still within what the code can correct, it's fine.


> run ZFS scrub (or equivalent) every 6 months

zfs in mirror mode offers redundancy at the block level, but scrub requires plugging in the device

> All error correction has a limit. If too many errors build up, it becomes unrecoverable errors

There are software solutions. You can specify the redundancy you want.

For long term storage, if using a single media that you can't plug and scrub, I recommend par2 (https://en.wikipedia.org/wiki/Parchive?useskin=vector) over NTFS: there are many NTFS file recovery tools, and it shouldn't be too hard to roll your own solution to use the redundancy when a given sector can't be read
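
par2 is built on Reed-Solomon codes; as a toy illustration of the general idea (plain XOR parity here, far weaker than the real thing), one extra parity block lets you rebuild any single lost block:

```python
from functools import reduce

def xor_blocks(blocks):
    # Byte-wise XOR of equal-sized blocks.
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"sector-0", b"sector-1", b"sector-2"]
parity = xor_blocks(data)  # one parity block for the whole set

# Simulate an unreadable sector, then rebuild it from the survivors:
recovered = xor_blocks([data[0], data[2], parity])
print(recovered == data[1])  # True: the lost block is reconstructed
```

Real par2 generalizes this: with N recovery blocks, any N missing blocks can be rebuilt, and you choose the redundancy percentage when creating the archive.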


What hardware, though? I want to build a NAS / attached storage array but after accidentally purchasing an SMR drive[0] I’m a little hesitant to even confront the project.

A few tens of TBs. Local, not cloud.

[0] Maybe 7 years ago. I don’t know if anything has changed since, e.g. honest, up-front labeling.

[0*] For those unfamiliar, SMR is Shingled Magnetic Recording. https://en.m.wikipedia.org/wiki/Shingled_magnetic_recording


I have a homelab with a bunch of old HP Gen 8 Microservers. They hold 4x 3.5" hdds and also an ssd (internally, replacing the optical slot):

https://www.ebay.com/itm/156749631079

These are reasonably low power, and can take up to 16GB of ECC ram which is fine for small local NAS applications. The cpu is socketed, so I've upgraded most of mine to 4 core / 8 thread Xeons. From rough memory of the last time I measured the power usage at idle, it was around 12w with the drives auto-spun down.

They also have a PCIe slot in the back, though it's older gen, but you'll be able to put a 10GbE card in it if that's your thing.

Software wise, TrueNAS works pretty well. Proxmox works "ok" too, but this isn't a good platform for virtualisation due to the maximum of 16GB ram.


> What hardware, though?

Good question. There seems to be no way to tell whether or not we're gonna get junk when we buy hard drives. Manufacturers got caught putting SMR into NAS drives. Even if you deeply research things before buying, everything could change tomorrow.

Why is this so hard? Why can't we have a CMR drive that just works? That we can expect to last for 10 years? That properly reports I/O errors to the OS?


Toshiba Nx00/MG/MN are good picks. The company never failed us, and I don't believe they've had the same kinds of controversies as the US competition.

Please don't tell everyone so we can still keep buying them? ;)


The Backblaze Drive Stats are always a good place to start: https://www.backblaze.com/blog/backblaze-drive-stats-for-202...

There might be SMR drives in there, but I suspect not.


Nothing can really save you from accidentally buying the wrong model other than research. For tens of TBs you can use either 4-8 >20TB HDDs or 6-12 8TB SSDs (e.g. Asustor). The difference really comes down to how much you're willing to pay.


SMR will store your data, just slowly.

It was a mistake for the hard drive industry to push them so hard, IMO. But these days the 20TB+ drives are all HAMR or other heat/energy-assisted tech.

If you are buying 8TB or so, just make sure to avoid SMR, but otherwise you're fine. Even then, SMR stores data fine; it's just really, really slow.


I use TrueNAS and it does a weekly scrub IIRC.


> (careful about CMR vs SMR)

Given the context of long term storage... why?


After I was bamboozled with an SMR drive, it's always great to call this out for those who might be unaware. What a piece of garbage to let vendors upsell higher numbers.

(Yes, I know some applications can be agnostic to SMR, but it should never be used in a general purpose drive).


Untested hypothesis, but I would expect the wider spacing between tracks in CMR makes it more resilient against random bit flips. I'm not aware of any experiments to prove this, and it may be worth doing. If the HD manufacturers can convince us that SMR is just as reliable for archival storage, it would help them sell those drives, since right now lots of people are avoiding SMR due to poor performance and the infamy of the bait-and-switch that happened a few years back.

