This is actually a wider issue with live photos IMO - people often share them without remembering or noticing that snippets of private conversations can be present. I have live photos enabled by default, and I quite often have to remind myself to check the audio before I forward a picture, or to switch it back to a static image first. Sending your mother baby pics while your spouse moans about the mother-in-law in the background, that kind of thing...
I'd actually probably get a lot of use from a feature where a live photo displays a quick transcript of any words it picked up in the recording, overlaid on the image or in a prompt, any time you try to share one, with a clear button to remove the audio. Similar to the local voicemail transcription feature; then I wouldn't have to waste time playing the recording. You could probably add the same detected words to the photos search index too.
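For the curious, the building blocks for that already exist in Apple's public frameworks. Here's a rough sketch, not anyone's actual implementation: it pulls the paired video out of a Live Photo asset and runs on-device speech recognition over its audio. The function name is made up, and it assumes Photos and Speech authorization have already been requested.

import Photos
import Speech

// Sketch only: extract a Live Photo's paired video and transcribe its audio
// on-device. Assumes photo-library and speech-recognition permission have
// already been granted; names here are illustrative.
let recognizer = SFSpeechRecognizer()  // kept alive for the duration of the task

func transcribeLivePhoto(asset: PHAsset, completion: @escaping (String?) -> Void) {
    // A Live Photo asset is a still image plus a paired video resource.
    guard let videoResource = PHAssetResource.assetResources(for: asset)
        .first(where: { $0.type == .pairedVideo }) else {
        completion(nil)
        return
    }

    // Write the paired video to a temp file so the Speech framework can read it.
    let tmpURL = FileManager.default.temporaryDirectory
        .appendingPathComponent(UUID().uuidString)
        .appendingPathExtension("mov")

    PHAssetResourceManager.default().writeData(for: videoResource, toFile: tmpURL, options: nil) { error in
        guard error == nil else { completion(nil); return }

        let request = SFSpeechURLRecognitionRequest(url: tmpURL)
        request.requiresOnDeviceRecognition = true  // keep the audio local

        recognizer?.recognitionTask(with: request) { result, error in
            if let result = result, result.isFinal {
                // Show this in a share-sheet prompt, or index it for search.
                completion(result.bestTranscription.formattedString)
            } else if error != nil {
                completion(nil)
            }
        }
    }
}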
Google Pixel has these too, but it calls them Motion Photos. They are formatted as a JPEG with an MP4 appended to it. Kind of a weird format but nice to know that you can recover the video from any motion photo with a clever dd invocation.
Here's my script for the curious:
#!/bin/bash
# Extract the appended MP4 from Google Pixel "motion photo" .MP.jpg files.
# The video is tacked onto the end of the JPEG, so find the offset of the
# MP4 "ftyp" box and copy everything from 4 bytes before it (its length field).
set -e -u -o pipefail

extract () {
  # strings -t d prints decimal offsets; awk copes with the padded output.
  local offset
  offset=$(strings -t d "$1" | awk '/ftypisom/ {print $1; exit}')
  dd if="$1" of="$1.mp4" bs=$((offset - 4)) skip=1
}

for fn in "$@"; do
  extract "$fn"
done
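(If you save that as, say, extract_mp.sh, usage is just ./extract_mp.sh PXL_*.MP.jpg, or whatever your filenames look like - each input gets a matching .mp4 written next to it.)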
I personally use NextCloud to sync all my photos to my home server. You can also copy them from your phone using ADB or MTP. I think you can also send one as a photo with standard Android sharing and it will send the video in the concatenated format, though that may depend on the sharing app and settings used.
In iOS, when you take a photo, by default it actually takes a very short video clip and calls it a "live photo." If you receive a live photo, you can tap and hold on the photo and it will play the video, which sometimes includes audio.
... _and_ a still photo. You don't lose the full-resolution still photo, and the still photo isn't simply a frame from the video. In case anyone's curious.
Wait, isn't the magic here that you do exactly that: choose your still photo from the video? That's what you do with the timeline picker in iOS?
In other words, each frame of the video is the same quality as the photo, so you can pick a moment a second later, for example, if you film something fast-moving.
Does anyone know how it works without being gigabytes in size?
The stills captured in video are significantly lower quality than the still image at the heart of a live photo. The few frames of captured video and audio are there to provide "live" context, not to let you choose the best shot. It was never intended as a burst-capture tool; the iPhone has a separate burst mode, triggered by holding down the shutter button, that works sort of how you are describing and provides a rapid burst of much higher quality stills.
You can change the keyframe on a live photo (i.e. the image displayed first) from the still capture to a video frame, but I would generally never do this due to the quality drop. It adds a little more memory context to a still photo; it's not a burst-capture tool.
There are actually some features of live photos I hadn't seen before that are kind of interesting, though; for example, you can fake a long-exposure shot via a feature that interpolates information from the video stills into the main image:
This is true. Easy verification: hold your phone still and take a photo of a tree at some distance. Go to the “key photo” selection scrubber. Zoom in on the leaves. Scrub. Observe that the quality decreases significantly when you're not on the preselected position.
One very cool thing about this is that when you snap the photo you actually get a half second or so of video from before you pressed the shutter. I've had a number of cases where I fumbled as quickly as I could to take a photo, missed the shot, but found that the shot I wanted was in fact captured by the live video.
Except it's not. What's the opposite of "live"? "Prerecorded"?
So these Live photos are supposedly not prerecorded - except they actually are. They are a prerecorded video that you send to someone. There is nothing live about it.
It’s “live” as in “alive” because you get a bit of motion. The opposite would be “inanimate”, which seems apropos.
You’re right that there's probably a video format under the hood, but the experience is meant to be closer to a still picture with a bit of added magic (a la Harry Potter). The clip is too short to catch much action or have (intentionally) meaningful audio.
Why do they even record and save sound? It's really dumb design. Most people, me included, simply don't know they're sending live photos. When you find out, first you're like, oh, that's kinda cool. Then you realize it also has sound and are mortified thinking back on what sound snippets you might have sent that you really didn't want to have sent.
1) photos permission includes access to the full file for each photo, including geotagged metadata and any audio from a "live photo"
This seems desirable and working as expected. You might legitimately want to share the live photo, or the photo with metadata. However, it would be nice if the photo picker asked you whether you wanted to share with metadata. (Note: unlike the photo picker in native apps, the Safari file picker, i.e. the HTML5 <input type="file" />, does strip EXIF data by default.)
2) it's easy to take a live photo without realizing it, and similarly easy to share it
This could be improved when selecting photos within the photo picker. There should be an option to separately select the static or full file (maybe with a long press).
3) granting access to "all photos" doesn't alert you when an app accesses a photo without you selecting it, for some reason
This used to be much worse, when the default and only option was to grant access to all photos, or no photos. Ever since Apple added the option to select which photos an app can access, I think this is less of a problem. But there is an education issue; people don't realize that "all" means the app can truly read all your photos, including geotagged metadata and audio of live photos, even without you selecting one to "upload" (or "do something with," depending on the purpose of the app). Even as a relative expert, I never understood this until I saw a demo app (to prove the privacy issue) that looped through all your photos and displayed your geotagged locations.
But even if people did realize this, there is no indication that an app has accessed a photo, like there is with the icon indicating the microphone or camera was used recently. Perhaps a solution to this could be changing the photo picker to overlay a green dot on any photo the app has accessed.
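To make the scale of that access concrete, here's a minimal sketch of what a demo app like the one mentioned above could do once "all photos" is granted. The function name is mine, and it's just an illustration of the public PhotoKit API, not any particular app:

import Photos
import CoreLocation

// Sketch: with "all photos" access, an app can silently walk the entire
// library and read the location metadata of every asset, no selection needed.
func dumpPhotoLocations() {
    PHPhotoLibrary.requestAuthorization(for: .readWrite) { status in
        guard status == .authorized else { return }

        let options = PHFetchOptions()
        options.sortDescriptors = [NSSortDescriptor(key: "creationDate", ascending: true)]

        PHAsset.fetchAssets(with: .image, options: options).enumerateObjects { asset, _, _ in
            if let location = asset.location {
                let when = asset.creationDate.map { "\($0)" } ?? "unknown date"
                let live = asset.mediaSubtypes.contains(.photoLive) ? " (live photo)" : ""
                print("\(when): \(location.coordinate.latitude), \(location.coordinate.longitude)\(live)")
            }
        }
    }
}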
Would be a lot more interesting if they got rid of the awful “select which photos the app can access” flow and just forced apps to use the system picker.
Agree - I don't know why this isn't the default, or why "all photos" needs to exist in the first place. The only valid use case I can think of is for selecting an image from a gallery in-app.
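For what it's worth, the out-of-process system picker already exists (PHPickerViewController) and needs no library permission at all - the app only ever receives the photos the user actually picks. A rough sketch of wiring it up (class and function names here are just illustrative):

import PhotosUI
import UIKit

// Sketch: the system picker runs out of process, so there's no permission
// prompt and the app never sees the rest of the library.
final class PickerExample: UIViewController, PHPickerViewControllerDelegate {

    func presentPicker() {
        var config = PHPickerConfiguration()
        config.filter = .images        // could also be .livePhotos, .videos, etc.
        config.selectionLimit = 1

        let picker = PHPickerViewController(configuration: config)
        picker.delegate = self
        present(picker, animated: true)
    }

    func picker(_ picker: PHPickerViewController, didFinishPicking results: [PHPickerResult]) {
        picker.dismiss(animated: true)
        guard let provider = results.first?.itemProvider,
              provider.canLoadObject(ofClass: UIImage.self) else { return }

        provider.loadObject(ofClass: UIImage.self) { image, _ in
            // Only the picked image crosses into the app's process.
            _ = image as? UIImage
        }
    }
}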
Other examples where all photos are required, if maybe not valid: third party automatic photo backup, showing a random photo on a widget, prompting a user to post a photo of a location to Google Maps automatically…
It should be possible to allow access to all files, or allow picking a single file. The weird intermediary where I need to first select that I don’t want to allow access to all files, then select the files I do want (the single file I’m posting/sending), then select that file again from the app’s custom picker from amongst the files I’ve previously shared with that app, is absolutely awful. Steve is rolling in his grave.
> Even as a relative expert, I never understood this until I saw a demo app (to prove the privacy issue) that looped through all your photos and displayed your geotagged locations.
Any link to this demo? Would love to use it as example
I can't find it, unfortunately, despite checking my purchase history and googling for relevant terms. I can't remember if it was even an official app - it might have just been a proof-of-concept with testflight. It was part of a blog post from a security researcher who was describing the situation, and it was prior to the "selected photos" permission, so I doubt it would work now anyway.
Most photos have a timestamp and geotag. Whether you're in a vehicle, at a concert or sporting event, or really doing just about anything can be gathered from that information, as well as from whatever the photo is of. One second of audio isn't giving much (additional) useful data.
All of those individual seconds don't add up to a sum greater than their parts. There are trillions (quadrillions?) of seconds of reality that those same cameras/microphones didn't capture. Capturing a single second of each of a billion people's lives isn't really all that useful, especially for advertisers.
I’m willing to bet that an AI could learn a lot about a person by listening to a large number of short audio clips, together with the photos themselves.
One second isn't even long enough to hear the full pronunciation of many English words. Let's say people take ten photos per day. Let's generously say that captures ten random spoken words. Ten random words per day is hardly enough for a _human_ to learn anything about a person, let alone AI. AI cannot magically conjure data from noise.
And when you think about what people take pictures of (their parking spot, selfies, nudes, landmarks, birthday cakes, sunsets, cats), what's heard is likely not even relevant to the picture taker's life or interests. If I look at all of the photos I've taken in the last two weeks, I've got:
- Cat (2)
- Building (1)
- Stuff in my home (6)
- Selfie (4)
You can get that from me by calling and asking if Henry is there. I will answer "No, I'm sorry, but you must have the wrong number". Cheap with Twilio.
If they have access to your pictures, they have access to your videos. This matters because people don't think audio is being recorded when they take photos. As far as threat modeling goes, creating a cloned voice is something these apps could have already done.
If you accept that as true then you also have to accept that your voice is hopelessly copyable and defense against that is futile. So it’s not really important.
> then you also have to accept that your voice is hopelessly copyable and defense against that is futile.
I kinda agree; still worth mentioning some kind of safe word to your family or whatever. I speak several languages, so I guess any scammer impersonating me would focus on one, but who knows, you can probably make it speak any language too.
To be safe you'll have to pre-establish a safe word that only you and the other person know, to avoid faked-voice scams etc.
How would you turn live photo audio into adtech-compliant metadata? AFAIK there's nothing remotely like that in the OpenRTB spec, which is the standard for all real-time bidding ads.
Perhaps I’m misunderstanding here. This seems like the spec for advertiser bidding. Why is this relevant for how user data is consumed for targeting purposes?
Yes. Facebook runs all photos through image recognition. If you look at the alt text of a photo in the feed, it will have a string generated by their photo analyzer. But that's just the front. They also use facial recognition to learn which other FB accounts are in your orbit, or make shadow profiles if they don't recognize a facial fingerprint. That way, if that person ever signs up, they match the metadata from the shadow profile. This is all in service of scaling ad targeting surface area, which can be sold by the eyeball to generate massive wealth.
There is nothing in that thread that proves it could happen, let alone is happening, beyond "my friend says it's possible" and then everyone jumping on that as if some random person assuming something is possible is the same as it's actually happening.
A falsehood only needs to smell like the truth to be believed. Access to photos is definitely happening but the rest is conjecture without some proper research into this.
Ultimately this seems like far too much hassle and processing for very little gain, given the 2 seconds of audio in a live photo is bound to be useless for determining product relevance. Advertising algorithms are clever enough to target well without having to go to the expense of analysing real time audio from your mic or your photos & videos.
Different thing, though. Things like this feed the myth that Facebook is always listening to you. It's close enough to the idea to allow the cognitive leap from 'possible' to 'happening' without more than anecdotal evidence. I've never seen conclusive proof (e.g. showing data collection, transmission or usage) that any targeted advertising is based on conversations overheard while the phone is not in use. Targeted advertising is just really good at segmenting people, combined with confirmation bias and the number of ads people don't mentally register until they become relevant.
Apple thing; they're short clips of about a second, with a few options for displaying as loops or long exposures.
I've used them deliberately perhaps twice, and wish I knew how to force all the far more numerous accidental uses into normal jpgs as I don't want to waste any of the limited iCloud storage space on video I'll never watch.
Samsung has something like this too, though I'm not sure what it's called. My mom was showing me "photos" she'd taken recently, and they all had 1-2 seconds of "noise" (motion) surrounding them. To me there is no appeal in this as a default setting, it just shows off how unsteadily you hold your phone when taking a picture. But I can see how an experienced photographer might take advantage of it when capturing moving targets.
I'd never heard of them either. Seems to be an iPhone thing.
"Basically, Live Photos on iPhone is a regular digital photo and a short 2 seconds video recording the few moments before the photo. Essentially, the iPhone camera captures 2 media files, one photo and one video. When viewing a Live Photo, the operating system, iOS plays the video file first and then it shows the picture."
The part that makes me feel old is that I have literally no idea why this sort of thing would be useful or desirable.
It's actually really nice. Not only does it provide a really good effect when swiping between photos in your photo album (it plays a bit of that video during the swipe, so your photo basically animates into place), but it often captures some really good stuff. I can't tell you in how many photos of my kids I've found utter delight in the live portion.
It also lets you pick a different key frame, so if e.g. you get someone blinking, you can pick a different frame from the video to use instead.
And if you capture photos back to back such that the live portions overlap, you can convert the group of photos into a single contiguous video.
They can be fun, you get a photo plus a short video "for free." Much easier than filming constantly and editing things down later. You can pick a different key frame to display as well, so it can be a good way to fix an awkwardly-timed yawn or such.
I don't use them personally, though, but I'm a curmudgeon about my camera controls ;)
I think Google added something similar in their Pixel camera app.
That's what I was picturing. I'm glad my impression isn't far off. Perhaps it's a "you have to use it to understand" sort of thing, but I truly struggle to see the value.
It doesn't matter, though. I don't use iPhones and even if I did, that I don't understand it is meaningless.
I love them. I often just swipe through memories in pictures, and it takes me back to the moment with the audio and quick movement before the shot was taken.
When on an iPhone and viewing a “live photo”, you can hold down on the screen and the image turns into a short video leading up to the moment of the photo.
I have been using social media strictly through mobile Safari for a month now and my ads have gotten so insanely irrelevant. It made me wonder what data they were getting from the app that they don't have access to any more.
The divergence between mobile Safari and in-app browsers is really frustrating. As far as I know, it's not possible to install content blockers that work within in-app browsers, which means that every social media or newsreader app with an in-app browser is its own separate cookie sandbox exposing me to tracking scripts that I'd have otherwise avoided with content blockers in Safari. Not to mention the app itself has full access to the DOM of every page in its in-app browser. I know I can use PiHole or similar, but it feels like a deceptive move from Apple, possibly with the motive of creating an incentive for developers to build for the App Store, where it's harder to block ads.
Some apps are friendly about it. I appreciate that Apollo gives me the option of which browser to use for opening links, and that it lists the in-app browser as a distinct choice alongside Chrome, Firefox and Safari.
For a long time I've had an idea for an app that can take advantage of this, but for the user's benefit. Imagine a newsreader app with modular news sources, where each "source" is a client side script that runs within the in-app browser, to navigate to a URL and then "parse the content" by getting rid of ads, paywalls, etc. So unlike a typical RSS feed reader app that makes you rawdog ten articles in an in-app browser before you hit a paywall, it would inject user-defined scripts into the in-app browser to make sure you never see the paywall in the first place. Philosophically, it's still a web browser, but with a more rigid interface for browsing between websites. The client is in control. It would be like Gopher for the modern age.
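Mechanically this is doable today: the host app owns the WKWebView and can inject arbitrary user scripts into every page it loads. A toy sketch of the injection side (the selectors and names are made up; a real per-site "source module" would be much more involved):

import WebKit

// Sketch of the injection mechanism such a reader could use: the embedding app
// controls the web view's content controller and can run its own JavaScript.
func makeReaderWebView() -> WKWebView {
    // Hypothetical cleanup script; real modules would be tailored per site.
    let cleanup = """
    document.querySelectorAll('.paywall, .ad, [data-ad-slot]').forEach(el => el.remove());
    document.documentElement.style.overflow = 'auto';
    """

    let script = WKUserScript(source: cleanup,
                              injectionTime: .atDocumentEnd,
                              forMainFrameOnly: true)

    let controller = WKUserContentController()
    controller.addUserScript(script)

    let config = WKWebViewConfiguration()
    config.userContentController = controller

    return WKWebView(frame: .zero, configuration: config)
}

Load an article with webView.load(URLRequest(url:)) and the script runs each time a document finishes loading, before the user ever sees the page.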
Isn’t it just as easy to forget that audio in videos can be accessed in the same way?
Sure, users mostly remember that they’re recording audio when shooting a video; but do they realise that advertisers have access to that as part of their photo library?
Even if you don't mind it accessing locations from individual photos, nothing prevents the app from scanning your entire library and getting all the locations from there too.
Same risk with timestamps - the app can get a list of all the timestamps and use that to confirm/refute other fuzzy datapoints collected elsewhere (web tracking, etc).
After experiencing Facebook abusing the location metadata when sharing [0], I only use the iOS Photos app and share from there to any app I don't fully trust, and each time I have to manually go and disable location sharing. I wish I could just default to never sharing it.
[0] by tagging the photo location or suggesting location specific things etc
Another thing to consider: a 3D fingerprint of a space can be created from live photos. I'm sure object recognition benefits from the small perspective changes as well.
If I were running a social network, I would prioritize using AI to describe photos and using that as part of the targeting data over the 1-1.5 seconds of audio in Live Photos.
Some people have no problems developing literal killing machines, or even using them. I'm not at all surprised some have no problem building advertisement targeting.
There are lots of coherent arguments (whether you agree with them or not) about why defence tech could be considered valuable. Ad-tech has a lot less going for it.
Most people don't care all that much how much value their work adds to society. As long as it doesn't cause much harm and lets me live comfy, who cares.
Google and Facebook both have an evil business model, and everyone who works there is supporting that business model, whether they work directly on ads or not. That's why I feel pity for everyone who leads with "ex-Google, ex-Facebook, etc." thinking it's a badge of honor, when in reality it's the opposite.
We sit and laugh at the kind of idiotic things people make up online, when most ad targeting is just done with search keywords and some web browsing info.
"Ad targeting is listening more often than you think" is a "conspiracy theory" I believe in.
I've had too many "mentioned an idea, then got targeted ads for it within a day" incidents to not think some combination of things on the Google Home or iPhones or other smart devices is spying. Yeah, yeah, familiar with all the "someone else on the same wifi probably searched for it" or "Baader-Meinhof" or what have you, but I don't live with a lot of people and Instagram ads in particular are VERY focused these days in a way that Baader-Meinhof doesn't really fit - if I'd seen the ad for [specific thing] a couple days before I mentioned it, it would've been noticeably weird and out of place in a different way.
I was in ad-tech for a bit in the last decade, and even at a tiny company there were some data sources we bought or heard about that were pretty spooky, so I would not be shocked if there are some wild ones out there today.
And yet there's still zero remotely plausible evidence for it.
Nobody has caught continuous audio feeds being transmitted from smart devices to the cloud (which would be noticeable due to increased network traffic and bandwidth usage) nor identified any secret speech recognition code on the client (which would be noticeable due to severely shortened battery life). Nobody who's worked in adtech has come forward to blow the whistle or admit that they shipped this feature for a big tech company.
I get why it's an appealing conspiracy from a gut instinct perspective, but it really makes no sense. When you're observing the behavior of billions of people and using machine learning algorithms trained to get the best results possible, some uncanny shit will naturally result. Look at how effective LLMs like ChatGPT have gotten without an obvious route to profitability, then think about how much more money has been invested into ad targeting algorithms just in the last couple decades alone.
If I was going to do it... I'd definitely hook in through non-battery-powered IoT devices. Something like a smart TV or various other home security stuff. The TV seems ideal: you have non-trivial compute there, so you could do some local speech-to-text and keyword matching, then just periodically phone home with tiny bandwidth usage. That's enough to associate IPs with interests, and that dataset doesn't even have to look that creepy at the surface level (you wouldn't tell many people where you got it exactly) when you sell it on to ad networks...
Sneaky apps would be another source, obviously the phone OS/computer vendors wouldn't want this, but I imagine there's some cat and mouse. It's just a new version of browser toolbars, not something hard to imagine some unscrupulous 3rd party data collection company building.
I definitely wouldn't expect Facebook or Google to be doing it directly.
TVs already do a lot of such tracking, and they are open about it. Samsung famously has their ACR feature[1] which works in a manner similar to what you suggest - it basically phones home periodically with screenshots of what you're watching.
The main problem with the ad-companies-are-listening-to-you theory is that audio processing is very power hungry. Running an ASR model locally would eat up a ton of power. Just doing wakeword detection (where you're only listening for a specific phrase like "Hey Siri") generally requires a dedicated specialized chip so as to not impact power consumption too much.
Same problem if they were surreptitiously streaming audio to their servers. You would see it from the outgoing packets and streaming that amount of data would also be fairly expensive.
Audio doesn't require all that much bandwidth. 20 KB/s should be plenty for reasonable-fidelity speech even uncompressed; uncompressed "CD quality" would be more like 90-180 KB/s, and capturing ultrasound maybe double that. At roughly 1 KB per packet, 20 KB/s is only about 20 packets per second, and that's being generous, with no compression at all.
20 packets per second continuously from your phone to a single server, when you're otherwise sitting there doing nothing, would be very noticeable (for anyone who cared to look for it). And yet I've never seen any evidence reported of this.
I can save you the effort, it's not happening. This would be a monumental national security concern for non-US states, not even considering the technical limitations that exist for storing or utilizing literally millions of audio samples a minute.
People are incredibly predictable using only a handful of demographics, there's simply no need to invest the astronomical amount that would be necessary to process these conversations when there's already many simple ways to track/generate user interest.
My wife uses Instagram, and at times, she receives ads related to things we discuss together or with her friends. Initially, you might consider these occurrences as mere coincidences, given that they only happen occasionally. However, the level of specificity in these ads is uncanny.
Interestingly enough, we find ourselves getting unusually excited when this happens, primarily because English is not our native language. When the spyware manages to comprehend our conversations despite our heavy accents, it gives us a sense of improvement in our language skills.
Conversely, our sentiments take a complete turn when we unintentionally activate Apple’s Siri while discussing unrelated topics. For some reason, the dedicated chip for detecting “Hey, Siri!” interprets one of us uttering that command and starts listening in. We exchange glances, shed a few tears of frustration over the misunderstanding, and then burst into laughter.
I’m not entirely sure how they manage to accomplish this, but perhaps there’s an idea for an app hidden in these experiences. For instance, an app that consistently listens in the background and occasionally shares a relevant joke based on the ongoing conversation. That would certainly add a humorous twist to things.