Mechanical Turk Lessons Learned (curalate.com)
90 points by kmano8 on Feb 1, 2017 | 81 comments



I wanted to read your article, but the design of your blog really sucks. I can hardly read the text; it's quite small. Monospace fonts are great for programming, bad for reading. Animated gifs every 5 lines are really annoying.

Sorry!


I hate to be negative, but am doing so in hopes that the author will improve. I had the same experience. With the animated gifs every paragraph, it gives the feel of being one of those Buzzfeed articles where one has to click through to eleven different pages to see the whole list. Really couldn't read any of it due to the distraction and style choices.


Buzzfeed doesn’t use slideshows. All the content is on one page.


Not that it absolves the owner of the site, but Readability mode is built just for sites like this.

Still doesn't help the gifs though.


I found the gifs really obnoxious. Yes, we get it: it's an article about Mechanical Turk and the guy from Scrubs is named Turk. Wow! No need to repeat the same "joke" EIGHT times.


Sorry to ask, but what is readability mode and how do I use it? I googled, but due to SEO I get a ton of results.


Readability is a browser feature that extracts the text of a website and displays it in a pre-defined, easy to read way. Safari calls this feature “Reader”[1].

[1] https://support.apple.com/kb/PH21467?locale=en_US


Many browsers have a "reading" mode or extension that strips CSS and pictures. Safari Reader is built in, Firefox and Chrome have an extension called Readability that you can install.


It's built into Firefox as far as I know; it's on the right-hand end of the URL bar for me.


Holy cow I've used Firefox since it was called Mozilla browser and was owned by Netscape and I've never seen that a reader mode was built in.

I'm an idiot.


It's one of the best things about the browser on mobile phones. I never have an issue with weird JavaScript pop-ups or sites that are not mobile friendly.


Don't feel so bad, I'm in the same boat. I remember loading Netscape onto my 486 from a 3.5-inch floppy.


It's a Firefox feature. That feature alone made me switch to FF for browsing.


Also in Safari, it's the little 3-line icon in the URL bar.


I ended up using dev-tools to delete all the gifs. I can live with play-once-then-stop, but the continual repeated movement was distracting.


document.querySelectorAll('img').forEach((im) => { im.style.display = 'none'; })


+ document.body.style.backgroundColor = 'white'; = can read now


Just press ESC and the gifs will stop

EDIT: Seems like I'm years behind https://bugzilla.mozilla.org/show_bug.cgi?id=825486

It's stop() now.


I write bookmarklets for things like this. Someone already posted some code below. They have the benefit of being quick to write, bespoke to my needs and low overhead.
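A minimal sketch of that approach, combining the two snippets above into one bookmark. `toBookmarklet` and `declutter` are hypothetical names; like the one-liner above, it hides every `img`, not just the GIFs. Paste the printed `javascript:` URL in as a bookmark's address.

```javascript
// Turn a function into a bookmarklet URL you can save as a bookmark.
function toBookmarklet(fn) {
  // Wrap the source in an immediately invoked expression and URL-encode it
  // so spaces, quotes and '#' survive inside the bookmark URL.
  return "javascript:" + encodeURIComponent("(" + fn.toString() + ")()");
}

// The clean-up from the comments above: hide images, force a white background.
// (Only runs in a browser page; here we just stringify it.)
function declutter() {
  document.querySelectorAll("img").forEach((im) => {
    im.style.display = "none";
  });
  document.body.style.backgroundColor = "white";
}

console.log(toBookmarklet(declutter));
```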


I came here to say the same thing. I had to stop reading, the gifs were WAY too much.


I feel like the information in the article is incredibly useful, but like you, I found the gifs distracting to the point of absurdity. I've never seen this show, but is it good enough to justify strewing it all over your good information? The font choice and color definitely didn't help. I guess it's the whole signal-to-noise question in action.


I'm on an iPad with 10% battery, and the last thing I want is 10 looping childish gifs running for the 15 minutes I spend reading.


I have to concur with a lot of the others on here. I thought the content was interesting, and I really wanted to read it all and learn from it, but the colours, font sizes, and oh - those awful animated gifs just resulted in me reeling back and closing the browser window.


Very first thing I did was block the gifs with the ad blocker, they are insufferable.


Why the GIFS?!?!!?! Couldn't read for more than a minute. And I am a Scrubs fan.


Do you use the cover of a book to judge its contents? The design wasn't that bad (at least it's not a red background with blue text).

OP, don't get discouraged by one-off comments like this on HN. They rarely add any real value and are mostly just personal preferences.

Your content was very informative and valuable.


> Do you use the cover of a book to judge its contents?

This is more like judging a book by its poor type-setting.


I'm willing to forgive a lot, but that site is atrocious. I couldn't care less about 'well designed' (I'm fine with black and white with boring fonts), but gifs bouncing around all over the place combined with a difficult color scheme is too much.


We (the PLASMA research group at UMass, http://plasma.cs.umass.edu) developed a system called AutoMan specifically designed to automatically manage quality (as well as to automatically compute pay and time) for a wide variety of tasks. You basically invoke people as functions and it just works, with statistical guarantees (it also handles payment, etc. without any additional effort). Makes dealing with MTurk much nicer. It's best used from Scala but can also be used from Java.

http://automan-lang.com https://github.com/plasma-umass/AutoMan

Paper here on AutoMan, round one (CACM Research Highlight, 2015): http://cacm.acm.org/magazines/2016/6/202648-automan/abstract

Original paper, not behind a paywall (OOPSLA '12): https://people.cs.umass.edu/~emery/pubs/res0007-barowy.pdf

New features described here have been rolled into AutoMan: "VoxPL: Programming with the Wisdom of the Crowd" (CHI '17, to appear): https://people.cs.umass.edu/~emery/pubs/voxpl-chi.pdf
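To give a feel for what a "statistical guarantee" on crowd answers can mean, here is a deliberately simplified JavaScript sketch. It is not AutoMan's procedure; the papers above describe the real one, which also handles disagreement, pay and timeouts. It only asks how many unanimous workers you need before random guessing becomes an unlikely explanation. `workersForUnanimity`, `k` (the number of answer options) and `alpha` (the tolerated chance of accidental agreement) are hypothetical names.

```javascript
// Illustration only, NOT AutoMan's actual algorithm: if each worker guessed
// uniformly at random among k options, the chance that n workers all give the
// same answer is k * (1/k)^n = (1/k)^(n - 1). Find the smallest n that pushes
// that chance of accidental unanimity down to alpha or below.
function workersForUnanimity(k, alpha) {
  if (k < 2) throw new Error("need at least two answer options");
  let n = 1;
  while (Math.pow(1 / k, n - 1) > alpha) n++;
  return n;
}

console.log(workersForUnanimity(4, 0.05)); // 4
console.log(workersForUnanimity(2, 0.05)); // 6
```

Yes/no questions need more agreeing workers than four-way multiple choice, because chance agreement is far more likely with only two options.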


Wow that's really cool!

We're working on a lot of the same things at Scale API (www.scaleapi.com). Starting with a higher quality set of task-completers, and building in similar statistical guarantees for our tasks.

One of the things we work on is building quality for responses that are a little more complex (bounding boxes and audio transcription, for example). I'd be interested to see if we can apply some of your learnings to those task types!
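For bounding boxes, "agreement" can't mean exact equality. One common yardstick is intersection over union (IoU): the overlap area divided by the combined area. This is an assumption on my part, not something taken from Scale API's docs. Here is a sketch with boxes as hypothetical `{x, y, w, h}` objects and an assumed 0.5 agreement threshold:

```javascript
// Boxes are {x, y, w, h}: top-left corner plus width and height, in pixels.
function iou(a, b) {
  // Width and height of the overlapping rectangle (0 if the boxes don't touch).
  const ix = Math.max(0, Math.min(a.x + a.w, b.x + b.w) - Math.max(a.x, b.x));
  const iy = Math.max(0, Math.min(a.y + a.h, b.y + b.h) - Math.max(a.y, b.y));
  const inter = ix * iy;
  const union = a.w * a.h + b.w * b.h - inter;
  return union > 0 ? inter / union : 0;
}

// Treat two workers' boxes as agreeing when they overlap enough.
function boxesAgree(a, b, threshold = 0.5) {
  return iou(a, b) >= threshold;
}

const w1 = { x: 0, y: 0, w: 10, h: 10 };
const w2 = { x: 5, y: 0, w: 10, h: 10 };
console.log(iou(w1, w2));        // 0.3333333333333333 (50 / 150)
console.log(boxesAgree(w1, w2)); // false
```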


This is both amazing and a bit unnerving.


Matroska brain confirmed. But seriously, I was looking for a tool like this.


For US-based people, I suggest using Mechanical Turk as a worker before creating HITs as a Requester. See https://www.reddit.com/r/HITsWorthTurkingFor/ for decent examples.

It makes me sad that so much of the work available on Mechanical Turk is poorly designed, and that workers have little recourse against bad Requesters.


Definitely. That's one of our focuses at Scale API (www.scaleapi.com). We build the UIs and the tooling to make sure they're efficient and intuitive, to avoid the exact problems of requesters making poorly designed work.


You mention your business with a hyperlink four separate times in this thread. I have to wonder how much of your purpose in posting is informative after the second or third time.


This. And pay good turkers accordingly. The average HIT is comically badly paid. Do people think that paying 20 cents for a job that takes 15 minutes to do well will get them good work?


As a requester I'm aiming for the federal minimum wage, which, to my knowledge, is above the average rate on MTurk, but still very low of course. Most turkers are doing a tremendous job, but there are also quite a few bad apples who consistently try to cheat. It's amazing how much thought some people invest in finding ways to cheat when they could just do the freaking task. I'm not just losing data because of these people; I also have to screen all the data incredibly carefully. As a requester, I have to price this in, and that hurts honest turkers. I feel sorry about this, but I just can't justify paying as if everyone were doing great work.


So, a question...

If minimum wage is, say, $7.25 an hour, which is what you aim to pay turkers, but you end up throwing away 2/3 of your work (say you have 3 turkers do each piece to validate that they did it right), why not hire 1 local person for $15 an hour? Decent wage, less total money ($15 vs. about $22 an hour), and likely better results.
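Here is the arithmetic behind that comparison as a small sketch. Like the comment, it assumes each item is done by three workers and only one result is kept. `costPerUsefulHour` is a hypothetical name, and platform fees are ignored.

```javascript
// Cost of one hour of *kept* work when every item is done `copies` times.
function costPerUsefulHour(hourlyWage, copies) {
  return hourlyWage * copies;
}

const crowd = costPerUsefulHour(7.25, 3); // three turkers at federal minimum wage
const local = costPerUsefulHour(15, 1);   // one local hire, no duplication
console.log(crowd, local);                // 21.75 15
```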


> why not hire 1 local person

I didn't see an answer to this ... so I'll take a stab at it.

One advantage of using Turk is that it's on an as-needed basis. You can hire someone for 1 hour, 1 week, or even 1 month of what used to be called "piecework".

That's quite different than hiring a person locally. Where can you find local people to do piecework? It's quite likely the task or tasks envisioned don't constitute full time or even part time employment. They're just one-time tasks.

Maybe temp agencies still supply people for jobs like this? But you're limiting yourself to a small pool of people unless you're in a big metropolitan area or unless there's a University with a large group of hungry students nearby.


Two reasons: 1.) Data collection on MTurk is much faster because many people can work in parallel on the task. 2.) I'm doing academic surveys and need data from as many different people as possible.


AFAIK, you can filter out 'bad' turkers (by country, performance, etc.) and inform good ones that HITs are available. On the other hand, if you pay decently, turkers will be afraid to lose that income source and will do their best. (Disclaimer: I've made some HITs in the distant past.)
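Filtering by country and track record is done with qualification requirements on the HIT. The sketch below only builds the request data; it makes no API call. It assumes the MTurk API's `QualificationRequirements` shape and the system qualification IDs for worker locale (`00000000000000000071`) and percent of assignments approved (`000000000000000000L0`), as I remember them from the MTurk API reference. Double-check both there before relying on this. `buildQualifications` and its defaults are hypothetical.

```javascript
// Hypothetical helper: the QualificationRequirements list you would pass
// when creating a HIT. Only plain data is built here; no API call is made.
function buildQualifications({ countries = ["US"], minApprovalPct = 95 } = {}) {
  return [
    {
      // System qualification: worker locale (assumed ID, verify in the docs)
      QualificationTypeId: "00000000000000000071",
      Comparator: "In",
      LocaleValues: countries.map((country) => ({ Country: country })),
    },
    {
      // System qualification: percent of assignments approved (assumed ID)
      QualificationTypeId: "000000000000000000L0",
      Comparator: "GreaterThanOrEqualTo",
      IntegerValues: [minApprovalPct],
    },
  ];
}

console.log(JSON.stringify(buildQualifications({ countries: ["US", "CA"] }), null, 2));
```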


I recently tested MTurk for my startup. We set up about 500 HITs to collect website URL and email from various businesses. We set the price at $0.05 (Amazon takes an additional $0.01). Jobs quickly got started and within 24 hours we had all of our data collected.

I'm not sure I would do it again though. A lot of the businesses we were targeting don't have a web presence, and therefore "No URL/No Email" was a viable answer. However, when I went through the list and saw 150 "No URL/No Email" answers, I didn't know for sure whether they were true or whether the Turker had realized they could just copy/paste and make a quick buck. Amazon does provide the amount of time each worker spent on the task, so I rejected any that took less than 10 seconds, as I felt those workers didn't give it a good enough try. Beyond that, I just accepted the answers, realizing that some of them may be false.
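Here are the numbers above, plus the "reject anything under 10 seconds" rule, as a sketch. Amounts are in cents to avoid floating-point rounding. The assignment objects and their `secondsSpent` field are a hypothetical shape, not MTurk's own result format.

```javascript
// Total cost in cents: per-HIT reward plus Amazon's per-HIT fee.
function totalCostCents(numHits, rewardCents, feeCents) {
  return numHits * (rewardCents + feeCents);
}

// IDs of assignments finished faster than `minSeconds` (candidates to reject).
function tooFast(assignments, minSeconds = 10) {
  return assignments.filter((a) => a.secondsSpent < minSeconds).map((a) => a.id);
}

console.log(totalCostCents(500, 5, 1) / 100); // 30 (dollars)

const sample = [
  { id: "A1", answer: "No URL/No Email", secondsSpent: 4 },
  { id: "A2", answer: "example.com", secondsSpent: 95 },
];
console.log(tooFast(sample)); // [ 'A1' ]
```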

In the end I think I spent more time going through results and correcting them than it actually saved me. I'm excited to use MTurk in the future again, but only for appropriate projects.


I think you probably needed to have some redundancy, have each business checked at least twice.
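Here is a minimal sketch of that redundancy check. It assumes the answers different workers gave for one business are collected into an array of strings. An answer is accepted only when a strict majority agrees; otherwise the function returns `null` so the item can go to another worker. `majority` is a hypothetical helper, and it compares answers as exact strings.

```javascript
// Returns the answer given by more than half the workers, or null if none is.
function majority(answers) {
  const counts = new Map();
  for (const answer of answers) {
    counts.set(answer, (counts.get(answer) || 0) + 1);
  }
  for (const [answer, n] of counts) {
    if (n * 2 > answers.length) return answer;
  }
  return null;
}

console.log(majority(["example.com", "example.com"]));                    // example.com
console.log(majority(["example.com", "No URL/No Email"]));                // null: ask a third worker
console.log(majority(["example.com", "No URL/No Email", "example.com"])); // example.com
```

With only two workers per business, this reduces to "both must agree".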


are there any tools that help you do QA on MT data entry in a Bayesian way?

seems like the situation is ripe for such a tool

---------

start with a subset of questions you have a predetermined answer to, and only keep feeding questions to a person who has met a certain quality threshold on those questions

every so often feed them a QA question

every so often send the same question to someone else to check it for redundancy

seems like there is a lot you could do to adjust these based on Bayesian confidence intervals, and on exactly how mission critical you need certain data to be

maybe something like that already exists, idk

------------- (edit: is this what the Scale API does?)
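Here is a minimal sketch of the gold-question idea in a Bayesian style. It assumes a Beta(1, 1) prior on each worker's accuracy and uses a normal approximation to the Beta posterior for the lower bound; an exact Beta quantile would be better but needs a stats library. `posterior`, `keepWorker`, the 0.8 threshold and z = 1.645 (roughly a one-sided 95% bound) are illustrative assumptions.

```javascript
// Beta posterior for a worker's accuracy after scoring their gold answers.
// Starts from a Beta(priorA, priorB) prior; Beta(1, 1) is uniform.
function posterior(correct, wrong, priorA = 1, priorB = 1) {
  const a = priorA + correct;
  const b = priorB + wrong;
  const mean = a / (a + b);
  const variance = (a * b) / ((a + b) ** 2 * (a + b + 1));
  return { mean, sd: Math.sqrt(variance) };
}

// Keep feeding real questions only while an approximate lower bound on
// accuracy (mean - z * sd) stays at or above the required threshold.
function keepWorker(correct, wrong, threshold = 0.8, z = 1.645) {
  const { mean, sd } = posterior(correct, wrong);
  return mean - z * sd >= threshold;
}

console.log(keepWorker(20, 0)); // true
console.log(keepWorker(3, 0));  // false: too few gold answers to be sure yet
console.log(keepWorker(10, 5)); // false
```

Mixing in a gold question every so often just means updating `correct`/`wrong` as they arrive and re-checking `keepWorker`.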


Yup. My old company (www.crowdflower.com) has a platform set up to do this.


Please continue, this is a highly valuable comment.


You are definitely correct. This was just our first kick at the can. It was only $30 or so to test out the service. We learned a bit of info that we'll apply to future jobs (if we do it again). Such as, telling the Turker what to do if they don't find a website/email for the business.


Sorry to hear about that experience. You should consider using Scale API (www.scaleapi.com). This is a perfect use case for our data collection API: https://docs.scaleapi.com/#create-data-collection-task


Why does the "Scale" logo on this page not link to your home page? From that API documentation page I have no context about what Scale API is or what it does. The "Introduction" page isn't even that helpful if I've never heard of you before.

All that said, Scale API seems like a nice alternative to Mechanical Turk for some kinds of tasks.


It's a fair point. Lots of documentation pages or company blogs don't link to the company's main website, and it's super annoying.


We're working on redesigning and cleaning up all of our pages! Sorry about that


Very interesting. I will keep this in mind since we do have another 10,000 or so jobs that we'll need completed in the next while.

The pricing page isn't totally helpful. I'm not sure if I would fall under `Comparison $0.10` or `Data Collection $0.25/minute`... or if many of those prices apply to a single job.

Either way, I'll keep your product in mind.


Tbh I found scaleapi to be quite expensive, much more than implementing your own QA process on Turk. Also, if scaleapi goes down you have to do the work all over again. Scaleapi is nothing new; there have been others before, but they were gone as quickly as they came, and suddenly all the API code stopped working.

It was a bad enough experience that I'd never base my mission-critical product on a 3rd-party startup. Amazon could go bad too, but with a small startup it's almost expected.


Cool! We're working on a redesign of the pricing page—stay tuned :)


So why use Mechanical Turk in the first place? Turkers will work for a single penny in many cases.

Exactly what I was looking for: the cold, brutal logic of capitalism. It's all good if it's low-cost.

Even after all of this you will still get bad answers.

...yes, of course, you're not paying for quality, you're paying for quantity and to reduce your costs. If you were paying for quality you would put up a few posters on college campuses and pay more.


Those animated gifs are extremely distracting.


Dark background, low contrast, ugly font, and the GIFs.

It's almost like somebody was trying hard to make this article unreadable.


They actually prevented me from reading the article.


I didn't even see them.


Hey guys! I'm cofounder of Scale API (www.scaleapi.com), a YC S16 company building an API for human intelligence. We've been working to obviate the need to tune your system to work on products like MTurk, and instead have a really simple API that just works. We've worked to build technology to guarantee quality to our customers and build a simple developer experience.

I really respect your ability to work with MTurk and have it work for you guys. In our experience, it often takes significant effort to get anything remotely functional and reliable on MTurk. That's why we're building Scale :)


I listened to you on Software Engineering Daily! Haven't found a use for the service yet but I've been keeping you guys in the back of my mind in case a good use case comes up at work.


scaleapi is great but too expensive to justify not building out your own QA process. Also, you don't want to base your product on an API that may or may not exist in a few years.


Huge fan of scale, could never get MTurk to work at the quality I wanted.


Your profile pic gif is broken. It just shows a still picture of you.


There was a great talk by some AWS guys at the AWS re:Invent conference on ways to improve accuracy using ideas similar to cross-validation...

It's on iTunes - title is: "Getting to ground truth with Amazon web services mechanical Turk"

Video also available on YouTube: https://m.youtube.com/watch?v=vRtLdeNl7Tg


I tested out Mechanical Turk back in 2008. I think I was trying to get a YouTube video promoted on some site.

I still have some credits on the system.

I'm not even sure I could come up with a good use. For those with current usage experience: could I, in theory, create a task where people look up the best sights to see in the top 10 travel destinations?

Would this be a valid use case, and how would you deal with duplicates?
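On duplicates, a cheap first pass for free-text answers is to normalize each one (case, accents, punctuation, extra spaces) and count identical keys. This only merges near-identical strings: "Eiffel Tower" and "Tour Eiffel" stay separate. It also drops non-Latin letters, so adjust the character class for other scripts. `normalizeAnswer` and `tally` are hypothetical names.

```javascript
// Reduce an answer to a comparison key: lowercase, strip accents, and turn
// any run of non-alphanumeric characters into a single space.
function normalizeAnswer(s) {
  return s
    .toLowerCase()
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "")
    .replace(/[^a-z0-9]+/g, " ")
    .trim();
}

// Count answers by normalized key, most frequent first.
function tally(answers) {
  const counts = new Map();
  for (const raw of answers) {
    const key = normalizeAnswer(raw);
    if (key) counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()].sort((x, y) => y[1] - x[1]);
}

console.log(tally(["Eiffel Tower", "eiffel tower!", "  The Louvre ", "Café de Flore"]));
```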


If you have the credits, survey how popular it is to drink and turk. I have coworkers who think it's funny and claim it's popular in their younger peer group. That anecdote is not actual data. I don't know if it's popular or almost unheard of. I don't know if Amazon allows meta-gaming like this. Amazon does not tolerate outright identity theft (send me your SS number to verify your accuracy), but a simple question like "how many alcoholic drinks have you had today" is probably anonymous enough. Amazon spends a lot on advertising, so you can see why editors would not cover this story, assuming it were true (which it might not be).

Drink and Turk is a drinking game where you try to turk enough to pay for your alcohol. I've seen some non-cooperative play: trying to find the funniest HIT to make the group laugh, or giving the most ridiculous answer that still gets paid. Most turking is pretty boring, but you'd be surprised what alcohol, youth and a slightly warped mind can do. There is supposedly a weed variation on this game; you can guess its obvious single rule.


Want to cofound a bar? You get a tablet at each table, and have to earn enough credits to get the next drink. No need to pay cash, drinks will be paid right out of your turk balance.


You can literally ask anything.

Want another opinion on something you're thinking about?

Make a task asking people to write in their opinion.

Ask people for domain name ideas. Take two pictures of yourself, ask people which outfit looks better.

Having trouble writing something? Feeling uncreative? Ask people to come up with 3 ideas on some topic. Ask people to re-word a paragraph for you.

One thing to keep in mind is that you can ask for the same task to be done 10 times by 10 different people, and you should reward people when they put in at least a good-faith effort. It's a good idea to build in redundancy intentionally if the task is somewhat open-ended, as you may not get exactly what you want and will want the best efforts of a variety of people.


Does a turker make at least minimum wage? (8,50€/h here in Germany)


You could make above minimum wage if there were actually enough good HITs to choose from.

Someone somewhere must've written a guide about sourcing SEO articles from MTurk, because I used to write 150-word article summaries for between $0.25 and $0.50 per summary (more for fun than anything; I liked to try to go as fast as possible. They provide a link and you summarize it, even if the source article is less than 150 words, so it's more like rephrasing). There were usually only about 30 summaries on any given day, and the requesters were almost always different, which makes me think it was part of a guide or maybe someone's startup.

Some days there would be several hundred. I made over $15 in an hour on a few occasions, but it wouldn't be a consistent source of income.


It's very unlikely. I tried it a couple months ago just for the experience, and in the ~1.5 hours I spent, I may have made $5. I honestly don't understand why anyone would participate as a worker on the site.


That's more than the minimum wage where I live, and I am in Europe. I suppose there are many countries with people that would be happy to work for that rate.


From what I've read, there are 3 popular demographics that are happy working on MTurk: 1) people living in areas where a non-strenuous $5/hour job is hard to come by; 2) people who are bored and find MTurk a somewhat interesting way to make some spare cash (stay-at-home parents, the elderly, students); 3) people who are bored of sitting in front of a screen at work all day with nearly nothing to do (security guards, cashiers) and could use some extra cash.


In short, not often. It's likely that Master workers usually do, but because they are good rather than required to by policy.

For now, non-US-based people can't sign up for AMT as workers (I think they made this change 2 years ago).


I believe plenty of people do it for a leisure activity rather than to make a living. Sort of like putting puzzles together or surfing the web. Some workers may even be at a dull job (such as a tollbooth operator at night) and their boss doesn't mind.


I tried it once for image tagging, and at least at the time they showed the effective $/hr rate in the UI, and it was around $4-$5.


Afaik they get paid per HIT so it all depends on how effective they are.
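Since pay is per HIT, the effective hourly rate is the reward multiplied by the number of HITs you finish per hour. Here is a small sketch; `hourlyRate` is a hypothetical name, and the inputs are figures quoted earlier in the thread.

```javascript
// Effective hourly pay for a HIT worth `rewardUsd` that takes `secondsPerHit`.
function hourlyRate(rewardUsd, secondsPerHit) {
  return (rewardUsd * 3600) / secondsPerHit;
}

console.log(hourlyRate(0.25, 60).toFixed(2));     // 15.00 (a $0.25 summary every minute)
console.log(hourlyRate(0.2, 15 * 60).toFixed(2)); // 0.80 (20 cents for 15 minutes of work)
```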


Even Firefox's new reader view didn't get rid of most of those things. But it did fix the monospace font and contrast issues.


Can turkers be asked to install a Chrome extension or run JavaScript in their consoles?

Also, can you include JavaScript in the HIT HTML file?


Needs more animated gifs. What was the blog post about again?



