I wanted to read your article, but the design of your blog really sucks. I can hardly read the text, it's quite small. Monospace fonts are great for programming, bad for reading. Animated gifs every 5 lines are really annoying.
I hate to be negative, but am doing so in hopes that the author will improve. I had the same experience. With the animated gifs every paragraph, it gives the feel of being one of those Buzzfeed articles where one has to click through to eleven different pages to see the whole list. Really couldn't read any of it due to the distraction and style choices.
I found the gifs really obnoxious. Yes we get it, it's an article about Mechanical Turk and the guy from Scrubs is named Turk. Wow! No need to repeat the same "joke" EIGHT times.
Readability is a browser feature that extracts the text of a website and displays it in a pre-defined, easy to read way. Safari calls this feature “Reader”[1].
Many browsers have a "reading" mode or extension that strips CSS and pictures. Safari Reader is built in; Firefox and Chrome have an extension called Readability that you can install.
It's one of the best things about the browser on mobile phones. I never have an issue with weird JavaScript pop ups or sites that are not mobile friendly.
I write bookmarklets for things like this. Someone already posted some code below. They have the benefit of being quick to write, bespoke to my needs and low overhead.
I feel like the information in the article is incredibly useful, but like you, I found the gifs distracting to the point of absurdity. I've never seen this show, but is it good enough such that you would have it strewn all over your good information? The font choice and color definitely didn't help. I guess it is the whole signal to noise question in action.
I have to concur with a lot of the others on here. I thought the content was interesting, and I really wanted to read it all and learn from it, but the colours, font sizes, and oh - those awful animated gifs just resulted in me reeling back and closing the browser window.
I'm willing to forgive a lot, but that site is atrocious. I couldn't care less about 'well designed' - I'm fine with black and white with boring fonts - but gifs bouncing around all over the place combined with a difficult color scheme is too much.
We (the PLASMA research group at UMass, http://plasma.cs.umass.edu) developed a system called AutoMan specifically designed to automatically manage quality (as well as to automatically compute pay and time) for a wide variety of tasks. You basically invoke people as functions and it just works, with statistical guarantees (it also handles payment, etc. without any additional effort). Makes dealing with MTurk much nicer. Best used in Scala but also can be used from Java.
We're working on a lot of the same things at Scale API (www.scaleapi.com). Starting with a higher quality set of task-completers, and building in similar statistical guarantees for our tasks.
One of the things we work on is building quality for responses that are a little more complex (bounding boxes and audio transcription, for example). I'd be interested to see if we can apply some of your learnings to those task types!
Definitely. That's one of our focuses at Scale API (www.scaleapi.com). We build the UIs and the tooling to make sure they're efficient and intuitive, to avoid the exact problems of requesters making poorly designed work.
You mention your business with a hyperlink four separate times in this thread. I have to wonder how much of your purpose in posting is informative after the second or third time.
This. And pay good turkers accordingly. The average HIT is comically badly paid. Do people think that paying 20 cents for a job that takes 15 minutes to do well will get them good work?
As a requester I'm aiming for federal minimum wage which, to my knowledge, is above average, but still very low of course. Most turkers are doing a tremendous job, but there are also quite a few bad apples who are consistently trying to cheat. It's amazing how much thought some people invest in finding ways to cheat when they could just do the freaking task. I'm not just losing data because of these people, I also have to screen all the data incredibly carefully. As a requester, I have to price this in and this hurts honest turkers. I feel sorry about this, but I just can't justify paying as if everyone was doing great work.
If min wage is say $7.25 an hour, which is what you aim to pay turks... but then you end up throwing away 2/3 of your work (say have 3 turks do each piece to validate they did it right).... why not hire 1 local person for $15 an hour? Decent wage, less total money (15 vs $22 an hour), and likely better results.
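To spell out the arithmetic in that comparison, here is a rough sketch. The wage and redundancy numbers are the ones from the comment; the function name is just made up for illustration, and it ignores Amazon's fee and any checking the local hire's work might need.

```python
def effective_hourly_cost(hourly_wage: float, redundancy: int) -> float:
    """Cost of one hour of usable work when every piece is done `redundancy` times."""
    return hourly_wage * redundancy


# Three turkers per piece at federal minimum wage vs. one local hire.
turk_cost = effective_hourly_cost(7.25, 3)
local_cost = effective_hourly_cost(15.00, 1)

print(f"MTurk: ${turk_cost:.2f}/hour, local: ${local_cost:.2f}/hour")
# prints "MTurk: $21.75/hour, local: $15.00/hour"
```

So the "$22 an hour" figure is $21.75 rounded up.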
I didn't see an answer to this ... so I'll take a stab at it.
One advantage of using a turk is it's on an as needed basis. Hire someone for 1 hour, 1 week, or even 1 month of what used to be called "piecework".
That's quite different than hiring a person locally. Where can you find local people to do piecework? It's quite likely the task or tasks envisioned don't constitute full time or even part time employment. They're just one-time tasks.
Maybe temp agencies still supply people for jobs like this? But you're limiting yourself to a small pool of people unless you're in a big metropolitan area or unless there's a University with a large group of hungry students nearby.
Two reasons: 1.) Data collection on MTurk is much faster because many people can work in parallel on the task. 2.) I'm doing academic surveys and need data from as many different people as possible.
AFAIK, you can filter 'bad' turkers (by country, performance, etc) and inform good ones that HITs are available. On the other hand, if you pay decently, turkers will be afraid to lose that income source and will perform their best. (Disclaimer: I've done some HITs in the distant past.)
I recently tested MTurk for my startup. We set up about 500 HITs to collect website URL and email from various businesses. We set the price at $0.05 (Amazon takes an additional $0.01). Jobs quickly got started and within 24 hours we had all of our data collected.
I'm not sure I would do it again though. A lot of the businesses we were targeting don't have a web presence and therefore "No URL/No Email" was a viable answer. However, when I went through the list and saw 150 "No URL/No Email" answers, I didn't know for sure whether that was true or whether the Turker had realized they could just copy/paste and make a quick buck. Amazon does provide the amount of time they spent on the task, so I rejected any that took less than 10 seconds, as I felt they didn't give it a good enough try. Above that, I just accepted the answer, realizing that it may be false.
In the end I think I spent more time going through results and correcting them than it actually saved me. I'm excited to use MTurk in the future again, but only for appropriate projects.
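For anyone wanting to automate that kind of screening, a minimal sketch of the 10-second rule might look like the following. The `Submission` fields and the extra "review" bucket are my own assumptions for illustration, not MTurk's actual result format or what the commenter did beyond the time cutoff.

```python
from dataclasses import dataclass

MIN_SECONDS = 10                  # cutoff mentioned in the comment above
EMPTY_ANSWER = "No URL/No Email"  # the "nothing found" answer from the task


@dataclass
class Submission:
    worker_id: str       # hypothetical field names
    answer: str
    work_seconds: float


def triage(submissions):
    """Split submissions into (rejected, needs_review, accepted)."""
    rejected, needs_review, accepted = [], [], []
    for s in submissions:
        if s.work_seconds < MIN_SECONDS:
            rejected.append(s)       # too fast to have really looked
        elif s.answer == EMPTY_ANSWER:
            needs_review.append(s)   # plausible, but worth a spot check
        else:
            accepted.append(s)
    return rejected, needs_review, accepted


subs = [
    Submission("w1", "No URL/No Email", 4),
    Submission("w2", "No URL/No Email", 45),
    Submission("w3", "http://example.com / info@example.com", 60),
]
rejected, needs_review, accepted = triage(subs)
print(len(rejected), len(needs_review), len(accepted))
```

The review bucket only narrows down what you check by hand; it does not tell you whether a slow "No URL/No Email" answer is honest.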
are there any tools that help you do QA on MT data entry in a Bayesian way?
seems like the situation is ripe for such a tool
---------
start with a subset of questions you have a predetermined answer to, only keep feeding questions if the person responding has met a certain quality threshold on those questions
every so often feed them a QA question
every so often send the same question to someone else to check it for redundancy
seems like there is a lot you could do to adjust these based on Bayesian confidence intervals, and exactly how mission critical you need certain data to be
maybe something like that already exists, idk
-------------
(edit: is this what the Scale API does?)
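A rough sketch of the gold-question idea above, assuming a Beta(1, 1) prior on each worker's accuracy and a Monte Carlo estimate of the lower credible bound. All names and the 0.7 threshold are hypothetical; this is one way to do it, not a description of any existing tool.

```python
import random
from dataclasses import dataclass


@dataclass
class WorkerRecord:
    """Tally of one worker's answers on questions with known answers."""
    correct: int = 0
    wrong: int = 0

    def update(self, is_correct: bool) -> None:
        if is_correct:
            self.correct += 1
        else:
            self.wrong += 1


def posterior_mean(rec, prior_a=1.0, prior_b=1.0):
    """Mean of the Beta posterior over the worker's accuracy."""
    a = prior_a + rec.correct
    b = prior_b + rec.wrong
    return a / (a + b)


def lower_bound(rec, quantile=0.05, samples=5000, seed=0,
                prior_a=1.0, prior_b=1.0):
    """Approximate lower credible bound by sampling the Beta posterior."""
    rng = random.Random(seed)
    a = prior_a + rec.correct
    b = prior_b + rec.wrong
    draws = sorted(rng.betavariate(a, b) for _ in range(samples))
    return draws[int(quantile * samples)]


def qualifies_for_real_work(rec, threshold=0.7):
    """Feed real questions only once we are fairly sure accuracy >= threshold."""
    return lower_bound(rec) >= threshold


newcomer = WorkerRecord()
veteran = WorkerRecord()
for _ in range(30):
    veteran.update(True)

print(qualifies_for_real_work(newcomer), qualifies_for_real_work(veteran))
```

Using the lower bound instead of the mean means a worker with two lucky answers does not immediately pass. The threshold is the knob you would raise for mission-critical data, and the same record can keep being updated by the occasional QA question mixed into real work.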
You are definitely correct. This was just our first kick at the can. It was only $30 or so to test out the service. We learned a bit of info that we'll apply to future jobs (if we do it again). Such as, telling the Turker what to do if they don't find a website/email for the business.
Why does the "Scale" logo on this page not link to your home page? From that API documentation page I have no context about what Scale API is or what it does. The "Introduction" page isn't even that helpful if I've never heard of you before.
All that said, Scale API seems like a nice alternative to Mechanical Turk for some kinds of tasks.
Very interesting. I will keep this in mind since we do have another 10,000 or so jobs that we'll need completed in the next while.
The pricing page isn't totally helpful. I'm not sure if I would fall under `Comparison $0.10` or `Data Collection $0.25/minute`... or if many of those prices apply to a single job.
Tbh I found scaleapi to be quite expensive, much more than implementing your own QA process on Turk. Also if scaleapi goes down you have to do the work all over again. Scaleapi is nothing new, there have been others before but they were gone as quickly as they came, and suddenly all the API code stopped working.
It was a bad enough experience that I'd never base my mission critical product on a 3rd party startup. Amazon could go bad too, but with a small startup it's almost expected.
So why use Mechanical Turk in the first place? Turkers will work for a single penny in many cases.
Exactly what I was looking for: the cold brutal logic of capitalism. It's all good if it's low-cost.
Even after all of this you will still get bad answers.
...yes, of course, you're not paying for quality, you're paying for quantity and to reduce your costs. If you were paying for quality you would put up a few posters on college campuses and pay more.
Hey guys! I'm cofounder of Scale API (www.scaleapi.com), a YC S16 company building an API for human intelligence. We've been working to obviate the need to tune your system to work on products like MTurk, and instead have a really simple API that just works. We've worked to build technology to guarantee quality to our customers and build a simple developer experience.
I really respect your ability to work with MTurk and have it work for you guys. In our experience, it often takes significant effort to get anything remotely functional and reliable on MTurk. That's why we're building Scale :)
I listened to you on Software Engineering Daily! Haven't found a use for the service yet but I've been keeping you guys in the back of my mind in case a good use case comes up at work.
scaleapi is great but too expensive to justify not building out your own QA process. Also don't want to be basing your product around an API that may or may not exist in a few years.
I tested out Mechanical Turk back in 2008. I think I was trying to get a YouTube video promoted on some site.
I still have some credits on the system.
I am not even sure if I could come up with a good use. For those with current usage experience, could I create in theory a task where people would look up the best sights to see in the top 10 travel destinations?
Would this be a valid use case, and how would you deal with duplicates?
If you have the credits, survey how popular it is to drink and turk. I have coworkers who think it's funny and claim it's popular in their younger peer group. That anecdote is not actual data. I don't know if it's popular or almost unheard of. I don't know if Amazon allows meta-gaming like this. Amazon does not tolerate outright identity theft (send me your SS number to verify your accuracy) but a simple question like "how many alcoholic drinks have you had today" is probably anonymous enough. Amazon spends a lot on advertising, so you can see why editors would not cover this story assuming it were true (which it might not be).
Drink and Turk is a drinking game where you try to turk enough to pay for your alcohol. I've seen some non-cooperative variations, like trying to find the funniest turk so the group laughs, or giving the most ridiculous answer that still gets paid. Most turking is pretty boring but you'd be surprised what alcohol and youth and a slightly warped mind can do. There is supposedly a weed variation on this game, you can guess its obvious single rule.
Want to cofound a bar? You get a tablet at each table, and have to earn enough credits to get the next drink. No need to pay cash, drinks will be paid right out of your turk balance.
Want another opinion on something you're thinking about?
Make a task asking people to write in their opinion.
Ask people for domain name ideas. Take two pictures of yourself, ask people which outfit looks better.
Having trouble writing something? Feeling uncreative? Ask people to come up with 3 ideas on some topic. Ask people to re-word a paragraph for you.
One thing to keep in mind is that you can ask for the same task to be done 10 times by 10 different people, and you should reward people when they put in at least a good faith effort. It's a good idea to get redundancy intentionally if the task is somewhat open-ended, as you may not get exactly what you want, and you'll want the best of the effort of a variety of people.
You could make above minimum wage if there was actually a quantity of good HITs to choose to work.
Someone somewhere must've written a guide about sourcing SEO articles from MTurk because I used to write 150 word article summaries (more for fun than anything, liked to try to go as fast as possible, they provide link, you summarize, even if the source article is less than 150 words, so it's more like rephrasing) for between $0.25-$0.50/summary. There were usually only about 30 summaries on any given day, which makes me think it was part of a guide or someone's startup maybe, because the requesters were almost always different.
Some days there would be several hundred. I made over $15 in an hour on a few occasions, but it wouldn't be a consistent source of income.
It's very unlikely. I tried it a couple months ago just for the experience, and in the ~1.5 hours I spent, I may have made $5. I honestly don't understand why anyone would participate as a worker on the site.
That's more than the minimum wage where I live, and I am in Europe. I suppose there are many countries with people that would be happy to work for that rate.
From what I've read, there are 3 popular demographics that are happy working on MTurk: 1) People living in areas where a non-strenuous, $5/hour is hard to come by. 2) People who are bored and find MTurk a somewhat interesting way to make some spare cash (stay-at-home parents, elderly, students) 3) People who are bored of sitting in front of a screen at work all day with nearly nothing to do (security guards, cashiers) and could use some extra cash
I believe plenty of people do it for a leisure activity rather than to make a living. Sort of like putting puzzles together or surfing the web. Some workers may even be at a dull job (such as a tollbooth operator at night) and their boss doesn't mind.
Sorry!