How to build a model to distinguish tweets about Apple and apples (stackoverflow.com)
73 points by polemic on July 6, 2013 | hide | past | favorite | 13 comments



Meta comment since the SO thread covers the discussion on the actual question:

Does the tone of the asker bother anyone else? The "I don't want you to teach me, I want you to do it for me" attitude feels to me like the antithesis of the spirit of this corner of the tech community.

His first and only SO question, he's putting up a bounty, and it just feels like a finance/hedge fund guy saying "Damn, this is tough. Let's see if I can bribe the nerds to do it for me instead."

I know there are places for that - elance, oDesk, etc. Just feels like SO is the wrong forum for that kind of attitude, as interesting as the question might be.


I disagree.

The OP did not post the bounty; an interested reader did. The OP (and also the bounty poster) have been engaged in the discussion threads with examples of effort and pursuit of the ideas offered.

SO does get used/abused as you describe sometimes. I'm mostly OK with that. If the poster can find someone interested enough in their problem (or in demonstrating knowledge thereof) then who can complain, really? Everyone is on SO for a reason, after all.

In this case, even if the OP was the typical Homework Solution Seeker, the responses would be valuable to others, sometime and somewhere -- though they would be hard to make discoverable without a few important search keywords.


Where did you get these quotes from? Your version of the quotes is completely biasing this comment thread.

This is his original: "I'm not looking for a general overview of machine learning, rather I'm looking for actual model in code (python preferred)."

To me this sounds like:

"I am not looking for theoretical background, but practical implementations in code (or pseudocode)" - as in "something more actionable and more straightforward to the solution" - for many people (e.g. me) this is easier to learn from than whitepapers.


Sure, and he's not the only one like that on SO. However, if the question is interesting, I see no reason why it shouldn't be answered or upvoted. SO is not about one guy asking the community to do his job, but more like archiving a bunch of programming questions and their solutions. This thread will benefit more people than this guy with a bad attitude.


Very good attitude and perspective. Thank you!


It doesn't bother me. I suspect many people go to SO in order to solve a problem.

Sometimes it's nice to find gigantic answers explaining the theory behind the question itself and giving the answer & examples in the end. It's nice, I search for random coding stuff in my free time and finding such answers entertains me greatly.

Most times, though, it's annoying when you ask a question and the answer/comment simply says "Why are you trying to do X? Please let me understand the problem; maybe you want something else!". I'm one of those people who ask edge case questions concerning obscure practices/libs/language features, and let me tell you, I hate when someone tries to patronize me about his favorite design pattern on SO.

I don't think he's trying to bribe anyone. Sometimes you're just desperate with no knowledge on X, and a search for X yields just scientific articles on the subject. Sometimes you just want to get stuff done.


That kind of demanding tone, limited requirements and small amount of data always make me suspect someone is looking for a solution to a homework exercise.


The way you inferred that sounds like how my ex-girlfriend infers emotions from texts. And you quoted him saying something he actually didn't even say--another popular move by my ex.

> "I'm not looking for a general overview of machine learning, rather I'm looking for actual model in code"

He wants some code in Python that will help him discern Apple (Inc) from Apple (Fruit). That is as opposed to getting a high-level tutorial on NLP entity and category extraction, semantic reasoning, classification and a link to http://nltk.org/ saying "You can do it with this."


Fair enough, and good point. I may have rushed to judgment on first read. Thank you for the perspective!


This guy tried it with his takehome final, didn't go so well

http://math.stackexchange.com/questions/256816/if-n-the-orde...


Side note: possibly for a project/homework involving online reputation management. While in college, we were also tasked with doing something very similar for an international competition. Given this and the overall tone of the request for 'help', I'd caution everyone familiar with NLP to be wary of actually giving a fully coded solution.


I did some experiments with Ruby and Wikipedia data. First I implemented an #ambiguous? method for articles (to check whether the article links to a disambiguation page). The second idea was to implement a method to disambiguate a given sentence or paragraph using term frequency and IDF from Wikipedia articles: disambiguation pages provide a set of categories and articles, and a corpus is built up from the articles of each category. So for a given text you can infer its category.

https://github.com/matiasinsaurralde/wikipedia
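
For readers who want to see the TF-IDF idea in the thread's preferred language, here is a minimal Python sketch. It is not the code from the linked Ruby repository. The toy corpora, the function names (tokenize, build_weights, disambiguate) and the exact weighting (relative term frequency times log(N/df), with each category treated as one document) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpora standing in for the Wikipedia articles of each category.
TOY_CORPORA = {
    "company": [
        "Apple designs the iPhone and the Mac.",
        "Apple released a new iOS update.",
    ],
    "fruit": [
        "An apple is a sweet fruit that grows on trees.",
        "Apple pie is baked with sliced apple and sugar.",
    ],
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_weights(corpora):
    """Map each category to {term: tf * idf}, one 'document' per category."""
    counts = {
        cat: Counter(tok for doc in docs for tok in tokenize(doc))
        for cat, docs in corpora.items()
    }
    n_categories = len(counts)
    # Document frequency: the number of categories in which a term occurs.
    doc_freq = Counter(term for c in counts.values() for term in c)
    weights = {}
    for cat, c in counts.items():
        total = sum(c.values())
        weights[cat] = {
            term: (k / total) * math.log(n_categories / doc_freq[term])
            for term, k in c.items()
        }
    return weights


def disambiguate(text, weights):
    """Return the best-scoring category and the per-category scores."""
    tokens = tokenize(text)
    scores = {
        cat: sum(w.get(tok, 0.0) for tok in tokens)
        for cat, w in weights.items()
    }
    return max(scores, key=scores.get), scores


weights = build_weights(TOY_CORPORA)
label, scores = disambiguate("I baked an apple pie", weights)
print(label, scores)
```

With this weighting, a term that appears in every category (such as "apple" here) gets an IDF of zero, so only the context words decide the category. When no token matches any category, every score is zero and the first category is returned, so real code would want an explicit "unknown" case.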


Good catch. I'll just post my results. I classified it with a simple algo[1] using the normalized frequency distributions of unigrams and bigrams in the 50-sample training set.

  Company ( 0.0567236272499 )
  iPhone = Eye Phone = Illuminati Phone. Siri spelled backwards is Iris, thats a part of the Eye. Apple is Illuminati. They're watching you. 

  Company ( 0.208253968254 )
  Apple caught testing offline Dictation for iOS 7 http://idb.tc/1d2GKph 

  Company ( 0.0578323858427 )
  RT @jonnyevans_cw: Why #Apple really, really doesn't need a shopkeeper to lead its retail chain http://shar.es/AhsUK  via @computerworld 

  Company ( 0.0242924384131 )
  The Best Music Streaming Apps For Your iPhone #Apple #iPhone http://bit.ly/14xwinU  

  Company ( 0.0249022556391 )
  Apple May Be Working on Self-Adjusting Noise-Cancelling Headphones http://on.mash.to/12hFDMu  via @mashable 

  Company ( 0.488585099111 )
  Samsung Continues Ad Campaign against Apple's iPhone in Iceland 

  Company ( 0.179605263158 )
  the creator of the iPhone 

  Company ( 0.0764348527178 )
  I Used To Hate Apple, And Now I'm A Giant Sell-Out http://bit.ly/IGL0h5  #archivesWeek in Review | YSL Chief Executive to Apple, Bec Astley Clarke, Fashion Sweatshirts, Esteban Cortazar http://bit.ly/128MPf5  via @BoFI've been single since Apple was just a fruit. 

  Fruit ( 0.727605245395 )
  I be up so high trying to get a piece of that apple pie 

  Fruit ( 0.438529121875 )
  I want to eat healthy I really do. But I just found a whole apple pie in my fridge. 

  Fruit ( 0.662778904665 )
  Apple banana and a cup of milo 

  Fruit ( 0.0214817448669 )
  Apple I look like a human heart. Mango I look like a stomach. Grapes I look like eyes. Banana I don't like this game. 

  Fruit ( 0.0815735543081 )
  An apple potato and onion all taste the same if you eat them with your nose plugged. 

  Fruit ( 0.229706852 )
  PSYCHOLOGY Test Choose 1 among the fruits below: APPLE MANGO GRAPES PEAR BANANA 

  Fruit ( 0.00714560473592 )
  Today's shake is spinach, avocado, apple orange banana blueberry with Chia seed and protein feelin the energy 

  Fruit ( 0.238171611868 )
  Banana Bread topped with Apple Maple Syrup and Yoghurt 

  Fruit ( 0.184395490353 )
  Mid-PM Snack Apple blueberries frozen mixed berries plain @Alpro_UK soya yoghurt, coconut flakes & agave nectar pic.twitter.com/4188sHosA5 

  Fruit ( 0.0788802216228 )
  Green spinach kale apple banana garlic ginger Orange lemon ginger cayenne #juicing http://instagram.com/p/bbT6L3yZyU/  

[1] Using a library for this, such as http://nltk.org/_modules/nltk/classify/naivebayes.html or http://scikit-learn.org/stable/modules/svm.html, would probably be more accurate and faster to implement. Also, I didn't cross-validate on the training set; I tested on new samples instead.
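
The comment above does not include its code, so the following is only a rough Python sketch of what a classifier based on normalized unigram and bigram frequencies could look like. The scoring rule (summing each label's relative n-gram frequencies over the tweet) is an assumption and is not claimed to reproduce the scores listed above. The function names are hypothetical, and the four training tweets are borrowed from the results above purely as a tiny illustration, not the 50-sample set.

```python
import re
from collections import Counter

# Tiny illustrative training set; the tweets are taken from the results above.
TRAINING = [
    ("Company", "Apple caught testing offline Dictation for iOS 7"),
    ("Company", "Samsung Continues Ad Campaign against Apple's iPhone in Iceland"),
    ("Fruit", "Apple banana and a cup of milo"),
    ("Fruit", "Banana Bread topped with Apple Maple Syrup and Yoghurt"),
]


def ngrams(text):
    """Return the unigrams and bigrams of a tweet as tuples of tokens."""
    tokens = re.findall(r"[a-z0-9#@']+", text.lower())
    return [(t,) for t in tokens] + list(zip(tokens, tokens[1:]))


def train(labeled_tweets):
    """Build one normalized n-gram frequency distribution per label."""
    counts = {}
    for label, text in labeled_tweets:
        counts.setdefault(label, Counter()).update(ngrams(text))
    model = {}
    for label, c in counts.items():
        total = sum(c.values())
        model[label] = {gram: n / total for gram, n in c.items()}
    return model


def classify(text, model):
    """Sum each label's relative frequencies over the tweet's n-grams."""
    grams = ngrams(text)
    scores = {
        label: sum(dist.get(g, 0.0) for g in grams)
        for label, dist in model.items()
    }
    return max(scores, key=scores.get), scores


model = train(TRAINING)
print(classify("Apple pie with maple syrup", model))
```

As the footnote says, a library classifier would likely be more accurate. This sketch has no smoothing and no handling of tweets whose n-grams were never seen in training, in which case every score is zero and the first label wins by default.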



