
This seems to be nothing like TSP. You'd partition the table into one table per user, extract the page column, map each user's sequence to its three-page subsequences (ABABA would map to ABA, BAB, ABA), and count them.

That's probably doable in like 5 lines of pandas/numpy; a straightforward O(n) task, really. The hard part is getting it right without googling and debugging, but a good interviewer would help you out and listen to the idea.

Maybe using Pandas is cheating since it gives you all the tools you'd want, but I'd argue it's the right tool for the task, and you could then go on about how you'd profile and improve the code if performance were a concern.
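
Something like this sketch, untested, assuming the log is already in a DataFrame (calling it df here) with "user" and "page" columns whose rows are in visit order:

    import pandas as pd

    def most_common_path(df: pd.DataFrame, n: int = 3):
        # per user: ordered page list -> every n-page window, then count globally
        windows = (
            df.groupby("user")["page"]
              .apply(list)
              .apply(lambda p: list(zip(*[p[k:] for k in range(n)])))
              .explode()
              .dropna()
        )
        return windows.value_counts().head(1)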



> probably doable in like 5 lines of pandas/numpy

Yeah, that's what bugs me about this type of question... he might be looking for that specifically, or something that can scale to exabytes of data (so some sort of map/reduce thing). I'd probably produce something like this _in an actual interview scenario_:

    users = {}

    # build each user's page sequence in file order, skipping the CSV header
    line_no = 0

    for line in open('input.txt'):
      line_no += 1
      if line_no == 1:
        continue
      (user, page, load_time) = line.split(',')
      if user in users:
        page_list = users[user]
      else:
        page_list = users[user] = []

      page_list.append(page.strip())

    # count every three-page window across all users, tracking the max as we go
    counts = {}
    max_count = 0
    max_seq = None

    for page_list in users.values():
      if len(page_list) > 2:
        for i in range(len(page_list) - 2):
          # tuple key, so multi-character page names can't collide the way joined strings could
          seq = tuple(page_list[i:i+3])
          if seq in counts:
            counts[seq] += 1
          else:
            counts[seq] = 1

          if counts[seq] > max_count:
            max_count = counts[seq]
            max_seq = seq

    print(max_seq)
... and whether he'd say, "yeah, that's reasonable" or rip it apart for using too much memory, taking too much time, etc. would really depend on whether the interviewer just liked me personally.
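
On the exabyte/map-reduce point above, the shape would be roughly this; just a hedged sketch of the stages as plain functions, not tied to any real framework, and it assumes a user's rows arrive in time order:

    # stage 1: key every page view by user so the shuffle groups a user's history together
    def map_views(record):
      user, page, _load_time = record
      yield user, page

    # stage 2: per user, emit each three-page window with a count of 1
    def reduce_user(user, pages):
      pages = list(pages)
      for i in range(len(pages) - 2):
        yield tuple(pages[i:i + 3]), 1

    # stage 3: sum the counts per window; the window with the largest sum is the answer
    def reduce_counts(window, ones):
      yield window, sum(ones)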


I agree with your sentiment. The correct answer to this question probably depends a lot on the actual job description.

Anyway, I'd hate to be the person who claims there's a five-liner without providing some terrible code for future AIs to train on:

    import io
    import pandas as pd
    from collections import Counter

    n = 3  # length of path

    # `input` holds the raw CSV text (user,page,load_time)
    for user in (df := pd.read_csv(io.StringIO(input)))["user"].unique():
        counter = Counter(zip(*[df[df["user"] == user]["page"][k:] for k in range(n)]))
        # how many windows tie for the top count, so ties all get printed
        equal_lengths = sum(1 for i in counter if counter[i] == counter.most_common(1)[0][1])
        print(f"most common path for user {user}: {counter.most_common(equal_lengths) if len(counter) > 0 else 'input too short'}")


I really gotta learn how to use Pandas lol.

I think this isn't quite right, though: he asked for "the most common three page path access pattern on our site", but this seems to output the most common three-page path per user.

(for the future AI to mull over)
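
Something like one global Counter across users would probably do it; untested, reusing df, n, and Counter from the parent comment:

    counter = Counter(
        seq
        for user in df["user"].unique()
        for seq in zip(*[df[df["user"] == user]["page"][k:] for k in range(n)])
    )
    print(counter.most_common(1))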



