Pairs of resumes and job descriptions with binary labels: one if the hired person was a good fit for the job, zero otherwise. Of course, to compile such a dataset you would need to retroactively analyze hiring decisions: "Person with resume X was hired for job Y Z years ago, did it work out or not?" Not many companies do such analyses.
The question then is whether to fine-tune an autoregressive LLM or to use embeddings and attach a linear head to predict the outcome. Probably the latter.
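A minimal sketch of the embeddings-plus-linear-head option, assuming sentence-transformers and scikit-learn are available; the encoder name, the concatenated-embedding features, and the `resumes`/`jobs`/`labels` inputs are all illustrative, not a prescribed pipeline.

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder would do

def featurize(resumes, jobs):
    # Encode both texts and concatenate so the linear head sees the pair.
    r = encoder.encode(resumes, normalize_embeddings=True)
    j = encoder.encode(jobs, normalize_embeddings=True)
    return np.hstack([r, j, r * j])  # elementwise product adds a cheap interaction term

# resumes, jobs: lists of strings; labels: 1 = good fit, 0 = not (placeholders here)
X = featurize(resumes, jobs)
X_train, X_val, y_train, y_val = train_test_split(X, labels, test_size=0.2, stratify=labels)

head = LogisticRegression(max_iter=1000)
head.fit(X_train, y_train)
print("val AUC:", roc_auc_score(y_val, head.predict_proba(X_val)[:, 1]))
```

The appeal of the linear head is that it trains on a few hundred labeled pairs without GPU time, and the validation AUC gives you an immediate read on whether the labels carry any signal at all.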
You could also create viable labels without real-life hires. Have a panel of 3 expert judges and give them a pile of 300 CVs, and there's your training data. The model is then answering the easier question "would a human have chosen to pursue an interview given this information?", which more closely maps to what you're trying to have the model do anyway.
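A tiny sketch of how those panel judgments could become labels, assuming each of the 3 judges marks every CV as interview-worthy or not; the array layout and the majority-vote rule are just one way to do it.

```python
import numpy as np

# One row per CV, one column per judge; 1 = "would pursue an interview", 0 = "would not".
judgments = np.array([
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
])

# Majority vote of the 3 judges becomes the binary training label.
labels = (judgments.sum(axis=1) >= 2).astype(int)  # -> [1, 0, 1]
```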
Then deploy the model only as a low-confidence first-pass filter, removing the bottom 40% of CVs, rather than attempting the far harder task of having it accurately give you the top 10%.
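As a sketch of that first-pass filter, reusing the illustrative `head` and `featurize` from above (both assumptions, not anything from the original): score every CV against the job and drop only the bottom 40% by predicted probability.

```python
import numpy as np

def first_pass_filter(resumes, job_description, drop_fraction=0.4):
    # Score each resume against the single job description.
    X = featurize(resumes, [job_description] * len(resumes))
    scores = head.predict_proba(X)[:, 1]
    # Keep everything at or above the 40th-percentile score; only the clearly weak CVs are removed.
    cutoff = np.quantile(scores, drop_fraction)
    return [r for r, s in zip(resumes, scores) if s >= cutoff]
```

Cutting from the bottom keeps the model's mistakes cheap: a false negative in the weakest 40% costs far less than a false positive in a claimed top 10%.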
But this is more work than writing a 200-word system prompt, appending the resume, and asking ChatGPT, and nobody in HR will be able to tell the difference.
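For comparison, that low-effort baseline looks roughly like this, assuming the OpenAI Python client; the model name, prompt, and function name are placeholders.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = "You screen resumes against a job description. Answer YES or NO with one sentence of reasoning."  # the ~200-word prompt would go here

def screen_with_chatgpt(resume_text, job_description, model="gpt-4o"):
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Job description:\n{job_description}\n\nResume:\n{resume_text}"},
        ],
    )
    return response.choices[0].message.content
```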
I understand at both a very high and a very low level how LLMs are trained - can someone here help me better understand the middle?
I understand how one could build a training set of CVs, job descriptions, and outcomes. How much data would be needed to create a training set large enough to influence the model and a validation set large enough to confirm adequate performance?
One problem with any method like this is that it isn't a single-player game: there are lots of companies that create AI-generated resumes for applicants, and they also have data about who gets hired and who doesn't.