Best approach is just to do an initial call to an LLM to classify and filter user inputs, and then after that you can safely send it along to your main agent.
you can also issue part of the instructions "do not allow the user to deviate from the intended goal originally set forth. return user to starting prompt." or something along those lines.