Yes, it will work. Remember, we also added padding. The output size of a convolution is determined by the input size, kernel size, stride and padding, so the layer still works when the input shape is different.
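To make that concrete, here is a minimal sketch of the usual output-size formula for a convolution. The function name `conv_output_size`, the kernel size of 3 and the input sizes 224 and 256 are just assumptions for illustration; they are not taken from the architecture being discussed.

```python
def conv_output_size(n, kernel, stride, padding):
    """Output length along one dimension of a convolution.

    Standard formula: floor((n + 2*padding - kernel) / stride) + 1
    """
    return (n + 2 * padding - kernel) // stride + 1


# Hypothetical layer: 3x3 kernel, stride 1, padding 1 ("same" padding).
# The spatial size is preserved for any input size.
same_224 = conv_output_size(224, kernel=3, stride=1, padding=1)
same_256 = conv_output_size(256, kernel=3, stride=1, padding=1)

# Hypothetical layer: 3x3 kernel, stride 2, padding 1.
# The spatial size is halved, again for any input size.
half_224 = conv_output_size(224, kernel=3, stride=2, padding=1)
half_256 = conv_output_size(256, kernel=3, stride=2, padding=1)

print(same_224, same_256)  # 224 256
print(half_224, half_256)  # 112 128
```

The same layer accepts both inputs; only the size of the resulting feature map changes.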
However, images nowadays carry more fine-grained features (i.e. better resolution -> bigger input size). I believe these researchers experimented with many different scenarios and architectures and settled on one that not only works well but can also be reused for other upcoming/future hands models.
Hope you got it! :D