The folks at Stanford in this video (https://www.youtube.com/watch?v=tbDDYKRFjhk) have a somewhat similar dataset, and they account for "code churn," i.e., reworking of AI output. I think they do so by tracking whether the same lines of code are changed again in subsequent commits. Maybe something to consider.
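If it helps, here's a rough sketch of what that kind of line-level churn metric could look like. It's my guess at the idea, not their actual method. It assumes each commit has already been boiled down to a map from file path to the set of line numbers it touched, and that line numbers don't shift between commits. Real diffs would need hunk remapping or `git blame` to handle insertions and deletions. `Commit`, `churn_rate`, and `window` are hypothetical names for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    """Hypothetical simplified commit: file path -> line numbers it touched.

    Real data would come from parsing diffs (e.g. `git log -p`).
    """
    sha: str
    touched: dict[str, set[int]] = field(default_factory=dict)


def churn_rate(commits: list[Commit], window: int = 3) -> float:
    """Fraction of touched lines that get touched again within the next
    `window` commits.

    Assumption: line numbers are stable across commits (no shifting from
    edits above them), which a real implementation would have to correct.
    """
    total = 0
    churned = 0
    for i, commit in enumerate(commits):
        later = commits[i + 1 : i + 1 + window]
        for path, lines in commit.touched.items():
            # Union of lines in the same file changed by the following commits.
            later_lines = set().union(*(c.touched.get(path, set()) for c in later))
            total += len(lines)
            churned += len(lines & later_lines)
    return churned / total if total else 0.0
```

For example, if commit `a1` touches lines 1 to 4 of `app.py` and the next commit rewrites lines 3 and 4, those two lines count as churned. One design caveat: the most recent commits have no "later" commits yet, so they'll always look churn-free, which biases the rate downward unless you only score commits that are at least `window` commits old.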