The claim that they are doing a clean-room implementation is bullshit. The only way any of these models can produce working code is by being trained on every bit of code that could be scraped from the internet. Unless the project you are cloning was released after the model was trained, the model was trained on that project's code. It may be a tiny fragment of the training data, but the model still saw it.