GitHub is struggling to contain an ongoing attack that’s flooding the site with millions of code repositories. These repositories contain obfuscated malware that steals passwords and cryptocurrency from developer devices, researchers said.
The malicious repositories are clones of legitimate ones, making them hard to distinguish to the casual eye. An unknown party has automated a process that forks legitimate repositories, meaning the source code is copied so developers can use it in an independent project that builds on the original one. The result is millions of forks with names identical to the original one that add a payload that’s wrapped under seven layers of obfuscation. To make matters worse, some people, unaware of the malice of these imitators, are forking the forks, which adds to the flood.
Whack-a-mole
“Most of the forked repos are quickly removed by GitHub, which identifies the automation,” Matan Giladi and Gil David, researchers at security firm Apiiro, wrote Wednesday. “However, the automation detection seems to miss many repos, and the ones that were uploaded manually survive. Because the whole attack chain seems to be mostly automated on a large scale, the 1% that survive still amount to thousands of malicious repos.”
Given the constant churn of new repos being uploaded and GitHub’s removal, it’s hard to estimate precisely how many of each there are. The researchers said the number of repos uploaded or forked before GitHub removes them is likely in the millions. They said the attack “impacts more than 100,000 GitHub repositories.”
GitHub officials didn’t dispute Apiiro’s estimates and didn’t answer other questions sent by email. Instead, they issued the following statement:
GitHub hosts over 100M developers building across over 420M repositories, and is committed to providing a safe and secure platform for developers. We have teams dedicated to detecting, analyzing, and removing content and accounts that violate our Acceptable Use Policies. We employ manual reviews and at-scale detections that use machine learning and constantly evolve and adapt to adversarial tactics. We also encourage customers and community members to report abuse and spam.
Supply-chain attacks that target users of developer platforms have existed since at least 2016, when a college student uploaded custom scripts to RubyGems, PyPi, and NPM. The scripts bore names similar to widely used legitimate packages, but otherwise had no connection to them. A phone-home feature in the student’s scripts showed that the imposter code was executed more than 45,000 times on more than 17,000 separate domains, and more than half the time his code was given all-powerful administrative rights. Two of the affected domains ended in .mil, an indication that people inside the US military had run his script. This form of supply-chain attack is often referred to as typosquatting, because it relies on users making small errors when choosing the name of a package they want to use.
In 2021, a researcher used a similar technique to successfully execute counterfeit code on networks belonging to Apple, Microsoft, Tesla, and dozens of other companies. The technique—known as a dependency confusion or namespace confusion attack—started by placing malicious code packages in an official public repository and giving them the same name as dependency packages Apple and the other targeted companies use in their products. Automated scripts inside the package managers used by the companies then automatically downloaded and installed the counterfeit dependency code.
The technique observed by Apiiro is known as repo confusion.
“Similar to dependency confusion attacks, malicious actors get their target to download their malicious version instead of the real one,” Wednesday’s post explained. “But dependency confusion attacks take advantage of how package managers work, while repo confusion attacks simply rely on humans to mistakenly pick the malicious version over the real one, sometimes employing social engineering techniques as well.”