Devs unknowingly use “malicious” modules snuck into official Python repository

The official repository for the widely used Python programming language has been tainted with modified code packages, a computer security authority in Slovakia warned. The authority also said the packages have been downloaded by unwitting developers who incorporated them into software over the past three months.

Multiple code packages were uploaded to the Python Package Index, often abbreviated as PyPI, and were subsequently incorporated into software multiple times from June through this month, Slovakia's National Security Authority said in an advisory published Thursday. The unidentified people who made available the code packages gave them names that closely resembled those used for packages found in the standard Python library. The packages contained the exact same code as the upstream libraries except for an installation script, which was changed to include a "malicious (but relatively benign) code."

"Such packages may have been downloaded by unwitting developer[s] or administrator[s] by various means, including the popular 'pip' utility (pip install urllib)," Thursday's advisory stated. "There is evidence that the fake packages have indeed been downloaded and incorporated into software multiple times between June 2017 and September 2017."

Officials with the Slovak authority said they recently notified PyPI administrators of the activity, and all identified packages were taken down immediately. Removal of the infected libraries, however, does nothing to purge them from servers that installed them. The authority advised developers and administrators to check whether any of their servers are relying on the tainted packages. The advisory provided the specific commands that can be used to perform the check. In the event infected packages are found, administrators should remove them immediately and replace them with the proper package.
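The advisory's own commands are not reproduced here. As a rough sketch of the same kind of check, the Python below lists the distributions installed in the current environment and flags any whose name is on a watch list. Only the name `urllib` comes from the advisory text quoted above; the helper names `normalize` and `find_suspect_packages`, and the idea of supplying your own watch list, are assumptions made for illustration.

```python
import re
from importlib import metadata


def normalize(name):
    """Lowercase a project name and treat '-', '_' and '.' as equivalent."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_suspect_packages(watch_list, installed=None):
    """Return the sorted, normalized installed names that appear on watch_list.

    `installed` defaults to the distributions found in the current environment;
    pass a list of names explicitly to check some other inventory.
    """
    if installed is None:
        installed = [d.metadata.get("Name") for d in metadata.distributions()]
    wanted = {normalize(n) for n in watch_list}
    present = {normalize(n) for n in installed if n}
    return sorted(present & wanted)
```

Called as `find_suspect_packages(["urllib"])`, this inspects the running interpreter's environment, and an empty list means none of the watched names is installed there. A hit shows only that something by that name is installed, so the package contents still need to be inspected before deciding it is one of the tainted uploads.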

Shortly after Thursday's advisory went live, researcher and activist Benjamin Bach and freelance journalist Hanno Böck reported that they were able to seed PyPI with more than 20 packages named after libraries that are part of the Python standard library. They, too, modified the package installation files, in this case with a script that caused developers to briefly connect to a server that recorded each developer's IP address. Within minutes, the server reported the libraries were being installed. Results the pair published showed the packages were downloaded almost 7,000 times over a two-day period.

A case of mistaken identity

The problem is that packages in the standard Python library should originate only from their official source, rather than being downloaded from third-party repositories that store packages developed by non-official sources. Thursday's advisory and the results published on Friday demonstrate that a significant number of developers are ignoring this best practice and, in the process, could be jeopardizing the security of the resulting software. For instance, if a developer were to accidentally use a rogue pseudo-random number generator instead of Python's official secrets module, an app's cryptographic functions might be easy for attackers to defeat.

"It's a very easy way to compromise many systems in a short time," Böck told Ars on Friday. "Ultimately, this comes down to the problem that everyone can upload to PyPI. Right now, this problem is completely ignored by the Python+PyPI people. We need at least to start a discussion about what the best solution should be."

By Saturday morning, PyPI administrators had removed the top 20 most-downloaded packages posted by Bach and Böck. It wasn't clear if PyPI was preventing new packages from using those names. Attempts to reach PyPI administrators weren't immediately successful.

The incidents closely resemble an attack carried out last year in a research experiment by a college student in Germany. As part of his bachelor thesis, University of Hamburg student Nikolai Philipp Tschacher uploaded packages to PyPI and two other repositories. The packages used names that were similar to widely used packages already submitted by other users. They also contained code that tracked the developers. Over a span of several months, his imposter code was executed more than 45,000 times on more than 17,000 separate domains, and more than half the time his code was given all-powerful administrative rights. Two of the affected domains ended in .mil, an indication that people inside the US military had run his script.
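Look-alike names of this kind can be flagged mechanically. The following is a minimal sketch, not the method used in the thesis: it compares a candidate name against a list of known package names using the standard library's `difflib`. The function name `lookalikes`, the 0.8 similarity cutoff, and the example names are all assumptions chosen for illustration.

```python
import difflib


def lookalikes(candidate, known_names, cutoff=0.8):
    """Return known names that candidate closely resembles but does not equal."""
    candidate = candidate.lower()
    known = [n.lower() for n in known_names]
    if candidate in known:
        return []  # an exact match is a known name, not a look-alike
    # get_close_matches ranks names by SequenceMatcher similarity ratio.
    return difflib.get_close_matches(candidate, known, n=3, cutoff=cutoff)
```

A transposed spelling such as `reqeusts` is reported as resembling `requests`, while an exact match or an unrelated name yields an empty list. A check like this can only prompt a closer look; it says nothing about whether a similarly named package is actually hostile.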

Bach and Böck started their project after discovering that many of the package names used in Tschacher's experiment had since become available again on PyPI, freeing the way for anyone else to offer malicious packages that used the same names.

"Benjamin [Bach] tried to tell both the Python security team and the PyPI devs about it and got no reaction," Böck said.

The problem is ultimately the result of developers and administrators who fail to inspect packages thoroughly. Adding to the insecurity, the widely used pip package management system, which most Python developers rely on, doesn't require a cryptographic signature before executing code when a package is installed. Böck said PyPI is currently blocking use of the packages he and Bach used but that a more comprehensive solution still needs to be worked out.
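Signatures aside, a partial safeguard is to pin the expected hash of each package file and refuse anything that does not match, which is the idea behind pip's `--require-hashes` mode. The sketch below illustrates that idea with `hashlib`; it is hash pinning, not signature verification, and the helper names `sha256_of_file` and `matches_pinned_hash` are hypothetical.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_hash(path, expected_hex):
    """Return True if the file's SHA-256 digest equals the pinned hex digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())
```

Pinning only helps when the hash was recorded from the genuine package in the first place. It does nothing against a developer who types the wrong name and pins the imposter.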

Update: In an e-mail sent after this post went live, PyPI officials wrote:

Since the publishing of the announcement we've received many suggestions for how to prevent this sort of attack in the future. We're considering all of the options and nothing is off the table, but we caution that any solution will take time to implement.

Unlike some language package management systems, PyPI does not have any full time staff devoted to it. It is a volunteer run project with only two active administrators. As such, it doesn't currently have resources for some of the proposed solutions such as actively monitoring or approving every new project published to PyPI. Historically and by necessity we've relied on a reactive strategy of taking down potentially malicious projects as we've become aware of them.

The Python Software Foundation recognizes the importance of PyPI and the vulnerability of having so few people able to volunteer time to the project. For that reason it formed the Python Packaging Working Group in 2016 to help direct the project and raise funds to help support the sustainable development and maintenance of PyPI. While we hope that this sort of problem never arises again, we're grateful that it's once again turned a bright light on the fragility of the free and open source software projects and services which support us all. We can't do this alone and need everyone's help and support.

In the meantime, should anyone find any malicious or suspicious packages in the PyPI index, please report them as documented on https://pypi.org/security/ at once and our administrators will promptly respond and deal with the problems.

Listing image by Pythonregiuslover

Promoted Comments

YetAnotherAnonymousAppellation Ars Scholae Palatinae

My biggest concern about something like this is that many people push Python as a very good language for beginners. I'm just speculating, but it seems to me that the combination of large numbers of unsophisticated users and a poorly-controlled source base is a recipe for really bad things to happen.

1096 posts | registered 1/8/2011
zmwangx Ars Centurion et Subscriptor

Python developer here. PyPI does support GPG signatures (I do sign my packages), they are just not used by any standard tool like setuptools or pip. The reason is they simply don't help — GPG needs a web of trust, and there's no way to automatically establish this WoT on a package index where anyone could publish. At best you could make sure when you install a new version of a package, it is signed with the same key you first saw — but what if the developer switched to a new key? This just introduces trouble for little to no gain. Moreover, it doesn't help with namesquatting a bit.

There were talks about implementing The Update Framework on PyPI, but I'm not familiar with it and not really sure how much it could help with security. IMO Python developers should be responsible for their own actions, i.e., not do stupid things like `pip install urllib`. Those that are noob enough to do this sort of thing shouldn't be given access to critical servers (or any servers at all).

286 posts | registered 11/6/2014

Code packages available in PyPI contained modified installation scripts.
