Adds Linux Landlock based sandboxing #278

Open
wants to merge 1 commit into
base: master
from

daniel-j-h
Hey folks, the xz dilemma made me think about how we can strengthen curl and its ecosystem and what role sandboxing plays.

Reading @bagder's post https://daniel.haxx.se/blog/2024/04/10/verified-curl/ I was missing thoughts on sandboxing curl, so I wanted to start a conversation about it. Because curl is quite the monster for a first-time contributor, I gave it a shot with the small and approachable trurl utility first.

Motivation. trurl is a binary for parsing and manipulating URLs on the command line.

But because it's linked against libcurl, we get quite a few dependencies pulled in:

$ ldd trurl|wc -l
47

So even though all we care about is, e.g., parsing a sub-domain, we also get code for making HTTP requests and so on.

Can we sandbox trurl so that it cannot access the user's filesystem, cannot make network requests (potential data exfiltration), or both?

Sandboxing. There are various platform-dependent ways to sandbox a program; here I gave it a go using the Linux kernel's landlock (similar to OpenBSD's unveil) simply because it's easy to get started with and to integrate. Other approaches, e.g. seccomp, might be reasonable too.

Example. The changeset below adds landlock to trurl to prevent any filesystem access. The trurl program offers an option to read urls from a file and we could allow reads from that file; but for now I simply prevent all filesystem access so that we can use this option as a test case.

Building. To build the example you'll need a kernel from the last 2-3 years and the landlock kernel headers. Then compile it with the definition -DHAVE_LINUX_LANDLOCK=1.
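Landlock was merged in Linux 5.13, so a binary built with the definition above may still run on a kernel without it. One way to handle that, sketched here as an assumption and not as what the changeset does, is to probe the Landlock ABI version at runtime. The helper name `landlock_abi_version` is hypothetical.

```c
#ifdef HAVE_LINUX_LANDLOCK
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for syscall() */
#endif
#include <linux/landlock.h>
#include <sys/syscall.h>
#include <unistd.h>
#endif

/* Hypothetical helper.
   Returns the Landlock ABI version (>= 1) supported by the running kernel,
   or 0 if Landlock is not compiled in or not available at runtime. */
static int landlock_abi_version(void)
{
#ifdef HAVE_LINUX_LANDLOCK
  /* With a NULL attr, size 0 and this flag, the syscall only reports
     the highest ABI version the kernel supports. */
  long abi = syscall(__NR_landlock_create_ruleset, NULL, 0,
                     LANDLOCK_CREATE_RULESET_VERSION);
  return abi < 0 ? 0 : (int)abi;
#else
  return 0;
#endif
}
```

A caller could skip sandboxing, or warn, when this returns 0, which keeps the feature optional on systems without Landlock.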


The main purpose of this is to start a conversation about sandboxing in trurl and hopefully learn enough to start the conversation for curl down the road, too. What are your thoughts here? Thank you!

@dfandrich
Contributor

dfandrich commented Apr 15, 2024 via email

@daniel-j-h
Author

daniel-j-h commented Apr 16, 2024

> While I'm all for sandboxing, I'm not sure a direct approach like this is the way to go. Every OS has its own way to do this, and often multiple ways. There are 35 operating systems listed on https://curl.se/download.html and if every one wants to add another 129 lines to trurl.c, it won't be long before trurl is more sandbox boilerplate than trurl code.

You are correct, the landlock mechanism I explore here is only available on Linux. OpenBSD has a similar API called unveil() that I'm familiar with. That said, sandboxing can be optional: I don't see a downside if we protect e.g. all Linux users but not the four Haiku people out there; they'll manage.

> Even your example is admittedly incomplete, since some filesystem access is actually necessary in trurl but the path needed isn't known until after the program starts.

Like I said above, at the moment it's simply blocking all filesystem access. Both landlock and unveil() offer functionality to e.g. allow read-only access to a specific file, so we could support the trurl url-file command line option:

> The trurl program offers an option to read urls from a file and we could allow reads from that file; but for now I simply prevent all filesystem access so that we can use this option as a test case.

What I wanted to explore here is whether we can restrict the potential blast radius of the vast number of dependencies trurl brings in, even though it's only parsing URLs/strings, by simply not allowing filesystem access for a start. And if that is a way forward, we can think about doing the same in the curl command line program (which, granted, would be a bigger undertaking).

@bagder
Member

bagder commented Apr 18, 2024

Maybe an alternative take would be to create a stand-alone URL parsing library based on the libcurl code. For both curl and trurl to use...

@daniel-j-h
Author

Having a separate library that both curl and trurl depend on could work. Don't you think it's a bigger lift, though, with quite a high barrier to making it work? Would you then ship not just libcurl but also the parsing library? That would need a lot of support infrastructure, build support, and so on.

In addition, I wanted to look into sandboxing trurl only as a first step; ideally we'd sandbox the curl binary too, so that e.g. simple GET or POST requests don't get access to the full filesystem by default.

@vszakats
Member

vszakats commented May 10, 2024

Might not be exactly that, but most of the work done on implementing curl-for-win (libcurl) builds for trurl tests was to make libcurl as small as possible (which also means as few dependencies as possible). The result was the config -zero-imap-osnotls-osnoidn-nohttp-nocurltool, where -imap is optional and only necessary to retrieve the default imap port. Dropping it allows disabling more options, making the binary much smaller. The imap requirement could probably be fixed with some local trurl logic.

This requires a separate curl build. Implementing it inside the mainline build logic to produce a separate trurl-optimized libcurl lib is probably possible, but non-trivial.

Note that curl-for-win also works for Linux (and macOS); this effort was not Windows-specific.
