I started GoReleaser almost 2 years ago. This is a summary of (some) things I’ve learned along the way.
I already talked about GoReleaser here a few times, if you feel like reading about it first:
I tried to organize things in subtopics; some of them are bigger and go further into the subject than others.
Without further ado, let’s get started!
There are only two hard things in Computer Science: cache invalidation and naming things.
— Phil Karlton
People misreading GoReleaser as “gore leaser” is quite common.
I thought it was a really good name because it has “Go” and “Releaser”, so, yeah, it releases Go projects, pretty easy to figure out what is going on there.
But I totally forgot about trying it all lowercase and without context.
On the bright side, “gore leaser” would be a very good heavy metal band name, as someone already pointed out on Twitter (I just need a band and more guitar skills 😂).
I still think the name is OK though, after all, it is easy to write, and it tells what the software does. What can be bad is that the scope is very reduced: what if I decide to also release Rust projects?
Maybe I’m just real bad at naming things. 🤷♂️
At first glance, it seemed like a good idea. I needed to transport the “context” of the release through the pipeline, and needed the features of the “regular” context package (like cancellation). So I created an internal context package, which holds all the data I need plus a context.Context instance, so I can use them interchangeably.
The good thing is that everything is checked at compile time: I know that I can access ctx.Config.ProjectName and that it will work.
The bad thing about it is that it can be confusing for new contributors, as most of them will expect context to be the language’s context package.
I’m not sure if that was a good idea, but it is confusing. Maybe I could at least rename the package — but that would require renaming things in a lot of files, so I keep postponing it.
GoReleaser can do a lot of things. Because of that, it also has a lot of tests, and some of them can be complex.
A good example is the Docker pipe tests, in which I have things like this:
var table = map[string]struct {
	dockers     []config.Docker
	publish     bool
	expect      []string
	assertError errChecker
}{
	// a lot of test cases
}

// later:
for name, docker := range table {
	t.Run(name, func(tt *testing.T) {
		// actually run the tests
	})
}
This suite has several tests that create a lot of images, all of which were sharing the same image name and binary name.
When a test failed, it was hard to figure things out based on logs — especially when you have table-driven tests — which is the case for this example.
I’ve been slowly fixing those, so the fake data is unique — most times by using the test name or something like that. It helps a lot.
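One possible way to derive unique fake data from the test name is sketched below. The helper imageName and the "localhost/test-" prefix are hypothetical; the real tests may do this differently.

```go
package main

import (
	"fmt"
	"strings"
)

// imageName turns a test case name into a lowercase, dash-separated
// image name, so a log line can be traced back to the case that produced it.
func imageName(testName string) string {
	slug := strings.Map(func(r rune) rune {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') {
			return r
		}
		return '-' // replace anything else (spaces, punctuation) with a dash
	}, strings.ToLower(testName))
	return "localhost/test-" + strings.Trim(slug, "-")
}

func main() {
	for _, name := range []string{"successful test case", "bad template test case"} {
		fmt.Println(imageName(name))
	}
}
```

In a table-driven test, the map key passed to t.Run would be the natural input for such a helper.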
A cool trick I’ve learned reading other people’s code: the errChecker function type I have on this suite:
type errChecker func(*testing.T, error)
I also have some “helper” functions:
var shouldErr = func(msg string) errChecker {
	return func(t *testing.T, err error) {
		assert.Error(t, err)
		assert.Contains(t, err.Error(), msg)
	}
}

var shouldNotErr = func(t *testing.T, err error) {
	assert.NoError(t, err)
}
Then, in my cases I can have things like:
"successful test case": {
	// omitted details for the sake of brevity
	assertError: shouldNotErr,
},
"bad template test case": {
	// omitted details for the sake of brevity
	assertError: shouldErr(`template: tmpl:1: unexpected "}" in operand`),
},
Now I can wrap several complex tests with no repeated code.
Of course, besides error handling, you can use that for other purposes.
This was the main reason I decided to write nfpm. I was using fpm, which is good, but people will have weird environments: random versions of things (or just very old versions of things), weird PATH setups, and a lot of things you were not expecting.
You can “guard” your software against that (say, by working only with version v1.4.2 of something), make the code work with multiple versions, or remove the dependency.
What I’ve learned is that the fewer third parties you depend on, the better. More importantly, I’ve learned that sometimes it is just not worth it to remove a dependency on a third party.
For example, GoReleaser depended on fpm. At some point, I decided to remove the dependency entirely because it was generating a lot of bugs and random build failures.
To be able to do that, I wrote nfpm, which is guarded against multiple versions of rpmbuild, its only external dependency.
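To make the “guard” option a bit more concrete, here is a minimal sketch of checking a tool’s reported version before using it. This is not how nfpm does it; the function names, the accepted major versions and the sample version strings are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// versionRe matches the first "major.minor" pair in a version string.
var versionRe = regexp.MustCompile(`(\d+)\.(\d+)`)

// majorVersion extracts the major version number from a tool's version output.
func majorVersion(output string) (int, error) {
	m := versionRe.FindStringSubmatch(output)
	if m == nil {
		return 0, fmt.Errorf("no version found in %q", output)
	}
	return strconv.Atoi(m[1])
}

// checkSupported returns an error unless the major version is in the allowed list.
func checkSupported(output string, supported ...int) error {
	major, err := majorVersion(output)
	if err != nil {
		return err
	}
	for _, s := range supported {
		if s == major {
			return nil
		}
	}
	return fmt.Errorf("unsupported major version %d, expected one of %v", major, supported)
}

func main() {
	// Sample outputs are made up, not captured from a real tool.
	for _, out := range []string{"tool version 4.14.1", "tool version 3.0.2", "garbage"} {
		if err := checkSupported(out, 4, 5); err != nil {
			fmt.Println("refusing to run:", err)
			continue
		}
		fmt.Println("ok:", out)
	}
}
```

Failing early with a readable message is cheaper than debugging a confusing error halfway through a release, but every guarded version is still one more combination to test.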
By removing fpm, I also removed the dependencies on tar, Ruby, gem and possibly others.
I could probably write rpmbuild in Go as well, so I could get rid of that one last dependency, but is it worth it?
I think it would be a bad return on the investment of my time, so, no.
There is no lib to deal with that, and RPM packages seem to be really complex to generate. Plus, rpmbuild is already distributed for all major platforms.
Was removing fpm a good investment of my time? Yes, I have way fewer reports of “deb packaging not working” and way fewer unstable builds.
When writing nfpm, I’ve also learned:
rpmbuild;
Important: it is not that fpm is not good, it is awesome software! I just didn’t want to guard GoReleaser against all the combinations of things that could go wrong, and I didn’t need all its features either. If you need to package your software in a lot of formats using a single tool, fpm for the win!
I’ve learned that it is very hard to get just the exact amount of documentation, so it doesn’t suck.
If you write too much, people probably won’t read. It is also likely to get too complex, thus also hard to grasp.
If you write too little, people maybe will read, but will not learn everything they need.
Writing more docs also eventually leads to more complicated and confusing docs — just like writing more code, who knew!?
I tried providing some kind of commented config examples, thinking it may be straightforward enough (as people could copy and change).
But, just as most commented config files out there, most people won’t read them. And I don’t blame them, I should probably do a better job on that.
I still don’t know the right/the best way of doing this, just learned one more non-optimal way. If you do, I would love to chat about it though!
In the beginning, I split the archive package into another repository, because I thought it would be useful to other people as well.
I’ll move it into GoReleaser’s tree very soon.
— me, 2018
I don’t usually work with monorepos, so I didn’t plan for GoReleaser to work that way. A few things are already fixed, but it still won’t support releasing each artifact with a separate tag, for example.
I think that in the particular case of GoReleaser, a monorepo will be easier to manage:
dep ensure -update on GoReleaser;
So, yes, I probably should have gone with a monorepo.
It is not that a monorepo is the fix for all the problems, though.
I think the real issue is that I forced myself to split things too early.
Maybe nfpm and godownloader are less wrong, but archive for sure was a mistake. Premature optimization… of sorts.
Someone way smarter than me once said:
Bad programmers worry about the code. Good programmers worry about data structures and their relationships.
— Linus Torvalds
Remember context from before, right? It was even worse. The artifacts were stored in context in a way that worked great for a few artifacts, but not so great with many artifacts of several kinds. That led to a lot of bad code spread across almost all pipes.
Later on, I added the artifact package, which abstracts this into a simple slice of Artifact and adds a nice DSL to filter artifacts by kind, etc. I had to refactor a lot of things for that.
So, now if I want to upload all Linux packages for amd64 in a pipe, for example, I can:
ctx.Artifacts.Filter(
	artifact.And(
		artifact.ByGoarch("amd64"),
		artifact.ByType(artifact.LinuxPackage),
	),
).List()
Before that, it was stored as a map[string]map[string][]Binary, so I had to do things like:
for platform, groups := range ctx.Binaries {
	if !strings.Contains(platform, "amd64") {
		continue
	}
	for folder, binaries := range groups {
		// use folder and binaries here
	}
}
And the only way to say if a Binary was actually a Linux Package would be by the file extension.
I’m still not sure that the current form is the best solution, but it is for sure better than the first ones. Another nice thing here is that I can replace the internals of how it works while keeping the way pipes access it.
To summarize, I’ve learned, the hard way, that bad decisions on the data structures lead to bad decisions on the interactions between them, which lead to bad design in general.
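For readers curious how such a filter DSL can be put together, here is a simplified sketch. It borrows the names from the snippet above (ByGoarch, ByType, And, Filter, List), but the types and implementation are assumptions for illustration, not the real artifact package.

```go
package main

import "fmt"

// Type is a hypothetical artifact kind.
type Type int

const (
	Binary Type = iota
	LinuxPackage
)

// Artifact is a simplified, hypothetical artifact record.
type Artifact struct {
	Name   string
	Goarch string
	Type   Type
}

// Filter reports whether an artifact should be kept.
type Filter func(Artifact) bool

// ByGoarch keeps artifacts built for the given architecture.
func ByGoarch(arch string) Filter {
	return func(a Artifact) bool { return a.Goarch == arch }
}

// ByType keeps artifacts of the given kind.
func ByType(t Type) Filter {
	return func(a Artifact) bool { return a.Type == t }
}

// And keeps artifacts that satisfy every given filter.
func And(filters ...Filter) Filter {
	return func(a Artifact) bool {
		for _, f := range filters {
			if !f(a) {
				return false
			}
		}
		return true
	}
}

// Artifacts hides the storage (here a plain slice) behind methods,
// so the internals can change without touching the callers.
type Artifacts struct {
	items []Artifact
}

// Add stores one more artifact.
func (as *Artifacts) Add(a Artifact) { as.items = append(as.items, a) }

// Filter returns a new collection with only the matching artifacts.
func (as *Artifacts) Filter(f Filter) *Artifacts {
	out := &Artifacts{}
	for _, a := range as.items {
		if f(a) {
			out.items = append(out.items, a)
		}
	}
	return out
}

// List returns the stored artifacts.
func (as *Artifacts) List() []Artifact { return as.items }

func main() {
	arts := &Artifacts{}
	arts.Add(Artifact{Name: "app_amd64.deb", Goarch: "amd64", Type: LinuxPackage})
	arts.Add(Artifact{Name: "app_386.deb", Goarch: "386", Type: LinuxPackage})
	arts.Add(Artifact{Name: "app", Goarch: "amd64", Type: Binary})

	for _, a := range arts.Filter(And(ByGoarch("amd64"), ByType(LinuxPackage))).List() {
		fmt.Println(a.Name)
	}
}
```

Because a Filter is just a function, new criteria can be added without changing the collection type, and the kind of an artifact is an explicit field rather than something guessed from a file extension.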
It is not easy to be the product owner, and the developer at the same time.
As a developer, it is easy to say “yes” to things. It is just more code, right? It reminds me of this Tweet:
First law of software quality:
errors = (more code)^2
E = mc^2
— @ingramchen, Nov 12, 2014
Assuming that errors = (more code)^2 and more features = more code are both true, how do I decide if something goes in or not?
I have been trying to follow what Solomon Hykes once said:
Rule #1 of open-source: no is temporary, yes is forever.
— @solomonstre, Mar 30, 2016
Some things are, like, just… cute… you will regret adding those. Other people also learned that adding things because it’s kind of cool may not be a very good idea:
The entire talk is great, but the point I refer to is at 14:38.
The reasoning for the “yes is forever” thing is that if you say no to something, you can always go back later and say yes if you change your mind, but once something is in, you can never take it out without breaking changes.
So, having that in mind, when I look at a feature request, I always try to ask myself:
— Am I 100% sure that this should go in?
— Am I 100% sure that this is the way we should do it?
I usually don’t rush into deciding that. I take a look, think what I have to think about it, give it a day or two (or more), take another look and think again. If the answer to both questions was yes both times, I “say yes” to the PR/feature request/whatever. Otherwise, I suggest what I think needs to change (and explain why), or just say why I don’t want that as nicely as I can and close as wontfix.
Even with all that in mind, it is hard to say no. People spend time working on that pull request, and they are full of good intentions.
Maybe it is their first pull request ever, or the person is just really excited about it, who knows?!
Most maintainers try to be nice — I surely do, but I know that it can still be a bad experience.
One thing that I’ve learned and that I think helps: open an issue first, ask if the maintainer would be interested in a PR implementing the feature you want to implement. Someone did that on GoReleaser (I think), and I believe it is a great way of saving everyone’s time!
If the maintainer says no, but you still really do want that feature, keep a fork. Everybody wins. 🙂
Technically, GoReleaser is still not v1, so it should mean that I could just break stuff… of course, I don’t want to do that. I want the transitions to be as easy and painless as they can be.
That’s why I’ve added deprecation notices to the docs, and when you run goreleaser with a deprecated config, it will put a WARN log pointing to that URL.
I still think there should be a way of making that more visible to users… maybe adding a 1min sleep or something can be a valid approach, but I’m still not sure about it.
I’ve also learned it is hard to find if someone is using that thing you want to deprecate. Maybe I should add some kind of tracking? Don’t know.
I’ve learned that I just don’t know how to handle those things on GoReleaser because people may not even read the log unless it fails (e.g. running on the CI).
We (people) usually don’t read terms, licenses, etc. because they are boring.
I release most of my software under the MIT license, including GoReleaser.
Summarizing: the MIT license says you can basically do whatever with my software as is.
That means that you can open an issue with some bug or feature request, and I could literally play Battlefield forever instead of fixing it (I really could, just look at those shiny stats that would be way better if I played sober most of the time haha).
I think I’ve learned things on both sides of the coin:
Or, as I like to say, no one owes anyone anything.
In the issue template, I ask people to ask questions on Slack. I think that is not optimal, as most people will try to search for their problem on Google, and Google does not index Slack conversations.
I’ve learned that probably the best place for questions on rather small communities is GitHub issues. I’ve seen bigger ones, like Hugo’s, using Discourse, and it seems to serve them well.
On GoReleaser’s case that seems overkill.
I could probably write about more things, for sure. Some topics were still really hard for me to put into words in a form that makes sense, so I ended up removing them… at least for now.
Anyway, I hope that the reading of my many screw ups was interesting and that you enjoyed it. If not, please feel free to comment/complain below or contact me in any way (except maybe phone calls haha)! I’ll be glad to discuss it and maybe learn that I was wrong about one more thing.
Spoiler: I’ll talk more or less about those topics at GopherCon Brazil 2018, so your feedback is greatly appreciated! Hope to see you there!