image is huge
tomdavidson opened this issue · 19 comments
The docker image is huge at 1.3GB. Can it be split up based on language?
Does anyone know where's the code repository for the codeclimate/codeclimate-parser Docker image? (Is it open-source?) The same applies to codeclimate/codeclimate-structure :/
If someone can point me to the repo that generates that image, maybe I can do something to shrink it...
I'm just getting started with codeclimate, based on GitLab's integration into its Auto Devops. However, the humongous images make them totally impractical to use.
The codeclimate/codeclimate-parser image is definitely a problem. Looking at its history, it starts off with the following RUN instruction, which results in a very bloated image layer that makes up most of the downstream images' size (about 2.03GB uncompressed).
/bin/sh -c apt-get update &&
apt-get -y upgrade &&
RUNLEVEL=1 apt-get install --yes --no-install-recommends bash build-essential ca-certificates curl git gnupg &&
curl --location --silent https://deb.nodesource.com/setup_8.x | bash - &&
mkdir -p /usr/share/man/man1 &&
RUNLEVEL=1 apt-get install --yes --no-install-recommends dumb-init nginx nodejs openjdk-8-jdk-headless maven php php-common php-cli php-fpm php-xml composer python python-pip python-setuptools ruby2.3 ruby2.3-dev bundler clang libatomic1 libicu-dev libxml2 &&
curl -O https://storage.googleapis.com/golang/go1.9.2.linux-amd64.tar.gz &&
tar -xvf go1.9.2.linux-amd64.tar.gz &&
mv go /usr/local &&
curl -LO https://swift.org/builds/swift-4.0.3-release/ubuntu1610/swift-4.0.3-RELEASE/swift-4.0.3-RELEASE-ubuntu16.10.tar.gz &&
tar -xvf swift-4.0.3-RELEASE-ubuntu16.10.tar.gz &&
mv swift-4.0.3-RELEASE-ubuntu16.10 /usr/local
Snippet of image layers showing the size:
<missing> 4 days ago /bin/sh -c #(nop) ENV PATH=/usr/local/sbin:/… 0B
<missing> 4 days ago /bin/sh -c apt-get update && apt-get -y upgr… 2.03GB
<missing> 2 months ago /bin/sh -c #(nop) COPY file:9cdcb21d66199876… 198B
<missing> 2 months ago /bin/sh -c #(nop) LABEL maintainer=Code Clim… 0B
<missing> 2 months ago /bin/sh -c #(nop) CMD ["bash"] 0B
<missing> 2 months ago /bin/sh -c #(nop) ADD file:f30a8b5b7cdc9ba33… 55.3MB
For the codeclimate/codeclimate-structure image, that alone is 87% of the image's size.
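To reproduce this kind of breakdown locally, a minimal sketch follows. It assumes the layer list comes from `docker history --human=false --format '{{.Size}}\t{{.CreatedBy}}' IMAGE` (sizes in bytes, tab-separated). The helper name `largest_layer` is hypothetical and not part of any Code Climate tooling.

```shell
#!/bin/sh
# largest_layer (hypothetical helper): reads "SIZE_BYTES<TAB>CREATED_BY" lines
# on stdin and prints the biggest layer's share of the total plus its command.
largest_layer() {
  awk -F '\t' '
    { total += $1 }
    $1 > max { max = $1; cmd = $2 }
    END { if (total > 0) printf "%d%% %s\n", int(max * 100 / total), cmd }
  '
}

# Assumed usage (needs a local Docker daemon with the image already pulled):
#   docker history --human=false --format '{{.Size}}\t{{.CreatedBy}}' \
#     codeclimate/codeclimate-parser | largest_layer
```

The share is computed against the sum of all layers in the image being inspected, so the number differs between the parser base image and the downstream images built on top of it.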
@kinghuang I still don't know where on github is the codeclimate/codeclimate-parser Dockerfile. Maybe @larkinscott can tell us something about it?
👋 Hi. You're right that the image is big, largely because of the parser base image, which bundles support for many languages. That was a technical decision which has enabled us to add support for new languages faster, with the tradeoff that the image has gotten large. That repository is not currently open source, but we can update this thread if that changes.
If you're using the CLI and installing updates over time, we recommend running docker system prune occasionally to remove old, untagged images, which is a pretty good way to free up disk space.
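The cleanup suggested above can be sketched as follows. `docker system df` and `docker image prune` are standard Docker CLI commands; the wrapper name `cc_cleanup` is hypothetical. `--force` skips the confirmation prompt, so check what will be deleted before using it on a shared machine.

```shell
#!/bin/sh
# cc_cleanup (hypothetical wrapper): report disk usage, then remove dangling
# (untagged) images left behind by repeated engine updates.
cc_cleanup() {
  docker system df            # space used by images, containers, volumes
  docker image prune --force  # delete dangling images only
  docker system df            # usage again, to see what was reclaimed
}
```

`docker system prune` goes further than this sketch: it also removes stopped containers, unused networks, and build cache.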
@maxjacobson thanks for the info and clarity. I had been preferring code climate engines over sonarqube primarily because they were thought to be open source and I am disappointed to "discover" we have been using closed source without realizing it. I would appreciate it if this situation was communicated clearly in the documentation.
@maxjacobson Thanks for the info. My main problem isn't with storage size. Rather, the images have to be downloaded over and over again in CI jobs that use Docker-in-Docker and don't persist images across runs. This isn't a problem when images are in the tens or low hundreds of MB. codeclimate's images are exceptionally large at 2 GB each, and it takes multiple 2 GB images to do a simple scan.
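One possible workaround for runners that do not persist images is to keep each image as a tarball in the CI system's own cache. This is only a sketch under assumptions: the function name `restore_or_pull` and its cache directory argument are hypothetical, the runner is assumed to restore that directory between jobs, and whether loading a multi-gigabyte tarball beats a registry pull depends on the environment. `docker load`, `docker pull`, and `docker save` are standard Docker CLI commands.

```shell
#!/bin/sh
# restore_or_pull IMAGE CACHE_DIR (hypothetical helper)
# Load IMAGE from a cached tarball if one exists; otherwise pull it and
# save a tarball for the next job.
restore_or_pull() {
  image=$1
  cache_dir=$2
  # Turn "codeclimate/codeclimate-structure:latest" into a safe file name.
  tarball="$cache_dir/$(printf '%s' "$image" | tr '/:' '__').tar"

  if [ -f "$tarball" ]; then
    docker load --input "$tarball"
  else
    mkdir -p "$cache_dir"
    docker pull "$image" && docker save --output "$tarball" "$image"
  fi
}

# Assumed usage inside a CI job:
#   restore_or_pull codeclimate/codeclimate-structure:latest .docker-cache
```

This does not shrink the images; it only moves the transfer from the registry to the CI cache.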
@tomdavidson I appreciate that feedback and I'll share it internally.
@kinghuang I'm not very familiar with the GitLab - Code Climate integration. When using Code Climate analysis on codeclimate.com, we cache images across many builds, and that's the environment we've optimized for. I hear you that it's not ideal, and I'm afraid I don't see a clear and short path to resolving it at the moment.
@maxjacobson I wonder how the "micro-service" approach used for the codeclimate "engines" affects (or hinders) your ability to support languages at the pace you guys want to achieve.
This is an honest question - I'm really interested in the woes of the "micro-services" approach, especially as I very much liked how the codeclimate engine containers work in conjunction...
Is there some way to share the same layers in codeclimate/codeclimate-duplication with codeclimate/codeclimate-structure??? It's a PITA to wait for Codeclimate CLI to download two huge images on a fresh Docker for Desktop VM.
For your consideration: wouldn't it be better to delegate each language's support to different "engines" (or "sub-engines")? IMHO "duplication" is actually a code "structure" problem.
The image is now at 4.64 GB on my machine. Any updates on this?
REPOSITORY                            TAG      CREATED       SIZE
codeclimate/codeclimate-duplication   latest   10 days ago   4.64GB
codeclimate/codeclimate-structure     latest   11 days ago   5.07GB
codeclimate/codeclimate               latest   13 days ago   108MB
We are using Codeclimate with GitLab CI, and this is really an issue when trying to keep our pipeline time as short as possible. The code check step takes ~6 min, of which the Codeclimate checks run for ~20 s and more than 5 min goes to pulling this image and others. That makes it practically unusable for regular builds with fast feedback...
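The proportion described above can be worked out with a few lines of shell arithmetic. This is a sketch using the approximate figures from the comment (a ~6 min step, ~20 s of checks); the helper name `overhead_percent` is hypothetical, and it treats everything outside the checks as overhead, which slightly overstates pure pull time.

```shell
#!/bin/sh
# overhead_percent STEP_SECONDS CHECK_SECONDS (hypothetical helper)
# Prints the integer percentage of the step that is not spent running checks.
overhead_percent() {
  step=$1
  check=$2
  echo $(( (step - check) * 100 / step ))
}

overhead_percent 360 20   # prints 94
```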
Any update or idea to reduce the image size?
Images are now over 5.5G:
# docker image ls | grep -i codeclimate
codeclimate/codeclimate-eslint        latest   e1ff49ba2166   6 days ago    1.09GB
codeclimate/codeclimate-structure     latest   a003287f6f64   6 weeks ago   5.61GB
codeclimate/codeclimate-duplication   latest   cfc8aef3a663   6 weeks ago   5.62GB
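To see what a listing like this adds up to, the SIZE column can be summed with a small filter. This is a sketch: the helper name `total_image_gb` is hypothetical, and it adds up the reported per-image sizes, so layers shared between images are counted more than once and actual disk usage may be lower.

```shell
#!/bin/sh
# total_image_gb (hypothetical helper): reads `docker image ls` style rows on
# stdin, takes the last column (SIZE), and prints the sum in GB (decimal units).
total_image_gb() {
  awk '
    {
      size = $NF
      if      (size ~ /GB$/) factor = 1
      else if (size ~ /MB$/) factor = 0.001
      else if (size ~ /kB$/) factor = 0.000001
      else next                      # skip headers and unrecognised rows
      sub(/[A-Za-z]+$/, "", size)    # strip the unit suffix
      total += size * factor
    }
    END { printf "%.2fGB\n", total }
  '
}

# Assumed usage:
#   docker image ls | grep -i codeclimate | total_image_gb
```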
We're using Codeclimate with GitLab and because of the size of the images we're running out of disk space.
We opted out of using codeclimate as a quality tool because of the issues mentioned. Spending more than five times the check run time on pulling images, and overprovisioning runner disks for the sheer amount of data, was considered ineffective.
Same here. I gave up using Code Climate ages ago because most of the time was spent pulling Code Climate images, rather than the code checks themselves. It's a huge waste of time.
Hi y'all. I no longer work at Code Climate so I can't effect any change for you and I'm not speaking for them, but I am still subscribed to this thread, and I can perhaps clarify a bit the incentives at play here.
Code Climate has a paid offering at codeclimate.com which is well-supported and does not have this problem. If you run its solution via Gitlab, Code Climate is not making any money from that, and so it's not the best use of their time to optimize that use-case when they can instead work on improving the service for their paying users.
It's nothing malicious, it just is what it is.
For the best experience, use codeclimate.com.
On the one hand, I understand the reasoning, and as a company, would probably make the same decision.
On the other hand, my understanding of open source software, and of how it is built, conflicts with the decision to keep a central closed-source repository that is the basis of the company's business yet also an integral part of the "open source" solution.
If I were the business, I'd see the GitLab users as potential customers, not a nuisance that I can't be bothered with.
I wasn't aware of Code Climate before GitLab added integrated support for it. I liked what it did. But, I'm not about to sign up for a service that apparently doesn't care about its users.
Hi @cawolf and @kinghuang ,
Thanks for the feedback! I definitely understand the sentiment, and OSS continues to be very important to us. ❤️ The current GitLab option you're using is a remnant of a time when we supported GitHub, GitLab, and Bitbucket.
Unfortunately, we don't currently offer support for our Quality product with any non-GitHub SCMs. Sorry about the limitation with that. We do plan to revisit that in the future, but in the meantime our focus is on GitHub functionality.
Let me know if you have any other questions! Here to help.
Emily and the Code Climate Support Team