Git Repo Scanner

What is Git-Repo-Scanner?

Git-Repo-Scanner is a small Python script which discovers repositories on GitHub or GitLab. The main purpose of this scanner is to provide a cascading input for the gitleaks and semgrep scanners.

Deployment

The git-repo-scanner chart can be deployed via Helm:

# Install HelmChart (use -n to configure another namespace)
helm upgrade --install git-repo-scanner secureCodeBox/git-repo-scanner

Scanner Configuration

The scanner options fall into two groups, one for GitHub and one for GitLab. You select the git repository type with the --git-type option:

--git-type github
or
--git-type gitlab

GitHub

For type GitHub you can use the following options:

  • --organization: The name of the GitHub organization you want to scan.
  • --url: The URL of the API for a GitHub Enterprise server. Skip this option for repos on https://github.com.
  • --access-token: Your personal GitHub access token (needs full repo rights if you also want to find private repositories; otherwise repo:status and public_repo are sufficient).
  • --ignore-repos: A list of GitHub repository ids you want to ignore.
  • --obey-rate-limit: True to obey the rate limit of the GitHub server (default), otherwise False
  • --activity-since-duration: Return git repo findings with repo activity (e.g. commits) more recent than a specific date expressed by a duration (now + duration). A duration string is a possibly signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as '1h' or '2h45m'. Valid time units are 'm', 'h', 'd', 'w'.
  • --activity-until-duration: Return git repo findings with repo activity (e.g. commits) older than a specific date expressed by a duration (now + duration). A duration string is a possibly signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as '1h' or '2h45m'. Valid time units are 'm', 'h', 'd', 'w'.
  • --annotate-latest-commit-id: Set to True to annotate the results with the SHA1 of the latest commit on the main branch. Causes an extra API hit per repository. False by default.
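
The duration syntax accepted by --activity-since-duration and --activity-until-duration can be illustrated with a small parser. This is a minimal sketch, not the scanner's actual implementation: `parse_duration` and `reference_date` are hypothetical names, and the sketch assumes that a single optional leading sign applies to the whole string.

```python
import re
from datetime import datetime, timedelta, timezone

# Seconds per unit for the documented units 'm', 'h', 'd', 'w'.
_UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400, "w": 604800}
# One token: a decimal number with optional fraction, followed by a unit.
_TOKEN = re.compile(r"(\d+(?:\.\d*)?|\.\d+)([mhdw])")


def parse_duration(text: str) -> timedelta:
    """Parse strings such as '1h', '2h45m' or '-1w' into a timedelta."""
    rest = text.strip()
    sign = 1
    if rest.startswith(("+", "-")):
        sign = -1 if rest[0] == "-" else 1
        rest = rest[1:]
    if not rest:
        raise ValueError(f"empty duration: {text!r}")
    seconds = 0.0
    pos = 0
    while pos < len(rest):
        match = _TOKEN.match(rest, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        seconds += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    return timedelta(seconds=sign * seconds)


def reference_date(duration: str, now: datetime) -> datetime:
    """Return the date 'now + duration' that repo activity is compared to."""
    return now + parse_duration(duration)


if __name__ == "__main__":
    now = datetime(2021, 6, 1, 12, 0, tzinfo=timezone.utc)
    print(reference_date("-1w", now).isoformat())
```

Under these assumptions, a string with an unsupported unit such as '10s' raises a ValueError instead of being silently ignored.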

For now only organizations are supported, so the --organization option is mandatory. We strongly recommend providing an access token for authentication; otherwise API rate limiting will kick in after about 30 scanned repositories.

GitLab

For type GitLab you can use the following options:

  • --url: The URL of the GitLab server.
  • --access-token: Your personal GitLab access token (needs at least read_api and read_repository scopes).
  • --group: A specific GitLab group id you want to scan, including subgroups.
  • --ignore-groups: A list of GitLab group ids you want to ignore.
  • --ignore-repos: A list of GitLab project ids you want to ignore.
  • --obey-rate-limit: True to obey the rate limit of the GitLab server (default), otherwise False
  • --activity-since-duration: Return git repo findings with repo activity (e.g. commits) more recent than a specific date expressed by a duration (now + duration). A duration string is a possibly signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as '1h' or '2h45m'. Valid time units are 'm', 'h', 'd', 'w'.
  • --activity-until-duration: Return git repo findings with repo activity (e.g. commits) older than a specific date expressed by a duration (now + duration). A duration string is a possibly signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as '1h' or '2h45m'. Valid time units are 'm', 'h', 'd', 'w'.
  • --annotate-latest-commit-id: Set to True to annotate the results with the SHA1 of the latest commit on the main branch. Causes an extra API hit per repository. False by default.

For GitLab, the URL and the access token are mandatory. If you don't provide a specific group id, all projects on the GitLab server are discovered.
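
The mandatory options for the two repository types can be summarized as a small pre-flight check. This is only a sketch under stated assumptions: `REQUIRED_OPTIONS` and `missing_options` are hypothetical names that are not part of the scanner, and the option names are written with underscores purely for use as Python dictionary keys.

```python
# Mandatory options per --git-type, as described in the sections above.
REQUIRED_OPTIONS = {
    "github": ("organization",),
    "gitlab": ("url", "access_token"),
}


def missing_options(git_type: str, options: dict) -> list:
    """Return the mandatory options that are absent or empty."""
    if git_type not in REQUIRED_OPTIONS:
        raise ValueError(f"unsupported git type: {git_type!r}")
    return [name for name in REQUIRED_OPTIONS[git_type] if not options.get(name)]


if __name__ == "__main__":
    # A GitLab configuration without an access token is incomplete.
    print(missing_options("gitlab", {"url": "https://gitlab.your-company.com"}))
```

A check like this could run before a Scan is created, so that an incomplete configuration is caught before the scan job starts.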

Requirements

Kubernetes: >=v1.11.0-0

Values

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| cascadingRules.enabled | bool | false | Enables or disables the installation of the default cascading rules for this scanner |
| parser.env | list | [] | Optional environment variables mapped into each parseJob (see: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/) |
| parser.image.pullPolicy | string | "IfNotPresent" | Image pull policy. One of Always, Never, IfNotPresent. Defaults to Always if :latest tag is specified, or IfNotPresent otherwise. More info: https://kubernetes.io/docs/concepts/containers/images#updating-images |
| parser.image.repository | string | "docker.io/securecodebox/parser-git-repo-scanner" | Parser image repository |
| parser.image.tag | string | defaults to the charts version | Parser image tag |
| parser.ttlSecondsAfterFinished | string | nil | seconds after which the kubernetes job for the parser will be deleted. Requires the Kubernetes TTLAfterFinished controller: https://kubernetes.io/docs/concepts/workloads/controllers/ttlafterfinished/ |
| scanner.activeDeadlineSeconds | string | nil | There are situations where you want to fail a scan Job after some amount of time. To do so, set activeDeadlineSeconds to define an active deadline (in seconds) when considering a scan Job as failed. (see: https://kubernetes.io/docs/concepts/workloads/controllers/job/#job-termination-and-cleanup) |
| scanner.backoffLimit | int | 3 | There are situations where you want to fail a scan Job after some amount of retries due to a logical error in configuration etc. To do so, set backoffLimit to specify the number of retries before considering a scan Job as failed. (see: https://kubernetes.io/docs/concepts/workloads/controllers/job/#pod-backoff-failure-policy) |
| scanner.env | list | [] | Optional environment variables mapped into each scanJob (see: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/) |
| scanner.extraContainers | list | [] | Optional additional Containers started with each scanJob (see: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/) |
| scanner.extraVolumeMounts | list | [] | Optional VolumeMounts mapped into each scanJob (see: https://kubernetes.io/docs/concepts/storage/volumes/) |
| scanner.extraVolumes | list | [] | Optional Volumes mapped into each scanJob (see: https://kubernetes.io/docs/concepts/storage/volumes/) |
| scanner.image.pullPolicy | string | "IfNotPresent" | Image pull policy. One of Always, Never, IfNotPresent. Defaults to Always if :latest tag is specified, or IfNotPresent otherwise. More info: https://kubernetes.io/docs/concepts/containers/images#updating-images |
| scanner.image.repository | string | "docker.io/securecodebox/scanner-git-repo-scanner" | Container Image to run the scan |
| scanner.image.tag | string | nil | defaults to the charts version |
| scanner.nameAppend | string | nil | append a string to the default scantype name. |
| scanner.resources | object | {} | CPU/memory resource requests/limits (see: https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/, https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/) |
| scanner.securityContext | object | {"allowPrivilegeEscalation":false,"capabilities":{"drop":["all"]},"privileged":false,"readOnlyRootFilesystem":true,"runAsNonRoot":true} | Optional securityContext set on scanner container (see: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/) |
| scanner.securityContext.allowPrivilegeEscalation | bool | false | Ensure that users privileges cannot be escalated |
| scanner.securityContext.capabilities.drop[0] | string | "all" | This drops all linux privileges from the container. |
| scanner.securityContext.privileged | bool | false | Ensures that the scanner container is not run in privileged mode |
| scanner.securityContext.readOnlyRootFilesystem | bool | true | Prevents write access to the containers file system |
| scanner.securityContext.runAsNonRoot | bool | true | Enforces that the scanner image is run as a non root user |
| scanner.ttlSecondsAfterFinished | string | nil | seconds after which the kubernetes job for the scanner will be deleted. Requires the Kubernetes TTLAfterFinished controller: https://kubernetes.io/docs/concepts/workloads/controllers/ttlafterfinished/ |
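
Chart values such as those listed above can be overridden at install time with Helm's --set flag. The following sketch shows one way to turn a nested dictionary of overrides into --set arguments. `helm_set_args` is a hypothetical helper and the chosen overrides are only examples. It handles plain scalar values only; lists, or strings containing commas, are better placed in a values file.

```python
def _flatten(values: dict, prefix: str = ""):
    """Yield (dotted.path, value) pairs for a nested dictionary."""
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from _flatten(value, path)
        else:
            yield path, value


def helm_set_args(values: dict) -> list:
    """Render scalar overrides as repeated '--set key=value' arguments."""
    args = []
    for path, value in _flatten(values):
        # YAML booleans are lowercase, unlike Python's True/False.
        rendered = str(value).lower() if isinstance(value, bool) else str(value)
        args += ["--set", f"{path}={rendered}"]
    return args


if __name__ == "__main__":
    overrides = {"cascadingRules": {"enabled": True}, "scanner": {"backoffLimit": 5}}
    command = [
        "helm", "upgrade", "--install",
        "git-repo-scanner", "secureCodeBox/git-repo-scanner",
    ] + helm_set_args(overrides)
    print(" ".join(command))
```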

License

Code of secureCodeBox is licensed under the Apache License 2.0.

Examples

github-secureCodeBox-scan

This example scans the organization secureCodeBox on GitHub. Remember to add an access token to avoid rate limiting:

# SPDX-FileCopyrightText: 2021 iteratec GmbH
#
# SPDX-License-Identifier: Apache-2.0
apiVersion: "execution.securecodebox.io/v1"
kind: Scan
metadata:
  name: "scan-github"
spec:
  scanType: "git-repo-scanner"
  parameters:
    - "--git-type"
    - "github"
    - "--organization"
    - "secureCodeBox"
  cascades:
    matchLabels:
      securecodebox.io/intensive: medium
      securecodebox.io/invasive: non-invasive
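
When many organizations need to be scanned, a Scan manifest like this one can also be generated programmatically. The sketch below builds the same structure as a Python dictionary and prints it as JSON, which kubectl accepts in place of YAML. `github_parameters` and `build_scan` are hypothetical helpers, and passing several ids after a single --ignore-repos flag is an assumption based on how the option is described as a list.

```python
import json


def github_parameters(organization: str, ignore_repos=()) -> list:
    """Build the scanner parameter list for a GitHub organization scan."""
    params = ["--git-type", "github", "--organization", organization]
    if ignore_repos:
        # Assumption: all ids follow one --ignore-repos flag.
        params.append("--ignore-repos")
        params.extend(str(repo_id) for repo_id in ignore_repos)
    return params


def build_scan(name: str, parameters: list, match_labels=None) -> dict:
    """Return a Scan resource for the git-repo-scanner scan type."""
    scan = {
        "apiVersion": "execution.securecodebox.io/v1",
        "kind": "Scan",
        "metadata": {"name": name},
        "spec": {"scanType": "git-repo-scanner", "parameters": list(parameters)},
    }
    if match_labels:
        scan["spec"]["cascades"] = {"matchLabels": dict(match_labels)}
    return scan


if __name__ == "__main__":
    scan = build_scan(
        "scan-github",
        github_parameters("secureCodeBox"),
        {
            "securecodebox.io/intensive": "medium",
            "securecodebox.io/invasive": "non-invasive",
        },
    )
    print(json.dumps(scan, indent=2))
```

Each flag and each value is a separate list entry, which mirrors the layout of the parameters list in the manifest.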

gitlab-group-scan

This example shows how to scan a specific group on a GitLab server. It also excludes certain subgroups and projects contained in this group:

# SPDX-FileCopyrightText: 2021 iteratec GmbH
#
# SPDX-License-Identifier: Apache-2.0
apiVersion: "execution.securecodebox.io/v1"
kind: Scan
metadata:
  name: "scan-company-gitlab-group"
spec:
  scanType: "git-repo-scanner"
  parameters:
    - "--git-type"
    - "gitlab"
    - "--url"
    - "https://gitlab.your-company.com"
    - "--access-token"
    - "<YOUR-GITLAB-TOKEN>"
    - "--group" # GitLab group id
    - "542"
    - "--ignore-groups" # A group can contain subgroups
    - "723"
    - "--ignore-repos" # GitLab project ids
    - "423"
    - "123"
  cascades:
    matchLabels:
      securecodebox.io/intensive: medium
      securecodebox.io/invasive: non-invasive