Git Repo Scanner

What is Git-Repo-Scanner?

Git-Repo-Scanner is a small Python script that discovers repositories on GitHub or GitLab. The main purpose of this scanner is to provide cascading input for the gitleaks scanner.

Deployment

The git-repo-scanner chart can be deployed via helm:

# Install HelmChart (use -n to configure another namespace)
helm upgrade --install git-repo-scanner secureCodeBox/git-repo-scanner

Scanner Configuration

The scanner options fall into two groups, one for GitLab and one for GitHub. Select the repository host type with the option:

--git-type github or --git-type gitlab

GitHub

For type GitHub you can use the following options:

  • --organization: The name of the GitHub organization you want to scan.
  • --url: The URL of the API of a GitHub Enterprise server. Skip this option for repos on https://github.com.
  • --access-token: Your personal GitHub access token.
  • --ignore-repos: A list of GitHub repository ids you want to ignore.
  • --obey-rate-limit: True to obey the rate limit of the GitHub server (default), otherwise False
  • --activity-since-duration: Return git repo findings with repo activity (e.g. commits) more recent than a specific date expressed by a duration (now + duration). A duration string is a possibly signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as '1h' or '2h45m'. Valid time units are 'm', 'h', 'd', 'w'.
  • --activity-until-duration: Return git repo findings with repo activity (e.g. commits) older than a specific date expressed by a duration (now + duration). A duration string is a possibly signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as '1h' or '2h45m'. Valid time units are 'm', 'h', 'd', 'w'.

For now only organizations are supported, so the --organization option is mandatory. We strongly recommend providing an access token for authentication. Without one, rate limiting kicks in after about 30 scanned repositories.
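
The duration format described for --activity-since-duration and --activity-until-duration can be sketched in Python as follows. This is an illustrative parser written for this page, not the scanner's own implementation; the names parse_duration and reference_date are hypothetical, and the scanner may treat edge cases differently.

```python
import re
from datetime import datetime, timedelta, timezone

# Seconds per unit for the units listed above: minutes, hours, days, weeks.
_UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400, "w": 604800}
_NUMBER = r"(?:\d+(?:\.\d*)?|\.\d+)"
_TOKEN = re.compile(rf"({_NUMBER})([mhdw])")
_DURATION = re.compile(rf"([+-]?)((?:{_NUMBER}[mhdw])+)")


def parse_duration(text: str) -> timedelta:
    """Parse strings such as '1h', '2h45m' or '-1.5d' into a timedelta."""
    match = _DURATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    sign = -1 if match.group(1) == "-" else 1
    seconds = sum(
        float(number) * _UNIT_SECONDS[unit]
        for number, unit in _TOKEN.findall(match.group(2))
    )
    return timedelta(seconds=sign * seconds)


def reference_date(duration: str, now: datetime) -> datetime:
    """Return the date the options describe as 'now + duration'."""
    return now + parse_duration(duration)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(reference_date("-2h45m", now))
```

A signed value such as '-1w' yields a date one week before now, which is why the format allows a leading sign.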

GitLab

For type GitLab you can use the following options:

  • --url: The URL of the GitLab server.
  • --access-token: Your personal GitLab access token.
  • --group: The id of a specific GitLab group you want to scan, including its subgroups.
  • --ignore-groups: A list of GitLab group ids you want to ignore.
  • --ignore-repos: A list of GitLab project ids you want to ignore.
  • --obey-rate-limit: True to obey the rate limit of the GitLab server (default), otherwise False
  • --activity-since-duration: Return git repo findings with repo activity (e.g. commits) more recent than a specific date expressed by a duration (now + duration). A duration string is a possibly signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as '1h' or '2h45m'. Valid time units are 'm', 'h', 'd', 'w'.
  • --activity-until-duration: Return git repo findings with repo activity (e.g. commits) older than a specific date expressed by a duration (now + duration). A duration string is a possibly signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as '1h' or '2h45m'. Valid time units are 'm', 'h', 'd', 'w'.

For GitLab, the URL and the access token are mandatory. If you don't provide a specific group id, all projects on the GitLab server are discovered.
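
The mandatory options for both host types can be summarized in a small helper that assembles the argument list for a scan. This is only a sketch: build_parameters is a hypothetical function, not part of the scanner, and it covers just the options needed to show the rules stated above.

```python
from typing import List, Optional


def build_parameters(
    git_type: str,
    organization: Optional[str] = None,
    url: Optional[str] = None,
    access_token: Optional[str] = None,
    group: Optional[str] = None,
) -> List[str]:
    """Assemble scanner arguments, enforcing the mandatory options per host type."""
    git_type = git_type.lower()
    if git_type == "github":
        # Only organizations are supported, so the organization is required.
        if not organization:
            raise ValueError("--organization is mandatory for GitHub")
    elif git_type == "gitlab":
        # GitLab needs both the server URL and an access token.
        if not url or not access_token:
            raise ValueError("--url and --access-token are mandatory for GitLab")
    else:
        raise ValueError(f"unsupported git type: {git_type!r}")

    params = ["--git-type", git_type]
    candidates = (
        ("--organization", organization if git_type == "github" else None),
        ("--url", url),
        ("--access-token", access_token),
        ("--group", group if git_type == "gitlab" else None),
    )
    for flag, value in candidates:
        if value:
            params += [flag, value]
    return params


if __name__ == "__main__":
    print(build_parameters("github", organization="secureCodeBox"))
```

The returned list has the same shape as the parameters list of a Scan resource, with each flag and its value as separate entries.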

Requirements

Kubernetes: >=v1.11.0-0

Values

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| parser.env | list | [] | Optional environment variables mapped into each parseJob (see: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/) |
| parser.image.repository | string | "docker.io/securecodebox/parser-git-repo-scanner" | Parser image repository |
| parser.image.tag | string | defaults to the charts version | Parser image tag |
| parser.ttlSecondsAfterFinished | string | nil | seconds after which the kubernetes job for the parser will be deleted. Requires the Kubernetes TTLAfterFinished controller: https://kubernetes.io/docs/concepts/workloads/controllers/ttlafterfinished/ |
| scanner.backoffLimit | int | 3 | There are situations where you want to fail a scan Job after some amount of retries due to a logical error in configuration etc. To do so, set backoffLimit to specify the number of retries before considering a scan Job as failed. (see: https://kubernetes.io/docs/concepts/workloads/controllers/job/#pod-backoff-failure-policy) |
| scanner.env | list | [] | Optional environment variables mapped into each scanJob (see: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/) |
| scanner.extraContainers | list | [] | Optional additional Containers started with each scanJob (see: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/) |
| scanner.extraVolumeMounts | list | [] | Optional VolumeMounts mapped into each scanJob (see: https://kubernetes.io/docs/concepts/storage/volumes/) |
| scanner.extraVolumes | list | [] | Optional Volumes mapped into each scanJob (see: https://kubernetes.io/docs/concepts/storage/volumes/) |
| scanner.image.repository | string | "docker.io/securecodebox/scanner-git-repo-scanner" | Container Image to run the scan |
| scanner.image.tag | string | nil | defaults to the charts version |
| scanner.nameAppend | string | nil | append a string to the default scantype name. |
| scanner.resources | object | {} | CPU/memory resource requests/limits (see: https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/, https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/) |
| scanner.securityContext | object | {} | Optional securityContext set on scanner container (see: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/) |
| scanner.ttlSecondsAfterFinished | string | nil | seconds after which the kubernetes job for the scanner will be deleted. Requires the Kubernetes TTLAfterFinished controller: https://kubernetes.io/docs/concepts/workloads/controllers/ttlafterfinished/ |

License

Code of secureCodeBox is licensed under the Apache License 2.0.

Examples

github-secureCodeBox-scan

This example scans the organization secureCodeBox on GitHub. Remember to add an access token to avoid rate limiting:

# SPDX-FileCopyrightText: 2021 iteratec GmbH
#
# SPDX-License-Identifier: Apache-2.0

apiVersion: "execution.securecodebox.io/v1"
kind: Scan
metadata:
  name: "scan-github"
spec:
  scanType: "git-repo-scanner"
  parameters:
    - "--git-type"
    - "github"
    - "--organization"
    - "secureCodeBox"
  cascades:
    matchLabels:
      securecodebox.io/intensive: medium
      securecodebox.io/invasive: non-invasive

gitlab-group-scan

This example shows how to scan a specific group on a GitLab server. It also excludes certain subgroups and projects contained in this group:

# SPDX-FileCopyrightText: 2021 iteratec GmbH
#
# SPDX-License-Identifier: Apache-2.0

apiVersion: "execution.securecodebox.io/v1"
kind: Scan
metadata:
  name: "scan-company-gitlab-group"
spec:
  scanType: "git-repo-scanner"
  parameters:
    - "--git-type"
    - "gitlab"
    - "--url"
    - "https://gitlab.your-company.com"
    - "--access-token"
    - "<YOUR-GITLAB-TOKEN>"
    - "--group" # GitLab group id
    - "542"
    - "--ignore-groups" # A group can contain subgroups
    - "723"
    - "--ignore-repos" # GitLab project ids
    - "423"
    - "123"
  cascades:
    matchLabels:
      securecodebox.io/intensive: medium
      securecodebox.io/invasive: non-invasive
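
When many organizations or groups need a scan, resources like the ones above can be generated rather than written by hand. The sketch below builds the same structure as a Python dictionary and prints it as JSON, which kubectl accepts alongside YAML. The helper scan_manifest is hypothetical and only mirrors the fields used in the examples on this page.

```python
import json
from typing import Dict, List


def scan_manifest(name: str, parameters: List[str]) -> Dict:
    """Build a Scan resource with the same fields as the examples above."""
    return {
        "apiVersion": "execution.securecodebox.io/v1",
        "kind": "Scan",
        "metadata": {"name": name},
        "spec": {
            "scanType": "git-repo-scanner",
            "parameters": parameters,
            "cascades": {
                "matchLabels": {
                    "securecodebox.io/intensive": "medium",
                    "securecodebox.io/invasive": "non-invasive",
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = scan_manifest(
        "scan-github",
        ["--git-type", "github", "--organization", "secureCodeBox"],
    )
    print(json.dumps(manifest, indent=2))
```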