
Semgrep

What is Semgrep?#

Semgrep ("semantic grep") is a static source code analyzer that searches code for specific patterns. You can either write your own rules or use one of the many predefined rulesets curated by the Semgrep team.

To learn more about semgrep, visit semgrep.dev.

Deployment#

The semgrep chart can be deployed via Helm:

# Install HelmChart (use -n to configure another namespace)
helm upgrade --install semgrep secureCodeBox/semgrep

Scanner Configuration#

Semgrep requires one or more ruleset(s) to run its scans. Refer to the semgrep rule database for more details. A good starting point would be p/ci (for security checks with a low false-positive rate) or p/security-audit (for a more comprehensive security audit, which may include more false-positive results).
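To combine rulesets, the `-c` flag can be repeated once per ruleset. The fragment below is a minimal sketch of the `parameters` section of a Scan only. It assumes the source code is available at `/repo/flask-app`, the path used in the example further down, and that both rulesets are appropriate for your project.

```yaml
# Fragment of a Scan spec: only the semgrep parameters are shown
parameters:
  # Low false-positive security checks
  - "-c"
  - "p/ci"
  # Broader audit ruleset, expect more false positives
  - "-c"
  - "p/security-audit"
  # Path to scan inside the scan container (assumed mount point)
  - "/repo/flask-app"
```

Semgrep reports the findings of all configured rulesets in a single run, so one scan is usually enough.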

Semgrep needs access to the source code to run its analysis. To use it with secureCodeBox, you thus need a way to provision the data into the scan container. The recommended method is to use initContainers to clone a VCS repository. The simplest example, using a public Git repository from GitHub, looks like this:

apiVersion: "execution.securecodebox.io/v1"
kind: Scan
metadata:
  name: "semgrep-vulnerable-flask-app"
spec:
  # Specify a Kubernetes volume that will be shared between the scanner and the initContainer
  volumes:
    - name: repository
      emptyDir: {}
  # Mount the volume in the scan container
  volumeMounts:
    - mountPath: "/repo/"
      name: repository
  # Specify an init container to clone the repository
  initContainers:
    - name: "provision-git"
      # Use an image that includes git
      image: bitnami/git
      # Mount the same volume we also use in the main container
      volumeMounts:
        - mountPath: "/repo/"
          name: repository
      # Specify the clone command and clone into the volume, mounted at /repo/
      command:
        - git
        - clone
        - "https://github.com/we45/Vulnerable-Flask-App"
        - /repo/flask-app
  # Parameterize the semgrep scan itself
  scanType: "semgrep"
  parameters:
    - "-c"
    - "p/ci"
    - "/repo/flask-app"

If your repository requires authentication to clone, you will have to give the initContainer access to some method of authentication. This could be a personal access token (GitHub, GitLab), project access token (GitLab), deploy key (GitHub / GitLab), deploy token (GitLab), or a server-to-server token (GitHub). Due to the large variety of options, we do not provide documentation for all of them here. Refer to the linked documentation for details on the different methods, and remember to use Kubernetes secrets to manage keys and tokens.
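As one possible approach, the following sketch clones a private GitHub repository over HTTPS with a personal access token read from a Kubernetes secret. The scan name, the repository URL (`your-org/your-private-repo`), and the clone target `/repo/src` are placeholders. The secret name `github-access-token` with the key `token` is an assumption that matches the cascading rule example later on this page. Adapt all of them to your setup.

```yaml
apiVersion: "execution.securecodebox.io/v1"
kind: Scan
metadata:
  name: "semgrep-private-repo"  # hypothetical name
spec:
  volumes:
    - name: repository
      emptyDir: {}
  volumeMounts:
    - mountPath: "/repo/"
      name: repository
  initContainers:
    - name: "provision-git"
      image: bitnami/git
      # Kubernetes substitutes $(GITHUB_TOKEN) with the value of the
      # environment variable defined in "env" below
      command:
        - git
        - clone
        # Placeholder URL: replace with your own repository
        - "https://$(GITHUB_TOKEN)@github.com/your-org/your-private-repo"
        - /repo/src
      volumeMounts:
        - mountPath: "/repo/"
          name: repository
      env:
        - name: GITHUB_TOKEN
          valueFrom:
            secretKeyRef:
              name: github-access-token  # assumed secret name
              key: token                 # assumed key inside the secret
  scanType: "semgrep"
  parameters:
    - "-c"
    - "p/ci"
    - "/repo/src"
```

Keeping the token in a secret means it never appears in the Scan manifest itself.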

Cascading Rules#

By default, the semgrep scanner does not install any cascading rules, as some aspects of the semgrep scan (such as the ruleset used) should be customized. However, you can easily create your own cascading rule, for example to run semgrep on the output of git-repo-scanner. As a starting point, consider the following cascading rule, which scans all public GitHub repositories found by git-repo-scanner using semgrep's p/ci ruleset:

apiVersion: "cascading.securecodebox.io/v1"
kind: CascadingRule
metadata:
  name: "semgrep-public-github-repos"
  labels:
    securecodebox.io/invasive: non-invasive
    securecodebox.io/intensive: medium
spec:
  matches:
    anyOf:
      # We want to scan public GitHub repositories. Change "public" to "private" to scan private repos instead
      - name: "GitHub Repo"
        attributes:
          visibility: public
  scanSpec:
    # Configure the scanSpec for semgrep
    scanType: "semgrep"
    parameters:
      - "-c"
      - "p/ci"  # Change this to use a different rule set
      - "/repo/"
    volumes:
      - name: repo
        emptyDir: {}
    volumeMounts:
      - name: repo
        mountPath: "/repo/"
    initContainers:
      - name: "git-clone"
        image: bitnami/git
        # The command assumes that GITHUB_TOKEN contains a GitHub access token with access to the repository.
        # GITHUB_TOKEN is set below in the "env" section.
        # If you do not want to use an access token, remove it from the URL below.
        command:
          - git
          - clone
          - "https://$(GITHUB_TOKEN)@github.com/{{{attributes.full_name}}}"
          - /repo/
        volumeMounts:
          - mountPath: "/repo/"
            name: repo
        # Load the GITHUB_TOKEN from the Kubernetes secret with the name "github-access-token"
        # Create this secret using, for example:
        #     echo -n 'YOUR TOKEN GOES HERE' > github-token.txt && kubectl create secret generic github-access-token --from-file=token=github-token.txt
        # IMPORTANT: Ensure that github-token.txt does not have a newline at the end of the file. Using "echo -n" to create it takes care of this automatically.
        # However, if you create it with an editor, some editors (most notably, vim) will add a hidden newline at the end of the file, which will cause issues.
        env:
          - name: GITHUB_TOKEN
            valueFrom:
              secretKeyRef:
                name: github-access-token
                key: token

Use this configuration as a baseline for your own rules.

Requirements#

Kubernetes: >=v1.11.0-0

Values#

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| cascadingRules.enabled | bool | true | Enables or disables the installation of the default cascading rules for this scanner |
| parser.backoffLimit | int | 3 | |
| parser.env | list | [] | |
| parser.image.pullPolicy | string | "IfNotPresent" | |
| parser.image.repository | string | "securecodebox/parser-semgrep" | |
| parser.image.tag | string | nil | |
| scanner.backoffLimit | int | 3 | |
| scanner.env | list | [] | |
| scanner.extraContainers | list | [] | |
| scanner.extraVolumeMounts | list | [] | |
| scanner.extraVolumes | list | [] | |
| scanner.image.pullPolicy | string | "IfNotPresent" | |
| scanner.image.repository | string | "docker.io/returntocorp/semgrep" | |
| scanner.image.tag | string | nil | |
| scanner.resources | object | {} | |
| scanner.securityContext.allowPrivilegeEscalation | bool | false | |
| scanner.securityContext.capabilities.drop[0] | string | "all" | |
| scanner.securityContext.privileged | bool | false | |
| scanner.securityContext.readOnlyRootFilesystem | bool | false | |
| scanner.securityContext.runAsNonRoot | bool | true | |
| scanner.ttlSecondsAfterFinished | string | nil | |
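
These values can be overridden at install time. The following is a minimal sketch of a Helm values file, assuming a hypothetical file name `values.yaml`. The keys come from the table above, while the concrete numbers are illustrative and not recommendations.

```yaml
# values.yaml (hypothetical override file; all numbers are illustrative)
scanner:
  # Retry a failed scan job only once instead of the default 3 times
  backoffLimit: 1
  # Standard Kubernetes resource requests and limits for the scan container
  resources:
    requests:
      cpu: "500m"
      memory: "512Mi"
    limits:
      memory: "2Gi"
parser:
  backoffLimit: 1
```

The file can then be passed to the install command from the Deployment section with Helm's `-f values.yaml` option.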

License#

The secureCodeBox code is licensed under the Apache License 2.0.

Examples#

vulnerable-flask-app#

apiVersion: "execution.securecodebox.io/v1"
kind: Scan
metadata:
  name: "semgrep-vulnerable-flask-app"
spec:
  volumes:
    - name: test-dir
      emptyDir: {}
  volumeMounts:
    - mountPath: "/test/"
      name: test-dir
  scanType: "semgrep"
  parameters:
    - "-c"
    - "p/ci"
    - "/test/flask"
  initContainers:
    - name: "provision-git"
      image: bitnami/git
      command:
        - git
        - clone
        - "https://github.com/we45/Vulnerable-Flask-App"
        - /test/flask
      volumeMounts:
        - mountPath: "/test/"
          name: test-dir