Semgrep

What is Semgrep?

Semgrep ("semantic grep") is a static source code analyzer that searches code for specific patterns. You can either write your own rules or use one of the many predefined rulesets curated by the Semgrep team.

To learn more about Semgrep, visit semgrep.dev.

Deployment

The semgrep chart can be deployed via helm:

# Install HelmChart (use -n to configure another namespace)
helm upgrade --install semgrep secureCodeBox/semgrep

Scanner Configuration

Semgrep requires one or more ruleset(s) to run its scans. Refer to the semgrep rule database for more details. A good starting point would be p/ci (for security checks with a low false-positive rate) or p/security-audit (for a more comprehensive security audit, which may include more false-positive results).
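
Because more than one ruleset can be used, the `-c` option can be repeated once per ruleset. The following fragment is a minimal sketch of the `parameters` list of a Scan that combines the two rulesets mentioned above; the path `/repo/flask-app` is only a placeholder for wherever the source code is mounted in the scan container.

```yaml
# Fragment of a Scan spec: run semgrep with two rulesets in a single scan
parameters:
  - "-c"
  - "p/ci"              # low false-positive security checks
  - "-c"
  - "p/security-audit"  # broader audit, expect more false positives
  - "/repo/flask-app"   # placeholder path to the code to scan
```

Combining rulesets this way produces one set of findings for the whole scan, so noisier rulesets may be better kept in a separate Scan if their results are triaged differently.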

Semgrep needs access to the source code to run its analysis. To use it with secureCodeBox, you thus need a way to provision the data into the scan container. The recommended method is to use initContainers to clone a VCS repository. The simplest example, using a public Git repository from GitHub, looks like this:

apiVersion: "execution.securecodebox.io/v1"
kind: Scan
metadata:
  name: "semgrep-vulnerable-flask-app"
spec:
  # Specify a Kubernetes volume that will be shared between the scanner and the initContainer
  volumes:
    - name: repository
      emptyDir: {}
  # Mount the volume in the scan container
  volumeMounts:
    - mountPath: "/repo/"
      name: repository
  # Specify an init container to clone the repository
  initContainers:
    - name: "provision-git"
      # Use an image that includes git
      image: bitnami/git
      # Mount the same volume we also use in the main container
      volumeMounts:
        - mountPath: "/repo/"
          name: repository
      # Specify the clone command and clone into the volume, mounted at /repo/
      command:
        - git
        - clone
        - "https://github.com/we45/Vulnerable-Flask-App"
        - /repo/flask-app
  # Parameterize the semgrep scan itself
  scanType: "semgrep"
  parameters:
    - "-c"
    - "p/ci"
    - "/repo/flask-app"

If your repository requires authentication to clone, you will have to give the initContainer access to some method of authentication. This could be a personal access token (GitHub, GitLab), project access token (GitLab), deploy key (GitHub / GitLab), deploy token (GitLab), or a server-to-server token (GitHub). Due to the large variety of options, we do not provide documentation for all of them here. Refer to the linked documentation for details on the different methods, and remember to use Kubernetes secrets to manage keys and tokens.
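
As one possible illustration, the sketch below clones a private GitHub repository over HTTPS using a personal access token stored in a Kubernetes secret. The scan name, the repository URL, the environment variable `GIT_TOKEN`, and the secret name `git-access-token` with its key `token` are all assumptions chosen for this example; other authentication methods will need a different setup.

```yaml
apiVersion: "execution.securecodebox.io/v1"
kind: Scan
metadata:
  name: "semgrep-private-repo"  # hypothetical scan name
spec:
  volumes:
    - name: repository
      emptyDir: {}
  volumeMounts:
    - mountPath: "/repo/"
      name: repository
  initContainers:
    - name: "provision-git"
      image: bitnami/git
      volumeMounts:
        - mountPath: "/repo/"
          name: repository
      # Kubernetes substitutes $(GIT_TOKEN) with the value of the env variable defined below
      command:
        - git
        - clone
        - "https://$(GIT_TOKEN)@github.com/example-org/private-repo"  # placeholder repository
        - /repo/private-repo
      env:
        - name: GIT_TOKEN
          valueFrom:
            secretKeyRef:
              name: git-access-token  # assumed name of an existing secret
              key: token              # assumed key inside that secret
  scanType: "semgrep"
  parameters:
    - "-c"
    - "p/ci"
    - "/repo/private-repo"
```

Keeping the token in a secret means it never appears in the Scan resource itself, only a reference to it.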

Cascading Rules

By default, the semgrep scanner does not install any cascading rules, because some aspects of a semgrep scan (such as the ruleset to use) should be customized. However, you can easily create your own cascading rule, for example to run semgrep on the output of git-repo-scanner. As a starting point, consider the following cascading rule, which scans all public GitHub repositories found by git-repo-scanner using semgrep's p/ci ruleset:

apiVersion: "cascading.securecodebox.io/v1"
kind: CascadingRule
metadata:
  name: "semgrep-public-github-repos"
  labels:
    securecodebox.io/invasive: non-invasive
    securecodebox.io/intensive: medium
spec:
  matches:
    anyOf:
      # We want to scan public GitHub repositories. Change "public" to "private" to scan private repos instead
      - name: "GitHub Repo"
        attributes:
          visibility: public
  scanSpec:
    # Configure the scanSpec for semgrep
    scanType: "semgrep"
    parameters:
      - "-c"
      - "p/ci"  # Change this to use a different rule set
      - "/repo/"
    volumes:
      - name: repo
        emptyDir: {}
    volumeMounts:
      - name: repo
        mountPath: "/repo/"
    initContainers:
      - name: "git-clone"
        image: bitnami/git
        # The command assumes that GITHUB_TOKEN contains a GitHub access token with access to the repository.
        # GITHUB_TOKEN is set below in the "env" section.
        # If you do not want to use an access token, remove it from the URL below.
        command:
          - git
          - clone
          - "https://$(GITHUB_TOKEN)@github.com/{{{attributes.full_name}}}"
          - /repo/
        volumeMounts:
          - mountPath: "/repo/"
            name: repo
        # Load the GITHUB_TOKEN from the kubernetes secret with the name "github-access-token"
        # Create this secret using, for example:
        #     echo -n 'YOUR TOKEN GOES HERE' > github-token.txt && kubectl create secret generic github-access-token --from-file=token=github-token.txt
        # IMPORTANT: Ensure that github-token.txt does not have a new line at the end of the file. This is automatically done by using "echo -n" to create it.
        # However, if you create it with an editor, some editors (most notably, vim) will create hidden newlines at the end of files, which will cause issues.
        env:
          - name: GITHUB_TOKEN
            valueFrom:
              secretKeyRef:
                name: github-access-token
                key: token

Use this configuration as a baseline for your own rules.
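
If you prefer a declarative setup over the `kubectl create secret` command mentioned in the comments above, the secret could also be described as a manifest. This is a sketch only: it assumes the secret lives in the same namespace as the scans, and the token value is a placeholder that should never be committed to version control.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: github-access-token  # name referenced by secretKeyRef in the cascading rule
type: Opaque
stringData:
  # stringData takes the plain-text value; a quoted scalar carries no trailing newline
  token: "YOUR TOKEN GOES HERE"
```

Writing the token as a quoted `stringData` value sidesteps the trailing-newline problem described above, since nothing is read from a file.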

Requirements

Kubernetes: >=v1.11.0-0

Values

Key	Type	Default	Description
cascadingRules.enabled	bool	`false`	Enables or disables the installation of the default cascading rules for this scanner
imagePullSecrets	list	`[]`	Define imagePullSecrets when a private registry is used (see: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/)
parser.affinity	object	`{}`	Optional affinity settings that control how the parser job is scheduled (see: https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes-using-node-affinity/)
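parser.backoffLimit	int	`3`	Number of retries before the parser job is considered failed (see: https://kubernetes.io/docs/concepts/workloads/controllers/job/#pod-backoff-failure-policy)
parser.env	list	`[]`	Optional environment variables mapped into each parser job (see: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/)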
parser.image.pullPolicy	string	`"IfNotPresent"`	Image pull policy. One of Always, Never, IfNotPresent. Defaults to Always if :latest tag is specified, or IfNotPresent otherwise. More info: https://kubernetes.io/docs/concepts/containers/images#updating-images
parser.image.repository	string	`"securecodebox/parser-semgrep"`	Parser image repository
parser.image.tag	string	defaults to the chart's version	Parser image tag
parser.nodeSelector	object	`{}`	Optional nodeSelector settings that control how the parser job is scheduled (see: https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes/)
parser.resources	object	`{ requests: { cpu: "200m", memory: "100Mi" }, limits: { cpu: "400m", memory: "200Mi" } }`	Optional resources lets you control resource limits and requests for the parser container. See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
parser.scopeLimiterAliases	object	`{}`	Optional finding aliases to be used in the scopeLimiter.
parser.tolerations	list	`[]`	Optional tolerations settings that control how the parser job is scheduled (see: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/)
scanner.activeDeadlineSeconds	string	`nil`	There are situations where you want to fail a scan Job after some amount of time. To do so, set activeDeadlineSeconds to define an active deadline (in seconds) when considering a scan Job as failed. (see: https://kubernetes.io/docs/concepts/workloads/controllers/job/#job-termination-and-cleanup)
scanner.affinity	object	`{}`	Optional affinity settings that control how the scanner job is scheduled (see: https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes-using-node-affinity/)
scanner.backoffLimit	int	`3`	There are situations where you want to fail a scan Job after some amount of retries due to a logical error in configuration etc. To do so, set backoffLimit to specify the number of retries before considering a scan Job as failed. (see: https://kubernetes.io/docs/concepts/workloads/controllers/job/#pod-backoff-failure-policy)
scanner.env	list	`[]`	Optional environment variables mapped into each scanJob (see: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/)
scanner.extraContainers	list	`[]`	Optional additional Containers started with each scanJob (see: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/)
scanner.extraVolumeMounts	list	`[]`	Optional VolumeMounts mapped into each scanJob (see: https://kubernetes.io/docs/concepts/storage/volumes/)
scanner.extraVolumes	list	`[]`	Optional Volumes mapped into each scanJob (see: https://kubernetes.io/docs/concepts/storage/volumes/)
scanner.image.pullPolicy	string	`"IfNotPresent"`	Image pull policy. One of Always, Never, IfNotPresent. Defaults to Always if :latest tag is specified, or IfNotPresent otherwise. More info: https://kubernetes.io/docs/concepts/containers/images#updating-images
scanner.image.repository	string	`"docker.io/returntocorp/semgrep"`	Container Image to run the scan
scanner.image.tag	string	`nil`	Defaults to the chart's appVersion
scanner.nodeSelector	object	`{}`	Optional nodeSelector settings that control how the scanner job is scheduled (see: https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes/)
scanner.podSecurityContext	object	`{}`	Optional securityContext set on scanner pod (see: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/)
scanner.resources	object	`{}`	CPU/memory resource requests/limits (see: https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/, https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/)
scanner.securityContext	object	`{"allowPrivilegeEscalation":false,"capabilities":{"drop":["all"]},"privileged":false,"readOnlyRootFilesystem":false,"runAsNonRoot":false}`	Optional securityContext set on scanner container (see: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/)
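scanner.suspend	bool	`false`	If set to true, the scan job will be suspended after creation. You can then resume the job by setting its `spec.suspend` field to `false` (for example with `kubectl patch job/<jobname> --type=strategic --patch '{"spec":{"suspend":false}}'`) or by using a job scheduler like Kueue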
scanner.tolerations	list	`[]`	Optional tolerations settings that control how the scanner job is scheduled (see: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/)
scanner.ttlSecondsAfterFinished	string	`nil`	seconds after which the Kubernetes job for the scanner will be deleted. Requires the Kubernetes TTLAfterFinished controller: https://kubernetes.io/docs/concepts/workloads/controllers/ttlafterfinished/
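
To show how these keys fit together, here is a sketch of a values file that overrides a few of them. The file name `values.yaml` and every number in it are illustrative assumptions rather than recommendations; suitable limits depend on the size of the repositories being scanned.

```yaml
# values.yaml (hypothetical file name): example overrides for the semgrep chart
scanner:
  ttlSecondsAfterFinished: 3600  # delete finished scan jobs after one hour
  activeDeadlineSeconds: 1800    # fail scans that run longer than 30 minutes
  backoffLimit: 1                # retry a failing scan job only once
  resources:
    requests:
      cpu: "500m"
      memory: "512Mi"
    limits:
      memory: "2Gi"
parser:
  backoffLimit: 3                # same as the chart default, shown for completeness
```

Such a file could then be passed to the install command from the Deployment section with Helm's `-f` option, for example `helm upgrade --install semgrep secureCodeBox/semgrep -f values.yaml`.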

License

Code of secureCodeBox is licensed under the Apache License 2.0.

CPU architectures

The scanner is currently supported for these CPU architectures:

linux/amd64

Examples

vulnerable-flask-app

The scan definition is shown first, followed by the findings it produced:

# SPDX-FileCopyrightText: the secureCodeBox authors
#
# SPDX-License-Identifier: Apache-2.0

apiVersion: "execution.securecodebox.io/v1"
kind: Scan
metadata:
  name: "semgrep-vulnerable-flask-app"
spec:
  volumes:
    - name: test-dir
      emptyDir: {}
  volumeMounts:
    - mountPath: "/test/"
      name: test-dir
  scanType: "semgrep"
  parameters:
    - "-c"
    - "p/ci"
    - "/test/flask"
  initContainers:
    - name: "provision-git"
      image: bitnami/git
      command:
        - git
        - clone
        - "https://github.com/we45/Vulnerable-Flask-App"
        - /test/flask
      volumeMounts:
        - mountPath: "/test/"
          name: test-dir

# SPDX-FileCopyrightText: the secureCodeBox authors
#
# SPDX-License-Identifier: Apache-2.0

[
  {
    "name": "javascript.lang.correctness.useless-eqeq.eqeq-is-bad",
    "location": "/test/flask/app/static/loader.js:91-91",
    "description": "Detected a useless comparison operation `0 == 0` or `0 != 0`. This operation is always true. If testing for floating point NaN, use `math.isnan`, or `cmath.isnan` if the number is complex.",
    "category": "correctness",
    "severity": "HIGH",
    "attributes":
      {
        "cwe": null,
        "owasp_category": null,
        "references": null,
        "rule_source": "https://semgrep.dev/r/javascript.lang.correctness.useless-eqeq.eqeq-is-bad",
        "matching_lines": 'K.h.i.Bf=function(b,c){var d=0,e=0,f=!1;b=K.h.i.Ta(b,c).split(K.h.i.Sl);for(c=0;c<b.length;c++){var g=b[c];K.h.i.qh(g)?(d++,e++):K.h.i.Mg.test(g)?f=!0:K.h.i.rg(g)?e++:K.h.i.Zj.test(g)&&(f=!0)}return 0==e?f?K.h.i.O.Ua:K.h.i.O.sa:d/e>K.h.i.dl?K.h.i.O.Va:K.h.i.O.Ua};K.h.i.vq=function(b,c){return K.h.i.Bf(b,c)==K.h.i.O.Va};K.h.i.ht=function(b,c){b&&(c=K.h.i.Dl(c))&&(b.style.textAlign=c==K.h.i.O.Va?K.h.i.ec:K.h.i.cc,b.dir=c==K.h.i.O.Va?"rtl":"ltr")};',
      },
    "id": "ee0afb67-a248-4bee-9863-68573bc900a9",
    "parsed_at": "2021-10-15T09:05:12.769Z",
  },
  {
    "name": "python.flask.security.dangerous-template-string.dangerous-template-string",
    "location": "/test/flask/app/app.py:103-114",
    "description": "Found a template created with string formatting. This is susceptible to server-side template injection and cross-site scripting attacks.",
    "category": "security",
    "severity": "HIGH",
    "attributes":
      {
        "cwe": "CWE-96: Improper Neutralization of Directives in Statically Saved Code ('Static Code Injection')",
        "owasp_category": "A1: Injection",
        "references":
          [
            "https://nvisium.com/blog/2016/03/09/exploring-ssti-in-flask-jinja2.html",
            "https://pequalsnp-team.github.io/cheatsheet/flask-jinja2-ssti",
          ],
        "rule_source": "https://semgrep.dev/r/python.flask.security.dangerous-template-string.dangerous-template-string",
        "matching_lines": "    template = '''<html>\n    <head>\n    <title>Error</title>\n    </head>\n    <body>\n    <h1>Oops that page doesn't exist!!</h1>\n    <h3>%s</h3>\n    </body>\n    </html>\n    ''' % request.url\n\n    return render_template_string(template, dir = dir, help = help, locals = locals),404",
      },
    "id": "496862b3-6f61-4119-a5d7-f3ddec8ddc7e",
    "parsed_at": "2021-10-15T09:05:12.769Z",
  },
  {
    "name": "python.flask.security.dangerous-template-string.dangerous-template-string",
    "location": "/test/flask/app/app.py:271-281",
    "description": "Found a template created with string formatting. This is susceptible to server-side template injection and cross-site scripting attacks.",
    "category": "security",
    "severity": "HIGH",
    "attributes":
      {
        "cwe": "CWE-96: Improper Neutralization of Directives in Statically Saved Code ('Static Code Injection')",
        "owasp_category": "A1: Injection",
        "references":
          [
            "https://nvisium.com/blog/2016/03/09/exploring-ssti-in-flask-jinja2.html",
            "https://pequalsnp-team.github.io/cheatsheet/flask-jinja2-ssti",
          ],
        "rule_source": "https://semgrep.dev/r/python.flask.security.dangerous-template-string.dangerous-template-string",
        "matching_lines": "                    template = '''<html>\n                        <head>\n                        <title>Error</title>\n                        </head>\n                        <body>\n                        <h1>Oops Error Occurred</h1>\n                        <h3>%s</h3>\n                        </body>\n                        </html>\n                        ''' % str(e)\n                    return render_template_string(template, dir=dir, help=help, locals=locals), 404",
      },
    "id": "ded6aac2-e6bf-411a-9696-f6d70e3f9750",
    "parsed_at": "2021-10-15T09:05:12.769Z",
  },
  {
    "name": "python.flask.security.insecure-deserialization.insecure-deserialization",
    "location": "/test/flask/app/app.py:329-329",
    "description": "Detected the use of an insecure deserialization library in a Flask route. These libraries are prone to code execution vulnerabilities. Ensure user data does not enter this function. To fix this, try to avoid serializing whole objects. Consider instead using a serializer such as JSON.",
    "category": "security",
    "severity": "HIGH",
    "attributes":
      {
        "cwe": "CWE-502: Deserialization of Untrusted Data",
        "owasp_category": "A8: Insecure Deserialization",
        "references": ["https://docs.python.org/3/library/pickle.html"],
        "rule_source": "https://semgrep.dev/r/python.flask.security.insecure-deserialization.insecure-deserialization",
        "matching_lines": "        ydata = yaml.load(y)",
      },
    "id": "dfdf9a67-1ec3-40d8-8b5f-862ca5ebe3db",
    "parsed_at": "2021-10-15T09:05:12.769Z",
  },
  {
    "name": "python.lang.security.insecure-hash-algorithms.insecure-hash-algorithm-md5",
    "location": "/test/flask/app/app.py:141-141",
    "description": "Detected MD5 hash algorithm which is considered insecure. MD5 is not collision resistant and is therefore not suitable as a cryptographic signature. Use SHA256 or SHA3 instead.",
    "category": "security",
    "severity": "MEDIUM",
    "attributes":
      {
        "cwe": "CWE-327: Use of a Broken or Risky Cryptographic Algorithm",
        "owasp_category": "A3: Sensitive Data Exposure",
        "references":
          [
            "https://tools.ietf.org/html/rfc6151",
            "https://crypto.stackexchange.com/questions/44151/how-does-the-flame-malware-take-advantage-of-md5-collision",
            "https://pycryptodome.readthedocs.io/en/latest/src/hash/sha3_256.html",
          ],
        "rule_source": "https://semgrep.dev/r/python.lang.security.insecure-hash-algorithms.insecure-hash-algorithm-md5",
        "matching_lines": "            hash_pass = hashlib.md5(password).hexdigest()",
      },
    "id": "4524f52b-7cb8-4a5b-8a89-12c188efc92e",
    "parsed_at": "2021-10-15T09:05:12.769Z",
  },
  {
    "name": "python.requests.security.disabled-cert-validation.disabled-cert-validation",
    "location": "/test/flask/tests/e2e_zap.py:17-18",
    "description": "Certificate verification has been explicitly disabled. This permits insecure connections to insecure servers. Re-enable certification validation.",
    "category": "security",
    "severity": "HIGH",
    "attributes":
      {
        "cwe": "CWE-295: Improper Certificate Validation",
        "owasp_category": "A3: Sensitive Data Exposure",
        "references":
          [
            "https://stackoverflow.com/questions/41740361/is-it-safe-to-disable-ssl-certificate-verification-in-pythonss-requests-lib",
          ],
        "rule_source": "https://semgrep.dev/r/python.requests.security.disabled-cert-validation.disabled-cert-validation",
        "matching_lines": "login = requests.post(target_url + '/login',\n                      proxies=proxies, json=auth_dict, verify=False)",
      },
    "id": "18a0cd4b-4b43-4017-8d90-1e6de5dfde76",
    "parsed_at": "2021-10-15T09:05:12.769Z",
  },
  {
    "name": "python.requests.security.disabled-cert-validation.disabled-cert-validation",
    "location": "/test/flask/tests/e2e_zap.py:28-29",
    "description": "Certificate verification has been explicitly disabled. This permits insecure connections to insecure servers. Re-enable certification validation.",
    "category": "security",
    "severity": "HIGH",
    "attributes":
      {
        "cwe": "CWE-295: Improper Certificate Validation",
        "owasp_category": "A3: Sensitive Data Exposure",
        "references":
          [
            "https://stackoverflow.com/questions/41740361/is-it-safe-to-disable-ssl-certificate-verification-in-pythonss-requests-lib",
          ],
        "rule_source": "https://semgrep.dev/r/python.requests.security.disabled-cert-validation.disabled-cert-validation",
        "matching_lines": "    get_cust_id = requests.get(\n        target_url + '/get/2', proxies=proxies, headers=auth_header, verify=False)",
      },
    "id": "6ffd9ab4-f736-473b-88b3-24f1e1103ec6",
    "parsed_at": "2021-10-15T09:05:12.769Z",
  },
  {
    "name": "python.requests.security.disabled-cert-validation.disabled-cert-validation",
    "location": "/test/flask/tests/e2e_zap.py:36-37",
    "description": "Certificate verification has been explicitly disabled. This permits insecure connections to insecure servers. Re-enable certification validation.",
    "category": "security",
    "severity": "HIGH",
    "attributes":
      {
        "cwe": "CWE-295: Improper Certificate Validation",
        "owasp_category": "A3: Sensitive Data Exposure",
        "references":
          [
            "https://stackoverflow.com/questions/41740361/is-it-safe-to-disable-ssl-certificate-verification-in-pythonss-requests-lib",
          ],
        "rule_source": "https://semgrep.dev/r/python.requests.security.disabled-cert-validation.disabled-cert-validation",
        "matching_lines": "    fetch_customer_post = requests.post(\n        target_url + '/fetch/customer', json=post, proxies=proxies, headers=auth_header, verify=False)",
      },
    "id": "b9d7d55c-d314-440d-a3dc-e41a5dd2ec0f",
    "parsed_at": "2021-10-15T09:05:12.769Z",
  },
  {
    "name": "python.requests.security.disabled-cert-validation.disabled-cert-validation",
    "location": "/test/flask/tests/e2e_zap.py:44-45",
    "description": "Certificate verification has been explicitly disabled. This permits insecure connections to insecure servers. Re-enable certification validation.",
    "category": "security",
    "severity": "HIGH",
    "attributes":
      {
        "cwe": "CWE-295: Improper Certificate Validation",
        "owasp_category": "A3: Sensitive Data Exposure",
        "references":
          [
            "https://stackoverflow.com/questions/41740361/is-it-safe-to-disable-ssl-certificate-verification-in-pythonss-requests-lib",
          ],
        "rule_source": "https://semgrep.dev/r/python.requests.security.disabled-cert-validation.disabled-cert-validation",
        "matching_lines": "    search_customer_username = requests.post(\n        target_url + '/search', json=search, proxies=proxies, headers=auth_header, verify=False)",
      },
    "id": "f82d51de-8ce7-43fb-a225-6b7662418ea9",
    "parsed_at": "2021-10-15T09:05:12.769Z",
  },
]
