Python regex for an SSL certificate


I have text that looks like this:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 8580482261496855974 (0x7713ff27ce0f7da6)
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=US, O=Google Trust Services, CN=Google Internet Authority G3
        Validity:
            Not Before: Jun 12 13:37:16 2018 GMT
            Not After: Aug 21 12:13:00 2018 GMT
        Subject: C=US, ST=California, L=Mountain View, O=Google LLC, CN=www.google.com
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    00:b0:55:b7:46:5c:44:fb:25:15:16:8d:6b:33:72:
                    b8:11:cd:3a:a6:ea:c1:54:a3:ce:ce:18:76:e6:c5:
                    65:d8:37:d9:55:dc:79:9f:1d:10:5a:63:67:58:bc:
                    46:d0:3e:05:be:6b:d0:d7:c5:4a:c5:c7:83:4a:ff:
                    19:22:f7:f3:41:0d:da:d1:63:3f:67:ea:e2:80:6e:
                    38:5b:d4:0a:a4:ec:a6:b1:33:a5:f5:e8:78:5d:e3:
                    a1:e9:a5:f7:3d:df:2f:4f:de:54:f5:9e:b3:d9:ce:
                    fd:2d:0f:c8:6c:d1:13:6e:d9:e2:8a:a0:ba:20:34:
                    3a:43:4c:1f:c4:06:9c:2f:0e:59:59:98:33:e4:2a:
                    08:fc:eb:34:17:4b:3f:30:5c:3c:1d:7e:eb:d6:2b:
                    f4:4b:d9:c3:99:4e:60:c7:3d:61:de:5c:14:ac:26:
                    44:92:49:95:83:a8:5b:74:c4:56:aa:9f:15:b8:8d:
                    2f:a3:1e:51:57:a4:40:bf:47:4c:fe:74:ad:da:c4:
                    59:69:bb:b7:29:0a:5b:bc:a8:a2:f1:a0:a4:00:fd:
                    a3:72:b5:ec:f7:60:79:2d:a5:74:2c:d6:ce:8b:a9:
                    66:64:db:20:01:33:81:f4:28:f8:e7:94:fd:e4:e5:
                    e3:0a:5b:b4:e9:95:a1:91:f4:61:28:b7:89:10:1f:
                    0b:21
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Extended Key Usage: 
                TLS Web Server Authentication
            X509v3 Subject Alternative Name: 
                DNS:www.google.com
            Authority Information Access: 
                CA Issuers - URI: http://pki.goog/gsr2/GTSGIAG3.crt
                OCSP - URI: http://ocsp.pki.goog/GTSGIAG3

            X509v3 Subject Key Identifier: 
                DC:F3:42:F7:EC:C1:A6:3D:91:E1:CB:54:8C:8B:6A:EE:6D:F2:9C:76
            X509v3 Basic Constraints: critical
                CA: FALSE
            X509v3 Authority Key Identifier: 
                keyid:77:C2:B8:50:9A:67:76:76:B1:2D:C2:86:D0:83:A0:7E:A6:7E:BA:4B

            X509v3 Certificate Policies: 
                Policy: 1.3.6.1.4.1.11129.2.5.3
                Policy: 2.23.140.1.2.2

            X509v3 CRL Distribution Points: 

                Full Name:
                  URI: http://crl.pki.goog/GTSGIAG3.crl

    Signature Algorithm: 
       sha256WithRSAEncryption:
         5b:11:27:3c:91:44:5c:55:de:96:05:7e:67:b4:d3:fc:42:90:
         2c:a9:06:a6:2f:00:2c:28:1a:20:d3:ba:35:a8:55:b6:da:09:
         6a:77:22:02:91:f5:9f:35:d7:d7:ca:c7:56:a9:5a:7e:24:25:
         45:a7:ce:c1:19:dc:25:09:5b:6d:06:fe:97:33:ce:48:31:2e:
         11:20:df:21:ff:67:ba:0b:14:ca:08:83:15:69:7d:ed:3a:8a:
         9c:3e:65:0f:5c:35:c9:e0:be:fd:e4:df:f5:00:9e:05:56:e5:
         a3:1c:96:86:01:59:43:07:8e:56:72:6b:10:69:03:4f:e9:28:
         f4:1b:7b:95:18:d6:d9:79:ec:b8:fd:1f:c7:17:22:5d:d1:df:
         11:30:47:a5:4a:3e:73:f9:ae:03:36:28:6f:d3:f7:10:39:23:
         84:ea:e0:ee:7e:64:98:ae:2d:ef:b4:de:10:c7:45:3f:21:02:
         60:c7:1d:55:2d:66:82:0a:03:64:35:ed:22:c7:d6:88:0e:04:
         a6:71:59:29:da:42:ab:ca:28:bf:99:76:ca:f2:0c:ba:3b:6b:
         0f:85:4b:d3:f6:94:4c:07:4b:ce:df:c7:d4:05:61:8d:49:85:
         52:52:88:22:ce:25:17:dd:99:29:11:49:2f:e6:03:a3:cf:ef:
         20:34:06:37

I am trying to write a regular expression that returns only the text after the second occurrence of Signature Algorithm: (in this case sha256WithRSAEncryption), and another regular expression that returns the value that follows it:

5b:11:27:3c:91:44:5c:55:de:96:05:7e:67:b4:d3:fc:42:90:2c:a9:06:a6:2f:00:2c:28:1a:20:d3:ba:35:a8:55:b6:da:09:6a:77:22:02:91:f5:9f:35:d7:d7:ca:c7:56:a9:5a:7e:24:25:45:a7:ce:c1:19:dc:25:09:5b:6d:06:fe:97:33:ce:48:31:2e:11:20:df:21:ff:67:ba:0b:14:ca:08:83:15:69:7d:ed:3a:8a:9c:3e:65:0f:5c:35:c9:e0:be:fd:e4:df:f5:00:9e:05:56:e5:a3:1c:96:86:01:59:43:07:8e:56:72:6b:10:69:03:4f:e9:28:f4:1b:7b:95:18:d6:d9:79:ec:b8:fd:1f:c7:17:22:5d:d1:df:11:30:47:a5:4a:3e:73:f9:ae:03:36:28:6f:d3:f7:10:39:23:84:ea:e0:ee:7e:64:98:ae:2d:ef:b4:de:10:c7:45:3f:21:02:60:c7:1d:55:2d:66:82:0a:03:64:35:ed:22:c7:d6:88:0e:04:a6:71:59:29:da:42:ab:ca:28:bf:99:76:ca:f2:0c:ba:3b:6b:0f:85:4b:d3:f6:94:4c:07:4b:ce:df:c7:d4:05:61:8d:49:85:52:52:88:22:ce:25:17:dd:99:29:11:49:2f:e6:03:a3:cf:ef:20:34:06:37

For the first one I got partway there with

(?<=Signature Algorithm:) \w+

but it matches both occurrences. For the second expression I have no idea where to start. Can anyone help?


Using d = list(map(Parse.parse_input, list(filter(None, content.split('\n'))))) and then print(d), I get this output:

[[token(name='key', value='Certificate')], [token(name='key', value='Data')], [token(name='key', value='Version'), token(name='value', value='3 (0x2)')], [token(name='key', value='Serial Number'), token(name='value', value='7733016171915258262 (0x6b51313cb15c2996)')], [token(name='key', value='Signature Algorithm'), token(name='value', value='sha256WithRSAEncryption')], [token(name='key', value='Issuer'), token(name='value', value='C=US, O=Google Trust Services, CN=Google Internet Authority G3')], [token(name='value', value='        Validity')], [token(name='key', value='Not Before'), token(name='value', value='Jun 12 13:34:52 2018 GMT')], [token(name='key', value='Not After '), token(name='value', value='Aug 21 12:13:00 2018 GMT')], [token(name='key', value='Subject'), token(name='value', value='C=US, ST=California, L=Mountain View, O=Google LLC, CN=www.google.com')], [token(name='key', value='Subject Public Key Info')], [token(name='key', value='Public Key Algorithm'), token(name='value', value='id-ecPublicKey')], [token(name='key', value='Public-Key'), token(name='value', value='(256 bit)')], [token(name='key', value='pub ')], [token(name='value', value='                    04:dd:be:47:ad:46:49:9f:15:65:28:2a:18:fe:67:')], [token(name='value', value='                    51:a9:24:43:30:e6:97:00:f9:46:93:9a:82:15:22:')], [token(name='value', value='                    8c:9f:cb:58:2f:5b:5a:c1:89:cb:2a:60:12:e4:d7:')], [token(name='value', value='                    15:ab:3d:05:30:e2:fe:06:2c:44:00:d2:02:a4:e1:')], [token(name='value', value='                    12:ac:56:08:54')], [token(name='key', value='ASN1 OID'), token(name='value', value='prime256v1')], [token(name='key', value='NIST CURVE'), token(name='value', value='P-256')], [token(name='key', value='X509v3 extensions')], [token(name='key', value='X509v3 Extended Key Usage ')], [token(name='value', value='                TLS Web Server Authentication')], [token(name='key', value='X509v3 Key Usage'), 
token(name='value', value='critical')], [token(name='value', value='                Digital Signature')], [token(name='key', value='X509v3 Subject Alternative Name ')], [token(name='value', value='                DNS:www.google.com')], [token(name='key', value='Authority Information Access ')], [token(name='value', value='                CA Issuers - URI:http://pki.goog/gsr2/GTSGIAG3.crt')], [token(name='value', value='                OCSP - URI:http://ocsp.pki.goog/GTSGIAG3')], [token(name='key', value='X509v3 Subject Key Identifier ')], [token(name='value', value='                1F:1C:3D:AB:8D:02:9C:05:26:80:EE:32:DE:9C:80:05:81:6A:C7:AD')], [token(name='key', value='X509v3 Basic Constraints'), token(name='value', value='critical')], [token(name='value', value='                CA:FALSE')], [token(name='key', value='X509v3 Authority Key Identifier ')], [token(name='value', value='                keyid:77:C2:B8:50:9A:67:76:76:B1:2D:C2:86:D0:83:A0:7E:A6:7E:BA:4B')], [token(name='key', value='X509v3 Certificate Policies ')], [token(name='key', value='Policy'), token(name='value', value='1.3.6.1.4.1.11129.2.5.3')], [token(name='key', value='Policy'), token(name='value', value='2.23.140.1.2.2')], [token(name='key', value='X509v3 CRL Distribution Points ')], [token(name='key', value='Full Name')], [token(name='value', value='                  URI:http://crl.pki.goog/GTSGIAG3.crl')], [token(name='key', value='Signature Algorithm'), token(name='value', value='sha256WithRSAEncryption')], [token(name='value', value='         8c:be:ff:6a:3b:9c:4b:88:86:bc:d4:e7:b6:df:5c:d5:18:c0:')], [token(name='value', value='         5b:4c:15:2c:cb:86:94:ca:3b:ff:8d:73:30:a4:b2:bc:bb:10:')], [token(name='value', value='         a7:92:79:bb:d7:4b:79:a2:8e:66:e3:b4:a2:b4:3c:b0:41:e1:')], [token(name='value', value='         cd:62:b9:d9:68:57:05:55:22:b6:37:06:14:36:8f:6a:d1:6d:')], [token(name='value', value='         de:4b:80:b4:0a:17:e7:77:e4:c8:02:72:ae:31:91:28:59:7a:')], 
[token(name='value', value='         1e:0d:1f:27:c9:29:97:55:0f:36:c7:7f:46:ff:c7:e9:ab:ac:')], [token(name='value', value='         77:da:05:17:eb:28:bc:23:cb:60:a2:80:82:59:a1:91:da:50:')], [token(name='value', value='         06:2d:40:bb:15:4e:31:a9:b4:84:ac:21:55:47:1d:aa:80:66:')], [token(name='value', value='         a8:3f:39:7d:21:7d:d3:e0:8c:9b:7f:a0:6a:17:62:df:fa:15:')], [token(name='value', value='         2f:98:fc:74:c0:d0:95:af:0a:38:b1:36:2e:e6:14:af:2b:f3:')], [token(name='value', value='         60:0f:67:bb:c4:5a:75:a7:61:02:60:10:27:c0:77:4d:c4:fc:')], [token(name='value', value='         f6:da:f2:83:53:cd:43:42:9b:83:a3:04:3d:9a:80:d5:87:b5:')], [token(name='value', value='         79:7d:91:48:7e:cf:f0:fe:97:e0:ce:45:d9:85:6b:40:31:f5:')], [token(name='value', value='         be:e1:c9:b4:e5:cf:e6:c0:2f:dc:cc:1e:d1:40:f4:25:8e:94:')], [token(name='value', value='         fc:4c:c8:a7')]]

and when I then run print(Parse(iter(d))._result), I get this output:

{'Certificate': 'CA Issuers - URI:http://pki.goog/gsr2/GTSGIAG3.crt\nOCSP - URI:http://ocsp.pki.goog/GTSGIAG3'}

1 answer

Best answer

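You can use re.findall:

import re
# Raw string, so the regex escapes reach the re module unchanged.
# Assumes content holds the certificate text and that the algorithm name
# is on the same line as "Signature Algorithm:".
d = re.findall(r'(?<=Signature Algorithm: sha256WithRSAEncryption\n)\s+\w+:[\w:\n\s]+', content)[-1]
print(d)

Output: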

5b:11:27:3c:91:44:5c:55:de:96:05:7e:67:b4:d3:fc:42:90:
2c:a9:06:a6:2f:00:2c:28:1a:20:d3:ba:35:a8:55:b6:da:09:
6a:77:22:02:91:f5:9f:35:d7:d7:ca:c7:56:a9:5a:7e:24:25:
45:a7:ce:c1:19:dc:25:09:5b:6d:06:fe:97:33:ce:48:31:2e:
11:20:df:21:ff:67:ba:0b:14:ca:08:83:15:69:7d:ed:3a:8a:
9c:3e:65:0f:5c:35:c9:e0:be:fd:e4:df:f5:00:9e:05:56:e5:
a3:1c:96:86:01:59:43:07:8e:56:72:6b:10:69:03:4f:e9:28:
f4:1b:7b:95:18:d6:d9:79:ec:b8:fd:1f:c7:17:22:5d:d1:df:
11:30:47:a5:4a:3e:73:f9:ae:03:36:28:6f:d3:f7:10:39:23:
84:ea:e0:ee:7e:64:98:ae:2d:ef:b4:de:10:c7:45:3f:21:02:
60:c7:1d:55:2d:66:82:0a:03:64:35:ed:22:c7:d6:88:0e:04:
a6:71:59:29:da:42:ab:ca:28:bf:99:76:ca:f2:0c:ba:3b:6b:
0f:85:4b:d3:f6:94:4c:07:4b:ce:df:c7:d4:05:61:8d:49:85:
52:52:88:22:ce:25:17:dd:99:29:11:49:2f:e6:03:a3:cf:ef:
20:34:06:37
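
The pattern above hard-codes sha256WithRSAEncryption and expects it on the same line as the label. As a minimal sketch of a more tolerant variant (SIG_RE, last_signature and SAMPLE are names made up for this illustration, and the sample is shortened), the following accepts the algorithm name on the same line or on the next one and returns the hex block joined into a single string:

```python
import re

# Shortened, hypothetical sample in the same layout as the certificate dump.
SAMPLE = """\
Certificate:
    Data:
        Version: 3 (0x2)
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=US, O=Example
    Signature Algorithm: sha256WithRSAEncryption
         5b:11:27:3c:91:44:
         2c:a9:06:a6
"""

# Label, then the algorithm name (possibly on the next line, possibly with a
# trailing colon), then one or more indented lines of colon-separated hex pairs.
SIG_RE = re.compile(
    r"Signature Algorithm:\s*(?P<alg>[\w-]+):?[ \t]*\n"
    r"(?P<hex>(?:[ \t]+(?:[0-9a-fA-F]{2}:?)+[ \t]*\n?)+)"
)

def last_signature(text):
    """Return (algorithm, hex_string) for the last signature block, or None."""
    matches = list(SIG_RE.finditer(text))
    if not matches:
        return None
    m = matches[-1]
    # split() with no arguments drops newlines and indentation between lines.
    return m.group("alg"), "".join(m.group("hex").split())

print(last_signature(SAMPLE))
```

Taking the last match mirrors the [-1] in the answer. The first Signature Algorithm: line does not match at all here, because it is not followed by lines of hex pairs.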

Edit: additional fields:

# On the sample above each match runs on into the next line,
# so split('\n') yields [value, leftover].
[public_key, _], [exponent, _] = [x.split('\n') for x in re.findall(r'(?<=Public\-Key:\s)[\w\s\(\)]+|(?<=Exponent:\s)[\w\s\(\)]+', content)]
modulus = re.findall(r'(?<=Modulus:\n)\s+[a-z0-9\:\n\s]+', content)

However, an even better option is to write a simple parser:
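
The unpacking above relies on each match spilling over onto the following line. A hedged alternative sketch, assuming the same openssl-style layout, uses one re.search with a capture group per field; public_key_fields and SAMPLE are hypothetical names and the modulus is shortened:

```python
import re

# Shortened, hypothetical excerpt in the layout of the certificate dump.
SAMPLE = """\
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    00:b0:55:b7:
                    0b:21
                Exponent: 65537 (0x10001)
"""

def public_key_fields(text):
    """Extract key size, exponent and modulus; missing fields become None."""
    bits = re.search(r"Public-Key:\s*\((\d+) bit\)", text)
    exponent = re.search(r"Exponent:\s*(\d+)", text)
    # Indented lines made only of lowercase hex digits and colons.
    modulus = re.search(r"Modulus:[ \t]*\n((?:[ \t]+[0-9a-f:]+[ \t]*\n)+)", text)
    return {
        "bits": int(bits.group(1)) if bits else None,
        "exponent": int(exponent.group(1)) if exponent else None,
        "modulus": "".join(modulus.group(1).split()) if modulus else None,
    }

fields = public_key_fields(SAMPLE)
print(fields)
```

If the modulus is needed as a number, int(fields["modulus"].replace(":", ""), 16) converts the joined string.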

import collections
import re

class Parse:
  token = collections.namedtuple('token', ['name', 'value'])
  def __init__(self, _parsed):
     # _parsed is an iterator over token lists, one list per input line
     self.parsed = _parsed
     self.data_list = []
     self._result = {}
     self.parse()
  @property
  def certificate(self):
     return self._result
  def parse(self):
     current = next(self.parsed, None)
     if current is not None and not self.data_list:
        _key, *_vals = current
        if _vals:
          # "key: value" line
          self._result[_key.value] = _vals[0].value
        else:
          if _key.name == 'key':
             # bare "key:" header: parse the lines that follow as its content
             _r = Parse(self.parsed)
             if _r.data_list:
                self._result[_key.value] = '\n'.join(re.sub(r'^\s+', '', i) for i in _r.data_list)
                self.parsed = _r.parsed
             else:
                self._result[_key.value] = _r._result
          else:
             # bare value line: collect value lines until the next key line
             self.data_list.append(_key.value)
             while True:
                _next = next(self.parsed, None)
                if _next is None or any(i.name == 'key' for i in _next):
                   self.parsed = iter([_next]+[i for i in self.parsed])
                   break
                self.data_list.append(_next[0].value)
        self.parse()
  @classmethod
  def parse_input(cls, _input):
    # A line counts as a key line if it contains "name:" and is not a hex dump
    if re.findall(r'(?<=^)[\w\s\-]+(?=:$)|(?<=^)[\w\s\-]+(?=:\s)|(?<=\s)[\w\s\-]+(?=:\s)|(?<=\s)[\w\s\-]+(?=:$)', _input) and not re.findall(r'\w+:\w+:\w+:\w:\w+', _input):
       _c = [cls.token('key', re.sub(r'^\s+|:', '', _input))] if len(list(filter(None, re.split(r':$|:\s', _input)))) == 1 else [cls.token(i, b) for i, b in zip(['key', 'value'], re.split(r':\W', _input))]
       return [cls.token(i.name, re.sub(r'^\s+', '', i.value)) if i.name == 'key' else i for i in _c]
    return [cls.token('value', _input)]

d = list(map(Parse.parse_input, list(filter(None, content.split('\n')))))
print(Parse(iter(d)).certificate)

Output:

{'Certificate': {'Data': {'Version': '3 (0x2)', 'Serial Number': '8580482261496855974 (0x7713ff27ce0f7da6)', 'Signature Algorithm': 'sha256WithRSAEncryption', 'Issuer': 'C=US, O=Google Trust Services, CN=Google Internet Authority G3', 'Validity': {'Not Before': 'Jun 12 13:37:16 2018 GMT', 'Not After': 'Aug 21 12:13:00 2018 GMT', 'Subject': 'C=US, ST=California, L=Mountain View, O=Google LLC, CN=www.google.com', 'Subject Public Key Info': {'Public Key Algorithm': 'rsaEncryption', 'Public-Key': '(2048 bit)', 'Modulus': '00:b0:55:b7:46:5c:44:fb:25:15:16:8d:6b:33:72:\nb8:11:cd:3a:a6:ea:c1:54:a3:ce:ce:18:76:e6:c5:\n65:d8:37:d9:55:dc:79:9f:1d:10:5a:63:67:58:bc:\n46:d0:3e:05:be:6b:d0:d7:c5:4a:c5:c7:83:4a:ff:\n19:22:f7:f3:41:0d:da:d1:63:3f:67:ea:e2:80:6e:\n38:5b:d4:0a:a4:ec:a6:b1:33:a5:f5:e8:78:5d:e3:\na1:e9:a5:f7:3d:df:2f:4f:de:54:f5:9e:b3:d9:ce:\nfd:2d:0f:c8:6c:d1:13:6e:d9:e2:8a:a0:ba:20:34:\n3a:43:4c:1f:c4:06:9c:2f:0e:59:59:98:33:e4:2a:\n08:fc:eb:34:17:4b:3f:30:5c:3c:1d:7e:eb:d6:2b:\nf4:4b:d9:c3:99:4e:60:c7:3d:61:de:5c:14:ac:26:\n44:92:49:95:83:a8:5b:74:c4:56:aa:9f:15:b8:8d:\n2f:a3:1e:51:57:a4:40:bf:47:4c:fe:74:ad:da:c4:\n59:69:bb:b7:29:0a:5b:bc:a8:a2:f1:a0:a4:00:fd:\na3:72:b5:ec:f7:60:79:2d:a5:74:2c:d6:ce:8b:a9:\n66:64:db:20:01:33:81:f4:28:f8:e7:94:fd:e4:e5:\ne3:0a:5b:b4:e9:95:a1:91:f4:61:28:b7:89:10:1f:\n0b:21', 'X509v3 extensions': 'DNS:www.google.com', 'CA Issuers - URI': 'http', 'OCSP - URI': 'http', 'X509v3 Subject Key Identifier ': 'DC:F3:42:F7:EC:C1:A6:3D:91:E1:CB:54:8C:8B:6A:EE:6D:F2:9C:76', 'CA': 'FALSE', 'X509v3 Authority Key Identifier ': 'keyid:77:C2:B8:50:9A:67:76:76:B1:2D:C2:86:D0:83:A0:7E:A6:7E:BA:4B', 'Policy': '2.23.140.1.2.2', 'X509v3 CRL Distribution Points ': {'Full Name': {'URI': 'http', 'Signature Algorithm ': {'sha256WithRSAEncryption': 
'5b:11:27:3c:91:44:5c:55:de:96:05:7e:67:b4:d3:fc:42:90:\n2c:a9:06:a6:2f:00:2c:28:1a:20:d3:ba:35:a8:55:b6:da:09:\n6a:77:22:02:91:f5:9f:35:d7:d7:ca:c7:56:a9:5a:7e:24:25:\n45:a7:ce:c1:19:dc:25:09:5b:6d:06:fe:97:33:ce:48:31:2e:\n11:20:df:21:ff:67:ba:0b:14:ca:08:83:15:69:7d:ed:3a:8a:\n9c:3e:65:0f:5c:35:c9:e0:be:fd:e4:df:f5:00:9e:05:56:e5:\na3:1c:96:86:01:59:43:07:8e:56:72:6b:10:69:03:4f:e9:28:\nf4:1b:7b:95:18:d6:d9:79:ec:b8:fd:1f:c7:17:22:5d:d1:df:\n11:30:47:a5:4a:3e:73:f9:ae:03:36:28:6f:d3:f7:10:39:23:\n84:ea:e0:ee:7e:64:98:ae:2d:ef:b4:de:10:c7:45:3f:21:02:\n60:c7:1d:55:2d:66:82:0a:03:64:35:ed:22:c7:d6:88:0e:04:\na6:71:59:29:da:42:ab:ca:28:bf:99:76:ca:f2:0c:ba:3b:6b:\n0f:85:4b:d3:f6:94:4c:07:4b:ce:df:c7:d4:05:61:8d:49:85:\n52:52:88:22:ce:25:17:dd:99:29:11:49:2f:e6:03:a3:cf:ef:\n20:34:06:37'}}}}}}}}
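
In the dictionary above the signature ends up several levels deep, under a key with a trailing space ('Signature Algorithm '). As a small sketch of one way to read values back out without knowing the exact path (find_key is a hypothetical helper, not part of the Parse class, and result is a shortened stand-in for the real output), a recursive generator can search every level and ignore surrounding whitespace in keys:

```python
def find_key(tree, wanted):
    """Yield every value stored under `wanted` at any depth of a nested dict."""
    for key, value in tree.items():
        if key.strip() == wanted:
            yield value
        if isinstance(value, dict):
            # Keep descending so deeper occurrences are found too.
            yield from find_key(value, wanted)

# Shortened stand-in for the parser output shown above.
result = {
    'Certificate': {
        'Data': {
            'Signature Algorithm': 'sha256WithRSAEncryption',
            'X509v3 CRL Distribution Points ': {
                'Full Name': {
                    'URI': 'http',
                    'Signature Algorithm ': {
                        'sha256WithRSAEncryption': '5b:11:\n27:3c',
                    },
                },
            },
        },
    },
}

found = list(find_key(result, 'Signature Algorithm'))
print(found[-1])
```

The last element of found corresponds to the second Signature Algorithm entry, which is the one the question asks about.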
  • 0
    Thanks, this worked perfectly. One more thing: I tried to extend it to Modulus:, Public-Key: and Exponent:, but I did not get a satisfactory result. Could you help with those cases as well? :)
  • 0
    @Hagzel Glad to help. Regex can be used to capture those cases; however, I think the job is better done with a simple parser. Nevertheless, I will add regex solutions for all ...