Get a grandchild node using bs4

Question

I have the HTML below and I am trying to extract all the text inside its span tags. I need to check three things. The top node is a ul whose class attribute is "a-unordered-list a-vertical a-spacing-none". Below it comes an li tag, which must have no attributes at all (I don't know how to check that). Below that comes a span tag with class="a-list-item".

I tried this code:

for line in soup.find_all('ul',attrs={"class" : "a-unordered-list a-vertical a-spacing-none"}):
    for inner_lines in soup.findChildren('li'):
        for inner_inner_lines in soup.findChildren('span',attrs={"class" : "a-list-item"}):
            print(inner_inner_lines.text.split())

On this HTML:

<ul class="a-unordered-list a-vertical a-spacing-none">
    Make sure this fits by entering your model number

    <div id="hsx-rpp-bullet-fits-message" class="aok-hidden">
        <div class="a-box a-alert-inline a-alert-inline-success hsx-rpp-fitment-bullets">
            <div class="a-box-inner a-alert-container"><i class="a-icon a-icon-alert"></i>
                <div class="a-alert-content">
                    This fits your&nbsp;<span class="hsx-rpp-bullet-model-info"></span>.
                </div>
            </div>
        </div>
    </div>

    <li id="replacementPartsFitmentBullet" data-doesntfitmessage="We're not sure this item fits your " data-fitsmessage="This fits your " class="aok-hidden"><span class="a-list-item">
        <span id="replacementPartsFitmentBulletInner"> <a class="a-link-normal hsx-rpp-fitment-focus" href="#">Make sure this fits</a>
                <span>by entering your model number.</span>
        </span>
        </span>
    </li>

    <script type="text/javascript">
        P.when("ReplacementPartsBulletLoader").execute(function(module) {
            module.initializeDPX();
        })
    </script>

    <li><span class="a-list-item"> 
                            Powerful 8th Generation Intel Core i5-8250U 1.6GHz (Turbo up to 3.4GHz) processor

                        </span></li>

    <li><span class="a-list-item"> 
                            15.6" Full HD WideView display with ASUS Splendid software enhancement

                        </span></li>

    <li><span class="a-list-item"> 
                            14.2" wide, 0.8" thin and portable footprint with 0.3" ASUS NanoEdge bezel for a stunning 80% screen-to-body ratio

                        </span></li>

    <li><span class="a-list-item"> 
                            8GB DDR4 RAM and 128GB SSD + 1TB HDD storage combo; Ergonomic chiclet keyboard with fingerprint sensor

                        </span></li>

    <li><span class="a-list-item"> 
                            Comprehensive connections including USB 3.1 Type-C (Gen1), USB 3.0, USB 2.0, and HDMI; Lightning-fast 802.11ac Wi-Fi keeps you connected through any congestion or interference

                        </span></li>

</ul>

It still doesn't work after several rounds of trial and error. Please help.

J. Doe, Aug 15, 2018 at 00:48

Tags:

python

web-scraping

beautifulsoup

1 answer

darshvader · Accepted Answer · 2018-08-14T20-35-00.000Z

Try this:

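# Step 1: the <ul> whose class attribute is exactly this string.
ul = soup.find('ul', {'class':'a-unordered-list a-vertical a-spacing-none'})
# Step 2: direct <li> children that have no id attribute.
li = ul.findChildren('li', id=False, recursive=False)
for child in li:
    # Step 3: every <span class="a-list-item"> inside that <li>.
    span = child.find_all('span', {'class':'a-list-item'})
    for node3 in span:
        print(node3.get_text().strip())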

Hope this is what you're looking for :)
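The snippet above skips only `<li>` tags that have an `id`. If the `<li>` must carry no attributes at all, one possible variation is to test the tag's `attrs` dictionary instead. The following is a minimal, self-contained sketch under some assumptions: the `HTML` string is a shortened, hypothetical stand-in for the markup in the question, `bullet_texts` is an illustrative helper name, and the built-in `html.parser` is used so that no extra parser is required.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical sample modelled on the HTML in the question.
HTML = """
<ul class="a-unordered-list a-vertical a-spacing-none">
    <li id="replacementPartsFitmentBullet" class="aok-hidden"><span class="a-list-item">
        Make sure this fits by entering your model number.
    </span></li>
    <li><span class="a-list-item">
        8GB DDR4 RAM and 128GB SSD + 1TB HDD storage combo
    </span></li>
    <li><span class="a-list-item">
        15.6" Full HD WideView display
    </span></li>
</ul>
"""


def bullet_texts(html):
    """Return the text of span.a-list-item nodes under attribute-less <li> tags."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    # Step 1: every <ul> whose class attribute is exactly this string.
    for ul in soup.find_all("ul", class_="a-unordered-list a-vertical a-spacing-none"):
        # Step 2: direct <li> children only.
        for li in ul.find_all("li", recursive=False):
            if li.attrs:  # skip any <li> that carries attributes
                continue
            # Step 3: direct <span class="a-list-item"> children of that <li>.
            for span in li.find_all("span", class_="a-list-item", recursive=False):
                texts.append(span.get_text(strip=True))
    return texts


if __name__ == "__main__":
    for text in bullet_texts(HTML):
        print(text)
```

Passing `recursive=False` at each level keeps the search to direct children, so a nested span deeper in the tree is not picked up twice.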

I need to get the text from all <span> tags that have class="a-list-item" and sit under an <li> tag with no attributes, and those li tags must sit under a <ul> tag with class="a-unordered-list a-vertical a-spacing-none". Got it? It checks the top node, then the node below it, then one more step below that. A three-step approach.
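The three-step check described here can also be approximated with a single CSS selector that uses the child combinator `>`. This is only a sketch under assumptions: `SAMPLE` is a shortened, hypothetical stand-in for the real page, and `select()` relies on the CSS selector support bundled with recent Beautiful Soup 4 releases. CSS has no way to say "no attributes at all", so the selector excludes only `<li>` tags that have an `id` or a `class`, and it matches the three `ul` classes in any order.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical sample modelled on the HTML in the question.
SAMPLE = """
<ul class="a-unordered-list a-vertical a-spacing-none">
    <li id="replacementPartsFitmentBullet" class="aok-hidden"><span class="a-list-item">
        <span>by entering your model number.</span>
    </span></li>
    <li><span class="a-list-item">
        USB 3.1 Type-C (Gen1), USB 3.0, USB 2.0, and HDMI
    </span></li>
</ul>
<ul class="other-list">
    <li><span class="a-list-item">Not part of the target list</span></li>
</ul>
"""

# ul (with all three classes) > li (no id, no class) > span.a-list-item
SELECTOR = (
    "ul.a-unordered-list.a-vertical.a-spacing-none"
    " > li:not([id]):not([class])"
    " > span.a-list-item"
)

soup = BeautifulSoup(SAMPLE, "html.parser")
texts = [span.get_text(strip=True) for span in soup.select(SELECTOR)]
print(texts)
```

The selector keeps the whole path in one place, at the cost of a looser attribute check than inspecting each tag's attributes in Python.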