Python - транспонировать Pandas DataFrame

3

Я хотел бы перенести следующий фрейм данных, чтобы экспортировать его в таблицу Oracle.

0    ID                                    Available Quota  \
1  1724       GOM COD GOM HADD GOM BB GREYSOLE DABS GOM YT   
2  1578  GBE COD GBW COD GB BB GB YT SNE BB SNE YT GOM ...   
3   310  GBE COD GBW COD DABS WHAKE POLL RED SNE BB GOM BB   

0                                 Live Weight Pounds  \
1                        2328 445 3007 850 3101 1995   
2     538 5894 1755 243 490 153 3965 2727 9227 15060   
3  825 9033 1241 3120 65234 76610 1688 1195 2121 ...   

0                                              Price Date Posted  
1                                     Package $9,000        5/20  
2  $1.00 $0.40 $0.20 $1.00 $0.45 $0.50 $0.15 $0.2...        5/20  
3                                    Package $15,000        5/20

В идеале данные должны совпадать так, чтобы я мог легко поместить его в свою базу данных Oracle:

Изображение 207631

И начало второго идентификатора должно выглядеть так:

Изображение 207634

Исходная таблица данных выглядит так, моя цель состоит в том, чтобы анализировать самые последние данные даты:

Изображение 207637

Использование pd.transpose ничего не изменило, потому что мой DataFrame, по-видимому, (3, 5), и он должен быть (5, 5) для работы. И использование pd.melt() привело к:

                     0                                              value
0                   ID                                               1724
1                   ID                                               1578
2                   ID                                                310
3      Available Quota       GOM COD GOM HADD GOM BB GREYSOLE DABS GOM YT
4      Available Quota  GBE COD GBW COD GB BB GB YT SNE BB SNE YT GOM ...
5      Available Quota  GBE COD GBW COD DABS WHAKE POLL RED SNE BB GOM BB
6   Live Weight Pounds                        2328 445 3007 850 3101 1995
7   Live Weight Pounds     538 5894 1755 243 490 153 3965 2727 9227 15060
8   Live Weight Pounds  825 9033 1241 3120 65234 76610 1688 1195 2121 ...
9                Price                                     Package $9,000
10               Price  $1.00 $0.40 $0.20 $1.00 $0.45 $0.50 $0.15 $0.2...
11               Price                                    Package $15,000
12         Date Posted                                               5/20
13         Date Posted                                               5/20
14         Date Posted                                               5/20

.... который также не будет работать для экспорта.

Мой соответствующий код:

with open(file_path, 'r') as f:
            def read_html_latest(filename, **kwargs):
            #with open(filename) as f:
                text = f.read().replace('<br>', ' ')
                df = pd.read_html(text, **kwargs)[0]
                column_headers = ['ID', 'Available Quota', 'Live Weight Pounds', 'Price', 'Date Posted']
                df.columns = df.loc[0]
                df = df.loc[1:]
                return df.assign(d=pd.to_datetime(df['Date Posted'], format='%m/%d')) \
                       .query('d == d.max()') \
                       .drop('d', 1)
            df = read_html_latest(filename, attrs={'class': 'MsoNormalTable'})
            print(df)

Любая помощь, решающая это, будет очень признательна, спасибо много.

Источник HTML-кода:

<html>
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>FW: NEFS 2 Available Quota 5/21</title>
<link rel="important stylesheet" href="">
<style>div.headerdisplayname {font-weight:bold;}</style></head>
<body>
<table border=0 cellspacing=0 cellpadding=0 width="100%" class="header-part1"><tr><td><b>Subject: </b>FW: NEFS 2 Available Quota 5/21</td></tr><tr><td><b>From: </b>Claire Fitz-Gerald <[email protected]></td></tr><tr><td><b>Date: </b>5/21/2014 10:08 AM</td></tr></table><br>
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><META HTTP-EQUIV="Content-Type" CONTENT="text/html; "><meta name=Generator content="Microsoft Word 12 (filtered medium)"><!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
    {font-family:"Cambria Math";
    panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
    {font-family:Calibri;
    panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
    {font-family:Tahoma;
    panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
    {font-family:"Franklin Gothic Book";
    panose-1:2 11 5 3 2 1 2 2 2 4;}
@font-face
    {font-family:"Franklin Gothic Demi";
    panose-1:2 11 7 3 2 1 2 2 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
    {margin:0in;
    margin-bottom:.0001pt;
    font-size:11.0pt;
    font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
    {mso-style-priority:99;
    color:blue;
    text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
    {mso-style-priority:99;
    color:purple;
    text-decoration:underline;}
span.EmailStyle17
    {mso-style-type:personal;
    font-family:"Calibri","sans-serif";
    color:windowtext;}
span.title1
    {mso-style-name:title1;
    font-family:"Arial","sans-serif";
    color:#1F487E;
    font-weight:normal;}
span.EmailStyle19
    {mso-style-type:personal-reply;
    font-family:"Calibri","sans-serif";
    color:#1F497D;}
.MsoChpDefault
    {mso-style-type:export-only;
    font-size:10.0pt;}
@page WordSection1
    {size:8.5in 11.0in;
    margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
    {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-US link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='color:#1F497D'>Please see the below quota listings.<o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>Thanks,<o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'><o:p>&nbsp;</o:p></span></p><div><p class=MsoNormal><span style='font-size:12.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>Claire Fitz-Gerald<o:p></o:p></span></p><p class=MsoNormal><i><span style='font-size:10.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'><o:p>&nbsp;</o:p></span></i></p><p class=MsoNormal><b><span style='font-family:"Franklin Gothic Demi","sans-serif";color:#002776'>Cape Cod Commercial Fishermen Alliance<o:p></o:p></span></b></p><p class=MsoNormal><b><span style='font-family:"Franklin Gothic Book","sans-serif";color:#DE3500'>~ Small Boats.&nbsp; Big Ideas. ~</span></b><b><span style='color:#DE3500'><o:p></o:p></span></b></p></div><p class=MsoNormal><span style='color:#1F497D'><o:p>&nbsp;</o:p></span></p><div><div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in'><p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> David Leveille [mailto:[email protected]] <br><b>Sent:</b> Wednesday, May 21, 2014 8:50 AM<br><b>To:</b> David Leveille<br><b>Subject:</b> NEFS 2 Available Quota 5/21<o:p></o:p></span></p></div></div><p class=MsoNormal><o:p>&nbsp;</o:p></p><p class=MsoNormal><span style='font-size:12.0pt;font-family:"Arial","sans-serif";color:#1F487E'>AVAILABLE QUOTA FY 2014</span><span style='font-size:12.0pt;font-family:"Times New Roman","serif"'><o:p></o:p></span></p><table class=MsoNormalTable border=0 cellspacing=0 cellpadding=0 width="71%" style='width:71.28%'><tr><td width=220 style='width:164.95pt;border:none;border-bottom:solid windowtext 1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><b><span style='font-size:9.0pt;font-family:"Arial","sans-serif";color:black'>ID <o:p></o:p></span></b></p></td><td width=161 style='width:120.75pt;border:none;border-bottom:solid windowtext 1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='mso-line-height-alt:15.0pt'><b><span style='font-size:18.0pt;font-family:"Arial","sans-serif";color:black'>Available Quota <o:p></o:p></span></b></p></td><td width=189 style='width:141.75pt;border:none;border-bottom:solid windowtext 1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='mso-line-height-alt:15.0pt'><b><span style='font-size:18.0pt;font-family:"Arial","sans-serif";color:black'>Live Weight Pounds <o:p></o:p></span></b></p></td><td width=126 style='width:94.55pt;border:none;border-bottom:solid windowtext 1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='mso-line-height-alt:15.0pt'><b><span style='font-size:18.0pt;font-family:"Arial","sans-serif";color:black'>Price <o:p></o:p></span></b></p></td><td width=168 style='width:125.95pt;border:none;border-bottom:solid windowtext 1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='mso-line-height-alt:15.0pt'><b><span style='font-size:18.0pt;font-family:"Arial","sans-serif";color:black'>Date Posted <o:p></o:p></span></b></p></td></tr><tr><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1724<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GOM COD<br>GOM HADD<br>GOM BB<br>GREYSOLE<br>DABS<br>GOM YT<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>2328<br>445<br>3007<br>850<br>3101<br>1995<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>Package<o:p></o:p></span></p><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$9,000<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/20<o:p></o:p></span></p></td></tr><tr><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1578<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GBE COD<br>GBW COD<br>GB BB<br>GB YT<br>SNE BB<br>SNE YT<br>GOM BB<br>Whake<br>POLL<br>RED<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>538<br>5894<br>1755<br>243<br>490<br>153<br>3965<br>2727<br>9227<br>15060<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$1.00<br>$0.40<br>$0.20<br>$1.00<br>$0.45<br>$0.50<br>$0.15<br>$0.20<br>$0.01<br>$0.01<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/20<o:p></o:p></span></p></td></tr><tr><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>310<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GBE COD<br>GBW COD<br>DABS<br>WHAKE<br>POLL<br>RED<br>SNE BB<br>GOM BB<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>825<br>9033<br>1241<br>3120<br>65234<br>76610<br>1688<br>1195<br>2121<br>7285<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>Package<o:p></o:p></span></p><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$15,000<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/20<o:p></o:p></span></p></td></tr><tr style='height:23.25pt'><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>347<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>SNE BB<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>8,000<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$0.50<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/7<o:p></o:p></span></p></td></tr><tr><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1878A<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GOM COD<br>GOM HADD<br>SNE BB<br>GOM BB<br>GB BB<br>GREYSOLE<br>GOM YT<br>SNE YT<br>POLL<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>6188<br>635<br>3916<br>7873<br>6762<br>3358<br>9776<br>271<br>186550<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$1.95<br>$1.35<br>$0.50<br>$0.50<br>$0.20<br>$1.40<br>$1.20<br>$0.50<br>$0.01<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/12<o:p></o:p></span></p></td></tr><tr><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1878B<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GBE COD<br>GBW COD<br>GB YT<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1113<br>12186<br>850<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>Package<br>$10,000<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/12<o:p></o:p></span></p></td></tr></table><p class=MsoNormal><o:p>&nbsp;</o:p></p><p class=MsoNormal><o:p>&nbsp;</o:p></p><p class=MsoNormal>David Leveille<o:p></o:p></p><p class=MsoNormal>II Northeast Fishery Sector Inc.<o:p></o:p></p><p class=MsoNormal>10 Witham Street<o:p></o:p></p><p class=MsoNormal>Gloucester, MA. 01930<o:p></o:p></p><p class=MsoNormal>Cell 978 375 3509<o:p></o:p></p><p class=MsoNormal>Fax 978 281 1555<o:p></o:p></p><p class=MsoNormal>Web <a href="http://nefs2.com/">http://nefs2.com/</a><o:p></o:p></p><p class=MsoNormal><o:p>&nbsp;</o:p></p><div class=MsoNormal align=center style='text-align:center'><span style='font-size:12.0pt;font-family:"Times New Roman","serif"'></body></html>
</body>
</html>
  • 1
    Как вы определяете, сколько текстовых значений будет в «Доступной квоте», я вижу один или несколько текстов. Также, как вы хотите, чтобы цена от 2-го ряда?
  • 0
    Ну, в столбце Доступная квота есть ограниченное количество видов, я не знаю, почему он печатает только 7 или 8, а затем ставит «...». выглядит как; все они должны соответствовать своим соответствующим квотам
Показать ещё 8 комментариев
Теги:
pandas
transpose

1 ответ

2
Лучший ответ

Этот код работает через каждую ячейку, создает списки и затем перечислит в кадр данных. Обратите внимание, что этот код будет работать только тогда, когда количество элементов одинаково во всей ячейке по строке..

from bs4 import BeautifulSoup, NavigableString, Tag
import pandas as pd
import numpy as np
def celltext(cell):
    '''    
        textlist=[]
        for br in cell.findAll('br'):
            next = br.nextSibling
            if not (next and isinstance(next,NavigableString)):
                continue
            next2 = next.nextSibling
            if next2 and isinstance(next2,Tag) and next2.name == 'br':
                text = str(next).strip()
                if text:
                    textlist.append(next)
        return (textlist)
    '''
    textlist=[]
    y = cell.find('span')
    for a in y.childGenerator(): 
        if isinstance(a, NavigableString):
            textlist.append(str(a))
    return (textlist)

html=open('patht\to\html.html','r').read()
soup = BeautifulSoup(html, 'lxml') # Parse the HTML as a string
table = soup.find_all('table')[1] # Grab the second table

df_Quota = pd.DataFrame()

for row in table.find_all('tr'):    
    columns = row.find_all('td')
    if columns[0].get_text().strip()<>'ID':  # skip header 
        Quota = celltext(columns[1]) 
        Weight =  celltext(columns[2])
        price =  celltext(columns[3])

        Nrows= max([len(Quota),len(Weight),len(price)]) #get the max number of rows

        IDList = [columns[0].get_text()] * Nrows
        DateList = [columns[4].get_text()] * Nrows

        if price[0].strip()=='Package':
             price = [columns[3].get_text()] * Nrows

        if len(Quota)<len(Weight):  #if Quota has less itmes extened with nan
           lstnans= [np.nan]*(len(Weight)-len(Quota))
           Quota.extend(lstnans)

        FinalDataframe = pd.DataFrame(
        {
        'ID':IDList,    
         'AvailableQuota': Quota,
         'LiveWeightPounds': Weight,
         'price':price,
         'DatePosted':DateList
        })
    df_Quota= df_Quota.append(FinalDataframe)
print df_Quota

Выход

 AvailableQuota DatePosted     ID LiveWeightPounds            price
0        GOM COD       5/12  1878A             6188            $1.95
1       GOM HADD       5/12  1878A              635            $1.35
2         SNE BB       5/12  1878A             3916            $0.50
3         GOM BB       5/12  1878A             7873            $0.50
4          GB BB       5/12  1878A             6762            $0.20
5       GREYSOLE       5/12  1878A             3358            $1.40
6         GOM YT       5/12  1878A             9776            $1.20
7         SNE YT       5/12  1878A              271            $0.50
8           POLL       5/12  1878A           186550            $0.01
0        GOM COD       5/20   1724             2328   Package $9,000
1       GOM HADD       5/20   1724              445   Package $9,000
2         GOM BB       5/20   1724             3007   Package $9,000
3       GREYSOLE       5/20   1724              850   Package $9,000
4           DABS       5/20   1724             3101   Package $9,000
5         GOM YT       5/20   1724             1995   Package $9,000
0        GBE COD       5/20   1578              538            $1.00
1        GBW COD       5/20   1578             5894            $0.40
2          GB BB       5/20   1578             1755            $0.20
3          GB YT       5/20   1578              243            $1.00
4         SNE BB       5/20   1578              490            $0.45
5         SNE YT       5/20   1578              153            $0.50
6         GOM BB       5/20   1578             3965            $0.15
7          Whake       5/20   1578             2727            $0.20
8           POLL       5/20   1578             9227            $0.01
9            RED       5/20   1578            15060            $0.01
0        GBE COD       5/20    310              825  Package $15,000
1        GBW COD       5/20    310             9033  Package $15,000
2           DABS       5/20    310             1241  Package $15,000
3          WHAKE       5/20    310             3120  Package $15,000
4           POLL       5/20    310            65234  Package $15,000
5            RED       5/20    310            76610  Package $15,000
6         SNE BB       5/20    310             1688  Package $15,000
7         GOM BB       5/20    310             1195  Package $15,000
8            NaN       5/20    310             2121  Package $15,000
9            NaN       5/20    310             7285  Package $15,000
0         SNE BB        5/7    347            8,000            $0.50
0        GOM COD       5/12  1878A             6188            $1.95
1       GOM HADD       5/12  1878A              635            $1.35
2         SNE BB       5/12  1878A             3916            $0.50
3         GOM BB       5/12  1878A             7873            $0.50
4          GB BB       5/12  1878A             6762            $0.20
5       GREYSOLE       5/12  1878A             3358            $1.40
6         GOM YT       5/12  1878A             9776            $1.20
7         SNE YT       5/12  1878A              271            $0.50
8           POLL       5/12  1878A           186550            $0.01
0        GBE COD       5/12  1878B             1113   Package$10,000
1        GBW COD       5/12  1878B            12186   Package$10,000
2          GB YT       5/12  1878B              850   Package$10,000
  • 0
    Вау! это мило!
  • 0
    Ничего себе, это выглядит абсолютно безупречно. Будет ли легко исключить все даты, кроме самой последней? В этом случае сохраните только данные, соответствующие 5/20?
Показать ещё 18 комментариев

Ещё вопросы

Сообщество Overcoder
Наверх
Меню