Friday, July 21, 2006
A Bit About This Blog
A lot of terrific information is available from NHL.com. Most of it is pretty raw.
And a lot of people around internet land have scraped the info from there, dug through it to find what they were looking for, and crushed some hockey myths and made the rest of us a bit smarter about the NHL game along the way.
The idea here is that the process should be as painless as possible. It would take hours to sift through the information manually, and writing macro code in Excel can speed up that process a tonne. And hopefully a lot of the guys with ZERO programming skills, but good eyes, rational minds and a passion for the game ... hopefully some of those guys will see just how easy this is once you get past the first couple of hours on the learning curve. That's a big part of what I hope to accomplish with this specialized little blog. Folks like Lowetide, Dennis, Andy Grabia, and dozens others like them. Once they realize that it takes less time to prove (or disprove) their point in a compelling way ... less time than takes to actually argue about it ... then we're going to be moving forward with speed I think.
Personally I'm not a programmer, and have limited skills. Programmers are a strangely helpful bunch though, and in my experience they'll solve your problems for you if you just ask at an online forum.
Hopefully other people will want to contribute here, leaving bits of code to cut and paste to save everyone else from doing actual work :) . If you have something to post, please just send me an email or leave a note in the comments section and I'll send you the 'invite' link.
Let's keep it simple. And just Excel VBA code for here.
If I don't drag smart, non-computer types into the weird world of Roger Neilson ... then I've failed with this little venture.
And a lot of people around internet land have scraped the info from there, dug through it to find what they were looking for, and crushed some hockey myths and made the rest of us a bit smarter about the NHL game along the way.
The idea here is that the process should be as painless as possible. It would take hours to sift through the information manually, and writing macro code in Excel can speed up that process a tonne. And hopefully a lot of the guys with ZERO programming skills, but good eyes, rational minds and a passion for the game ... hopefully some of those guys will see just how easy this is once you get past the first couple of hours on the learning curve. That's a big part of what I hope to accomplish with this specialized little blog. Folks like Lowetide, Dennis, Andy Grabia, and dozens others like them. Once they realize that it takes less time to prove (or disprove) their point in a compelling way ... less time than takes to actually argue about it ... then we're going to be moving forward with speed I think.
Personally I'm not a programmer, and have limited skills. Programmers are a strangely helpful bunch though, and in my experience they'll solve your problems for you if you just ask at an online forum.
Hopefully other people will want to contribute here, leaving bits of code to cut and paste to save everyone else from doing actual work :) . If you have something to post, please just send me an email or leave a note in the comments section and I'll send you the 'invite' link.
Let's keep it simple. And just Excel VBA code for here.
If I don't drag smart, non-computer types into the weird world of Roger Neilson ... then I've failed with this little venture.
Scraping data from NHL.com onto your hard drive
This macro strips the play-by-play sheets for the 0304 regular season off of the NHL.com database and into a directory called C:\NHL\PL0304. You will have to create the directory on your hard drive first, Or you can create a directory with a different name so long as you change the red bit of code below to match it.. By changing PL to GS in the code below you would be scraping the Game Sheets for thst season instead. By changing 20032004 to 20052006 you grab the sheets for theis past season. Simple enough, no?
BTW: I don't actually know how this code works, I just copied it from the net. But it does work for me.
Once you've created the directory on your hard drive, just open Excel, press Alt-F11, press Alt-I, press M, paste everything onto the sheet that shows up. Then press F5 and voila! You're scraping off data!
Little bits of code to come later will let you make use of this stuff you've scraped off in a simple and fast way.
Let me know if this works for anyone. Lowetide ... I'm looking at you. :-)
_____________________________________________________
Public Const INTERNET_OPEN_TYPE_PRECONFIG = 0
Public Const INTERNET_OPEN_TYPE_DIRECT = 1
Public Const INTERNET_OPEN_TYPE_PROXY = 3
Public Const scUserAgent = "VB OpenUrl"
Public Const INTERNET_FLAG_RELOAD = &H80000000
Public Declare Function InternetOpen Lib "wininet.dll" Alias _
"InternetOpenA" (ByVal sAgent As String, ByVal lAccessType _
As Long, ByVal sProxyName As String, ByVal sProxyBypass _
As String, ByVal lFlags As Long) As Long
Public Declare Function InternetOpenUrl Lib "wininet.dll" Alias _
"InternetOpenUrlA" (ByVal hOpen As Long, ByVal sUrl As String, _
ByVal sHeaders As String, ByVal lLength As Long, ByVal lFlags As _
Long, ByVal lContext As Long) As Long
Public Declare Function InternetReadFile Lib "wininet.dll" _
(ByVal hFile As Long, ByVal sBuffer As String, ByVal _
lNumBytesToRead As Long, lNumberOfBytesRead As Long) As Integer
Public Declare Function InternetCloseHandle Lib "wininet.dll" _
(ByVal hInet As Long) As Integer
Public Declare Function GIFToBMP Lib "gif2bmp.dll" _
(ByVal lpstrGIF As String, ByVal lpstrBMP As String) As Integer
Sub GetHTMLFromURL()
Dim sUrl As String
Dim s As String
Dim hOpen As Long
Dim hOpenUrl As Long
Dim bDoLoop As Boolean
Dim bRet As Boolean
Dim sReadBuffer As String * 2048
Dim lNumberOfBytesRead As Long
Dim j As Integer: Dim i As Integer: Dim m As Integer
Dim k As Long
Dim sG() As Byte
On Error GoTo NEXTj
For j = 1 To 1230
m = 20000 + j
s = Empty
sUrl = "http://www.nhl.com/scores/htmlreports/20032004/PL0" & m & ".HTM"
hOpen = InternetOpen(scUserAgent, INTERNET_OPEN_TYPE_PRECONFIG, vbNullString, vbNullString, 0)
hOpenUrl = InternetOpenUrl(hOpen, sUrl, vbNullString, 0, INTERNET_FLAG_RELOAD, 0)
bDoLoop = True
While bDoLoop
sReadBuffer = vbNullString
bRet = InternetReadFile(hOpenUrl, sReadBuffer, Len(sReadBuffer), lNumberOfBytesRead)
s = s & Left$(sReadBuffer, lNumberOfBytesRead)
If Not CBool(lNumberOfBytesRead) Then bDoLoop = False
Wend
If hOpenUrl <> 0 Then InternetCloseHandle (hOpenUrl)
If hOpen <> 0 Then InternetCloseHandle (hOpen)
k = Len(s)
ReDim sG(k) As Byte
For i = 1 To k: sG(i) = Asc(Mid(s, i, 1)): Next i
Open "C:\NHL\PL0304\PL0" & m & ".HTM" For Binary As #44 Len = 1
For i = 1 To k: Put #44, i, sG(i): Next i
Close #44
NEXTj:
Next j
End Sub
BTW: I don't actually know how this code works, I just copied it from the net. But it does work for me.
Once you've created the directory on your hard drive, just open Excel, press Alt-F11, press Alt-I, press M, paste everything onto the sheet that shows up. Then press F5 and voila! You're scraping off data!
Little bits of code to come later will let you make use of this stuff you've scraped off in a simple and fast way.
Let me know if this works for anyone. Lowetide ... I'm looking at you. :-)
_____________________________________________________
Public Const INTERNET_OPEN_TYPE_PRECONFIG = 0
Public Const INTERNET_OPEN_TYPE_DIRECT = 1
Public Const INTERNET_OPEN_TYPE_PROXY = 3
Public Const scUserAgent = "VB OpenUrl"
Public Const INTERNET_FLAG_RELOAD = &H80000000
Public Declare Function InternetOpen Lib "wininet.dll" Alias _
"InternetOpenA" (ByVal sAgent As String, ByVal lAccessType _
As Long, ByVal sProxyName As String, ByVal sProxyBypass _
As String, ByVal lFlags As Long) As Long
Public Declare Function InternetOpenUrl Lib "wininet.dll" Alias _
"InternetOpenUrlA" (ByVal hOpen As Long, ByVal sUrl As String, _
ByVal sHeaders As String, ByVal lLength As Long, ByVal lFlags As _
Long, ByVal lContext As Long) As Long
Public Declare Function InternetReadFile Lib "wininet.dll" _
(ByVal hFile As Long, ByVal sBuffer As String, ByVal _
lNumBytesToRead As Long, lNumberOfBytesRead As Long) As Integer
Public Declare Function InternetCloseHandle Lib "wininet.dll" _
(ByVal hInet As Long) As Integer
Public Declare Function GIFToBMP Lib "gif2bmp.dll" _
(ByVal lpstrGIF As String, ByVal lpstrBMP As String) As Integer
Sub GetHTMLFromURL()
Dim sUrl As String
Dim s As String
Dim hOpen As Long
Dim hOpenUrl As Long
Dim bDoLoop As Boolean
Dim bRet As Boolean
Dim sReadBuffer As String * 2048
Dim lNumberOfBytesRead As Long
Dim j As Integer: Dim i As Integer: Dim m As Integer
Dim k As Long
Dim sG() As Byte
On Error GoTo NEXTj
For j = 1 To 1230
m = 20000 + j
s = Empty
sUrl = "http://www.nhl.com/scores/htmlreports/20032004/PL0" & m & ".HTM"
hOpen = InternetOpen(scUserAgent, INTERNET_OPEN_TYPE_PRECONFIG, vbNullString, vbNullString, 0)
hOpenUrl = InternetOpenUrl(hOpen, sUrl, vbNullString, 0, INTERNET_FLAG_RELOAD, 0)
bDoLoop = True
While bDoLoop
sReadBuffer = vbNullString
bRet = InternetReadFile(hOpenUrl, sReadBuffer, Len(sReadBuffer), lNumberOfBytesRead)
s = s & Left$(sReadBuffer, lNumberOfBytesRead)
If Not CBool(lNumberOfBytesRead) Then bDoLoop = False
Wend
If hOpenUrl <> 0 Then InternetCloseHandle (hOpenUrl)
If hOpen <> 0 Then InternetCloseHandle (hOpen)
k = Len(s)
ReDim sG(k) As Byte
For i = 1 To k: sG(i) = Asc(Mid(s, i, 1)): Next i
Open "C:\NHL\PL0304\PL0" & m & ".HTM" For Binary As #44 Len = 1
For i = 1 To k: Put #44, i, sG(i): Next i
Close #44
NEXTj:
Next j
End Sub