DRT Driver Update Site Invalid URLs - to be corrected

jcgriff2

The Driver Reference Table (DRT) was created by John Carrona (usasma) in 2009. It currently contains more than 4,000 drivers with driver update site links, and it is constantly updated with new drivers as well as corrections to existing entries.

The DRT's driver update site links change on occasion, which renders them invalid (HTTP 404 or another error). We would like to get these links corrected ASAP.

If you come upon a DRT entry that contains an invalid driver update site URL, please post the DRT driver link, the valid driver update site URL (if known) and any other pertinent information so corrections can be made to the DRT to keep it as current and as accurate as possible.

Thank you.
 
I'll get back to you on this soon, John. I've just run the table through an automatic link checker and am currently compiling a friendly dataset for us to work from.

Richard
 
It's going to be a couple of hours before I finish this (lectures to attend!) so I'll give you a ball-park figure: we're looking at round about 150 broken links.
 
OK, in just a moment I shall be including in my next post a complete dataset of bad links. There are going to be both false positives and false negatives. The false negative rate is not something I can do very much about technologically, but it will be fairly low [in theory I can test for various silent 200 redirects and the like, but it's probably more trouble than it's worth. There won't be many]. We'll just deal with those few as and when we stumble across them. As for the false positives, these are something we are going to have to test for manually from the small dataset.

There are a number of different results. "Checking..." basically means a connection timeout. For the most part with these, we're not just looking at a bad link: the whole website has gone down over the years, the domain has not been renewed, or whatever. Most of these are going to be legitimately broken links. Websites marked as "Secure" are either internal links (for some reason I don't understand) or external secure links. These could not be properly tested by the tool I was using, so most are going to be false positives. 5xx error codes are likely to mean completely broken servers, so they are likely to be legitimate. The 4xx errors are likely to mean that just the link has changed but the website is still active. There should be no false positives among the 404 errors (in theory; in practice a good webpage could return 404, although this is very unlikely), although a typo made whilst copying the URL into the table could cause a 404 whilst the intended destination is still active. 3xx responses will be mixed: most will be redirecting to the homepage or a generic support portal, and a few may be usefully redirected.
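As a rough illustration, the rules of thumb above could be written down as a small triage function. This is only a sketch: the function name, the category labels and the assumption that each result arrives as a status code or one of the checker's text labels are all made up for the example.

```javascript
// Sketch only: maps one checker result to a triage bucket using the
// rules of thumb from the post. The bucket names are hypothetical.
function triage(result) {
  const text = String(result).trim();
  if (/^checking/i.test(text)) return 'timeout';   // site probably gone
  if (/secure/i.test(text)) return 'untested';     // probably a false positive
  const code = parseInt(text, 10);
  if (Number.isNaN(code)) return 'unknown';        // look at it by hand
  if (code >= 500 && code <= 599) return 'server-error'; // probably broken
  if (code >= 400 && code <= 499) return 'link-moved';   // site probably alive
  if (code >= 300 && code <= 399) return 'redirect';     // check where it lands
  if (code >= 200 && code <= 299) return 'ok';
  return 'unknown';
}

console.log(triage('Checking...'), triage('*Secure**'), triage(404));
```

Every bucket except 'ok' would still need a manual look; the function only sorts the list so the likely false positives can be handled separately.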

There's a total of 687 entries in the list overall, which is a bit of a pain. I'll see what I can do shortly to whittle it down a little by taking out all of the obvious false positives. Then we'll have to share it out & start opening them all up & looking for alternatives :(

If we work on this as a team, it may be worthwhile starting an Office Online or similar document. I'll see what I can get knocked up. For now though, this is just a quick data dump - I'll see if I can make it better before we set to.

Richard
 


Hello Richard,

I am taking a look at the dataset as we speak. I see that there are a few links like "Driver Reference Table - a2ddax86.sys" in the dataset which are marked Secure. But the number of drivers marked Secure is very small compared with the actual number of drivers in the DRT. Did you remove the rest of the Secure links?

Around 687 broken links would be a painstaking effort. Now, even if there are 6 people in the team, each would get more than 100 links to process. This is going to be one hell of a boring ride, but we need to do it in order to keep the DRT maintained.
If you are forming a team, I would love to be a part of that ^_^. But we would need to think of a better way of saving the new links, since these 687 entries would need to be pushed back into the DRT, which would consume a lot of time.
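To put a number on the workload, a list can be dealt out round-robin so that nobody's share differs from anyone else's by more than one link. This is just a sketch; the function name is made up, and the 687 placeholder strings stand in for the real dataset.

```javascript
// Sketch only: deal links out round-robin among teamSize people.
function shareOut(links, teamSize) {
  const shares = Array.from({ length: teamSize }, () => []);
  links.forEach((link, i) => shares[i % teamSize].push(link));
  return shares;
}

// Placeholder data: 687 dummy entries instead of the real links.
const dummyLinks = Array.from({ length: 687 }, (_, i) => 'link-' + i);
console.log(shareOut(dummyLinks, 6).map((share) => share.length));
// → [ 115, 115, 115, 114, 114, 114 ]
```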

Also, are you fluent in JavaScript, Richard?

When are we going to hear your receipt of the MVP Award? ;)


-Pranav
 
There are numerous problems here. The first is that we need to be able to easily update the DRT with the new links. For one thing, the dataset doesn't include the actual driver names from which each link came. It is likely that I won't be making an HTML & Excel table of this, but will work instead from the raw MySQL database for this purpose. The next problem is that many driver entries have the same link, so there are many duplicates. Although I have de-duplicated my dataset, we would obviously need to make sure to update every entry in the DRT.
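One way to keep the duplicates manageable might be to group the driver entries by link, so each bad link is checked once but every driver that uses it is still known. The sketch below assumes the entries are available as driver/url pairs; the function name and the example.com URLs are made up, and the real database layout may look quite different.

```javascript
// Sketch only: build a map from each link to all drivers that use it.
function groupDriversByUrl(entries) {
  const byUrl = new Map();
  for (const { driver, url } of entries) {
    if (!byUrl.has(url)) byUrl.set(url, []);
    byUrl.get(url).push(driver);
  }
  return byUrl;
}

// Hypothetical sample rows; the URLs are placeholders, not real DRT data.
const sample = [
  { driver: 'a2ddax86.sys', url: 'http://example.com/support' },
  { driver: 'bdisk.sys', url: 'http://example.com/support' },
  { driver: 'cbfs4.sys', url: 'http://example.org/downloads' },
];
for (const [url, drivers] of groupDriversByUrl(sample)) {
  console.log(url, '->', drivers.join(', '));
}
```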

We also need to make sure to do it in such a way that we can directly re-import the results (we're not going to be filling in 600+ driver submission forms!). If necessary, we'll do a raw import on the SQL database. But...this takes careful planning & thoughtful consideration, not everybody rushing blindly into checking links.

blueelvis said:
I am taking a look at the dataset as we speak. I see that there are a few links like "Driver Reference Table - a2ddax86.sys" in the dataset which are marked Secure. But the number of drivers marked Secure is very small compared with the actual number of drivers in the DRT. Did you remove the rest of the Secure links?

I have not yet removed any entries. I think the ones shown are only outgoing links from one DRT entry to another DRT entry. I'm not really sure though, and intend to investigate further.


As for JavaScript, no, I've used it only a couple of times, although I can program in other languages. And as for the MVP award, it's for Microsoft to decide at what point I become worthy :)
 
Richard - thanks for the hard work on this.
Unfortunately, the broken links are only a part of the problem. What of the links that work, but don't point to a significant entry?
The Advance Search page can drag out the links that we need, and replacing them (although boring) is very possible.
Also, in the past, Laxer has gone in and done mass replacements for us (I believe it was in the SQL table).
 
What of the links that work, but don't point to a significant entry?
I believe this is referring to links that go to a domain, but not the "/downloads" (or similar) page...?

If so, that's a tough one, John -- there is no way for Richard's app to know & report with any reasonable degree of certainty that a link is directing to an actual driver download page v. some other page within the site.
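A crude check can at least shortlist one obvious case for manual review: links that point at a bare domain with no path at all. This is only a sketch with a made-up function name, and it says nothing about links that land on some other non-download page within a site.

```javascript
// Sketch only: true when a link has no path and no query string,
// i.e. it points at the site root rather than a specific page.
// Note: new URL() throws on strings that are not valid absolute URLs.
function pointsAtSiteRoot(link) {
  const { pathname, search } = new URL(link);
  return (pathname === '/' || pathname === '') && search === '';
}

console.log(pointsAtSiteRoot('http://example.com'));           // true
console.log(pointsAtSiteRoot('http://example.com/downloads')); // false
```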

Also, in the past, Laxer has gone in and done mass replacements for us (I believe it was in the SQL table).

Yes, that was done initially for all Atheros drivers & it worked well.

We simply replaced one URL with another; all instances.
 
Richard, I'm a little confused here as to the contents of the Excel XLSX file.

I thought you were auto-testing the driver update site links (the "Source" column in the DRT) and reporting broken links (HTTP 404 or other).

If so, I don't understand where entries like these are coming from:
Code:
http://www.carrona.org/drivers/driver.php?id=a2ddax86.sys	*Secure**
http://www.carrona.org/drivers/driver.php?id=avnetflt.sys	*Secure**
http://www.carrona.org/drivers/driver.php?id=awlegacy.sys	*Secure**
http://www.carrona.org/drivers/driver.php?id=bdisk.sys	*Secure**
http://www.carrona.org/drivers/driver.php?id=BdNet.sys	*Secure**
http://www.carrona.org/drivers/driver.php?id=BMLoad.sys	*Secure**
http://www.carrona.org/drivers/driver.php?id=BRCMHD32.sys	*Secure**
http://www.carrona.org/drivers/driver.php?id=cbfs4.sys	*Secure**

I can't find any of those DRT links in the "Source" column -- or anywhere else in the DRT for that matter.
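For what it's worth, entries like the ones listed above can be separated from the external links mechanically, so they can be set aside while their origin is investigated. The sketch below assumes each dataset line is a URL, a tab, then the result, as in the listing; the function name and the example.com line are made up.

```javascript
// Sketch only: split dataset lines into DRT-internal pages and the rest.
// Assumes "URL<TAB>result" per line, as in the listing above.
function splitInternal(lines) {
  const internal = [];
  const external = [];
  for (const line of lines) {
    if (!line.trim()) continue; // skip blank lines
    const [rawUrl, result = ''] = line.split('\t');
    const url = rawUrl.trim();
    const { hostname, pathname } = new URL(url);
    const isDrtPage =
      hostname === 'www.carrona.org' && pathname === '/drivers/driver.php';
    const bucket = isDrtPage ? internal : external;
    bucket.push({ url, result: result.trim() });
  }
  return { internal, external };
}

const parts = splitInternal([
  'http://www.carrona.org/drivers/driver.php?id=a2ddax86.sys\t*Secure**',
  'http://example.com/support\t404', // hypothetical external entry
]);
console.log(parts.internal.length, parts.external.length);
```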
 
Richard, could you please generate a fresh dataset containing the invalid URLs? I would like to work on them as time permits. Can you think of some way to reduce the manual work required? :p

Also, I would love to know how you generated the bad links and compiled them :)

-Pranav
 
You could pretty easily write a function that tries to grab the HTTP headers for all the webpages linked in the DRT.

Flag any that don't return successfully so they can be displayed later as ones to investigate.

I suspect Richard did something similar, as in the file he attached you can clearly see the link and the HTTP response code.

If there are a whole bunch that match the same format, you can essentially do a find & replace using SQL's REPLACE and LIKE/RLIKE.
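Before touching the database, a find & replace like that could be dry-run in memory to see exactly which rows would change. The sketch below is not the SQL itself: it assumes the rows have been exported as objects with a source field (a made-up name), and it only rewrites a matching prefix, which is narrower than SQL's REPLACE (that replaces every occurrence in the string).

```javascript
// Sketch only: preview a prefix replacement, roughly what
// REPLACE(...) with a LIKE 'oldPrefix%' filter would do for a leading match.
// The "source" field name is an assumption about the exported rows.
function replaceUrlPrefix(rows, oldPrefix, newPrefix) {
  return rows.map((row) =>
    row.source.startsWith(oldPrefix)
      ? { ...row, source: newPrefix + row.source.slice(oldPrefix.length) }
      : row
  );
}

// Hypothetical rows and URLs, purely for illustration.
const before = [
  { driver: 'one.sys', source: 'http://old.example.com/drivers/one' },
  { driver: 'two.sys', source: 'http://other.example.com/two' },
];
console.log(replaceUrlPrefix(before, 'http://old.example.com/', 'http://new.example.com/'));
```

Comparing the output with the input shows which entries a mass replacement would touch, which makes it easier to spot a pattern that matches more than intended.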
 
I'm not sure how Richard tested it but here is a solution for you.

I am not at home and cannot access any remote servers so I will do it in an odd way.

Step 1: Get all the Links.
I just ran a JavaScript snippet that spits all the important links out to the page.
Load the DRT and paste this into the URL bar:
Code:
javascript:(function () { alert('After clicking OK this will list all the links in the Source column (takes about 30 seconds)'); /* col4 = Source column cells; last-updated = element reused for output */ var cells = document.getElementsByClassName('col4'); var out = document.getElementById('last-updated'); var html = []; var c = 0; for (var i = 0; i < cells.length; i++) { var links = cells[i].getElementsByTagName('a'); for (var j = 0; j < links.length; j++) { /* start a new section after every 99 links */ if (c == 99) { html.push('<hr>'); c = 0; } html.push(links[j].href + '<br>'); c++; } } out.innerHTML = html.join(''); })();

Step 2:
Once the page finishes loading copy/paste all of the links section by section into this site:
Bulk HTTP Header Response Checker & Comparison Tool

It will process all the links and let you know which ones are bad. (Ctrl+F for 500 and 404)

The only bad thing about this method is that it can only check 100 links at a time (hence the sections).
This is in no way a good solution but will give you something to work on.

I would suggest writing something yourself to do the checks from your local machine. It should be much faster and easier to use.
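A local checker along those lines might look something like the sketch below. It assumes Node 18 or later (for the built-in fetch); the function name, the timeout value and the shape of the returned list are all made up, and some servers refuse HEAD requests, so anything it flags still needs a manual look.

```javascript
// Sketch only: request the headers for each link and collect the ones
// that do not answer with a 2xx status. Assumes Node 18+ (global fetch).
// fetchFn can be swapped out, e.g. for a stub when trying this offline.
async function checkLinks(urls, fetchFn = fetch, timeoutMs = 10000) {
  const bad = [];
  for (const url of urls) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      const res = await fetchFn(url, {
        method: 'HEAD',      // headers only, no page body
        redirect: 'manual',  // report 3xx instead of following it
        signal: controller.signal,
      });
      if (res.status < 200 || res.status > 299) {
        bad.push({ url, status: res.status });
      }
    } catch (err) {
      // DNS failure, refused connection, timeout and so on
      bad.push({ url, status: 'no response' });
    } finally {
      clearTimeout(timer);
    }
  }
  return bad;
}

// Usage (hits the network, so left commented out):
// checkLinks(['http://example.com/']).then((bad) => console.log(bad));
```

The links are checked one at a time to keep the sketch simple, so a few thousand links with slow or dead hosts could take a while.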
 
I can't remember precisely which one I used, but I basically found a 404 checking website online (one which allows thousands of errors), left it for half an hour to an hour to churn away, then stuck the whole dataset into Excel and tidied it up. It took a little while (I probably could have coded something in C# faster), but it did the job.
 