Posts: 37 | Joined: 3/25/2023 | Location: North Salt Lake, UT, USA
Obviously this is a random thing to throw up in the moto-related thread, but I need help getting supercross race results data via an API, a web scraper, or even a CSV file download.
I'm creating my own fantasy supercross game for my friends and me, but I can't figure out how to get results data without entering it manually. (I'm building something that plays more like fantasy football.)
The AMA's website is protected against web scrapers.
Any suggestions appreciated.
I had to download the PDF results and use a PDF table converter to turn them into CSV. It was pretty manual and kinda clunky.
ChatGPT-4 Data Analyst mode.
I've used an MS Access database to pull data off a website, but you'd need to know VBA at a minimum. I'm sure AI could write it all for you these days, or get close. It would be pretty cool to have a raw database of all MX and SX results from the beginning to run nonsense queries on.
@notjake -
Did you ever find anything out about your question? I was thinking about building an NFL-style supercross fantasy league app myself. Manually updating all the race data wouldn't be practical.
Probably like you, my first thought was to look for an API, and there doesn't seem to be one.
Then I found the page below, which is pretty close to an API, but it seems to only show the most recent or current race.
https://live.amasupercross.com/
I was also thinking about something like this, which scrapes the data from the Supercross Live or Racer X pages.
https://github.com/jenkins1085/supercross-data/commit/8c9c8895612cfeac5…
I think your best bet would be to scrape the data from the series website or the Racer X Vault.
The AMA's site posts the PDFs right after qualifying and the races. You could probably work out their URL scheme for those, download the PDFs to your server, and run them through a PDF-to-CSV converter programmatically to get the results into a structured dataset.
Sorry, I just saw you said the AMA's site is protected from scrapers. If you keep a database table that tracks the time of each request and don't try to pull every link at once, you might be able to get by. Maybe I'll attempt it and report back. There's always a workaround.
Update: I was able to scrape all the event links pretty easily; here they are. I'm not going to do this part, but you should write a filter for the ones you want. Next I'll see if I can get the PDF links from one event. If that works, the rest should be pretty easy. Will post an update soon.
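A minimal sketch of the "table that tracks request times" idea, assuming Python's built-in `sqlite3`. The `RequestThrottle` class, its table layout, and the 5-second default gap are all hypothetical choices for illustration, not limits the AMA site publishes. The clock and sleep functions are injectable so you can test it without actually waiting.

```python
import sqlite3
import time


class RequestThrottle:
    """Space out requests by logging each one's time in a SQLite table.

    Hypothetical helper: the table layout and default gap are assumptions,
    not limits published by the AMA site.
    """

    def __init__(self, db_path=":memory:", min_interval=5.0,
                 clock=time.time, sleep=time.sleep):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS requests (url TEXT, requested_at REAL)"
        )
        self.min_interval = min_interval
        self.clock = clock  # injectable so tests don't depend on wall time
        self.sleep = sleep

    def last_request_time(self):
        # MAX() returns NULL (None in Python) while the table is still empty.
        return self.conn.execute(
            "SELECT MAX(requested_at) FROM requests"
        ).fetchone()[0]

    def wait_turn(self, url):
        """Sleep until min_interval has passed since the last logged request, then log this one."""
        last = self.last_request_time()
        if last is not None:
            remaining = self.min_interval - (self.clock() - last)
            if remaining > 0:
                self.sleep(remaining)
        self.conn.execute(
            "INSERT INTO requests (url, requested_at) VALUES (?, ?)",
            (url, self.clock()),
        )
        self.conn.commit()
```

Call `throttle.wait_turn(url)` right before each fetch. Passing a file path such as `"requests.db"` instead of `":memory:"` keeps the log across runs, so restarting the script doesn't reset the spacing.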
[
"https://archives.amasupercross.com/events.html/2024/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2023/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2022/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2021/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2024/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2023/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2022/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2021/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2024/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2023/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2022/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2021/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2024/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2023/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2022/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2021/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2024/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2023/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2022/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2021/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2024/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2023/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2022/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2021/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2024/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2023/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2022/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2021/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2024/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2023/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2022/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2021/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2024/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2023/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2022/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2021/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2024/index.html?EventID=…",
"https://archives.amasupercross.com/events.html/2023/index.html?EventID=…",
..... and more, shortened for clarity
]
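For the filter, here's a rough sketch using the standard library's `urllib.parse`. It assumes every event link has the `/events.html/<year>/index.html?EventID=...` shape shown above. The EventID values are cut off in the list, so the test uses placeholder IDs, and `parse_event_link` / `filter_event_links` are hypothetical names.

```python
from urllib.parse import parse_qs, urlparse


def parse_event_link(url):
    """Return (year, EventID) from an archive link shaped like
    https://archives.amasupercross.com/events.html/<year>/index.html?EventID=<id>
    """
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    # The year is the path segment right after "events.html".
    year = int(parts[parts.index("events.html") + 1])
    # parse_qs maps each key to a list; EventID is None if it's missing.
    event_id = parse_qs(parsed.query).get("EventID", [None])[0]
    return year, event_id


def filter_event_links(urls, years):
    """Keep only the event links from the given season years."""
    wanted = set(years)
    return [u for u in urls if parse_event_link(u)[0] in wanted]
```

The EventID returned here is what you'd carry forward to build the per-event data URL.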
Here's the list of data URLs they're hiding behind an AJAX request. Each one should return an XML document you can parse to get the PDF download links. In the code, 24 is the race year and the second part of the number is the round number, so 2405 would be year 2024, round 5. (Going by the list, though, the codes mostly go up in steps of 5, e.g. S2405, S2410, S2415, so the last two digits may be an event code rather than the literal round number.) You can build this list easily from the last one I gave you: take the EventID from the query string of each event URL and plug it into their XML URL pattern. Then request each round you want and parse the XML. Next update in a few.
[
"https://archives.amasupercross.com/xml/sx/events/S2405SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2305SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2205SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2105SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2410SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2315SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2210SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2110SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2415SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2320SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2215SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2115SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2420SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2325SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2220SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2120SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2425SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2330SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2225SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2125SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2430SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2333SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2230SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2130SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2435SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2335SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2235SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2135SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2440SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2340SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2240SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2140SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2445SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2345SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2245SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2145SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2450SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2350SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2250SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2150SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2455SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2355SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2255SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2155SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2460SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2360SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2260SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2160SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2465SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2365SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2265SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2165SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2470SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2370SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2270SchedRes.xml",
"https://archives.amasupercross.com/xml/sx/events/S2170SchedRes.xml",
... also shortened for clarity
]
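A sketch of the next two steps, assuming the EventID from the event links is (or maps directly to) a code like `S2405`. The URL template is copied from the list above. The actual layout of the SchedRes XML isn't shown in this thread, so `pdf_links` is deliberately schema-agnostic: it walks every element and keeps any attribute or text ending in `.pdf`. All function names are hypothetical, and the links it finds may be relative paths you'd still need to resolve.

```python
import xml.etree.ElementTree as ET

# Pattern copied from the S....SchedRes.xml links listed above.
SCHEDRES_URL = "https://archives.amasupercross.com/xml/sx/events/{code}SchedRes.xml"


def schedres_url(event_code):
    """Build the XML URL for an event code such as "S2405"."""
    return SCHEDRES_URL.format(code=event_code)


def split_event_code(event_code):
    """Split "S2405" into (series letter, four-digit year, trailing number).

    The trailing number is what the post calls the round; treat it as an
    opaque event number if it doesn't line up with the schedule.
    """
    series, digits = event_code[0], event_code[1:]
    return series, 2000 + int(digits[:2]), int(digits[2:])


def pdf_links(xml_text):
    """Collect (element tag, value) for every attribute or text ending in .pdf."""
    found = []
    root = ET.fromstring(xml_text)
    for elem in root.iter():  # document order, root included
        candidates = list(elem.attrib.values())
        if elem.text:
            candidates.append(elem.text.strip())
        for value in candidates:
            if value.lower().endswith(".pdf"):
                found.append((elem.tag, value))
    return found
```

Once you've seen a real SchedRes file, it's worth swapping `pdf_links` for code that reads the specific elements carrying the document labels, so you get a label-to-URL mapping instead of bare tags.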
Alright, my bro, I've got it pulling all the data you should need. If you want help building this, DM me and we can work something out. I got the data compiler built, and it produced this nice list of all the PDFs from race day (this one is for S2405). I don't really feel like parsing the PDFs right now, but that should be pretty easy with tabula-py or a similar library.
https://tabula-py.readthedocs.io/en/latest/getting_started.html#example
{
"Best Lap Times": "https://archives.amasupercross.com/xml/SX/events/S2405/SJQ2OVR.pdf",
"Individual Lap Times": "https://archives.amasupercross.com/xml/SX/events/S2405/S1F1RID.pdf",
"Individual Segment Times": "https://archives.amasupercross.com/xml/SX/events/S2405/S1F1IND.pdf",
"Fastest Segment Times": "https://archives.amasupercross.com/xml/SX/events/S2405/S1F1SEG.pdf",
"Combined Qualifying Times": "https://archives.amasupercross.com/xml/SX/events/S2405/S1QCOVR.pdf",
"Starting Lineup": "https://archives.amasupercross.com/xml/SX/events/S2405/S1F1LINEUP.pdf",
"Provisional Results": "https://archives.amasupercross.com/xml/SX/events/S2405/S1F1RES.pdf",
"Lap Chart": "https://archives.amasupercross.com/xml/SX/events/S2405/S1F1LAP.pdf",
"Official Results": "https://archives.amasupercross.com/xml/SX/events/S2405/S1F1PRESS.pdf",
"250SX West Rider Point Standings": "https://archives.amasupercross.com/xml/SX/events/S2405/2WF1POINTS.pdf",
"250SX West Rider Point Standings (Pos)": "https://archives.amasupercross.com/xml/SX/events/S2405/2WF1POINTSPOSITI…",
"Rider Point Standings (Pos)": "https://archives.amasupercross.com/xml/SX/events/S2405/S1F1POINTSPOSITI…",
"SMX Combined Points": "https://archives.amasupercross.com/xml/SX/events/S2405/X1F1SMXPOINTS.pdf",
"SMX Combined Points (Pos)": "https://archives.amasupercross.com/xml/SX/events/S2405/X1F1SMXPOINTSPOS…",
"Rider Point Standings": "https://archives.amasupercross.com/xml/SX/events/S2405/S1F1POINTS.pdf",
"Manufacturer Point Standings": "https://archives.amasupercross.com/xml/SX/events/S2405/S0F1MANUFACTURER…"
}
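A rough sketch of going from a label-to-URL map like the one above to local CSVs. It picks the labels you care about, skips any URL that doesn't end in `.pdf` (which also drops the truncated `…` entries rather than guessing them), saves the file, and hands it to tabula-py's `convert_into`. The helper names and the `S2405_S1F1PRESS.pdf` naming scheme are assumptions for illustration; tabula-py also needs Java installed, so it's imported only inside `pdf_to_csv`.

```python
import os
from urllib.request import urlopen


def pick_pdfs(links, wanted):
    """Return {label: url} for the wanted labels, skipping truncated or non-PDF URLs."""
    picked = {}
    for label in wanted:
        url = links.get(label)
        if url and url.lower().endswith(".pdf"):
            picked[label] = url
    return picked


def local_name(url):
    """Name the local file after the event folder and PDF, e.g. S2405_S1F1PRESS.pdf."""
    event, filename = url.rstrip("/").split("/")[-2:]
    return f"{event}_{filename}"


def download(url, dest_dir, fetch=None):
    """Save one PDF into dest_dir; fetch is injectable for testing or throttling."""
    fetch = fetch or (lambda u: urlopen(u, timeout=30).read())
    path = os.path.join(dest_dir, local_name(url))
    with open(path, "wb") as fh:
        fh.write(fetch(url))
    return path


def pdf_to_csv(pdf_path, csv_path):
    """Extract every table on every page into one CSV via tabula-py."""
    # Imported here so the helpers above work without tabula-py/Java installed.
    import tabula
    tabula.convert_into(pdf_path, csv_path, output_format="csv", pages="all")
```

Expect to clean the CSV up afterwards: converters like tabula often split headers or merge columns on race-results layouts, so check a few rounds by eye before trusting the output.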
Hey nwmoto131, I'd love to talk more about data for a project I'm working on. Send me a DM if you're available for a chat. Thanks!
It looks like the AMA took down the https://archives.amasupercross.com/ site.
Does anybody know whether (and where) they moved their results archive?
Thread: Random: Getting supercross results via web scraper/api/or csv file