Random: Getting supercross results via web scraper/api/or csv file

notjake
Posts
37
Joined
3/25/2023
Location
North Salt Lake, UT, USA

Obviously a random thing to throw up in a moto-related thread, but I need help getting supercross race result data via an API, a web scraper, or even a CSV file download.

 

I'm creating my own fantasy supercross game for my friends and me, but I can't figure out how to get results data without manual entry. (I'm building something that plays more like fantasy football.)

The AMA's website is protected from web scrapers.

Any suggestions appreciated.

 

12/22/2023 8:25pm Edited Date/Time 12/22/2023 8:25pm

I had to download the PDF results and use a PDF table converter to convert them to CSV. It was pretty manual and kinda clunky.

12/22/2023 10:05pm

ChatGPT 4 data analyst mode

clem
Posts
363
Joined
6/12/2009
Location
Thibodaux, LA, USA
12/23/2023 7:15am

I've used an MS Access database to pull data off of a website, but you would need to know VBA at minimum. I'm sure these days AI could write it all for you, or get close. It would be pretty cool to have the raw database of all MX and SX results from the beginning to run nonsense queries on.

tedder900
Posts
1
Joined
2/6/2024
Location
MIDLAND, MI, USA
2/6/2024 12:58pm Edited Date/Time 2/6/2024 12:58pm

@notjake - 

Did you ever find anything out about your question? I was thinking about building an NFL-style supercross fantasy league app myself. It would not be practical to do all the data updating for the races manually.

Probably like you, my first thought was to search for an API, and there doesn't seem to be one.

Then I found the page below, which is pretty close to an API, but it seems to only return the most recent or current race.

https://live.amasupercross.com/

I was also thinking about something similar to this, which scrapes the data from the supercrosslive or racerx page.

https://github.com/jenkins1085/supercross-data/commit/8c9c8895612cfeac5…


2/6/2024 1:36pm

I think your best bet would be to scrape the data from the series website or the Racer X Vault.

nwmoto131
Posts
19
Joined
1/18/2020
Location
Bellingham, WA, USA
2/6/2024 1:52pm

AMA's site posts the PDFs directly after qualifying and the races. You can probably figure out their link taxonomy, download the PDFs to your server, and run them through a PDF-to-CSV converter programmatically to get the results into a structured data set.
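For anyone who wants to try that download-then-convert flow, here is a minimal sketch. It assumes tabula-py (which needs a Java runtime) as the converter and the standard library for the download. The function names, the `results` folder, and the User-Agent string are made up for illustration, and I haven't confirmed that the site will serve these files to a script.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def local_paths(pdf_url: str, out_dir: str = "results") -> tuple[Path, Path]:
    """Derive local PDF and CSV paths from the last segment of the URL."""
    name = Path(urlparse(pdf_url).path).name  # file name taken from the URL path
    pdf_path = Path(out_dir) / name
    return pdf_path, pdf_path.with_suffix(".csv")


def download_pdf(pdf_url: str, pdf_path: Path) -> None:
    """Fetch one PDF and write it to disk."""
    pdf_path.parent.mkdir(parents=True, exist_ok=True)
    # Hypothetical User-Agent; pick something that identifies your script.
    request = Request(pdf_url, headers={"User-Agent": "fantasy-sx-hobby-script"})
    with urlopen(request, timeout=30) as response:
        pdf_path.write_bytes(response.read())


def pdf_to_csv(pdf_path: Path, csv_path: Path) -> None:
    """Write every table tabula can find in the PDF to one CSV file."""
    import tabula  # imported lazily: tabula-py needs Java installed

    tabula.convert_into(str(pdf_path), str(csv_path), output_format="csv", pages="all")
```

Usage would be something like `pdf_path, csv_path = local_paths(url)`, then `download_pdf(url, pdf_path)` and `pdf_to_csv(pdf_path, csv_path)`. Expect to clean up the CSV afterwards, since table extraction from PDFs is rarely perfect.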

nwmoto131
2/6/2024 1:54pm

Sorry, just saw you said AMA's site is protected from scrapers. If you made a database table that kept track of the time of each request, you might be able to get by, as long as you don't try to pull every link at the same time. Maybe I'll attempt it and report back the results. There's always a workaround. ;)
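A rough sketch of the "keep track of request times" idea. It keeps the last request time in memory rather than in a database table, which is a simplification on my part, and the `Throttle` class name and the interval are made up. Whether spacing requests out is enough to stay under the site's protection is an open question.

```python
import time


class Throttle:
    """Enforce a minimum gap between consecutive requests."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self) -> float:
        """Sleep just long enough to respect min_interval; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

Call `throttle.wait()` right before each request, for example with `throttle = Throttle(5.0)` to leave at least five seconds between downloads.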

nwmoto131
2/6/2024 2:26pm Edited Date/Time 2/6/2024 5:57pm

Update: I was able to scrape all the event links pretty easily; here they are. I'm not going to actually do this, but you should write a filter for which ones you want. Now let's see if we can get the PDF links from one event. If that works, this should be pretty easy. Will post an update soon. ;)

[

    "https://archives.amasupercross.com/events.html/2024/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2023/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2022/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2021/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2024/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2023/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2022/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2021/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2024/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2023/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2022/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2021/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2024/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2023/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2022/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2021/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2024/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2023/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2022/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2021/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2024/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2023/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2022/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2021/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2024/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2023/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2022/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2021/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2024/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2023/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2022/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2021/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2024/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2023/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2022/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2021/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2024/index.html?EventID=…",

    "https://archives.amasupercross.com/events.html/2023/index.html?EventID=…",

... and more, shortened for clarity

]
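One way the link collection and the filter could look, as a sketch using only the standard library. It takes the page HTML as a string, since the thread doesn't say which page the links were scraped from. The class and function names are made up, and the `S2405`-style EventID used when trying it out is an assumption, because the IDs in the list above are cut off.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import parse_qs, urljoin, urlparse


class EventLinkParser(HTMLParser):
    """Collect every <a href> whose query string carries an EventID."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and "EventID=" in href:
            # Resolve relative links against the page they came from.
            self.links.append(urljoin(self.base_url, href))


def event_id(url: str) -> Optional[str]:
    """Return the EventID query parameter of an event URL, if present."""
    values = parse_qs(urlparse(url).query).get("EventID")
    return values[0] if values else None


def filter_by_year(links: list[str], year: int) -> list[str]:
    """Keep only links whose path contains the given season year."""
    return [link for link in links if f"/{year}/" in urlparse(link).path]
```

Feed the downloaded HTML to `EventLinkParser(base_url).feed(html)`, then run `filter_by_year(parser.links, 2024)` to keep one season.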

nwmoto131
2/6/2024 2:46pm Edited Date/Time 2/6/2024 5:57pm

Here's the list of the data URLs they are hiding behind an AJAX request. Each one of these should give you an XML document you can parse to get the PDF downloads. In the ID, 24 is the race year and the second part of the number is the round number; 2405 would be Year 2024, Round 5, for example. You can build these easily from the last list I gave you by taking the EventID from the query string of each event URL and creating the XML link from their taxonomy. Then make a request for each round you want and parse the XML. Next update here in a few.

[

    "https://archives.amasupercross.com/xml/sx/events/S2405SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2305SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2205SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2105SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2410SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2315SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2210SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2110SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2415SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2320SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2215SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2115SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2420SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2325SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2220SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2120SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2425SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2330SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2225SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2125SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2430SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2333SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2230SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2130SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2435SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2335SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2235SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2135SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2440SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2340SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2240SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2140SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2445SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2345SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2245SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2145SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2450SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2350SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2250SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2150SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2455SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2355SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2255SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2155SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2460SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2360SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2260SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2160SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2465SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2365SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2265SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2165SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2470SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2370SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2270SchedRes.xml",

    "https://archives.amasupercross.com/xml/sx/events/S2170SchedRes.xml",

... also shortened for clarity

]
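The EventID-to-XML step could be sketched like this. The URL template is copied from the list above; the assumption is that the EventID in the query string is the full `S2405`-style code (the IDs in the event list are cut off, so that isn't confirmed). The function names are made up, and `split_event_id` just follows the year/round reading described in the post.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

XML_TEMPLATE = "https://archives.amasupercross.com/xml/sx/events/{event_id}SchedRes.xml"


def sched_res_url(event_url: str) -> Optional[str]:
    """Build the SchedRes XML URL from an event URL's EventID parameter."""
    values = parse_qs(urlparse(event_url).query).get("EventID")
    if not values:
        return None
    return XML_TEMPLATE.format(event_id=values[0])


def split_event_id(event_id: str) -> tuple[int, int]:
    """Split an ID such as 'S2405' into (2024, 5), per the reading in the post."""
    match = re.fullmatch(r"S(\d{2})(\d{2})", event_id)
    if match is None:
        raise ValueError(f"unexpected event ID: {event_id!r}")
    return 2000 + int(match.group(1)), int(match.group(2))
```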

nwmoto131
2/6/2024 3:18pm

Alright my bro, got it to get all the data you should need. If you want help building this, DM me and we can work something out. I built the data compiler to get you this nice list of all of the PDFs from race day. Don't really feel like parsing the PDFs right now, but that should be super easy with tabula-py or a similar library.

https://tabula-py.readthedocs.io/en/latest/getting_started.html#example

{

    "Best Lap Times": "https://archives.amasupercross.com/xml/SX/events/S2405/SJQ2OVR.pdf",

    "Individual Lap Times": "https://archives.amasupercross.com/xml/SX/events/S2405/S1F1RID.pdf",

    "Individual Segment Times": "https://archives.amasupercross.com/xml/SX/events/S2405/S1F1IND.pdf",

    "Fastest Segment Times": "https://archives.amasupercross.com/xml/SX/events/S2405/S1F1SEG.pdf",

    "Combined Qualifying Times": "https://archives.amasupercross.com/xml/SX/events/S2405/S1QCOVR.pdf",

    "Starting Lineup": "https://archives.amasupercross.com/xml/SX/events/S2405/S1F1LINEUP.pdf",

    "Provisional Results": "https://archives.amasupercross.com/xml/SX/events/S2405/S1F1RES.pdf",

    "Lap Chart": "https://archives.amasupercross.com/xml/SX/events/S2405/S1F1LAP.pdf",

    "Official Results": "https://archives.amasupercross.com/xml/SX/events/S2405/S1F1PRESS.pdf",

    "250SX West Rider Point Standings": "https://archives.amasupercross.com/xml/SX/events/S2405/2WF1POINTS.pdf",

    "250SX West Rider Point Standings (Pos)": "https://archives.amasupercross.com/xml/SX/events/S2405/2WF1POINTSPOSITI…",

    "Rider Point Standings (Pos)": "https://archives.amasupercross.com/xml/SX/events/S2405/S1F1POINTSPOSITI…",

    "SMX Combined Points": "https://archives.amasupercross.com/xml/SX/events/S2405/X1F1SMXPOINTS.pdf",

    "SMX Combined Points (Pos)": "https://archives.amasupercross.com/xml/SX/events/S2405/X1F1SMXPOINTSPOS…",

    "Rider Point Standings": "https://archives.amasupercross.com/xml/SX/events/S2405/S1F1POINTS.pdf",

    "Manufacturer Point Standings": "https://archives.amasupercross.com/xml/SX/events/S2405/S0F1MANUFACTURER…"

}
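The thread doesn't show what the XML looks like inside, so here is a deliberately generic sketch for pulling PDF links out of it: it walks every element and keeps any attribute value or text that ends in `.pdf`. The function name is made up, and the real element and attribute names in the SchedRes files are unknown to me, so treat this as a starting point rather than the compiler that produced the list above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def pdf_links(xml_text: str, base_url: str) -> list[str]:
    """Return every attribute value or text node ending in .pdf, made absolute."""
    root = ET.fromstring(xml_text)
    found: list[str] = []
    for element in root.iter():
        candidates = list(element.attrib.values())
        if element.text:
            candidates.append(element.text.strip())
        for value in candidates:
            if value.lower().endswith(".pdf"):
                url = urljoin(base_url, value)  # resolve relative paths
                if url not in found:            # skip repeats, keep first-seen order
                    found.append(url)
    return found
```

Once you have the links, a file such as the "Official Results" PDF can be handed to tabula-py for table extraction.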

3/15/2024 3:49am
nwmoto131 wrote:
Alright my bro, got it to get all the data you should need, if you want help building this DM me and we can work something...

Hey nwmoto131, would love to talk more about data for a project I’m working on. Send through a DM if you’re available for a chat. Thanks
