About VCE Data Explorer
A project to make VCE statistics accessible, transparent, and actually useful for students.
Why this website exists
This project started as a personal tool to help me make decisions about my own studies, and it eventually grew into the platform you see today. The VCE Data Explorer solves a few problems I'd faced while looking for data online.
- VCAA reports are difficult to use. The honour roll is the best example of this. With the VCE Data Explorer, all the data you need is sorted and accessible in one place. Further, the only way to get the actual number of high achievers is to go into the VCAA reports, because websites such as quppa.net don't count students who haven't consented to having their names released.
- Accurate scaling data is hard to come by. The VTAC scaling report rounds scaled scores to whole numbers (0 dp), even though VTAC works with two decimal places (2 dp) internally, and all the ATAR calculators online do the same.
- Existing scaling calculators lack transparency. I wanted to know exactly how estimates were reached, but existing calculators didn't explain their method or say how current their data was.
How is the scaling data calculated?
The Mathematical Model
VTAC doesn't publish a "formula" for scaling. Instead, they provide "anchor points" in their annual reports that show how raw scores of 20, 25, 30, 35, 40, 45, and 50 are scaled for each subject.
To estimate scaled scores for raw scores between and below these points, two methods are used:
1. PCHIP Interpolation
For the low end of the range (raw scores below 20), a Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) is fitted through the official anchors.
2. Weighted Regressions
For raw scores of 20 and above, crowdsourced data is combined with the official anchors and a weighted cubic polynomial is fitted. The weights keep the curve close to the anchors while letting it shift slightly to match real-world student reports.
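The claim that PCHIP follows the anchors exactly can be checked directly. This is a minimal sketch, assuming SciPy's `PchipInterpolator` and the illustrative anchor values used on this page. It confirms that the curve passes through every anchor and never decreases between them (PCHIP preserves monotonic data, whereas an ordinary cubic spline can overshoot), then evaluates one raw score below 20.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Illustrative anchors (the same example values used on this page).
vcaa_x = [20, 25, 30, 35, 40, 45, 50]
vcaa_y = [21.5, 28.2, 34.4, 40.8, 45.6, 48.9, 50.8]

pchip = PchipInterpolator(vcaa_x, vcaa_y)

# 1. The curve reproduces every official anchor exactly.
assert np.allclose(pchip(vcaa_x), vcaa_y)

# 2. The anchors increase, so the PCHIP curve increases too:
#    a higher raw score never maps to a lower scaled score.
grid = np.linspace(20, 50, 301)
assert np.all(np.diff(pchip(grid)) > 0)

# 3. Below 20 there are no anchors; SciPy extrapolates from the first cubic piece.
print(round(float(pchip(18)), 2))
```

Because no anchors exist below 20, that last value comes from extending the first cubic piece rather than from interpolation.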
```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# 1. Official VTAC Anchor Points (Standard 20-50 range)
vcaa_x = [20, 25, 30, 35, 40, 45, 50]
vcaa_y = [21.5, 28.2, 34.4, 40.8, 45.6, 48.9, 50.8]

# Verified crowdsourced (raw, scaled) pairs go here; left empty in this listing.
crowd_x = []
crowd_y = []

# 2. Lower End Stability (Raw < 20)
# PCHIP passes exactly through every anchor; below 20 it extrapolates
# from the first cubic piece (SciPy's default).
pchip = PchipInterpolator(vcaa_x, vcaa_y)

# 3. Higher End Precision (Raw >= 20)
# We blend official data with verified crowdsourced points.
all_x = vcaa_x + crowd_x
all_y = vcaa_y + crowd_y

# 4. Weighted Regression Logic
# We use a graduated weighting system based on proximity.
# An official anchor is down-weighted only when verified data is close to it,
# and ignored entirely when a verified point sits on the same raw score.
anchor_weights = []
for anchor in vcaa_x:
    # Distance to the nearest crowdsourced point (infinite if there are none).
    dist = min((abs(anchor - tx) for tx in crowd_x), default=float("inf"))
    if dist == 0:
        weight = 0  # Exact match (Crowd wins)
    elif dist <= 1:
        weight = 50  # Very close (Heavy reduction)
    elif dist <= 2:
        weight = 200  # Close (Moderate reduction)
    elif dist <= 3:
        weight = 500  # Nearby (Slight reduction)
    else:
        weight = 1000  # Far away (Full stability)
    anchor_weights.append(weight)

# 5. Resulting Curve
final_weights = anchor_weights + [1000] * len(crowd_x)
model = np.poly1d(np.polyfit(all_x, all_y, deg=3, w=final_weights))
```
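The listing builds both curves but doesn't show how a single score is looked up. This is a minimal sketch of that step. It assumes a hard switch at a raw score of 20 and no crowdsourced points. The function name `scaled_score` and the clamp at 0 are hypothetical choices, not the site's confirmed implementation.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Illustrative anchors (the same example values used on this page).
VCAA_X = [20, 25, 30, 35, 40, 45, 50]
VCAA_Y = [21.5, 28.2, 34.4, 40.8, 45.6, 48.9, 50.8]

# Assumption: no crowdsourced points, so every anchor keeps the full weight.
PCHIP_CURVE = PchipInterpolator(VCAA_X, VCAA_Y)
CUBIC_CURVE = np.poly1d(np.polyfit(VCAA_X, VCAA_Y, deg=3, w=[1000] * len(VCAA_X)))


def scaled_score(raw: float) -> float:
    """Estimate a scaled score (2 dp) from a raw study score between 0 and 50.

    Hypothetical helper: raw < 20 uses the PCHIP curve, raw >= 20 uses the
    weighted cubic fit, following the split described on this page.
    """
    if not 0 <= raw <= 50:
        raise ValueError("raw study scores range from 0 to 50")
    curve = PCHIP_CURVE if raw < 20 else CUBIC_CURVE
    # Assumption: clamp at 0, because extrapolating PCHIP far below 20
    # can otherwise return negative scaled scores.
    return round(max(0.0, float(curve(raw))), 2)


print(scaled_score(18), scaled_score(37))
```

Here `scaled_score(18)` reads from the PCHIP curve and `scaled_score(37)` from the cubic. The cubic is a least-squares fit rather than an interpolant, so in general the two pieces need not meet exactly at 20. With these anchors they land within a few thousandths of each other. The clamp matters because extrapolating PCHIP all the way down to 0 gives a negative value with these anchors.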
⚠️ Accuracy Disclaimer
Please note that the scaling graphs are statistical approximations. They may be slightly inaccurate for specific scores because of the nature of curve fitting and the lack of official data between the anchor points. Always treat the "Calculated" line as an estimate, not a guarantee.
The Distribution Graph
The Study Score Distribution graphs are reconstructed visualizations. Because VCAA only releases the number of students scoring above 40, the rest of the cohort is estimated with a bell curve (a normal, or Gaussian, distribution) using each subject's mean and standard deviation.
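The reconstruction described above can be sketched as follows. This assumes each whole study score s covers the interval from s - 0.5 to s + 0.5 on a normal curve. The function names, the 10,000-student cohort, and the mean of 30 and standard deviation of 7 (the nominal study score scale) are illustrative, not the site's actual inputs.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """P(X <= x) for a normal distribution, computed with math.erf."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def estimate_distribution(cohort: int, mean: float, sd: float) -> dict[int, float]:
    """Estimated number of students on each whole study score from 0 to 50.

    Score s is treated as the interval [s - 0.5, s + 0.5) of the normal curve;
    the small tails below 0 and above 50 are simply dropped.
    """
    return {
        s: cohort * (normal_cdf(s + 0.5, mean, sd) - normal_cdf(s - 0.5, mean, sd))
        for s in range(51)
    }


# Illustrative inputs only: a hypothetical cohort of 10,000 students.
dist = estimate_distribution(cohort=10_000, mean=30, sd=7)
estimated_40_plus = sum(count for score, count in dist.items() if score >= 40)
print(round(estimated_40_plus))
```

Comparing `estimated_40_plus` with VCAA's released figure for high scorers gives a rough check on how well the bell-curve assumption fits a given subject.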
Note: These graphs are for illustrative purposes only. They help you see how your score relates to the rest of the state, but they shouldn't be used to determine your exact rank or score.