Show similar parameter study for sarsa(λ) with different methods of traces and the true online version. Can also add implementations of Sarsa(λ) that use q-learning and use the distribution function over output to bypass the action value estimation
 """
 
 # ╔═╡ 5bc128ec-2934-4aa5-a922-9017f647e1b3
 md"""
 #### Sarsa(λ) Parameter Studies With Mountain Car Tile Coding
 """
 
-# ╔═╡ 5652f3fd-ec23-4dfb-a171-1e1ed0de275a
-#add button to run all parameter studies
-
-# ╔═╡ 251a762a-0d78-419f-b38d-8000d1c072af
+# ╔═╡ c19209dc-bddf-4390-95a9-fc1d1d836a8a
 md"""
-##### Sarsa$(λ)$ with $\epsilon = 0.01$
+##### Sarsa$$(λ)$$ with $$\epsilon = 0.01$$
 """
 
-# ╔═╡ 54a335f3-672d-4897-b181-e1ee31ba11e1
+# ╔═╡ 5652f3fd-ec23-4dfb-a171-1e1ed0de275a
+#=╠═╡
+@bind run_mountaincar_λ_study1 CounterButton("Run Parameter Study (could take several minutes)")
+╠═╡ =#
+
+# ╔═╡ 2c425a9a-49ae-48d3-8ab7-f3c12b081180
 md"""
-##### Expected Sarsa$(λ)$ with $\epsilon = 0.01$
+##### Expected Sarsa$$(λ)$$ with $$\epsilon = 0.01$$
 """
 
-# ╔═╡ 3086d674-49e4-48b9-ae98-9dede3e98fc8
+# ╔═╡ d7c7316d-aac3-4500-ac3c-0c21b9cf5215
+#=╠═╡
+@bind run_mountaincar_λ_study2 CounterButton("Run Parameter Study (could take several minutes)")
+╠═╡ =#
+
+# ╔═╡ aea15e6d-9873-406b-993b-04717dad01c6
 md"""
-##### DP$(λ)$ with $\epsilon = 0.01$
+##### DP$$(λ)$$ with $$\epsilon = 0.01$$
 
 In this method the full transition distribution is used and only state values are estimated.
 """
 
-# ╔═╡ 978bb3cd-2b9f-4c73-9d1e-897efbc56f9d
+# ╔═╡ c57b4792-928a-4450-9364-786e9f186cc8
+#=╠═╡
+@bind run_mountaincar_λ_study3 CounterButton("Run Parameter Study (could take several minutes)")
+╠═╡ =#
+
+# ╔═╡ b28f47cc-eda7-4961-b6b3-569753386249
 md"""
-##### True Online Sarsa$(λ)$ with $ϵ = 0.01$
+##### True Online Sarsa$$(λ)$$ with $$ϵ = 0.01$$
 
 Notice that here a slightly lower value of $\lambda$ is optimal which increases the degree of bootstrapping compared to Sarsa$(\lambda)$
 """
 
-# ╔═╡ e7beffa8-cea1-497f-80d5-278c3be17802
+# ╔═╡ 31633123-0249-4d15-b6fe-59480d3038eb
+#=╠═╡
+@bind run_mountaincar_λ_study4 CounterButton("Run Parameter Study (could take several minutes)")
+╠═╡ =#
+
+# ╔═╡ 48c87368-6f11-4330-9a29-3ecbf60cd146
 md"""
-##### True Online Expected Sarsa$(λ)$ with $ϵ = 0.01$
+##### True Online Expected Sarsa$$(λ)$$ with $$ϵ = 0.01$$
 
 Similar results to above as we'd expect for such a small value of $\epsilon$
 """
 
-# ╔═╡ 0385d4b6-9e60-4e0a-83dd-a9989bdb5cc8
+# ╔═╡ 831b925f-9f76-48e2-9de0-32724215c568
+#=╠═╡
+@bind run_mountaincar_λ_study5 CounterButton("Run Parameter Study (could take several minutes)")
+╠═╡ =#
+
+# ╔═╡ 438726e5-f9a1-4bf7-abda-e5bb0eb30c39
 md"""
-##### True Online DP$(λ)$ with $ϵ = 0.01$
+##### True Online DP$$(λ)$$ with $$ϵ = 0.01$$
 
 Bests results so far which also favor a higher value of $\lambda$ which indicates less reliance on bootstrapping.
 """
 
+# ╔═╡ 4d00dfcc-7b01-4335-95ba-0b31fa0e62ad
+#=╠═╡
+@bind run_mountaincar_λ_study6 CounterButton("Run Parameter Study (could take several minutes)")
+╠═╡ =#
+
 # ╔═╡ 0a5bec4a-0e65-4753-a1e8-f7b3c6a061df
 md"""
 ##### Results Visualization for Best Training Parameters
@@ -2282,20 +2307,16 @@ function tile_coding_setup(min_value::S, max_value::S, tile_size::S, num_tilings
@@ -2305,50 +2326,45 @@ function run_mountaincar_sarsa_λ(num_steps::Integer, num_tiles::Integer, num_ti
 scatter(x = α_list, y = y, name = "λ = $λ")
 end
 for λ in λ_list]
-plot(traces, Layout(xaxis_title = "Learning Rate", yaxis_title = "Average Steps Per Episode Averaged <br> Over the First $num_steps Steps and $num_trials Runs"))
+plot(traces, Layout(xaxis_title = "Learning Rate", yaxis_title = "Average Steps Per Episode Averaged <br> Over the First $num_steps Steps and $num_trials Runs", yaxis_range = [ymin, ymax], xaxis_type = "log"))
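
The notebook cells in this diff describe λ in terms of how much a method relies on bootstrapping. As a rough illustration only (not code from this commit), the sketch below uses the standard λ-return weighting, in which the n-step return receives weight (1 - λ)λ^(n-1), to show how much total weight falls on short, heavily bootstrapped returns. The function names and the λ values are hypothetical, and episode termination is ignored.

```julia
# Weight the λ-return places on the n-step return: (1 - λ) * λ^(n - 1).
# Hypothetical helper for illustration; not part of the notebook.
λ_return_weight(λ::Real, n::Integer) = (1 - λ) * λ^(n - 1)

# Total weight on n-step returns with n ≤ horizon.
# The geometric sum equals 1 - λ^horizon.
function short_horizon_weight(λ::Real, horizon::Integer)
    return sum(λ_return_weight(λ, n) for n in 1:horizon)
end

# Illustrative λ values, not the ones used in the parameter studies.
for λ in (0.5, 0.9, 0.99)
    w = short_horizon_weight(λ, 10)
    println("λ = $λ: weight on n-step returns with n ≤ 10 is ", round(w, digits = 3))
end
```

A lower λ concentrates the weight on short returns, which is what "more bootstrapping" means in the cell notes; a λ close to 1 shifts the weight toward long returns that depend less on the current value estimates.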