Add support for VECTOR column type #9758

nicktobey · 2025-09-02T08:26:33Z

This PR implements the changes required for Dolt to support VECTOR column types (implemented in GMS in dolthub/go-mysql-server#3162).

Actually managing storage reads/writes was the easy part and pretty much worked out of the box.

The most extensive changes are to the algorithm for creating vector indexes. It previously assumed that every vector was represented as a 20-byte Hash, and it represented paths through the tree as multiple 20-byte hashes concatenated together. Since VECTOR columns are stored using adaptive encoding, a vector may be a hash but may also be a variable-length byte buffer. Thus, the index builder needed to be smarter and represent these paths as a proper tuple. Doing it this way also allowed me to clean up the index building code in a way that is hopefully more readable and eliminates some unnecessary memory copies. I added some clarifying comments, but it could potentially benefit from even more.
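To illustrate why concatenated fixed-width hashes stop working once path elements can vary in length, here is a minimal sketch. It is not Dolt's actual tuple format or index builder code: `splitFixed`, `encodePath`, and `decodePath` are hypothetical helpers, and the length-prefixed layout is only one simple way to make field boundaries explicit.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const hashLen = 20 // hash size stated in the PR description

// splitFixed decodes the old-style path: hashes concatenated back to back.
// It can only find boundaries if every element is exactly hashLen bytes.
func splitFixed(path []byte) ([][]byte, error) {
	if len(path)%hashLen != 0 {
		return nil, fmt.Errorf("path length %d is not a multiple of %d", len(path), hashLen)
	}
	out := make([][]byte, 0, len(path)/hashLen)
	for i := 0; i < len(path); i += hashLen {
		out = append(out, path[i:i+hashLen])
	}
	return out, nil
}

// encodePath (hypothetical) stores each field with a uvarint length prefix,
// so fields of any size can share one buffer.
func encodePath(fields [][]byte) []byte {
	var buf []byte
	var tmp [binary.MaxVarintLen64]byte
	for _, f := range fields {
		n := binary.PutUvarint(tmp[:], uint64(len(f)))
		buf = append(buf, tmp[:n]...)
		buf = append(buf, f...)
	}
	return buf
}

// decodePath (hypothetical) reverses encodePath.
func decodePath(buf []byte) ([][]byte, error) {
	var out [][]byte
	for len(buf) > 0 {
		n, w := binary.Uvarint(buf)
		if w <= 0 || uint64(len(buf)-w) < n {
			return nil, fmt.Errorf("malformed path")
		}
		out = append(out, buf[w:w+int(n)])
		buf = buf[w+int(n):]
	}
	return out, nil
}

func main() {
	// One element is a 20-byte hash, the other a short inline value.
	fields := [][]byte{make([]byte, hashLen), {1, 2, 3}}

	_, err := splitFixed(append(make([]byte, hashLen), 1, 2, 3))
	fmt.Println("fixed-width split error:", err != nil)

	decoded, err := decodePath(encodePath(fields))
	fmt.Println("fields recovered:", len(decoded), "error:", err)
}
```

The point of the sketch is only that a path with explicit field boundaries can hold both hashes and inline values, whereas a bare concatenation cannot.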

The other big change was to the test suites. The vector index tests had some hardcoded assumptions about the representation of vectors that needed to be fixed, so I used this as an opportunity to clean that up too.

coffeegoddd · 2025-09-02T11:11:32Z

@nicktobey DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
`2638558`	ok	5937471

version	total_tests
`2638558`	5937471

correctness_percentage
100.0

coffeegoddd · 2025-09-02T11:21:36Z

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
`0a84064`	ok	5937471

version	total_tests
`0a84064`	5937471

correctness_percentage
100.0

macneale4

A few small changes requested. There is plenty here I am unfamiliar with, but the existing tests pass, which is a big plus. If you can provide anything that educates me about index builds, that would be great.

macneale4 · 2025-09-02T20:05:10Z

go/libraries/doltcore/schema/typeinfo/vector.go

+	if lengthStr, ok := params[vectorTypeParam_Length]; ok {
+		length, err = strconv.ParseInt(lengthStr, 10, 64)
+		if err != nil {
+			return nil, err


This will provide a pretty cryptic error, I assume. This is really a runtime failure that should never happen. I've been adding context with:

fmt.Errorf("runtime error: vector length unparsable: %w", err)

This way we can search for the string and quickly find the code that generated the error.

macneale4 · 2025-09-02T20:07:43Z

go/libraries/doltcore/schema/typeinfo/vector.go

+	var length int64
+	var err error
+	if lengthStr, ok := params[vectorTypeParam_Length]; ok {
+		length, err = strconv.ParseInt(lengthStr, 10, 64)


We support 64 bits of resolution here? Isn't 32 bits way beyond reasonable?

The whole function was unused, so I removed it.

macneale4 · 2025-09-02T22:22:28Z

go/libraries/doltcore/schema/typeinfo/vector.go

+	case types.NullKind:
+		_ = reader.ReadKind()
+		return nil, nil
+	}


I don't know much about this area of the code, but shouldn't we be doing something like:

	case types.NullKind:
		_ = reader.SkipValue(f)
		return types.NullValue, nil
	}

f is the NomsBinFormat

I don't even know if this is reachable. It looks like it's only used for converting Noms values, which we probably never do anymore? I just copied in the implementation used for binary types, but I wonder if we can just delete all this instead.

go/libraries/doltcore/schema/typeinfo/vector.go

macneale4 · 2025-09-02T22:40:47Z

go/store/prolly/proximity_map.go

 		level = 255 - level // we currently store the level as 255 - the actual level for sorting purposes.
 		depth := int(maxLevel - level)

 		// hashPath is a list of concatenated hashes, representing the sequence of closest vectors at each level of the tree.


This comment is for a variable that is being removed.
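For readers unfamiliar with the `255 - level` trick in the diff context above, here is a small sketch of how such an encoding interacts with sorting. The diff only says the inversion is "for sorting purposes"; the assumption here (mine, not confirmed by the PR) is that the intent is for higher levels to come first under an ascending byte-order sort. All function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// encodeLevel stores a level as 255 - level, as the diff comment describes.
func encodeLevel(level uint8) uint8 { return 255 - level }

// decodeLevel recovers the actual level; the transform is its own inverse.
func decodeLevel(stored uint8) uint8 { return 255 - stored }

// sortedLevels encodes the levels, sorts the stored bytes in ascending order,
// and decodes them again to show the resulting order of actual levels.
func sortedLevels(levels []uint8) []uint8 {
	stored := make([]uint8, len(levels))
	for i, l := range levels {
		stored[i] = encodeLevel(l)
	}
	sort.Slice(stored, func(i, j int) bool { return stored[i] < stored[j] })
	out := make([]uint8, len(stored))
	for i, s := range stored {
		out[i] = decodeLevel(s)
	}
	return out
}

func main() {
	fmt.Println(sortedLevels([]uint8{0, 3, 1, 2})) // prints [3 2 1 0]
}
```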

go/store/prolly/vector_index_chunker.go

coffeegoddd · 2025-09-03T03:24:32Z

@nicktobey DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
`6f09282`	ok	5937471

version	total_tests
`6f09282`	5937471

correctness_percentage
100.0

coffeegoddd · 2025-09-03T03:31:02Z

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
`1aa6047`	ok	5937471

version	total_tests
`1aa6047`	5937471

correctness_percentage
100.0

macneale4

LGTM!!

coffeegoddd · 2025-09-03T19:10:35Z

@nicktobey DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
`35d8800`	ok	5937471

version	total_tests
`35d8800`	5937471

correctness_percentage
100.0

nicktobey added 10 commits September 2, 2025 01:38

Replace float64 with float32 in vector operations.

8359b3d

Add vector type to dolt storage

7b92b44

Add vector index on vector columns

b9eff18

Populate ctx and errors for converting values to float vectors, and support VECTOR types.

1844ceb

Change signature of GetSubtrees

e79fd1e

Change signature of LoadSubtrees and support vector indexes.

1eb4db4

Rework TestProximityMap to handle both json vector indexes and VECTOR vector indexes.

2aee623

Refactor vector index builder to work with adaptable-encoded columns.

745897d

Correctly dump vector type.

279c17b

Update comments in building vector index maps.

18ee011

nicktobey force-pushed the nicktobey/vector3 branch 3 times, most recently from 5c996ba to 54ef702 Compare September 2, 2025 10:00

Dependency bump

2638558

nicktobey force-pushed the nicktobey/vector3 branch from 54ef702 to 2638558 Compare September 2, 2025 10:38

[ga-format-pr] Run go/utils/repofmt/format_repo.sh and go/Godeps/update.sh

0a84064

coffeegoddd added the correctness_approved label Sep 2, 2025

macneale4 self-requested a review September 2, 2025 19:56

macneale4 requested changes Sep 3, 2025

nicktobey and others added 4 commits September 2, 2025 19:21

Respond to PR feedback.

7082e3a

Merge remote-tracking branch 'origin/main' into nicktobey/vector3

8776e76

Bump GMS

6f09282

[ga-format-pr] Run go/utils/repofmt/format_repo.sh and go/Godeps/update.sh

1aa6047

macneale4 approved these changes Sep 3, 2025

nicktobey added 2 commits September 3, 2025 17:44

[ga-bump-dep] Bump dependency in Dolt by nicktobey

d423df9

Remove unused functions.

441cea6

Merge remote-tracking branch 'origin/nicktobey-9aefd194' into nicktobey/vector3

35d8800

nicktobey merged commit 5c2a2d9 into main Sep 3, 2025
23 of 24 checks passed
