|
487 | 487 | value, followed by that many key/value pairs. A block
|
488 | 488 | with count zero indicates the end of the map. Each item
|
489 | 489 | is encoded per the map's value schema.</p>
|
490 |
| - |
| 490 | + |
491 | 491 | <p>If a block's count is negative, its absolute value is used,
|
492 | 492 | and the count is followed immediately by a <code>long</code>
|
493 | 493 | block <em>size</em> indicating the number of bytes in the
|
494 | 494 | block. This block size permits fast skipping through data,
|
495 | 495 | e.g., when projecting a record to a subset of its fields.</p>
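To illustrate how the block size enables skipping, here is a minimal Python sketch, not taken from any Avro implementation. The names `read_long`, `skip_map`, and the `skip_value` callback are hypothetical; the sketch assumes map keys are Avro strings (a `long` length followed by UTF-8 bytes) and that longs use the variable-length zigzag coding.

```python
import io

def read_long(stream):
    """Decode one variable-length, zigzag-coded long from a byte stream."""
    shift = 0
    accum = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("truncated long")
        b = byte[0]
        accum |= (b & 0x7F) << shift
        if not (b & 0x80):          # high bit clear: last byte of the long
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)   # undo zigzag

def skip_map(stream, skip_value):
    """Skip a whole blocked map; return how many entries were passed over.

    skip_value is a hypothetical callback that consumes one value
    according to the map's value schema.
    """
    total = 0
    while True:
        count = read_long(stream)
        if count == 0:              # zero count ends the map
            return total
        if count < 0:
            # Negative count: the byte size follows, so jump over the block.
            count = -count
            size = read_long(stream)
            stream.seek(size, io.SEEK_CUR)
        else:
            # No size given: each entry has to be decoded to find its end.
            for _ in range(count):
                key_len = read_long(stream)
                stream.seek(key_len, io.SEEK_CUR)
                skip_value(stream)
        total += count
```

With a sized block the reader never touches the entries, which is where the speed-up for projections comes from; without a size it falls back to decoding every key and value.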
|
496 |
| - |
| 496 | + |
497 | 497 | <p>The blocked representation permits one to read and write
|
498 | 498 | maps larger than can be buffered in memory, since one can
|
499 | 499 | start writing items without knowing the full length of the
|
500 | 500 | map.</p>
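As an illustration of writing without knowing the full length, the following Python sketch emits a map from an arbitrary iterator, one block at a time. It is a sketch under stated assumptions rather than a reference implementation: `encode_long`, `write_map_blocked`, the `encode_value` callback, and the `block_items` parameter are hypothetical names, and every block is written with a negative count plus byte size so that readers can skip it.

```python
import io
from itertools import islice

def encode_long(n):
    """Encode a 64-bit long with zigzag plus variable-length coding."""
    z = (n << 1) ^ (n >> 63)        # zigzag: small magnitudes stay small
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)

def write_map_blocked(out, items, encode_value, block_items=1000):
    """Write (key, value) pairs as a blocked map without knowing their total count.

    encode_value is a hypothetical callback returning the bytes of one
    value under the map's value schema. Only one block is buffered at a time.
    """
    it = iter(items)
    while True:
        chunk = list(islice(it, block_items))
        if not chunk:
            break
        body = bytearray()
        for key, value in chunk:
            raw = key.encode("utf-8")
            body += encode_long(len(raw)) + raw + encode_value(value)
        out.write(encode_long(-len(chunk)))   # negative count: size follows
        out.write(encode_long(len(body)))     # block size in bytes
        out.write(body)
    out.write(encode_long(0))                 # zero count ends the map
```

Memory use is bounded by `block_items` entries, not by the size of the whole map.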
|
501 |
| - |
| 501 | + |
502 | 502 | </section>
|
503 | 503 |
|
504 | 504 | <section id="union_encoding">
|
|
569 | 569 |
|
570 | 570 | </section>
|
571 | 571 |
|
| 572 | + <section id="single_object_encoding"> |
| 573 | + <title>Single-object encoding</title> |
| 574 | + |
| 575 | + <p>In some situations a single Avro serialized object is to be stored for a |
| 576 | + longer period of time. One very common example is storing Avro records |
| 577 | + for several weeks in an <a href="http://kafka.apache.org/">Apache Kafka</a> topic.</p> |
| 578 | + <p>In the period after a schema change this persistence system will contain records |
| 579 | + that have been written with different schemas. A reader therefore needs to know which schema |
| 580 | + was used to write each record in order to support schema evolution correctly. |
| 581 | + In most cases the schema itself is too large to include in the message, |
| 582 | + so this binary wrapper format identifies the schema by its fingerprint instead.</p> |
| 583 | + |
| 584 | + <section id="single_object_encoding_spec"> |
| 585 | + <title>Single-object encoding specification</title> |
| 586 | + <p>Single Avro objects are encoded as follows:</p> |
| 587 | + <ol> |
| 588 | + <li>A two-byte marker, <code>C3 01</code>, to show that the message is Avro and uses this single-record format (version 1).</li> |
| 589 | + <li>The 8-byte little-endian CRC-64-AVRO <a href="#schema_fingerprints">fingerprint</a> of the object's schema.</li> |
| 590 | + <li>The Avro object encoded using <a href="#binary_encoding">Avro's binary encoding</a>.</li> |
| 591 | + </ol> |
| 592 | + </section> |
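The three parts listed in the specification can be assembled as in the following Python sketch. The function name `frame_single_object` is hypothetical, and the sketch assumes the caller has already computed the schema's 64-bit CRC-64-AVRO fingerprint and the binary-encoded payload; it does not compute either.

```python
import struct

MARKER = b"\xC3\x01"   # two-byte marker, format version 1

def frame_single_object(fingerprint, payload):
    """Prefix an already binary-encoded object with marker and fingerprint.

    fingerprint is assumed to be the schema's CRC-64-AVRO value as an
    integer; masking lets a signed 64-bit value be passed as well.
    """
    fp_bytes = struct.pack("<Q", fingerprint & 0xFFFFFFFFFFFFFFFF)  # little-endian
    return MARKER + fp_bytes + payload
```

The header is always 10 bytes, so the cost of the wrapper is fixed regardless of schema size.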
| 593 | + |
| 594 | + <p>Implementations use the 2-byte marker to determine whether a payload is Avro. |
| 595 | + This check helps avoid expensive lookups that resolve the schema from a |
| 596 | + fingerprint when the message is not an encoded Avro payload.</p> |
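The marker check described above might look like the following Python sketch on the reading side. `parse_single_object` and the `lookup_schema` callback (for example, a schema-registry client) are hypothetical names; the sketch only splits the message and leaves decoding of the payload to the caller.

```python
import struct

MARKER = b"\xC3\x01"   # two-byte marker, format version 1
HEADER_LEN = 10        # 2 marker bytes + 8 fingerprint bytes

def parse_single_object(message, lookup_schema):
    """Return (schema, payload), or None if the message lacks the marker.

    lookup_schema is a hypothetical callback mapping a 64-bit fingerprint
    to a schema; it is only invoked after the marker has matched.
    """
    if len(message) < HEADER_LEN or message[:2] != MARKER:
        return None    # not a single-object Avro message: skip the lookup
    (fingerprint,) = struct.unpack_from("<Q", message, 2)  # little-endian
    schema = lookup_schema(fingerprint)
    return schema, message[HEADER_LEN:]
```

Because the marker is tested first, a non-Avro payload costs only a two-byte comparison.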
| 597 | + |
| 598 | + </section> |
| 599 | + |
572 | 600 | </section>
|
573 | 601 |
|
574 | 602 | <section id="order">
|
|
1237 | 1265 | </ul>
|
1238 | 1266 | </section>
|
1239 | 1267 |
|
1240 |
| - <section> |
| 1268 | + <section id="schema_fingerprints"> |
1241 | 1269 | <title>Schema Fingerprints</title>
|
1242 | 1270 |
|
1243 | 1271 | <p>"[A] fingerprinting algorithm is a procedure that maps an
|
|