Class ArrowDataset

A Dataset manages the production and manipulation of tiles. Each plot has a single dataset; the dataset handles all transformations around data through batchwise operations.

Properties

_ix_seed: number = 0
_schema?: Schema<any>
plot: Plot
promise: Promise<void> = ...
root_tile: ArrowTile
tileProxy?: TileProxy
transformations: Record<string, Transformation<ArrowTile>> = {}

Accessors

  • get highest_known_ix(): number
  • The highest point index (ix) that deepscatter has seen so far. This is used to adjust opacity and point size.

    Returns number

  • get ready(): Promise<void>
  • Returns Promise<void>

  • get table(): Table<any>
  • Attempts to build an Arrow table from all record batches. If some batches have different transformations applied, this throws an error.

    Returns Table<any>

Methods

  • Parameters

    • ids: Record<string, number>

      An object whose keys are the ids to match and whose values are the numbers to set for them.

    • field_name: string

      The name of the new field to create

    • key_field: string = '_id'

      The column in the dataset to match them against.

    Returns void
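
As a rough illustration of the matching described by these parameters, the sketch below builds a numeric column from an `ids` record and an array of key values. It is a plain-TypeScript stand-in, not deepscatter's implementation: the helper name `buildLabelColumn` and the choice to leave unmatched rows at 0 are assumptions made for the example.

```typescript
// Hypothetical helper for illustration; not part of deepscatter.
// keys: the values of the key column (for example '_id') for each row.
// ids:  id -> value to write into the new column for matching rows.
function buildLabelColumn(
  keys: string[],
  ids: Record<string, number>,
): Float32Array {
  // Typed arrays start zero-filled, so unmatched rows stay at 0 (an assumption).
  const column = new Float32Array(keys.length);
  keys.forEach((key, i) => {
    const value = ids[key];
    // Guard against inherited properties such as "constructor".
    if (value !== undefined && Object.prototype.hasOwnProperty.call(ids, key)) {
      column[i] = value;
    }
  });
  return column;
}

const keyColumn = ["a", "b", "c", "d"];
const labelColumn = buildLabelColumn(keyColumn, { b: 1, d: 2 });
```

Keeping the lookup in a plain record makes the per-row work a single property access, which matters when the column has millions of rows.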

  • Parameters

    • field_name: string

      The name of the column to create.

    • buffer: Uint8Array

      An Arrow IPC buffer that deserializes to a table with two columns, 'data' and '_tile'.

    Returns void

  • Given an ix, apply a transformation to the point at that index and return the transformed point (the whole point, not just the transformed value). As a side effect, this applies the transformation to all other points in the same tile.

    Parameters

    • transformation: string

      The name of the transformation to apply

    • ix: number

      The index of the point to transform

    Returns Promise<StructRowProxy<any>>

  • Parameters

    • dimension: string
    • max_ix: number = 1e6

    Returns [number, number]

  • Returns

    A StructRowProxy for the point with the given index.

    Parameters

    • ix: number

      The index of the point to get.

    Returns StructRowProxy<any>[]

  • Finds the points and tiles that match the passed ix

    Returns

    A list of tuples that match the index, each holding the tile and the point (the signature below also carries a trailing number).

    Parameters

    • ix: number

      The index of the point to get.

    Returns [Tile, StructRowProxy<any>, number][]

  • Returns

    True if the column exists in the dataset, false otherwise.

    Parameters

    • name: string

      The name of the column to check for

    Returns boolean

  • Map a function against all tiles. It is often useful simply to invoke Dataset.map(d => d) to get a list of all tiles in the dataset at any moment.

    Returns

    A list of the results of the function in an order determined by 'after.'

    Type Parameters

    • U

    Parameters

    • callback: ((tile: ArrowTile) => U)

      A function to apply to each tile.

    • after: boolean = false

      Whether to perform the function in bottom-up order

    Returns U[]

  • Parameters

    Returns Generator<StructRowProxy<any>, void, unknown>

  • This allows creation of a new column in your chart.

    A few things to be aware of: the point function may be run millions of times. For best performance, keep complicated logic out of it and build any data structures it needs outside the function.

    name: the name to identify the new column in the data.

    pointFunction: a function that runs on a single row of data. It accepts a single argument, the data point to be transformed. Technically this is a StructRowProxy on the underlying Arrow frame, but for most purposes you can treat it as a dict. The point is read-only: you cannot change its attributes.

    For example, suppose you have 'lat' and 'long' columns in your data and want to create a new set of geo coordinates. You can run the following:

        {
          const scale = d3.geoMollweide().extent([-20, -20, 20, 20])
          scatterplot.register_transformation('mollweide_x', datum => {
            return scale([datum.long, datum.lat])[0]
          })
          scatterplot.register_transformation('mollweide_y', datum => {
            return scale([datum.long, datum.lat])[1]
          })
        }

    Note some constraints: the scale is created outside the functions, to avoid the overhead of instantiating it every time; and the x and y coordinates are created separately with separate function calls, because it's not possible to assign to both x and y simultaneously.

    Parameters

    • name: string
    • pointFunction: PointFunction
    • prerequisites: string[] = []

    Returns void
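
The advice about building data structures outside the point function can be sketched without deepscatter at all. In the example below, `SketchRow`, `SketchPointFunction`, and `applyToRows` are hypothetical stand-ins (they are not deepscatter types or methods) and the `species` column is invented for illustration. The lookup table is created once, so the function that runs per row only does a map lookup.

```typescript
// Stand-in types for illustration only; not deepscatter's own definitions.
type SketchRow = Record<string, number | string>;
type SketchPointFunction = (datum: SketchRow) => number;

// Built once, outside the point function, so it is not rebuilt per row.
const speciesCodes = new Map<string, number>([
  ["cat", 0],
  ["dog", 1],
]);

// The per-row function stays tiny: one lookup, with -1 for unknown values.
const speciesCode: SketchPointFunction = (datum) =>
  speciesCodes.get(String(datum.species)) ?? -1;

// Hypothetical driver that applies a point function to every row.
function applyToRows(rows: SketchRow[], fn: SketchPointFunction): Float32Array {
  return Float32Array.from(rows, (row) => fn(row));
}

const sketchRows: SketchRow[] = [
  { species: "cat" },
  { species: "dog" },
  { species: "emu" },
];
const speciesColumn = applyToRows(sketchRows, speciesCode);
```

Each derived column gets its own function returning a single number, which mirrors the constraint that x and y must be produced by separate calls.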

  • Invoke a function on all tiles in the dataset that have been downloaded. The general architecture here is taken from the d3 quadtree functions. That's why, for example, it doesn't recurse.

    Parameters

    • callback: ((tile: ArrowTile) => void)

      The function to invoke on each tile.

    • after: boolean = false

      Whether to execute the visit in bottom-up order. Default false.

    • filter: ((t: ArrowTile) => boolean) = ...

    Returns void
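
The non-recursive, d3-quadtree-style traversal mentioned above can be sketched with an explicit stack. This is a simplified model rather than deepscatter's code: `SketchTile` and `visitTiles` are hypothetical names, and the assumption that a tile rejected by `filter` also hides its descendants is made for the example.

```typescript
// Minimal stand-in for a tile tree; not deepscatter's ArrowTile.
interface SketchTile {
  key: string;
  children: SketchTile[];
}

function visitTiles(
  root: SketchTile,
  callback: (tile: SketchTile) => void,
  after = false,
  filter: (tile: SketchTile) => boolean = () => true,
): void {
  const stack: SketchTile[] = [root];
  const ordered: SketchTile[] = [];
  // An explicit stack replaces recursion, as in d3-quadtree's visit.
  while (stack.length > 0) {
    const tile = stack.pop() as SketchTile;
    if (!filter(tile)) continue; // assumption: skips the tile and its subtree
    ordered.push(tile);
    // Push children in reverse so the first child is popped first.
    for (const child of [...tile.children].reverse()) stack.push(child);
  }
  // Reversing a top-down (pre-order) list puts every child before its parent.
  if (after) ordered.reverse();
  for (const tile of ordered) callback(tile);
}

const sketchRoot: SketchTile = {
  key: "0/0/0",
  children: [
    { key: "1/0/0", children: [{ key: "2/0/0", children: [] }] },
    { key: "1/1/0", children: [] },
  ],
};

const topDownKeys: string[] = [];
visitTiles(sketchRoot, (tile) => {
  topDownKeys.push(tile.key);
});
```

Collecting results in the callback, as `topDownKeys` does here, is also how a map over tiles can be layered on top of a visit.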

  • Invoke a function on all tiles in the dataset, downloading those that aren't here yet. The general architecture here is taken from the d3 quadtree functions. That's why, for example, it doesn't recurse.

    Parameters

    • callback: ((tile: ArrowTile) => Promise<void>)

      The function to invoke on each tile.

    • after: boolean = false

      Whether to execute the visit in bottom-up order. Default false.

    • starting_tile: ArrowTile = null
    • filter: ((t: ArrowTile) => boolean) = ...
    • updateFunction: ((tile: ArrowTile, completed: any, total: any) => Promise<void>)
        • (tile: ArrowTile, completed: any, total: any): Promise<void>
        • Parameters

          Returns Promise<void>

    Returns Promise<void>
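
The download-then-visit behaviour can be modelled in the same stack-based style. The sketch below is an assumption-laden illustration, not deepscatter's implementation: `LazyTile`, `makeLazyTile`, and `visitAllTiles` are hypothetical, the "download" only flips a flag, and the way `total` grows as children are discovered is a guess at how progress might be reported.

```typescript
// Stand-in for a tile that may not be downloaded yet; not deepscatter's ArrowTile.
interface LazyTile {
  key: string;
  downloaded: boolean;
  children: LazyTile[];
  download(): Promise<void>;
}

// Hypothetical factory: the fake download just marks the tile as present.
function makeLazyTile(key: string, children: LazyTile[] = []): LazyTile {
  const tile: LazyTile = {
    key,
    downloaded: false,
    children,
    async download() {
      tile.downloaded = true;
    },
  };
  return tile;
}

async function visitAllTiles(
  root: LazyTile,
  callback: (tile: LazyTile) => Promise<void>,
  updateFunction: (
    tile: LazyTile,
    completed: number,
    total: number,
  ) => Promise<void> = async () => {},
): Promise<void> {
  const stack: LazyTile[] = [root];
  let completed = 0;
  let total = 1; // tiles discovered so far
  while (stack.length > 0) {
    const tile = stack.pop() as LazyTile;
    // Fetch the tile first if it is not here yet, then run the callback.
    if (!tile.downloaded) await tile.download();
    await callback(tile);
    completed += 1;
    total += tile.children.length;
    await updateFunction(tile, completed, total);
    for (const child of [...tile.children].reverse()) stack.push(child);
  }
}

const lazyRoot = makeLazyTile("root", [makeLazyTile("left"), makeLazyTile("right")]);
```

Awaiting each tile in turn keeps the sketch easy to follow; a real implementation could reasonably fetch several tiles concurrently.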
