Add instrumentation for Nebulex, a distributed cache library. This
library provides solid telemetry support for this initial
implementation.
Caching implementation is mostly based on in-memory storage (like ETS)
and RPC calls for distribution (via OTP libraries, like :erpc). AFAICT,
there is not much specifics for how to translate into Semantic
Attributes: those caches are not quite a DB, except maybe for the one
which implements the storage; the RPC can't be reliably captured
either.
Given the above constraints, this initial implementation instruments the
library via custom attributes (namespaced as `nebulex.*`). It's not 100%
clear the behaviour of OTel for actual distributed caches - from my
tests, that may create some orphan spans. I think that's fine as first
release.
Nebulex follow the patterns of Ecto, so this instrumentation follows a
similar pattern of OpentelemetryEcto. It does include a `setup_all/1`
function for convenience, that leverages the :init events Nebulex emit
on process start.
Co-authored-by: Tristan Sloughter <t@crashfast.com>
First one is related to `OpenTelemetry.Ctx` API. I've noticed in a few
scenarios the current span of a trace may get lost after Ecto calls.
Looking at the The `attach/1` typespec, it's a Ctx -> Token, while
`dettach/1` as Token -> Ctx function. That made me assume the expected
input of dettach is the return type of attach. Indeed, after this change
we got the behavior of Ecto calls preserve the parent span untouched.
That leads to a second bug found. When ecto does simple calls within a
Task, due the special propagation code for preloads that means it will
skip the current span, if any. The solution here is to first check the
current process.
One test was added to reproduce this bug.
* Fix CI errors, update GHA deps, update versions
* output syntax
* remove OTP 22 tests
* set concurrency to cancel in progress
* whitespace
* incompatible vsns and failed test
* Try pulling excludes out
* Escaping
* Just drop < 1.13 until this can move to workflows
* quote
* fix remove extra bracket in mix.exs
* use capture log for less verbose test output
* add span_name opt for overriding span name
* add moduledoc
* allow function for span_name opt
Instead of custom attributes, leverage the status description as
described in Semantic Conventions. This approach is taken from current
`opentelemetry_ecto` implementation.
Small non-related change is a fix the license description in `mix.exs`.
Default exporter immediately attempts on start connect to
`:otel-collector` default port. As we don't have any collector running
on our test environment, this results in a few warnings. That's not an
issue in itself, as code immediately switches to another export, but
creates a lot of noise.
This patch moves the exporter setup to `config/test.exs`, essentially
removing the need to restart opentelemetry applicationn for each test
case. The only work setup blocks do is update the exporter's target pid.
The processor was changed to simple mode, available now, which also remove
another vector of (unlikely but theoretically possible) race-conditions.
* Handle binary resp_status from Cowboy and rename http.status to http.status_code
* Fix test to use http.status_code as well
* Handle resp_status could be undefined but not error
- This could be due to websocket upgrade request.
* Rename Status1 and transform_status to a more concise naming
* Add test case for handling binary response code
* Fix syntax and failing tests
* Always convert cowboy status to status code
* Set otel span status as error when status code >= 500
There is an edge case, if you use `forward/4` and use Plug.ErrorHandler,
then when an exception reaches the outer router, then Plug.send_resp
will be called, triggering `[:phoenix, :endpoint, :stop]`, and the span
will be gone by the time the outer router gets the exception. This
causes this telemetry handler to crash and be detached.
Sequence of events:
- [:phoenix, :endpoint, :start]
- [:phoenix, :router_dispatch, :exception] (inner router)
- [:phoenix, :endpoint, :stop]
- [:phoenix, :router_dispatch, :exception] (outer router) ** here there is no span, crashes
Initial approach follows Ecto instrumentation, recording spans for all
Redix `[:redix, :pipeline, :stop]` events.
The command sanitization is inspired-by and adapted from [Java
instrumentation][1], from where I've also copied the actual commands and
what configuration should they follow.
Network attributes are tracked via a "sidecar" process, which keeps
track of connection attributes also via `telemetry`. This extra bit of
bookkeeping is needed as command events doesn't include that piece of
information, unfortunately.
[1]: b2bc41453b/instrumentation-api/src/main/java/io/opentelemetry/instrumentation/api/db/RedisCommandSanitizer.java
By default a new trace is automatically started when a job is processed
by monitoring these events:
* `[:oban, :job, :start]` — at the point a job is fetched from the database and will execute
* `[:oban, :job, :stop]` — after a job succeeds and the success is recorded in the database
* `[:oban, :job, :exception]` — after a job fails and the failure is recorded in the database
To also record a span when a job is created and to link traces together
`Oban.insert/2` has to be replaced by `OpentelemetryOban.insert/2`.
Before:
```elixir
%{id: 1, in_the: "business", of_doing: "business"}
|> MyApp.Business.new()
|> Oban.insert()
```
After:
```elixir
%{id: 1, in_the: "business", of_doing: "business"}
|> MyApp.Business.new()
|> OpentelemetryOban.insert()
```
Co-authored-by: Tristan Sloughter <t@crashfast.com>
* Set error status on error instead of just adding attribute
* Use Exception.message if error is an exception
Co-authored-by: Bryan Naegele <bryannaegele@users.noreply.github.com>
As it stands, when source is nil (which, for example, can happen if there is a
call to `Repo.transaction`), the name of the span ends in an odd ':'. This
removes that ':'.
Co-authored-by: Bryan Naegele <bryannaegele@users.noreply.github.com>
* Update otel-phoenix deps
* CI not triggering
* Remove deprecated plug conn property
* Retry shared matrix with 1.11 include fix
* Try reading the file again
* Still can't get file reading right
* Update source links