Jennifer Chue
08/27/2025, 8:46 AM
How is the "Previous period" comparison derived? Right now, my assumption is that the number of days between the two selected dates is calculated, and the previous period is then derived by subtracting that number of days from both selected dates.
For example, for a static period from "1/27/2024 00:00" to "8/27/2024 23:59", there are 214 days in between, so the previous period would run from "6/27/2023 00:00" to "1/26/2024 23:59". However, the numbers seem to be slightly off; would appreciate any help to correct the logic used 🙏
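My assumption above, sketched in Python (this is only my guess at the logic, not GD's actual implementation):

```python
from datetime import datetime, timedelta

# Selected static period
start = datetime(2024, 1, 27, 0, 0)
end = datetime(2024, 8, 27, 23, 59)

# Inclusive day count of the selected range
num_days = (end.date() - start.date()).days + 1  # 214

# Shift both endpoints back by that many days to get the previous period
prev_start = start - timedelta(days=num_days)  # 2023-06-27 00:00
prev_end = end - timedelta(days=num_days)      # 2024-01-26 23:59
```

If GD instead subtracted the exclusive difference (213 days), the previous period would shift by one day, which could explain numbers that look "slightly off".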
cc @Alson Yap
Ismail Karafakioglu
08/27/2025, 9:31 AM
Jennifer Chue
08/27/2025, 10:14 AM
The "Previous period" column is not consistent...? Sometimes it shows .88217 and other times .88214, for example.
Jennifer Chue
08/27/2025, 10:15 AM
Joseph Heun
08/27/2025, 2:59 PM
Jennifer Chue
08/29/2025, 9:24 AM
Apart from the "Previous period" discussion, I'd like to better understand how GD is executing these calculations. For some context, we have two workspaces: internal-mtdev (which returns unfiltered data, i.e., all records, leaving the filtering to GD) and internal-test (which filters the data by date before returning it to GD).
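As a rough sketch of the intended difference between the two workspaces (the function name, record shape, and field names here are hypothetical, not the actual service code):

```python
from datetime import datetime

def fetch_records(records, start=None, end=None):
    # internal-mtdev behaviour: no bounds given, return everything
    # and let GD do the filtering.
    if start is None or end is None:
        return list(records)
    # internal-test behaviour: filter by activityDate (inclusive
    # bounds) before the data is handed to GD.
    return [r for r in records if start <= r["activityDate"] <= end]

records = [
    {"activityDate": datetime(2024, 7, 1), "emissionVolume": 1.0},
    {"activityDate": datetime(2025, 1, 1), "emissionVolume": 2.0},
]
filtered = fetch_records(records,
                         start=datetime(2024, 7, 1),
                         end=datetime(2024, 12, 31, 23, 59))
# Only the first record falls inside the period.
```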
However, the numbers shown in both workspaces do not correspond to the actual sum in MongoDB (nor, strangely, to each other). For example, for the period from "7/1/2024 00:00" to "12/31/2024 23:59", the sum should be 2717147324996.3745. This value remains unchanged throughout all conversion and processing steps before it is returned to GD (per the logs from internal-test). I've also created dashboards for internal-mtdev and internal-test for testing.
Would really appreciate any help or insight into what might be causing this discrepancy, thank you! 🙏
Mauricio Cabezas
08/29/2025, 2:25 PM
The sums for Activity_Emission_Volume_CO2e are slightly different. We found that in DEV there are multiple datasets with the same fact ID, while TEST only has one, so we forced the use of the corresponding dataset with USING EmissionActivities by creating a metric of the form SELECT SUM(fact) USING dataset, but we obtained the same results.
To troubleshoot further, could you help with a few things?
1. Try running the same aggregation over a small range (5–10 days) and, if possible, share the raw MongoDB values for that period.
2. Share the exact MongoDB query you’re using for the sum.
3. You mentioned TEST filters by date before sending to GD; could you clarify how that filtering is applied? I ask because, if I use ALL dates in the filter, I would expect the same period of time (07.01.2024 to 12.31.2024), but that is not the case.
4. Have you seen similar discrepancies on other facts, or only on Activity_Emission_Volume_CO2e?
5. For DEV, please double-check that the MongoDB query is pulling from the same dataset (EmissionActivities), since multiple datasets have the same fact ID there.
This should help us pinpoint if the difference starts in MongoDB, FluxConnect, or within GD itself.
Thank you for your cooperation and patience.
Jennifer Chue
09/01/2025, 6:34 AM
1. I ran the aggregation over a single day (12/27/2024)
2. MongoDB query:
[
{
$match: {
activityDate: {
$gte: ISODate("2024-12-27T00:00:00Z"),
$lte: ISODate("2024-12-27T23:59:00Z")
}
}
},
{
$group: {
_id: null,
totalEmissionVolume: {
$sum: "$emissionVolume"
}
}
}
]
3. The query object below is passed to the find_arrow_all method via the query parameter to apply the filtering:
{'activityDate': {'$gte': datetime.datetime(2024, 12, 27, 0, 0), '$lte': datetime.datetime(2024, 12, 27, 23, 59)}}
4. Yes, I've tested with other facts and the decimal places are slightly off for them too.
5. I've double-checked, and the MongoDB query is indeed pulling from EmissionActivities.
Here are the corresponding numbers, respectively; I've also edited the dashboards linked previously to reflect this date:
Dev: 12,257,277,645.4401245
Test: 12,257,277,645.4401646
MongoDB: 12257277645.440123
Julius Kos
09/01/2025, 11:55 AM
What is the data type of the fact in EmissionActivities?
Jennifer Chue
09/02/2025, 7:15 AM
The Arrow type is pyarrow.float64(), and the datatype in MongoDB is Int32 or Double. In the LDM, the source type is Numeric.
Jennifer Chue
09/02/2025, 8:10 AM
Francisco Antunes
09/02/2025, 8:59 AM
Francisco Antunes
09/02/2025, 12:58 PM
The values are stored using MongoDB's Double data type (which is a type of float).
Floating-point representation isn't able to exactly represent all decimal numbers, and when aggregating values with many decimal digits (as in this case), there is some loss of precision. Aggregated across thousands of values, this can add up.
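A quick stdlib illustration of both effects, unrelated to MongoDB or GD specifically:

```python
import math

# Float addition is not associative: grouping changes the last bits.
a = (0.1 + 0.2) + 0.3  # 0.6000000000000001
b = 0.1 + (0.2 + 0.3)  # 0.6
print(a == b)          # False

# Error also accumulates over many terms; math.fsum tracks exact
# partial sums and avoids the drift.
vals = [0.1] * 10
print(sum(vals))        # 0.9999999999999999
print(math.fsum(vals))  # 1.0
```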
I ran some tests with the raw data you sent me, and running a normal SUM over the values (I used awk) returns the exact value we can see in the mtdev workspace, which confirms that float imprecision is the problem. I even tried running more precise calculations and got results much closer to what you got in MongoDB.
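For a sense of scale, assuming 64-bit floats: near 12.26 billion, the spacing between adjacent representable values is about 1.9e-06, so sums that agree up to the 4th decimal digit but diverge after it (like the DEV/TEST/MongoDB numbers reported earlier) differ by only a handful of representable values:

```python
import math

dev = 12257277645.4401245
test = 12257277645.4401646
mongo = 12257277645.440123

ulp = math.ulp(mongo)      # gap between adjacent doubles at this magnitude
print(ulp)                 # 1.9073486328125e-06 (2**-19)
print((test - dev) / ulp)  # the DEV/TEST gap is only ~21 float steps
```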
This subject is explained much better in this MongoDB article: Quick Start: BSON Data Types - Decimal128. It also proposes a solution for these very precise values: casting them to the decimal128 data type instead. This will ensure the data isn't processed as floats (MongoDB's default for this kind of number with decimal digits) and should help resolve the imprecision problem.
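A stdlib-only sketch of the difference exact decimal arithmetic makes (in PyMongo, Decimal128 values convert to Python's decimal module via Decimal128.to_decimal(); the sample values below are made up for illustration):

```python
from decimal import Decimal

vals = ["12257277645.440123", "0.000001", "-12257277645.440123"]

# As binary floats, the tiny term is rounded onto the big value's grid:
f = sum(float(v) for v in vals)
print(f)  # one float spacing (~1.9e-06), not the exact 1e-06

# As decimals, the arithmetic is exact:
d = sum(Decimal(v) for v in vals)
print(d)  # 0.000001
```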
Incidentally, I believe it might also resolve the different values across workspaces: even differences in the order of the sums, when performing floating-point arithmetic, can lead to (very) small differences in results, such as you noticed beyond the 4th decimal digit.
Jennifer Chue
09/05/2025, 2:13 AM
Francisco Antunes
09/05/2025, 9:18 AM