Submitted By: Ken Moffat Date: 2021-11-26 Initial Package Version: From git 5.15 branch. Upstream Status: The 5.15.7 changes. Origin: upstream diff -Naur a/CHROMIUM_VERSION b/CHROMIUM_VERSION --- a/CHROMIUM_VERSION 1970-01-01 01:00:00.000000000 +0100 +++ b/CHROMIUM_VERSION 2021-11-20 03:33:14.168428455 +0000 @@ -0,0 +1,2 @@ +Based on Chromium version: 87.0.4280.144 +Patched with security patches up to Chromium version: 94.0.4606.61 diff -Naur a/CVE-fixes b/CVE-fixes --- a/CVE-fixes 2021-08-25 16:32:13.000000000 +0100 +++ b/CVE-fixes 2021-11-20 04:05:13.820919579 +0000 @@ -1,12 +1,35 @@ Cumulative CVE fixes since 5.15.2 (the last non paid-for release) -Fixed in this 5.15.6 snapshot: +Fixed in this 5.15.7 patch: -CVE-2021-30604: Use after free in ANGLE Not yet public -CVE-2021-30603: Race in WebAudio Not yet public -CVE-2021-30602: Use after free in WebRTC Not yet public -CVE-2021-30599: Type Confusion in V8 Not yet public -CVE-2021-30598: Type Confusion in V8 Not yet public +CVE-2021-37980 : Inappropriate implementation in Sandbox High +CVE-2021-37979 : Heap buffer overflow in WebRTC High +CVE-2021-37978 : Heap buffer overflow in Blink High +CVE-2021-37975 : Use after free in V8 High +CVE-2021-37973 : Use after free in Portals Critical +CVE-2021-37972 : Out of bounds read in libjpeg-turbo High +CVE-2021-37971 : Incorrect security UI in Web Browser UI. 
Medium +CVE-2021-37968 : Inappropriate implementation in Background Fetch API Medium +CVE-2021-37967 : Inappropriate implementation in Background Fetch API Medium +CVE-2021-37962 : Use after free in Performance Manager High +CVE-2021-30633: Use after free in Indexed DB API Critical +CVE-2021-30630: Inappropriate implementation in Blink Medium +CVE-2021-30629: Use after free in Permissions High +CVE-2021-30628: Stack buffer overflow in ANGLE High +CVE-2021-30627: Type Confusion in Blink layout High +CVE-2021-30626: Out of bounds memory access in ANGLE High +CVE-2021-30625: Use after free in Selection API High +CVE-2021-30618: Inappropriate implementation in DevTools High +CVE-2021-30616: Use after free in Media. High +CVE-2021-30613: Use after free in Base internals High + +Previously fixed in the 5.15.6 snapshot: + +CVE-2021-30604: Use after free in ANGLE High +CVE-2021-30603: Race in WebAudio High +CVE-2021-30602: Use after free in WebRTC High +CVE-2021-30599: Type Confusion in V8 High +CVE-2021-30598: Type Confusion in V8 High CVE-2021-30588: Type Confusion in V8 High CVE-2021-30587: Inappropriate implementation in Compositing Medium CVE-2021-30585: Use after free in sensor handling High @@ -42,7 +65,7 @@ CVE-2021-30510: Race in Aura CVE-2021-30508: Heap buffer overflow in Media Feeds -Previously fixed in qtwebengien-20210401-upstream_fixes-1.patch +Previously fixed in qtwebengine-20210401-upstream_fixes-1.patch CVE-2021-21233-Heap-buffer-overflow-in-ANGL.patch CVE-2021-21231-Insufficient-data-validation.patch diff -Naur a/examples/webenginewidgets/markdowneditor/mainwindow.cpp b/examples/webenginewidgets/markdowneditor/mainwindow.cpp --- a/examples/webenginewidgets/markdowneditor/mainwindow.cpp 2021-08-24 13:35:32.000000000 +0100 +++ b/examples/webenginewidgets/markdowneditor/mainwindow.cpp 2021-11-20 03:27:27.961255186 +0000 @@ -170,7 +170,7 @@ void MainWindow::onFileSaveAs() { QString path = QFileDialog::getSaveFileName(this, - tr("Save MarkDown File"), "", 
tr("MarkDown File (*.md, *.markdown)")); + tr("Save MarkDown File"), "", tr("MarkDown File (*.md *.markdown)")); if (path.isEmpty()) return; m_filePath = path; diff -Naur a/GIT-VERSIONS b/GIT-VERSIONS --- a/GIT-VERSIONS 2021-08-25 16:32:49.000000000 +0100 +++ b/GIT-VERSIONS 2021-11-20 04:10:05.796278465 +0000 @@ -1,44 +1,33 @@ -qtwebengine-5.15.6 is at +This patch updates BLFS's qtwebengine-5.15.6 to 5.15.7. +It is taken from the 5.15 branch after the 5.15.7 changes were merged back. -commit 2acbba86362ac3a1c2d8c20390dc263875f8f09c (HEAD -> 5.15.6, origin/5.15.6) -Author: Michael Brüning -Date: Tue Aug 24 14:21:04 2021 +0200 +qtwebengine is at + +commit 604f42c37b36a4674f953665f84872e4d83e0316 (HEAD -> 5.15) +Author: Allan Sandfeld Jensen +Date: Tue Oct 19 14:46:32 2021 +0200 Update Chromium - Submodule src/3rdparty c8087cb6..9f71911e: - - > [Backport] CVE-2021-30560: Use after free in Blink XSLT - - Task-number: QTBUG-94103 - Change-Id: I3e43653b6b3370d71b09b52a781a3b1d6c82293e - Reviewed-by: Allan Sandfeld Jensen + Submodule src/3rdparty 9f71911e3..8c0a9b445: + > Revert "[Backport] Security bug 1239116" + [...] src/3rdparty/chromium is at -commit 9f71911e38c041cedc5291c5e772b7d03ce8b8c8 (HEAD -> 87-based, origin/87-based) -Author: Roger Zanoni -Date: Wed Jul 28 09:34:36 2021 +0000 +commit 8c0a9b4459f5200a24ab9e687a3fb32e975382e5 (HEAD -> 87-based) +Author: Allan Sandfeld Jensen +Date: Tue Oct 19 17:35:51 2021 +0000 - [Backport] CVE-2021-30560: Use after free in Blink XSLT + Revert "[Backport] Security bug 1239116" - Cherry-pick of patch originally reviewed on - https://chromium-review.googlesource.com/c/chromium/src/+/3042731: - Fix use-after-free with XSLT strip-space + This reverts commit adcb7c9d94a507286e02c6c2005498011eb5a554.
- (cherry picked from commit 79fc7bcbc940a66f4edfd2c49a5e63106074836a) - - Fixed: 1219209 - Change-Id: I3baab9d1b419407d964a80f10c6ca05e0294554f - Commit-Queue: Joey Arhar - Cr-Original-Commit-Position: refs/heads/master@{#892861} - Reviewed-by: Jana Grill - Owners-Override: Jana Grill - Commit-Queue: Roger Felipe Zanoni da Silva - Cr-Commit-Position: refs/branch-heads/4430@{#1545} - Cr-Branched-From: e5ce7dc4f7518237b3d9bb93cccca35d25216cbe-refs/heads/master@{#857950} - Reviewed-by: Allan Sandfeld Jensen + Reason for revert: Doesn't compile on its own + For details of all the backported CVE fixes since qtwebengine-5.15.2, see CVE-fixes. +See CHROMIUM_VERSION for details of the Chromium patches backported. + diff -Naur a/.qmake.conf b/.qmake.conf --- a/.qmake.conf 2021-08-24 13:35:32.000000000 +0100 +++ b/.qmake.conf 2021-11-20 03:26:34.578137143 +0000 @@ -5,4 +5,4 @@ load(qt_build_config) CONFIG += warning_clean -MODULE_VERSION = 5.15.6 +MODULE_VERSION = 5.15.7 diff -Naur a/src/3rdparty/chromium/AUTHORS b/src/3rdparty/chromium/AUTHORS --- a/src/3rdparty/chromium/AUTHORS 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/AUTHORS 2021-11-20 03:43:06.618124013 +0000 @@ -33,6 +33,7 @@ Adenilson Cavalcanti Aditya Bhargava Adrian Belgun +Adrian Ratiu Ahmet Emir Ercin Ajay Berwal Ajay Berwal diff -Naur a/src/3rdparty/chromium/base/threading/scoped_blocking_call_internal.cc b/src/3rdparty/chromium/base/threading/scoped_blocking_call_internal.cc --- a/src/3rdparty/chromium/base/threading/scoped_blocking_call_internal.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/base/threading/scoped_blocking_call_internal.cc 2021-11-20 03:35:18.840494500 +0000 @@ -56,7 +56,20 @@ IOJankMonitoringWindow::ScopedMonitoredCall::ScopedMonitoredCall() : call_start_(TimeTicks::Now()), - assigned_jank_window_(MonitorNextJankWindowIfNecessary(call_start_)) {} + assigned_jank_window_(MonitorNextJankWindowIfNecessary(call_start_)) { + if (assigned_jank_window_) { + //
TimeTicks using a monotonic clock and MonitorNextJankWindowIfNecessary + // synchronizing via a lock to return |assigned_jank_window_| was initially + // believed to guarantee that |call_start_| is either equal or beyond + // |assigned_jank_window_->start_time_|. Violating this assumption can + // result in negative indexing and OOB-writes in AddJank(). + // We now know this assumption can be violated. This condition hotfixes + // the issue by discarding ScopedMonitoredCalls where it occurs. + // TODO(crbug.com/1209622): Implement a proper fix. + if (call_start_ < assigned_jank_window_->start_time_) + assigned_jank_window_.reset(); + } +} IOJankMonitoringWindow::ScopedMonitoredCall::~ScopedMonitoredCall() { if (assigned_jank_window_) { @@ -187,6 +200,11 @@ void IOJankMonitoringWindow::OnBlockingCallCompleted(TimeTicks call_start, TimeTicks call_end) { + // Confirm we never hit a case of TimeTicks going backwards on the same thread + // nor of TimeTicks rolling over the int64_t boundary (which would break + // comparison operators). + DCHECK_LE(call_start, call_end); + if (call_end - call_start < kIOJankInterval) return; @@ -210,6 +228,9 @@ void IOJankMonitoringWindow::AddJank(int local_jank_start_index, int num_janky_intervals) { + DCHECK_GE(local_jank_start_index, 0); + DCHECK_LT(local_jank_start_index, kNumIntervals); + // Increment jank counts for intervals in this window. If // |num_janky_intervals| lands beyond kNumIntervals, the additional intervals // will be reported to |next_|. 
diff -Naur a/src/3rdparty/chromium/base/threading/scoped_blocking_call_internal.h b/src/3rdparty/chromium/base/threading/scoped_blocking_call_internal.h --- a/src/3rdparty/chromium/base/threading/scoped_blocking_call_internal.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/base/threading/scoped_blocking_call_internal.h 2021-11-20 03:35:18.840494500 +0000 @@ -93,6 +93,12 @@ static constexpr TimeDelta kTimeDiscrepancyTimeout = TimeDelta::FromMicroseconds(10 * 1000 * 1000 * 1000LL); static constexpr int kNumIntervals = kMonitoringWindow / kIOJankInterval; + // kIOJankIntervals must integrally fill kMonitoringWindow + static_assert((kMonitoringWindow % kIOJankInterval).is_zero(), ""); + + // Cancelation is simple because it can only affect the current window. + static_assert(kTimeDiscrepancyTimeout < kMonitoringWindow, ""); + private: friend class base::RefCountedThreadSafe<IOJankMonitoringWindow>; friend void base::EnableIOJankMonitoringForProcess(IOJankReportingCallback); diff -Naur a/src/3rdparty/chromium/build/config/win/BUILD.gn b/src/3rdparty/chromium/build/config/win/BUILD.gn --- a/src/3rdparty/chromium/build/config/win/BUILD.gn 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/build/config/win/BUILD.gn 2021-11-20 03:41:21.545790330 +0000 @@ -42,6 +42,8 @@ # and with this switch, clang emits it like this: # foo/bar.cc:12:34: error: something went wrong use_clang_diagnostics_format = false + + qt_uses_static_runtime = false } # This is included by reference in the //build/config/compiler config that @@ -469,7 +471,12 @@ # Component mode: dynamic CRT. Since the library is shared, it requires # exceptions or will give errors about things not matching, so keep # exceptions on.
- configs = [ ":dynamic_crt" ] + if (qt_uses_static_runtime) { + # we always do is_shared, however qt can link final lib as static, with static runtime + configs = [ ":static_crt" ] + } else { + configs = [ ":dynamic_crt" ] + } } else { if (current_os == "winuwp") { # https://blogs.msdn.microsoft.com/vcblog/2014/06/10/the-great-c-runtime-crt-refactoring/ diff -Naur a/src/3rdparty/chromium/chrome/browser/ui/webui/discards/discards_ui.h b/src/3rdparty/chromium/chrome/browser/ui/webui/discards/discards_ui.h --- a/src/3rdparty/chromium/chrome/browser/ui/webui/discards/discards_ui.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/chrome/browser/ui/webui/discards/discards_ui.h 2021-11-20 03:39:18.065771968 +0000 @@ -37,7 +37,6 @@ private: std::unique_ptr ui_handler_; - std::unique_ptr site_data_provider_; std::string profile_id_; WEB_UI_CONTROLLER_TYPE_DECL(); diff -Naur a/src/3rdparty/chromium/components/content_settings/core/browser/content_settings_provider.h b/src/3rdparty/chromium/components/content_settings/core/browser/content_settings_provider.h --- a/src/3rdparty/chromium/components/content_settings/core/browser/content_settings_provider.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/components/content_settings/core/browser/content_settings_provider.h 2021-11-20 03:37:36.802356986 +0000 @@ -23,7 +23,7 @@ class ProviderInterface { public: - virtual ~ProviderInterface() {} + virtual ~ProviderInterface() = default; // Returns a |RuleIterator| over the content setting rules stored by this // provider. If |incognito| is true, the iterator returns only the content @@ -33,6 +33,9 @@ // (including |GetRuleIterator|) for the same provider until the // |RuleIterator| is destroyed. // Returns nullptr to indicate the RuleIterator is empty. + // + // This method needs to be thread-safe and continue to work after + // |ShutdownOnUIThread| has been called. 
virtual std::unique_ptr<RuleIterator> GetRuleIterator( ContentSettingsType content_type, const ResourceIdentifier& resource_identifier, diff -Naur a/src/3rdparty/chromium/components/paint_preview/common/subset_font.cc b/src/3rdparty/chromium/components/paint_preview/common/subset_font.cc --- a/src/3rdparty/chromium/components/paint_preview/common/subset_font.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/components/paint_preview/common/subset_font.cc 2021-11-20 03:42:30.009703768 +0000 @@ -59,6 +59,41 @@ } // namespace +template <class T, void (*P)(T*)> using resource = + std::unique_ptr<T, SkFunctionWrapper<std::remove_pointer_t<decltype(P)>, P>>; +using HBBlob = resource<hb_blob_t, &hb_blob_destroy>; +using HBFace = resource<hb_face_t, &hb_face_destroy>; +using HBSubsetInput = resource<hb_subset_input_t, &hb_subset_input_destroy>; +using HBSet = resource<hb_set_t, &hb_set_destroy>; + +template <typename...> using void_t = void; +template <typename T, typename = void> +struct SkPDFHarfBuzzSubset { + // This is the HarfBuzz 3.0 interface. + // hb_subset_flags_t does not exist in 2.0. It isn't dependent on T, so inline the value of + // HB_SUBSET_FLAGS_RETAIN_GIDS until 2.0 is no longer supported. + static HBFace Make(T input, hb_face_t* face) { + // TODO: When possible, check if a font is 'tricky' with FT_IS_TRICKY. + // If it isn't known if a font is 'tricky', retain the hints. + hb_subset_input_set_flags(input, 2/*HB_SUBSET_FLAGS_RETAIN_GIDS*/); + return HBFace(hb_subset_or_fail(face, input)); + } +}; +template <typename T> +struct SkPDFHarfBuzzSubset<T, void_t< + decltype(hb_subset_input_set_retain_gids(std::declval<T>(), std::declval<bool>())), + decltype(hb_subset_input_set_drop_hints(std::declval<T>(), std::declval<bool>())), + decltype(hb_subset(std::declval<hb_face_t*>(), std::declval<T>())) + >> +{ + // This is the HarfBuzz 2.0 (non-public) interface, used if it exists. + // This code should be removed as soon as all users are migrated to the newer API. + static HBFace Make(T input, hb_face_t* face) { + hb_subset_input_set_retain_gids(input, true); + return HBFace(hb_subset(face, input)); + } +}; + // Implementation based on SkPDFSubsetFont() using harfbuzz.
sk_sp<SkData> SubsetFont(SkTypeface* typeface, const GlyphUsage& usage) { int ttc_index = 0; @@ -71,10 +106,12 @@ hb_set_t* glyphs = hb_subset_input_glyph_set(input.get()); // Owned by |input|. usage.ForEach(base::BindRepeating(&AddGlyphs, base::Unretained(glyphs))); - hb_subset_input_set_retain_gids(input.get(), true); + HBFace subset = SkPDFHarfBuzzSubset<hb_subset_input_t*>::Make(input.get(), face.get()); + if (!subset) { + return nullptr; + } - HbScoped<hb_face_t> subset_face(hb_subset(face.get(), input.get())); - HbScoped<hb_blob_t> subset_blob(hb_face_reference_blob(subset_face.get())); + HbScoped<hb_blob_t> subset_blob(hb_face_reference_blob(subset.get())); if (!subset_blob) return nullptr; diff -Naur a/src/3rdparty/chromium/components/performance_manager/persistence/site_data/site_data_cache_impl.cc b/src/3rdparty/chromium/components/performance_manager/persistence/site_data/site_data_cache_impl.cc --- a/src/3rdparty/chromium/components/performance_manager/persistence/site_data/site_data_cache_impl.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/components/performance_manager/persistence/site_data/site_data_cache_impl.cc 2021-11-20 03:39:28.865600459 +0000 @@ -122,8 +122,8 @@ return iter->second; // If not create a new one and add it to the map.
- internal::SiteDataImpl* site_data = - new internal::SiteDataImpl(origin, this, data_store_.get()); + internal::SiteDataImpl* site_data = new internal::SiteDataImpl( + origin, weak_factory_.GetWeakPtr(), data_store_.get()); // internal::SiteDataImpl is a ref-counted object, it's safe to store a raw // pointer to it here as this class will get notified when it's about to be diff -Naur a/src/3rdparty/chromium/components/performance_manager/persistence/site_data/site_data_cache_impl.h b/src/3rdparty/chromium/components/performance_manager/persistence/site_data/site_data_cache_impl.h --- a/src/3rdparty/chromium/components/performance_manager/persistence/site_data/site_data_cache_impl.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/components/performance_manager/persistence/site_data/site_data_cache_impl.h 2021-11-20 03:39:28.865600459 +0000 @@ -15,6 +15,7 @@ #include "base/files/file_path.h" #include "base/gtest_prod_util.h" #include "base/macros.h" +#include "base/memory/weak_ptr.h" #include "base/scoped_observer.h" #include "base/sequence_checker.h" #include "components/performance_manager/persistence/site_data/site_data_cache.h" @@ -100,6 +101,8 @@ SEQUENCE_CHECKER(sequence_checker_); + base::WeakPtrFactory<SiteDataCacheImpl> weak_factory_{this}; + DISALLOW_COPY_AND_ASSIGN(SiteDataCacheImpl); }; diff -Naur a/src/3rdparty/chromium/components/performance_manager/persistence/site_data/site_data_impl.cc b/src/3rdparty/chromium/components/performance_manager/persistence/site_data/site_data_impl.cc --- a/src/3rdparty/chromium/components/performance_manager/persistence/site_data/site_data_impl.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/components/performance_manager/persistence/site_data/site_data_impl.cc 2021-11-20 03:39:28.865600459 +0000 @@ -159,7 +159,7 @@ } SiteDataImpl::SiteDataImpl(const url::Origin& origin, - OnDestroyDelegate* delegate, + base::WeakPtr<OnDestroyDelegate> delegate, SiteDataStore* data_store) : load_duration_(kSampleWeightFactor),
cpu_usage_estimate_(kSampleWeightFactor), @@ -188,13 +188,19 @@ DCHECK(!IsLoaded()); DCHECK_EQ(0U, loaded_tabs_in_background_count_); - DCHECK(delegate_); - delegate_->OnSiteDataImplDestroyed(this); - - // TODO(sebmarchand): Some data might be lost here if the read operation has - // not completed, add some metrics to measure if this is really an issue. - if (is_dirty_ && fully_initialized_) - data_store_->WriteSiteDataIntoStore(origin_, FlushStateToProto()); + // Make sure not to dispatch a notification to a deleted delegate, and gate + // the DB write on it too, as the delegate and the data store have the + // same lifetime. + // TODO(https://crbug.com/1231933): Fix this properly and restore the end of + // life write here. + if (delegate_) { + delegate_->OnSiteDataImplDestroyed(this); + + // TODO(sebmarchand): Some data might be lost here if the read operation has + // not completed, add some metrics to measure if this is really an issue. + if (is_dirty_ && fully_initialized_) + data_store_->WriteSiteDataIntoStore(origin_, FlushStateToProto()); + } } base::TimeDelta SiteDataImpl::FeatureObservationDuration( diff -Naur a/src/3rdparty/chromium/components/performance_manager/persistence/site_data/site_data_impl.h b/src/3rdparty/chromium/components/performance_manager/persistence/site_data/site_data_impl.h --- a/src/3rdparty/chromium/components/performance_manager/persistence/site_data/site_data_impl.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/components/performance_manager/persistence/site_data/site_data_impl.h 2021-11-20 03:39:28.865600459 +0000 @@ -159,7 +159,7 @@ friend class performance_manager::MockDataCache; SiteDataImpl(const url::Origin& origin, - OnDestroyDelegate* delegate, + base::WeakPtr<OnDestroyDelegate> delegate, SiteDataStore* data_store); virtual ~SiteDataImpl(); @@ -263,7 +263,13 @@ // The delegate that should get notified when this object is about to get // destroyed, it should outlive this object.
- OnDestroyDelegate* const delegate_; + // The use of WeakPtr here is a temporary, minimally invasive fix for the UAF + // reported in https://crbug.com/1231933. By using a WeakPtr, the call-out + // is avoided in the case where the OnDestroyDelegate has been deleted before + // all SiteDataImpls have been released. + // The proper fix for this is going to be more invasive and less suitable + // for merging, should it come to that. + base::WeakPtr<OnDestroyDelegate> const delegate_; // Indicates if this object has been fully initialized, either because the // read operation from the database has completed or because it has been diff -Naur a/src/3rdparty/chromium/components/permissions/permission_request_manager.cc b/src/3rdparty/chromium/components/permissions/permission_request_manager.cc --- a/src/3rdparty/chromium/components/permissions/permission_request_manager.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/components/permissions/permission_request_manager.cc 2021-11-20 03:36:49.874084070 +0000 @@ -486,12 +486,13 @@ } void PermissionRequestManager::ShowBubble() { - // There is a race condition where the request might have been removed already - // so double-checking that there is a request in progress (crbug.com/1041222). + // There is a race condition where the request might have been removed + // already so double-checking that there is a request in progress. + // + // There is no need to show a new bubble if the previous one still exists.
+ if (!IsRequestInProgress() || view_) return; - DCHECK(!view_); DCHECK(web_contents()->IsDocumentOnLoadCompletedInMainFrame()); DCHECK(current_request_ui_to_use_); diff -Naur a/src/3rdparty/chromium/components/renderer_context_menu/render_view_context_menu_base.cc b/src/3rdparty/chromium/components/renderer_context_menu/render_view_context_menu_base.cc --- a/src/3rdparty/chromium/components/renderer_context_menu/render_view_context_menu_base.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/components/renderer_context_menu/render_view_context_menu_base.cc 2021-11-20 03:38:56.130118035 +0000 @@ -432,10 +432,13 @@ ui::PageTransition transition, const std::string& extra_headers, bool started_from_context_menu) { + // Do not send the referrer url to OTR windows. We still need the + // |referring_url| to populate the |initiator_origin| below for browser UI. + GURL referrer_url; + if (disposition != WindowOpenDisposition::OFF_THE_RECORD) + referrer_url = referring_url.GetAsReferrer(); content::Referrer referrer = content::Referrer::SanitizeForRequest( - url, - content::Referrer(referring_url.GetAsReferrer(), - params_.referrer_policy)); + url, content::Referrer(referrer_url, params_.referrer_policy)); if (params_.link_url == url && disposition != WindowOpenDisposition::OFF_THE_RECORD) diff -Naur a/src/3rdparty/chromium/content/browser/background_fetch/background_fetch_job_controller.cc b/src/3rdparty/chromium/content/browser/background_fetch/background_fetch_job_controller.cc --- a/src/3rdparty/chromium/content/browser/background_fetch/background_fetch_job_controller.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/content/browser/background_fetch/background_fetch_job_controller.cc 2021-11-20 03:38:45.242289807 +0000 @@ -173,6 +173,8 @@ // TODO(crbug.com/884672): Stop the fetch if the cross origin filter fails. 
BackgroundFetchCrossOriginFilter filter(registration_id_.origin(), *request); request->set_can_populate_body(filter.CanPopulateBody()); + if (!request->can_populate_body()) + has_failed_cors_request_ = true; } void BackgroundFetchJobController::DidUpdateRequest(const std::string& guid, @@ -253,7 +255,14 @@ void BackgroundFetchJobController::AbortFromDelegate( BackgroundFetchFailureReason failure_reason) { - failure_reason_ = failure_reason; + if (failure_reason == BackgroundFetchFailureReason::DOWNLOAD_TOTAL_EXCEEDED && + has_failed_cors_request_) { + // Don't expose that the download total has been exceeded. Use a less + // specific error. + failure_reason_ = BackgroundFetchFailureReason::FETCH_ERROR; + } else { + failure_reason_ = failure_reason; + } Finish(failure_reason_, base::DoNothing()); } diff -Naur a/src/3rdparty/chromium/content/browser/background_fetch/background_fetch_job_controller.h b/src/3rdparty/chromium/content/browser/background_fetch/background_fetch_job_controller.h --- a/src/3rdparty/chromium/content/browser/background_fetch/background_fetch_job_controller.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/content/browser/background_fetch/background_fetch_job_controller.h 2021-11-20 03:38:45.242289807 +0000 @@ -210,6 +210,10 @@ blink::mojom::BackgroundFetchFailureReason failure_reason_ = blink::mojom::BackgroundFetchFailureReason::NONE; + // Whether one of the requests handled by the controller failed + // the CORS checks and should not have its response exposed. + bool has_failed_cors_request_ = false; + // Custom callback that runs after the controller is finished. 
FinishedCallback finished_callback_; diff -Naur a/src/3rdparty/chromium/content/browser/background_fetch/storage/get_developer_ids_task.cc b/src/3rdparty/chromium/content/browser/background_fetch/storage/get_developer_ids_task.cc --- a/src/3rdparty/chromium/content/browser/background_fetch/storage/get_developer_ids_task.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/content/browser/background_fetch/storage/get_developer_ids_task.cc 2021-11-20 03:38:31.033513971 +0000 @@ -9,6 +9,7 @@ #include "base/bind.h" #include "content/browser/background_fetch/storage/database_helpers.h" #include "content/browser/service_worker/service_worker_context_wrapper.h" +#include "content/browser/service_worker/service_worker_registration.h" namespace content { namespace background_fetch { @@ -26,6 +27,28 @@ GetDeveloperIdsTask::~GetDeveloperIdsTask() = default; void GetDeveloperIdsTask::Start() { + service_worker_context()->FindReadyRegistrationForIdOnly( + service_worker_registration_id_, + base::BindOnce(&GetDeveloperIdsTask::DidGetServiceWorkerRegistration, + weak_factory_.GetWeakPtr())); +} + +void GetDeveloperIdsTask::DidGetServiceWorkerRegistration( + blink::ServiceWorkerStatusCode status, + scoped_refptr<ServiceWorkerRegistration> registration) { + if (ToDatabaseStatus(status) != DatabaseStatus::kOk || !registration) { + SetStorageErrorAndFinish( + BackgroundFetchStorageError::kServiceWorkerStorageError); + return; + } + + // TODO(crbug.com/1199077): Move this check into the SW context.
+ if (registration->origin() != origin_) { + SetStorageErrorAndFinish( + BackgroundFetchStorageError::kServiceWorkerStorageError); + return; + } + service_worker_context()->GetRegistrationUserKeysAndDataByKeyPrefix( service_worker_registration_id_, {kActiveRegistrationUniqueIdKeyPrefix}, base::BindOnce(&GetDeveloperIdsTask::DidGetUniqueIds, diff -Naur a/src/3rdparty/chromium/content/browser/background_fetch/storage/get_developer_ids_task.h b/src/3rdparty/chromium/content/browser/background_fetch/storage/get_developer_ids_task.h --- a/src/3rdparty/chromium/content/browser/background_fetch/storage/get_developer_ids_task.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/content/browser/background_fetch/storage/get_developer_ids_task.h 2021-11-20 03:38:31.033513971 +0000 @@ -16,6 +16,9 @@ #include "url/origin.h" namespace content { + +class ServiceWorkerRegistration; + namespace background_fetch { // Gets the developer ids for all active registrations - registrations that have @@ -34,6 +37,9 @@ void Start() override; private: + void DidGetServiceWorkerRegistration( + blink::ServiceWorkerStatusCode status, + scoped_refptr<ServiceWorkerRegistration> registration); void DidGetUniqueIds( blink::ServiceWorkerStatusCode status, const base::flat_map<std::string, std::string>& data_map); diff -Naur a/src/3rdparty/chromium/content/browser/bad_message.h b/src/3rdparty/chromium/content/browser/bad_message.h --- a/src/3rdparty/chromium/content/browser/bad_message.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/content/browser/bad_message.h 2021-11-20 03:39:05.825965069 +0000 @@ -261,6 +261,7 @@ RWH_CLOSE_PORTAL = 233, MSDH_INVALID_STREAM_TYPE = 234, WCI_INVALID_DOWNLOAD_IMAGE_RESULT = 243, + RFH_CHILD_FRAME_UNEXPECTED_OWNER_ELEMENT_TYPE = 244, // Please add new elements here. The naming convention is abbreviated class // name (e.g.
RenderFrameHost becomes RFH) plus a unique description of the diff -Naur a/src/3rdparty/chromium/content/browser/content_index/content_index_database.cc b/src/3rdparty/chromium/content/browser/content_index/content_index_database.cc --- a/src/3rdparty/chromium/content/browser/content_index/content_index_database.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/content/browser/content_index/content_index_database.cc 2021-11-20 03:37:01.641901745 +0000 @@ -183,6 +183,11 @@ return; } + if (!service_worker_registration->origin().IsSameOriginWith(origin)) { + std::move(callback).Run(blink::mojom::ContentIndexError::STORAGE_ERROR); + return; + } + auto serialized_icons = std::make_unique<proto::SerializedIcons>(); proto::SerializedIcons* serialized_icons_ptr = serialized_icons.get(); @@ -284,6 +289,15 @@ blink::mojom::ContentIndexService::DeleteCallback callback) { DCHECK_CURRENTLY_ON(ServiceWorkerContext::GetCoreThreadId()); + scoped_refptr<ServiceWorkerRegistration> service_worker_registration = + service_worker_context_->GetLiveRegistration( + service_worker_registration_id); + if (!service_worker_registration || + !service_worker_registration->origin().IsSameOriginWith(origin)) { + std::move(callback).Run(blink::mojom::ContentIndexError::STORAGE_ERROR); + return; + } + service_worker_context_->ClearRegistrationUserData( service_worker_registration_id, {EntryKey(entry_id), IconsKey(entry_id)}, base::BindOnce(&ContentIndexDatabase::DidDeleteEntry, @@ -316,6 +330,7 @@ void ContentIndexDatabase::GetDescriptions( int64_t service_worker_registration_id, + const url::Origin& origin, blink::mojom::ContentIndexService::GetDescriptionsCallback callback) { DCHECK_CURRENTLY_ON(BrowserThread::UI); @@ -333,15 +348,26 @@ FROM_HERE, ServiceWorkerContext::GetCoreThreadId(), base::BindOnce(&ContentIndexDatabase::GetDescriptionsOnCoreThread, weak_ptr_factory_core_.GetWeakPtr(), - service_worker_registration_id, + service_worker_registration_id, origin, std::move(wrapped_callback))); } void
ContentIndexDatabase::GetDescriptionsOnCoreThread( int64_t service_worker_registration_id, + const url::Origin& origin, blink::mojom::ContentIndexService::GetDescriptionsCallback callback) { DCHECK_CURRENTLY_ON(ServiceWorkerContext::GetCoreThreadId()); + scoped_refptr<ServiceWorkerRegistration> service_worker_registration = + service_worker_context_->GetLiveRegistration( + service_worker_registration_id); + if (!service_worker_registration || + !service_worker_registration->origin().IsSameOriginWith(origin)) { + std::move(callback).Run(blink::mojom::ContentIndexError::STORAGE_ERROR, + /* descriptions= */ {}); + return; + } + service_worker_context_->GetRegistrationUserDataByKeyPrefix( service_worker_registration_id, kEntryPrefix, base::BindOnce(&ContentIndexDatabase::DidGetDescriptions, diff -Naur a/src/3rdparty/chromium/content/browser/content_index/content_index_database.h b/src/3rdparty/chromium/content/browser/content_index/content_index_database.h --- a/src/3rdparty/chromium/content/browser/content_index/content_index_database.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/content/browser/content_index/content_index_database.h 2021-11-20 03:37:01.642901729 +0000 @@ -51,6 +51,7 @@ void GetDescriptions( int64_t service_worker_registration_id, + const url::Origin& origin, blink::mojom::ContentIndexService::GetDescriptionsCallback callback); // Gets the icon for |description_id| and invokes |callback| on the UI @@ -95,6 +96,7 @@ blink::mojom::ContentIndexService::DeleteCallback callback); void GetDescriptionsOnCoreThread( int64_t service_worker_registration_id, + const url::Origin& origin, blink::mojom::ContentIndexService::GetDescriptionsCallback callback); void GetIconsOnCoreThread(int64_t service_worker_registration_id, const std::string& description_id, diff -Naur a/src/3rdparty/chromium/content/browser/content_index/content_index_service_impl.cc b/src/3rdparty/chromium/content/browser/content_index/content_index_service_impl.cc ---
a/src/3rdparty/chromium/content/browser/content_index/content_index_service_impl.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/content/browser/content_index/content_index_service_impl.cc 2021-11-20 03:37:01.642901729 +0000 @@ -118,7 +118,7 @@ DCHECK_CURRENTLY_ON(BrowserThread::UI); content_index_context_->database().GetDescriptions( - service_worker_registration_id, std::move(callback)); + service_worker_registration_id, origin_, std::move(callback)); } } // namespace content diff -Naur a/src/3rdparty/chromium/content/browser/devtools/devtools_http_handler.cc b/src/3rdparty/chromium/content/browser/devtools/devtools_http_handler.cc --- a/src/3rdparty/chromium/content/browser/devtools/devtools_http_handler.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/content/browser/devtools/devtools_http_handler.cc 2021-11-20 03:34:47.738976372 +0000 @@ -685,8 +685,14 @@ } void DevToolsHttpHandler::OnDiscoveryPageRequest(int connection_id) { - std::string response = delegate_->GetDiscoveryPageHTML(); - Send200(connection_id, response, "text/html; charset=UTF-8"); + net::HttpServerResponseInfo response(net::HTTP_OK); + response.AddHeader("X-Frame-Options", "DENY"); + response.SetBody(delegate_->GetDiscoveryPageHTML(), + "text/html; charset=UTF-8"); + thread_->task_runner()->PostTask( + FROM_HERE, base::BindOnce(&ServerWrapper::SendResponse, + base::Unretained(server_wrapper_.get()), + connection_id, response)); } void DevToolsHttpHandler::OnFrontendResourceRequest( diff -Naur a/src/3rdparty/chromium/content/browser/indexed_db/database_impl.cc b/src/3rdparty/chromium/content/browser/indexed_db/database_impl.cc --- a/src/3rdparty/chromium/content/browser/indexed_db/database_impl.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/content/browser/indexed_db/database_impl.cc 2021-11-20 03:37:12.802728824 +0000 @@ -86,6 +86,13 @@ return; } + if (!transaction->IsAcceptingRequests()) { + mojo::ReportBadMessage( + "RenameObjectStore 
was called after committing or aborting the " + "transaction"); + return; + } + transaction->ScheduleTask( blink::mojom::IDBTaskType::Preemptive, BindWeakOperation(&IndexedDBDatabase::RenameObjectStoreOperation, @@ -203,6 +210,12 @@ return; } + if (!transaction->IsAcceptingRequests()) { + mojo::ReportBadMessage( + "Get was called after committing or aborting the transaction"); + return; + } + blink::mojom::IDBDatabase::GetCallback aborting_callback = CreateCallbackAbortOnDestruct( @@ -253,6 +266,12 @@ return; } + if (!transaction->IsAcceptingRequests()) { + mojo::ReportBadMessage( + "GetAll was called after committing or aborting the transaction"); + return; + } + // Hypothetically, this could pass the receiver to the callback immediately. // However, for result ordering issues, we need to PostTask to mimic // all of the other operations. @@ -292,6 +311,12 @@ return; } + if (!transaction->IsAcceptingRequests()) { + mojo::ReportBadMessage( + "SetIndexKeys was called after committing or aborting the transaction"); + return; + } + transaction->ScheduleTask( blink::mojom::IDBTaskType::Preemptive, BindWeakOperation(&IndexedDBDatabase::SetIndexKeysOperation, @@ -318,6 +343,13 @@ return; } + if (!transaction->IsAcceptingRequests()) { + mojo::ReportBadMessage( + "SetIndexesReady was called after committing or aborting the " + "transaction"); + return; + } + transaction->ScheduleTask( blink::mojom::IDBTaskType::Preemptive, BindWeakOperation(&IndexedDBDatabase::SetIndexesReadyOperation, @@ -355,6 +387,12 @@ return; } + if (!transaction->IsAcceptingRequests()) { + mojo::ReportBadMessage( + "OpenCursor was called after committing or aborting the transaction"); + return; + } + blink::mojom::IDBDatabase::OpenCursorCallback aborting_callback = CreateCallbackAbortOnDestruct< blink::mojom::IDBDatabase::OpenCursorCallback, @@ -404,6 +442,12 @@ if (!transaction) return; + if (!transaction->IsAcceptingRequests()) { + mojo::ReportBadMessage( + "Count was called after committing or 
aborting the transaction"); + return; + } + transaction->ScheduleTask(BindWeakOperation( &IndexedDBDatabase::CountOperation, connection_->database()->AsWeakPtr(), object_store_id, index_id, @@ -429,6 +473,12 @@ if (!transaction) return; + if (!transaction->IsAcceptingRequests()) { + mojo::ReportBadMessage( + "DeleteRange was called after committing or aborting the transaction"); + return; + } + transaction->ScheduleTask(BindWeakOperation( &IndexedDBDatabase::DeleteRangeOperation, connection_->database()->AsWeakPtr(), object_store_id, @@ -452,6 +502,13 @@ if (!transaction) return; + if (!transaction->IsAcceptingRequests()) { + mojo::ReportBadMessage( + "GetKeyGeneratorCurrentNumber was called after committing or aborting " + "the transaction"); + return; + } + transaction->ScheduleTask(BindWeakOperation( &IndexedDBDatabase::GetKeyGeneratorCurrentNumberOperation, connection_->database()->AsWeakPtr(), object_store_id, @@ -475,6 +532,12 @@ if (!transaction) return; + if (!transaction->IsAcceptingRequests()) { + mojo::ReportBadMessage( + "Clear was called after committing or aborting the transaction"); + return; + } + transaction->ScheduleTask(BindWeakOperation( &IndexedDBDatabase::ClearOperation, connection_->database()->AsWeakPtr(), object_store_id, std::move(callbacks))); @@ -502,6 +565,12 @@ return; } + if (!transaction->IsAcceptingRequests()) { + mojo::ReportBadMessage( + "CreateIndex was called after committing or aborting the transaction"); + return; + } + transaction->ScheduleTask( blink::mojom::IDBTaskType::Preemptive, BindWeakOperation(&IndexedDBDatabase::CreateIndexOperation, @@ -527,6 +596,12 @@ return; } + if (!transaction->IsAcceptingRequests()) { + mojo::ReportBadMessage( + "DeleteIndex was called after committing or aborting the transaction"); + return; + } + transaction->ScheduleTask(BindWeakOperation( &IndexedDBDatabase::DeleteIndexOperation, connection_->database()->AsWeakPtr(), object_store_id, index_id)); @@ -551,6 +626,12 @@ return; } + if 
(!transaction->IsAcceptingRequests()) { + mojo::ReportBadMessage( + "RenameIndex was called after committing or aborting the transaction"); + return; + } + transaction->ScheduleTask( BindWeakOperation(&IndexedDBDatabase::RenameIndexOperation, connection_->database()->AsWeakPtr(), object_store_id, diff -Naur a/src/3rdparty/chromium/content/browser/indexed_db/indexed_db_transaction.h b/src/3rdparty/chromium/content/browser/indexed_db/indexed_db_transaction.h --- a/src/3rdparty/chromium/content/browser/indexed_db/indexed_db_transaction.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/content/browser/indexed_db/indexed_db_transaction.h 2021-11-20 03:37:12.802728824 +0000 @@ -69,6 +69,14 @@ // Signals the transaction for commit. void SetCommitFlag(); + // Returns false if the transaction has been signalled to commit, is in the + // process of committing, or finished committing or was aborted. Essentially + // when this returns false no tasks should be scheduled that try to modify + // the transaction. + bool IsAcceptingRequests() { + return !is_commit_pending_ && state_ != COMMITTING && state_ != FINISHED; + } + // This transaction is ultimately backed by a LevelDBScope. Aborting a // transaction rolls back the LevelDBScopes, which (if LevelDBScopes is in // single-sequence mode) can fail. 
This returns the result of that rollback, diff -Naur a/src/3rdparty/chromium/content/browser/indexed_db/transaction_impl.cc b/src/3rdparty/chromium/content/browser/indexed_db/transaction_impl.cc --- a/src/3rdparty/chromium/content/browser/indexed_db/transaction_impl.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/content/browser/indexed_db/transaction_impl.cc 2021-11-20 03:37:24.345549985 +0000 @@ -57,6 +57,13 @@ return; } + if (!transaction_->IsAcceptingRequests()) { + mojo::ReportBadMessage( + "CreateObjectStore was called after committing or aborting the " + "transaction"); + return; + } + IndexedDBConnection* connection = transaction_->connection(); if (!connection->IsConnected()) return; @@ -79,6 +86,13 @@ return; } + if (!transaction_->IsAcceptingRequests()) { + mojo::ReportBadMessage( + "DeleteObjectStore was called after committing or aborting the " + "transaction"); + return; + } + IndexedDBConnection* connection = transaction_->connection(); if (!connection->IsConnected()) return; @@ -114,6 +128,12 @@ return; } + if (!transaction_->IsAcceptingRequests()) { + mojo::ReportBadMessage( + "Put was called after committing or aborting the transaction"); + return; + } + IndexedDBConnection* connection = transaction_->connection(); if (!connection->IsConnected()) { IndexedDBDatabaseError error(blink::mojom::IDBException::kUnknownError, @@ -174,6 +194,12 @@ return; } + if (!transaction_->IsAcceptingRequests()) { + mojo::ReportBadMessage( + "PutAll was called after committing or aborting the transaction"); + return; + } + std::vector> external_objects_per_put( puts.size()); for (size_t i = 0; i < puts.size(); i++) { @@ -275,6 +301,12 @@ if (!transaction_) return; + if (!transaction_->IsAcceptingRequests()) { + // This really shouldn't be happening, but seems to be happening anyway. So + // rather than killing the renderer, simply ignore the request. 
+ return; + } + IndexedDBConnection* connection = transaction_->connection(); if (!connection->IsConnected()) return; diff -Naur a/src/3rdparty/chromium/content/browser/renderer_host/render_frame_host_impl.cc b/src/3rdparty/chromium/content/browser/renderer_host/render_frame_host_impl.cc --- a/src/3rdparty/chromium/content/browser/renderer_host/render_frame_host_impl.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/content/browser/renderer_host/render_frame_host_impl.cc 2021-11-20 03:39:05.826965053 +0000 @@ -2465,6 +2465,14 @@ bad_message::ReceivedBadMessage( GetProcess(), bad_message::RFH_CHILD_FRAME_NEEDS_OWNER_ELEMENT_TYPE); } + if (owner_type == blink::mojom::FrameOwnerElementType::kPortal) { + // Portals are not created through this child + // frame code path. + bad_message::ReceivedBadMessage( + GetProcess(), + bad_message::RFH_CHILD_FRAME_UNEXPECTED_OWNER_ELEMENT_TYPE); + return; + } // The RenderFrame corresponding to this host sent an IPC message to create a // child, but by the time we get here, it's possible for the RenderFrameHost diff -Naur a/src/3rdparty/chromium/media/capture/video/mac/video_capture_device_avfoundation_mac.h b/src/3rdparty/chromium/media/capture/video/mac/video_capture_device_avfoundation_mac.h --- a/src/3rdparty/chromium/media/capture/video/mac/video_capture_device_avfoundation_mac.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/media/capture/video/mac/video_capture_device_avfoundation_mac.h 2021-11-20 03:35:08.768650549 +0000 @@ -45,6 +45,8 @@ // Protects concurrent setting and using |frameReceiver_|. Note that the // GUARDED_BY decoration below does not have any effect. base::Lock _lock; + // Used to avoid UAF in -captureOutput. + base::Lock _destructionLock; media::VideoCaptureDeviceAVFoundationFrameReceiver* _frameReceiver GUARDED_BY(_lock); // weak. 
diff -Naur a/src/3rdparty/chromium/media/capture/video/mac/video_capture_device_avfoundation_mac.mm b/src/3rdparty/chromium/media/capture/video/mac/video_capture_device_avfoundation_mac.mm --- a/src/3rdparty/chromium/media/capture/video/mac/video_capture_device_avfoundation_mac.mm 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/media/capture/video/mac/video_capture_device_avfoundation_mac.mm 2021-11-20 03:35:08.768650549 +0000 @@ -180,10 +180,24 @@ } - (void)dealloc { - [self stopStillImageOutput]; - [self stopCapture]; - _weakPtrFactoryForTakePhoto = nullptr; - _mainThreadTaskRunner = nullptr; + { + // To avoid races with concurrent callbacks, grab the lock before stopping + // capture and clearing all the variables. + base::AutoLock lock(_lock); + [self stopStillImageOutput]; + [self stopCapture]; + _frameReceiver = nullptr; + _weakPtrFactoryForTakePhoto = nullptr; + _mainThreadTaskRunner = nullptr; + } + { + // Ensures -captureOutput has finished before we continue the destruction + // steps. If -captureOutput grabbed the destruction lock before us this + // prevents UAF. If -captureOutput grabbed the destruction lock after us + // it will exit early because |_frameReceiver| is already null at this + // point. + base::AutoLock destructionLock(_destructionLock); + } [super dealloc]; } @@ -681,7 +695,9 @@ VLOG(3) << __func__; // Concurrent calls into |_frameReceiver| are not supported, so take |_lock| - // before any of the subsequent paths. + // before any of the subsequent paths. The |_destructionLock| must be grabbed + // first to avoid races with -dealloc. 
+ base::AutoLock destructionLock(_destructionLock); base::AutoLock lock(_lock); if (!_frameReceiver) return; diff -Naur a/src/3rdparty/chromium/mojo/public/cpp/bindings/lib/array_internal.h b/src/3rdparty/chromium/mojo/public/cpp/bindings/lib/array_internal.h --- a/src/3rdparty/chromium/mojo/public/cpp/bindings/lib/array_internal.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/mojo/public/cpp/bindings/lib/array_internal.h 2021-11-20 03:40:16.425834663 +0000 @@ -284,8 +284,7 @@ BufferWriter() = default; void Allocate(size_t num_elements, Buffer* buffer) { - if (num_elements > Traits::kMaxNumElements) - return; + CHECK_LE(num_elements, Traits::kMaxNumElements); uint32_t num_bytes = Traits::GetStorageSize(static_cast(num_elements)); diff -Naur a/src/3rdparty/chromium/sandbox/linux/BUILD.gn b/src/3rdparty/chromium/sandbox/linux/BUILD.gn --- a/src/3rdparty/chromium/sandbox/linux/BUILD.gn 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/linux/BUILD.gn 2021-11-20 03:37:57.059043138 +0000 @@ -436,6 +436,7 @@ "system_headers/linux_prctl.h", "system_headers/linux_seccomp.h", "system_headers/linux_signal.h", + "system_headers/linux_stat.h", "system_headers/linux_syscalls.h", "system_headers/linux_time.h", "system_headers/linux_ucontext.h", diff -Naur a/src/3rdparty/chromium/sandbox/linux/integration_tests/seccomp_broker_process_unittest.cc b/src/3rdparty/chromium/sandbox/linux/integration_tests/seccomp_broker_process_unittest.cc --- a/src/3rdparty/chromium/sandbox/linux/integration_tests/seccomp_broker_process_unittest.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/linux/integration_tests/seccomp_broker_process_unittest.cc 2021-11-20 03:37:57.059043138 +0000 @@ -34,6 +34,7 @@ #include "sandbox/linux/syscall_broker/broker_file_permission.h" #include "sandbox/linux/syscall_broker/broker_process.h" #include "sandbox/linux/system_headers/linux_seccomp.h" +#include "sandbox/linux/system_headers/linux_stat.h" 
#include "sandbox/linux/system_headers/linux_syscalls.h" #include "sandbox/linux/tests/scoped_temporary_file.h" #include "sandbox/linux/tests/test_utils.h" @@ -202,6 +203,26 @@ // not accept this as a valid error number. E.g. bionic accepts up to 255, glibc // and musl up to 4096. const int kFakeErrnoSentinel = 254; + +void ConvertKernelStatToLibcStat(default_stat_struct& in_stat, + struct stat& out_stat) { + out_stat.st_dev = in_stat.st_dev; + out_stat.st_ino = in_stat.st_ino; + out_stat.st_mode = in_stat.st_mode; + out_stat.st_nlink = in_stat.st_nlink; + out_stat.st_uid = in_stat.st_uid; + out_stat.st_gid = in_stat.st_gid; + out_stat.st_rdev = in_stat.st_rdev; + out_stat.st_size = in_stat.st_size; + out_stat.st_blksize = in_stat.st_blksize; + out_stat.st_blocks = in_stat.st_blocks; + out_stat.st_atim.tv_sec = in_stat.st_atime_; + out_stat.st_atim.tv_nsec = in_stat.st_atime_nsec_; + out_stat.st_mtim.tv_sec = in_stat.st_mtime_; + out_stat.st_mtim.tv_nsec = in_stat.st_mtime_nsec_; + out_stat.st_ctim.tv_sec = in_stat.st_ctime_; + out_stat.st_ctim.tv_nsec = in_stat.st_ctime_nsec_; +} } // namespace // There are a variety of ways to make syscalls in a sandboxed process. One is @@ -217,6 +238,10 @@ virtual int Open(const char* filepath, int flags) = 0; virtual int Access(const char* filepath, int mode) = 0; + // NOTE: we use struct stat instead of default_stat_struct, to make the libc + // syscaller simpler. Copying from default_stat_struct (the structure returned + // from a stat syscall) to struct stat (the structure exposed by a libc to its + // users) is simpler than going in the opposite direction. 
virtual int Stat(const char* filepath, bool follow_links, struct stat* statbuf) = 0; @@ -243,8 +268,12 @@ int Stat(const char* filepath, bool follow_links, struct stat* statbuf) override { - return broker_->GetBrokerClientSignalBased()->Stat(filepath, follow_links, - statbuf); + default_stat_struct buf; + int ret = broker_->GetBrokerClientSignalBased()->DefaultStatForTesting( + filepath, follow_links, &buf); + if (ret >= 0) + ConvertKernelStatToLibcStat(buf, *statbuf); + return ret; } int Rename(const char* oldpath, const char* newpath) override { @@ -300,10 +329,13 @@ int Stat(const char* filepath, bool follow_links, struct stat* statbuf) override { - int ret = follow_links ? syscall(__NR_stat, filepath, statbuf) - : syscall(__NR_lstat, filepath, statbuf); + struct kernel_stat buf; + int ret = syscall(__NR_newfstatat, AT_FDCWD, filepath, &buf, + follow_links ? 0 : AT_SYMLINK_NOFOLLOW); if (ret < 0) return -errno; + + ConvertKernelStatToLibcStat(buf, *statbuf); return ret; } diff -Naur a/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/baseline_policy.cc b/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/baseline_policy.cc --- a/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/baseline_policy.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/baseline_policy.cc 2021-11-20 03:43:06.618124013 +0000 @@ -20,6 +20,7 @@ #include "sandbox/linux/seccomp-bpf-helpers/syscall_sets.h" #include "sandbox/linux/seccomp-bpf/sandbox_bpf.h" #include "sandbox/linux/services/syscall_wrappers.h" +#include "sandbox/linux/system_headers/linux_stat.h" #include "sandbox/linux/system_headers/linux_syscalls.h" #if !defined(SO_PEEK_OFF) @@ -157,7 +158,7 @@ return Allow(); #endif - if (sysno == __NR_clock_gettime || sysno == __NR_clock_nanosleep) { + if (SyscallSets::IsClockApi(sysno)) { return RestrictClockID(); } @@ -165,6 +166,12 @@ return RestrictCloneToThreadsAndEPERMFork(); } + // clone3 takes a pointer argument which 
we cannot examine, so return ENOSYS + // to force the libc to use clone. See https://crbug.com/1213452. + if (sysno == __NR_clone3) { + return Error(ENOSYS); + } + if (sysno == __NR_fcntl) return RestrictFcntlCommands(); @@ -257,6 +264,13 @@ return RestrictKillTarget(current_pid, sysno); } + // The fstatat syscalls are file system syscalls, which will be denied below + // with fs_denied_errno. However some allowed fstat syscalls are rewritten by + // libc implementations to fstatat syscalls, and we need to rewrite them back. + if (sysno == __NR_fstatat_default) { + return RewriteFstatatSIGSYS(fs_denied_errno); + } + if (SyscallSets::IsFileSystem(sysno) || SyscallSets::IsCurrentDirectory(sysno)) { return Error(fs_denied_errno); diff -Naur a/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/baseline_policy_unittest.cc b/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/baseline_policy_unittest.cc --- a/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/baseline_policy_unittest.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/baseline_policy_unittest.cc 2021-11-20 03:38:08.162871101 +0000 @@ -50,7 +50,8 @@ namespace { -// This also tests that read(), write() and fstat() are allowed. +// This also tests that read(), write(), fstat(), and fstatat(.., "", .., +// AT_EMPTY_PATH) are allowed. void TestPipeOrSocketPair(base::ScopedFD read_end, base::ScopedFD write_end) { BPF_ASSERT_LE(0, read_end.get()); BPF_ASSERT_LE(0, write_end.get()); @@ -59,6 +60,20 @@ BPF_ASSERT_EQ(0, sys_ret); BPF_ASSERT(S_ISFIFO(stat_buf.st_mode) || S_ISSOCK(stat_buf.st_mode)); + sys_ret = fstatat(read_end.get(), "", &stat_buf, AT_EMPTY_PATH); + BPF_ASSERT_EQ(0, sys_ret); + BPF_ASSERT(S_ISFIFO(stat_buf.st_mode) || S_ISSOCK(stat_buf.st_mode)); + + // Make sure fstatat with anything other than an empty string is denied. 
+ sys_ret = fstatat(read_end.get(), "/", &stat_buf, AT_EMPTY_PATH); + BPF_ASSERT_EQ(sys_ret, -1); + BPF_ASSERT_EQ(EPERM, errno); + + // Make sure fstatat without AT_EMPTY_PATH is denied. + sys_ret = fstatat(read_end.get(), "", &stat_buf, 0); + BPF_ASSERT_EQ(sys_ret, -1); + BPF_ASSERT_EQ(EPERM, errno); + const ssize_t kTestTransferSize = 4; static const char kTestString[kTestTransferSize] = {'T', 'E', 'S', 'T'}; ssize_t transfered = 0; diff -Naur a/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/DEPS b/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/DEPS --- a/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/DEPS 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/DEPS 2021-11-20 03:37:57.059043138 +0000 @@ -3,5 +3,4 @@ "+sandbox/linux/seccomp-bpf", "+sandbox/linux/services", "+sandbox/linux/system_headers", - "+third_party/lss/linux_syscall_support.h", ] diff -Naur a/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/sigsys_handlers.cc b/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/sigsys_handlers.cc --- a/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/sigsys_handlers.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/sigsys_handlers.cc 2021-11-20 03:38:08.162871101 +0000 @@ -6,6 +6,7 @@ #include "sandbox/linux/seccomp-bpf-helpers/sigsys_handlers.h" +#include #include #include #include @@ -22,6 +23,7 @@ #include "sandbox/linux/seccomp-bpf/syscall.h" #include "sandbox/linux/services/syscall_wrappers.h" #include "sandbox/linux/system_headers/linux_seccomp.h" +#include "sandbox/linux/system_headers/linux_stat.h" #include "sandbox/linux/system_headers/linux_syscalls.h" #if defined(__mips__) @@ -355,6 +357,24 @@ return -ENOSYS; } +intptr_t SIGSYSFstatatHandler(const struct arch_seccomp_data& args, + void* fs_denied_errno) { + if (args.nr == __NR_fstatat_default) { + if (*reinterpret_cast(args.args[1]) == '\0' && + args.args[3] 
== static_cast(AT_EMPTY_PATH)) { + return syscall(__NR_fstat_default, static_cast(args.args[0]), + reinterpret_cast(args.args[2])); + } + return -reinterpret_cast(fs_denied_errno); + } + + CrashSIGSYS_Handler(args, fs_denied_errno); + + // Should never be reached. + RAW_CHECK(false); + return -ENOSYS; +} + bpf_dsl::ResultExpr CrashSIGSYS() { return bpf_dsl::Trap(CrashSIGSYS_Handler, NULL); } @@ -387,6 +407,11 @@ return bpf_dsl::Trap(SIGSYSSchedHandler, NULL); } +bpf_dsl::ResultExpr RewriteFstatatSIGSYS(int fs_denied_errno) { + return bpf_dsl::Trap(SIGSYSFstatatHandler, + reinterpret_cast(fs_denied_errno)); +} + void AllocateCrashKeys() { #if !defined(OS_NACL_NONSFI) if (seccomp_crash_key) diff -Naur a/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/sigsys_handlers.h b/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/sigsys_handlers.h --- a/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/sigsys_handlers.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/sigsys_handlers.h 2021-11-20 03:38:08.162871101 +0000 @@ -62,6 +62,19 @@ // sched_setparam(), sched_setscheduler() SANDBOX_EXPORT intptr_t SIGSYSSchedHandler(const arch_seccomp_data& args, void* aux); +// If the fstatat() syscall is functionally equivalent to an fstat() syscall, +// then rewrite the syscall to the equivalent fstat() syscall which can be +// adequately sandboxed. +// If the fstatat() is not functionally equivalent to an fstat() syscall, we +// fail with -fs_denied_errno. +// If the syscall is not an fstatat() at all, crash in the same way as +// CrashSIGSYS_Handler. +// This is necessary because glibc and musl have started rewriting fstat(fd, +// stat_buf) as fstatat(fd, "", stat_buf, AT_EMPTY_PATH). We rewrite the latter +// back to the former, which is actually sandboxable. 
+SANDBOX_EXPORT intptr_t +SIGSYSFstatatHandler(const struct arch_seccomp_data& args, + void* fs_denied_errno); // Variants of the above functions for use with bpf_dsl. SANDBOX_EXPORT bpf_dsl::ResultExpr CrashSIGSYS(); @@ -72,6 +85,7 @@ SANDBOX_EXPORT bpf_dsl::ResultExpr CrashSIGSYSFutex(); SANDBOX_EXPORT bpf_dsl::ResultExpr CrashSIGSYSPtrace(); SANDBOX_EXPORT bpf_dsl::ResultExpr RewriteSchedSIGSYS(); +SANDBOX_EXPORT bpf_dsl::ResultExpr RewriteFstatatSIGSYS(int fs_denied_errno); // Allocates a crash key so that Seccomp information can be recorded. void AllocateCrashKeys(); diff -Naur a/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/syscall_parameters_restrictions_unittests.cc b/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/syscall_parameters_restrictions_unittests.cc --- a/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/syscall_parameters_restrictions_unittests.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/syscall_parameters_restrictions_unittests.cc 2021-11-20 03:43:06.618124013 +0000 @@ -36,10 +36,6 @@ #include "sandbox/linux/system_headers/linux_time.h" #include "sandbox/linux/tests/unit_tests.h" -#if !defined(OS_ANDROID) -#include "third_party/lss/linux_syscall_support.h" // for MAKE_PROCESS_CPUCLOCK -#endif - namespace sandbox { namespace { @@ -58,8 +54,14 @@ ResultExpr EvaluateSyscall(int sysno) const override { switch (sysno) { case __NR_clock_gettime: +#if defined(__NR_clock_gettime64) + case __NR_clock_gettime64: +#endif case __NR_clock_getres: case __NR_clock_nanosleep: +#if defined(__NR_clock_nanosleep_time64) + case __NR_clock_nanosleep_time64: +#endif return RestrictClockID(); default: return Allow(); diff -Naur a/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/syscall_sets.cc b/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/syscall_sets.cc --- a/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/syscall_sets.cc 2021-08-24 12:54:05.000000000 +0100 +++ 
b/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/syscall_sets.cc 2021-11-20 03:43:06.618124013 +0000 @@ -38,7 +38,13 @@ case __NR_clock_getres: // Allowed only on Android with parameters // filtered by RestrictClokID(). case __NR_clock_gettime: // Parameters filtered by RestrictClockID(). +#if defined(__NR_clock_gettime64) + case __NR_clock_gettime64: // Parameters filtered by RestrictClockID(). +#endif case __NR_clock_nanosleep: // Parameters filtered by RestrictClockID(). +#if defined(__NR_clock_nanosleep_time64) + case __NR_clock_nanosleep_time64: // Parameters filtered by RestrictClockID(). +#endif case __NR_clock_settime: // Privileged. #if defined(__i386__) || \ (defined(ARCH_CPU_MIPS_FAMILY) && defined(ARCH_CPU_32_BITS)) @@ -975,6 +981,22 @@ return true; default: return false; + } +} + +bool SyscallSets::IsClockApi(int sysno) { + switch (sysno) { + case __NR_clock_gettime: +#if defined(__NR_clock_gettime64) + case __NR_clock_gettime64: +#endif + case __NR_clock_nanosleep: +#if defined(__NR_clock_nanosleep_time64) + case __NR_clock_nanosleep_time64: +#endif + return true; + default: + return false; } } diff -Naur a/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/syscall_sets.h b/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/syscall_sets.h --- a/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/syscall_sets.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/linux/seccomp-bpf-helpers/syscall_sets.h 2021-11-20 03:43:06.618124013 +0000 @@ -99,6 +99,7 @@ static bool IsFaNotify(int sysno); static bool IsTimer(int sysno); static bool IsAdvancedTimer(int sysno); + static bool IsClockApi(int sysno); static bool IsExtendedAttributes(int sysno); static bool IsMisc(int sysno); #if defined(__arm__) diff -Naur a/src/3rdparty/chromium/sandbox/linux/services/syscall_wrappers.cc b/src/3rdparty/chromium/sandbox/linux/services/syscall_wrappers.cc --- a/src/3rdparty/chromium/sandbox/linux/services/syscall_wrappers.cc 
2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/linux/services/syscall_wrappers.cc 2021-11-20 03:37:57.059043138 +0000 @@ -4,6 +4,7 @@ #include "sandbox/linux/services/syscall_wrappers.h" +#include #include #include #include @@ -14,11 +15,13 @@ #include #include +#include "base/check.h" #include "base/compiler_specific.h" #include "base/logging.h" #include "build/build_config.h" #include "sandbox/linux/system_headers/capability.h" #include "sandbox/linux/system_headers/linux_signal.h" +#include "sandbox/linux/system_headers/linux_stat.h" #include "sandbox/linux/system_headers/linux_syscalls.h" namespace sandbox { @@ -217,7 +220,7 @@ #undef STR #undef XSTR -#endif +#endif // defined(ARCH_CPU_X86_FAMILY) int sys_sigaction(int signum, const struct sigaction* act, @@ -241,7 +244,7 @@ #error "Unsupported architecture." #endif } -#endif +#endif // defined(ARCH_CPU_X86_FAMILY) } LinuxSigAction linux_oldact = {}; @@ -259,6 +262,47 @@ return result; } -#endif // defined(MEMORY_SANITIZER) +#endif // !defined(OS_NACL_NONSFI) + +int sys_stat(const char* path, struct kernel_stat* stat_buf) { + int res; +#if !defined(__NR_stat) + res = syscall(__NR_newfstatat, AT_FDCWD, path, stat_buf, 0); +#else + res = syscall(__NR_stat, path, stat_buf); +#endif + if (res == 0) + MSAN_UNPOISON(stat_buf, sizeof(*stat_buf)); + return res; +} + +int sys_lstat(const char* path, struct kernel_stat* stat_buf) { + int res; +#if !defined(__NR_lstat) + res = syscall(__NR_newfstatat, AT_FDCWD, path, stat_buf, AT_SYMLINK_NOFOLLOW); +#else + res = syscall(__NR_lstat, path, stat_buf); +#endif + if (res == 0) + MSAN_UNPOISON(stat_buf, sizeof(*stat_buf)); + return res; +} + +int sys_fstatat64(int dirfd, + const char* pathname, + struct kernel_stat64* stat_buf, + int flags) { +#if defined(__NR_fstatat64) + int res = syscall(__NR_fstatat64, dirfd, pathname, stat_buf, flags); + if (res == 0) + MSAN_UNPOISON(stat_buf, sizeof(*stat_buf)); + return res; +#else // defined(__NR_fstatat64) + 
// We should not reach here on 64-bit systems, as the *stat*64() are only + // necessary on 32-bit. + RAW_CHECK(false); + return -ENOSYS; +#endif +} } // namespace sandbox diff -Naur a/src/3rdparty/chromium/sandbox/linux/services/syscall_wrappers.h b/src/3rdparty/chromium/sandbox/linux/services/syscall_wrappers.h --- a/src/3rdparty/chromium/sandbox/linux/services/syscall_wrappers.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/linux/services/syscall_wrappers.h 2021-11-20 03:37:57.059043138 +0000 @@ -17,6 +17,8 @@ struct rlimit64; struct cap_hdr; struct cap_data; +struct kernel_stat; +struct kernel_stat64; namespace sandbox { @@ -84,6 +86,19 @@ const struct sigaction* act, struct sigaction* oldact); +// Some architectures do not have stat() and lstat() syscalls. In that case, +// these wrappers will use newfstatat(), which is available on all other +// architectures, with the same capabilities as stat() and lstat(). +SANDBOX_EXPORT int sys_stat(const char* path, struct kernel_stat* stat_buf); +SANDBOX_EXPORT int sys_lstat(const char* path, struct kernel_stat* stat_buf); + +// Takes care of unpoisoning |stat_buf| for MSAN. Check-fails if fstatat64() is +// not a supported syscall on the current platform. 
+SANDBOX_EXPORT int sys_fstatat64(int dirfd, + const char* pathname, + struct kernel_stat64* stat_buf, + int flags); + } // namespace sandbox #endif // SANDBOX_LINUX_SERVICES_SYSCALL_WRAPPERS_H_ diff -Naur a/src/3rdparty/chromium/sandbox/linux/services/syscall_wrappers_unittest.cc b/src/3rdparty/chromium/sandbox/linux/services/syscall_wrappers_unittest.cc --- a/src/3rdparty/chromium/sandbox/linux/services/syscall_wrappers_unittest.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/linux/services/syscall_wrappers_unittest.cc 2021-11-20 03:37:57.059043138 +0000 @@ -5,15 +5,19 @@ #include "sandbox/linux/services/syscall_wrappers.h" #include +#include #include #include #include #include -#include +#include "base/logging.h" +#include "base/memory/page_size.h" #include "base/posix/eintr_wrapper.h" #include "build/build_config.h" #include "sandbox/linux/system_headers/linux_signal.h" +#include "sandbox/linux/system_headers/linux_stat.h" +#include "sandbox/linux/tests/scoped_temporary_file.h" #include "sandbox/linux/tests/test_utils.h" #include "sandbox/linux/tests/unit_tests.h" #include "testing/gtest/include/gtest/gtest.h" @@ -93,6 +97,129 @@ linux_sigset); } +TEST(SyscallWrappers, Stat) { + // Create a file to stat, with 12 bytes of data. + ScopedTemporaryFile tmp_file; + EXPECT_EQ(12, write(tmp_file.fd(), "blahblahblah", 12)); + + // To test we have the correct stat structures for each kernel/platform, we + // will right-align them on a page, with a guard page after. + char* two_pages = static_cast(TestUtils::MapPagesOrDie(2)); + TestUtils::MprotectLastPageOrDie(two_pages, 2); + char* page1_end = two_pages + base::GetPageSize(); + + // First, check that calling stat with |stat_buf| pointing to the last byte on + // a page causes EFAULT. + int res = sys_stat(tmp_file.full_file_name(), + reinterpret_cast(page1_end - 1)); + ASSERT_EQ(res, -1); + ASSERT_EQ(errno, EFAULT); + + // Now, check that we have the correctly sized stat structure. 
+ struct kernel_stat* sb = reinterpret_cast( + page1_end - sizeof(struct kernel_stat)); + // Memset to c's so we can check the kernel zero'd the padding... + memset(sb, 'c', sizeof(struct kernel_stat)); + res = sys_stat(tmp_file.full_file_name(), sb); + ASSERT_EQ(res, 0); + + // Following fields may never be consistent but should be non-zero. + // Don't trust the platform to define fields with any particular sign. + EXPECT_NE(0u, static_cast(sb->st_dev)); + EXPECT_NE(0u, static_cast(sb->st_ino)); + EXPECT_NE(0u, static_cast(sb->st_mode)); + EXPECT_NE(0u, static_cast(sb->st_blksize)); + EXPECT_NE(0u, static_cast(sb->st_blocks)); + +// We are the ones that made the file. +// Note: normally gid and uid overflow on backwards-compatible 32-bit systems +// and we end up with dummy uids and gids in place here. +#if defined(ARCH_CPU_64_BITS) + EXPECT_EQ(geteuid(), sb->st_uid); + EXPECT_EQ(getegid(), sb->st_gid); +#endif + + // Wrote 12 bytes above which should fit in one block. + EXPECT_EQ(12u, sb->st_size); + + // Can't go backwards in time, 1500000000 was some time ago. + EXPECT_LT(1500000000u, static_cast(sb->st_atime_)); + EXPECT_LT(1500000000u, static_cast(sb->st_mtime_)); + EXPECT_LT(1500000000u, static_cast(sb->st_ctime_)); + + // Checking the padding for good measure. +#if defined(__x86_64__) + EXPECT_EQ(0u, sb->__pad0); + EXPECT_EQ(0u, sb->__unused4[0]); + EXPECT_EQ(0u, sb->__unused4[1]); + EXPECT_EQ(0u, sb->__unused4[2]); +#elif defined(__aarch64__) + EXPECT_EQ(0u, sb->__pad1); + EXPECT_EQ(0, sb->__pad2); + EXPECT_EQ(0u, sb->__unused4); + EXPECT_EQ(0u, sb->__unused5); +#endif +} + +TEST(SyscallWrappers, LStat) { + // Create a file to stat, with 12 bytes of data. + ScopedTemporaryFile tmp_file; + EXPECT_EQ(12, write(tmp_file.fd(), "blahblahblah", 12)); + + // Also create a symlink. 
+ std::string symlink_name; + { + ScopedTemporaryFile tmp_file2; + symlink_name = tmp_file2.full_file_name(); + } + int rc = symlink(tmp_file.full_file_name(), symlink_name.c_str()); + if (rc != 0) { + PLOG(ERROR) << "Couldn't symlink " << symlink_name << " to target " + << tmp_file.full_file_name(); + GTEST_FAIL(); + } + + struct kernel_stat lstat_info; + rc = sys_lstat(symlink_name.c_str(), &lstat_info); + if (rc < 0 && errno == EOVERFLOW) { + GTEST_SKIP(); + } + if (rc != 0) { + PLOG(ERROR) << "Couldn't sys_lstat " << symlink_name; + GTEST_FAIL(); + } + + struct kernel_stat stat_info; + rc = sys_stat(symlink_name.c_str(), &stat_info); + if (rc < 0 && errno == EOVERFLOW) { + GTEST_SKIP(); + } + if (rc != 0) { + PLOG(ERROR) << "Couldn't sys_stat " << symlink_name; + GTEST_FAIL(); + } + + struct kernel_stat tmp_file_stat_info; + rc = sys_stat(tmp_file.full_file_name(), &tmp_file_stat_info); + if (rc < 0 && errno == EOVERFLOW) { + GTEST_SKIP(); + } + if (rc != 0) { + PLOG(ERROR) << "Couldn't sys_stat " << tmp_file.full_file_name(); + GTEST_FAIL(); + } + + // lstat should produce information about a symlink. + ASSERT_TRUE(S_ISLNK(lstat_info.st_mode)); + + // stat-ing symlink_name and tmp_file should produce the same inode. + ASSERT_EQ(stat_info.st_ino, tmp_file_stat_info.st_ino); + + // lstat-ing symlink_name should give a different inode than stat-ing + // symlink_name. 
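The symlink expectations being asserted in this test mirror the plain libc contract, which can be sketched standalone. This is illustrative only (`SymlinkStatContract` and the paths used with it in the usage note are invented, not from the patch):

```cpp
#include <cassert>
#include <sys/stat.h>
#include <unistd.h>

// Checks the contract the LStat test relies on: stat() follows |link| to its
// target's inode, while lstat() reports the link object itself.
bool SymlinkStatContract(const char* target, const char* link) {
  struct stat target_info, follow_info, nofollow_info;
  if (stat(target, &target_info) != 0) return false;
  if (stat(link, &follow_info) != 0) return false;     // follows the symlink
  if (lstat(link, &nofollow_info) != 0) return false;  // does not follow
  return S_ISLNK(nofollow_info.st_mode) &&
         follow_info.st_ino == target_info.st_ino &&
         nofollow_info.st_ino != target_info.st_ino;
}
```

For example, after `symlink("/tmp", "/tmp/some_link")`, `SymlinkStatContract("/tmp", "/tmp/some_link")` should return true: the link has its own inode, and only the followed stat() resolves to the target's.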
+ ASSERT_NE(stat_info.st_ino, lstat_info.st_ino); +} + } // namespace } // namespace sandbox diff -Naur a/src/3rdparty/chromium/sandbox/linux/syscall_broker/broker_client.cc b/src/3rdparty/chromium/sandbox/linux/syscall_broker/broker_client.cc --- a/src/3rdparty/chromium/sandbox/linux/syscall_broker/broker_client.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/linux/syscall_broker/broker_client.cc 2021-11-20 03:37:57.059043138 +0000 @@ -166,7 +166,7 @@ int BrokerClient::Stat(const char* pathname, bool follow_links, - struct stat* sb) const { + struct kernel_stat* sb) const { if (!pathname || !sb) return -EFAULT; @@ -181,7 +181,7 @@ int BrokerClient::Stat64(const char* pathname, bool follow_links, - struct stat64* sb) const { + struct kernel_stat64* sb) const { if (!pathname || !sb) return -EFAULT; diff -Naur a/src/3rdparty/chromium/sandbox/linux/syscall_broker/broker_client.h b/src/3rdparty/chromium/sandbox/linux/syscall_broker/broker_client.h --- a/src/3rdparty/chromium/sandbox/linux/syscall_broker/broker_client.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/linux/syscall_broker/broker_client.h 2021-11-20 03:37:57.059043138 +0000 @@ -61,10 +61,10 @@ int Rmdir(const char* path) const override; int Stat(const char* pathname, bool follow_links, - struct stat* sb) const override; + struct kernel_stat* sb) const override; int Stat64(const char* pathname, bool follow_links, - struct stat64* sb) const override; + struct kernel_stat64* sb) const override; int Unlink(const char* unlink) const override; private: diff -Naur a/src/3rdparty/chromium/sandbox/linux/syscall_broker/broker_host.cc b/src/3rdparty/chromium/sandbox/linux/syscall_broker/broker_host.cc --- a/src/3rdparty/chromium/sandbox/linux/syscall_broker/broker_host.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/linux/syscall_broker/broker_host.cc 2021-11-20 03:37:57.059043138 +0000 @@ -20,9 +20,11 @@ #include 
"base/files/scoped_file.h" #include "base/logging.h" #include "base/posix/eintr_wrapper.h" +#include "sandbox/linux/services/syscall_wrappers.h" #include "sandbox/linux/syscall_broker/broker_command.h" #include "sandbox/linux/syscall_broker/broker_permission_list.h" #include "sandbox/linux/syscall_broker/broker_simple_message.h" +#include "sandbox/linux/system_headers/linux_stat.h" #include "sandbox/linux/system_headers/linux_syscalls.h" namespace sandbox { @@ -193,10 +195,12 @@ RAW_CHECK(reply->AddIntToMessage(-permission_list.denied_errno())); return; } + if (command_type == COMMAND_STAT) { - struct stat sb; - int sts = - follow_links ? stat(file_to_access, &sb) : lstat(file_to_access, &sb); + struct kernel_stat sb; + + int sts = follow_links ? sandbox::sys_stat(file_to_access, &sb) + : sandbox::sys_lstat(file_to_access, &sb); if (sts < 0) { RAW_CHECK(reply->AddIntToMessage(-errno)); return; @@ -205,10 +209,12 @@ RAW_CHECK( reply->AddDataToMessage(reinterpret_cast(&sb), sizeof(sb))); } else { +#if defined(__NR_fstatat64) DCHECK(command_type == COMMAND_STAT64); - struct stat64 sb; - int sts = follow_links ? stat64(file_to_access, &sb) - : lstat64(file_to_access, &sb); + struct kernel_stat64 sb; + + int sts = sandbox::sys_fstatat64(AT_FDCWD, file_to_access, &sb, + follow_links ? 0 : AT_SYMLINK_NOFOLLOW); if (sts < 0) { RAW_CHECK(reply->AddIntToMessage(-errno)); return; @@ -216,6 +222,11 @@ RAW_CHECK(reply->AddIntToMessage(0)); RAW_CHECK( reply->AddDataToMessage(reinterpret_cast(&sb), sizeof(sb))); +#else // defined(__NR_fstatat64) + // We should not reach here on 64-bit systems, as the *stat*64() are only + // necessary on 32-bit. 
+  RAW_CHECK(false);
+#endif
   }
 }
 
diff -Naur a/src/3rdparty/chromium/sandbox/linux/syscall_broker/broker_process.cc b/src/3rdparty/chromium/sandbox/linux/syscall_broker/broker_process.cc
--- a/src/3rdparty/chromium/sandbox/linux/syscall_broker/broker_process.cc	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/sandbox/linux/syscall_broker/broker_process.cc	2021-11-20 03:38:08.162871101 +0000
@@ -122,44 +122,49 @@
 }
 
 bool BrokerProcess::IsSyscallBrokerable(int sysno, bool fast_check) const {
+  // The syscalls unavailable on aarch64 are all blocked by Android's default
+  // seccomp policy, even on non-aarch64 architectures. That is, the syscalls
+  // XX() that have corresponding XXat() versions are typically unavailable on
+  // aarch64 and are disabled by default on Android, so we should refuse to
+  // broker them, to be consistent with the platform's restrictions.
   switch (sysno) {
-#if !defined(__aarch64__)
+#if !defined(__aarch64__) && !defined(OS_ANDROID)
     case __NR_access:
 #endif
     case __NR_faccessat:
       return !fast_check || allowed_command_set_.test(COMMAND_ACCESS);
 
-#if !defined(__aarch64__)
+#if !defined(__aarch64__) && !defined(OS_ANDROID)
     case __NR_mkdir:
 #endif
     case __NR_mkdirat:
       return !fast_check || allowed_command_set_.test(COMMAND_MKDIR);
 
-#if !defined(__aarch64__)
+#if !defined(__aarch64__) && !defined(OS_ANDROID)
    case __NR_open:
 #endif
     case __NR_openat:
       return !fast_check || allowed_command_set_.test(COMMAND_OPEN);
 
-#if !defined(__aarch64__)
+#if !defined(__aarch64__) && !defined(OS_ANDROID)
     case __NR_readlink:
 #endif
     case __NR_readlinkat:
       return !fast_check || allowed_command_set_.test(COMMAND_READLINK);
 
-#if !defined(__aarch64__)
+#if !defined(__aarch64__) && !defined(OS_ANDROID)
     case __NR_rename:
 #endif
     case __NR_renameat:
    case __NR_renameat2:
       return !fast_check || allowed_command_set_.test(COMMAND_RENAME);
 
-#if !defined(__aarch64__)
+#if !defined(__aarch64__) && !defined(OS_ANDROID)
     case __NR_rmdir:
       return !fast_check ||
allowed_command_set_.test(COMMAND_RMDIR); #endif -#if !defined(__aarch64__) +#if !defined(__aarch64__) && !defined(OS_ANDROID) case __NR_stat: case __NR_lstat: #endif @@ -184,7 +189,7 @@ return !fast_check || allowed_command_set_.test(COMMAND_STAT); #endif -#if !defined(__aarch64__) +#if !defined(__aarch64__) && !defined(OS_ANDROID) case __NR_unlink: return !fast_check || allowed_command_set_.test(COMMAND_UNLINK); #endif diff -Naur a/src/3rdparty/chromium/sandbox/linux/syscall_broker/broker_process_unittest.cc b/src/3rdparty/chromium/sandbox/linux/syscall_broker/broker_process_unittest.cc --- a/src/3rdparty/chromium/sandbox/linux/syscall_broker/broker_process_unittest.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/linux/syscall_broker/broker_process_unittest.cc 2021-11-20 03:38:08.162871101 +0000 @@ -811,7 +811,7 @@ const char* bad_leading_path5 = "/mbogo/fictitioux"; const char* bad_leading_path6 = "/mbogo/fictitiousa"; - struct stat sb; + default_stat_struct sb; { // Actual file with permissions to see file but command not allowed. 
@@ -824,7 +824,7 @@ memset(&sb, 0, sizeof(sb)); EXPECT_EQ(-kFakeErrnoSentinel, - open_broker.GetBrokerClientSignalBased()->Stat( + open_broker.GetBrokerClientSignalBased()->DefaultStatForTesting( tempfile_name, follow_links, &sb)); } @@ -840,7 +840,7 @@ memset(&sb, 0, sizeof(sb)); EXPECT_EQ(-kFakeErrnoSentinel, - open_broker.GetBrokerClientSignalBased()->Stat( + open_broker.GetBrokerClientSignalBased()->DefaultStatForTesting( nonesuch_name, follow_links, &sb)); } { @@ -852,7 +852,7 @@ memset(&sb, 0, sizeof(sb)); EXPECT_EQ(-kFakeErrnoSentinel, - open_broker.GetBrokerClientSignalBased()->Stat( + open_broker.GetBrokerClientSignalBased()->DefaultStatForTesting( tempfile_name, follow_links, &sb)); } { @@ -864,38 +864,39 @@ ASSERT_TRUE(open_broker.Init(base::BindOnce(&NoOpCallback))); memset(&sb, 0, sizeof(sb)); - EXPECT_EQ(-ENOENT, open_broker.GetBrokerClientSignalBased()->Stat( - nonesuch_name, follow_links, &sb)); + EXPECT_EQ(-ENOENT, + open_broker.GetBrokerClientSignalBased()->DefaultStatForTesting( + nonesuch_name, follow_links, &sb)); // Gets denied all the way back to root since no create permission. EXPECT_EQ(-kFakeErrnoSentinel, - open_broker.GetBrokerClientSignalBased()->Stat( + open_broker.GetBrokerClientSignalBased()->DefaultStatForTesting( leading_path1, follow_links, &sb)); EXPECT_EQ(-kFakeErrnoSentinel, - open_broker.GetBrokerClientSignalBased()->Stat( + open_broker.GetBrokerClientSignalBased()->DefaultStatForTesting( leading_path2, follow_links, &sb)); EXPECT_EQ(-kFakeErrnoSentinel, - open_broker.GetBrokerClientSignalBased()->Stat( + open_broker.GetBrokerClientSignalBased()->DefaultStatForTesting( leading_path3, follow_links, &sb)); // Not fooled by substrings. 
EXPECT_EQ(-kFakeErrnoSentinel, - open_broker.GetBrokerClientSignalBased()->Stat( + open_broker.GetBrokerClientSignalBased()->DefaultStatForTesting( bad_leading_path1, follow_links, &sb)); EXPECT_EQ(-kFakeErrnoSentinel, - open_broker.GetBrokerClientSignalBased()->Stat( + open_broker.GetBrokerClientSignalBased()->DefaultStatForTesting( bad_leading_path2, follow_links, &sb)); EXPECT_EQ(-kFakeErrnoSentinel, - open_broker.GetBrokerClientSignalBased()->Stat( + open_broker.GetBrokerClientSignalBased()->DefaultStatForTesting( bad_leading_path3, follow_links, &sb)); EXPECT_EQ(-kFakeErrnoSentinel, - open_broker.GetBrokerClientSignalBased()->Stat( + open_broker.GetBrokerClientSignalBased()->DefaultStatForTesting( bad_leading_path4, follow_links, &sb)); EXPECT_EQ(-kFakeErrnoSentinel, - open_broker.GetBrokerClientSignalBased()->Stat( + open_broker.GetBrokerClientSignalBased()->DefaultStatForTesting( bad_leading_path5, follow_links, &sb)); EXPECT_EQ(-kFakeErrnoSentinel, - open_broker.GetBrokerClientSignalBased()->Stat( + open_broker.GetBrokerClientSignalBased()->DefaultStatForTesting( bad_leading_path6, follow_links, &sb)); } { @@ -907,37 +908,41 @@ ASSERT_TRUE(open_broker.Init(base::BindOnce(&NoOpCallback))); memset(&sb, 0, sizeof(sb)); - EXPECT_EQ(-ENOENT, open_broker.GetBrokerClientSignalBased()->Stat( - nonesuch_name, follow_links, &sb)); + EXPECT_EQ(-ENOENT, + open_broker.GetBrokerClientSignalBased()->DefaultStatForTesting( + nonesuch_name, follow_links, &sb)); // Gets ENOENT all the way back to root since it has create permission. 
- EXPECT_EQ(-ENOENT, open_broker.GetBrokerClientSignalBased()->Stat( - leading_path1, follow_links, &sb)); - EXPECT_EQ(-ENOENT, open_broker.GetBrokerClientSignalBased()->Stat( - leading_path2, follow_links, &sb)); + EXPECT_EQ(-ENOENT, + open_broker.GetBrokerClientSignalBased()->DefaultStatForTesting( + leading_path1, follow_links, &sb)); + EXPECT_EQ(-ENOENT, + open_broker.GetBrokerClientSignalBased()->DefaultStatForTesting( + leading_path2, follow_links, &sb)); // But can always get the root. - EXPECT_EQ(0, open_broker.GetBrokerClientSignalBased()->Stat( - leading_path3, follow_links, &sb)); + EXPECT_EQ(0, + open_broker.GetBrokerClientSignalBased()->DefaultStatForTesting( + leading_path3, follow_links, &sb)); // Not fooled by substrings. EXPECT_EQ(-kFakeErrnoSentinel, - open_broker.GetBrokerClientSignalBased()->Stat( + open_broker.GetBrokerClientSignalBased()->DefaultStatForTesting( bad_leading_path1, follow_links, &sb)); EXPECT_EQ(-kFakeErrnoSentinel, - open_broker.GetBrokerClientSignalBased()->Stat( + open_broker.GetBrokerClientSignalBased()->DefaultStatForTesting( bad_leading_path2, follow_links, &sb)); EXPECT_EQ(-kFakeErrnoSentinel, - open_broker.GetBrokerClientSignalBased()->Stat( + open_broker.GetBrokerClientSignalBased()->DefaultStatForTesting( bad_leading_path3, follow_links, &sb)); EXPECT_EQ(-kFakeErrnoSentinel, - open_broker.GetBrokerClientSignalBased()->Stat( + open_broker.GetBrokerClientSignalBased()->DefaultStatForTesting( bad_leading_path4, follow_links, &sb)); EXPECT_EQ(-kFakeErrnoSentinel, - open_broker.GetBrokerClientSignalBased()->Stat( + open_broker.GetBrokerClientSignalBased()->DefaultStatForTesting( bad_leading_path5, follow_links, &sb)); EXPECT_EQ(-kFakeErrnoSentinel, - open_broker.GetBrokerClientSignalBased()->Stat( + open_broker.GetBrokerClientSignalBased()->DefaultStatForTesting( bad_leading_path6, follow_links, &sb)); } { @@ -949,8 +954,9 @@ ASSERT_TRUE(open_broker.Init(base::BindOnce(&NoOpCallback))); memset(&sb, 0, sizeof(sb)); - 
EXPECT_EQ(0, open_broker.GetBrokerClientSignalBased()->Stat( - tempfile_name, follow_links, &sb)); + EXPECT_EQ(0, + open_broker.GetBrokerClientSignalBased()->DefaultStatForTesting( + tempfile_name, follow_links, &sb)); // Following fields may never be consistent but should be non-zero. // Don't trust the platform to define fields with any particular sign. @@ -968,9 +974,9 @@ EXPECT_EQ(12, sb.st_size); // Can't go backwards in time, 1500000000 was some time ago. - EXPECT_LT(1500000000u, static_cast(sb.st_atime)); - EXPECT_LT(1500000000u, static_cast(sb.st_mtime)); - EXPECT_LT(1500000000u, static_cast(sb.st_ctime)); + EXPECT_LT(1500000000u, static_cast(sb.st_atime_)); + EXPECT_LT(1500000000u, static_cast(sb.st_mtime_)); + EXPECT_LT(1500000000u, static_cast(sb.st_ctime_)); } } @@ -1590,52 +1596,52 @@ const base::flat_map> kSysnosForCommand = { {COMMAND_ACCESS, {__NR_faccessat, -#if defined(__NR_access) +#if defined(__NR_access) && !defined(OS_ANDROID) __NR_access #endif }}, {COMMAND_MKDIR, {__NR_mkdirat, -#if defined(__NR_mkdir) +#if defined(__NR_mkdir) && !defined(OS_ANDROID) __NR_mkdir #endif }}, {COMMAND_OPEN, {__NR_openat, -#if defined(__NR_open) +#if defined(__NR_open) && !defined(OS_ANDROID) __NR_open #endif }}, {COMMAND_READLINK, {__NR_readlinkat, -#if defined(__NR_readlink) +#if defined(__NR_readlink) && !defined(OS_ANDROID) __NR_readlink #endif }}, {COMMAND_RENAME, {__NR_renameat, -#if defined(__NR_rename) +#if defined(__NR_rename) && !defined(OS_ANDROID) __NR_rename #endif }}, {COMMAND_UNLINK, {__NR_unlinkat, -#if defined(__NR_unlink) +#if defined(__NR_unlink) && !defined(OS_ANDROID) __NR_unlink #endif }}, {COMMAND_RMDIR, {__NR_unlinkat, -#if defined(__NR_rmdir) +#if defined(__NR_rmdir) && !defined(OS_ANDROID) __NR_rmdir #endif }}, {COMMAND_STAT, { -#if defined(__NR_stat) +#if defined(__NR_stat) && !defined(OS_ANDROID) __NR_stat, #endif -#if defined(__NR_lstat) +#if defined(__NR_lstat) && !defined(OS_ANDROID) __NR_lstat, #endif #if defined(__NR_fstatat) diff 
-Naur a/src/3rdparty/chromium/sandbox/linux/syscall_broker/DEPS b/src/3rdparty/chromium/sandbox/linux/syscall_broker/DEPS --- a/src/3rdparty/chromium/sandbox/linux/syscall_broker/DEPS 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/linux/syscall_broker/DEPS 2021-11-20 03:37:57.059043138 +0000 @@ -1,4 +1,5 @@ include_rules = [ - "+sandbox/linux/system_headers", "+sandbox/linux/bpf_dsl", + "+sandbox/linux/services", + "+sandbox/linux/system_headers", ] diff -Naur a/src/3rdparty/chromium/sandbox/linux/syscall_broker/remote_syscall_arg_handler_unittest.cc b/src/3rdparty/chromium/sandbox/linux/syscall_broker/remote_syscall_arg_handler_unittest.cc --- a/src/3rdparty/chromium/sandbox/linux/syscall_broker/remote_syscall_arg_handler_unittest.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/linux/syscall_broker/remote_syscall_arg_handler_unittest.cc 2021-11-20 03:37:57.060043123 +0000 @@ -17,6 +17,7 @@ #include "base/posix/unix_domain_socket.h" #include "base/process/process_metrics.h" #include "base/test/bind_test_util.h" +#include "sandbox/linux/tests/test_utils.h" #include "sandbox/linux/tests/unit_tests.h" #include "testing/gtest/include/gtest/gtest.h" @@ -53,19 +54,6 @@ } } -void* MapPagesOrDie(size_t num_pages) { - void* addr = mmap(nullptr, num_pages * base::GetPageSize(), - PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); - PCHECK(addr); - return addr; -} - -void MprotectLastPageOrDie(char* addr, size_t num_pages) { - size_t last_page_offset = (num_pages - 1) * base::GetPageSize(); - PCHECK(mprotect(addr + last_page_offset, base::GetPageSize(), PROT_NONE) >= - 0); -} - pid_t ForkWaitingChild(base::OnceCallback after_parent_signals_callback = base::DoNothing(), base::ScopedFD* parent_sync_fd = nullptr) { @@ -106,13 +94,13 @@ size_t total_pages = (test_config.start_at + test_config.total_size + base::GetPageSize() - 1) / base::GetPageSize(); - char* mmap_addr = static_cast(MapPagesOrDie(total_pages)); + 
char* mmap_addr = static_cast(TestUtils::MapPagesOrDie(total_pages)); char* addr = mmap_addr + test_config.start_at; FillBufferWithPath(addr, test_config.total_size, test_config.include_null_byte); if (test_config.last_page_inaccessible) - MprotectLastPageOrDie(mmap_addr, total_pages); + TestUtils::MprotectLastPageOrDie(mmap_addr, total_pages); pid_t pid = ForkWaitingChild(); munmap(mmap_addr, base::GetPageSize() * total_pages); @@ -213,7 +201,7 @@ } SANDBOX_TEST(BrokerRemoteSyscallArgHandler, ReadChildExited) { - void* addr = MapPagesOrDie(1); + void* addr = TestUtils::MapPagesOrDie(1); FillBufferWithPath(static_cast(addr), strlen(kPathPart) + 1, true); base::ScopedFD parent_sync, child_sync; @@ -241,10 +229,10 @@ } SANDBOX_TEST(BrokerRemoteSyscallArgHandler, BasicWrite) { - void* read_from = MapPagesOrDie(1); + void* read_from = TestUtils::MapPagesOrDie(1); const size_t write_size = base::GetPageSize(); FillBufferWithPath(static_cast(read_from), write_size, false); - char* write_to = static_cast(MapPagesOrDie(1)); + char* write_to = static_cast(TestUtils::MapPagesOrDie(1)); base::ScopedFD parent_signal_fd; const std::vector empty_fd_vec; @@ -279,8 +267,8 @@ } SANDBOX_TEST(BrokerRemoteSyscallArgHandler, WriteToInvalidAddress) { - char* write_to = static_cast(MapPagesOrDie(1)); - MprotectLastPageOrDie(write_to, 1); + char* write_to = static_cast(TestUtils::MapPagesOrDie(1)); + TestUtils::MprotectLastPageOrDie(write_to, 1); base::ScopedFD parent_signal_fd; const std::vector empty_fd_vec; @@ -296,11 +284,11 @@ } SANDBOX_TEST(BrokerRemoteSyscallArgHandler, WritePartiallyToInvalidAddress) { - char* read_from = static_cast(MapPagesOrDie(2)); + char* read_from = static_cast(TestUtils::MapPagesOrDie(2)); const size_t write_size = base::GetPageSize(); FillBufferWithPath(static_cast(read_from), write_size, false); - char* write_to = static_cast(MapPagesOrDie(2)); - MprotectLastPageOrDie(write_to, 2); + char* write_to = static_cast(TestUtils::MapPagesOrDie(2)); + 
TestUtils::MprotectLastPageOrDie(write_to, 2); write_to += base::GetPageSize() / 2; base::ScopedFD parent_signal_fd; const std::vector empty_fd_vec; @@ -315,7 +303,7 @@ } SANDBOX_TEST(BrokerRemoteSyscallArgHandler, WriteChildExited) { - char* addr = static_cast(MapPagesOrDie(1)); + char* addr = static_cast(TestUtils::MapPagesOrDie(1)); FillBufferWithPath(static_cast(addr), strlen(kPathPart) + 1, true); base::ScopedFD parent_sync, child_sync; diff -Naur a/src/3rdparty/chromium/sandbox/linux/syscall_broker/syscall_dispatcher.cc b/src/3rdparty/chromium/sandbox/linux/syscall_broker/syscall_dispatcher.cc --- a/src/3rdparty/chromium/sandbox/linux/syscall_broker/syscall_dispatcher.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/linux/syscall_broker/syscall_dispatcher.cc 2021-11-20 03:37:57.060043123 +0000 @@ -19,8 +19,18 @@ #define BROKER_UNPOISON_STRING(x) #endif +int SyscallDispatcher::DefaultStatForTesting(const char* pathname, + bool follow_links, + default_stat_struct* sb) { +#if defined(__NR_fstatat64) + return Stat64(pathname, follow_links, sb); +#elif defined(__NR_newfstatat) + return Stat(pathname, follow_links, sb); +#endif +} + int SyscallDispatcher::PerformStatat(const arch_seccomp_data& args, - bool arch64) { + bool stat64) { if (static_cast(args.args[0]) != AT_FDCWD) return -EPERM; // Only allow the AT_SYMLINK_NOFOLLOW flag which is used by some libc @@ -30,13 +40,29 @@ const bool follow_links = !(static_cast(args.args[3]) & AT_SYMLINK_NOFOLLOW); - if (arch64) { + if (stat64) { return Stat64(reinterpret_cast(args.args[1]), follow_links, - reinterpret_cast(args.args[2])); + reinterpret_cast(args.args[2])); } return Stat(reinterpret_cast(args.args[1]), follow_links, - reinterpret_cast(args.args[2])); + reinterpret_cast(args.args[2])); +} + +int SyscallDispatcher::PerformUnlinkat(const arch_seccomp_data& args) { + if (static_cast(args.args[0]) != AT_FDCWD) + return -EPERM; + + int flags = static_cast(args.args[2]); + + if (flags == 
AT_REMOVEDIR) { + return Rmdir(reinterpret_cast(args.args[1])); + } + + if (flags != 0) + return -EPERM; + + return Unlink(reinterpret_cast(args.args[1])); } int SyscallDispatcher::DispatchSyscall(const arch_seccomp_data& args) { @@ -127,59 +153,42 @@ #if defined(__NR_stat) case __NR_stat: return Stat(reinterpret_cast(args.args[0]), true, - reinterpret_cast(args.args[1])); + reinterpret_cast(args.args[1])); #endif #if defined(__NR_stat64) case __NR_stat64: return Stat64(reinterpret_cast(args.args[0]), true, - reinterpret_cast(args.args[1])); + reinterpret_cast(args.args[1])); #endif #if defined(__NR_lstat) case __NR_lstat: // See https://crbug.com/847096 BROKER_UNPOISON_STRING(reinterpret_cast(args.args[0])); return Stat(reinterpret_cast(args.args[0]), false, - reinterpret_cast(args.args[1])); + reinterpret_cast(args.args[1])); #endif #if defined(__NR_lstat64) case __NR_lstat64: // See https://crbug.com/847096 BROKER_UNPOISON_STRING(reinterpret_cast(args.args[0])); return Stat64(reinterpret_cast(args.args[0]), false, - reinterpret_cast(args.args[1])); -#endif -#if defined(__NR_fstatat) - case __NR_fstatat: - return PerformStatat(args, /*arch64=*/false); + reinterpret_cast(args.args[1])); #endif #if defined(__NR_fstatat64) case __NR_fstatat64: - return PerformStatat(args, /*arch64=*/true); + return PerformStatat(args, /*stat64=*/true); #endif #if defined(__NR_newfstatat) case __NR_newfstatat: - return PerformStatat(args, /*arch64=*/false); + return PerformStatat(args, /*stat64=*/false); #endif #if defined(__NR_unlink) case __NR_unlink: return Unlink(reinterpret_cast(args.args[0])); #endif #if defined(__NR_unlinkat) - case __NR_unlinkat: { - if (static_cast(args.args[0]) != AT_FDCWD) - return -EPERM; - - int flags = static_cast(args.args[2]); - - if (flags == AT_REMOVEDIR) { - return Rmdir(reinterpret_cast(args.args[1])); - } - - if (flags != 0) - return -EPERM; - - return Unlink(reinterpret_cast(args.args[1])); - } + case __NR_unlinkat: + return 
PerformUnlinkat(args); #endif // defined(__NR_unlinkat) default: RAW_CHECK(false); diff -Naur a/src/3rdparty/chromium/sandbox/linux/syscall_broker/syscall_dispatcher.h b/src/3rdparty/chromium/sandbox/linux/syscall_broker/syscall_dispatcher.h --- a/src/3rdparty/chromium/sandbox/linux/syscall_broker/syscall_dispatcher.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/linux/syscall_broker/syscall_dispatcher.h 2021-11-20 03:37:57.060043123 +0000 @@ -9,13 +9,15 @@ #include #include "sandbox/linux/system_headers/linux_seccomp.h" +#include "sandbox/linux/system_headers/linux_stat.h" +#include "sandbox/sandbox_export.h" namespace sandbox { namespace syscall_broker { // An abstract class that defines all the system calls we perform for the // sandboxed process. -class SyscallDispatcher { +class SANDBOX_EXPORT SyscallDispatcher { public: // Emulates access()/faccessat(). // X_OK will always return an error in practice since the broker process @@ -40,19 +42,34 @@ virtual int Rmdir(const char* path) const = 0; // Emulates stat()/stat64()/lstat()/lstat64()/fstatat()/newfstatat(). + // Stat64 is only available on 32-bit systems. virtual int Stat(const char* pathname, bool follow_links, - struct stat* sb) const = 0; + struct kernel_stat* sb) const = 0; virtual int Stat64(const char* pathname, bool follow_links, - struct stat64* sb) const = 0; + struct kernel_stat64* sb) const = 0; // Emulates unlink()/unlinkat(). virtual int Unlink(const char* unlink) const = 0; + // Different architectures use a different syscall from the stat family by + // default in glibc. E.g. 32-bit systems use *stat*64() and fill out struct + // kernel_stat64, whereas 64-bit systems use *stat*() and fill out struct + // kernel_stat. Some tests want to call the SyscallDispatcher directly, and + // should be using the default stat in order to test against glibc. 
+ int DefaultStatForTesting(const char* pathname, + bool follow_links, + default_stat_struct* sb); + // Validates the args passed to a *statat*() syscall and performs the syscall - // using Stat() or Stat64(). - int PerformStatat(const arch_seccomp_data& args, bool arch64); + // using Stat(), or on 32-bit systems it uses Stat64() for the *statat64() + // syscalls. + int PerformStatat(const arch_seccomp_data& args, bool stat64); + + // Validates the args passed to an unlinkat() syscall and performs the syscall + // using either Unlink() or Rmdir(). + int PerformUnlinkat(const arch_seccomp_data& args); // Reads the syscall number and arguments, imposes some policy (e.g. the *at() // system calls must only allow AT_FDCWD as the first argument), and diff -Naur a/src/3rdparty/chromium/sandbox/linux/system_headers/arm64_linux_syscalls.h b/src/3rdparty/chromium/sandbox/linux/system_headers/arm64_linux_syscalls.h --- a/src/3rdparty/chromium/sandbox/linux/system_headers/arm64_linux_syscalls.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/linux/system_headers/arm64_linux_syscalls.h 2021-11-20 03:42:53.890325579 +0000 @@ -1119,4 +1119,100 @@ #define __NR_rseq 293 #endif +#if !defined(__NR_kexec_file_load) +#define __NR_kexec_file_load 294 +#endif + +#if !defined(__NR_pidfd_send_signal) +#define __NR_pidfd_send_signal 424 +#endif + +#if !defined(__NR_io_uring_setup) +#define __NR_io_uring_setup 425 +#endif + +#if !defined(__NR_io_uring_enter) +#define __NR_io_uring_enter 426 +#endif + +#if !defined(__NR_io_uring_register) +#define __NR_io_uring_register 427 +#endif + +#if !defined(__NR_open_tree) +#define __NR_open_tree 428 +#endif + +#if !defined(__NR_move_mount) +#define __NR_move_mount 429 +#endif + +#if !defined(__NR_fsopen) +#define __NR_fsopen 430 +#endif + +#if !defined(__NR_fsconfig) +#define __NR_fsconfig 431 +#endif + +#if !defined(__NR_fsmount) +#define __NR_fsmount 432 +#endif + +#if !defined(__NR_fspick) +#define __NR_fspick 433 +#endif + 
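Each of the `#if !defined(__NR_...)` guards added here exists so the seccomp policy can name a syscall even when the build machine's kernel headers predate it; the same pattern is usable in ordinary code. An illustrative sketch, not part of the patch (`PidfdOpen` is an invented wrapper; 434 is the real pidfd_open number on the unified post-424 tables, defined per-architecture in these headers):

```cpp
#include <cassert>
#include <cerrno>
#include <sys/syscall.h>
#include <unistd.h>

// Supply the syscall number ourselves if the toolchain's headers are too old
// to know about pidfd_open (the value is fixed per architecture).
#if !defined(__NR_pidfd_open)
#define __NR_pidfd_open 434
#endif

// Returns a pidfd for |pid|, or -1 with errno set. Kernels older than 5.3
// report ENOSYS, which is exactly why availability must be probed at runtime
// rather than at compile time.
int PidfdOpen(pid_t pid) {
  return static_cast<int>(syscall(__NR_pidfd_open, pid, 0u));
}
```

Defining the number unconditionally is safe: the kernel, not the headers, decides whether the syscall exists, and a sandbox policy can still deny or allow it by number either way.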
+#if !defined(__NR_pidfd_open) +#define __NR_pidfd_open 434 +#endif + +#if !defined(__NR_clone3) +#define __NR_clone3 435 +#endif + +#if !defined(__NR_close_range) +#define __NR_close_range 436 +#endif + +#if !defined(__NR_openat2) +#define __NR_openat2 437 +#endif + +#if !defined(__NR_pidfd_getfd) +#define __NR_pidfd_getfd 438 +#endif + +#if !defined(__NR_faccessat2) +#define __NR_faccessat2 439 +#endif + +#if !defined(__NR_process_madvise) +#define __NR_process_madvise 440 +#endif + +#if !defined(__NR_epoll_pwait2) +#define __NR_epoll_pwait2 441 +#endif + +#if !defined(__NR_mount_setattr) +#define __NR_mount_setattr 442 +#endif + +#if !defined(__NR_quotactl_path) +#define __NR_quotactl_path 443 +#endif + +#if !defined(__NR_landlock_create_ruleset) +#define __NR_landlock_create_ruleset 444 +#endif + +#if !defined(__NR_landlock_add_rule) +#define __NR_landlock_add_rule 445 +#endif + +#if !defined(__NR_landlock_restrict_self) +#define __NR_landlock_restrict_self 446 +#endif + #endif // SANDBOX_LINUX_SYSTEM_HEADERS_ARM64_LINUX_SYSCALLS_H_ diff -Naur a/src/3rdparty/chromium/sandbox/linux/system_headers/arm_linux_syscalls.h b/src/3rdparty/chromium/sandbox/linux/system_headers/arm_linux_syscalls.h --- a/src/3rdparty/chromium/sandbox/linux/system_headers/arm_linux_syscalls.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/linux/system_headers/arm_linux_syscalls.h 2021-11-20 03:43:21.953881146 +0000 @@ -1441,6 +1441,182 @@ #define __NR_io_pgetevents (__NR_SYSCALL_BASE+399) #endif +#if !defined(__NR_landlock_create_ruleset) +#define __NR_landlock_create_ruleset (__NR_SYSCALL_BASE + 444) +#endif + +#if !defined(__NR_landlock_add_rule) +#define __NR_landlock_add_rule (__NR_SYSCALL_BASE + 445) +#endif + +#if !defined(__NR_landlock_restrict_self) +#define __NR_landlock_restrict_self (__NR_SYSCALL_BASE + 446) +#endif + +#if !defined(__NR_migrate_pages) +#define __NR_migrate_pages (__NR_SYSCALL_BASE + 400) +#endif + +#if !defined(__NR_kexec_file_load) 
+#define __NR_kexec_file_load (__NR_SYSCALL_BASE + 401) +#endif + +#if !defined(__NR_clock_gettime64) +#define __NR_clock_gettime64 (__NR_SYSCALL_BASE + 403) +#endif + +#if !defined(__NR_clock_settime64) +#define __NR_clock_settime64 (__NR_SYSCALL_BASE + 404) +#endif + +#if !defined(__NR_clock_adjtime64) +#define __NR_clock_adjtime64 (__NR_SYSCALL_BASE + 405) +#endif + +#if !defined(__NR_clock_getres_time64) +#define __NR_clock_getres_time64 (__NR_SYSCALL_BASE + 406) +#endif + +#if !defined(__NR_clock_nanosleep_time64) +#define __NR_clock_nanosleep_time64 (__NR_SYSCALL_BASE + 407) +#endif + +#if !defined(__NR_timer_gettime64) +#define __NR_timer_gettime64 (__NR_SYSCALL_BASE + 408) +#endif + +#if !defined(__NR_timer_settime64) +#define __NR_timer_settime64 (__NR_SYSCALL_BASE + 409) +#endif + +#if !defined(__NR_timerfd_gettime64) +#define __NR_timerfd_gettime64 (__NR_SYSCALL_BASE + 410) +#endif + +#if !defined(__NR_timerfd_settime64) +#define __NR_timerfd_settime64 (__NR_SYSCALL_BASE + 411) +#endif + +#if !defined(__NR_utimensat_time64) +#define __NR_utimensat_time64 (__NR_SYSCALL_BASE + 412) +#endif + +#if !defined(__NR_pselect6_time64) +#define __NR_pselect6_time64 (__NR_SYSCALL_BASE + 413) +#endif + +#if !defined(__NR_ppoll_time64) +#define __NR_ppoll_time64 (__NR_SYSCALL_BASE + 414) +#endif + +#if !defined(__NR_io_pgetevents_time64) +#define __NR_io_pgetevents_time64 (__NR_SYSCALL_BASE + 416) +#endif + +#if !defined(__NR_recvmmsg_time64) +#define __NR_recvmmsg_time64 (__NR_SYSCALL_BASE + 417) +#endif + +#if !defined(__NR_mq_timedsend_time64) +#define __NR_mq_timedsend_time64 (__NR_SYSCALL_BASE + 418) +#endif + +#if !defined(__NR_mq_timedreceive_time64) +#define __NR_mq_timedreceive_time64 (__NR_SYSCALL_BASE + 419) +#endif + +#if !defined(__NR_semtimedop_time64) +#define __NR_semtimedop_time64 (__NR_SYSCALL_BASE + 420) +#endif + +#if !defined(__NR_rt_sigtimedwait_time64) +#define __NR_rt_sigtimedwait_time64 (__NR_SYSCALL_BASE + 421) +#endif + +#if 
!defined(__NR_futex_time64) +#define __NR_futex_time64 (__NR_SYSCALL_BASE + 422) +#endif + +#if !defined(__NR_sched_rr_get_interval_time64) +#define __NR_sched_rr_get_interval_time64 (__NR_SYSCALL_BASE + 423) +#endif + +#if !defined(__NR_pidfd_send_signal) +#define __NR_pidfd_send_signal (__NR_SYSCALL_BASE + 424) +#endif + +#if !defined(__NR_io_uring_setup) +#define __NR_io_uring_setup (__NR_SYSCALL_BASE + 425) +#endif + +#if !defined(__NR_io_uring_enter) +#define __NR_io_uring_enter (__NR_SYSCALL_BASE + 426) +#endif + +#if !defined(__NR_io_uring_register) +#define __NR_io_uring_register (__NR_SYSCALL_BASE + 427) +#endif + +#if !defined(__NR_open_tree) +#define __NR_open_tree (__NR_SYSCALL_BASE + 428) +#endif + +#if !defined(__NR_move_mount) +#define __NR_move_mount (__NR_SYSCALL_BASE + 429) +#endif + +#if !defined(__NR_fsopen) +#define __NR_fsopen (__NR_SYSCALL_BASE + 430) +#endif + +#if !defined(__NR_fsconfig) +#define __NR_fsconfig (__NR_SYSCALL_BASE + 431) +#endif + +#if !defined(__NR_fsmount) +#define __NR_fsmount (__NR_SYSCALL_BASE + 432) +#endif + +#if !defined(__NR_fspick) +#define __NR_fspick (__NR_SYSCALL_BASE + 433) +#endif + +#if !defined(__NR_pidfd_open) +#define __NR_pidfd_open (__NR_SYSCALL_BASE + 434) +#endif + +#if !defined(__NR_clone3) +#define __NR_clone3 (__NR_SYSCALL_BASE + 435) +#endif + +#if !defined(__NR_close_range) +#define __NR_close_range (__NR_SYSCALL_BASE + 436) +#endif + +#if !defined(__NR_openat2) +#define __NR_openat2 (__NR_SYSCALL_BASE + 437) +#endif + +#if !defined(__NR_pidfd_getfd) +#define __NR_pidfd_getfd (__NR_SYSCALL_BASE + 438) +#endif + +#if !defined(__NR_faccessat2) +#define __NR_faccessat2 (__NR_SYSCALL_BASE + 439) +#endif + +#if !defined(__NR_process_madvise) +#define __NR_process_madvise (__NR_SYSCALL_BASE + 440) +#endif + +#if !defined(__NR_epoll_pwait2) +#define __NR_epoll_pwait2 (__NR_SYSCALL_BASE + 441) +#endif + +#if !defined(__NR_mount_setattr) +#define __NR_mount_setattr (__NR_SYSCALL_BASE + 442) +#endif + // ARM 
private syscalls.
#if !defined(__ARM_NR_BASE)
#define __ARM_NR_BASE (__NR_SYSCALL_BASE + 0xF0000)
diff -Naur a/src/3rdparty/chromium/sandbox/linux/system_headers/linux_stat.h b/src/3rdparty/chromium/sandbox/linux/system_headers/linux_stat.h
--- a/src/3rdparty/chromium/sandbox/linux/system_headers/linux_stat.h	1970-01-01 01:00:00.000000000 +0100
+++ b/src/3rdparty/chromium/sandbox/linux/system_headers/linux_stat.h	2021-11-20 03:38:08.162871101 +0000
@@ -0,0 +1,192 @@
+// Copyright 2021 The Chromium Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file.
+
+#ifndef SANDBOX_LINUX_SYSTEM_HEADERS_LINUX_STAT_H_
+#define SANDBOX_LINUX_SYSTEM_HEADERS_LINUX_STAT_H_
+
+#include <stdint.h>
+
+#include "build/build_config.h"
+#include "sandbox/linux/system_headers/linux_syscalls.h"
+
+#if defined(ARCH_CPU_MIPS_FAMILY)
+#if defined(ARCH_CPU_64_BITS)
+struct kernel_stat {
+#else
+struct kernel_stat64 {
+#endif
+  unsigned st_dev;
+  unsigned __pad0[3];
+  unsigned long long st_ino;
+  unsigned st_mode;
+  unsigned st_nlink;
+  unsigned st_uid;
+  unsigned st_gid;
+  unsigned st_rdev;
+  unsigned __pad1[3];
+  long long st_size;
+  unsigned st_atime_;
+  unsigned st_atime_nsec_;
+  unsigned st_mtime_;
+  unsigned st_mtime_nsec_;
+  unsigned st_ctime_;
+  unsigned st_ctime_nsec_;
+  unsigned st_blksize;
+  unsigned __pad2;
+  unsigned long long st_blocks;
+};
+#else
+struct kernel_stat64 {
+  unsigned long long st_dev;
+  unsigned char __pad0[4];
+  unsigned __st_ino;
+  unsigned st_mode;
+  unsigned st_nlink;
+  unsigned st_uid;
+  unsigned st_gid;
+  unsigned long long st_rdev;
+  unsigned char __pad3[4];
+  long long st_size;
+  unsigned st_blksize;
+  unsigned long long st_blocks;
+  unsigned st_atime_;
+  unsigned st_atime_nsec_;
+  unsigned st_mtime_;
+  unsigned st_mtime_nsec_;
+  unsigned st_ctime_;
+  unsigned st_ctime_nsec_;
+  unsigned long long st_ino;
+};
+#endif
+
+#if defined(__i386__) || defined(__ARM_ARCH_3__) ||
defined(__ARM_EABI__) +struct kernel_stat { + /* The kernel headers suggest that st_dev and st_rdev should be 32bit + * quantities encoding 12bit major and 20bit minor numbers in an interleaved + * format. In reality, we do not see useful data in the top bits. So, + * we'll leave the padding in here, until we find a better solution. + */ + unsigned short st_dev; + short pad1; + unsigned st_ino; + unsigned short st_mode; + unsigned short st_nlink; + unsigned short st_uid; + unsigned short st_gid; + unsigned short st_rdev; + short pad2; + unsigned st_size; + unsigned st_blksize; + unsigned st_blocks; + unsigned st_atime_; + unsigned st_atime_nsec_; + unsigned st_mtime_; + unsigned st_mtime_nsec_; + unsigned st_ctime_; + unsigned st_ctime_nsec_; + unsigned __unused4; + unsigned __unused5; +}; +#elif defined(__x86_64__) +struct kernel_stat { + uint64_t st_dev; + uint64_t st_ino; + uint64_t st_nlink; + unsigned st_mode; + unsigned st_uid; + unsigned st_gid; + unsigned __pad0; + uint64_t st_rdev; + int64_t st_size; + int64_t st_blksize; + int64_t st_blocks; + uint64_t st_atime_; + uint64_t st_atime_nsec_; + uint64_t st_mtime_; + uint64_t st_mtime_nsec_; + uint64_t st_ctime_; + uint64_t st_ctime_nsec_; + int64_t __unused4[3]; +}; +#elif (defined(ARCH_CPU_MIPS_FAMILY) && defined(ARCH_CPU_32_BITS)) +struct kernel_stat { + unsigned st_dev; + int st_pad1[3]; + unsigned st_ino; + unsigned st_mode; + unsigned st_nlink; + unsigned st_uid; + unsigned st_gid; + unsigned st_rdev; + int st_pad2[2]; + long st_size; + int st_pad3; + long st_atime_; + long st_atime_nsec_; + long st_mtime_; + long st_mtime_nsec_; + long st_ctime_; + long st_ctime_nsec_; + int st_blksize; + int st_blocks; + int st_pad4[14]; +}; +#elif defined(__aarch64__) +struct kernel_stat { + unsigned long st_dev; + unsigned long st_ino; + unsigned int st_mode; + unsigned int st_nlink; + unsigned int st_uid; + unsigned int st_gid; + unsigned long st_rdev; + unsigned long __pad1; + long st_size; + int st_blksize; + int 
__pad2; + long st_blocks; + long st_atime_; + unsigned long st_atime_nsec_; + long st_mtime_; + unsigned long st_mtime_nsec_; + long st_ctime_; + unsigned long st_ctime_nsec_; + unsigned int __unused4; + unsigned int __unused5; +}; +#endif + +#if !defined(AT_EMPTY_PATH) +#define AT_EMPTY_PATH 0x1000 +#endif + +// On 32-bit systems, we default to the 64-bit stat struct like libc +// implementations do. Otherwise we default to the normal stat struct which is +// already 64-bit. +// These defines make it easy to call the right syscall to fill out a 64-bit +// stat struct, which is the default in libc implementations but requires +// different syscall names on 32 and 64-bit platforms. +#if defined(__NR_fstatat64) + +namespace sandbox { +using default_stat_struct = struct kernel_stat64; +} // namespace sandbox + +#define __NR_fstatat_default __NR_fstatat64 +#define __NR_fstat_default __NR_fstat64 + +#elif defined(__NR_newfstatat) + +namespace sandbox { +using default_stat_struct = struct kernel_stat; +} // namespace sandbox + +#define __NR_fstatat_default __NR_newfstatat +#define __NR_fstat_default __NR_fstat + +#else +#error "one of fstatat64 and newfstatat must be defined" +#endif + +#endif // SANDBOX_LINUX_SYSTEM_HEADERS_LINUX_STAT_H_ diff -Naur a/src/3rdparty/chromium/sandbox/linux/system_headers/linux_time.h b/src/3rdparty/chromium/sandbox/linux/system_headers/linux_time.h --- a/src/3rdparty/chromium/sandbox/linux/system_headers/linux_time.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/linux/system_headers/linux_time.h 2021-11-20 03:37:57.060043123 +0000 @@ -11,6 +11,32 @@ #define CPUCLOCK_CLOCK_MASK 3 #endif +#if !defined(CPUCLOCK_PROF) +#define CPUCLOCK_PROF 0 +#endif + +#if !defined(CPUCLOCK_VIRT) +#define CPUCLOCK_VIRT 1 +#endif + +#if !defined(CPUCLOCK_SCHED) +#define CPUCLOCK_SCHED 2 +#endif + +#if !defined(CPUCLOCK_PERTHREAD_MASK) +#define CPUCLOCK_PERTHREAD_MASK 4 +#endif + +#if !defined(MAKE_PROCESS_CPUCLOCK) +#define 
MAKE_PROCESS_CPUCLOCK(pid, clock) \ + ((int)(~(unsigned)(pid) << 3) | (int)(clock)) +#endif + +#if !defined(MAKE_THREAD_CPUCLOCK) +#define MAKE_THREAD_CPUCLOCK(tid, clock) \ + ((int)(~(unsigned)(tid) << 3) | (int)((clock) | CPUCLOCK_PERTHREAD_MASK)) +#endif + #if !defined(CLOCKFD) #define CLOCKFD 3 #endif diff -Naur a/src/3rdparty/chromium/sandbox/linux/system_headers/mips64_linux_syscalls.h b/src/3rdparty/chromium/sandbox/linux/system_headers/mips64_linux_syscalls.h --- a/src/3rdparty/chromium/sandbox/linux/system_headers/mips64_linux_syscalls.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/linux/system_headers/mips64_linux_syscalls.h 2021-11-20 03:42:53.890325579 +0000 @@ -1271,4 +1271,148 @@ #define __NR_memfd_create (__NR_Linux + 314) #endif +#if !defined(__NR_bpf) +#define __NR_bpf (__NR_Linux + 315) +#endif + +#if !defined(__NR_execveat) +#define __NR_execveat (__NR_Linux + 316) +#endif + +#if !defined(__NR_userfaultfd) +#define __NR_userfaultfd (__NR_Linux + 317) +#endif + +#if !defined(__NR_membarrier) +#define __NR_membarrier (__NR_Linux + 318) +#endif + +#if !defined(__NR_mlock2) +#define __NR_mlock2 (__NR_Linux + 319) +#endif + +#if !defined(__NR_copy_file_range) +#define __NR_copy_file_range (__NR_Linux + 320) +#endif + +#if !defined(__NR_preadv2) +#define __NR_preadv2 (__NR_Linux + 321) +#endif + +#if !defined(__NR_pwritev2) +#define __NR_pwritev2 (__NR_Linux + 322) +#endif + +#if !defined(__NR_pkey_mprotect) +#define __NR_pkey_mprotect (__NR_Linux + 323) +#endif + +#if !defined(__NR_pkey_alloc) +#define __NR_pkey_alloc (__NR_Linux + 324) +#endif + +#if !defined(__NR_pkey_free) +#define __NR_pkey_free (__NR_Linux + 325) +#endif + +#if !defined(__NR_statx) +#define __NR_statx (__NR_Linux + 326) +#endif + +#if !defined(__NR_rseq) +#define __NR_rseq (__NR_Linux + 327) +#endif + +#if !defined(__NR_io_pgetevents) +#define __NR_io_pgetevents (__NR_Linux + 328) +#endif + +#if !defined(__NR_pidfd_send_signal) +#define 
__NR_pidfd_send_signal (__NR_Linux + 424) +#endif + +#if !defined(__NR_io_uring_setup) +#define __NR_io_uring_setup (__NR_Linux + 425) +#endif + +#if !defined(__NR_io_uring_enter) +#define __NR_io_uring_enter (__NR_Linux + 426) +#endif + +#if !defined(__NR_io_uring_register) +#define __NR_io_uring_register (__NR_Linux + 427) +#endif + +#if !defined(__NR_open_tree) +#define __NR_open_tree (__NR_Linux + 428) +#endif + +#if !defined(__NR_move_mount) +#define __NR_move_mount (__NR_Linux + 429) +#endif + +#if !defined(__NR_fsopen) +#define __NR_fsopen (__NR_Linux + 430) +#endif + +#if !defined(__NR_fsconfig) +#define __NR_fsconfig (__NR_Linux + 431) +#endif + +#if !defined(__NR_fsmount) +#define __NR_fsmount (__NR_Linux + 432) +#endif + +#if !defined(__NR_fspick) +#define __NR_fspick (__NR_Linux + 433) +#endif + +#if !defined(__NR_pidfd_open) +#define __NR_pidfd_open (__NR_Linux + 434) +#endif + +#if !defined(__NR_clone3) +#define __NR_clone3 (__NR_Linux + 435) +#endif + +#if !defined(__NR_close_range) +#define __NR_close_range (__NR_Linux + 436) +#endif + +#if !defined(__NR_openat2) +#define __NR_openat2 (__NR_Linux + 437) +#endif + +#if !defined(__NR_pidfd_getfd) +#define __NR_pidfd_getfd (__NR_Linux + 438) +#endif + +#if !defined(__NR_faccessat2) +#define __NR_faccessat2 (__NR_Linux + 439) +#endif + +#if !defined(__NR_process_madvise) +#define __NR_process_madvise (__NR_Linux + 440) +#endif + +#if !defined(__NR_epoll_pwait2) +#define __NR_epoll_pwait2 (__NR_Linux + 441) +#endif + +#if !defined(__NR_mount_setattr) +#define __NR_mount_setattr (__NR_Linux + 442) +#endif + +#if !defined(__NR_landlock_create_ruleset) +#define __NR_landlock_create_ruleset (__NR_Linux + 444) +#endif + +#if !defined(__NR_landlock_add_rule) +#define __NR_landlock_add_rule (__NR_Linux + 445) +#endif + +#if !defined(__NR_landlock_restrict_self) +#define __NR_landlock_restrict_self (__NR_Linux + 446) +#endif + #endif // SANDBOX_LINUX_SYSTEM_HEADERS_MIPS64_LINUX_SYSCALLS_H_ diff -Naur 
a/src/3rdparty/chromium/sandbox/linux/system_headers/mips_linux_syscalls.h b/src/3rdparty/chromium/sandbox/linux/system_headers/mips_linux_syscalls.h --- a/src/3rdparty/chromium/sandbox/linux/system_headers/mips_linux_syscalls.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/linux/system_headers/mips_linux_syscalls.h 2021-11-20 03:43:21.953881146 +0000 @@ -1433,4 +1433,268 @@ #define __NR_memfd_create (__NR_Linux + 354) #endif +#if !defined(__NR_landlock_create_ruleset) +#define __NR_landlock_create_ruleset (__NR_Linux + 444) +#endif + +#if !defined(__NR_landlock_add_rule) +#define __NR_landlock_add_rule (__NR_Linux + 445) +#endif + +#if !defined(__NR_landlock_restrict_self) +#define __NR_landlock_restrict_self (__NR_Linux + 446) +#endif + +#if !defined(__NR_bpf) +#define __NR_bpf (__NR_Linux + 355) +#endif + +#if !defined(__NR_execveat) +#define __NR_execveat (__NR_Linux + 356) +#endif + +#if !defined(__NR_userfaultfd) +#define __NR_userfaultfd (__NR_Linux + 357) +#endif + +#if !defined(__NR_membarrier) +#define __NR_membarrier (__NR_Linux + 358) +#endif + +#if !defined(__NR_mlock2) +#define __NR_mlock2 (__NR_Linux + 359) +#endif + +#if !defined(__NR_copy_file_range) +#define __NR_copy_file_range (__NR_Linux + 360) +#endif + +#if !defined(__NR_preadv2) +#define __NR_preadv2 (__NR_Linux + 361) +#endif + +#if !defined(__NR_pwritev2) +#define __NR_pwritev2 (__NR_Linux + 362) +#endif + +#if !defined(__NR_pkey_mprotect) +#define __NR_pkey_mprotect (__NR_Linux + 363) +#endif + +#if !defined(__NR_pkey_alloc) +#define __NR_pkey_alloc (__NR_Linux + 364) +#endif + +#if !defined(__NR_pkey_free) +#define __NR_pkey_free (__NR_Linux + 365) +#endif + +#if !defined(__NR_statx) +#define __NR_statx (__NR_Linux + 366) +#endif + +#if !defined(__NR_rseq) +#define __NR_rseq (__NR_Linux + 367) +#endif + +#if !defined(__NR_io_pgetevents) +#define __NR_io_pgetevents (__NR_Linux + 368) +#endif + +#if !defined(__NR_semget) +#define __NR_semget (__NR_Linux + 393) 
+#endif + +#if !defined(__NR_semctl) +#define __NR_semctl (__NR_Linux + 394) +#endif + +#if !defined(__NR_shmget) +#define __NR_shmget (__NR_Linux + 395) +#endif + +#if !defined(__NR_shmctl) +#define __NR_shmctl (__NR_Linux + 396) +#endif + +#if !defined(__NR_shmat) +#define __NR_shmat (__NR_Linux + 397) +#endif + +#if !defined(__NR_shmdt) +#define __NR_shmdt (__NR_Linux + 398) +#endif + +#if !defined(__NR_msgget) +#define __NR_msgget (__NR_Linux + 399) +#endif + +#if !defined(__NR_msgsnd) +#define __NR_msgsnd (__NR_Linux + 400) +#endif + +#if !defined(__NR_msgrcv) +#define __NR_msgrcv (__NR_Linux + 401) +#endif + +#if !defined(__NR_msgctl) +#define __NR_msgctl (__NR_Linux + 402) +#endif + +#if !defined(__NR_clock_gettime64) +#define __NR_clock_gettime64 (__NR_Linux + 403) +#endif + +#if !defined(__NR_clock_settime64) +#define __NR_clock_settime64 (__NR_Linux + 404) +#endif + +#if !defined(__NR_clock_adjtime64) +#define __NR_clock_adjtime64 (__NR_Linux + 405) +#endif + +#if !defined(__NR_clock_getres_time64) +#define __NR_clock_getres_time64 (__NR_Linux + 406) +#endif + +#if !defined(__NR_clock_nanosleep_time64) +#define __NR_clock_nanosleep_time64 (__NR_Linux + 407) +#endif + +#if !defined(__NR_timer_gettime64) +#define __NR_timer_gettime64 (__NR_Linux + 408) +#endif + +#if !defined(__NR_timer_settime64) +#define __NR_timer_settime64 (__NR_Linux + 409) +#endif + +#if !defined(__NR_timerfd_gettime64) +#define __NR_timerfd_gettime64 (__NR_Linux + 410) +#endif + +#if !defined(__NR_timerfd_settime64) +#define __NR_timerfd_settime64 (__NR_Linux + 411) +#endif + +#if !defined(__NR_utimensat_time64) +#define __NR_utimensat_time64 (__NR_Linux + 412) +#endif + +#if !defined(__NR_pselect6_time64) +#define __NR_pselect6_time64 (__NR_Linux + 413) +#endif + +#if !defined(__NR_ppoll_time64) +#define __NR_ppoll_time64 (__NR_Linux + 414) +#endif + +#if !defined(__NR_io_pgetevents_time64) +#define __NR_io_pgetevents_time64 (__NR_Linux + 416) +#endif + +#if 
!defined(__NR_recvmmsg_time64) +#define __NR_recvmmsg_time64 (__NR_Linux + 417) +#endif + +#if !defined(__NR_mq_timedsend_time64) +#define __NR_mq_timedsend_time64 (__NR_Linux + 418) +#endif + +#if !defined(__NR_mq_timedreceive_time64) +#define __NR_mq_timedreceive_time64 (__NR_Linux + 419) +#endif + +#if !defined(__NR_semtimedop_time64) +#define __NR_semtimedop_time64 (__NR_Linux + 420) +#endif + +#if !defined(__NR_rt_sigtimedwait_time64) +#define __NR_rt_sigtimedwait_time64 (__NR_Linux + 421) +#endif + +#if !defined(__NR_futex_time64) +#define __NR_futex_time64 (__NR_Linux + 422) +#endif + +#if !defined(__NR_sched_rr_get_interval_time64) +#define __NR_sched_rr_get_interval_time64 (__NR_Linux + 423) +#endif + +#if !defined(__NR_pidfd_send_signal) +#define __NR_pidfd_send_signal (__NR_Linux + 424) +#endif + +#if !defined(__NR_io_uring_setup) +#define __NR_io_uring_setup (__NR_Linux + 425) +#endif + +#if !defined(__NR_io_uring_enter) +#define __NR_io_uring_enter (__NR_Linux + 426) +#endif + +#if !defined(__NR_io_uring_register) +#define __NR_io_uring_register (__NR_Linux + 427) +#endif + +#if !defined(__NR_open_tree) +#define __NR_open_tree (__NR_Linux + 428) +#endif + +#if !defined(__NR_move_mount) +#define __NR_move_mount (__NR_Linux + 429) +#endif + +#if !defined(__NR_fsopen) +#define __NR_fsopen (__NR_Linux + 430) +#endif + +#if !defined(__NR_fsconfig) +#define __NR_fsconfig (__NR_Linux + 431) +#endif + +#if !defined(__NR_fsmount) +#define __NR_fsmount (__NR_Linux + 432) +#endif + +#if !defined(__NR_fspick) +#define __NR_fspick (__NR_Linux + 433) +#endif + +#if !defined(__NR_pidfd_open) +#define __NR_pidfd_open (__NR_Linux + 434) +#endif + +#if !defined(__NR_clone3) +#define __NR_clone3 (__NR_Linux + 435) +#endif + +#if !defined(__NR_close_range) +#define __NR_close_range (__NR_Linux + 436) +#endif + +#if !defined(__NR_openat2) +#define __NR_openat2 (__NR_Linux + 437) +#endif + +#if !defined(__NR_pidfd_getfd) +#define __NR_pidfd_getfd (__NR_Linux + 438) +#endif 
+ +#if !defined(__NR_faccessat2) +#define __NR_faccessat2 (__NR_Linux + 439) +#endif + +#if !defined(__NR_process_madvise) +#define __NR_process_madvise (__NR_Linux + 440) +#endif + +#if !defined(__NR_epoll_pwait2) +#define __NR_epoll_pwait2 (__NR_Linux + 441) +#endif + +#if !defined(__NR_mount_setattr) +#define __NR_mount_setattr (__NR_Linux + 442) +#endif + #endif // SANDBOX_LINUX_SYSTEM_HEADERS_MIPS_LINUX_SYSCALLS_H_ diff -Naur a/src/3rdparty/chromium/sandbox/linux/system_headers/x86_32_linux_syscalls.h b/src/3rdparty/chromium/sandbox/linux/system_headers/x86_32_linux_syscalls.h --- a/src/3rdparty/chromium/sandbox/linux/system_headers/x86_32_linux_syscalls.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/linux/system_headers/x86_32_linux_syscalls.h 2021-11-20 03:43:21.953881146 +0000 @@ -1710,5 +1710,45 @@ #define __NR_clone3 435 #endif +#if !defined(__NR_landlock_create_ruleset) +#define __NR_landlock_create_ruleset 444 +#endif + +#if !defined(__NR_landlock_add_rule) +#define __NR_landlock_add_rule 445 +#endif + +#if !defined(__NR_landlock_restrict_self) +#define __NR_landlock_restrict_self 446 +#endif + +#if !defined(__NR_close_range) +#define __NR_close_range 436 +#endif + +#if !defined(__NR_openat2) +#define __NR_openat2 437 +#endif + +#if !defined(__NR_pidfd_getfd) +#define __NR_pidfd_getfd 438 +#endif + +#if !defined(__NR_faccessat2) +#define __NR_faccessat2 439 +#endif + +#if !defined(__NR_process_madvise) +#define __NR_process_madvise 440 +#endif + +#if !defined(__NR_epoll_pwait2) +#define __NR_epoll_pwait2 441 +#endif + +#if !defined(__NR_mount_setattr) +#define __NR_mount_setattr 442 +#endif + #endif // SANDBOX_LINUX_SYSTEM_HEADERS_X86_32_LINUX_SYSCALLS_H_ diff -Naur a/src/3rdparty/chromium/sandbox/linux/system_headers/x86_64_linux_syscalls.h b/src/3rdparty/chromium/sandbox/linux/system_headers/x86_64_linux_syscalls.h --- a/src/3rdparty/chromium/sandbox/linux/system_headers/x86_64_linux_syscalls.h 2021-08-24 12:54:05.000000000 
+0100 +++ b/src/3rdparty/chromium/sandbox/linux/system_headers/x86_64_linux_syscalls.h 2021-11-20 03:42:53.890325579 +0000 @@ -1350,5 +1350,93 @@ #define __NR_rseq 334 #endif +#if !defined(__NR_pidfd_send_signal) +#define __NR_pidfd_send_signal 424 +#endif + +#if !defined(__NR_io_uring_setup) +#define __NR_io_uring_setup 425 +#endif + +#if !defined(__NR_io_uring_enter) +#define __NR_io_uring_enter 426 +#endif + +#if !defined(__NR_io_uring_register) +#define __NR_io_uring_register 427 +#endif + +#if !defined(__NR_open_tree) +#define __NR_open_tree 428 +#endif + +#if !defined(__NR_move_mount) +#define __NR_move_mount 429 +#endif + +#if !defined(__NR_fsopen) +#define __NR_fsopen 430 +#endif + +#if !defined(__NR_fsconfig) +#define __NR_fsconfig 431 +#endif + +#if !defined(__NR_fsmount) +#define __NR_fsmount 432 +#endif + +#if !defined(__NR_fspick) +#define __NR_fspick 433 +#endif + +#if !defined(__NR_pidfd_open) +#define __NR_pidfd_open 434 +#endif + +#if !defined(__NR_clone3) +#define __NR_clone3 435 +#endif + +#if !defined(__NR_close_range) +#define __NR_close_range 436 +#endif + +#if !defined(__NR_openat2) +#define __NR_openat2 437 +#endif + +#if !defined(__NR_pidfd_getfd) +#define __NR_pidfd_getfd 438 +#endif + +#if !defined(__NR_faccessat2) +#define __NR_faccessat2 439 +#endif + +#if !defined(__NR_process_madvise) +#define __NR_process_madvise 440 +#endif + +#if !defined(__NR_epoll_pwait2) +#define __NR_epoll_pwait2 441 +#endif + +#if !defined(__NR_mount_setattr) +#define __NR_mount_setattr 442 +#endif + +#if !defined(__NR_landlock_create_ruleset) +#define __NR_landlock_create_ruleset 444 +#endif + +#if !defined(__NR_landlock_add_rule) +#define __NR_landlock_add_rule 445 +#endif + +#if !defined(__NR_landlock_restrict_self) +#define __NR_landlock_restrict_self 446 +#endif + #endif // SANDBOX_LINUX_SYSTEM_HEADERS_X86_64_LINUX_SYSCALLS_H_ diff -Naur a/src/3rdparty/chromium/sandbox/policy/linux/bpf_broker_policy_linux.cc 
b/src/3rdparty/chromium/sandbox/policy/linux/bpf_broker_policy_linux.cc --- a/src/3rdparty/chromium/sandbox/policy/linux/bpf_broker_policy_linux.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/policy/linux/bpf_broker_policy_linux.cc 2021-11-20 03:37:57.060043123 +0000 @@ -93,8 +93,8 @@ return Allow(); break; #endif -#if defined(__NR_fstatat) - case __NR_fstatat: +#if defined(__NR_fstatat64) + case __NR_fstatat64: if (allowed_command_set_.test(syscall_broker::COMMAND_STAT)) return Allow(); break; diff -Naur a/src/3rdparty/chromium/sandbox/win/src/sandbox.h b/src/3rdparty/chromium/sandbox/win/src/sandbox.h --- a/src/3rdparty/chromium/sandbox/win/src/sandbox.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/sandbox/win/src/sandbox.h 2021-11-20 03:40:49.354306033 +0000 @@ -140,7 +140,7 @@ // } // // For more information see the BrokerServices API documentation. -class TargetServices { +class [[clang::lto_visibility_public]] TargetServices { public: // Initializes the target. Must call this function before any other. // returns ALL_OK if successful. All other return values imply failure. 
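Every new syscall constant added in the sandbox headers above uses the same guard-and-define pattern, so policy code can reference numbers that older toolchain kernel headers do not yet provide. A minimal self-contained sketch of the idea (using `mount_setattr`, whose number 442 is arch-independent in the upstream syscall tables):

```cpp
// Guard the real header so this sketch also compiles on non-Linux hosts.
#if defined(__linux__)
#include <sys/syscall.h>
#endif

// Fallback definition: only takes effect when the system headers are too old
// (or absent) to define the constant themselves -- the pattern used above.
#if !defined(__NR_mount_setattr)
#define __NR_mount_setattr 442
#endif

// Either way, code such as a seccomp policy switch can now rely on the
// constant being defined.
constexpr int kMountSetattrNr = __NR_mount_setattr;
```

Because the fallback matches the real kernel value, it is harmless when the headers already define the constant and correct when they do not.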
diff -Naur a/src/3rdparty/chromium/third_party/angle/src/compiler/translator/blocklayout.cpp b/src/3rdparty/chromium/third_party/angle/src/compiler/translator/blocklayout.cpp
--- a/src/3rdparty/chromium/third_party/angle/src/compiler/translator/blocklayout.cpp	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/angle/src/compiler/translator/blocklayout.cpp	2021-11-20 03:41:11.177956422 +0000
@@ -198,6 +198,13 @@
     return memberInfo;
 }
 
+size_t BlockLayoutEncoder::getCurrentOffset() const
+{
+    angle::base::CheckedNumeric<size_t> checkedOffset(mCurrentOffset);
+    checkedOffset *= kBytesPerComponent;
+    return checkedOffset.ValueOrDefault(std::numeric_limits<size_t>::max());
+}
+
 size_t BlockLayoutEncoder::getShaderVariableSize(const ShaderVariable &structVar, bool isRowMajor)
 {
     size_t currentOffset = mCurrentOffset;
@@ -225,7 +232,13 @@
 
 void BlockLayoutEncoder::align(size_t baseAlignment)
 {
-    mCurrentOffset = rx::roundUp(mCurrentOffset, baseAlignment);
+    angle::base::CheckedNumeric<size_t> checkedOffset(mCurrentOffset);
+    checkedOffset += baseAlignment;
+    checkedOffset -= 1;
+    angle::base::CheckedNumeric<size_t> checkedAlignmentOffset = checkedOffset;
+    checkedAlignmentOffset %= baseAlignment;
+    checkedOffset -= checkedAlignmentOffset.ValueOrDefault(std::numeric_limits<size_t>::max());
+    mCurrentOffset = checkedOffset.ValueOrDefault(std::numeric_limits<size_t>::max());
 }
 
 // DummyBlockEncoder implementation.
@@ -288,7 +301,7 @@
         baseAlignment = ComponentAlignment(numComponents);
     }
 
-    mCurrentOffset = rx::roundUp(mCurrentOffset, baseAlignment);
+    align(baseAlignment);
 
     *matrixStrideOut = matrixStride;
     *arrayStrideOut = arrayStride;
@@ -302,16 +315,23 @@
 {
     if (!arraySizes.empty())
     {
-        mCurrentOffset += arrayStride * gl::ArraySizeProduct(arraySizes);
+        angle::base::CheckedNumeric<size_t> checkedOffset(arrayStride);
+        checkedOffset *= gl::ArraySizeProduct(arraySizes);
+        checkedOffset += mCurrentOffset;
+        mCurrentOffset = checkedOffset.ValueOrDefault(std::numeric_limits<size_t>::max());
     }
     else if (gl::IsMatrixType(type))
     {
-        const int numRegisters = gl::MatrixRegisterCount(type, isRowMajorMatrix);
-        mCurrentOffset += matrixStride * numRegisters;
+        angle::base::CheckedNumeric<size_t> checkedOffset(matrixStride);
+        checkedOffset *= gl::MatrixRegisterCount(type, isRowMajorMatrix);
+        checkedOffset += mCurrentOffset;
+        mCurrentOffset = checkedOffset.ValueOrDefault(std::numeric_limits<size_t>::max());
     }
     else
     {
-        mCurrentOffset += gl::VariableComponentCount(type);
+        angle::base::CheckedNumeric<size_t> checkedOffset(mCurrentOffset);
+        checkedOffset += gl::VariableComponentCount(type);
+        mCurrentOffset = checkedOffset.ValueOrDefault(std::numeric_limits<size_t>::max());
     }
 }
diff -Naur a/src/3rdparty/chromium/third_party/angle/src/compiler/translator/blocklayout.h b/src/3rdparty/chromium/third_party/angle/src/compiler/translator/blocklayout.h
--- a/src/3rdparty/chromium/third_party/angle/src/compiler/translator/blocklayout.h	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/angle/src/compiler/translator/blocklayout.h	2021-11-20 03:41:11.177956422 +0000
@@ -80,7 +80,7 @@
                          const std::vector<unsigned int> &arraySizes,
                          bool isRowMajorMatrix);
 
-    size_t getCurrentOffset() const { return mCurrentOffset * kBytesPerComponent; }
+    size_t getCurrentOffset() const;
     size_t getShaderVariableSize(const ShaderVariable &structVar, bool isRowMajor);
 
     // Called when entering/exiting a structure variable.
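The blocklayout changes above replace raw `size_t` arithmetic with `angle::base::CheckedNumeric`, saturating to `SIZE_MAX` on overflow so a hostile shader cannot wrap the running offset. A rough standalone equivalent of that saturate-on-overflow pattern, using compiler overflow builtins instead of the Chromium numerics library (function names are illustrative, not ANGLE API):

```cpp
#include <cstddef>
#include <limits>

// Multiply-accumulate with saturation: mirrors the
// CheckedNumeric ... ValueOrDefault(std::numeric_limits<size_t>::max()) idiom.
inline size_t checked_mul_add(size_t a, size_t b, size_t base) {
    size_t result = 0;
    if (__builtin_mul_overflow(a, b, &result) ||
        __builtin_add_overflow(result, base, &result)) {
        // Saturate instead of wrapping around.
        return std::numeric_limits<size_t>::max();
    }
    return result;
}

// Overflow-checked round-up to an alignment boundary, the same computation
// the patched BlockLayoutEncoder::align() performs with CheckedNumeric.
inline size_t checked_align(size_t offset, size_t alignment) {
    size_t bumped = 0;
    if (__builtin_add_overflow(offset, alignment - 1, &bumped)) {
        return std::numeric_limits<size_t>::max();
    }
    return bumped - bumped % alignment;
}
```

Saturating to the maximum value is safe here because any later allocation or bounds check against that sentinel size will fail cleanly rather than under-allocate.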
diff -Naur a/src/3rdparty/chromium/third_party/angle/src/libANGLE/renderer/d3d/d3d11/renderer11_utils.cpp b/src/3rdparty/chromium/third_party/angle/src/libANGLE/renderer/d3d/d3d11/renderer11_utils.cpp
--- a/src/3rdparty/chromium/third_party/angle/src/libANGLE/renderer/d3d/d3d11/renderer11_utils.cpp	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/angle/src/libANGLE/renderer/d3d/d3d11/renderer11_utils.cpp	2021-11-20 03:36:08.810720286 +0000
@@ -2179,28 +2179,35 @@
     const d3d11::DXGIFormatSize &dxgiFormatInfo =
         d3d11::GetDXGIFormatSizeInfo(d3dFormatInfo.texFormat);
 
-    unsigned int rowPitch     = dxgiFormatInfo.pixelBytes * width;
-    unsigned int depthPitch   = rowPitch * height;
-    unsigned int maxImageSize = depthPitch * depth;
+    using CheckedSize        = angle::CheckedNumeric<size_t>;
+    CheckedSize rowPitch     = CheckedSize(dxgiFormatInfo.pixelBytes) * CheckedSize(width);
+    CheckedSize depthPitch   = rowPitch * CheckedSize(height);
+    CheckedSize maxImageSize = depthPitch * CheckedSize(depth);
+
+    Context11 *context11 = GetImplAs<Context11>(context);
+    ANGLE_CHECK_GL_ALLOC(context11, maxImageSize.IsValid());
 
     angle::MemoryBuffer *scratchBuffer = nullptr;
-    ANGLE_CHECK_GL_ALLOC(GetImplAs<Context11>(context),
-                         context->getScratchBuffer(maxImageSize, &scratchBuffer));
+    ANGLE_CHECK_GL_ALLOC(context11,
+                         context->getScratchBuffer(maxImageSize.ValueOrDie(), &scratchBuffer));
 
-    d3dFormatInfo.dataInitializerFunction(width, height, depth, scratchBuffer->data(), rowPitch,
-                                          depthPitch);
+    d3dFormatInfo.dataInitializerFunction(width, height, depth, scratchBuffer->data(),
+                                          rowPitch.ValueOrDie(), depthPitch.ValueOrDie());
 
     for (unsigned int i = 0; i < mipLevels; i++)
     {
         unsigned int mipWidth  = std::max(width >> i, 1U);
         unsigned int mipHeight = std::max(height >> i, 1U);
 
-        unsigned int mipRowPitch   = dxgiFormatInfo.pixelBytes * mipWidth;
-        unsigned int mipDepthPitch = mipRowPitch * mipHeight;
+        using CheckedUINT         = angle::CheckedNumeric<UINT>;
+        CheckedUINT mipRowPitch   = CheckedUINT(dxgiFormatInfo.pixelBytes) * CheckedUINT(mipWidth);
+        CheckedUINT mipDepthPitch = mipRowPitch * CheckedUINT(mipHeight);
+
+        ANGLE_CHECK_GL_ALLOC(context11, mipRowPitch.IsValid() && mipDepthPitch.IsValid());
 
         outSubresourceData->at(i).pSysMem          = scratchBuffer->data();
-        outSubresourceData->at(i).SysMemPitch      = mipRowPitch;
-        outSubresourceData->at(i).SysMemSlicePitch = mipDepthPitch;
+        outSubresourceData->at(i).SysMemPitch      = mipRowPitch.ValueOrDie();
+        outSubresourceData->at(i).SysMemSlicePitch = mipDepthPitch.ValueOrDie();
     }
 
     return angle::Result::Continue;
diff -Naur a/src/3rdparty/chromium/third_party/angle/src/libANGLE/validationES.cpp b/src/3rdparty/chromium/third_party/angle/src/libANGLE/validationES.cpp
--- a/src/3rdparty/chromium/third_party/angle/src/libANGLE/validationES.cpp	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/angle/src/libANGLE/validationES.cpp	2021-11-20 03:36:35.883300836 +0000
@@ -3044,6 +3044,12 @@
         {
             return kVertexBufferBoundForTransformFeedback;
         }
+
+        // Validate that we are rendering with a linked program.
+ if (!program->isLinked()) + { + return kProgramNotLinked; + } } } diff -Naur a/src/3rdparty/chromium/third_party/blink/renderer/core/display_lock/display_lock_utilities.cc b/src/3rdparty/chromium/third_party/blink/renderer/core/display_lock/display_lock_utilities.cc --- a/src/3rdparty/chromium/third_party/blink/renderer/core/display_lock/display_lock_utilities.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/blink/renderer/core/display_lock/display_lock_utilities.cc 2021-11-20 03:35:56.328913673 +0000 @@ -174,6 +174,9 @@ if (!RuntimeEnabledFeatures::CSSContentVisibilityEnabled()) return; + if (!node_) + return; + auto* owner_node = GetFrameOwnerNode(node); if (owner_node) parent_frame_impl_ = MakeGarbageCollected(owner_node, true); @@ -217,6 +220,8 @@ } void DisplayLockUtilities::ScopedForcedUpdate::Impl::Destroy() { + if (!node_) + return; if (RuntimeEnabledFeatures::CSSContentVisibilityEnabled()) node_->GetDocument().GetDisplayLockDocumentState().EndNodeForcedScope(this); if (parent_frame_impl_) diff -Naur a/src/3rdparty/chromium/third_party/blink/renderer/core/display_lock/display_lock_utilities.h b/src/3rdparty/chromium/third_party/blink/renderer/core/display_lock/display_lock_utilities.h --- a/src/3rdparty/chromium/third_party/blink/renderer/core/display_lock/display_lock_utilities.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/blink/renderer/core/display_lock/display_lock_utilities.h 2021-11-20 03:35:56.328913673 +0000 @@ -8,6 +8,7 @@ #include "third_party/blink/renderer/core/core_export.h" #include "third_party/blink/renderer/core/display_lock/display_lock_context.h" #include "third_party/blink/renderer/core/editing/ephemeral_range.h" +#include "third_party/blink/renderer/core/editing/frame_selection.h" #include "third_party/blink/renderer/platform/wtf/allocator/allocator.h" namespace blink { @@ -51,6 +52,8 @@ friend void Document::EnsurePaintLocationDataValidForNode( const Node* node, 
DocumentUpdateReason reason); + friend VisibleSelection + FrameSelection::ComputeVisibleSelectionInDOMTreeDeprecated() const; friend class DisplayLockContext; diff -Naur a/src/3rdparty/chromium/third_party/blink/renderer/core/editing/frame_selection.cc b/src/3rdparty/chromium/third_party/blink/renderer/core/editing/frame_selection.cc --- a/src/3rdparty/chromium/third_party/blink/renderer/core/editing/frame_selection.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/blink/renderer/core/editing/frame_selection.cc 2021-11-20 03:35:56.328913673 +0000 @@ -158,6 +158,10 @@ const { // TODO(editing-dev): Hoist UpdateStyleAndLayout // to caller. See http://crbug.com/590369 for more details. + DisplayLockUtilities::ScopedForcedUpdate base_scope( + GetSelectionInDOMTree().Base().AnchorNode()); + DisplayLockUtilities::ScopedForcedUpdate extent_scope( + GetSelectionInDOMTree().Extent().AnchorNode()); GetDocument().UpdateStyleAndLayout(DocumentUpdateReason::kSelection); return ComputeVisibleSelectionInDOMTree(); } diff -Naur a/src/3rdparty/chromium/third_party/blink/renderer/core/layout/layout_inline.cc b/src/3rdparty/chromium/third_party/blink/renderer/core/layout/layout_inline.cc --- a/src/3rdparty/chromium/third_party/blink/renderer/core/layout/layout_inline.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/blink/renderer/core/layout/layout_inline.cc 2021-11-20 03:36:21.218528045 +0000 @@ -609,15 +609,13 @@ // nest to a much greater depth (see bugzilla bug 13430) but for now we have a // limit. This *will* result in incorrect rendering, but the alternative is to // hang forever. 
-  const unsigned kCMaxSplitDepth = 200;
   Vector<LayoutInline*> inlines_to_clone;
   LayoutInline* top_most_inline = this;
   for (LayoutObject* o = this; o != from_block; o = o->Parent()) {
     if (o->IsLayoutNGInsideListMarker())
       continue;
     top_most_inline = ToLayoutInline(o);
-    if (inlines_to_clone.size() < kCMaxSplitDepth)
-      inlines_to_clone.push_back(top_most_inline);
+    inlines_to_clone.push_back(top_most_inline);
     // Keep walking up the chain to ensure |topMostInline| is a child of
     // |fromBlock|, to avoid assertion failure when |fromBlock|'s children are
     // moved to |toBlock| below.
diff -Naur a/src/3rdparty/chromium/third_party/blink/renderer/modules/peerconnection/rtc_encoded_video_underlying_sink.cc b/src/3rdparty/chromium/third_party/blink/renderer/modules/peerconnection/rtc_encoded_video_underlying_sink.cc
--- a/src/3rdparty/chromium/third_party/blink/renderer/modules/peerconnection/rtc_encoded_video_underlying_sink.cc	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/blink/renderer/modules/peerconnection/rtc_encoded_video_underlying_sink.cc	2021-11-20 03:40:38.281483418 +0000
@@ -14,8 +14,10 @@
 
 RTCEncodedVideoUnderlyingSink::RTCEncodedVideoUnderlyingSink(
     ScriptState* script_state,
-    TransformerCallback transformer_callback)
-    : transformer_callback_(std::move(transformer_callback)) {
+    TransformerCallback transformer_callback,
+    webrtc::TransformableFrameInterface::Direction expected_direction)
+    : transformer_callback_(std::move(transformer_callback)),
+      expected_direction_(expected_direction) {
   DCHECK(transformer_callback_);
 }
 
@@ -53,6 +55,12 @@
     return ScriptPromise();
   }
 
+  if (webrtc_frame->GetDirection() != expected_direction_) {
+    exception_state.ThrowDOMException(DOMExceptionCode::kOperationError,
+                                      "Invalid frame");
+    return ScriptPromise();
+  }
+
   RTCEncodedVideoStreamTransformer* transformer = transformer_callback_.Run();
   if (!transformer) {
     exception_state.ThrowDOMException(DOMExceptionCode::kInvalidStateError,
diff -Naur a/src/3rdparty/chromium/third_party/blink/renderer/modules/peerconnection/rtc_encoded_video_underlying_sink.h b/src/3rdparty/chromium/third_party/blink/renderer/modules/peerconnection/rtc_encoded_video_underlying_sink.h
--- a/src/3rdparty/chromium/third_party/blink/renderer/modules/peerconnection/rtc_encoded_video_underlying_sink.h	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/blink/renderer/modules/peerconnection/rtc_encoded_video_underlying_sink.h	2021-11-20 03:40:38.281483418 +0000
@@ -7,6 +7,7 @@
 
 #include "third_party/blink/renderer/core/streams/underlying_sink_base.h"
 #include "third_party/blink/renderer/modules/modules_export.h"
+#include "third_party/webrtc/api/frame_transformer_interface.h"
 
 namespace blink {
 
@@ -18,7 +19,9 @@
  public:
  using TransformerCallback = base::RepeatingCallback<RTCEncodedVideoStreamTransformer*()>;
 
-  RTCEncodedVideoUnderlyingSink(ScriptState*, TransformerCallback);
+  RTCEncodedVideoUnderlyingSink(ScriptState*,
+                                TransformerCallback,
+                                webrtc::TransformableFrameInterface::Direction);
 
   // UnderlyingSinkBase
   ScriptPromise start(ScriptState*,
@@ -37,6 +40,7 @@
 
  private:
   TransformerCallback transformer_callback_;
+  webrtc::TransformableFrameInterface::Direction expected_direction_;
 };
 
 }  // namespace blink
diff -Naur a/src/3rdparty/chromium/third_party/blink/renderer/modules/peerconnection/rtc_encoded_video_underlying_sink_test.cc b/src/3rdparty/chromium/third_party/blink/renderer/modules/peerconnection/rtc_encoded_video_underlying_sink_test.cc
--- a/src/3rdparty/chromium/third_party/blink/renderer/modules/peerconnection/rtc_encoded_video_underlying_sink_test.cc	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/blink/renderer/modules/peerconnection/rtc_encoded_video_underlying_sink_test.cc	2021-11-20 03:40:38.281483418 +0000
@@ -74,11 +74,15 @@
     EXPECT_FALSE(transformer_.HasTransformedFrameSinkCallback(kSSRC));
   }
 
-  RTCEncodedVideoUnderlyingSink* CreateSink(ScriptState* script_state) {
+  RTCEncodedVideoUnderlyingSink* CreateSink(
+      ScriptState* script_state,
+      webrtc::TransformableFrameInterface::Direction expected_direction =
+          webrtc::TransformableFrameInterface::Direction::kSender) {
     return MakeGarbageCollected<RTCEncodedVideoUnderlyingSink>(
         script_state,
         WTF::BindRepeating(&RTCEncodedVideoUnderlyingSinkTest::GetTransformer,
-                           WTF::Unretained(this)));
+                           WTF::Unretained(this)),
+        expected_direction);
   }
 
   RTCEncodedVideoUnderlyingSink* CreateNullCallbackSink(
@@ -86,15 +90,21 @@
     return MakeGarbageCollected<RTCEncodedVideoUnderlyingSink>(
         script_state,
         WTF::BindRepeating(
-            []() -> RTCEncodedVideoStreamTransformer* { return nullptr; }));
+            []() -> RTCEncodedVideoStreamTransformer* { return nullptr; }),
+        webrtc::TransformableFrameInterface::Direction::kSender);
   }
 
   RTCEncodedVideoStreamTransformer* GetTransformer() { return &transformer_; }
 
-  ScriptValue CreateEncodedVideoFrameChunk(ScriptState* script_state) {
+  ScriptValue CreateEncodedVideoFrameChunk(
+      ScriptState* script_state,
+      webrtc::TransformableFrameInterface::Direction direction =
+          webrtc::TransformableFrameInterface::Direction::kSender) {
     auto mock_frame = std::make_unique<NiceMock<webrtc::MockTransformableVideoFrame>>();
+    ON_CALL(*mock_frame.get(), GetSsrc).WillByDefault(Return(kSSRC));
+    ON_CALL(*mock_frame.get(), GetDirection).WillByDefault(Return(direction));
     RTCEncodedVideoFrame* frame =
         MakeGarbageCollected<RTCEncodedVideoFrame>(std::move(mock_frame));
     return ScriptValue(script_state->GetIsolate(),
@@ -175,4 +185,21 @@
                   DOMExceptionCode::kInvalidStateError));
 }
 
+TEST_F(RTCEncodedVideoUnderlyingSinkTest, WriteInvalidDirectionFails) {
+  V8TestingScope v8_scope;
+  ScriptState* script_state = v8_scope.GetScriptState();
+  auto* sink = CreateSink(
+      script_state, webrtc::TransformableFrameInterface::Direction::kSender);
+
+  // Write an encoded chunk with direction set to Receiver should fail as it
+  // doesn't match the expected direction of our sink.
+  DummyExceptionStateForTesting dummy_exception_state;
+  sink->write(script_state,
+              CreateEncodedVideoFrameChunk(
+                  script_state,
+                  webrtc::TransformableFrameInterface::Direction::kReceiver),
+              nullptr, dummy_exception_state);
+  EXPECT_TRUE(dummy_exception_state.HadException());
+}
+
 }  // namespace blink
diff -Naur a/src/3rdparty/chromium/third_party/blink/renderer/modules/peerconnection/rtc_rtp_receiver.cc b/src/3rdparty/chromium/third_party/blink/renderer/modules/peerconnection/rtc_rtp_receiver.cc
--- a/src/3rdparty/chromium/third_party/blink/renderer/modules/peerconnection/rtc_rtp_receiver.cc	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/blink/renderer/modules/peerconnection/rtc_rtp_receiver.cc	2021-11-20 03:40:38.281483418 +0000
@@ -497,7 +497,8 @@
                       ->GetEncodedVideoStreamTransformer()
                 : nullptr;
           },
-          WrapWeakPersistent(this)));
+          WrapWeakPersistent(this)),
+      webrtc::TransformableFrameInterface::Direction::kReceiver);
   // The high water mark for the stream is set to 1 so that the stream seems
   // ready to write, but without queuing frames.
   WritableStream* writable_stream =
diff -Naur a/src/3rdparty/chromium/third_party/blink/renderer/modules/peerconnection/rtc_rtp_sender.cc b/src/3rdparty/chromium/third_party/blink/renderer/modules/peerconnection/rtc_rtp_sender.cc
--- a/src/3rdparty/chromium/third_party/blink/renderer/modules/peerconnection/rtc_rtp_sender.cc	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/blink/renderer/modules/peerconnection/rtc_rtp_sender.cc	2021-11-20 03:40:38.281483418 +0000
@@ -893,7 +893,8 @@
                       ->GetEncodedVideoStreamTransformer()
                 : nullptr;
           },
-          WrapWeakPersistent(this)));
+          WrapWeakPersistent(this)),
+      webrtc::TransformableFrameInterface::Direction::kSender);
   // The high water mark for the stream is set to 1 so that the stream is
   // ready to write, but without queuing frames.
   WritableStream* writable_stream =
diff -Naur a/src/3rdparty/chromium/third_party/blink/renderer/modules/webcodecs/image_decoder_external.cc b/src/3rdparty/chromium/third_party/blink/renderer/modules/webcodecs/image_decoder_external.cc
--- a/src/3rdparty/chromium/third_party/blink/renderer/modules/webcodecs/image_decoder_external.cc	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/blink/renderer/modules/webcodecs/image_decoder_external.cc	2021-11-20 03:39:50.458252783 +0000
@@ -60,7 +60,16 @@
 ImageDecoderExternal::ImageDecoderExternal(ScriptState* script_state,
                                            const ImageDecoderInit* init,
                                            ExceptionState& exception_state)
-    : script_state_(script_state) {
+    : ExecutionContextLifecycleObserver(ExecutionContext::From(script_state)),
+      script_state_(script_state) {
+  // If the context is already destroyed we will never get an OnContextDestroyed
+  // callback, which is critical to invalidating any pending WeakPtr operations.
+  if (GetExecutionContext()->IsContextDestroyed()) {
+    exception_state.ThrowDOMException(DOMExceptionCode::kOperationError,
+                                      "Invalid context.");
+    return;
+  }
+
   UseCounter::Count(ExecutionContext::From(script_state),
                     WebFeature::kWebCodecs);
 
@@ -261,6 +270,13 @@
   visitor->Trace(init_data_);
   visitor->Trace(options_);
   ScriptWrappable::Trace(visitor);
+  ExecutionContextLifecycleObserver::Trace(visitor);
+}
+
+void ImageDecoderExternal::ContextDestroyed() {}
+
+bool ImageDecoderExternal::HasPendingActivity() const {
+  return !pending_metadata_decodes_.IsEmpty() || !pending_decodes_.IsEmpty();
 }
 
 void ImageDecoderExternal::CreateImageDecoder() {
diff -Naur a/src/3rdparty/chromium/third_party/blink/renderer/modules/webcodecs/image_decoder_external.h b/src/3rdparty/chromium/third_party/blink/renderer/modules/webcodecs/image_decoder_external.h
--- a/src/3rdparty/chromium/third_party/blink/renderer/modules/webcodecs/image_decoder_external.h	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/blink/renderer/modules/webcodecs/image_decoder_external.h	2021-11-20 03:39:40.097419609 +0000
@@ -7,7 +7,9 @@
 
 #include <memory>
 
+#include "third_party/blink/renderer/bindings/core/v8/active_script_wrappable.h"
 #include "third_party/blink/renderer/bindings/core/v8/script_promise.h"
+#include "third_party/blink/renderer/core/execution_context/execution_context_lifecycle_observer.h"
 #include "third_party/blink/renderer/modules/modules_export.h"
 #include "third_party/blink/renderer/platform/bindings/script_wrappable.h"
 #include "third_party/blink/renderer/platform/heap/member.h"
@@ -26,8 +28,11 @@
 class ScriptPromiseResolver;
 class SegmentReader;
 
-class MODULES_EXPORT ImageDecoderExternal final : public ScriptWrappable,
-                                                  public BytesConsumer::Client {
+class MODULES_EXPORT ImageDecoderExternal final
+    : public ScriptWrappable,
+      public ActiveScriptWrappable<ImageDecoderExternal>,
+      public BytesConsumer::Client,
+      public ExecutionContextLifecycleObserver {
   DEFINE_WRAPPERTYPEINFO();
 
  public:
@@ -59,6 +64,12 @@
   // GarbageCollected override.
   void Trace(Visitor*) const override;
 
+  // ExecutionContextLifecycleObserver override.
+  void ContextDestroyed() override;
+
+  // ScriptWrappable override.
+  bool HasPendingActivity() const override;
+
  private:
   void CreateImageDecoder();
diff -Naur a/src/3rdparty/chromium/third_party/blink/renderer/modules/webcodecs/image_decoder.idl b/src/3rdparty/chromium/third_party/blink/renderer/modules/webcodecs/image_decoder.idl
--- a/src/3rdparty/chromium/third_party/blink/renderer/modules/webcodecs/image_decoder.idl	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/blink/renderer/modules/webcodecs/image_decoder.idl	2021-11-20 03:39:40.097419609 +0000
@@ -8,7 +8,8 @@
 [
     Exposed=(Window,Worker),
     RuntimeEnabled=WebCodecs,
-    ImplementedAs=ImageDecoderExternal
+    ImplementedAs=ImageDecoderExternal,
+    ActiveScriptWrappable
 ] interface ImageDecoder {
   [CallWith=ScriptState, RaisesException, MeasureAs=WebCodecsImageDecoder]
   constructor(ImageDecoderInit init);
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/BUILD.gn b/src/3rdparty/chromium/third_party/libjpeg_turbo/BUILD.gn
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/BUILD.gn	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/BUILD.gn	2021-11-20 03:41:33.389600594 +0000
@@ -7,11 +7,9 @@
 import("//build/config/sanitizers/sanitizers.gni")
 import("//build/config/features.gni")
-
 if (current_cpu == "arm" || current_cpu == "arm64") {
   import("//build/config/arm.gni")
 }
-
 if (!use_qt) {
   assert(!is_ios, "This is not used on iOS, don't drag it in unintentionally")
 }
@@ -25,6 +23,7 @@
     "jpeglib.h",
     "jpeglibmangler.h",
   ]
+  defines = [ "MANGLE_JPEG_NAMES" ]
 }
 
 if (current_cpu == "x86" || current_cpu == "x64") {
@@ -140,63 +139,50 @@
 static_library("simd") {
   include_dirs = [ "." ]
-  deps = [
-    ":libjpeg_headers",
-  ]
+  deps = [ ":libjpeg_headers" ]
 
   if (current_cpu == "x86") {
     deps += [ ":simd_asm" ]
-    sources = [
-      "simd/i386/jsimd.c",
-    ]
+    sources = [ "simd/i386/jsimd.c" ]
   } else if (current_cpu == "x64") {
     deps += [ ":simd_asm" ]
+    sources = [ "simd/x86_64/jsimd.c" ]
+  } else if ((current_cpu == "arm" || current_cpu == "arm64") && arm_use_neon) {
+    include_dirs += [ "simd/arm/" ]
+    sources = [
-      "simd/x86_64/jsimd.c",
-    ]
-  } else if (current_cpu == "arm" && arm_version >= 7 &&
-             (arm_use_neon || arm_optionally_use_neon)) {
-    sources = [
-      "simd/arm/arm/jsimd.c",
-      "simd/arm/arm/jsimd_neon.S",
-      "simd/arm/common/jccolor-neon.c",
-      "simd/arm/common/jcgray-neon.c",
-      "simd/arm/common/jcsample-neon.c",
-      "simd/arm/common/jdcolor-neon.c",
-      "simd/arm/common/jdmerge-neon.c",
-      "simd/arm/common/jdsample-neon.c",
-      "simd/arm/common/jfdctfst-neon.c",
-      "simd/arm/common/jfdctint-neon.c",
-      "simd/arm/common/jidctfst-neon.c",
-      "simd/arm/common/jidctint-neon.c",
-      "simd/arm/common/jidctred-neon.c",
-      "simd/arm/common/jquanti-neon.c",
-    ]
-    configs -= [ "//build/config/compiler:default_optimization" ]
-    configs += [ "//build/config/compiler:optimize_speed" ]
-  } else if (current_cpu == "arm64" && arm_use_neon) {
-    sources = [
-      "simd/arm/arm64/jsimd.c",
-      "simd/arm/arm64/jsimd_neon.S",
-      "simd/arm/common/jccolor-neon.c",
-      "simd/arm/common/jcgray-neon.c",
-      "simd/arm/common/jcsample-neon.c",
-      "simd/arm/common/jdcolor-neon.c",
-      "simd/arm/common/jdmerge-neon.c",
-      "simd/arm/common/jdsample-neon.c",
-      "simd/arm/common/jfdctfst-neon.c",
-      "simd/arm/common/jfdctint-neon.c",
-      "simd/arm/common/jidctfst-neon.c",
-      "simd/arm/common/jidctint-neon.c",
-      "simd/arm/common/jidctred-neon.c",
-      "simd/arm/common/jquanti-neon.c",
+      "simd/arm/jccolor-neon.c",
+      "simd/arm/jcgray-neon.c",
+      "simd/arm/jcphuff-neon.c",
+      "simd/arm/jcsample-neon.c",
+      "simd/arm/jdcolor-neon.c",
+      "simd/arm/jdmerge-neon.c",
+      "simd/arm/jdsample-neon.c",
+      "simd/arm/jfdctfst-neon.c",
+      "simd/arm/jfdctint-neon.c",
+      "simd/arm/jidctfst-neon.c",
+      "simd/arm/jidctint-neon.c",
+      "simd/arm/jidctred-neon.c",
+      "simd/arm/jquanti-neon.c",
     ]
+
+    if (current_cpu == "arm") {
+      sources += [
+        "simd/arm/aarch32/jchuff-neon.c",
+        "simd/arm/aarch32/jsimd.c",
+      ]
+    } else if (current_cpu == "arm64") {
+      sources += [
+        "simd/arm/aarch64/jchuff-neon.c",
+        "simd/arm/aarch64/jsimd.c",
+      ]
+    }
+
+    defines = [ "NEON_INTRINSICS" ]
+
+    configs -= [ "//build/config/compiler:default_optimization" ]
     configs += [ "//build/config/compiler:optimize_speed" ]
   } else {
-    sources = [
-      "jsimd_none.c",
-    ]
+    sources = [ "jsimd_none.c" ]
   }
 
   if (is_win) {
@@ -216,7 +202,6 @@
     "jccolor.c",
     "jcdctmgr.c",
     "jchuff.c",
-    "jchuff.h",
    "jcicc.c",
     "jcinit.c",
     "jcmainct.c",
@@ -236,7 +221,6 @@
     "jdcolor.c",
     "jddctmgr.c",
     "jdhuff.c",
-    "jdhuff.h",
     "jdicc.c",
     "jdinput.c",
     "jdmainct.c",
@@ -248,7 +232,6 @@
     "jdsample.c",
     "jdtrans.c",
     "jerror.c",
-    "jerror.h",
     "jfdctflt.c",
     "jfdctfst.c",
     "jfdctint.c",
@@ -258,13 +241,10 @@
     "jidctred.c",
     "jmemmgr.c",
     "jmemnobs.c",
-    "jmemsys.h",
     "jpeg_nbits_table.c",
-    "jpegint.h",
     "jquant1.c",
     "jquant2.c",
     "jutils.c",
-    "jversion.h",
   ]
 
   defines = [
@@ -275,27 +255,29 @@
   configs += [ ":libjpeg_config" ]
 
   public_configs = [ ":libjpeg_config" ]
-  public_deps = [
-    ":libjpeg_headers",
-  ]
+  public_deps = [ ":libjpeg_headers" ]
 
-  # MemorySanitizer doesn't support assembly code, so keep it disabled in
-  # MSan builds for now.
-  if (is_msan) {
+  # MemorySanitizer doesn't support assembly code, so keep it disabled in x86
+  # and x64 MSan builds for now.
+  if (is_msan && (current_cpu == "x86" || current_cpu == "x64")) {
     sources += [ "jsimd_none.c" ]
   } else {
     public_deps += [ ":simd" ]
+
+    if ((current_cpu == "arm" || current_cpu == "arm64") && arm_use_neon) {
+      defines += [ "NEON_INTRINSICS" ]
+    }
   }
 }
 
 static_library("turbojpeg") {
   sources = [
-    "turbojpeg.c",
-    "transupp.c",
     "jdatadst-tj.c",
     "jdatasrc-tj.c",
     "rdbmp.c",
     "rdppm.c",
+    "transupp.c",
+    "turbojpeg.c",
     "wrbmp.c",
     "wrppm.c",
   ]
@@ -309,9 +291,7 @@
   configs += [ ":libjpeg_config" ]
 
   public_configs = [ ":libjpeg_config" ]
-  public_deps = [
-    ":libjpeg",
-  ]
+  public_deps = [ ":libjpeg" ]
 }
 
 if (build_with_chromium) {
@@ -333,12 +313,12 @@
     "jpegtran.c",
     "md5/md5.c",
     "md5/md5hl.c",
-    "tjbench.c",
-    "tjunittest.c",
-    "tjutil.c",
     "rdcolmap.c",
     "rdgif.c",
     "rdswitch.c",
+    "tjbench.c",
+    "tjunittest.c",
+    "tjutil.c",
   ]
 
   deps = [
@@ -348,9 +328,7 @@
     "//testing/gtest:gtest_main",
   ]
 
-  data = [
-    "testimages/"
-  ]
+  data = [ "testimages/" ]
 
   defines = [
     "GTEST",
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/BUILDING.md b/src/3rdparty/chromium/third_party/libjpeg_turbo/BUILDING.md
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/BUILDING.md	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/BUILDING.md	2021-11-20 03:41:33.389600594 +0000
@@ -12,10 +12,7 @@
 - [NASM](http://www.nasm.us) or [YASM](http://yasm.tortall.net)
   (if building x86 or x86-64 SIMD extensions)
-  * If using NASM, 2.10 or later is required.
-  * If using NASM, 2.10 or later (except 2.11.08) is required for an x86-64 Mac
-    build (2.11.08 does not work properly with libjpeg-turbo's x86-64 SIMD code
-    when building macho64 objects.)
+  * If using NASM, 2.13 or later is required.
   * If using YASM, 1.2.0 or later is required.
   * If building on macOS, NASM or YASM can be obtained from
     [MacPorts](http://www.macports.org/) or [Homebrew](http://brew.sh/).
@@ -49,10 +46,8 @@
 - If building the TurboJPEG Java wrapper, JDK or OpenJDK 1.5 or later is
   required.  Most modern Linux distributions, as well as Solaris 10 and later,
-  include JDK or OpenJDK.  On OS X 10.5 and 10.6, it will be necessary to
-  install the Java Developer Package, which can be downloaded from
-  (Apple ID required.)  For other
-  systems, you can obtain the Oracle Java Development Kit from
+  include JDK or OpenJDK.  For other systems, you can obtain the Oracle Java
+  Development Kit from .
 
 * If using JDK 11 or later, CMake 3.10.x or later must also be used.
@@ -62,22 +57,22 @@
 - Microsoft Visual C++ 2005 or later
 
   If you don't already have Visual C++, then the easiest way to get it is by
-  installing the
-  [Windows SDK](http://msdn.microsoft.com/en-us/windows/bb980924.aspx).
-  The Windows SDK includes both 32-bit and 64-bit Visual C++ compilers and
-  everything necessary to build libjpeg-turbo.
-
-  * You can also use Microsoft Visual Studio Express/Community Edition, which
-    is a free download.  (NOTE: versions prior to 2012 can only be used to
-    build 32-bit code.)
+  installing
+  [Visual Studio Community Edition](https://visualstudio.microsoft.com),
+  which includes everything necessary to build libjpeg-turbo.
+
+  * You can also download and install the standalone Windows SDK (for Windows 7
+    or later), which includes command-line versions of the 32-bit and 64-bit
+    Visual C++ compilers.
   * If you intend to build libjpeg-turbo from the command line, then add the
     appropriate compiler and SDK directories to the `INCLUDE`, `LIB`, and
     `PATH` environment variables.  This is generally accomplished by
-    executing `vcvars32.bat` or `vcvars64.bat` and `SetEnv.cmd`.
-    `vcvars32.bat` and `vcvars64.bat` are part of Visual C++ and are located in
-    the same directory as the compiler.  `SetEnv.cmd` is part of the Windows
-    SDK.  You can pass optional arguments to `SetEnv.cmd` to specify a 32-bit
-    or 64-bit build environment.
+    executing `vcvars32.bat` or `vcvars64.bat`, which are located in the same
+    directory as the compiler.
+  * If built with Visual C++ 2015 or later, the libjpeg-turbo static libraries
+    cannot be used with earlier versions of Visual C++, and vice versa.
+  * The libjpeg API DLL (**jpeg{version}.dll**) will depend on the C run-time
+    DLLs corresponding to the version of Visual C++ that was used to build it.
 
    ... OR ...
 
@@ -108,6 +103,13 @@
 directory.  For in-tree builds, these directories are the same.
 
 
+Ninja
+-----
+
+In all of the procedures and recipes below, replace `make` with `ninja` and
+`Unix Makefiles` with `Ninja` if using Ninja.
+
+
 Build Procedure
 ---------------
 
@@ -333,7 +335,7 @@
 -------------
 
 
-### 32-bit Build on 64-bit Linux/Unix/Mac
+### 32-bit Build on 64-bit Linux/Unix
 
 Use export/setenv to set the following environment variables before running
 CMake:
@@ -398,117 +400,23 @@
 Building libjpeg-turbo for iOS
 ------------------------------
 
-iOS platforms, such as the iPhone and iPad, use ARM processors, and all
-currently supported models include NEON instructions.  Thus, they can take
+iOS platforms, such as the iPhone and iPad, use Arm processors, and all
+currently supported models include Neon instructions.  Thus, they can take
 advantage of libjpeg-turbo's SIMD extensions to significantly accelerate
 JPEG compression/decompression.  This section describes how to build
 libjpeg-turbo for these platforms.
 
-### Additional build requirements
-
-- For configurations that require [gas-preprocessor.pl]
-  (https://raw.githubusercontent.com/libjpeg-turbo/gas-preprocessor/master/gas-preprocessor.pl),
-  it should be installed in your `PATH`.
-
-
-### ARMv7 (32-bit)
-
-**gas-preprocessor.pl required**
-
-The following scripts demonstrate how to build libjpeg-turbo to run on the
-iPhone 3GS-4S/iPad 1st-3rd Generation and newer:
-
-#### Xcode 4.2 and earlier (LLVM-GCC)
-
-    IOS_PLATFORMDIR=/Developer/Platforms/iPhoneOS.platform
-    IOS_SYSROOT=($IOS_PLATFORMDIR/Developer/SDKs/iPhoneOS*.sdk)
-    export CFLAGS="-mfloat-abi=softfp -march=armv7 -mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -miphoneos-version-min=3.0"
-
-    cd {build_directory}
-
-    cat <<EOF >toolchain.cmake
-    set(CMAKE_SYSTEM_NAME Darwin)
-    set(CMAKE_SYSTEM_PROCESSOR arm)
-    set(CMAKE_C_COMPILER ${IOS_PLATFORMDIR}/Developer/usr/bin/arm-apple-darwin10-llvm-gcc-4.2)
-    EOF
-
-    cmake -G"Unix Makefiles" -DCMAKE_TOOLCHAIN_FILE=toolchain.cmake \
-          -DCMAKE_OSX_SYSROOT=${IOS_SYSROOT[0]} \
-          [additional CMake flags] {source_directory}
-    make
-
-#### Xcode 4.3-4.6 (LLVM-GCC)
-
-Same as above, but replace the first line with:
-
-    IOS_PLATFORMDIR=/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform
-
-#### Xcode 5 and later (Clang)
-
-    IOS_PLATFORMDIR=/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform
-    IOS_SYSROOT=($IOS_PLATFORMDIR/Developer/SDKs/iPhoneOS*.sdk)
-    export CFLAGS="-mfloat-abi=softfp -arch armv7 -miphoneos-version-min=3.0"
-    export ASMFLAGS="-no-integrated-as"
-
-    cd {build_directory}
-
-    cat <<EOF >toolchain.cmake
-    set(CMAKE_SYSTEM_NAME Darwin)
-    set(CMAKE_SYSTEM_PROCESSOR arm)
-    set(CMAKE_C_COMPILER /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang)
-    EOF
-
-    cmake -G"Unix Makefiles" -DCMAKE_TOOLCHAIN_FILE=toolchain.cmake \
-          -DCMAKE_OSX_SYSROOT=${IOS_SYSROOT[0]} \
-          [additional CMake flags] {source_directory}
-    make
-
-
-### ARMv7s (32-bit)
-
-**gas-preprocessor.pl required**
-
-The following scripts demonstrate how to build libjpeg-turbo to run on the
-iPhone 5/iPad 4th Generation and newer:
-
-#### Xcode 4.5-4.6 (LLVM-GCC)
-
-    IOS_PLATFORMDIR=/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform
-    IOS_SYSROOT=($IOS_PLATFORMDIR/Developer/SDKs/iPhoneOS*.sdk)
-    export CFLAGS="-Wall -mfloat-abi=softfp -march=armv7s -mcpu=swift -mtune=swift -mfpu=neon -miphoneos-version-min=6.0"
-
-    cd {build_directory}
-
-    cat <<EOF >toolchain.cmake
-    set(CMAKE_SYSTEM_NAME Darwin)
-    set(CMAKE_SYSTEM_PROCESSOR arm)
-    set(CMAKE_C_COMPILER ${IOS_PLATFORMDIR}/Developer/usr/bin/arm-apple-darwin10-llvm-gcc-4.2)
-    EOF
-
-    cmake -G"Unix Makefiles" -DCMAKE_TOOLCHAIN_FILE=toolchain.cmake \
-          -DCMAKE_OSX_SYSROOT=${IOS_SYSROOT[0]} \
-          [additional CMake flags] {source_directory}
-    make
-
-#### Xcode 5 and later (Clang)
-
-Same as the ARMv7 build procedure for Xcode 5 and later, except replace the
-compiler flags as follows:
-
-    export CFLAGS="-Wall -mfloat-abi=softfp -arch armv7s -miphoneos-version-min=6.0"
-
-
-### ARMv8 (64-bit)
-
-**gas-preprocessor.pl required if using Xcode < 6**
+**Xcode 5 or later required, Xcode 6.3.x or later recommended**
 
 The following script demonstrates how to build libjpeg-turbo to run on the
 iPhone 5S/iPad Mini 2/iPad Air and newer.
 
     IOS_PLATFORMDIR=/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform
     IOS_SYSROOT=($IOS_PLATFORMDIR/Developer/SDKs/iPhoneOS*.sdk)
-    export CFLAGS="-Wall -arch arm64 -miphoneos-version-min=7.0 -funwind-tables"
+    export CFLAGS="-Wall -arch arm64 -miphoneos-version-min=8.0 -funwind-tables"
 
     cd {build_directory}
 
@@ -523,9 +431,6 @@
           [additional CMake flags] {source_directory}
     make
 
-Once built, lipo can be used to combine the ARMv7, v7s, and/or v8 variants into
-a universal library.
-
 
 Building libjpeg-turbo for Android
 ----------------------------------
 
@@ -534,7 +439,9 @@
 [Android NDK](https://developer.android.com/tools/sdk/ndk).
 
 
-### ARMv7 (32-bit)
+### Armv7 (32-bit)
+
+**NDK r19 or later with Clang recommended**
 
 The following is a general recipe script that can be modified for your
 specific needs.
@@ -559,7 +466,9 @@
     make
 
 
-### ARMv8 (64-bit)
+### Armv8 (64-bit)
+
+**Clang recommended**
 
 The following is a general recipe script that can be modified for your
 specific needs.
 
@@ -735,44 +644,23 @@
     make dmg
 
 Create Mac package/disk image.  This requires pkgbuild and productbuild, which
-are installed by default on OS X 10.7 and later and which can be obtained by
-installing Xcode 3.2.6 (with the "Unix Development" option) on OS X 10.6.
-Packages built in this manner can be installed on OS X 10.5 and later, but they
-must be built on OS X 10.6 or later.
-
-    make udmg
-
-This creates a Mac package/disk image that contains universal x86-64/i386/ARM
-binaries.  The following CMake variables control which architectures are
-included in the universal binaries.  Setting any of these variables to an empty
-string excludes that architecture from the package.
-
-* `OSX_32BIT_BUILD`: Directory containing an i386 (32-bit) Mac build of
-  libjpeg-turbo (default: *{source_directory}*/osxx86)
-* `IOS_ARMV7_BUILD`: Directory containing an ARMv7 (32-bit) iOS build of
-  libjpeg-turbo (default: *{source_directory}*/iosarmv7)
-* `IOS_ARMV7S_BUILD`: Directory containing an ARMv7s (32-bit) iOS build of
-  libjpeg-turbo (default: *{source_directory}*/iosarmv7s)
-* `IOS_ARMV8_BUILD`: Directory containing an ARMv8 (64-bit) iOS build of
-  libjpeg-turbo (default: *{source_directory}*/iosarmv8)
-
-You should first use CMake to configure i386, ARMv7, ARMv7s, and/or ARMv8
-sub-builds of libjpeg-turbo (see "Build Recipes" and "Building libjpeg-turbo
-for iOS" above) in build directories that match those specified in the
-aforementioned CMake variables.  Next, configure the primary build of
-libjpeg-turbo as an out-of-tree build, and build it.  Once the primary build
-has been built, run `make udmg` from the build directory.  The packaging system
-will build the sub-builds, use lipo to combine them into a single set of
-universal binaries, then package the universal binaries in the same manner as
-`make dmg`.
-
+are installed by default on OS X/macOS 10.7 and later.
 
-Cygwin
-------
+In order to create a Mac package/disk image that contains universal
+x86-64/Arm binaries, set the following CMake variable:
 
-    make cygwinpkg
+* `ARMV8_BUILD`: Directory containing an Armv8 (64-bit) iOS or macOS build of
+  libjpeg-turbo to include in the universal binaries
 
-Build a Cygwin binary package.
+You should first use CMake to configure an Armv8 sub-build of libjpeg-turbo
+(see "Building libjpeg-turbo for iOS" above, if applicable) in a build
+directory that matches the one specified in the aforementioned CMake variable.
+Next, configure the primary (x86-64) build of libjpeg-turbo as an out-of-tree
+build, specifying the aforementioned CMake variable, and build it.  Once the
+primary build has been built, run `make dmg` from the build directory.  The
+packaging system will build the sub-build, use lipo to combine it with the
+primary build into a single set of universal binaries, then package the
+universal binaries.
 
 
 Windows
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/cderror.h b/src/3rdparty/chromium/third_party/libjpeg_turbo/cderror.h
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/cderror.h	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/cderror.h	2021-11-20 03:41:33.390600578 +0000
@@ -1,9 +1,11 @@
 /*
  * cderror.h
  *
+ * This file was part of the Independent JPEG Group's software:
  * Copyright (C) 1994-1997, Thomas G. Lane.
  * Modified 2009-2017 by Guido Vollbeding.
- * This file is part of the Independent JPEG Group's software.
+ * libjpeg-turbo Modifications:
+ * Copyright (C) 2021, D. R. Commander.
  * For conditions of distribution and use, see the accompanying README.ijg
  * file.
  *
@@ -42,7 +44,7 @@
 
 #ifdef BMP_SUPPORTED
 JMESSAGE(JERR_BMP_BADCMAP, "Unsupported BMP colormap format")
-JMESSAGE(JERR_BMP_BADDEPTH, "Only 8- and 24-bit BMP files are supported")
+JMESSAGE(JERR_BMP_BADDEPTH, "Only 8-, 24-, and 32-bit BMP files are supported")
 JMESSAGE(JERR_BMP_BADHEADER, "Invalid BMP file: bad header length")
 JMESSAGE(JERR_BMP_BADPLANES, "Invalid BMP file: biPlanes not equal to 1")
 JMESSAGE(JERR_BMP_COLORSPACE, "BMP output must be grayscale or RGB")
@@ -50,9 +52,9 @@
 JMESSAGE(JERR_BMP_EMPTY, "Empty BMP image")
 JMESSAGE(JERR_BMP_NOT, "Not a BMP file - does not start with BM")
 JMESSAGE(JERR_BMP_OUTOFRANGE, "Numeric value out of range in BMP file")
-JMESSAGE(JTRC_BMP, "%ux%u 24-bit BMP image")
+JMESSAGE(JTRC_BMP, "%ux%u %d-bit BMP image")
 JMESSAGE(JTRC_BMP_MAPPED, "%ux%u 8-bit colormapped BMP image")
-JMESSAGE(JTRC_BMP_OS2, "%ux%u 24-bit OS2 BMP image")
+JMESSAGE(JTRC_BMP_OS2, "%ux%u %d-bit OS2 BMP image")
 JMESSAGE(JTRC_BMP_OS2_MAPPED, "%ux%u 8-bit colormapped OS2 BMP image")
 #endif /* BMP_SUPPORTED */
 
@@ -60,6 +62,7 @@
 JMESSAGE(JERR_GIF_BUG, "GIF output got confused")
 JMESSAGE(JERR_GIF_CODESIZE, "Bogus GIF codesize %d")
 JMESSAGE(JERR_GIF_COLORSPACE, "GIF output must be grayscale or RGB")
+JMESSAGE(JERR_GIF_EMPTY, "Empty GIF image")
 JMESSAGE(JERR_GIF_IMAGENOTFOUND, "Too few images in GIF file")
 JMESSAGE(JERR_GIF_NOT, "Not a GIF file")
 JMESSAGE(JTRC_GIF, "%ux%ux%d GIF image")
@@ -84,23 +87,6 @@
 JMESSAGE(JTRC_PPM_TEXT, "%ux%u text PPM image")
 #endif /* PPM_SUPPORTED */
 
-#ifdef RLE_SUPPORTED
-JMESSAGE(JERR_RLE_BADERROR, "Bogus error code from RLE library")
-JMESSAGE(JERR_RLE_COLORSPACE, "RLE output must be grayscale or RGB")
-JMESSAGE(JERR_RLE_DIMENSIONS, "Image dimensions (%ux%u) too large for RLE")
-JMESSAGE(JERR_RLE_EMPTY, "Empty RLE file")
-JMESSAGE(JERR_RLE_EOF, "Premature EOF in RLE header")
-JMESSAGE(JERR_RLE_MEM, "Insufficient memory for RLE header")
-JMESSAGE(JERR_RLE_NOT, "Not an RLE file")
-JMESSAGE(JERR_RLE_TOOMANYCHANNELS, "Cannot handle %d output channels for RLE")
-JMESSAGE(JERR_RLE_UNSUPPORTED, "Cannot handle this RLE setup")
-JMESSAGE(JTRC_RLE, "%ux%u full-color RLE file")
-JMESSAGE(JTRC_RLE_FULLMAP, "%ux%u full-color RLE file with map of length %d")
-JMESSAGE(JTRC_RLE_GRAY, "%ux%u grayscale RLE file")
-JMESSAGE(JTRC_RLE_MAPGRAY, "%ux%u grayscale RLE file with map of length %d")
-JMESSAGE(JTRC_RLE_MAPPED, "%ux%u colormapped RLE file with map of length %d")
-#endif /* RLE_SUPPORTED */
-
 #ifdef TARGA_SUPPORTED
 JMESSAGE(JERR_TGA_BADCMAP, "Unsupported Targa colormap format")
 JMESSAGE(JERR_TGA_BADPARMS, "Invalid or unsupported Targa file")
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/cdjpeg.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/cdjpeg.c
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/cdjpeg.c	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/cdjpeg.c	2021-11-20 03:41:33.390600578 +0000
@@ -3,8 +3,8 @@
  *
  * This file was part of the Independent JPEG Group's software:
  * Copyright (C) 1991-1997, Thomas G. Lane.
- * It was modified by The libjpeg-turbo Project to include only code relevant
- * to libjpeg-turbo.
+ * libjpeg-turbo Modifications:
+ * Copyright (C) 2019, D. R. Commander.
  * For conditions of distribution and use, see the accompanying README.ijg
  * file.
  *
@@ -25,26 +25,37 @@
  * Optional progress monitor: display a percent-done figure on stderr.
  */
 
-#ifdef PROGRESS_REPORT
-
 METHODDEF(void)
 progress_monitor(j_common_ptr cinfo)
 {
   cd_progress_ptr prog = (cd_progress_ptr)cinfo->progress;
-  int total_passes = prog->pub.total_passes + prog->total_extra_passes;
-  int percent_done =
-    (int)(prog->pub.pass_counter * 100L / prog->pub.pass_limit);
-
-  if (percent_done != prog->percent_done) {
-    prog->percent_done = percent_done;
-    if (total_passes > 1) {
-      fprintf(stderr, "\rPass %d/%d: %3d%% ",
-              prog->pub.completed_passes + prog->completed_extra_passes + 1,
-              total_passes, percent_done);
-    } else {
-      fprintf(stderr, "\r %3d%% ", percent_done);
+
+  if (prog->max_scans != 0 && cinfo->is_decompressor) {
+    int scan_no = ((j_decompress_ptr)cinfo)->input_scan_number;
+
+    if (scan_no > (int)prog->max_scans) {
+      fprintf(stderr, "Scan number %d exceeds maximum scans (%d)\n", scan_no,
+              prog->max_scans);
+      exit(EXIT_FAILURE);
+    }
+  }
+
+  if (prog->report) {
+    int total_passes = prog->pub.total_passes + prog->total_extra_passes;
+    int percent_done =
+      (int)(prog->pub.pass_counter * 100L / prog->pub.pass_limit);
+
+    if (percent_done != prog->percent_done) {
+      prog->percent_done = percent_done;
+      if (total_passes > 1) {
+        fprintf(stderr, "\rPass %d/%d: %3d%% ",
+                prog->pub.completed_passes + prog->completed_extra_passes + 1,
+                total_passes, percent_done);
+      } else {
+        fprintf(stderr, "\r %3d%% ", percent_done);
+      }
+      fflush(stderr);
     }
-    fflush(stderr);
   }
 }
 
@@ -57,6 +68,8 @@
   progress->pub.progress_monitor = progress_monitor;
   progress->completed_extra_passes = 0;
   progress->total_extra_passes = 0;
+  progress->max_scans = 0;
+  progress->report = FALSE;
   progress->percent_done = -1;
   cinfo->progress = &progress->pub;
 }
@@ -73,8 +86,6 @@
   }
 }
 
-#endif
-
 
 /*
  * Case-insensitive matching of possibly-abbreviated keyword switches.
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/cdjpeg.h b/src/3rdparty/chromium/third_party/libjpeg_turbo/cdjpeg.h
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/cdjpeg.h 2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/cdjpeg.h 2021-11-20 03:41:33.390600578 +0000
@@ -3,8 +3,9 @@
  *
  * This file was part of the Independent JPEG Group's software:
  * Copyright (C) 1994-1997, Thomas G. Lane.
+ * Modified 2019 by Guido Vollbeding.
  * libjpeg-turbo Modifications:
- * Copyright (C) 2017, D. R. Commander.
+ * Copyright (C) 2017, 2019, 2021, D. R. Commander.
  * For conditions of distribution and use, see the accompanying README.ijg
  * file.
  *
@@ -35,6 +36,9 @@
   JSAMPARRAY buffer;
   JDIMENSION buffer_height;
+#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
+  JDIMENSION max_pixels;
+#endif
 };
@@ -56,9 +60,9 @@
   void (*finish_output) (j_decompress_ptr cinfo, djpeg_dest_ptr dinfo);
   /* Re-calculate buffer dimensions based on output dimensions (for use with
      partial image decompression.)  If this is NULL, then the output format
-     does not support partial image decompression (BMP and RLE, in particular,
-     cannot support partial decompression because they use an inversion buffer
-     to write the image in bottom-up order.) */
+     does not support partial image decompression (BMP, in particular, cannot
+     support partial decompression because it uses an inversion buffer to write
+     the image in bottom-up order.)
*/ void (*calc_buffer_dimensions) (j_decompress_ptr cinfo, djpeg_dest_ptr dinfo); @@ -87,6 +91,9 @@ struct jpeg_progress_mgr pub; /* fields known to JPEG library */ int completed_extra_passes; /* extra passes completed */ int total_extra_passes; /* total extra */ + JDIMENSION max_scans; /* abort if the number of scans exceeds this + value and the value is non-zero */ + boolean report; /* whether or not to report progress */ /* last printed percentage stored here to avoid multiple printouts */ int percent_done; }; @@ -101,11 +108,9 @@ EXTERN(djpeg_dest_ptr) jinit_write_bmp(j_decompress_ptr cinfo, boolean is_os2, boolean use_inversion_array); EXTERN(cjpeg_source_ptr) jinit_read_gif(j_compress_ptr cinfo); -EXTERN(djpeg_dest_ptr) jinit_write_gif(j_decompress_ptr cinfo); +EXTERN(djpeg_dest_ptr) jinit_write_gif(j_decompress_ptr cinfo, boolean is_lzw); EXTERN(cjpeg_source_ptr) jinit_read_ppm(j_compress_ptr cinfo); EXTERN(djpeg_dest_ptr) jinit_write_ppm(j_decompress_ptr cinfo); -EXTERN(cjpeg_source_ptr) jinit_read_rle(j_compress_ptr cinfo); -EXTERN(djpeg_dest_ptr) jinit_write_rle(j_decompress_ptr cinfo); EXTERN(cjpeg_source_ptr) jinit_read_targa(j_compress_ptr cinfo); EXTERN(djpeg_dest_ptr) jinit_write_targa(j_decompress_ptr cinfo); @@ -125,7 +130,6 @@ /* common support routines (in cdjpeg.c) */ -EXTERN(void) enable_signal_catcher(j_common_ptr cinfo); EXTERN(void) start_progress_monitor(j_common_ptr cinfo, cd_progress_ptr progress); EXTERN(void) end_progress_monitor(j_common_ptr cinfo); diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/change.log b/src/3rdparty/chromium/third_party/libjpeg_turbo/change.log --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/change.log 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/change.log 2021-11-20 03:41:33.390600578 +0000 @@ -6,6 +6,25 @@ CHANGE LOG for Independent JPEG Group's JPEG software +Version 9d 12-Jan-2020 +----------------------- + +Restore GIF read and write support 
from libjpeg version 6a. +Thank to Wolfgang Werner (W.W.) Heinz for suggestion. + +Add jpegtran -drop option; add options to the crop extension and wipe +to fill the extra area with content from the source image region, +instead of gray out. + + +Version 9c 14-Jan-2018 +----------------------- + +jpegtran: add an option to the -wipe switch to fill the region +with the average of adjacent blocks, instead of gray out. +Thank to Caitlyn Feddock and Maddie Ziegler for inspiration. + + Version 9b 17-Jan-2016 ----------------------- @@ -13,6 +32,13 @@ Thank to Michele Martone for suggestion. +Version 9a 19-Jan-2014 +----------------------- + +Add jpegtran -wipe option and extension for -crop. +Thank to Andrew Senior, David Clunie, and Josef Schmid for suggestion. + + Version 9 13-Jan-2013 ---------------------- @@ -138,11 +164,6 @@ Huffman tables are checked for validity much more carefully than before. -To avoid the Unisys LZW patent, djpeg's GIF output capability has been -changed to produce "uncompressed GIFs", and cjpeg's GIF input capability -has been removed altogether. We're not happy about it either, but there -seems to be no good alternative. - The configure script now supports building libjpeg as a shared library on many flavors of Unix (all the ones that GNU libtool knows how to build shared libraries for). Use "./configure --enable-shared" to diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/ChangeLog.md b/src/3rdparty/chromium/third_party/libjpeg_turbo/ChangeLog.md --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/ChangeLog.md 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/ChangeLog.md 2021-11-20 03:41:33.389600594 +0000 @@ -1,3 +1,262 @@ +2.1.1 +===== + +### Significant changes relative to 2.1.0 + +1. Fixed a regression introduced in 2.1.0 that caused build failures with +non-GCC-compatible compilers for Un*x/Arm platforms. + +2. 
Fixed a regression introduced by 2.1 beta1[13] that prevented the Arm 32-bit +(AArch32) Neon SIMD extensions from building unless the C compiler flags +included `-mfloat-abi=softfp` or `-mfloat-abi=hard`. + +3. Fixed an issue in the AArch32 Neon SIMD Huffman encoder whereby reliance on +undefined C compiler behavior led to crashes ("SIGBUS: illegal alignment") on +Android systems when running AArch32/Thumb builds of libjpeg-turbo built with +recent versions of Clang. + +4. Added a command-line argument (`-copy icc`) to jpegtran that causes it to +copy only the ICC profile markers from the source file and discard any other +metadata. + +5. libjpeg-turbo should now build and run on CHERI-enabled architectures, which +use capability pointers that are larger than the size of `size_t`. + +6. Fixed a regression introduced by 2.1 beta1[5] that caused a segfault in the +64-bit SSE2 Huffman encoder when attempting to losslessly transform a +specially-crafted malformed JPEG image. + + +2.1.0 +===== + +### Significant changes relative to 2.1 beta1 + +1. Fixed a regression introduced by 2.1 beta1[6(b)] whereby attempting to +decompress certain progressive JPEG images with one or more component planes of +width 8 or less caused a buffer overrun. + +2. Fixed a regression introduced by 2.1 beta1[6(b)] whereby attempting to +decompress a specially-crafted malformed progressive JPEG image caused the +block smoothing algorithm to read from uninitialized memory. + +3. Fixed an issue in the Arm Neon SIMD Huffman encoders that caused the +encoders to generate incorrect results when using the Clang compiler with +Visual Studio. + +4. Fixed a floating point exception (CVE-2021-20205) that occurred when +attempting to compress a specially-crafted malformed GIF image with a specified +image width of 0 using cjpeg. + +5. 
Fixed a regression introduced by 2.0 beta1[15] whereby attempting to +generate a progressive JPEG image on an SSE2-capable CPU using a scan script +containing one or more scans with lengths divisible by 32 and non-zero +successive approximation low bit positions would, under certain circumstances, +result in an error ("Missing Huffman code table entry") and an invalid JPEG +image. + +6. Introduced a new flag (`TJFLAG_LIMITSCANS` in the TurboJPEG C API and +`TJ.FLAG_LIMIT_SCANS` in the TurboJPEG Java API) and a corresponding TJBench +command-line argument (`-limitscans`) that causes the TurboJPEG decompression +and transform functions/operations to return/throw an error if a progressive +JPEG image contains an unreasonably large number of scans. This allows +applications that use the TurboJPEG API to guard against an exploit of the +progressive JPEG format described in the report +["Two Issues with the JPEG Standard"](https://libjpeg-turbo.org/pmwiki/uploads/About/TwoIssueswiththeJPEGStandard.pdf). + +7. The PPM reader now throws an error, rather than segfaulting (due to a buffer +overrun) or generating incorrect pixels, if an application attempts to use the +`tjLoadImage()` function to load a 16-bit binary PPM file (a binary PPM file +with a maximum value greater than 255) into a grayscale image buffer or to load +a 16-bit binary PGM file into an RGB image buffer. + +8. Fixed an issue in the PPM reader that caused incorrect pixels to be +generated when using the `tjLoadImage()` function to load a 16-bit binary PPM +file into an extended RGB image buffer. + +9. Fixed an issue whereby, if a JPEG buffer was automatically re-allocated by +one of the TurboJPEG compression or transform functions and an error +subsequently occurred during compression or transformation, the JPEG buffer +pointer passed by the application was not updated when the function returned. + + +2.0.90 (2.1 beta1) +================== + +### Significant changes relative to 2.0.6: + +1. 
The build system, x86-64 SIMD extensions, and accelerated Huffman codec now +support the x32 ABI on Linux, which allows for using x86-64 instructions with +32-bit pointers. The x32 ABI is generally enabled by adding `-mx32` to the +compiler flags. + + Caveats: + - CMake 3.9.0 or later is required in order for the build system to +automatically detect an x32 build. + - Java does not support the x32 ABI, and thus the TurboJPEG Java API will +automatically be disabled with x32 builds. + +2. Added Loongson MMI SIMD implementations of the RGB-to-grayscale, 4:2:2 fancy +chroma upsampling, 4:2:2 and 4:2:0 merged chroma upsampling/color conversion, +and fast integer DCT/IDCT algorithms. Relative to libjpeg-turbo 2.0.x, this +speeds up: + + - the compression of RGB source images into grayscale JPEG images by +approximately 20% + - the decompression of 4:2:2 JPEG images by approximately 40-60% when +using fancy upsampling + - the decompression of 4:2:2 and 4:2:0 JPEG images by approximately +15-20% when using merged upsampling + - the compression of RGB source images by approximately 30-45% when using +the fast integer DCT + - the decompression of JPEG images into RGB destination images by +approximately 2x when using the fast integer IDCT + + The overall decompression speedup for RGB images is now approximately +2.3-3.7x (compared to 2-3.5x with libjpeg-turbo 2.0.x.) + +3. 32-bit (Armv7 or Armv7s) iOS builds of libjpeg-turbo are no longer +supported, and the libjpeg-turbo build system can no longer be used to package +such builds. 32-bit iOS apps cannot run in iOS 11 and later, and the App Store +no longer allows them. + +4. 32-bit (i386) OS X/macOS builds of libjpeg-turbo are no longer supported, +and the libjpeg-turbo build system can no longer be used to package such +builds. 32-bit Mac applications cannot run in macOS 10.15 "Catalina" and +later, and the App Store no longer allows them. + +5. 
The SSE2 (x86 SIMD) and C Huffman encoding algorithms have been +significantly optimized, resulting in a measured average overall compression +speedup of 12-28% for 64-bit code and 22-52% for 32-bit code on various Intel +and AMD CPUs, as well as a measured average overall compression speedup of +0-23% on platforms that do not have a SIMD-accelerated Huffman encoding +implementation. + +6. The block smoothing algorithm that is applied by default when decompressing +progressive Huffman-encoded JPEG images has been improved in the following +ways: + + - The algorithm is now more fault-tolerant. Previously, if a particular +scan was incomplete, then the smoothing parameters for the incomplete scan +would be applied to the entire output image, including the parts of the image +that were generated by the prior (complete) scan. Visually, this had the +effect of removing block smoothing from lower-frequency scans if they were +followed by an incomplete higher-frequency scan. libjpeg-turbo now applies +block smoothing parameters to each iMCU row based on which scan generated the +pixels in that row, rather than always using the block smoothing parameters for +the most recent scan. + - When applying block smoothing to DC scans, a Gaussian-like kernel with a +5x5 window is used to reduce the "blocky" appearance. + +7. Added SIMD acceleration for progressive Huffman encoding on Arm platforms. +This speeds up the compression of full-color progressive JPEGs by about 30-40% +on average (relative to libjpeg-turbo 2.0.x) when using modern Arm CPUs. + +8. Added configure-time and run-time auto-detection of Loongson MMI SIMD +instructions, so that the Loongson MMI SIMD extensions can be included in any +MIPS64 libjpeg-turbo build. + +9. 
Added fault tolerance features to djpeg and jpegtran, mainly to demonstrate +methods by which applications can guard against the exploits of the JPEG format +described in the report +["Two Issues with the JPEG Standard"](https://libjpeg-turbo.org/pmwiki/uploads/About/TwoIssueswiththeJPEGStandard.pdf). + + - Both programs now accept a `-maxscans` argument, which can be used to +limit the number of allowable scans in the input file. + - Both programs now accept a `-strict` argument, which can be used to +treat all warnings as fatal. + +10. CMake package config files are now included for both the libjpeg and +TurboJPEG API libraries. This facilitates using libjpeg-turbo with CMake's +`find_package()` function. For example: + + find_package(libjpeg-turbo CONFIG REQUIRED) + + add_executable(libjpeg_program libjpeg_program.c) + target_link_libraries(libjpeg_program PUBLIC libjpeg-turbo::jpeg) + + add_executable(libjpeg_program_static libjpeg_program.c) + target_link_libraries(libjpeg_program_static PUBLIC + libjpeg-turbo::jpeg-static) + + add_executable(turbojpeg_program turbojpeg_program.c) + target_link_libraries(turbojpeg_program PUBLIC + libjpeg-turbo::turbojpeg) + + add_executable(turbojpeg_program_static turbojpeg_program.c) + target_link_libraries(turbojpeg_program_static PUBLIC + libjpeg-turbo::turbojpeg-static) + +11. Since the Unisys LZW patent has long expired, cjpeg and djpeg can now +read/write both LZW-compressed and uncompressed GIF files (feature ported from +jpeg-6a and jpeg-9d.) + +12. jpegtran now includes the `-wipe` and `-drop` options from jpeg-9a and +jpeg-9d, as well as the ability to expand the image size using the `-crop` +option. Refer to jpegtran.1 or usage.txt for more details. + +13. Added a complete intrinsics implementation of the Arm Neon SIMD extensions, +thus providing SIMD acceleration on Arm platforms for all of the algorithms +that are SIMD-accelerated on x86 platforms. 
This new implementation is +significantly faster in some cases than the old GAS implementation-- +depending on the algorithms used, the type of CPU core, and the compiler. GCC, +as of this writing, does not provide a full or optimal set of Neon intrinsics, +so for performance reasons, the default when building libjpeg-turbo with GCC is +to continue using the GAS implementation of the following algorithms: + + - 32-bit RGB-to-YCbCr color conversion + - 32-bit fast and accurate inverse DCT + - 64-bit RGB-to-YCbCr and YCbCr-to-RGB color conversion + - 64-bit accurate forward and inverse DCT + - 64-bit Huffman encoding + + A new CMake variable (`NEON_INTRINSICS`) can be used to override this +default. + + Since the new intrinsics implementation includes SIMD acceleration +for merged upsampling/color conversion, 1.5.1[5] is no longer necessary and has +been reverted. + +14. The Arm Neon SIMD extensions can now be built using Visual Studio. + +15. The build system can now be used to generate a universal x86-64 + Armv8 +libjpeg-turbo SDK package for both iOS and macOS. + + +2.0.6 +===== + +### Significant changes relative to 2.0.5: + +1. Fixed "using JNI after critical get" errors that occurred on Android +platforms when using any of the YUV encoding/compression/decompression/decoding +methods in the TurboJPEG Java API. + +2. Fixed or worked around multiple issues with `jpeg_skip_scanlines()`: + + - Fixed segfaults or "Corrupt JPEG data: premature end of data segment" +errors in `jpeg_skip_scanlines()` that occurred when decompressing 4:2:2 or +4:2:0 JPEG images using merged (non-fancy) upsampling/color conversion (that +is, when setting `cinfo.do_fancy_upsampling` to `FALSE`.) 2.0.0[6] was a +similar fix, but it did not cover all cases. + - `jpeg_skip_scanlines()` now throws an error if two-pass color +quantization is enabled. Two-pass color quantization never worked properly +with `jpeg_skip_scanlines()`, and the issues could not readily be fixed. 
+ - Fixed an issue whereby `jpeg_skip_scanlines()` always returned 0 when +skipping past the end of an image. + +3. The Arm 64-bit (Armv8) Neon SIMD extensions can now be built using MinGW +toolchains targetting Arm64 (AArch64) Windows binaries. + +4. Fixed unexpected visual artifacts that occurred when using +`jpeg_crop_scanline()` and interblock smoothing while decompressing only the DC +scan of a progressive JPEG image. + +5. Fixed an issue whereby libjpeg-turbo would not build if 12-bit-per-component +JPEG support (`WITH_12BIT`) was enabled along with libjpeg v7 or libjpeg v8 +API/ABI emulation (`WITH_JPEG7` or `WITH_JPEG8`.) + + 2.0.5 ===== @@ -54,17 +313,17 @@ decompress some such images using `tjDecompressToYUV2()` or `tjDecompressToYUVPlanes()`. -5. Fixed an issue, detected by ASan, whereby attempting to losslessly transform -a specially-crafted malformed JPEG image containing an extremely-high-frequency -coefficient block (junk image data that could never be generated by a -legitimate JPEG compressor) could cause the Huffman encoder's local buffer to -be overrun. (Refer to 1.4.0[9] and 1.4beta1[15].) Given that the buffer -overrun was fully contained within the stack and did not cause a segfault or -other user-visible errant behavior, and given that the lossless transformer -(unlike the decompressor) is not generally exposed to arbitrary data exploits, -this issue did not likely pose a security risk. +5. Fixed an issue (CVE-2020-17541), detected by ASan, whereby attempting to +losslessly transform a specially-crafted malformed JPEG image containing an +extremely-high-frequency coefficient block (junk image data that could never be +generated by a legitimate JPEG compressor) could cause the Huffman encoder's +local buffer to be overrun. (Refer to 1.4.0[9] and 1.4beta1[15].) 
Given that +the buffer overrun was fully contained within the stack and did not cause a +segfault or other user-visible errant behavior, and given that the lossless +transformer (unlike the decompressor) is not generally exposed to arbitrary +data exploits, this issue did not likely pose a security risk. -6. The ARM 64-bit (ARMv8) NEON SIMD assembly code now stores constants in a +6. The Arm 64-bit (Armv8) Neon SIMD assembly code now stores constants in a separate read-only data section rather than in the text section, to support execute-only memory layouts. @@ -246,7 +505,7 @@ 1. Added AVX2 SIMD implementations of the colorspace conversion, chroma downsampling and upsampling, integer quantization and sample conversion, and -slow integer DCT/IDCT algorithms. When using the slow integer DCT/IDCT +accurate integer DCT/IDCT algorithms. When using the accurate integer DCT/IDCT algorithms on AVX2-equipped CPUs, the compression of RGB images is approximately 13-36% (avg. 22%) faster (relative to libjpeg-turbo 1.5.x) with 64-bit code and 11-21% (avg. 17%) faster with 32-bit code, and the @@ -350,16 +609,16 @@ now produces bitwise-identical results to the unmerged algorithms. 12. The SIMD function symbols for x86[-64]/ELF, MIPS/ELF, macOS/x86[-64] (if -libjpeg-turbo is built with YASM), and iOS/ARM[64] builds are now private. +libjpeg-turbo is built with YASM), and iOS/Arm[64] builds are now private. This prevents those symbols from being exposed in applications or shared libraries that link statically with libjpeg-turbo. 13. Added Loongson MMI SIMD implementations of the RGB-to-YCbCr and YCbCr-to-RGB colorspace conversion, 4:2:0 chroma downsampling, 4:2:0 fancy -chroma upsampling, integer quantization, and slow integer DCT/IDCT algorithms. -When using the slow integer DCT/IDCT, this speeds up the compression of RGB -images by approximately 70-100% and the decompression of RGB images by -approximately 2-3.5x. 
+chroma upsampling, integer quantization, and accurate integer DCT/IDCT +algorithms. When using the accurate integer DCT/IDCT, this speeds up the +compression of RGB images by approximately 70-100% and the decompression of RGB +images by approximately 2-3.5x. 14. Fixed a build error when building with older MinGW releases (regression caused by 1.5.1[7].) @@ -409,9 +668,9 @@ `jpeg_consume_input()` would return `JPEG_SUSPENDED` rather than `JPEG_REACHED_EOI`. -9. `jpeg_crop_scanlines()` now works correctly when decompressing grayscale -JPEG images that were compressed with a sampling factor other than 1 (for -instance, with `cjpeg -grayscale -sample 2x2`). +9. `jpeg_crop_scanline()` now works correctly when decompressing grayscale JPEG +images that were compressed with a sampling factor other than 1 (for instance, +with `cjpeg -grayscale -sample 2x2`). 1.5.2 @@ -435,7 +694,7 @@ 5. Fixed build and runtime errors on Windows that occurred when building libjpeg-turbo with libjpeg v7 API/ABI emulation and the in-memory source/destination managers. Due to an oversight, the `jpeg_skip_scanlines()` -and `jpeg_crop_scanlines()` functions were not being included in jpeg7.dll when +and `jpeg_crop_scanline()` functions were not being included in jpeg7.dll when libjpeg-turbo was built with `-DWITH_JPEG7=1` and `-DWITH_MEMSRCDST=1`. 6. Fixed "Bogus virtual array access" error that occurred when using the @@ -691,8 +950,8 @@ disabled by setting the `JSIMD_NOHUFFENC` environment variable to `1`. 13. Added ARM 64-bit (ARMv8) NEON SIMD implementations of the commonly-used -compression algorithms (including the slow integer forward DCT and h2v2 & h2v1 -downsampling algorithms, which are not accelerated in the 32-bit NEON +compression algorithms (including the accurate integer forward DCT and h2v2 & +h2v1 downsampling algorithms, which are not accelerated in the 32-bit NEON implementation.) 
This speeds up the compression of full-color JPEGs by about 75% on average on a Cavium ThunderX processor and by about 2-2.5x on average on Cortex-A53 and Cortex-A57 cores. @@ -823,8 +1082,8 @@ 7. Fixed an extremely rare bug in the Huffman encoder that caused 64-bit builds of libjpeg-turbo to incorrectly encode a few specific test images when -quality=98, an optimized Huffman table, and the slow integer forward DCT were -used. +quality=98, an optimized Huffman table, and the accurate integer forward DCT +were used. 8. The Windows (CMake) build system now supports building only static or only shared libraries. This is accomplished by adding either `-DENABLE_STATIC=0` or @@ -983,8 +1242,8 @@ The accuracy of this implementation now matches the accuracy of the SSE/SSE2 implementation. Note, however, that the floating point DCT/IDCT algorithms are mainly a legacy feature. They generally do not produce significantly better -accuracy than the slow integer DCT/IDCT algorithms, and they are quite a bit -slower. +accuracy than the accurate integer DCT/IDCT algorithms, and they are quite a +bit slower. 8. Added a new output colorspace (`JCS_RGB565`) to the libjpeg API that allows for decompressing JPEG images into RGB565 (16-bit) pixels. If dithering is not @@ -1394,8 +1653,8 @@ 2. Despite the above, the fast integer forward DCT still degrades somewhat for JPEG qualities greater than 95, so the TurboJPEG wrapper will now automatically -use the slow integer forward DCT when generating JPEG images of quality 96 or -greater. This reduces compression performance by as much as 15% for these +use the accurate integer forward DCT when generating JPEG images of quality 96 +or greater. This reduces compression performance by as much as 15% for these high-quality images but is necessary to ensure that the images are perceptually lossless. It also ensures that the library can avoid the performance pitfall created by [1]. 
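[Editor's note, not part of the patch: item 10 of the 2.0.90 notes above introduces CMake package config files. A hypothetical consumer project might use them as sketched below; "decode_demo" and decode_demo.c are placeholder names, and this assumes a libjpeg-turbo build (2.0.90 or later) that installed its config files where CMake can find them.]

```cmake
# Minimal consumer of the libjpeg-turbo CMake package config files.
cmake_minimum_required(VERSION 3.9)
project(decode_demo C)

# Locates libjpeg-turboConfig.cmake installed by libjpeg-turbo >= 2.0.90.
find_package(libjpeg-turbo CONFIG REQUIRED)

add_executable(decode_demo decode_demo.c)

# Link the shared libjpeg API library; libjpeg-turbo::jpeg-static,
# libjpeg-turbo::turbojpeg, and libjpeg-turbo::turbojpeg-static are the
# other imported targets the changelog lists.
target_link_libraries(decode_demo PRIVATE libjpeg-turbo::jpeg)
```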
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/cjpeg.1 b/src/3rdparty/chromium/third_party/libjpeg_turbo/cjpeg.1 --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/cjpeg.1 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/cjpeg.1 2021-11-20 03:41:33.390600578 +0000 @@ -1,4 +1,4 @@ -.TH CJPEG 1 "18 March 2017" +.TH CJPEG 1 "4 November 2020" .SH NAME cjpeg \- compress an image file to a JPEG file .SH SYNOPSIS @@ -16,8 +16,7 @@ compresses the named image file, or the standard input if no file is named, and produces a JPEG/JFIF file on the standard output. The currently supported input file formats are: PPM (PBMPLUS color -format), PGM (PBMPLUS grayscale format), BMP, Targa, and RLE (Utah Raster -Toolkit format). (RLE is supported only if the URT library is available.) +format), PGM (PBMPLUS grayscale format), BMP, GIF, and Targa. .SH OPTIONS All switch names may be abbreviated; for example, .B \-grayscale @@ -42,10 +41,10 @@ .TP .B \-grayscale Create monochrome JPEG file from color input. Be sure to use this switch when -compressing a grayscale BMP file, because +compressing a grayscale BMP or GIF file, because .B cjpeg -isn't bright enough to notice whether a BMP file uses only shades of gray. -By saying +isn't bright enough to notice whether a BMP or GIF file uses only shades of +gray. By saying .BR \-grayscale, you'll get a smaller JPEG file that takes less time to process. .TP @@ -161,31 +160,40 @@ unable to view an arithmetic coded JPEG file at all. .TP .B \-dct int -Use integer DCT method (default). +Use accurate integer DCT method (default). .TP .B \-dct fast -Use fast integer DCT (less accurate). -In libjpeg-turbo, the fast method is generally about 5-15% faster than the int -method when using the x86/x86-64 SIMD extensions (results may vary with other -SIMD implementations, or when using libjpeg-turbo without SIMD extensions.) +Use less accurate integer DCT method [legacy feature]. 
+When the Independent JPEG Group's software was first released in 1991, the +compression time for a 1-megapixel JPEG image on a mainstream PC was measured +in minutes. Thus, the \fBfast\fR integer DCT algorithm provided noticeable +performance benefits. On modern CPUs running libjpeg-turbo, however, the +compression time for a 1-megapixel JPEG image is measured in milliseconds, and +thus the performance benefits of the \fBfast\fR algorithm are much less +noticeable. On modern x86/x86-64 CPUs that support AVX2 instructions, the +\fBfast\fR and \fBint\fR methods have similar performance. On other types of +CPUs, the \fBfast\fR method is generally about 5-15% faster than the \fBint\fR +method. + For quality levels of 90 and below, there should be little or no perceptible -difference between the two algorithms. For quality levels above 90, however, -the difference between the fast and the int methods becomes more pronounced. -With quality=97, for instance, the fast method incurs generally about a 1-3 dB -loss (in PSNR) relative to the int method, but this can be larger for some -images. Do not use the fast method with quality levels above 97. The -algorithm often degenerates at quality=98 and above and can actually produce a -more lossy image than if lower quality levels had been used. Also, in -libjpeg-turbo, the fast method is not fully accelerated for quality levels -above 97, so it will be slower than the int method. +quality difference between the two algorithms. For quality levels above 90, +however, the difference between the \fBfast\fR and \fBint\fR methods becomes +more pronounced. With quality=97, for instance, the \fBfast\fR method incurs +generally about a 1-3 dB loss in PSNR relative to the \fBint\fR method, but +this can be larger for some images. Do not use the \fBfast\fR method with +quality levels above 97. The algorithm often degenerates at quality=98 and +above and can actually produce a more lossy image than if lower quality levels +had been used. 
Also, in libjpeg-turbo, the \fBfast\fR method is not fully +accelerated for quality levels above 97, so it will be slower than the +\fBint\fR method. .TP .B \-dct float -Use floating-point DCT method. -The float method is mainly a legacy feature. It does not produce significantly -more accurate results than the int method, and it is much slower. The float -method may also give different results on different machines due to varying -roundoff behavior, whereas the integer methods should give the same results on -all machines. +Use floating-point DCT method [legacy feature]. +The \fBfloat\fR method does not produce significantly more accurate results +than the \fBint\fR method, and it is much slower. The \fBfloat\fR method may +also give different results on different machines due to varying roundoff +behavior, whereas the integer methods should give the same results on all +machines. .TP .BI \-icc " file" Embed ICC color management profile contained in the specified file. @@ -215,6 +223,9 @@ way of testing the in-memory destination manager (jpeg_mem_dest()), but it is also useful for benchmarking, since it reduces the I/O overhead. .TP +.BI \-report +Report compression progress. +.TP .B \-verbose Enable debug printout. More .BR \-v 's @@ -341,11 +352,6 @@ relevant to libjpeg-turbo, to wordsmith certain sections, and to describe features not present in libjpeg. .SH ISSUES -Support for GIF input files was removed in cjpeg v6b due to concerns over -the Unisys LZW patent. Although this patent expired in 2006, cjpeg still -lacks GIF support, for these historical reasons. (Conversion of GIF files to -JPEG is usually a bad idea anyway, since GIF is a 256-color format.) -.PP Not all variants of BMP and Targa file formats are supported. 
.PP The diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/cjpeg.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/cjpeg.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/cjpeg.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/cjpeg.c 2021-11-20 03:41:33.390600578 +0000 @@ -5,7 +5,7 @@ * Copyright (C) 1991-1998, Thomas G. Lane. * Modified 2003-2011 by Guido Vollbeding. * libjpeg-turbo Modifications: - * Copyright (C) 2010, 2013-2014, 2017, D. R. Commander. + * Copyright (C) 2010, 2013-2014, 2017, 2019-2021, D. R. Commander. * For conditions of distribution and use, see the accompanying README.ijg * file. * @@ -27,6 +27,9 @@ * works regardless of which command line style is used. */ +#ifdef CJPEG_FUZZER +#define JPEG_INTERNALS +#endif #include "cdjpeg.h" /* Common decls for cjpeg/djpeg applications */ #include "jversion.h" /* for version message */ #include "jconfigint.h" @@ -69,9 +72,9 @@ * 2) assume we can push back more than one character (works in * some C implementations, but unportable); * 3) provide our own buffering (breaks input readers that want to use - * stdio directly, such as the RLE library); + * stdio directly); * or 4) don't put back the data, and modify the input_init methods to assume - * they start reading after the start of file (also breaks RLE library). + * they start reading after the start of file. * #1 is attractive for MS-DOS but is untenable on Unix. 
 *
 * The most portable solution for file types that can't be identified by their
@@ -117,10 +120,6 @@
   case 'P':
     return jinit_read_ppm(cinfo);
 #endif
-#ifdef RLE_SUPPORTED
-  case 'R':
-    return jinit_read_rle(cinfo);
-#endif
 #ifdef TARGA_SUPPORTED
   case 0x00:
     return jinit_read_targa(cinfo);
@@ -147,6 +146,46 @@
 static char *icc_filename;      /* for -icc switch */
 static char *outfilename;       /* for -outfile switch */
 boolean memdst;                 /* for -memdst switch */
+boolean report;                 /* for -report switch */
+
+
+#ifdef CJPEG_FUZZER
+
+#include <setjmp.h>
+
+struct my_error_mgr {
+  struct jpeg_error_mgr pub;
+  jmp_buf setjmp_buffer;
+};
+
+void my_error_exit(j_common_ptr cinfo)
+{
+  struct my_error_mgr *myerr = (struct my_error_mgr *)cinfo->err;
+
+  longjmp(myerr->setjmp_buffer, 1);
+}
+
+static void my_emit_message(j_common_ptr cinfo, int msg_level)
+{
+  if (msg_level < 0)
+    cinfo->err->num_warnings++;
+}
+
+#define HANDLE_ERROR() { \
+  if (cinfo.global_state > CSTATE_START) { \
+    if (memdst && outbuffer) \
+      (*cinfo.dest->term_destination) (&cinfo); \
+    jpeg_abort_compress(&cinfo); \
+  } \
+  jpeg_destroy_compress(&cinfo); \
+  if (input_file != stdin && input_file != NULL) \
+    fclose(input_file); \
+  if (memdst) \
+    free(outbuffer); \
+  return EXIT_FAILURE; \
+}
+
+#endif
 
 LOCAL(void)
@@ -179,15 +218,15 @@
   fprintf(stderr, "  -arithmetic    Use arithmetic coding\n");
 #endif
 #ifdef DCT_ISLOW_SUPPORTED
-  fprintf(stderr, "  -dct int       Use integer DCT method%s\n",
+  fprintf(stderr, "  -dct int       Use accurate integer DCT method%s\n",
           (JDCT_DEFAULT == JDCT_ISLOW ? " (default)" : ""));
 #endif
 #ifdef DCT_IFAST_SUPPORTED
-  fprintf(stderr, "  -dct fast      Use fast integer DCT (less accurate)%s\n",
+  fprintf(stderr, "  -dct fast      Use less accurate integer DCT method [legacy feature]%s\n",
           (JDCT_DEFAULT == JDCT_IFAST ?
" (default)" : "")); #endif #ifdef DCT_FLOAT_SUPPORTED - fprintf(stderr, " -dct float Use floating-point DCT method%s\n", + fprintf(stderr, " -dct float Use floating-point DCT method [legacy feature]%s\n", (JDCT_DEFAULT == JDCT_FLOAT ? " (default)" : "")); #endif fprintf(stderr, " -icc FILE Embed ICC profile contained in FILE\n"); @@ -200,6 +239,7 @@ #if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED) fprintf(stderr, " -memdst Compress to memory instead of file (useful for benchmarking)\n"); #endif + fprintf(stderr, " -report Report compression progress\n"); fprintf(stderr, " -verbose or -debug Emit debug output\n"); fprintf(stderr, " -version Print version information and exit\n"); fprintf(stderr, "Switches for wizards:\n"); @@ -244,6 +284,7 @@ icc_filename = NULL; outfilename = NULL; memdst = FALSE; + report = FALSE; cinfo->err->trace_level = 0; /* Scan command line options, adjust parameters */ @@ -395,6 +436,9 @@ qtablefile = argv[argn]; /* We postpone actually reading the file in case -quality comes later. */ + } else if (keymatch(arg, "report", 3)) { + report = TRUE; + } else if (keymatch(arg, "restart", 1)) { /* Restart interval in MCU rows (or in MCUs with 'b'). 
*/ long lval; @@ -508,13 +552,16 @@ #endif { struct jpeg_compress_struct cinfo; +#ifdef CJPEG_FUZZER + struct my_error_mgr myerr; + struct jpeg_error_mgr &jerr = myerr.pub; +#else struct jpeg_error_mgr jerr; -#ifdef PROGRESS_REPORT - struct cdjpeg_progress_mgr progress; #endif + struct cdjpeg_progress_mgr progress; int file_index; cjpeg_source_ptr src_mgr; - FILE *input_file; + FILE *input_file = NULL; FILE *icc_file; JOCTET *icc_profile = NULL; long icc_len = 0; @@ -632,13 +679,24 @@ fclose(icc_file); } -#ifdef PROGRESS_REPORT - start_progress_monitor((j_common_ptr)&cinfo, &progress); +#ifdef CJPEG_FUZZER + jerr.error_exit = my_error_exit; + jerr.emit_message = my_emit_message; + if (setjmp(myerr.setjmp_buffer)) + HANDLE_ERROR() #endif + if (report) { + start_progress_monitor((j_common_ptr)&cinfo, &progress); + progress.report = report; + } + /* Figure out the input file format, and set up to read it. */ src_mgr = select_file_type(&cinfo, input_file); src_mgr->input_file = input_file; +#ifdef CJPEG_FUZZER + src_mgr->max_pixels = 1048576; +#endif /* Read the input file header to obtain file size & colorspace. 
*/ (*src_mgr->start_input) (&cinfo, src_mgr); @@ -657,6 +715,11 @@ #endif jpeg_stdio_dest(&cinfo, output_file); +#ifdef CJPEG_FUZZER + if (setjmp(myerr.setjmp_buffer)) + HANDLE_ERROR() +#endif + /* Start compressor */ jpeg_start_compress(&cinfo, TRUE); @@ -680,12 +743,13 @@ if (output_file != stdout && output_file != NULL) fclose(output_file); -#ifdef PROGRESS_REPORT - end_progress_monitor((j_common_ptr)&cinfo); -#endif + if (report) + end_progress_monitor((j_common_ptr)&cinfo); if (memdst) { +#ifndef CJPEG_FUZZER fprintf(stderr, "Compressed size: %lu bytes\n", outsize); +#endif free(outbuffer); } diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/croptest.in b/src/3rdparty/chromium/third_party/libjpeg_turbo/croptest.in --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/croptest.in 1970-01-01 01:00:00.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/croptest.in 2021-11-20 03:41:33.390600578 +0000 @@ -0,0 +1,95 @@ +#!/bin/bash + +set -u +set -e +trap onexit INT +trap onexit TERM +trap onexit EXIT + +onexit() +{ + if [ -d $OUTDIR ]; then + rm -rf $OUTDIR + fi +} + +runme() +{ + echo \*\*\* $* + $* +} + +IMAGE=vgl_6548_0026a.bmp +WIDTH=128 +HEIGHT=95 +IMGDIR=@CMAKE_CURRENT_SOURCE_DIR@/testimages +OUTDIR=`mktemp -d /tmp/__croptest_output.XXXXXX` +EXEDIR=@CMAKE_CURRENT_BINARY_DIR@ + +if [ -d $OUTDIR ]; then + rm -rf $OUTDIR +fi +mkdir -p $OUTDIR + +exec >$EXEDIR/croptest.log + +echo "============================================================" +echo "$IMAGE ($WIDTH x $HEIGHT)" +echo "============================================================" +echo + +for PROGARG in "" -progressive; do + + cp $IMGDIR/$IMAGE $OUTDIR + basename=`basename $IMAGE .bmp` + echo "------------------------------------------------------------" + echo "Generating test images" + echo "------------------------------------------------------------" + echo + runme $EXEDIR/cjpeg $PROGARG -grayscale -outfile $OUTDIR/${basename}_GRAY.jpg $IMGDIR/${basename}.bmp + runme 
$EXEDIR/cjpeg $PROGARG -sample 2x2 -outfile $OUTDIR/${basename}_420.jpg $IMGDIR/${basename}.bmp + runme $EXEDIR/cjpeg $PROGARG -sample 2x1 -outfile $OUTDIR/${basename}_422.jpg $IMGDIR/${basename}.bmp + runme $EXEDIR/cjpeg $PROGARG -sample 1x2 -outfile $OUTDIR/${basename}_440.jpg $IMGDIR/${basename}.bmp + runme $EXEDIR/cjpeg $PROGARG -sample 1x1 -outfile $OUTDIR/${basename}_444.jpg $IMGDIR/${basename}.bmp + echo + + for NSARG in "" -nosmooth; do + + for COLORSARG in "" "-colors 256 -dither none -onepass"; do + + for Y in {0..16}; do + + for H in {1..16}; do + + X=$(( (Y*16)%128 )) + W=$(( WIDTH-X-7 )) + if [ $Y -le 15 ]; then + CROPSPEC="${W}x${H}+${X}+${Y}" + else + Y2=$(( HEIGHT-H )); + CROPSPEC="${W}x${H}+${X}+${Y2}" + fi + + echo "------------------------------------------------------------" + echo $PROGARG $NSARG $COLORSARG -crop $CROPSPEC + echo "------------------------------------------------------------" + echo + for samp in GRAY 420 422 440 444; do + $EXEDIR/djpeg $NSARG $COLORSARG -rgb -outfile $OUTDIR/${basename}_${samp}_full.ppm $OUTDIR/${basename}_${samp}.jpg + convert -crop $CROPSPEC $OUTDIR/${basename}_${samp}_full.ppm $OUTDIR/${basename}_${samp}_ref.ppm + runme $EXEDIR/djpeg $NSARG $COLORSARG -crop $CROPSPEC -rgb -outfile $OUTDIR/${basename}_${samp}.ppm $OUTDIR/${basename}_${samp}.jpg + runme cmp $OUTDIR/${basename}_${samp}.ppm $OUTDIR/${basename}_${samp}_ref.ppm + done + echo + + done + + done + + done + + done + +done + +echo SUCCESS! 
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/DIR_METADATA b/src/3rdparty/chromium/third_party/libjpeg_turbo/DIR_METADATA --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/DIR_METADATA 1970-01-01 01:00:00.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/DIR_METADATA 2021-11-20 03:41:33.389600594 +0000 @@ -0,0 +1,3 @@ +monorail { + component: "Internals>Images>Codecs" +} diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/djpeg.1 b/src/3rdparty/chromium/third_party/libjpeg_turbo/djpeg.1 --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/djpeg.1 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/djpeg.1 2021-11-20 03:41:33.390600578 +0000 @@ -1,4 +1,4 @@ -.TH DJPEG 1 "13 November 2017" +.TH DJPEG 1 "4 November 2020" .SH NAME djpeg \- decompress a JPEG file to an image file .SH SYNOPSIS @@ -15,8 +15,7 @@ .B djpeg decompresses the named JPEG file, or the standard input if no file is named, and produces an image file on the standard output. PBMPLUS (PPM/PGM), BMP, -GIF, Targa, or RLE (Utah Raster Toolkit) output format can be selected. -(RLE is supported only if the URT library is available.) +GIF, or Targa output format can be selected. .SH OPTIONS All switch names may be abbreviated; for example, .B \-grayscale @@ -81,9 +80,20 @@ format is emitted. .TP .B \-gif -Select GIF output format. Since GIF does not support more than 256 colors, +Select GIF output format (LZW-compressed). Since GIF does not support more +than 256 colors, .B \-colors 256 -is assumed (unless you specify a smaller number of colors). +is assumed (unless you specify a smaller number of colors). If you specify +.BR \-fast, +the default number of colors is 216. +.TP +.B \-gif0 +Select GIF output format (uncompressed). Since GIF does not support more than +256 colors, +.B \-colors 256 +is assumed (unless you specify a smaller number of colors). If you specify +.BR \-fast, +the default number of colors is 216. 
.TP .B \-os2 Select BMP output format (OS/2 1.x flavor). 8-bit colormapped format is @@ -100,9 +110,6 @@ .B \-grayscale is specified; otherwise PPM is emitted. .TP -.B \-rle -Select RLE output format. (Requires URT library.) -.TP .B \-targa Select Targa output format. Grayscale format is emitted if the JPEG file is grayscale or if @@ -114,32 +121,40 @@ Switches for advanced users: .TP .B \-dct int -Use integer DCT method (default). +Use accurate integer DCT method (default). .TP .B \-dct fast -Use fast integer DCT (less accurate). -In libjpeg-turbo, the fast method is generally about 5-15% faster than the int -method when using the x86/x86-64 SIMD extensions (results may vary with other -SIMD implementations, or when using libjpeg-turbo without SIMD extensions.) If -the JPEG image was compressed using a quality level of 85 or below, then there -should be little or no perceptible difference between the two algorithms. When -decompressing images that were compressed using quality levels above 85, -however, the difference between the fast and int methods becomes more -pronounced. With images compressed using quality=97, for instance, the fast -method incurs generally about a 4-6 dB loss (in PSNR) relative to the int -method, but this can be larger for some images. If you can avoid it, do not -use the fast method when decompressing images that were compressed using -quality levels above 97. The algorithm often degenerates for such images and -can actually produce a more lossy output image than if the JPEG image had been -compressed using lower quality levels. +Use less accurate integer DCT method [legacy feature]. +When the Independent JPEG Group's software was first released in 1991, the +decompression time for a 1-megapixel JPEG image on a mainstream PC was measured +in minutes. Thus, the \fBfast\fR integer DCT algorithm provided noticeable +performance benefits. 
On modern CPUs running libjpeg-turbo, however, the +decompression time for a 1-megapixel JPEG image is measured in milliseconds, +and thus the performance benefits of the \fBfast\fR algorithm are much less +noticeable. On modern x86/x86-64 CPUs that support AVX2 instructions, the +\fBfast\fR and \fBint\fR methods have similar performance. On other types of +CPUs, the \fBfast\fR method is generally about 5-15% faster than the \fBint\fR +method. + +If the JPEG image was compressed using a quality level of 85 or below, then +there should be little or no perceptible quality difference between the two +algorithms. When decompressing images that were compressed using quality +levels above 85, however, the difference between the \fBfast\fR and \fBint\fR +methods becomes more pronounced. With images compressed using quality=97, for +instance, the \fBfast\fR method incurs generally about a 4-6 dB loss in PSNR +relative to the \fBint\fR method, but this can be larger for some images. If +you can avoid it, do not use the \fBfast\fR method when decompressing images +that were compressed using quality levels above 97. The algorithm often +degenerates for such images and can actually produce a more lossy output image +than if the JPEG image had been compressed using lower quality levels. .TP .B \-dct float -Use floating-point DCT method. -The float method is mainly a legacy feature. It does not produce significantly -more accurate results than the int method, and it is much slower. The float -method may also give different results on different machines due to varying -roundoff behavior, whereas the integer methods should give the same results on -all machines. +Use floating-point DCT method [legacy feature]. +The \fBfloat\fR method does not produce significantly more accurate results +than the \fBint\fR method, and it is much slower. 
The \fBfloat\fR method may +also give different results on different machines due to varying roundoff +behavior, whereas the integer methods should give the same results on all +machines. .TP .B \-dither fs Use Floyd-Steinberg dithering in color quantization. @@ -190,6 +205,19 @@ .B \-max 4m selects 4000000 bytes. If more space is needed, an error will occur. .TP +.BI \-maxscans " N" +Abort if the JPEG image contains more than +.I N +scans. This feature demonstrates a method by which applications can guard +against denial-of-service attacks instigated by specially-crafted malformed +JPEG images containing numerous scans with missing image data or image data +consisting only of "EOB runs" (a feature of progressive JPEG images that allows +potentially hundreds of thousands of adjoining zero-value pixels to be +represented using only a few bytes.) Attempting to decompress such malformed +JPEG images can cause excessive CPU activity, since the decompressor must fully +process each scan (even if the scan is corrupt) before it can proceed to the +next scan. +.TP .BI \-outfile " name" Send output image to the named file, not to standard output. .TP @@ -197,6 +225,9 @@ Load input file into memory before decompressing. This feature was implemented mainly as a way of testing the in-memory source manager (jpeg_mem_src().) .TP +.BI \-report +Report decompression progress. +.TP .BI \-skip " Y0,Y1" Decompress all rows of the JPEG image except those between Y0 and Y1 (inclusive.) Note that if decompression scaling is being used, then Y0 and Y1 @@ -210,6 +241,12 @@ scaled image dimensions. Currently this option only works with the PBMPLUS (PPM/PGM), GIF, and Targa output formats. .TP +.BI \-strict +Treat all warnings as fatal. This feature also demonstrates a method by which +applications can guard against attacks instigated by specially-crafted +malformed JPEG images. 
Enabling this option will cause the decompressor to +abort if the JPEG image contains incomplete or corrupt image data. +.TP .B \-verbose Enable debug printout. More .BR \-v 's @@ -253,12 +290,6 @@ .B \-dither none may give acceptable results in two-pass mode, but is seldom tolerable in one-pass mode. -.PP -If you are fortunate enough to have very fast floating point hardware, -\fB\-dct float\fR may be even faster than \fB\-dct fast\fR. But on most -machines \fB\-dct float\fR is slower than \fB\-dct int\fR; in this case it is -not worth using, because its theoretical accuracy advantage is too small to be -significant in practice. .SH ENVIRONMENT .TP .B JPEGMEM @@ -287,10 +318,3 @@ This file was modified by The libjpeg-turbo Project to include only information relevant to libjpeg-turbo, to wordsmith certain sections, and to describe features not present in libjpeg. -.SH ISSUES -Support for compressed GIF output files was removed in djpeg v6b due to -concerns over the Unisys LZW patent. Although this patent expired in 2006, -djpeg still lacks compressed GIF support, for these historical reasons. -(Conversion of JPEG files to GIF is usually a bad idea anyway, since GIF is a -256-color format.) The uncompressed GIF files that djpeg generates are larger -than they should be, but they are readable by standard GIF decoders. diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/djpeg.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/djpeg.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/djpeg.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/djpeg.c 2021-11-20 03:41:33.391600562 +0000 @@ -3,9 +3,9 @@ * * This file was part of the Independent JPEG Group's software: * Copyright (C) 1991-1997, Thomas G. Lane. - * Modified 2013 by Guido Vollbeding. + * Modified 2013-2019 by Guido Vollbeding. * libjpeg-turbo Modifications: - * Copyright (C) 2010-2011, 2013-2017, D. R. Commander. 
+ * Copyright (C) 2010-2011, 2013-2017, 2019-2020, D. R. Commander. * Copyright (C) 2015, Google, Inc. * For conditions of distribution and use, see the accompanying README.ijg * file. @@ -68,10 +68,10 @@ typedef enum { FMT_BMP, /* BMP format (Windows flavor) */ - FMT_GIF, /* GIF format */ + FMT_GIF, /* GIF format (LZW-compressed) */ + FMT_GIF0, /* GIF format (uncompressed) */ FMT_OS2, /* BMP format (OS/2 flavor) */ FMT_PPM, /* PPM/PGM (PBMPLUS formats) */ - FMT_RLE, /* RLE format */ FMT_TARGA, /* Targa format */ FMT_TIFF /* TIFF format */ } IMAGE_FORMATS; @@ -94,11 +94,14 @@ static const char *progname; /* program name for error messages */ static char *icc_filename; /* for -icc switch */ +static JDIMENSION max_scans; /* for -maxscans switch */ static char *outfilename; /* for -outfile switch */ -boolean memsrc; /* for -memsrc switch */ +static boolean memsrc; /* for -memsrc switch */ +static boolean report; /* for -report switch */ boolean skip, crop; JDIMENSION skip_start, skip_end; JDIMENSION crop_x, crop_y, crop_width, crop_height; +static boolean strict; /* for -strict switch */ #define INPUT_BUF_SIZE 4096 @@ -127,8 +130,10 @@ (DEFAULT_FMT == FMT_BMP ? " (default)" : "")); #endif #ifdef GIF_SUPPORTED - fprintf(stderr, " -gif Select GIF output format%s\n", + fprintf(stderr, " -gif Select GIF output format (LZW-compressed)%s\n", (DEFAULT_FMT == FMT_GIF ? " (default)" : "")); + fprintf(stderr, " -gif0 Select GIF output format (uncompressed)%s\n", + (DEFAULT_FMT == FMT_GIF0 ? " (default)" : "")); #endif #ifdef BMP_SUPPORTED fprintf(stderr, " -os2 Select BMP output format (OS/2 style)%s\n", @@ -138,25 +143,21 @@ fprintf(stderr, " -pnm Select PBMPLUS (PPM/PGM) output format%s\n", (DEFAULT_FMT == FMT_PPM ? " (default)" : "")); #endif -#ifdef RLE_SUPPORTED - fprintf(stderr, " -rle Select Utah RLE output format%s\n", - (DEFAULT_FMT == FMT_RLE ? 
" (default)" : "")); -#endif #ifdef TARGA_SUPPORTED fprintf(stderr, " -targa Select Targa output format%s\n", (DEFAULT_FMT == FMT_TARGA ? " (default)" : "")); #endif fprintf(stderr, "Switches for advanced users:\n"); #ifdef DCT_ISLOW_SUPPORTED - fprintf(stderr, " -dct int Use integer DCT method%s\n", + fprintf(stderr, " -dct int Use accurate integer DCT method%s\n", (JDCT_DEFAULT == JDCT_ISLOW ? " (default)" : "")); #endif #ifdef DCT_IFAST_SUPPORTED - fprintf(stderr, " -dct fast Use fast integer DCT (less accurate)%s\n", + fprintf(stderr, " -dct fast Use less accurate integer DCT method [legacy feature]%s\n", (JDCT_DEFAULT == JDCT_IFAST ? " (default)" : "")); #endif #ifdef DCT_FLOAT_SUPPORTED - fprintf(stderr, " -dct float Use floating-point DCT method%s\n", + fprintf(stderr, " -dct float Use floating-point DCT method [legacy feature]%s\n", (JDCT_DEFAULT == JDCT_FLOAT ? " (default)" : "")); #endif fprintf(stderr, " -dither fs Use F-S dithering (default)\n"); @@ -171,14 +172,16 @@ fprintf(stderr, " -onepass Use 1-pass quantization (fast, low quality)\n"); #endif fprintf(stderr, " -maxmemory N Maximum memory to use (in kbytes)\n"); + fprintf(stderr, " -maxscans N Maximum number of scans to allow in input file\n"); fprintf(stderr, " -outfile name Specify name for output file\n"); #if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED) fprintf(stderr, " -memsrc Load input file into memory before decompressing\n"); #endif - + fprintf(stderr, " -report Report decompression progress\n"); fprintf(stderr, " -skip Y0,Y1 Decompress all rows except those between Y0 and Y1 (inclusive)\n"); fprintf(stderr, " -crop WxH+X+Y Decompress only a rectangular subregion of the image\n"); fprintf(stderr, " [requires PBMPLUS (PPM/PGM), GIF, or Targa output format]\n"); + fprintf(stderr, " -strict Treat all warnings as fatal\n"); fprintf(stderr, " -verbose or -debug Emit debug output\n"); fprintf(stderr, " -version Print version information and exit\n"); exit(EXIT_FAILURE); @@ -203,10 
+206,13 @@ /* Set up default JPEG parameters. */ requested_fmt = DEFAULT_FMT; /* set default output file format */ icc_filename = NULL; + max_scans = 0; outfilename = NULL; memsrc = FALSE; + report = FALSE; skip = FALSE; crop = FALSE; + strict = FALSE; cinfo->err->trace_level = 0; /* Scan command line options, adjust parameters */ @@ -224,7 +230,7 @@ arg++; /* advance past switch marker character */ if (keymatch(arg, "bmp", 1)) { - /* BMP output format. */ + /* BMP output format (Windows flavor). */ requested_fmt = FMT_BMP; } else if (keymatch(arg, "colors", 1) || keymatch(arg, "colours", 1) || @@ -295,9 +301,13 @@ cinfo->do_fancy_upsampling = FALSE; } else if (keymatch(arg, "gif", 1)) { - /* GIF output format. */ + /* GIF output format (LZW-compressed). */ requested_fmt = FMT_GIF; + } else if (keymatch(arg, "gif0", 4)) { + /* GIF output format (uncompressed). */ + requested_fmt = FMT_GIF0; + } else if (keymatch(arg, "grayscale", 2) || keymatch(arg, "greyscale", 2)) { /* Force monochrome output. */ @@ -351,6 +361,12 @@ lval *= 1000L; cinfo->mem->max_memory_to_use = lval * 1000L; + } else if (keymatch(arg, "maxscans", 4)) { + if (++argn >= argc) /* advance to next argument */ + usage(); + if (sscanf(argv[argn], "%u", &max_scans) != 1) + usage(); + } else if (keymatch(arg, "nosmooth", 3)) { /* Suppress fancy upsampling */ cinfo->do_fancy_upsampling = FALSE; @@ -383,9 +399,8 @@ /* PPM/PGM output format. */ requested_fmt = FMT_PPM; - } else if (keymatch(arg, "rle", 1)) { - /* RLE output format. */ - requested_fmt = FMT_RLE; + } else if (keymatch(arg, "report", 2)) { + report = TRUE; } else if (keymatch(arg, "scale", 2)) { /* Scale the output image by a fraction M/N. */ @@ -413,6 +428,9 @@ usage(); crop = TRUE; + } else if (keymatch(arg, "strict", 2)) { + strict = TRUE; + } else if (keymatch(arg, "targa", 1)) { /* Targa output format. 
*/ requested_fmt = FMT_TARGA; @@ -444,7 +462,7 @@ ERREXIT(cinfo, JERR_CANT_SUSPEND); } datasrc->bytes_in_buffer--; - return GETJOCTET(*datasrc->next_input_byte++); + return *datasrc->next_input_byte++; } @@ -499,6 +517,19 @@ } +METHODDEF(void) +my_emit_message(j_common_ptr cinfo, int msg_level) +{ + if (msg_level < 0) { + /* Treat warning as fatal */ + cinfo->err->error_exit(cinfo); + } else { + if (cinfo->err->trace_level >= msg_level) + cinfo->err->output_message(cinfo); + } +} + + /* * The main program. */ @@ -512,9 +543,7 @@ { struct jpeg_decompress_struct cinfo; struct jpeg_error_mgr jerr; -#ifdef PROGRESS_REPORT struct cdjpeg_progress_mgr progress; -#endif int file_index; djpeg_dest_ptr dest_mgr = NULL; FILE *input_file; @@ -561,6 +590,9 @@ file_index = parse_switches(&cinfo, argc, argv, 0, FALSE); + if (strict) + jerr.emit_message = my_emit_message; + #ifdef TWO_FILE_COMMANDLINE /* Must have either -outfile switch or explicit output file name */ if (outfilename == NULL) { @@ -607,9 +639,11 @@ output_file = write_stdout(); } -#ifdef PROGRESS_REPORT - start_progress_monitor((j_common_ptr)&cinfo, &progress); -#endif + if (report || max_scans != 0) { + start_progress_monitor((j_common_ptr)&cinfo, &progress); + progress.report = report; + progress.max_scans = max_scans; + } /* Specify data source for decompression */ #if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED) @@ -657,7 +691,10 @@ #endif #ifdef GIF_SUPPORTED case FMT_GIF: - dest_mgr = jinit_write_gif(&cinfo); + dest_mgr = jinit_write_gif(&cinfo, TRUE); + break; + case FMT_GIF0: + dest_mgr = jinit_write_gif(&cinfo, FALSE); break; #endif #ifdef PPM_SUPPORTED @@ -665,11 +702,6 @@ dest_mgr = jinit_write_ppm(&cinfo); break; #endif -#ifdef RLE_SUPPORTED - case FMT_RLE: - dest_mgr = jinit_write_rle(&cinfo); - break; -#endif #ifdef TARGA_SUPPORTED case FMT_TARGA: dest_mgr = jinit_write_targa(&cinfo); @@ -712,7 +744,12 @@ dest_mgr->buffer_height); (*dest_mgr->put_pixel_rows) (&cinfo, dest_mgr, 
num_scanlines); } - jpeg_skip_scanlines(&cinfo, skip_end - skip_start + 1); + if ((tmp = jpeg_skip_scanlines(&cinfo, skip_end - skip_start + 1)) != + skip_end - skip_start + 1) { + fprintf(stderr, "%s: jpeg_skip_scanlines() returned %d rather than %d\n", + progname, tmp, skip_end - skip_start + 1); + return EXIT_FAILURE; + } while (cinfo.output_scanline < cinfo.output_height) { num_scanlines = jpeg_read_scanlines(&cinfo, dest_mgr->buffer, dest_mgr->buffer_height); @@ -748,13 +785,24 @@ cinfo.output_height = tmp; /* Process data */ - jpeg_skip_scanlines(&cinfo, crop_y); + if ((tmp = jpeg_skip_scanlines(&cinfo, crop_y)) != crop_y) { + fprintf(stderr, "%s: jpeg_skip_scanlines() returned %d rather than %d\n", + progname, tmp, crop_y); + return EXIT_FAILURE; + } while (cinfo.output_scanline < crop_y + crop_height) { num_scanlines = jpeg_read_scanlines(&cinfo, dest_mgr->buffer, dest_mgr->buffer_height); (*dest_mgr->put_pixel_rows) (&cinfo, dest_mgr, num_scanlines); } - jpeg_skip_scanlines(&cinfo, cinfo.output_height - crop_y - crop_height); + if ((tmp = + jpeg_skip_scanlines(&cinfo, + cinfo.output_height - crop_y - crop_height)) != + cinfo.output_height - crop_y - crop_height) { + fprintf(stderr, "%s: jpeg_skip_scanlines() returned %d rather than %d\n", + progname, tmp, cinfo.output_height - crop_y - crop_height); + return EXIT_FAILURE; + } /* Normal full-image decompress */ } else { @@ -769,12 +817,11 @@ } } -#ifdef PROGRESS_REPORT /* Hack: count final pass as done in case finish_output does an extra pass. * The library won't have updated completed_passes. 
*/ - progress.pub.completed_passes = progress.pub.total_passes; -#endif + if (report || max_scans != 0) + progress.pub.completed_passes = progress.pub.total_passes; if (icc_filename != NULL) { FILE *icc_file; @@ -813,9 +860,8 @@ if (output_file != stdout) fclose(output_file); -#ifdef PROGRESS_REPORT - end_progress_monitor((j_common_ptr)&cinfo); -#endif + if (report || max_scans != 0) + end_progress_monitor((j_common_ptr)&cinfo); if (memsrc) free(inbuffer); diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/gtest/cjpeg-gtest-wrapper.cpp b/src/3rdparty/chromium/third_party/libjpeg_turbo/gtest/cjpeg-gtest-wrapper.cpp --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/gtest/cjpeg-gtest-wrapper.cpp 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/gtest/cjpeg-gtest-wrapper.cpp 2021-11-20 03:41:33.391600562 +0000 @@ -119,7 +119,7 @@ EXPECT_EQ(cjpeg(12, command_line), 0); // Compare expected MD5 sum against that of test image. - const std::string EXPECTED_MD5 = "e59bb462016a8d9a748c330a3474bb55"; + const std::string EXPECTED_MD5 = "0ba15f9dab81a703505f835f9dbbac6d"; EXPECT_TRUE(CompareFileAndMD5(output_path, EXPECTED_MD5)); } diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jccolext.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jccolext.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jccolext.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jccolext.c 2021-11-20 03:41:33.391600562 +0000 @@ -48,9 +48,9 @@ outptr2 = output_buf[2][output_row]; output_row++; for (col = 0; col < num_cols; col++) { - r = GETJSAMPLE(inptr[RGB_RED]); - g = GETJSAMPLE(inptr[RGB_GREEN]); - b = GETJSAMPLE(inptr[RGB_BLUE]); + r = inptr[RGB_RED]; + g = inptr[RGB_GREEN]; + b = inptr[RGB_BLUE]; inptr += RGB_PIXELSIZE; /* If the inputs are 0..MAXJSAMPLE, the outputs of these equations * must be too; we do not need an explicit range-limiting operation. 
@@ -100,9 +100,9 @@ outptr = output_buf[0][output_row]; output_row++; for (col = 0; col < num_cols; col++) { - r = GETJSAMPLE(inptr[RGB_RED]); - g = GETJSAMPLE(inptr[RGB_GREEN]); - b = GETJSAMPLE(inptr[RGB_BLUE]); + r = inptr[RGB_RED]; + g = inptr[RGB_GREEN]; + b = inptr[RGB_BLUE]; inptr += RGB_PIXELSIZE; /* Y */ outptr[col] = (JSAMPLE)((ctab[r + R_Y_OFF] + ctab[g + G_Y_OFF] + @@ -135,9 +135,9 @@ outptr2 = output_buf[2][output_row]; output_row++; for (col = 0; col < num_cols; col++) { - outptr0[col] = GETJSAMPLE(inptr[RGB_RED]); - outptr1[col] = GETJSAMPLE(inptr[RGB_GREEN]); - outptr2[col] = GETJSAMPLE(inptr[RGB_BLUE]); + outptr0[col] = inptr[RGB_RED]; + outptr1[col] = inptr[RGB_GREEN]; + outptr2[col] = inptr[RGB_BLUE]; inptr += RGB_PIXELSIZE; } } diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jccolor.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jccolor.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jccolor.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jccolor.c 2021-11-20 03:41:33.391600562 +0000 @@ -392,11 +392,11 @@ outptr3 = output_buf[3][output_row]; output_row++; for (col = 0; col < num_cols; col++) { - r = MAXJSAMPLE - GETJSAMPLE(inptr[0]); - g = MAXJSAMPLE - GETJSAMPLE(inptr[1]); - b = MAXJSAMPLE - GETJSAMPLE(inptr[2]); + r = MAXJSAMPLE - inptr[0]; + g = MAXJSAMPLE - inptr[1]; + b = MAXJSAMPLE - inptr[2]; /* K passes through as-is */ - outptr3[col] = inptr[3]; /* don't need GETJSAMPLE here */ + outptr3[col] = inptr[3]; inptr += 4; /* If the inputs are 0..MAXJSAMPLE, the outputs of these equations * must be too; we do not need an explicit range-limiting operation. 
@@ -438,7 +438,7 @@ outptr = output_buf[0][output_row]; output_row++; for (col = 0; col < num_cols; col++) { - outptr[col] = inptr[0]; /* don't need GETJSAMPLE() here */ + outptr[col] = inptr[0]; inptr += instride; } } @@ -497,7 +497,7 @@ inptr = *input_buf; outptr = output_buf[ci][output_row]; for (col = 0; col < num_cols; col++) { - outptr[col] = inptr[ci]; /* don't need GETJSAMPLE() here */ + outptr[col] = inptr[ci]; inptr += nc; } } diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jcdctmgr.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jcdctmgr.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jcdctmgr.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jcdctmgr.c 2021-11-20 03:41:33.391600562 +0000 @@ -381,19 +381,19 @@ elemptr = sample_data[elemr] + start_col; #if DCTSIZE == 8 /* unroll the inner loop */ - *workspaceptr++ = GETJSAMPLE(*elemptr++) - CENTERJSAMPLE; - *workspaceptr++ = GETJSAMPLE(*elemptr++) - CENTERJSAMPLE; - *workspaceptr++ = GETJSAMPLE(*elemptr++) - CENTERJSAMPLE; - *workspaceptr++ = GETJSAMPLE(*elemptr++) - CENTERJSAMPLE; - *workspaceptr++ = GETJSAMPLE(*elemptr++) - CENTERJSAMPLE; - *workspaceptr++ = GETJSAMPLE(*elemptr++) - CENTERJSAMPLE; - *workspaceptr++ = GETJSAMPLE(*elemptr++) - CENTERJSAMPLE; - *workspaceptr++ = GETJSAMPLE(*elemptr++) - CENTERJSAMPLE; + *workspaceptr++ = (*elemptr++) - CENTERJSAMPLE; + *workspaceptr++ = (*elemptr++) - CENTERJSAMPLE; + *workspaceptr++ = (*elemptr++) - CENTERJSAMPLE; + *workspaceptr++ = (*elemptr++) - CENTERJSAMPLE; + *workspaceptr++ = (*elemptr++) - CENTERJSAMPLE; + *workspaceptr++ = (*elemptr++) - CENTERJSAMPLE; + *workspaceptr++ = (*elemptr++) - CENTERJSAMPLE; + *workspaceptr++ = (*elemptr++) - CENTERJSAMPLE; #else { register int elemc; for (elemc = DCTSIZE; elemc > 0; elemc--) - *workspaceptr++ = GETJSAMPLE(*elemptr++) - CENTERJSAMPLE; + *workspaceptr++ = (*elemptr++) - CENTERJSAMPLE; } #endif } @@ -533,20 +533,19 @@ for (elemr = 0; elemr 
< DCTSIZE; elemr++) { elemptr = sample_data[elemr] + start_col; #if DCTSIZE == 8 /* unroll the inner loop */ - *workspaceptr++ = (FAST_FLOAT)(GETJSAMPLE(*elemptr++) - CENTERJSAMPLE); - *workspaceptr++ = (FAST_FLOAT)(GETJSAMPLE(*elemptr++) - CENTERJSAMPLE); - *workspaceptr++ = (FAST_FLOAT)(GETJSAMPLE(*elemptr++) - CENTERJSAMPLE); - *workspaceptr++ = (FAST_FLOAT)(GETJSAMPLE(*elemptr++) - CENTERJSAMPLE); - *workspaceptr++ = (FAST_FLOAT)(GETJSAMPLE(*elemptr++) - CENTERJSAMPLE); - *workspaceptr++ = (FAST_FLOAT)(GETJSAMPLE(*elemptr++) - CENTERJSAMPLE); - *workspaceptr++ = (FAST_FLOAT)(GETJSAMPLE(*elemptr++) - CENTERJSAMPLE); - *workspaceptr++ = (FAST_FLOAT)(GETJSAMPLE(*elemptr++) - CENTERJSAMPLE); + *workspaceptr++ = (FAST_FLOAT)((*elemptr++) - CENTERJSAMPLE); + *workspaceptr++ = (FAST_FLOAT)((*elemptr++) - CENTERJSAMPLE); + *workspaceptr++ = (FAST_FLOAT)((*elemptr++) - CENTERJSAMPLE); + *workspaceptr++ = (FAST_FLOAT)((*elemptr++) - CENTERJSAMPLE); + *workspaceptr++ = (FAST_FLOAT)((*elemptr++) - CENTERJSAMPLE); + *workspaceptr++ = (FAST_FLOAT)((*elemptr++) - CENTERJSAMPLE); + *workspaceptr++ = (FAST_FLOAT)((*elemptr++) - CENTERJSAMPLE); + *workspaceptr++ = (FAST_FLOAT)((*elemptr++) - CENTERJSAMPLE); #else { register int elemc; for (elemc = DCTSIZE; elemc > 0; elemc--) - *workspaceptr++ = (FAST_FLOAT) - (GETJSAMPLE(*elemptr++) - CENTERJSAMPLE); + *workspaceptr++ = (FAST_FLOAT)((*elemptr++) - CENTERJSAMPLE); } #endif } diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jchuff.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jchuff.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jchuff.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jchuff.c 2021-11-20 03:41:33.391600562 +0000 @@ -4,8 +4,10 @@ * This file was part of the Independent JPEG Group's software: * Copyright (C) 1991-1997, Thomas G. Lane. * libjpeg-turbo Modifications: - * Copyright (C) 2009-2011, 2014-2016, 2018-2019, D. R. Commander. 
+ * Copyright (C) 2009-2011, 2014-2016, 2018-2021, D. R. Commander.
  * Copyright (C) 2015, Matthieu Darbois.
+ * Copyright (C) 2018, Matthias Räncker.
+ * Copyright (C) 2020, Arm Limited.
  * For conditions of distribution and use, see the accompanying README.ijg
  * file.
  *
@@ -34,23 +36,28 @@
  * memory footprint by 64k, which is important for some mobile applications
  * that create many isolated instances of libjpeg-turbo (web browsers, for
  * instance.)  This may improve performance on some mobile platforms as well.
- * This feature is enabled by default only on ARM processors, because some x86
+ * This feature is enabled by default only on Arm processors, because some x86
  * chips have a slow implementation of bsr, and the use of clz/bsr cannot be
  * shown to have a significant performance impact even on the x86 chips that
- * have a fast implementation of it.  When building for ARMv6, you can
+ * have a fast implementation of it.  When building for Armv6, you can
  * explicitly disable the use of clz/bsr by adding -mthumb to the compiler
  * flags (this defines __thumb__).
  */
 
 /* NOTE: Both GCC and Clang define __GNUC__ */
-#if defined(__GNUC__) && (defined(__arm__) || defined(__aarch64__))
+#if (defined(__GNUC__) && (defined(__arm__) || defined(__aarch64__))) || \
+    defined(_M_ARM) || defined(_M_ARM64)
 #if !defined(__thumb__) || defined(__thumb2__)
 #define USE_CLZ_INTRINSIC
 #endif
 #endif
 
 #ifdef USE_CLZ_INTRINSIC
+#if defined(_MSC_VER) && !defined(__clang__)
+#define JPEG_NBITS_NONZERO(x)  (32 - _CountLeadingZeros(x))
+#else
 #define JPEG_NBITS_NONZERO(x)  (32 - __builtin_clz(x))
+#endif
 #define JPEG_NBITS(x)  (x ? JPEG_NBITS_NONZERO(x) : 0)
 #else
 #include "jpeg_nbits_table.h"
@@ -65,31 +72,42 @@
  * but must not be updated permanently until we complete the MCU.
  */
 
-typedef struct {
-  size_t put_buffer;                    /* current bit-accumulation buffer */
-  int put_bits;                         /* # of bits now in it */
-  int last_dc_val[MAX_COMPS_IN_SCAN];   /* last DC coef for each component */
-} savable_state;
+#if defined(__x86_64__) && defined(__ILP32__)
+typedef unsigned long long bit_buf_type;
+#else
+typedef size_t bit_buf_type;
+#endif
 
-/* This macro is to work around compilers with missing or broken
- * structure assignment.  You'll need to fix this code if you have
- * such a compiler and you change MAX_COMPS_IN_SCAN.
+/* NOTE: The more optimal Huffman encoding algorithm is only used by the
+ * intrinsics implementation of the Arm Neon SIMD extensions, which is why we
+ * retain the old Huffman encoder behavior when using the GAS implementation.
  */
-
-#ifndef NO_STRUCT_ASSIGN
-#define ASSIGN_STATE(dest, src)  ((dest) = (src))
+#if defined(WITH_SIMD) && !(defined(__arm__) || defined(__aarch64__) || \
+                            defined(_M_ARM) || defined(_M_ARM64))
+typedef unsigned long long simd_bit_buf_type;
 #else
-#if MAX_COMPS_IN_SCAN == 4
-#define ASSIGN_STATE(dest, src) \
-  ((dest).put_buffer = (src).put_buffer, \
-   (dest).put_bits = (src).put_bits, \
-   (dest).last_dc_val[0] = (src).last_dc_val[0], \
-   (dest).last_dc_val[1] = (src).last_dc_val[1], \
-   (dest).last_dc_val[2] = (src).last_dc_val[2], \
-   (dest).last_dc_val[3] = (src).last_dc_val[3])
+typedef bit_buf_type simd_bit_buf_type;
 #endif
+
+#if (defined(SIZEOF_SIZE_T) && SIZEOF_SIZE_T == 8) || defined(_WIN64) || \
+    (defined(__x86_64__) && defined(__ILP32__))
+#define BIT_BUF_SIZE  64
+#elif (defined(SIZEOF_SIZE_T) && SIZEOF_SIZE_T == 4) || defined(_WIN32)
+#define BIT_BUF_SIZE  32
+#else
+#error Cannot determine word size
 #endif
+#define SIMD_BIT_BUF_SIZE  (sizeof(simd_bit_buf_type) * 8)
 
 typedef struct {
+  union {
+    bit_buf_type c;
+    simd_bit_buf_type simd;
+  } put_buffer;                         /* current bit accumulation buffer */
+  int free_bits;                        /* # of bits available in it */
+                                        /* (Neon GAS: # of bits now in it) */
+  int last_dc_val[MAX_COMPS_IN_SCAN];   /* last DC coef for each component */
+} savable_state;
 
 typedef struct {
   struct jpeg_entropy_encoder pub;      /* public fields */
@@ -123,6 +141,7 @@
   size_t free_in_buffer;        /* # of byte spaces remaining in buffer */
   savable_state cur;            /* Current bit buffer & DC state */
   j_compress_ptr cinfo;         /* dump_buffer needs access to this */
+  int simd;
 } working_state;
 
@@ -201,8 +220,17 @@
   }
 
   /* Initialize bit buffer to empty */
-  entropy->saved.put_buffer = 0;
-  entropy->saved.put_bits = 0;
+  if (entropy->simd) {
+    entropy->saved.put_buffer.simd = 0;
+#if defined(__aarch64__) && !defined(NEON_INTRINSICS)
+    entropy->saved.free_bits = 0;
+#else
+    entropy->saved.free_bits = SIMD_BIT_BUF_SIZE;
+#endif
+  } else {
+    entropy->saved.put_buffer.c = 0;
+    entropy->saved.free_bits = BIT_BUF_SIZE;
+  }
 
   /* Initialize restart stuff */
   entropy->restarts_to_go = cinfo->restart_interval;
@@ -287,6 +315,7 @@
    * this lets us detect duplicate VAL entries here, and later
    * allows emit_bits to detect any attempt to emit such symbols.
    */
+  MEMZERO(dtbl->ehufco, sizeof(dtbl->ehufco));
   MEMZERO(dtbl->ehufsi, sizeof(dtbl->ehufsi));
 
   /* This is also a convenient place to check for out-of-range
@@ -334,94 +363,94 @@
 
 /* Outputting bits to the file */
 
-/* These macros perform the same task as the emit_bits() function in the
- * original libjpeg code.  In addition to reducing overhead by explicitly
- * inlining the code, additional performance is achieved by taking into
- * account the size of the bit buffer and waiting until it is almost full
- * before emptying it.  This mostly benefits 64-bit platforms, since 6
- * bytes can be stored in a 64-bit bit buffer before it has to be emptied.
- */
-
-#define EMIT_BYTE() { \
-  JOCTET c; \
-  put_bits -= 8; \
-  c = (JOCTET)GETJOCTET(put_buffer >> put_bits); \
-  *buffer++ = c; \
-  if (c == 0xFF)                /* need to stuff a zero byte? */ \
-    *buffer++ = 0; \
-}
-
-#define PUT_BITS(code, size) { \
-  put_bits += size; \
-  put_buffer = (put_buffer << size) | code; \
-}
-
-#if SIZEOF_SIZE_T != 8 && !defined(_WIN64)
-
-#define CHECKBUF15() { \
-  if (put_bits > 15) { \
-    EMIT_BYTE() \
-    EMIT_BYTE() \
+/* Output byte b and, speculatively, an additional 0 byte.  0xFF must be
+ * encoded as 0xFF 0x00, so the output buffer pointer is advanced by 2 if the
+ * byte is 0xFF.  Otherwise, the output buffer pointer is advanced by 1, and
+ * the speculative 0 byte will be overwritten by the next byte.
+ */
+#define EMIT_BYTE(b) { \
+  buffer[0] = (JOCTET)(b); \
+  buffer[1] = 0; \
+  buffer -= -2 + ((JOCTET)(b) < 0xFF); \
+}
+
+/* Output the entire bit buffer.  If there are no 0xFF bytes in it, then write
+ * directly to the output buffer.  Otherwise, use the EMIT_BYTE() macro to
+ * encode 0xFF as 0xFF 0x00.
+ */
+#if BIT_BUF_SIZE == 64
+
+#define FLUSH() { \
+  if (put_buffer & 0x8080808080808080 & ~(put_buffer + 0x0101010101010101)) { \
+    EMIT_BYTE(put_buffer >> 56) \
+    EMIT_BYTE(put_buffer >> 48) \
+    EMIT_BYTE(put_buffer >> 40) \
+    EMIT_BYTE(put_buffer >> 32) \
+    EMIT_BYTE(put_buffer >> 24) \
+    EMIT_BYTE(put_buffer >> 16) \
+    EMIT_BYTE(put_buffer >>  8) \
+    EMIT_BYTE(put_buffer      ) \
+  } else { \
+    buffer[0] = (JOCTET)(put_buffer >> 56); \
+    buffer[1] = (JOCTET)(put_buffer >> 48); \
+    buffer[2] = (JOCTET)(put_buffer >> 40); \
+    buffer[3] = (JOCTET)(put_buffer >> 32); \
+    buffer[4] = (JOCTET)(put_buffer >> 24); \
+    buffer[5] = (JOCTET)(put_buffer >> 16); \
+    buffer[6] = (JOCTET)(put_buffer >>  8); \
+    buffer[7] = (JOCTET)(put_buffer); \
+    buffer += 8; \
   } \
 }
 
-#endif
-
-#define CHECKBUF31() { \
-  if (put_bits > 31) { \
-    EMIT_BYTE() \
-    EMIT_BYTE() \
-    EMIT_BYTE() \
-    EMIT_BYTE() \
-  } \
-}
+#else
 
-#define CHECKBUF47() { \
-  if (put_bits > 47) { \
-    EMIT_BYTE() \
-    EMIT_BYTE() \
-    EMIT_BYTE() \
-    EMIT_BYTE() \
-    EMIT_BYTE() \
-    EMIT_BYTE() \
+#define FLUSH() { \
+  if (put_buffer & 0x80808080 & ~(put_buffer + 0x01010101)) { \
+    EMIT_BYTE(put_buffer >> 24) \
+    EMIT_BYTE(put_buffer >> 16) \
+    EMIT_BYTE(put_buffer >>  8) \
+    EMIT_BYTE(put_buffer      ) \
+  } else { \
+    buffer[0] = (JOCTET)(put_buffer >> 24); \
+    buffer[1] = (JOCTET)(put_buffer >> 16); \
+    buffer[2] = (JOCTET)(put_buffer >>  8); \
+    buffer[3] = (JOCTET)(put_buffer); \
+    buffer += 4; \
   } \
 }
 
-#if !defined(_WIN32) && !defined(SIZEOF_SIZE_T)
-#error Cannot determine word size
 #endif
 
-#if SIZEOF_SIZE_T == 8 || defined(_WIN64)
-
-#define EMIT_BITS(code, size) { \
-  CHECKBUF47() \
-  PUT_BITS(code, size) \
-}
-
-#define EMIT_CODE(code, size) { \
-  temp2 &= (((JLONG)1) << nbits) - 1; \
-  CHECKBUF31() \
-  PUT_BITS(code, size) \
-  PUT_BITS(temp2, nbits) \
+/* Fill the bit buffer to capacity with the leading bits from code, then output
+ * the bit buffer and put the remaining bits from code into the bit buffer.
+ */
+#define PUT_AND_FLUSH(code, size) { \
+  put_buffer = (put_buffer << (size + free_bits)) | (code >> -free_bits); \
+  FLUSH() \
+  free_bits += BIT_BUF_SIZE; \
+  put_buffer = code; \
 }
 
-#else
-
-#define EMIT_BITS(code, size) { \
-  PUT_BITS(code, size) \
-  CHECKBUF15() \
+/* Insert code into the bit buffer and output the bit buffer if needed.
+ * NOTE: We can't flush with free_bits == 0, since the left shift in
+ * PUT_AND_FLUSH() would have undefined behavior.
+ */
+#define PUT_BITS(code, size) { \
+  free_bits -= size; \
+  if (free_bits < 0) \
+    PUT_AND_FLUSH(code, size) \
+  else \
+    put_buffer = (put_buffer << size) | code; \
}
 
-#define EMIT_CODE(code, size) { \
-  temp2 &= (((JLONG)1) << nbits) - 1; \
-  PUT_BITS(code, size) \
-  CHECKBUF15() \
-  PUT_BITS(temp2, nbits) \
-  CHECKBUF15() \
+#define PUT_CODE(code, size) { \
+  temp &= (((JLONG)1) << nbits) - 1; \
+  temp |= code << nbits; \
+  nbits += size; \
+  PUT_BITS(temp, nbits) \
 }
 
-#endif
-
 /* Although it is exceedingly rare, it is possible for a Huffman-encoded
  * coefficient block to be larger than the 128-byte unencoded block.  For each
@@ -444,6 +473,7 @@
 
 #define STORE_BUFFER() { \
   if (localbuf) { \
+    size_t bytes, bytestocopy; \
    bytes = buffer - _buffer; \
    buffer = _buffer; \
    while (bytes > 0) { \
@@ -466,20 +496,46 @@
 LOCAL(boolean)
 flush_bits(working_state *state)
 {
-  JOCTET _buffer[BUFSIZE], *buffer;
-  size_t put_buffer;  int put_bits;
-  size_t bytes, bytestocopy;  int localbuf = 0;
+  JOCTET _buffer[BUFSIZE], *buffer, temp;
+  simd_bit_buf_type put_buffer;  int put_bits;
+  int localbuf = 0;
+
+  if (state->simd) {
+#if defined(__aarch64__) && !defined(NEON_INTRINSICS)
+    put_bits = state->cur.free_bits;
+#else
+    put_bits = SIMD_BIT_BUF_SIZE - state->cur.free_bits;
+#endif
+    put_buffer = state->cur.put_buffer.simd;
+  } else {
+    put_bits = BIT_BUF_SIZE - state->cur.free_bits;
+    put_buffer = state->cur.put_buffer.c;
+  }
 
-  put_buffer = state->cur.put_buffer;
-  put_bits = state->cur.put_bits;
   LOAD_BUFFER()
 
-  /* fill any partial byte with ones */
-  PUT_BITS(0x7F, 7)
-  while (put_bits >= 8) EMIT_BYTE()
+  while (put_bits >= 8) {
+    put_bits -= 8;
+    temp = (JOCTET)(put_buffer >> put_bits);
+    EMIT_BYTE(temp)
+  }
+  if (put_bits) {
+    /* fill partial byte with ones */
+    temp = (JOCTET)((put_buffer << (8 - put_bits)) | (0xFF >> put_bits));
+    EMIT_BYTE(temp)
+  }
 
-  state->cur.put_buffer = 0;    /* and reset bit-buffer to empty */
-  state->cur.put_bits = 0;
+  if (state->simd) {            /* and reset bit buffer to empty */
+    state->cur.put_buffer.simd = 0;
+#if defined(__aarch64__) && !defined(NEON_INTRINSICS)
+    state->cur.free_bits = 0;
+#else
+    state->cur.free_bits = SIMD_BIT_BUF_SIZE;
+#endif
+  } else {
+    state->cur.put_buffer.c = 0;
+    state->cur.free_bits = BIT_BUF_SIZE;
+  }
   STORE_BUFFER()
 
   return TRUE;
@@ -493,7 +549,7 @@
                c_derived_tbl *dctbl, c_derived_tbl *actbl)
 {
   JOCTET _buffer[BUFSIZE], *buffer;
-  size_t bytes, bytestocopy;  int localbuf = 0;
+  int localbuf = 0;
 
   LOAD_BUFFER()
 
@@ -509,53 +565,41 @@
 encode_one_block(working_state *state, JCOEFPTR block, int last_dc_val,
                  c_derived_tbl *dctbl, c_derived_tbl *actbl)
 {
-  int temp, temp2, temp3;
-  int nbits;
-  int r, code, size;
+  int temp, nbits, free_bits;
+  bit_buf_type put_buffer;
   JOCTET _buffer[BUFSIZE], *buffer;
-  size_t put_buffer;  int put_bits;
-  int code_0xf0 = actbl->ehufco[0xf0], size_0xf0 = actbl->ehufsi[0xf0];
-  size_t bytes, bytestocopy;  int localbuf = 0;
+  int localbuf = 0;
 
-  put_buffer = state->cur.put_buffer;
-  put_bits = state->cur.put_bits;
+  free_bits = state->cur.free_bits;
+  put_buffer = state->cur.put_buffer.c;
   LOAD_BUFFER()
 
   /* Encode the DC coefficient difference per section F.1.2.1 */
 
-  temp = temp2 = block[0] - last_dc_val;
+  temp = block[0] - last_dc_val;
 
   /* This is a well-known technique for obtaining the absolute value without a
    * branch.  It is derived from an assembly language technique presented in
    * "How to Optimize for the Pentium Processors", Copyright (c) 1996, 1997 by
-   * Agner Fog.
+   * Agner Fog.  This code assumes we are on a two's complement machine.
    */
-  temp3 = temp >> (CHAR_BIT * sizeof(int) - 1);
-  temp ^= temp3;
-  temp -= temp3;
-
-  /* For a negative input, want temp2 = bitwise complement of abs(input) */
-  /* This code assumes we are on a two's complement machine */
-  temp2 += temp3;
+  nbits = temp >> (CHAR_BIT * sizeof(int) - 1);
+  temp += nbits;
+  nbits ^= temp;
 
   /* Find the number of bits needed for the magnitude of the coefficient */
-  nbits = JPEG_NBITS(temp);
+  nbits = JPEG_NBITS(nbits);
 
-  /* Emit the Huffman-coded symbol for the number of bits */
-  code = dctbl->ehufco[nbits];
-  size = dctbl->ehufsi[nbits];
-  EMIT_BITS(code, size)
-
-  /* Mask off any extra bits in code */
-  temp2 &= (((JLONG)1) << nbits) - 1;
-
-  /* Emit that number of bits of the value, if positive, */
-  /* or the complement of its magnitude, if negative. */
-  EMIT_BITS(temp2, nbits)
+  /* Emit the Huffman-coded symbol for the number of bits.
+   * Emit that number of bits of the value, if positive,
+   * or the complement of its magnitude, if negative.
+   */
+  PUT_CODE(dctbl->ehufco[nbits], dctbl->ehufsi[nbits])
 
   /* Encode the AC coefficients per section F.1.2.2 */
 
-  r = 0;                        /* r = run length of zeros */
+  {
+    int r = 0;                  /* r = run length of zeros */
 
 /* Manually unroll the k loop to eliminate the counter variable.  This
  * improves performance greatly on systems with a limited number of
@@ -563,51 +607,46 @@
  */
 #define kloop(jpeg_natural_order_of_k) { \
   if ((temp = block[jpeg_natural_order_of_k]) == 0) { \
-    r++; \
+    r += 16; \
   } else { \
-    temp2 = temp; \
     /* Branch-less absolute value, bitwise complement, etc., same as above */ \
-    temp3 = temp >> (CHAR_BIT * sizeof(int) - 1); \
-    temp ^= temp3; \
-    temp -= temp3; \
-    temp2 += temp3; \
-    nbits = JPEG_NBITS_NONZERO(temp); \
+    nbits = temp >> (CHAR_BIT * sizeof(int) - 1); \
+    temp += nbits; \
+    nbits ^= temp; \
+    nbits = JPEG_NBITS_NONZERO(nbits); \
    /* if run length > 15, must emit special run-length-16 codes (0xF0) */ \
-    while (r > 15) { \
-      EMIT_BITS(code_0xf0, size_0xf0) \
-      r -= 16; \
+    while (r >= 16 * 16) { \
+      r -= 16 * 16; \
+      PUT_BITS(actbl->ehufco[0xf0], actbl->ehufsi[0xf0]) \
     } \
    /* Emit Huffman symbol for run length / number of bits */ \
-    temp3 = (r << 4) + nbits; \
-    code = actbl->ehufco[temp3]; \
-    size = actbl->ehufsi[temp3]; \
-    EMIT_CODE(code, size) \
+    r += nbits; \
+    PUT_CODE(actbl->ehufco[r], actbl->ehufsi[r]) \
    r = 0; \
  } \
}
 
-  /* One iteration for each value in jpeg_natural_order[] */
-  kloop(1);  kloop(8);  kloop(16); kloop(9);  kloop(2);  kloop(3);
-  kloop(10); kloop(17); kloop(24); kloop(32); kloop(25); kloop(18);
-  kloop(11); kloop(4);  kloop(5);  kloop(12); kloop(19); kloop(26);
-  kloop(33); kloop(40); kloop(48); kloop(41); kloop(34); kloop(27);
-  kloop(20); kloop(13); kloop(6);  kloop(7);  kloop(14); kloop(21);
-  kloop(28); kloop(35); kloop(42); kloop(49); kloop(56); kloop(57);
-  kloop(50); kloop(43); kloop(36); kloop(29); kloop(22); kloop(15);
-  kloop(23); kloop(30); kloop(37); kloop(44); kloop(51); kloop(58);
-  kloop(59); kloop(52); kloop(45); kloop(38); kloop(31); kloop(39);
-  kloop(46); kloop(53); kloop(60); kloop(61); kloop(54); kloop(47);
-  kloop(55); kloop(62); kloop(63);
-
-  /* If the last coef(s) were zero, emit an end-of-block code */
-  if (r > 0) {
-    code = actbl->ehufco[0];
-    size = actbl->ehufsi[0];
-    EMIT_BITS(code, size)
+    /* One iteration for each value in jpeg_natural_order[] */
+    kloop(1);  kloop(8);  kloop(16); kloop(9);  kloop(2);  kloop(3);
+    kloop(10); kloop(17); kloop(24); kloop(32); kloop(25); kloop(18);
+    kloop(11); kloop(4);  kloop(5);  kloop(12); kloop(19); kloop(26);
+    kloop(33); kloop(40); kloop(48); kloop(41); kloop(34); kloop(27);
+    kloop(20); kloop(13); kloop(6);  kloop(7);  kloop(14); kloop(21);
+    kloop(28); kloop(35); kloop(42); kloop(49); kloop(56); kloop(57);
+    kloop(50); kloop(43); kloop(36); kloop(29); kloop(22); kloop(15);
+    kloop(23); kloop(30); kloop(37); kloop(44); kloop(51); kloop(58);
+    kloop(59); kloop(52); kloop(45); kloop(38); kloop(31); kloop(39);
+    kloop(46); kloop(53); kloop(60); kloop(61); kloop(54); kloop(47);
+    kloop(55); kloop(62); kloop(63);
+
+    /* If the last coef(s) were zero, emit an end-of-block code */
+    if (r > 0) {
+      PUT_BITS(actbl->ehufco[0], actbl->ehufsi[0])
+    }
   }
 
-  state->cur.put_buffer = put_buffer;
-  state->cur.put_bits = put_bits;
+  state->cur.put_buffer.c = put_buffer;
+  state->cur.free_bits = free_bits;
   STORE_BUFFER()
 
   return TRUE;
@@ -654,8 +693,9 @@
   /* Load up working state */
   state.next_output_byte = cinfo->dest->next_output_byte;
   state.free_in_buffer = cinfo->dest->free_in_buffer;
-  ASSIGN_STATE(state.cur, entropy->saved);
+  state.cur = entropy->saved;
   state.cinfo = cinfo;
+  state.simd = entropy->simd;
 
   /* Emit restart marker if needed */
   if (cinfo->restart_interval) {
@@ -694,7 +734,7 @@
   /* Completed MCU, so update state */
   cinfo->dest->next_output_byte = state.next_output_byte;
   cinfo->dest->free_in_buffer = state.free_in_buffer;
-  ASSIGN_STATE(entropy->saved, state.cur);
+  entropy->saved = state.cur;
 
   /* Update restart-interval state too */
   if (cinfo->restart_interval) {
@@ -723,8 +763,9 @@
   /* Load up working state ... flush_bits needs it */
   state.next_output_byte = cinfo->dest->next_output_byte;
   state.free_in_buffer = cinfo->dest->free_in_buffer;
-  ASSIGN_STATE(state.cur, entropy->saved);
+  state.cur = entropy->saved;
   state.cinfo = cinfo;
+  state.simd = entropy->simd;
 
   /* Flush out the last data */
   if (!flush_bits(&state))
@@ -733,7 +774,7 @@
   /* Update state */
   cinfo->dest->next_output_byte = state.next_output_byte;
   cinfo->dest->free_in_buffer = state.free_in_buffer;
-  ASSIGN_STATE(entropy->saved, state.cur);
+  entropy->saved = state.cur;
 }
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jcinit.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jcinit.c
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jcinit.c	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jcinit.c	2021-11-20 03:41:33.391600562 +0000
@@ -1,8 +1,10 @@
 /*
  * jcinit.c
  *
+ * This file was part of the Independent JPEG Group's software:
  * Copyright (C) 1991-1997, Thomas G. Lane.
- * This file is part of the Independent JPEG Group's software.
+ * libjpeg-turbo Modifications:
+ * Copyright (C) 2020, D. R. Commander.
  * For conditions of distribution and use, see the accompanying README.ijg
  * file.
  *
@@ -19,6 +21,7 @@
 #define JPEG_INTERNALS
 #include "jinclude.h"
 #include "jpeglib.h"
+#include "jpegcomp.h"
 
 /*
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jcmaster.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jcmaster.c
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jcmaster.c	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jcmaster.c	2021-11-20 03:41:33.391600562 +0000
@@ -493,7 +493,7 @@
     master->pass_type = output_pass;
     master->pass_number++;
 #endif
-    /*FALLTHROUGH*/
+    FALLTHROUGH                 /*FALLTHROUGH*/
   case output_pass:
     /* Do a data-output pass. */
     /* We need not repeat per-scan setup if prior optimization pass did it. */
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jconfig.h b/src/3rdparty/chromium/third_party/libjpeg_turbo/jconfig.h
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jconfig.h	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jconfig.h	2021-11-20 03:41:33.391600562 +0000
@@ -1,13 +1,13 @@
 /* Version ID for the JPEG library.
  * Might be useful for tests like "#if JPEG_LIB_VERSION >= 60".
  */
-#define JPEG_LIB_VERSION 62
+#define JPEG_LIB_VERSION  62
 
 /* libjpeg-turbo version */
-#define LIBJPEG_TURBO_VERSION 2.0.1
+#define LIBJPEG_TURBO_VERSION  2.1.1
 
 /* libjpeg-turbo version in integer form */
-#define LIBJPEG_TURBO_VERSION_NUMBER 2000001
+#define LIBJPEG_TURBO_VERSION_NUMBER  2001001
 
 /* Support arithmetic encoding */
 /* #define C_ARITH_CODING_SUPPORTED 1 */
@@ -61,11 +61,6 @@
    unsigned. */
 /* #undef RIGHT_SHIFT_IS_UNSIGNED */
 
-/* Define to 1 if type `char' is unsigned and you are not using gcc.  */
-#ifndef __CHAR_UNSIGNED__
-/* # undef __CHAR_UNSIGNED__ */
-#endif
-
 /* Define to empty if `const' does not conform to ANSI C. */
 /* #undef const */
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jconfig.h.in b/src/3rdparty/chromium/third_party/libjpeg_turbo/jconfig.h.in
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jconfig.h.in	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jconfig.h.in	2021-11-20 03:41:33.391600562 +0000
@@ -61,11 +61,6 @@
    unsigned. */
 #cmakedefine RIGHT_SHIFT_IS_UNSIGNED 1
 
-/* Define to 1 if type `char' is unsigned and you are not using gcc.  */
-#ifndef __CHAR_UNSIGNED__
-  #cmakedefine __CHAR_UNSIGNED__ 1
-#endif
-
 /* Define to empty if `const' does not conform to ANSI C. */
 /* #undef const */
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jconfigint.h b/src/3rdparty/chromium/third_party/libjpeg_turbo/jconfigint.h
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jconfigint.h	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jconfigint.h	2021-11-20 03:41:33.392600546 +0000
@@ -1,5 +1,5 @@
 /* libjpeg-turbo build number */
-#define BUILD ""
+#define BUILD  ""
 
 /* Compiler's inline keyword */
 #undef inline
@@ -7,9 +7,9 @@
 /* How to obtain function inlining. */
 #ifndef INLINE
 #if defined(__GNUC__)
-#define INLINE inline __attribute__((always_inline))
+#define INLINE  inline __attribute__((always_inline))
 #elif defined(_MSC_VER)
-#define INLINE __forceinline
+#define INLINE  __forceinline
 #else
 #define INLINE
 #endif
@@ -19,20 +19,20 @@
 #if defined(_MSC_VER) && (defined(_WIN32) || defined(_WIN64))
 #define THREAD_LOCAL __declspec(thread)
 #else
-#define THREAD_LOCAL __thread
+#define THREAD_LOCAL  __thread
 #endif
 
 /* Define to the full name of this package. */
-#define PACKAGE_NAME "libjpeg-turbo"
+#define PACKAGE_NAME  "libjpeg-turbo"
 
 /* Version number of package */
-#define VERSION "2.0.5"
+#define VERSION  "2.1.1"
 
 /* The size of `size_t', as computed by sizeof. */
 #if __WORDSIZE==64 || defined(_WIN64)
-#define SIZEOF_SIZE_T 8
+#define SIZEOF_SIZE_T  8
 #else
-#define SIZEOF_SIZE_T 4
+#define SIZEOF_SIZE_T  4
 #endif
 
 /* Define if your compiler has __builtin_ctzl() and sizeof(unsigned long) ==
    sizeof(size_t). */
@@ -42,7 +42,7 @@
 
 /* Define to 1 if you have the <intrin.h> header file. */
 #if defined(_MSC_VER)
-#define HAVE_INTRIN_H 1
+#define HAVE_INTRIN_H  1
 #endif
 
 #if defined(_MSC_VER) && defined(HAVE_INTRIN_H)
@@ -53,11 +53,12 @@
 #endif
 #endif
 
-/* How to obtain memory alignment for structures and variables. */
-#if defined(_MSC_VER)
-#define ALIGN(ALIGNMENT) __declspec(align((ALIGNMENT)))
-#elif defined(__clang__) || defined(__GNUC__)
-#define ALIGN(ALIGNMENT) __attribute__((aligned(ALIGNMENT)))
+#if defined(__has_attribute)
+#if __has_attribute(fallthrough)
+#define FALLTHROUGH  __attribute__((fallthrough));
+#else
+#define FALLTHROUGH
+#endif
 #else
-#error "Unknown compiler"
+#define FALLTHROUGH
 #endif
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jconfigint.h.in b/src/3rdparty/chromium/third_party/libjpeg_turbo/jconfigint.h.in
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jconfigint.h.in	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jconfigint.h.in	2021-11-20 03:41:33.392600546 +0000
@@ -32,3 +32,13 @@
 #define HAVE_BITSCANFORWARD
 #endif
 #endif
+
+#if defined(__has_attribute)
+#if __has_attribute(fallthrough)
+#define FALLTHROUGH  __attribute__((fallthrough));
+#else
+#define FALLTHROUGH
+#endif
+#else
+#define FALLTHROUGH
+#endif
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jconfig.txt b/src/3rdparty/chromium/third_party/libjpeg_turbo/jconfig.txt
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jconfig.txt	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jconfig.txt	2021-11-20 03:41:33.391600562 +0000
@@ -42,12 +42,6 @@
  */
 /* #define const */
 
-/* Define this if an ordinary "char" type is unsigned.
- * If you're not sure, leaving it undefined will work at some cost in speed.
- * If you defined HAVE_UNSIGNED_CHAR then the speed difference is minimal.
- */
-#undef __CHAR_UNSIGNED__
-
 /* Define this if your system has an ANSI-conforming <stddef.h> file. */
 #define HAVE_STDDEF_H
 
@@ -118,7 +112,6 @@
 #define BMP_SUPPORTED           /* BMP image file format */
 #define GIF_SUPPORTED           /* GIF image file format */
 #define PPM_SUPPORTED           /* PBMPLUS PPM/PGM image file format */
-#undef RLE_SUPPORTED            /* Utah RLE image file format */
 #define TARGA_SUPPORTED         /* Targa image file format */
 
 /* Define this if you want to name both input and output files on the command
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jcphuff.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jcphuff.c
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jcphuff.c	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jcphuff.c	2021-11-20 03:41:33.392600546 +0000
@@ -4,8 +4,10 @@
  * This file was part of the Independent JPEG Group's software:
  * Copyright (C) 1995-1997, Thomas G. Lane.
  * libjpeg-turbo Modifications:
- * Copyright (C) 2011, 2015, 2018, D. R. Commander.
+ * Copyright (C) 2011, 2015, 2018, 2021, D. R. Commander.
  * Copyright (C) 2016, 2018, Matthieu Darbois.
+ * Copyright (C) 2020, Arm Limited.
+ * Copyright (C) 2021, Alex Richardson.
  * For conditions of distribution and use, see the accompanying README.ijg
  * file.
  *
@@ -43,23 +45,28 @@
  * memory footprint by 64k, which is important for some mobile applications
  * that create many isolated instances of libjpeg-turbo (web browsers, for
  * instance.)  This may improve performance on some mobile platforms as well.
- * This feature is enabled by default only on ARM processors, because some x86
+ * This feature is enabled by default only on Arm processors, because some x86
 * chips have a slow implementation of bsr, and the use of clz/bsr cannot be
 * shown to have a significant performance impact even on the x86 chips that
- * have a fast implementation of it.  When building for ARMv6, you can
+ * have a fast implementation of it.  When building for Armv6, you can
  * explicitly disable the use of clz/bsr by adding -mthumb to the compiler
  * flags (this defines __thumb__).
  */
 
 /* NOTE: Both GCC and Clang define __GNUC__ */
-#if defined(__GNUC__) && (defined(__arm__) || defined(__aarch64__))
+#if (defined(__GNUC__) && (defined(__arm__) || defined(__aarch64__))) || \
+    defined(_M_ARM) || defined(_M_ARM64)
 #if !defined(__thumb__) || defined(__thumb2__)
 #define USE_CLZ_INTRINSIC
 #endif
 #endif
 
 #ifdef USE_CLZ_INTRINSIC
+#if defined(_MSC_VER) && !defined(__clang__)
+#define JPEG_NBITS_NONZERO(x)  (32 - _CountLeadingZeros(x))
+#else
 #define JPEG_NBITS_NONZERO(x)  (32 - __builtin_clz(x))
+#endif
 #define JPEG_NBITS(x)  (x ? JPEG_NBITS_NONZERO(x) : 0)
 #else
 #include "jpeg_nbits_table.h"
@@ -169,24 +176,26 @@
 METHODDEF(int)
 count_zeroes(size_t *x)
 {
-  int result;
 #if defined(HAVE_BUILTIN_CTZL)
+  int result;
   result = __builtin_ctzl(*x);
   *x >>= result;
 #elif defined(HAVE_BITSCANFORWARD64)
+  unsigned long result;
   _BitScanForward64(&result, *x);
   *x >>= result;
 #elif defined(HAVE_BITSCANFORWARD)
+  unsigned long result;
   _BitScanForward(&result, *x);
   *x >>= result;
 #else
-  result = 0;
+  int result = 0;
   while ((*x & 1) == 0) {
     ++result;
     *x >>= 1;
   }
 #endif
-  return result;
+  return (int)result;
 }
 
@@ -672,7 +681,7 @@
       emit_restart(entropy, entropy->next_restart_num);
 
 #ifdef WITH_SIMD
-  cvalue = values = (JCOEF *)PAD((size_t)values_unaligned, 16);
+  cvalue = values = (JCOEF *)PAD((JUINTPTR)values_unaligned, 16);
 #else
   /* Not using SIMD, so alignment is not needed */
   cvalue = values = values_unaligned;
@@ -860,7 +869,7 @@
 
 #define ENCODE_COEFS_AC_REFINE(label) { \
   while (zerobits) { \
-    int idx = count_zeroes(&zerobits); \
+    idx = count_zeroes(&zerobits); \
    r += idx; \
    cabsvalue += idx; \
    signbits >>= idx; \
@@ -917,7 +926,7 @@
 encode_mcu_AC_refine(j_compress_ptr cinfo, JBLOCKROW *MCU_data)
 {
   phuff_entropy_ptr entropy = (phuff_entropy_ptr)cinfo->entropy;
-  register int temp, r;
+  register int temp, r, idx;
   char *BR_buffer;
   unsigned int BR;
   int Sl = cinfo->Se - cinfo->Ss + 1;
@@ -937,7 +946,7 @@
       emit_restart(entropy, entropy->next_restart_num);
 
 #ifdef WITH_SIMD
-  cabsvalue = absvalues = (JCOEF *)PAD((size_t)absvalues_unaligned, 16);
+  cabsvalue = absvalues = (JCOEF *)PAD((JUINTPTR)absvalues_unaligned, 16);
 #else
   /* Not using SIMD, so alignment is not needed */
   cabsvalue = absvalues = absvalues_unaligned;
@@ -968,7 +977,7 @@
   if (zerobits) {
     int diff = ((absvalues + DCTSIZE2 / 2) - cabsvalue);
-    int idx = count_zeroes(&zerobits);
+    idx = count_zeroes(&zerobits);
     signbits >>= idx;
     idx += diff;
     r += idx;
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jcsample.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jcsample.c
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jcsample.c	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jcsample.c	2021-11-20 03:41:33.392600546 +0000
@@ -6,7 +6,7 @@
  * libjpeg-turbo Modifications:
  * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
  * Copyright (C) 2014, MIPS Technologies, Inc., California.
- * Copyright (C) 2015, D. R. Commander.
+ * Copyright (C) 2015, 2019, D. R. Commander.
  * For conditions of distribution and use, see the accompanying README.ijg
  * file.
  *
@@ -103,7 +103,7 @@
   if (numcols > 0) {
     for (row = 0; row < num_rows; row++) {
       ptr = image_data[row] + input_cols;
-      pixval = ptr[-1];         /* don't need GETJSAMPLE() here */
+      pixval = ptr[-1];
       for (count = numcols; count > 0; count--)
         *ptr++ = pixval;
     }
   }
@@ -174,7 +174,7 @@
       for (v = 0; v < v_expand; v++) {
         inptr = input_data[inrow + v] + outcol_h;
         for (h = 0; h < h_expand; h++) {
-          outvalue += (JLONG)GETJSAMPLE(*inptr++);
+          outvalue += (JLONG)(*inptr++);
         }
       }
       *outptr++ = (JSAMPLE)((outvalue + numpix2) / numpix);
@@ -237,8 +237,7 @@
     inptr = input_data[outrow];
     bias = 0;                   /* bias = 0,1,0,1,... for successive samples */
     for (outcol = 0; outcol < output_cols; outcol++) {
-      *outptr++ =
-        (JSAMPLE)((GETJSAMPLE(*inptr) + GETJSAMPLE(inptr[1]) + bias) >> 1);
+      *outptr++ = (JSAMPLE)((inptr[0] + inptr[1] + bias) >> 1);
       bias ^= 1;                /* 0=>1, 1=>0 */
       inptr += 2;
     }
@@ -277,8 +276,7 @@
     bias = 1;                   /* bias = 1,2,1,2,... for successive samples */
     for (outcol = 0; outcol < output_cols; outcol++) {
       *outptr++ =
-        (JSAMPLE)((GETJSAMPLE(*inptr0) + GETJSAMPLE(inptr0[1]) +
-                   GETJSAMPLE(*inptr1) + GETJSAMPLE(inptr1[1]) + bias) >> 2);
+        (JSAMPLE)((inptr0[0] + inptr0[1] + inptr1[0] + inptr1[1] + bias) >> 2);
       bias ^= 3;                /* 1=>2, 2=>1 */
       inptr0 += 2;  inptr1 += 2;
     }
@@ -337,33 +335,25 @@
     below_ptr = input_data[inrow + 2];
 
     /* Special case for first column: pretend column -1 is same as column 0 */
-    membersum = GETJSAMPLE(*inptr0) + GETJSAMPLE(inptr0[1]) +
-                GETJSAMPLE(*inptr1) + GETJSAMPLE(inptr1[1]);
-    neighsum = GETJSAMPLE(*above_ptr) + GETJSAMPLE(above_ptr[1]) +
-               GETJSAMPLE(*below_ptr) + GETJSAMPLE(below_ptr[1]) +
-               GETJSAMPLE(*inptr0) + GETJSAMPLE(inptr0[2]) +
-               GETJSAMPLE(*inptr1) + GETJSAMPLE(inptr1[2]);
+    membersum = inptr0[0] + inptr0[1] + inptr1[0] + inptr1[1];
+    neighsum = above_ptr[0] + above_ptr[1] + below_ptr[0] + below_ptr[1] +
+               inptr0[0] + inptr0[2] + inptr1[0] + inptr1[2];
     neighsum += neighsum;
-    neighsum += GETJSAMPLE(*above_ptr) + GETJSAMPLE(above_ptr[2]) +
-                GETJSAMPLE(*below_ptr) + GETJSAMPLE(below_ptr[2]);
+    neighsum += above_ptr[0] + above_ptr[2] + below_ptr[0] + below_ptr[2];
     membersum = membersum * memberscale + neighsum * neighscale;
     *outptr++ = (JSAMPLE)((membersum + 32768) >> 16);
     inptr0 += 2;  inptr1 += 2;  above_ptr += 2;  below_ptr += 2;
 
     for (colctr = output_cols - 2; colctr > 0; colctr--) {
       /* sum of pixels directly mapped to this output element */
-      membersum = GETJSAMPLE(*inptr0) + GETJSAMPLE(inptr0[1]) +
-                  GETJSAMPLE(*inptr1) + GETJSAMPLE(inptr1[1]);
+      membersum = inptr0[0] + inptr0[1] + inptr1[0] + inptr1[1];
      /* sum of edge-neighbor pixels */
-      neighsum = GETJSAMPLE(*above_ptr) + GETJSAMPLE(above_ptr[1]) +
-                 GETJSAMPLE(*below_ptr) + GETJSAMPLE(below_ptr[1]) +
-                 GETJSAMPLE(inptr0[-1]) + GETJSAMPLE(inptr0[2]) +
-                 GETJSAMPLE(inptr1[-1]) + GETJSAMPLE(inptr1[2]);
+      neighsum = above_ptr[0] + above_ptr[1] + below_ptr[0] + below_ptr[1] +
+                 inptr0[-1] + inptr0[2] + inptr1[-1] + inptr1[2];
      /* The edge-neighbors count twice as much as corner-neighbors */
      neighsum += neighsum;
      /* Add in the corner-neighbors */
-      neighsum += GETJSAMPLE(above_ptr[-1]) + GETJSAMPLE(above_ptr[2]) +
-                  GETJSAMPLE(below_ptr[-1]) + GETJSAMPLE(below_ptr[2]);
+      neighsum += above_ptr[-1] + above_ptr[2] + below_ptr[-1] + below_ptr[2];
      /* form final output scaled up by 2^16 */
      membersum = membersum * memberscale + neighsum * neighscale;
      /* round, descale and output it */
@@ -372,15 +362,11 @@
     }
 
     /* Special case for last column */
-    membersum = GETJSAMPLE(*inptr0) + GETJSAMPLE(inptr0[1]) +
-                GETJSAMPLE(*inptr1) + GETJSAMPLE(inptr1[1]);
-    neighsum = GETJSAMPLE(*above_ptr) + GETJSAMPLE(above_ptr[1]) +
-               GETJSAMPLE(*below_ptr) + GETJSAMPLE(below_ptr[1]) +
-               GETJSAMPLE(inptr0[-1]) + GETJSAMPLE(inptr0[1]) +
-               GETJSAMPLE(inptr1[-1]) + GETJSAMPLE(inptr1[1]);
+    membersum = inptr0[0] + inptr0[1] + inptr1[0] + inptr1[1];
+    neighsum = above_ptr[0] + above_ptr[1] + below_ptr[0] + below_ptr[1] +
+               inptr0[-1] + inptr0[1] + inptr1[-1] + inptr1[1];
     neighsum += neighsum;
-    neighsum += GETJSAMPLE(above_ptr[-1]) + GETJSAMPLE(above_ptr[1]) +
-                GETJSAMPLE(below_ptr[-1]) + GETJSAMPLE(below_ptr[1]);
+    neighsum += above_ptr[-1] + above_ptr[1] + below_ptr[-1] + below_ptr[1];
     membersum = membersum * memberscale + neighsum * neighscale;
     *outptr = (JSAMPLE)((membersum + 32768) >> 16);
 
@@ -429,21 +415,18 @@
     below_ptr = input_data[outrow + 1];
 
     /* Special case for first column */
-    colsum = GETJSAMPLE(*above_ptr++) + GETJSAMPLE(*below_ptr++) +
-             GETJSAMPLE(*inptr);
-    membersum = GETJSAMPLE(*inptr++);
-    nextcolsum = GETJSAMPLE(*above_ptr) + GETJSAMPLE(*below_ptr) +
-                 GETJSAMPLE(*inptr);
+
colsum = (*above_ptr++) + (*below_ptr++) + inptr[0]; + membersum = *inptr++; + nextcolsum = above_ptr[0] + below_ptr[0] + inptr[0]; neighsum = colsum + (colsum - membersum) + nextcolsum; membersum = membersum * memberscale + neighsum * neighscale; *outptr++ = (JSAMPLE)((membersum + 32768) >> 16); lastcolsum = colsum; colsum = nextcolsum; for (colctr = output_cols - 2; colctr > 0; colctr--) { - membersum = GETJSAMPLE(*inptr++); + membersum = *inptr++; above_ptr++; below_ptr++; - nextcolsum = GETJSAMPLE(*above_ptr) + GETJSAMPLE(*below_ptr) + - GETJSAMPLE(*inptr); + nextcolsum = above_ptr[0] + below_ptr[0] + inptr[0]; neighsum = lastcolsum + (colsum - membersum) + nextcolsum; membersum = membersum * memberscale + neighsum * neighscale; *outptr++ = (JSAMPLE)((membersum + 32768) >> 16); @@ -451,7 +434,7 @@ } /* Special case for last column */ - membersum = GETJSAMPLE(*inptr); + membersum = *inptr; neighsum = lastcolsum + (colsum - membersum) + colsum; membersum = membersum * memberscale + neighsum * neighscale; *outptr = (JSAMPLE)((membersum + 32768) >> 16); diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jctrans.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jctrans.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jctrans.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jctrans.c 2021-11-20 03:41:33.392600546 +0000 @@ -4,8 +4,8 @@ * This file was part of the Independent JPEG Group's software: * Copyright (C) 1995-1998, Thomas G. Lane. * Modified 2000-2009 by Guido Vollbeding. - * It was modified by The libjpeg-turbo Project to include only code relevant - * to libjpeg-turbo. + * libjpeg-turbo Modifications: + * Copyright (C) 2020, D. R. Commander. * For conditions of distribution and use, see the accompanying README.ijg * file. 
* @@ -17,6 +17,7 @@ #define JPEG_INTERNALS #include "jinclude.h" #include "jpeglib.h" +#include "jpegcomp.h" /* Forward declarations */ diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdapimin.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdapimin.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdapimin.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdapimin.c 2021-11-20 03:41:33.392600546 +0000 @@ -23,6 +23,7 @@ #include "jinclude.h" #include "jpeglib.h" #include "jdmaster.h" +#include "jconfigint.h" /* @@ -308,7 +309,7 @@ /* Initialize application's data source module */ (*cinfo->src->init_source) (cinfo); cinfo->global_state = DSTATE_INHEADER; - /*FALLTHROUGH*/ + FALLTHROUGH /*FALLTHROUGH*/ case DSTATE_INHEADER: retcode = (*cinfo->inputctl->consume_input) (cinfo); if (retcode == JPEG_REACHED_SOS) { /* Found SOS, prepare to decompress */ diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdapistd.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdapistd.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdapistd.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdapistd.c 2021-11-20 03:41:33.392600546 +0000 @@ -4,7 +4,7 @@ * This file was part of the Independent JPEG Group's software: * Copyright (C) 1994-1996, Thomas G. Lane. * libjpeg-turbo Modifications: - * Copyright (C) 2010, 2015-2018, D. R. Commander. + * Copyright (C) 2010, 2015-2020, D. R. Commander. * Copyright (C) 2015, Google, Inc. * For conditions of distribution and use, see the accompanying README.ijg * file. 
@@ -21,6 +21,8 @@ #include "jinclude.h" #include "jdmainct.h" #include "jdcoefct.h" +#include "jdmaster.h" +#include "jdmerge.h" #include "jdsample.h" #include "jmemsys.h" @@ -316,6 +318,10 @@ read_and_discard_scanlines(j_decompress_ptr cinfo, JDIMENSION num_lines) { JDIMENSION n; + my_master_ptr master = (my_master_ptr)cinfo->master; + JSAMPLE dummy_sample[1] = { 0 }; + JSAMPROW dummy_row = dummy_sample; + JSAMPARRAY scanlines = NULL; void (*color_convert) (j_decompress_ptr cinfo, JSAMPIMAGE input_buf, JDIMENSION input_row, JSAMPARRAY output_buf, int num_rows) = NULL; @@ -325,6 +331,10 @@ if (cinfo->cconvert && cinfo->cconvert->color_convert) { color_convert = cinfo->cconvert->color_convert; cinfo->cconvert->color_convert = noop_convert; + /* This just prevents UBSan from complaining about adding 0 to a NULL + * pointer. The pointer isn't actually used. + */ + scanlines = &dummy_row; } if (cinfo->cquantize && cinfo->cquantize->color_quantize) { @@ -332,8 +342,13 @@ cinfo->cquantize->color_quantize = noop_quantize; } + if (master->using_merged_upsample && cinfo->max_v_samp_factor == 2) { + my_merged_upsample_ptr upsample = (my_merged_upsample_ptr)cinfo->upsample; + scanlines = &upsample->spare_row; + } + for (n = 0; n < num_lines; n++) - jpeg_read_scanlines(cinfo, NULL, 1); + jpeg_read_scanlines(cinfo, scanlines, 1); if (color_convert) cinfo->cconvert->color_convert = color_convert; @@ -353,6 +368,12 @@ { JDIMENSION rows_left; my_main_ptr main_ptr = (my_main_ptr)cinfo->main; + my_master_ptr master = (my_master_ptr)cinfo->master; + + if (master->using_merged_upsample && cinfo->max_v_samp_factor == 2) { + read_and_discard_scanlines(cinfo, rows); + return; + } /* Increment the counter to the next row group after the skipped rows. 
*/ main_ptr->rowgroup_ctr += rows / cinfo->max_v_samp_factor; @@ -382,21 +403,27 @@ { my_main_ptr main_ptr = (my_main_ptr)cinfo->main; my_coef_ptr coef = (my_coef_ptr)cinfo->coef; + my_master_ptr master = (my_master_ptr)cinfo->master; my_upsample_ptr upsample = (my_upsample_ptr)cinfo->upsample; JDIMENSION i, x; int y; JDIMENSION lines_per_iMCU_row, lines_left_in_iMCU_row, lines_after_iMCU_row; JDIMENSION lines_to_skip, lines_to_read; + /* Two-pass color quantization is not supported. */ + if (cinfo->quantize_colors && cinfo->two_pass_quantize) + ERREXIT(cinfo, JERR_NOTIMPL); + if (cinfo->global_state != DSTATE_SCANNING) ERREXIT1(cinfo, JERR_BAD_STATE, cinfo->global_state); /* Do not skip past the bottom of the image. */ if (cinfo->output_scanline + num_lines >= cinfo->output_height) { + num_lines = cinfo->output_height - cinfo->output_scanline; cinfo->output_scanline = cinfo->output_height; (*cinfo->inputctl->finish_input_pass) (cinfo); cinfo->inputctl->eoi_reached = TRUE; - return cinfo->output_height - cinfo->output_scanline; + return num_lines; } if (num_lines == 0) @@ -445,8 +472,10 @@ main_ptr->buffer_full = FALSE; main_ptr->rowgroup_ctr = 0; main_ptr->context_state = CTX_PREPARE_FOR_IMCU; - upsample->next_row_out = cinfo->max_v_samp_factor; - upsample->rows_to_go = cinfo->output_height - cinfo->output_scanline; + if (!master->using_merged_upsample) { + upsample->next_row_out = cinfo->max_v_samp_factor; + upsample->rows_to_go = cinfo->output_height - cinfo->output_scanline; + } } /* Skipping is much simpler when context rows are not required. 
*/ @@ -458,8 +487,10 @@ cinfo->output_scanline += lines_left_in_iMCU_row; main_ptr->buffer_full = FALSE; main_ptr->rowgroup_ctr = 0; - upsample->next_row_out = cinfo->max_v_samp_factor; - upsample->rows_to_go = cinfo->output_height - cinfo->output_scanline; + if (!master->using_merged_upsample) { + upsample->next_row_out = cinfo->max_v_samp_factor; + upsample->rows_to_go = cinfo->output_height - cinfo->output_scanline; + } } } @@ -494,7 +525,8 @@ cinfo->output_iMCU_row += lines_to_skip / lines_per_iMCU_row; increment_simple_rowgroup_ctr(cinfo, lines_to_read); } - upsample->rows_to_go = cinfo->output_height - cinfo->output_scanline; + if (!master->using_merged_upsample) + upsample->rows_to_go = cinfo->output_height - cinfo->output_scanline; return num_lines; } @@ -506,6 +538,8 @@ * decoded coefficients. This is ~5% faster for large subsets, but * it's tough to tell a difference for smaller images. */ + if (!cinfo->entropy->insufficient_data) + cinfo->master->last_good_iMCU_row = cinfo->input_iMCU_row; (*cinfo->entropy->decode_mcu) (cinfo, NULL); } } @@ -535,7 +569,8 @@ * bit odd, since "rows_to_go" seems to be redundantly keeping track of * output_scanline. */ - upsample->rows_to_go = cinfo->output_height - cinfo->output_scanline; + if (!master->using_merged_upsample) + upsample->rows_to_go = cinfo->output_height - cinfo->output_scanline; /* Always skip the requested number of lines. */ return num_lines; diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdarith.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdarith.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdarith.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdarith.c 2021-11-20 03:41:33.392600546 +0000 @@ -4,7 +4,7 @@ * This file was part of the Independent JPEG Group's software: * Developed 1997-2015 by Guido Vollbeding. * libjpeg-turbo Modifications: - * Copyright (C) 2015-2018, D. R. Commander. + * Copyright (C) 2015-2020, D. R. 
Commander. * For conditions of distribution and use, see the accompanying README.ijg * file. * @@ -80,7 +80,7 @@ if (!(*src->fill_input_buffer) (cinfo)) ERREXIT(cinfo, JERR_CANT_SUSPEND); src->bytes_in_buffer--; - return GETJOCTET(*src->next_input_byte++); + return *src->next_input_byte++; } @@ -665,8 +665,16 @@ for (ci = 0; ci < cinfo->comps_in_scan; ci++) { int coefi, cindex = cinfo->cur_comp_info[ci]->component_index; int *coef_bit_ptr = &cinfo->coef_bits[cindex][0]; + int *prev_coef_bit_ptr = + &cinfo->coef_bits[cindex + cinfo->num_components][0]; if (cinfo->Ss && coef_bit_ptr[0] < 0) /* AC without prior DC scan */ WARNMS2(cinfo, JWRN_BOGUS_PROGRESSION, cindex, 0); + for (coefi = MIN(cinfo->Ss, 1); coefi <= MAX(cinfo->Se, 9); coefi++) { + if (cinfo->input_scan_number > 1) + prev_coef_bit_ptr[coefi] = coef_bit_ptr[coefi]; + else + prev_coef_bit_ptr[coefi] = 0; + } for (coefi = cinfo->Ss; coefi <= cinfo->Se; coefi++) { int expected = (coef_bit_ptr[coefi] < 0) ? 0 : coef_bit_ptr[coefi]; if (cinfo->Ah != expected) @@ -727,6 +735,7 @@ entropy->c = 0; entropy->a = 0; entropy->ct = -16; /* force reading 2 initial bytes to fill C */ + entropy->pub.insufficient_data = FALSE; /* Initialize restart counter */ entropy->restarts_to_go = cinfo->restart_interval; @@ -763,7 +772,7 @@ int *coef_bit_ptr, ci; cinfo->coef_bits = (int (*)[DCTSIZE2]) (*cinfo->mem->alloc_small) ((j_common_ptr)cinfo, JPOOL_IMAGE, - cinfo->num_components * DCTSIZE2 * + cinfo->num_components * 2 * DCTSIZE2 * sizeof(int)); coef_bit_ptr = &cinfo->coef_bits[0][0]; for (ci = 0; ci < cinfo->num_components; ci++) diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdcoefct.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdcoefct.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdcoefct.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdcoefct.c 2021-11-20 03:41:33.392600546 +0000 @@ -5,8 +5,8 @@ * Copyright (C) 1994-1997, Thomas G. Lane. 
* libjpeg-turbo Modifications: * Copyright 2009 Pierre Ossman for Cendio AB - * Copyright (C) 2010, 2015-2016, D. R. Commander. - * Copyright (C) 2015, Google, Inc. + * Copyright (C) 2010, 2015-2016, 2019-2020, D. R. Commander. + * Copyright (C) 2015, 2020, Google, Inc. * For conditions of distribution and use, see the accompanying README.ijg * file. * @@ -102,6 +102,8 @@ /* Try to fetch an MCU. Entropy decoder expects buffer to be zeroed. */ jzero_far((void *)coef->MCU_buffer[0], (size_t)(cinfo->blocks_in_MCU * sizeof(JBLOCK))); + if (!cinfo->entropy->insufficient_data) + cinfo->master->last_good_iMCU_row = cinfo->input_iMCU_row; if (!(*cinfo->entropy->decode_mcu) (cinfo, coef->MCU_buffer)) { /* Suspension forced; update state counters and exit */ coef->MCU_vert_offset = yoffset; @@ -227,6 +229,8 @@ } } } + if (!cinfo->entropy->insufficient_data) + cinfo->master->last_good_iMCU_row = cinfo->input_iMCU_row; /* Try to fetch the MCU. */ if (!(*cinfo->entropy->decode_mcu) (cinfo, coef->MCU_buffer)) { /* Suspension forced; update state counters and exit */ @@ -326,19 +330,22 @@ #ifdef BLOCK_SMOOTHING_SUPPORTED /* - * This code applies interblock smoothing as described by section K.8 - * of the JPEG standard: the first 5 AC coefficients are estimated from - * the DC values of a DCT block and its 8 neighboring blocks. + * This code applies interblock smoothing; the first 9 AC coefficients are + * estimated from the DC values of a DCT block and its 24 neighboring blocks. * We apply smoothing only for progressive JPEG decoding, and only if * the coefficients it can estimate are not yet known to full precision. 
*/ -/* Natural-order array positions of the first 5 zigzag-order coefficients */ +/* Natural-order array positions of the first 9 zigzag-order coefficients */ #define Q01_POS 1 #define Q10_POS 8 #define Q20_POS 16 #define Q11_POS 9 #define Q02_POS 2 +#define Q03_POS 3 +#define Q12_POS 10 +#define Q21_POS 17 +#define Q30_POS 24 /* * Determine whether block smoothing is applicable and safe. @@ -356,8 +363,8 @@ int ci, coefi; jpeg_component_info *compptr; JQUANT_TBL *qtable; - int *coef_bits; - int *coef_bits_latch; + int *coef_bits, *prev_coef_bits; + int *coef_bits_latch, *prev_coef_bits_latch; if (!cinfo->progressive_mode || cinfo->coef_bits == NULL) return FALSE; @@ -366,34 +373,47 @@ if (coef->coef_bits_latch == NULL) coef->coef_bits_latch = (int *) (*cinfo->mem->alloc_small) ((j_common_ptr)cinfo, JPOOL_IMAGE, - cinfo->num_components * + cinfo->num_components * 2 * (SAVED_COEFS * sizeof(int))); coef_bits_latch = coef->coef_bits_latch; + prev_coef_bits_latch = + &coef->coef_bits_latch[cinfo->num_components * SAVED_COEFS]; for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components; ci++, compptr++) { /* All components' quantization values must already be latched. */ if ((qtable = compptr->quant_table) == NULL) return FALSE; - /* Verify DC & first 5 AC quantizers are nonzero to avoid zero-divide. */ + /* Verify DC & first 9 AC quantizers are nonzero to avoid zero-divide. */ if (qtable->quantval[0] == 0 || qtable->quantval[Q01_POS] == 0 || qtable->quantval[Q10_POS] == 0 || qtable->quantval[Q20_POS] == 0 || qtable->quantval[Q11_POS] == 0 || - qtable->quantval[Q02_POS] == 0) + qtable->quantval[Q02_POS] == 0 || + qtable->quantval[Q03_POS] == 0 || + qtable->quantval[Q12_POS] == 0 || + qtable->quantval[Q21_POS] == 0 || + qtable->quantval[Q30_POS] == 0) return FALSE; /* DC values must be at least partly known for all components. 
*/ coef_bits = cinfo->coef_bits[ci]; + prev_coef_bits = cinfo->coef_bits[ci + cinfo->num_components]; if (coef_bits[0] < 0) return FALSE; + coef_bits_latch[0] = coef_bits[0]; /* Block smoothing is helpful if some AC coefficients remain inaccurate. */ - for (coefi = 1; coefi <= 5; coefi++) { + for (coefi = 1; coefi < SAVED_COEFS; coefi++) { + if (cinfo->input_scan_number > 1) + prev_coef_bits_latch[coefi] = prev_coef_bits[coefi]; + else + prev_coef_bits_latch[coefi] = -1; coef_bits_latch[coefi] = coef_bits[coefi]; if (coef_bits[coefi] != 0) smoothing_useful = TRUE; } coef_bits_latch += SAVED_COEFS; + prev_coef_bits_latch += SAVED_COEFS; } return smoothing_useful; @@ -412,17 +432,20 @@ JDIMENSION block_num, last_block_column; int ci, block_row, block_rows, access_rows; JBLOCKARRAY buffer; - JBLOCKROW buffer_ptr, prev_block_row, next_block_row; + JBLOCKROW buffer_ptr, prev_prev_block_row, prev_block_row; + JBLOCKROW next_block_row, next_next_block_row; JSAMPARRAY output_ptr; JDIMENSION output_col; jpeg_component_info *compptr; inverse_DCT_method_ptr inverse_DCT; - boolean first_row, last_row; + boolean change_dc; JCOEF *workspace; int *coef_bits; JQUANT_TBL *quanttbl; - JLONG Q00, Q01, Q02, Q10, Q11, Q20, num; - int DC1, DC2, DC3, DC4, DC5, DC6, DC7, DC8, DC9; + JLONG Q00, Q01, Q02, Q03 = 0, Q10, Q11, Q12 = 0, Q20, Q21 = 0, Q30 = 0, num; + int DC01, DC02, DC03, DC04, DC05, DC06, DC07, DC08, DC09, DC10, DC11, DC12, + DC13, DC14, DC15, DC16, DC17, DC18, DC19, DC20, DC21, DC22, DC23, DC24, + DC25; int Al, pred; /* Keep a local variable to avoid looking it up more than once */ @@ -434,10 +457,10 @@ if (cinfo->input_scan_number == cinfo->output_scan_number) { /* If input is working on current scan, we ordinarily want it to * have completed the current row. But if input scan is DC, - * we want it to keep one row ahead so that next block row's DC + * we want it to keep two rows ahead so that next two block rows' DC * values are up to date. 
*/ - JDIMENSION delta = (cinfo->Ss == 0) ? 1 : 0; + JDIMENSION delta = (cinfo->Ss == 0) ? 2 : 0; if (cinfo->input_iMCU_row > cinfo->output_iMCU_row + delta) break; } @@ -452,34 +475,53 @@ if (!compptr->component_needed) continue; /* Count non-dummy DCT block rows in this iMCU row. */ - if (cinfo->output_iMCU_row < last_iMCU_row) { + if (cinfo->output_iMCU_row < last_iMCU_row - 1) { + block_rows = compptr->v_samp_factor; + access_rows = block_rows * 3; /* this and next two iMCU rows */ + } else if (cinfo->output_iMCU_row < last_iMCU_row) { block_rows = compptr->v_samp_factor; access_rows = block_rows * 2; /* this and next iMCU row */ - last_row = FALSE; } else { /* NB: can't use last_row_height here; it is input-side-dependent! */ block_rows = (int)(compptr->height_in_blocks % compptr->v_samp_factor); if (block_rows == 0) block_rows = compptr->v_samp_factor; access_rows = block_rows; /* this iMCU row only */ - last_row = TRUE; } /* Align the virtual buffer for this component. */ - if (cinfo->output_iMCU_row > 0) { - access_rows += compptr->v_samp_factor; /* prior iMCU row too */ + if (cinfo->output_iMCU_row > 1) { + access_rows += 2 * compptr->v_samp_factor; /* prior two iMCU rows too */ + buffer = (*cinfo->mem->access_virt_barray) + ((j_common_ptr)cinfo, coef->whole_image[ci], + (cinfo->output_iMCU_row - 2) * compptr->v_samp_factor, + (JDIMENSION)access_rows, FALSE); + buffer += 2 * compptr->v_samp_factor; /* point to current iMCU row */ + } else if (cinfo->output_iMCU_row > 0) { buffer = (*cinfo->mem->access_virt_barray) ((j_common_ptr)cinfo, coef->whole_image[ci], (cinfo->output_iMCU_row - 1) * compptr->v_samp_factor, (JDIMENSION)access_rows, FALSE); buffer += compptr->v_samp_factor; /* point to current iMCU row */ - first_row = FALSE; } else { buffer = (*cinfo->mem->access_virt_barray) ((j_common_ptr)cinfo, coef->whole_image[ci], (JDIMENSION)0, (JDIMENSION)access_rows, FALSE); - first_row = TRUE; } - /* Fetch component-dependent info */ - coef_bits = 
coef->coef_bits_latch + (ci * SAVED_COEFS); + /* Fetch component-dependent info. + * If the current scan is incomplete, then we use the component-dependent + * info from the previous scan. + */ + if (cinfo->output_iMCU_row > cinfo->master->last_good_iMCU_row) + coef_bits = + coef->coef_bits_latch + ((ci + cinfo->num_components) * SAVED_COEFS); + else + coef_bits = coef->coef_bits_latch + (ci * SAVED_COEFS); + + /* We only do DC interpolation if no AC coefficient data is available. */ + change_dc = + coef_bits[1] == -1 && coef_bits[2] == -1 && coef_bits[3] == -1 && + coef_bits[4] == -1 && coef_bits[5] == -1 && coef_bits[6] == -1 && + coef_bits[7] == -1 && coef_bits[8] == -1 && coef_bits[9] == -1; + quanttbl = compptr->quant_table; Q00 = quanttbl->quantval[0]; Q01 = quanttbl->quantval[Q01_POS]; @@ -487,25 +529,51 @@ Q20 = quanttbl->quantval[Q20_POS]; Q11 = quanttbl->quantval[Q11_POS]; Q02 = quanttbl->quantval[Q02_POS]; + if (change_dc) { + Q03 = quanttbl->quantval[Q03_POS]; + Q12 = quanttbl->quantval[Q12_POS]; + Q21 = quanttbl->quantval[Q21_POS]; + Q30 = quanttbl->quantval[Q30_POS]; + } inverse_DCT = cinfo->idct->inverse_DCT[ci]; output_ptr = output_buf[ci]; /* Loop over all DCT blocks to be processed. 
*/ for (block_row = 0; block_row < block_rows; block_row++) { buffer_ptr = buffer[block_row] + cinfo->master->first_MCU_col[ci]; - if (first_row && block_row == 0) + + if (block_row > 0 || cinfo->output_iMCU_row > 0) + prev_block_row = + buffer[block_row - 1] + cinfo->master->first_MCU_col[ci]; + else prev_block_row = buffer_ptr; + + if (block_row > 1 || cinfo->output_iMCU_row > 1) + prev_prev_block_row = + buffer[block_row - 2] + cinfo->master->first_MCU_col[ci]; + else + prev_prev_block_row = prev_block_row; + + if (block_row < block_rows - 1 || cinfo->output_iMCU_row < last_iMCU_row) + next_block_row = + buffer[block_row + 1] + cinfo->master->first_MCU_col[ci]; else - prev_block_row = buffer[block_row - 1]; - if (last_row && block_row == block_rows - 1) next_block_row = buffer_ptr; + + if (block_row < block_rows - 2 || + cinfo->output_iMCU_row < last_iMCU_row - 1) + next_next_block_row = + buffer[block_row + 2] + cinfo->master->first_MCU_col[ci]; else - next_block_row = buffer[block_row + 1]; + next_next_block_row = next_block_row; + /* We fetch the surrounding DC values using a sliding-register approach. - * Initialize all nine here so as to do the right thing on narrow pics. + * Initialize all 25 here so as to do the right thing on narrow pics. */ - DC1 = DC2 = DC3 = (int)prev_block_row[0][0]; - DC4 = DC5 = DC6 = (int)buffer_ptr[0][0]; - DC7 = DC8 = DC9 = (int)next_block_row[0][0]; + DC01 = DC02 = DC03 = DC04 = DC05 = (int)prev_prev_block_row[0][0]; + DC06 = DC07 = DC08 = DC09 = DC10 = (int)prev_block_row[0][0]; + DC11 = DC12 = DC13 = DC14 = DC15 = (int)buffer_ptr[0][0]; + DC16 = DC17 = DC18 = DC19 = DC20 = (int)next_block_row[0][0]; + DC21 = DC22 = DC23 = DC24 = DC25 = (int)next_next_block_row[0][0]; output_col = 0; last_block_column = compptr->width_in_blocks - 1; for (block_num = cinfo->master->first_MCU_col[ci]; @@ -513,18 +581,39 @@ /* Fetch current DCT block into workspace so we can modify it. 
*/ jcopy_block_row(buffer_ptr, (JBLOCKROW)workspace, (JDIMENSION)1); /* Update DC values */ - if (block_num < last_block_column) { - DC3 = (int)prev_block_row[1][0]; - DC6 = (int)buffer_ptr[1][0]; - DC9 = (int)next_block_row[1][0]; + if (block_num == cinfo->master->first_MCU_col[ci] && + block_num < last_block_column) { + DC04 = (int)prev_prev_block_row[1][0]; + DC09 = (int)prev_block_row[1][0]; + DC14 = (int)buffer_ptr[1][0]; + DC19 = (int)next_block_row[1][0]; + DC24 = (int)next_next_block_row[1][0]; } - /* Compute coefficient estimates per K.8. - * An estimate is applied only if coefficient is still zero, - * and is not known to be fully accurate. + if (block_num + 1 < last_block_column) { + DC05 = (int)prev_prev_block_row[2][0]; + DC10 = (int)prev_block_row[2][0]; + DC15 = (int)buffer_ptr[2][0]; + DC20 = (int)next_block_row[2][0]; + DC25 = (int)next_next_block_row[2][0]; + } + /* If DC interpolation is enabled, compute coefficient estimates using + * a Gaussian-like kernel, keeping the averages of the DC values. + * + * If DC interpolation is disabled, compute coefficient estimates using + * an algorithm similar to the one described in Section K.8 of the JPEG + * standard, except applied to a 5x5 window rather than a 3x3 window. + * + * An estimate is applied only if the coefficient is still zero and is + * not known to be fully accurate. */ /* AC01 */ if ((Al = coef_bits[1]) != 0 && workspace[1] == 0) { - num = 36 * Q00 * (DC4 - DC6); + num = Q00 * (change_dc ? + (-DC01 - DC02 + DC04 + DC05 - 3 * DC06 + 13 * DC07 - + 13 * DC09 + 3 * DC10 - 3 * DC11 + 38 * DC12 - 38 * DC14 + + 3 * DC15 - 3 * DC16 + 13 * DC17 - 13 * DC19 + 3 * DC20 - + DC21 - DC22 + DC24 + DC25) : + (-7 * DC11 + 50 * DC12 - 50 * DC14 + 7 * DC15)); if (num >= 0) { pred = (int)(((Q01 << 7) + num) / (Q01 << 8)); if (Al > 0 && pred >= (1 << Al)) @@ -539,7 +628,12 @@ } /* AC10 */ if ((Al = coef_bits[2]) != 0 && workspace[8] == 0) { - num = 36 * Q00 * (DC2 - DC8); + num = Q00 * (change_dc ? 
+ (-DC01 - 3 * DC02 - 3 * DC03 - 3 * DC04 - DC05 - DC06 + + 13 * DC07 + 38 * DC08 + 13 * DC09 - DC10 + DC16 - + 13 * DC17 - 38 * DC18 - 13 * DC19 + DC20 + DC21 + + 3 * DC22 + 3 * DC23 + 3 * DC24 + DC25) : + (-7 * DC03 + 50 * DC08 - 50 * DC18 + 7 * DC23)); if (num >= 0) { pred = (int)(((Q10 << 7) + num) / (Q10 << 8)); if (Al > 0 && pred >= (1 << Al)) @@ -554,7 +648,10 @@ } /* AC20 */ if ((Al = coef_bits[3]) != 0 && workspace[16] == 0) { - num = 9 * Q00 * (DC2 + DC8 - 2 * DC5); + num = Q00 * (change_dc ? + (DC03 + 2 * DC07 + 7 * DC08 + 2 * DC09 - 5 * DC12 - 14 * DC13 - + 5 * DC14 + 2 * DC17 + 7 * DC18 + 2 * DC19 + DC23) : + (-DC03 + 13 * DC08 - 24 * DC13 + 13 * DC18 - DC23)); if (num >= 0) { pred = (int)(((Q20 << 7) + num) / (Q20 << 8)); if (Al > 0 && pred >= (1 << Al)) @@ -569,7 +666,11 @@ } /* AC11 */ if ((Al = coef_bits[4]) != 0 && workspace[9] == 0) { - num = 5 * Q00 * (DC1 - DC3 - DC7 + DC9); + num = Q00 * (change_dc ? + (-DC01 + DC05 + 9 * DC07 - 9 * DC09 - 9 * DC17 + + 9 * DC19 + DC21 - DC25) : + (DC10 + DC16 - 10 * DC17 + 10 * DC19 - DC02 - DC20 + DC22 - + DC24 + DC04 - DC06 + 10 * DC07 - 10 * DC09)); if (num >= 0) { pred = (int)(((Q11 << 7) + num) / (Q11 << 8)); if (Al > 0 && pred >= (1 << Al)) @@ -584,7 +685,10 @@ } /* AC02 */ if ((Al = coef_bits[5]) != 0 && workspace[2] == 0) { - num = 9 * Q00 * (DC4 + DC6 - 2 * DC5); + num = Q00 * (change_dc ? 
+ (2 * DC07 - 5 * DC08 + 2 * DC09 + DC11 + 7 * DC12 - 14 * DC13 + + 7 * DC14 + DC15 + 2 * DC17 - 5 * DC18 + 2 * DC19) : + (-DC11 + 13 * DC12 - 24 * DC13 + 13 * DC14 - DC15)); if (num >= 0) { pred = (int)(((Q02 << 7) + num) / (Q02 << 8)); if (Al > 0 && pred >= (1 << Al)) @@ -597,14 +701,96 @@ } workspace[2] = (JCOEF)pred; } + if (change_dc) { + /* AC03 */ + if ((Al = coef_bits[6]) != 0 && workspace[3] == 0) { + num = Q00 * (DC07 - DC09 + 2 * DC12 - 2 * DC14 + DC17 - DC19); + if (num >= 0) { + pred = (int)(((Q03 << 7) + num) / (Q03 << 8)); + if (Al > 0 && pred >= (1 << Al)) + pred = (1 << Al) - 1; + } else { + pred = (int)(((Q03 << 7) - num) / (Q03 << 8)); + if (Al > 0 && pred >= (1 << Al)) + pred = (1 << Al) - 1; + pred = -pred; + } + workspace[3] = (JCOEF)pred; + } + /* AC12 */ + if ((Al = coef_bits[7]) != 0 && workspace[10] == 0) { + num = Q00 * (DC07 - 3 * DC08 + DC09 - DC17 + 3 * DC18 - DC19); + if (num >= 0) { + pred = (int)(((Q12 << 7) + num) / (Q12 << 8)); + if (Al > 0 && pred >= (1 << Al)) + pred = (1 << Al) - 1; + } else { + pred = (int)(((Q12 << 7) - num) / (Q12 << 8)); + if (Al > 0 && pred >= (1 << Al)) + pred = (1 << Al) - 1; + pred = -pred; + } + workspace[10] = (JCOEF)pred; + } + /* AC21 */ + if ((Al = coef_bits[8]) != 0 && workspace[17] == 0) { + num = Q00 * (DC07 - DC09 - 3 * DC12 + 3 * DC14 + DC17 - DC19); + if (num >= 0) { + pred = (int)(((Q21 << 7) + num) / (Q21 << 8)); + if (Al > 0 && pred >= (1 << Al)) + pred = (1 << Al) - 1; + } else { + pred = (int)(((Q21 << 7) - num) / (Q21 << 8)); + if (Al > 0 && pred >= (1 << Al)) + pred = (1 << Al) - 1; + pred = -pred; + } + workspace[17] = (JCOEF)pred; + } + /* AC30 */ + if ((Al = coef_bits[9]) != 0 && workspace[24] == 0) { + num = Q00 * (DC07 + 2 * DC08 + DC09 - DC17 - 2 * DC18 - DC19); + if (num >= 0) { + pred = (int)(((Q30 << 7) + num) / (Q30 << 8)); + if (Al > 0 && pred >= (1 << Al)) + pred = (1 << Al) - 1; + } else { + pred = (int)(((Q30 << 7) - num) / (Q30 << 8)); + if (Al > 0 && pred >= (1 << Al)) 
+ pred = (1 << Al) - 1; + pred = -pred; + } + workspace[24] = (JCOEF)pred; + } + /* coef_bits[0] is non-negative. Otherwise this function would not + * be called. + */ + num = Q00 * + (-2 * DC01 - 6 * DC02 - 8 * DC03 - 6 * DC04 - 2 * DC05 - + 6 * DC06 + 6 * DC07 + 42 * DC08 + 6 * DC09 - 6 * DC10 - + 8 * DC11 + 42 * DC12 + 152 * DC13 + 42 * DC14 - 8 * DC15 - + 6 * DC16 + 6 * DC17 + 42 * DC18 + 6 * DC19 - 6 * DC20 - + 2 * DC21 - 6 * DC22 - 8 * DC23 - 6 * DC24 - 2 * DC25); + if (num >= 0) { + pred = (int)(((Q00 << 7) + num) / (Q00 << 8)); + } else { + pred = (int)(((Q00 << 7) - num) / (Q00 << 8)); + pred = -pred; + } + workspace[0] = (JCOEF)pred; + } /* change_dc */ + /* OK, do the IDCT */ (*inverse_DCT) (cinfo, compptr, (JCOEFPTR)workspace, output_ptr, output_col); /* Advance for next column */ - DC1 = DC2; DC2 = DC3; - DC4 = DC5; DC5 = DC6; - DC7 = DC8; DC8 = DC9; - buffer_ptr++, prev_block_row++, next_block_row++; + DC01 = DC02; DC02 = DC03; DC03 = DC04; DC04 = DC05; + DC06 = DC07; DC07 = DC08; DC08 = DC09; DC09 = DC10; + DC11 = DC12; DC12 = DC13; DC13 = DC14; DC14 = DC15; + DC16 = DC17; DC17 = DC18; DC18 = DC19; DC19 = DC20; + DC21 = DC22; DC22 = DC23; DC23 = DC24; DC24 = DC25; + buffer_ptr++, prev_block_row++, next_block_row++, + prev_prev_block_row++, next_next_block_row++; output_col += compptr->_DCT_scaled_size; } output_ptr += compptr->_DCT_scaled_size; @@ -653,7 +839,7 @@ #ifdef BLOCK_SMOOTHING_SUPPORTED /* If block smoothing could be used, need a bigger window */ if (cinfo->progressive_mode) - access_rows *= 3; + access_rows *= 5; #endif coef->whole_image[ci] = (*cinfo->mem->request_virt_barray) ((j_common_ptr)cinfo, JPOOL_IMAGE, TRUE, diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdcoefct.h b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdcoefct.h --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdcoefct.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdcoefct.h 2021-11-20 
03:41:33.393600530 +0000 @@ -5,6 +5,7 @@ * Copyright (C) 1994-1997, Thomas G. Lane. * libjpeg-turbo Modifications: * Copyright 2009 Pierre Ossman for Cendio AB + * Copyright (C) 2020, Google, Inc. * For conditions of distribution and use, see the accompanying README.ijg * file. */ @@ -51,7 +52,7 @@ #ifdef BLOCK_SMOOTHING_SUPPORTED /* When doing block smoothing, we latch coefficient Al values here */ int *coef_bits_latch; -#define SAVED_COEFS 6 /* we save coef_bits[0..5] */ +#define SAVED_COEFS 10 /* we save coef_bits[0..9] */ #endif } my_coef_controller; diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdcol565.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdcol565.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdcol565.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdcol565.c 2021-11-20 03:41:33.393600530 +0000 @@ -45,9 +45,9 @@ outptr = *output_buf++; if (PACK_NEED_ALIGNMENT(outptr)) { - y = GETJSAMPLE(*inptr0++); - cb = GETJSAMPLE(*inptr1++); - cr = GETJSAMPLE(*inptr2++); + y = *inptr0++; + cb = *inptr1++; + cr = *inptr2++; r = range_limit[y + Crrtab[cr]]; g = range_limit[y + ((int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS))]; @@ -58,18 +58,18 @@ num_cols--; } for (col = 0; col < (num_cols >> 1); col++) { - y = GETJSAMPLE(*inptr0++); - cb = GETJSAMPLE(*inptr1++); - cr = GETJSAMPLE(*inptr2++); + y = *inptr0++; + cb = *inptr1++; + cr = *inptr2++; r = range_limit[y + Crrtab[cr]]; g = range_limit[y + ((int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS))]; b = range_limit[y + Cbbtab[cb]]; rgb = PACK_SHORT_565(r, g, b); - y = GETJSAMPLE(*inptr0++); - cb = GETJSAMPLE(*inptr1++); - cr = GETJSAMPLE(*inptr2++); + y = *inptr0++; + cb = *inptr1++; + cr = *inptr2++; r = range_limit[y + Crrtab[cr]]; g = range_limit[y + ((int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS))]; @@ -80,9 +80,9 @@ outptr += 4; } if (num_cols & 1) { - y = GETJSAMPLE(*inptr0); - cb = GETJSAMPLE(*inptr1); - cr = 
GETJSAMPLE(*inptr2); + y = *inptr0; + cb = *inptr1; + cr = *inptr2; r = range_limit[y + Crrtab[cr]]; g = range_limit[y + ((int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS))]; @@ -125,9 +125,9 @@ input_row++; outptr = *output_buf++; if (PACK_NEED_ALIGNMENT(outptr)) { - y = GETJSAMPLE(*inptr0++); - cb = GETJSAMPLE(*inptr1++); - cr = GETJSAMPLE(*inptr2++); + y = *inptr0++; + cb = *inptr1++; + cr = *inptr2++; r = range_limit[DITHER_565_R(y + Crrtab[cr], d0)]; g = range_limit[DITHER_565_G(y + ((int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], @@ -139,9 +139,9 @@ num_cols--; } for (col = 0; col < (num_cols >> 1); col++) { - y = GETJSAMPLE(*inptr0++); - cb = GETJSAMPLE(*inptr1++); - cr = GETJSAMPLE(*inptr2++); + y = *inptr0++; + cb = *inptr1++; + cr = *inptr2++; r = range_limit[DITHER_565_R(y + Crrtab[cr], d0)]; g = range_limit[DITHER_565_G(y + ((int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], @@ -150,9 +150,9 @@ d0 = DITHER_ROTATE(d0); rgb = PACK_SHORT_565(r, g, b); - y = GETJSAMPLE(*inptr0++); - cb = GETJSAMPLE(*inptr1++); - cr = GETJSAMPLE(*inptr2++); + y = *inptr0++; + cb = *inptr1++; + cr = *inptr2++; r = range_limit[DITHER_565_R(y + Crrtab[cr], d0)]; g = range_limit[DITHER_565_G(y + ((int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], @@ -165,9 +165,9 @@ outptr += 4; } if (num_cols & 1) { - y = GETJSAMPLE(*inptr0); - cb = GETJSAMPLE(*inptr1); - cr = GETJSAMPLE(*inptr2); + y = *inptr0; + cb = *inptr1; + cr = *inptr2; r = range_limit[DITHER_565_R(y + Crrtab[cr], d0)]; g = range_limit[DITHER_565_G(y + ((int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], @@ -202,32 +202,32 @@ input_row++; outptr = *output_buf++; if (PACK_NEED_ALIGNMENT(outptr)) { - r = GETJSAMPLE(*inptr0++); - g = GETJSAMPLE(*inptr1++); - b = GETJSAMPLE(*inptr2++); + r = *inptr0++; + g = *inptr1++; + b = *inptr2++; rgb = PACK_SHORT_565(r, g, b); *(INT16 *)outptr = (INT16)rgb; outptr += 2; num_cols--; } for (col = 0; col < (num_cols >> 1); col++) { - r = GETJSAMPLE(*inptr0++); - g = GETJSAMPLE(*inptr1++); - b = GETJSAMPLE(*inptr2++); + 
r = *inptr0++; + g = *inptr1++; + b = *inptr2++; rgb = PACK_SHORT_565(r, g, b); - r = GETJSAMPLE(*inptr0++); - g = GETJSAMPLE(*inptr1++); - b = GETJSAMPLE(*inptr2++); + r = *inptr0++; + g = *inptr1++; + b = *inptr2++; rgb = PACK_TWO_PIXELS(rgb, PACK_SHORT_565(r, g, b)); WRITE_TWO_ALIGNED_PIXELS(outptr, rgb); outptr += 4; } if (num_cols & 1) { - r = GETJSAMPLE(*inptr0); - g = GETJSAMPLE(*inptr1); - b = GETJSAMPLE(*inptr2); + r = *inptr0; + g = *inptr1; + b = *inptr2; rgb = PACK_SHORT_565(r, g, b); *(INT16 *)outptr = (INT16)rgb; } @@ -259,24 +259,24 @@ input_row++; outptr = *output_buf++; if (PACK_NEED_ALIGNMENT(outptr)) { - r = range_limit[DITHER_565_R(GETJSAMPLE(*inptr0++), d0)]; - g = range_limit[DITHER_565_G(GETJSAMPLE(*inptr1++), d0)]; - b = range_limit[DITHER_565_B(GETJSAMPLE(*inptr2++), d0)]; + r = range_limit[DITHER_565_R(*inptr0++, d0)]; + g = range_limit[DITHER_565_G(*inptr1++, d0)]; + b = range_limit[DITHER_565_B(*inptr2++, d0)]; rgb = PACK_SHORT_565(r, g, b); *(INT16 *)outptr = (INT16)rgb; outptr += 2; num_cols--; } for (col = 0; col < (num_cols >> 1); col++) { - r = range_limit[DITHER_565_R(GETJSAMPLE(*inptr0++), d0)]; - g = range_limit[DITHER_565_G(GETJSAMPLE(*inptr1++), d0)]; - b = range_limit[DITHER_565_B(GETJSAMPLE(*inptr2++), d0)]; + r = range_limit[DITHER_565_R(*inptr0++, d0)]; + g = range_limit[DITHER_565_G(*inptr1++, d0)]; + b = range_limit[DITHER_565_B(*inptr2++, d0)]; d0 = DITHER_ROTATE(d0); rgb = PACK_SHORT_565(r, g, b); - r = range_limit[DITHER_565_R(GETJSAMPLE(*inptr0++), d0)]; - g = range_limit[DITHER_565_G(GETJSAMPLE(*inptr1++), d0)]; - b = range_limit[DITHER_565_B(GETJSAMPLE(*inptr2++), d0)]; + r = range_limit[DITHER_565_R(*inptr0++, d0)]; + g = range_limit[DITHER_565_G(*inptr1++, d0)]; + b = range_limit[DITHER_565_B(*inptr2++, d0)]; d0 = DITHER_ROTATE(d0); rgb = PACK_TWO_PIXELS(rgb, PACK_SHORT_565(r, g, b)); @@ -284,9 +284,9 @@ outptr += 4; } if (num_cols & 1) { - r = range_limit[DITHER_565_R(GETJSAMPLE(*inptr0), d0)]; - g = 
range_limit[DITHER_565_G(GETJSAMPLE(*inptr1), d0)]; - b = range_limit[DITHER_565_B(GETJSAMPLE(*inptr2), d0)]; + r = range_limit[DITHER_565_R(*inptr0, d0)]; + g = range_limit[DITHER_565_G(*inptr1, d0)]; + b = range_limit[DITHER_565_B(*inptr2, d0)]; rgb = PACK_SHORT_565(r, g, b); *(INT16 *)outptr = (INT16)rgb; } diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdcolext.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdcolext.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdcolext.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdcolext.c 2021-11-20 03:41:33.393600530 +0000 @@ -53,9 +53,9 @@ input_row++; outptr = *output_buf++; for (col = 0; col < num_cols; col++) { - y = GETJSAMPLE(inptr0[col]); - cb = GETJSAMPLE(inptr1[col]); - cr = GETJSAMPLE(inptr2[col]); + y = inptr0[col]; + cb = inptr1[col]; + cr = inptr2[col]; /* Range-limiting is essential due to noise introduced by DCT losses. */ outptr[RGB_RED] = range_limit[y + Crrtab[cr]]; outptr[RGB_GREEN] = range_limit[y + @@ -93,7 +93,6 @@ inptr = input_buf[0][input_row++]; outptr = *output_buf++; for (col = 0; col < num_cols; col++) { - /* We can dispense with GETJSAMPLE() here */ outptr[RGB_RED] = outptr[RGB_GREEN] = outptr[RGB_BLUE] = inptr[col]; /* Set unused byte to 0xFF so it can be interpreted as an opaque */ /* alpha channel value */ @@ -128,7 +127,6 @@ input_row++; outptr = *output_buf++; for (col = 0; col < num_cols; col++) { - /* We can dispense with GETJSAMPLE() here */ outptr[RGB_RED] = inptr0[col]; outptr[RGB_GREEN] = inptr1[col]; outptr[RGB_BLUE] = inptr2[col]; diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdcolor.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdcolor.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdcolor.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdcolor.c 2021-11-20 03:41:33.393600530 +0000 @@ -341,9 +341,9 @@ input_row++; outptr = 
*output_buf++; for (col = 0; col < num_cols; col++) { - r = GETJSAMPLE(inptr0[col]); - g = GETJSAMPLE(inptr1[col]); - b = GETJSAMPLE(inptr2[col]); + r = inptr0[col]; + g = inptr1[col]; + b = inptr2[col]; /* Y */ outptr[col] = (JSAMPLE)((ctab[r + R_Y_OFF] + ctab[g + G_Y_OFF] + ctab[b + B_Y_OFF]) >> SCALEBITS); @@ -550,9 +550,9 @@ input_row++; outptr = *output_buf++; for (col = 0; col < num_cols; col++) { - y = GETJSAMPLE(inptr0[col]); - cb = GETJSAMPLE(inptr1[col]); - cr = GETJSAMPLE(inptr2[col]); + y = inptr0[col]; + cb = inptr1[col]; + cr = inptr2[col]; /* Range-limiting is essential due to noise introduced by DCT losses. */ outptr[0] = range_limit[MAXJSAMPLE - (y + Crrtab[cr])]; /* red */ outptr[1] = range_limit[MAXJSAMPLE - (y + /* green */ @@ -560,7 +560,7 @@ SCALEBITS)))]; outptr[2] = range_limit[MAXJSAMPLE - (y + Cbbtab[cb])]; /* blue */ /* K passes through unchanged */ - outptr[3] = inptr3[col]; /* don't need GETJSAMPLE here */ + outptr[3] = inptr3[col]; outptr += 4; } } @@ -571,11 +571,10 @@ * RGB565 conversion */ -#define PACK_SHORT_565_LE(r, g, b) ((((r) << 8) & 0xF800) | \ - (((g) << 3) & 0x7E0) | ((b) >> 3)) -#define PACK_SHORT_565_BE(r, g, b) (((r) & 0xF8) | ((g) >> 5) | \ - (((g) << 11) & 0xE000) | \ - (((b) << 5) & 0x1F00)) +#define PACK_SHORT_565_LE(r, g, b) \ + ((((r) << 8) & 0xF800) | (((g) << 3) & 0x7E0) | ((b) >> 3)) +#define PACK_SHORT_565_BE(r, g, b) \ + (((r) & 0xF8) | ((g) >> 5) | (((g) << 11) & 0xE000) | (((b) << 5) & 0x1F00)) #define PACK_TWO_PIXELS_LE(l, r) ((r << 16) | l) #define PACK_TWO_PIXELS_BE(l, r) ((l << 16) | r) diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdhuff.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdhuff.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdhuff.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdhuff.c 2021-11-20 03:41:33.393600530 +0000 @@ -5,6 +5,7 @@ * Copyright (C) 1991-1997, Thomas G. Lane. 
* libjpeg-turbo Modifications: * Copyright (C) 2009-2011, 2016, 2018-2019, D. R. Commander. + * Copyright (C) 2018, Matthias Räncker. * For conditions of distribution and use, see the accompanying README.ijg * file. * @@ -39,24 +40,6 @@ int last_dc_val[MAX_COMPS_IN_SCAN]; /* last DC coef for each component */ } savable_state; -/* This macro is to work around compilers with missing or broken - * structure assignment. You'll need to fix this code if you have - * such a compiler and you change MAX_COMPS_IN_SCAN. - */ - -#ifndef NO_STRUCT_ASSIGN -#define ASSIGN_STATE(dest, src) ((dest) = (src)) -#else -#if MAX_COMPS_IN_SCAN == 4 -#define ASSIGN_STATE(dest, src) \ - ((dest).last_dc_val[0] = (src).last_dc_val[0], \ - (dest).last_dc_val[1] = (src).last_dc_val[1], \ - (dest).last_dc_val[2] = (src).last_dc_val[2], \ - (dest).last_dc_val[3] = (src).last_dc_val[3]) -#endif -#endif - - typedef struct { struct jpeg_entropy_decoder pub; /* public fields */ @@ -325,7 +308,7 @@ bytes_in_buffer = cinfo->src->bytes_in_buffer; } bytes_in_buffer--; - c = GETJOCTET(*next_input_byte++); + c = *next_input_byte++; /* If it's 0xFF, check and discard stuffed zero byte */ if (c == 0xFF) { @@ -342,7 +325,7 @@ bytes_in_buffer = cinfo->src->bytes_in_buffer; } bytes_in_buffer--; - c = GETJOCTET(*next_input_byte++); + c = *next_input_byte++; } while (c == 0xFF); if (c == 0) { @@ -405,8 +388,8 @@ #define GET_BYTE { \ register int c0, c1; \ - c0 = GETJOCTET(*buffer++); \ - c1 = GETJOCTET(*buffer); \ + c0 = *buffer++; \ + c1 = *buffer; \ /* Pre-execute most common case */ \ get_buffer = (get_buffer << 8) | c0; \ bits_left += 8; \ @@ -423,7 +406,7 @@ } \ } -#if SIZEOF_SIZE_T == 8 || defined(_WIN64) +#if SIZEOF_SIZE_T == 8 || defined(_WIN64) || (defined(__x86_64__) && defined(__ILP32__)) /* Pre-fetch 48 bytes, because the holding register is 64-bit */ #define FILL_BIT_BUFFER_FAST \ @@ -557,6 +540,12 @@ } +#if defined(__has_feature) +#if __has_feature(undefined_behavior_sanitizer) 
+__attribute__((no_sanitize("signed-integer-overflow"), + no_sanitize("unsigned-integer-overflow"))) +#endif +#endif LOCAL(boolean) decode_mcu_slow(j_decompress_ptr cinfo, JBLOCKROW *MCU_data) { @@ -568,7 +557,7 @@ /* Load up working state */ BITREAD_LOAD_STATE(cinfo, entropy->bitstate); - ASSIGN_STATE(state, entropy->saved); + state = entropy->saved; for (blkn = 0; blkn < cinfo->blocks_in_MCU; blkn++) { JBLOCKROW block = MCU_data ? MCU_data[blkn] : NULL; @@ -589,11 +578,15 @@ if (entropy->dc_needed[blkn]) { /* Convert DC difference to actual value, update last_dc_val */ int ci = cinfo->MCU_membership[blkn]; - /* This is really just - * s += state.last_dc_val[ci]; - * It is written this way in order to shut up UBSan. + /* Certain malformed JPEG images produce repeated DC coefficient + * differences of 2047 or -2047, which causes state.last_dc_val[ci] to + * grow until it overflows or underflows a 32-bit signed integer. This + * behavior is, to the best of our understanding, innocuous, and it is + * unclear how to work around it without potentially affecting + * performance. Thus, we (hopefully temporarily) suppress UBSan integer + * overflow errors for this function and decode_mcu_fast(). 
*/ - s = (int)((unsigned int)s + (unsigned int)state.last_dc_val[ci]); + s += state.last_dc_val[ci]; state.last_dc_val[ci] = s; if (block) { /* Output the DC coefficient (assumes jpeg_natural_order[0] = 0) */ @@ -653,11 +646,17 @@ /* Completed MCU, so update state */ BITREAD_SAVE_STATE(cinfo, entropy->bitstate); - ASSIGN_STATE(entropy->saved, state); + entropy->saved = state; return TRUE; } +#if defined(__has_feature) +#if __has_feature(undefined_behavior_sanitizer) +__attribute__((no_sanitize("signed-integer-overflow"), + no_sanitize("unsigned-integer-overflow"))) +#endif +#endif LOCAL(boolean) decode_mcu_fast(j_decompress_ptr cinfo, JBLOCKROW *MCU_data) { @@ -671,7 +670,7 @@ /* Load up working state */ BITREAD_LOAD_STATE(cinfo, entropy->bitstate); buffer = (JOCTET *)br_state.next_input_byte; - ASSIGN_STATE(state, entropy->saved); + state = entropy->saved; for (blkn = 0; blkn < cinfo->blocks_in_MCU; blkn++) { JBLOCKROW block = MCU_data ? MCU_data[blkn] : NULL; @@ -679,7 +678,7 @@ d_derived_tbl *actbl = entropy->ac_cur_tbls[blkn]; register int s, k, r, l; - HUFF_DECODE_FAST(s, l, dctbl, slow_decode_mcu); + HUFF_DECODE_FAST(s, l, dctbl); if (s) { FILL_BIT_BUFFER_FAST r = GET_BITS(s); @@ -688,7 +687,10 @@ if (entropy->dc_needed[blkn]) { int ci = cinfo->MCU_membership[blkn]; + /* Refer to the comment in decode_mcu_slow() regarding the suppression of + * a UBSan integer overflow error in this line of code. 
+ */ + s += state.last_dc_val[ci]; state.last_dc_val[ci] = s; if (block) (*block)[0] = (JCOEF)s; @@ -697,7 +699,7 @@ if (entropy->ac_needed[blkn] && block) { for (k = 1; k < DCTSIZE2; k++) { - HUFF_DECODE_FAST(s, l, actbl, slow_decode_mcu); + HUFF_DECODE_FAST(s, l, actbl); r = s >> 4; s &= 15; @@ -716,7 +718,7 @@ } else { for (k = 1; k < DCTSIZE2; k++) { - HUFF_DECODE_FAST(s, l, actbl, slow_decode_mcu); + HUFF_DECODE_FAST(s, l, actbl); r = s >> 4; s &= 15; @@ -733,7 +735,6 @@ } if (cinfo->unread_marker != 0) { -slow_decode_mcu: cinfo->unread_marker = 0; return FALSE; } @@ -741,7 +742,7 @@ br_state.bytes_in_buffer -= (buffer - br_state.next_input_byte); br_state.next_input_byte = buffer; BITREAD_SAVE_STATE(cinfo, entropy->bitstate); - ASSIGN_STATE(entropy->saved, state); + entropy->saved = state; return TRUE; } @@ -796,7 +797,8 @@ } /* Account for restart interval (no-op if not using restarts) */ - entropy->restarts_to_go--; + if (cinfo->restart_interval) + entropy->restarts_to_go--; return TRUE; } diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdhuff.h b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdhuff.h --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdhuff.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdhuff.h 2021-11-20 03:41:33.393600530 +0000 @@ -4,7 +4,8 @@ * This file was part of the Independent JPEG Group's software: * Copyright (C) 1991-1997, Thomas G. Lane. * libjpeg-turbo Modifications: - * Copyright (C) 2010-2011, 2015-2016, D. R. Commander. + * Copyright (C) 2010-2011, 2015-2016, 2021, D. R. Commander. + * Copyright (C) 2018, Matthias Räncker. * For conditions of distribution and use, see the accompanying README.ijg * file. 
* @@ -78,6 +79,11 @@ typedef size_t bit_buf_type; /* type of bit-extraction buffer */ #define BIT_BUF_SIZE 64 /* size of buffer in bits */ +#elif defined(__x86_64__) && defined(__ILP32__) + +typedef unsigned long long bit_buf_type; /* type of bit-extraction buffer */ +#define BIT_BUF_SIZE 64 /* size of buffer in bits */ + #else typedef unsigned long bit_buf_type; /* type of bit-extraction buffer */ @@ -211,7 +217,7 @@ } \ } -#define HUFF_DECODE_FAST(s, nb, htbl, slowlabel) \ +#define HUFF_DECODE_FAST(s, nb, htbl) \ FILL_BIT_BUFFER_FAST; \ s = PEEK_BITS(HUFF_LOOKAHEAD); \ s = htbl->lookup[s]; \ @@ -229,8 +235,9 @@ nb++; \ } \ if (nb > 16) \ - goto slowlabel; \ - s = htbl->pub->huffval[(int)(s + htbl->valoffset[nb]) & 0xFF]; \ + s = 0; \ + else \ + s = htbl->pub->huffval[(int)(s + htbl->valoffset[nb]) & 0xFF]; \ } /* Out-of-line case for Huffman code fetching */ diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdicc.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdicc.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdicc.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdicc.c 2021-11-20 03:41:33.393600530 +0000 @@ -38,18 +38,18 @@ marker->marker == ICC_MARKER && marker->data_length >= ICC_OVERHEAD_LEN && /* verify the identifying string */ - GETJOCTET(marker->data[0]) == 0x49 && - GETJOCTET(marker->data[1]) == 0x43 && - GETJOCTET(marker->data[2]) == 0x43 && - GETJOCTET(marker->data[3]) == 0x5F && - GETJOCTET(marker->data[4]) == 0x50 && - GETJOCTET(marker->data[5]) == 0x52 && - GETJOCTET(marker->data[6]) == 0x4F && - GETJOCTET(marker->data[7]) == 0x46 && - GETJOCTET(marker->data[8]) == 0x49 && - GETJOCTET(marker->data[9]) == 0x4C && - GETJOCTET(marker->data[10]) == 0x45 && - GETJOCTET(marker->data[11]) == 0x0; + marker->data[0] == 0x49 && + marker->data[1] == 0x43 && + marker->data[2] == 0x43 && + marker->data[3] == 0x5F && + marker->data[4] == 0x50 && + marker->data[5] == 0x52 && + marker->data[6] 
== 0x4F && + marker->data[7] == 0x46 && + marker->data[8] == 0x49 && + marker->data[9] == 0x4C && + marker->data[10] == 0x45 && + marker->data[11] == 0x0; } @@ -102,12 +102,12 @@ for (marker = cinfo->marker_list; marker != NULL; marker = marker->next) { if (marker_is_icc(marker)) { if (num_markers == 0) - num_markers = GETJOCTET(marker->data[13]); - else if (num_markers != GETJOCTET(marker->data[13])) { + num_markers = marker->data[13]; + else if (num_markers != marker->data[13]) { WARNMS(cinfo, JWRN_BOGUS_ICC); /* inconsistent num_markers fields */ return FALSE; } - seq_no = GETJOCTET(marker->data[12]); + seq_no = marker->data[12]; if (seq_no <= 0 || seq_no > num_markers) { WARNMS(cinfo, JWRN_BOGUS_ICC); /* bogus sequence number */ return FALSE; @@ -154,7 +154,7 @@ JOCTET FAR *src_ptr; JOCTET *dst_ptr; unsigned int length; - seq_no = GETJOCTET(marker->data[12]); + seq_no = marker->data[12]; dst_ptr = icc_data + data_offset[seq_no]; src_ptr = marker->data + ICC_OVERHEAD_LEN; length = data_length[seq_no]; diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmainct.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmainct.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmainct.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmainct.c 2021-11-20 03:41:33.393600530 +0000 @@ -18,6 +18,7 @@ #include "jinclude.h" #include "jdmainct.h" +#include "jconfigint.h" /* @@ -360,7 +361,7 @@ main_ptr->context_state = CTX_PREPARE_FOR_IMCU; if (*out_row_ctr >= out_rows_avail) return; /* Postprocessor exactly filled output buf */ - /*FALLTHROUGH*/ + FALLTHROUGH /*FALLTHROUGH*/ case CTX_PREPARE_FOR_IMCU: /* Prepare to process first M-1 row groups of this iMCU row */ main_ptr->rowgroup_ctr = 0; @@ -371,7 +372,7 @@ if (main_ptr->iMCU_row_ctr == cinfo->total_iMCU_rows) set_bottom_pointers(cinfo); main_ptr->context_state = CTX_PROCESS_IMCU; - /*FALLTHROUGH*/ + FALLTHROUGH /*FALLTHROUGH*/ case CTX_PROCESS_IMCU: /* Call 
postprocessor using previously set pointers */ (*cinfo->post->post_process_data) (cinfo, diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmarker.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmarker.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmarker.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmarker.c 2021-11-20 03:41:33.393600530 +0000 @@ -151,7 +151,7 @@ #define INPUT_BYTE(cinfo, V, action) \ MAKESTMT( MAKE_BYTE_AVAIL(cinfo, action); \ bytes_in_buffer--; \ - V = GETJOCTET(*next_input_byte++); ) + V = *next_input_byte++; ) /* As above, but read two bytes interpreted as an unsigned 16-bit integer. * V should be declared unsigned int or perhaps JLONG. @@ -159,10 +159,10 @@ #define INPUT_2BYTES(cinfo, V, action) \ MAKESTMT( MAKE_BYTE_AVAIL(cinfo, action); \ bytes_in_buffer--; \ - V = ((unsigned int)GETJOCTET(*next_input_byte++)) << 8; \ + V = ((unsigned int)(*next_input_byte++)) << 8; \ MAKE_BYTE_AVAIL(cinfo, action); \ bytes_in_buffer--; \ - V += GETJOCTET(*next_input_byte++); ) + V += *next_input_byte++; ) /* @@ -608,18 +608,18 @@ JLONG totallen = (JLONG)datalen + remaining; if (datalen >= APP0_DATA_LEN && - GETJOCTET(data[0]) == 0x4A && - GETJOCTET(data[1]) == 0x46 && - GETJOCTET(data[2]) == 0x49 && - GETJOCTET(data[3]) == 0x46 && - GETJOCTET(data[4]) == 0) { + data[0] == 0x4A && + data[1] == 0x46 && + data[2] == 0x49 && + data[3] == 0x46 && + data[4] == 0) { /* Found JFIF APP0 marker: save info */ cinfo->saw_JFIF_marker = TRUE; - cinfo->JFIF_major_version = GETJOCTET(data[5]); - cinfo->JFIF_minor_version = GETJOCTET(data[6]); - cinfo->density_unit = GETJOCTET(data[7]); - cinfo->X_density = (GETJOCTET(data[8]) << 8) + GETJOCTET(data[9]); - cinfo->Y_density = (GETJOCTET(data[10]) << 8) + GETJOCTET(data[11]); + cinfo->JFIF_major_version = data[5]; + cinfo->JFIF_minor_version = data[6]; + cinfo->density_unit = data[7]; + cinfo->X_density = (data[8] << 8) + data[9]; + cinfo->Y_density 
= (data[10] << 8) + data[11]; /* Check version. * Major version must be 1, anything else signals an incompatible change. * (We used to treat this as an error, but now it's a nonfatal warning, @@ -634,24 +634,22 @@ cinfo->JFIF_major_version, cinfo->JFIF_minor_version, cinfo->X_density, cinfo->Y_density, cinfo->density_unit); /* Validate thumbnail dimensions and issue appropriate messages */ - if (GETJOCTET(data[12]) | GETJOCTET(data[13])) - TRACEMS2(cinfo, 1, JTRC_JFIF_THUMBNAIL, - GETJOCTET(data[12]), GETJOCTET(data[13])); + if (data[12] | data[13]) + TRACEMS2(cinfo, 1, JTRC_JFIF_THUMBNAIL, data[12], data[13]); totallen -= APP0_DATA_LEN; - if (totallen != - ((JLONG)GETJOCTET(data[12]) * (JLONG)GETJOCTET(data[13]) * (JLONG)3)) + if (totallen != ((JLONG)data[12] * (JLONG)data[13] * (JLONG)3)) TRACEMS1(cinfo, 1, JTRC_JFIF_BADTHUMBNAILSIZE, (int)totallen); } else if (datalen >= 6 && - GETJOCTET(data[0]) == 0x4A && - GETJOCTET(data[1]) == 0x46 && - GETJOCTET(data[2]) == 0x58 && - GETJOCTET(data[3]) == 0x58 && - GETJOCTET(data[4]) == 0) { + data[0] == 0x4A && + data[1] == 0x46 && + data[2] == 0x58 && + data[3] == 0x58 && + data[4] == 0) { /* Found JFIF "JFXX" extension APP0 marker */ /* The library doesn't actually do anything with these, * but we try to produce a helpful trace message. 
*/ - switch (GETJOCTET(data[5])) { + switch (data[5]) { case 0x10: TRACEMS1(cinfo, 1, JTRC_THUMB_JPEG, (int)totallen); break; @@ -662,8 +660,7 @@ TRACEMS1(cinfo, 1, JTRC_THUMB_RGB, (int)totallen); break; default: - TRACEMS2(cinfo, 1, JTRC_JFIF_EXTENSION, - GETJOCTET(data[5]), (int)totallen); + TRACEMS2(cinfo, 1, JTRC_JFIF_EXTENSION, data[5], (int)totallen); break; } } else { @@ -684,16 +681,16 @@ unsigned int version, flags0, flags1, transform; if (datalen >= APP14_DATA_LEN && - GETJOCTET(data[0]) == 0x41 && - GETJOCTET(data[1]) == 0x64 && - GETJOCTET(data[2]) == 0x6F && - GETJOCTET(data[3]) == 0x62 && - GETJOCTET(data[4]) == 0x65) { + data[0] == 0x41 && + data[1] == 0x64 && + data[2] == 0x6F && + data[3] == 0x62 && + data[4] == 0x65) { /* Found Adobe APP14 marker */ - version = (GETJOCTET(data[5]) << 8) + GETJOCTET(data[6]); - flags0 = (GETJOCTET(data[7]) << 8) + GETJOCTET(data[8]); - flags1 = (GETJOCTET(data[9]) << 8) + GETJOCTET(data[10]); - transform = GETJOCTET(data[11]); + version = (data[5] << 8) + data[6]; + flags0 = (data[7] << 8) + data[8]; + flags1 = (data[9] << 8) + data[10]; + transform = data[11]; TRACEMS4(cinfo, 1, JTRC_ADOBE, version, flags0, flags1, transform); cinfo->saw_Adobe_marker = TRUE; cinfo->Adobe_transform = (UINT8)transform; diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmaster.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmaster.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmaster.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmaster.c 2021-11-20 03:41:33.393600530 +0000 @@ -5,7 +5,7 @@ * Copyright (C) 1991-1997, Thomas G. Lane. * Modified 2002-2009 by Guido Vollbeding. * libjpeg-turbo Modifications: - * Copyright (C) 2009-2011, 2016, D. R. Commander. + * Copyright (C) 2009-2011, 2016, 2019, D. R. Commander. * Copyright (C) 2013, Linaro Limited. * Copyright (C) 2015, Google, Inc. 
* For conditions of distribution and use, see the accompanying README.ijg @@ -22,7 +22,6 @@ #include "jpeglib.h" #include "jpegcomp.h" #include "jdmaster.h" -#include "jsimd.h" /* @@ -70,17 +69,6 @@ cinfo->comp_info[1]._DCT_scaled_size != cinfo->_min_DCT_scaled_size || cinfo->comp_info[2]._DCT_scaled_size != cinfo->_min_DCT_scaled_size) return FALSE; -#ifdef WITH_SIMD - /* If YCbCr-to-RGB color conversion is SIMD-accelerated but merged upsampling - isn't, then disabling merged upsampling is likely to be faster when - decompressing YCbCr JPEG images. */ - if (!jsimd_can_h2v2_merged_upsample() && !jsimd_can_h2v1_merged_upsample() && - jsimd_can_ycc_rgb() && cinfo->jpeg_color_space == JCS_YCbCr && - (cinfo->out_color_space == JCS_RGB || - (cinfo->out_color_space >= JCS_EXT_RGB && - cinfo->out_color_space <= JCS_EXT_ARGB))) - return FALSE; -#endif /* ??? also need to test for upsample-time rescaling, when & if supported */ return TRUE; /* by golly, it'll work... */ #else @@ -580,6 +568,7 @@ */ cinfo->master->first_iMCU_col = 0; cinfo->master->last_iMCU_col = cinfo->MCUs_per_row - 1; + cinfo->master->last_good_iMCU_row = 0; #ifdef D_MULTISCAN_FILES_SUPPORTED /* If jpeg_start_decompress will read the whole file, initialize diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmerge.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmerge.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmerge.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmerge.c 2021-11-20 03:41:33.394600514 +0000 @@ -5,7 +5,7 @@ * Copyright (C) 1994-1996, Thomas G. Lane. * libjpeg-turbo Modifications: * Copyright 2009 Pierre Ossman for Cendio AB - * Copyright (C) 2009, 2011, 2014-2015, D. R. Commander. + * Copyright (C) 2009, 2011, 2014-2015, 2020, D. R. Commander. * Copyright (C) 2013, Linaro Limited. * For conditions of distribution and use, see the accompanying README.ijg * file. 
@@ -40,41 +40,13 @@ #define JPEG_INTERNALS #include "jinclude.h" #include "jpeglib.h" +#include "jdmerge.h" #include "jsimd.h" #include "jconfigint.h" #ifdef UPSAMPLE_MERGING_SUPPORTED -/* Private subobject */ - -typedef struct { - struct jpeg_upsampler pub; /* public fields */ - - /* Pointer to routine to do actual upsampling/conversion of one row group */ - void (*upmethod) (j_decompress_ptr cinfo, JSAMPIMAGE input_buf, - JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf); - - /* Private state for YCC->RGB conversion */ - int *Cr_r_tab; /* => table for Cr to R conversion */ - int *Cb_b_tab; /* => table for Cb to B conversion */ - JLONG *Cr_g_tab; /* => table for Cr to G conversion */ - JLONG *Cb_g_tab; /* => table for Cb to G conversion */ - - /* For 2:1 vertical sampling, we produce two output rows at a time. - * We need a "spare" row buffer to hold the second output row if the - * application provides just a one-row buffer; we also use the spare - * to discard the dummy last row if the image height is odd. 
- */ - JSAMPROW spare_row; - boolean spare_full; /* T if spare buffer is occupied */ - - JDIMENSION out_row_width; /* samples per output row */ - JDIMENSION rows_to_go; /* counts rows remaining in image */ -} my_upsampler; - -typedef my_upsampler *my_upsample_ptr; - #define SCALEBITS 16 /* speediest right-shift on some machines */ #define ONE_HALF ((JLONG)1 << (SCALEBITS - 1)) #define FIX(x) ((JLONG)((x) * (1L << SCALEBITS) + 0.5)) @@ -189,7 +161,7 @@ LOCAL(void) build_ycc_rgb_table(j_decompress_ptr cinfo) { - my_upsample_ptr upsample = (my_upsample_ptr)cinfo->upsample; + my_merged_upsample_ptr upsample = (my_merged_upsample_ptr)cinfo->upsample; int i; JLONG x; SHIFT_TEMPS @@ -232,7 +204,7 @@ METHODDEF(void) start_pass_merged_upsample(j_decompress_ptr cinfo) { - my_upsample_ptr upsample = (my_upsample_ptr)cinfo->upsample; + my_merged_upsample_ptr upsample = (my_merged_upsample_ptr)cinfo->upsample; /* Mark the spare buffer empty */ upsample->spare_full = FALSE; @@ -254,7 +226,7 @@ JDIMENSION *out_row_ctr, JDIMENSION out_rows_avail) /* 2:1 vertical sampling case: may need a spare row. */ { - my_upsample_ptr upsample = (my_upsample_ptr)cinfo->upsample; + my_merged_upsample_ptr upsample = (my_merged_upsample_ptr)cinfo->upsample; JSAMPROW work_ptrs[2]; JDIMENSION num_rows; /* number of rows returned to caller */ @@ -305,7 +277,7 @@ JDIMENSION *out_row_ctr, JDIMENSION out_rows_avail) /* 1:1 vertical sampling case: much easier, never need a spare row. */ { - my_upsample_ptr upsample = (my_upsample_ptr)cinfo->upsample; + my_merged_upsample_ptr upsample = (my_merged_upsample_ptr)cinfo->upsample; /* Just do the upsampling. 
*/ (*upsample->upmethod) (cinfo, input_buf, *in_row_group_ctr, @@ -420,11 +392,10 @@ * RGB565 conversion */ -#define PACK_SHORT_565_LE(r, g, b) ((((r) << 8) & 0xF800) | \ - (((g) << 3) & 0x7E0) | ((b) >> 3)) -#define PACK_SHORT_565_BE(r, g, b) (((r) & 0xF8) | ((g) >> 5) | \ - (((g) << 11) & 0xE000) | \ - (((b) << 5) & 0x1F00)) +#define PACK_SHORT_565_LE(r, g, b) \ + ((((r) << 8) & 0xF800) | (((g) << 3) & 0x7E0) | ((b) >> 3)) +#define PACK_SHORT_565_BE(r, g, b) \ + (((r) & 0xF8) | ((g) >> 5) | (((g) << 11) & 0xE000) | (((b) << 5) & 0x1F00)) #define PACK_TWO_PIXELS_LE(l, r) ((r << 16) | l) #define PACK_TWO_PIXELS_BE(l, r) ((l << 16) | r) @@ -566,11 +537,11 @@ GLOBAL(void) jinit_merged_upsampler(j_decompress_ptr cinfo) { - my_upsample_ptr upsample; + my_merged_upsample_ptr upsample; - upsample = (my_upsample_ptr) + upsample = (my_merged_upsample_ptr) (*cinfo->mem->alloc_small) ((j_common_ptr)cinfo, JPOOL_IMAGE, - sizeof(my_upsampler)); + sizeof(my_merged_upsampler)); cinfo->upsample = (struct jpeg_upsampler *)upsample; upsample->pub.start_pass = start_pass_merged_upsample; upsample->pub.need_context_rows = FALSE; diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmerge.h b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmerge.h --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmerge.h 1970-01-01 01:00:00.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmerge.h 2021-11-20 03:41:33.394600514 +0000 @@ -0,0 +1,47 @@ +/* + * jdmerge.h + * + * This file was part of the Independent JPEG Group's software: + * Copyright (C) 1994-1996, Thomas G. Lane. + * libjpeg-turbo Modifications: + * Copyright (C) 2020, D. R. Commander. + * For conditions of distribution and use, see the accompanying README.ijg + * file. 
+ */ + +#define JPEG_INTERNALS +#include "jpeglib.h" + +#ifdef UPSAMPLE_MERGING_SUPPORTED + + +/* Private subobject */ + +typedef struct { + struct jpeg_upsampler pub; /* public fields */ + + /* Pointer to routine to do actual upsampling/conversion of one row group */ + void (*upmethod) (j_decompress_ptr cinfo, JSAMPIMAGE input_buf, + JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf); + + /* Private state for YCC->RGB conversion */ + int *Cr_r_tab; /* => table for Cr to R conversion */ + int *Cb_b_tab; /* => table for Cb to B conversion */ + JLONG *Cr_g_tab; /* => table for Cr to G conversion */ + JLONG *Cb_g_tab; /* => table for Cb to G conversion */ + + /* For 2:1 vertical sampling, we produce two output rows at a time. + * We need a "spare" row buffer to hold the second output row if the + * application provides just a one-row buffer; we also use the spare + * to discard the dummy last row if the image height is odd. + */ + JSAMPROW spare_row; + boolean spare_full; /* T if spare buffer is occupied */ + + JDIMENSION out_row_width; /* samples per output row */ + JDIMENSION rows_to_go; /* counts rows remaining in image */ +} my_merged_upsampler; + +typedef my_merged_upsampler *my_merged_upsample_ptr; + +#endif /* UPSAMPLE_MERGING_SUPPORTED */ diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmrg565.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmrg565.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmrg565.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmrg565.c 2021-11-20 03:41:33.394600514 +0000 @@ -5,7 +5,7 @@ * Copyright (C) 1994-1996, Thomas G. Lane. * libjpeg-turbo Modifications: * Copyright (C) 2013, Linaro Limited. - * Copyright (C) 2014-2015, 2018, D. R. Commander. + * Copyright (C) 2014-2015, 2018, 2020, D. R. Commander. * For conditions of distribution and use, see the accompanying README.ijg * file. 
* @@ -19,7 +19,7 @@ JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf) { - my_upsample_ptr upsample = (my_upsample_ptr)cinfo->upsample; + my_merged_upsample_ptr upsample = (my_merged_upsample_ptr)cinfo->upsample; register int y, cred, cgreen, cblue; int cb, cr; register JSAMPROW outptr; @@ -43,20 +43,20 @@ /* Loop for each pair of output pixels */ for (col = cinfo->output_width >> 1; col > 0; col--) { /* Do the chroma part of the calculation */ - cb = GETJSAMPLE(*inptr1++); - cr = GETJSAMPLE(*inptr2++); + cb = *inptr1++; + cr = *inptr2++; cred = Crrtab[cr]; cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); cblue = Cbbtab[cb]; /* Fetch 2 Y values and emit 2 pixels */ - y = GETJSAMPLE(*inptr0++); + y = *inptr0++; r = range_limit[y + cred]; g = range_limit[y + cgreen]; b = range_limit[y + cblue]; rgb = PACK_SHORT_565(r, g, b); - y = GETJSAMPLE(*inptr0++); + y = *inptr0++; r = range_limit[y + cred]; g = range_limit[y + cgreen]; b = range_limit[y + cblue]; @@ -68,12 +68,12 @@ /* If image width is odd, do the last output column separately */ if (cinfo->output_width & 1) { - cb = GETJSAMPLE(*inptr1); - cr = GETJSAMPLE(*inptr2); + cb = *inptr1; + cr = *inptr2; cred = Crrtab[cr]; cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); cblue = Cbbtab[cb]; - y = GETJSAMPLE(*inptr0); + y = *inptr0; r = range_limit[y + cred]; g = range_limit[y + cgreen]; b = range_limit[y + cblue]; @@ -90,7 +90,7 @@ JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf) { - my_upsample_ptr upsample = (my_upsample_ptr)cinfo->upsample; + my_merged_upsample_ptr upsample = (my_merged_upsample_ptr)cinfo->upsample; register int y, cred, cgreen, cblue; int cb, cr; register JSAMPROW outptr; @@ -115,21 +115,21 @@ /* Loop for each pair of output pixels */ for (col = cinfo->output_width >> 1; col > 0; col--) { /* Do the chroma part of the calculation */ - cb = GETJSAMPLE(*inptr1++); - cr = GETJSAMPLE(*inptr2++); + cb = *inptr1++; + cr = *inptr2++; cred = Crrtab[cr]; cgreen = 
(int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); cblue = Cbbtab[cb]; /* Fetch 2 Y values and emit 2 pixels */ - y = GETJSAMPLE(*inptr0++); + y = *inptr0++; r = range_limit[DITHER_565_R(y + cred, d0)]; g = range_limit[DITHER_565_G(y + cgreen, d0)]; b = range_limit[DITHER_565_B(y + cblue, d0)]; d0 = DITHER_ROTATE(d0); rgb = PACK_SHORT_565(r, g, b); - y = GETJSAMPLE(*inptr0++); + y = *inptr0++; r = range_limit[DITHER_565_R(y + cred, d0)]; g = range_limit[DITHER_565_G(y + cgreen, d0)]; b = range_limit[DITHER_565_B(y + cblue, d0)]; @@ -142,12 +142,12 @@ /* If image width is odd, do the last output column separately */ if (cinfo->output_width & 1) { - cb = GETJSAMPLE(*inptr1); - cr = GETJSAMPLE(*inptr2); + cb = *inptr1; + cr = *inptr2; cred = Crrtab[cr]; cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); cblue = Cbbtab[cb]; - y = GETJSAMPLE(*inptr0); + y = *inptr0; r = range_limit[DITHER_565_R(y + cred, d0)]; g = range_limit[DITHER_565_G(y + cgreen, d0)]; b = range_limit[DITHER_565_B(y + cblue, d0)]; @@ -163,7 +163,7 @@ JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf) { - my_upsample_ptr upsample = (my_upsample_ptr)cinfo->upsample; + my_merged_upsample_ptr upsample = (my_merged_upsample_ptr)cinfo->upsample; register int y, cred, cgreen, cblue; int cb, cr; register JSAMPROW outptr0, outptr1; @@ -189,20 +189,20 @@ /* Loop for each group of output pixels */ for (col = cinfo->output_width >> 1; col > 0; col--) { /* Do the chroma part of the calculation */ - cb = GETJSAMPLE(*inptr1++); - cr = GETJSAMPLE(*inptr2++); + cb = *inptr1++; + cr = *inptr2++; cred = Crrtab[cr]; cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); cblue = Cbbtab[cb]; /* Fetch 4 Y values and emit 4 pixels */ - y = GETJSAMPLE(*inptr00++); + y = *inptr00++; r = range_limit[y + cred]; g = range_limit[y + cgreen]; b = range_limit[y + cblue]; rgb = PACK_SHORT_565(r, g, b); - y = GETJSAMPLE(*inptr00++); + y = *inptr00++; r = range_limit[y + cred]; g = range_limit[y + cgreen]; b = 
range_limit[y + cblue]; @@ -211,13 +211,13 @@ WRITE_TWO_PIXELS(outptr0, rgb); outptr0 += 4; - y = GETJSAMPLE(*inptr01++); + y = *inptr01++; r = range_limit[y + cred]; g = range_limit[y + cgreen]; b = range_limit[y + cblue]; rgb = PACK_SHORT_565(r, g, b); - y = GETJSAMPLE(*inptr01++); + y = *inptr01++; r = range_limit[y + cred]; g = range_limit[y + cgreen]; b = range_limit[y + cblue]; @@ -229,20 +229,20 @@ /* If image width is odd, do the last output column separately */ if (cinfo->output_width & 1) { - cb = GETJSAMPLE(*inptr1); - cr = GETJSAMPLE(*inptr2); + cb = *inptr1; + cr = *inptr2; cred = Crrtab[cr]; cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); cblue = Cbbtab[cb]; - y = GETJSAMPLE(*inptr00); + y = *inptr00; r = range_limit[y + cred]; g = range_limit[y + cgreen]; b = range_limit[y + cblue]; rgb = PACK_SHORT_565(r, g, b); *(INT16 *)outptr0 = (INT16)rgb; - y = GETJSAMPLE(*inptr01); + y = *inptr01; r = range_limit[y + cred]; g = range_limit[y + cgreen]; b = range_limit[y + cblue]; @@ -259,7 +259,7 @@ JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf) { - my_upsample_ptr upsample = (my_upsample_ptr)cinfo->upsample; + my_merged_upsample_ptr upsample = (my_merged_upsample_ptr)cinfo->upsample; register int y, cred, cgreen, cblue; int cb, cr; register JSAMPROW outptr0, outptr1; @@ -287,21 +287,21 @@ /* Loop for each group of output pixels */ for (col = cinfo->output_width >> 1; col > 0; col--) { /* Do the chroma part of the calculation */ - cb = GETJSAMPLE(*inptr1++); - cr = GETJSAMPLE(*inptr2++); + cb = *inptr1++; + cr = *inptr2++; cred = Crrtab[cr]; cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); cblue = Cbbtab[cb]; /* Fetch 4 Y values and emit 4 pixels */ - y = GETJSAMPLE(*inptr00++); + y = *inptr00++; r = range_limit[DITHER_565_R(y + cred, d0)]; g = range_limit[DITHER_565_G(y + cgreen, d0)]; b = range_limit[DITHER_565_B(y + cblue, d0)]; d0 = DITHER_ROTATE(d0); rgb = PACK_SHORT_565(r, g, b); - y = GETJSAMPLE(*inptr00++); + y = 
*inptr00++; r = range_limit[DITHER_565_R(y + cred, d0)]; g = range_limit[DITHER_565_G(y + cgreen, d0)]; b = range_limit[DITHER_565_B(y + cblue, d0)]; @@ -311,14 +311,14 @@ WRITE_TWO_PIXELS(outptr0, rgb); outptr0 += 4; - y = GETJSAMPLE(*inptr01++); + y = *inptr01++; r = range_limit[DITHER_565_R(y + cred, d1)]; g = range_limit[DITHER_565_G(y + cgreen, d1)]; b = range_limit[DITHER_565_B(y + cblue, d1)]; d1 = DITHER_ROTATE(d1); rgb = PACK_SHORT_565(r, g, b); - y = GETJSAMPLE(*inptr01++); + y = *inptr01++; r = range_limit[DITHER_565_R(y + cred, d1)]; g = range_limit[DITHER_565_G(y + cgreen, d1)]; b = range_limit[DITHER_565_B(y + cblue, d1)]; @@ -331,20 +331,20 @@ /* If image width is odd, do the last output column separately */ if (cinfo->output_width & 1) { - cb = GETJSAMPLE(*inptr1); - cr = GETJSAMPLE(*inptr2); + cb = *inptr1; + cr = *inptr2; cred = Crrtab[cr]; cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); cblue = Cbbtab[cb]; - y = GETJSAMPLE(*inptr00); + y = *inptr00; r = range_limit[DITHER_565_R(y + cred, d0)]; g = range_limit[DITHER_565_G(y + cgreen, d0)]; b = range_limit[DITHER_565_B(y + cblue, d0)]; rgb = PACK_SHORT_565(r, g, b); *(INT16 *)outptr0 = (INT16)rgb; - y = GETJSAMPLE(*inptr01); + y = *inptr01; r = range_limit[DITHER_565_R(y + cred, d1)]; g = range_limit[DITHER_565_G(y + cgreen, d1)]; b = range_limit[DITHER_565_B(y + cblue, d1)]; diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmrgext.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmrgext.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmrgext.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdmrgext.c 2021-11-20 03:41:33.394600514 +0000 @@ -4,7 +4,7 @@ * This file was part of the Independent JPEG Group's software: * Copyright (C) 1994-1996, Thomas G. Lane. * libjpeg-turbo Modifications: - * Copyright (C) 2011, 2015, D. R. Commander. + * Copyright (C) 2011, 2015, 2020, D. R. Commander. 
* For conditions of distribution and use, see the accompanying README.ijg * file. * @@ -25,7 +25,7 @@ JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf) { - my_upsample_ptr upsample = (my_upsample_ptr)cinfo->upsample; + my_merged_upsample_ptr upsample = (my_merged_upsample_ptr)cinfo->upsample; register int y, cred, cgreen, cblue; int cb, cr; register JSAMPROW outptr; @@ -46,13 +46,13 @@ /* Loop for each pair of output pixels */ for (col = cinfo->output_width >> 1; col > 0; col--) { /* Do the chroma part of the calculation */ - cb = GETJSAMPLE(*inptr1++); - cr = GETJSAMPLE(*inptr2++); + cb = *inptr1++; + cr = *inptr2++; cred = Crrtab[cr]; cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); cblue = Cbbtab[cb]; /* Fetch 2 Y values and emit 2 pixels */ - y = GETJSAMPLE(*inptr0++); + y = *inptr0++; outptr[RGB_RED] = range_limit[y + cred]; outptr[RGB_GREEN] = range_limit[y + cgreen]; outptr[RGB_BLUE] = range_limit[y + cblue]; @@ -60,7 +60,7 @@ outptr[RGB_ALPHA] = 0xFF; #endif outptr += RGB_PIXELSIZE; - y = GETJSAMPLE(*inptr0++); + y = *inptr0++; outptr[RGB_RED] = range_limit[y + cred]; outptr[RGB_GREEN] = range_limit[y + cgreen]; outptr[RGB_BLUE] = range_limit[y + cblue]; @@ -71,12 +71,12 @@ } /* If image width is odd, do the last output column separately */ if (cinfo->output_width & 1) { - cb = GETJSAMPLE(*inptr1); - cr = GETJSAMPLE(*inptr2); + cb = *inptr1; + cr = *inptr2; cred = Crrtab[cr]; cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); cblue = Cbbtab[cb]; - y = GETJSAMPLE(*inptr0); + y = *inptr0; outptr[RGB_RED] = range_limit[y + cred]; outptr[RGB_GREEN] = range_limit[y + cgreen]; outptr[RGB_BLUE] = range_limit[y + cblue]; @@ -97,7 +97,7 @@ JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf) { - my_upsample_ptr upsample = (my_upsample_ptr)cinfo->upsample; + my_merged_upsample_ptr upsample = (my_merged_upsample_ptr)cinfo->upsample; register int y, cred, cgreen, cblue; int cb, cr; register JSAMPROW outptr0, outptr1; @@ -120,13 +120,13 @@ /* 
Loop for each group of output pixels */ for (col = cinfo->output_width >> 1; col > 0; col--) { /* Do the chroma part of the calculation */ - cb = GETJSAMPLE(*inptr1++); - cr = GETJSAMPLE(*inptr2++); + cb = *inptr1++; + cr = *inptr2++; cred = Crrtab[cr]; cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); cblue = Cbbtab[cb]; /* Fetch 4 Y values and emit 4 pixels */ - y = GETJSAMPLE(*inptr00++); + y = *inptr00++; outptr0[RGB_RED] = range_limit[y + cred]; outptr0[RGB_GREEN] = range_limit[y + cgreen]; outptr0[RGB_BLUE] = range_limit[y + cblue]; @@ -134,7 +134,7 @@ outptr0[RGB_ALPHA] = 0xFF; #endif outptr0 += RGB_PIXELSIZE; - y = GETJSAMPLE(*inptr00++); + y = *inptr00++; outptr0[RGB_RED] = range_limit[y + cred]; outptr0[RGB_GREEN] = range_limit[y + cgreen]; outptr0[RGB_BLUE] = range_limit[y + cblue]; @@ -142,7 +142,7 @@ outptr0[RGB_ALPHA] = 0xFF; #endif outptr0 += RGB_PIXELSIZE; - y = GETJSAMPLE(*inptr01++); + y = *inptr01++; outptr1[RGB_RED] = range_limit[y + cred]; outptr1[RGB_GREEN] = range_limit[y + cgreen]; outptr1[RGB_BLUE] = range_limit[y + cblue]; @@ -150,7 +150,7 @@ outptr1[RGB_ALPHA] = 0xFF; #endif outptr1 += RGB_PIXELSIZE; - y = GETJSAMPLE(*inptr01++); + y = *inptr01++; outptr1[RGB_RED] = range_limit[y + cred]; outptr1[RGB_GREEN] = range_limit[y + cgreen]; outptr1[RGB_BLUE] = range_limit[y + cblue]; @@ -161,19 +161,19 @@ } /* If image width is odd, do the last output column separately */ if (cinfo->output_width & 1) { - cb = GETJSAMPLE(*inptr1); - cr = GETJSAMPLE(*inptr2); + cb = *inptr1; + cr = *inptr2; cred = Crrtab[cr]; cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); cblue = Cbbtab[cb]; - y = GETJSAMPLE(*inptr00); + y = *inptr00; outptr0[RGB_RED] = range_limit[y + cred]; outptr0[RGB_GREEN] = range_limit[y + cgreen]; outptr0[RGB_BLUE] = range_limit[y + cblue]; #ifdef RGB_ALPHA outptr0[RGB_ALPHA] = 0xFF; #endif - y = GETJSAMPLE(*inptr01); + y = *inptr01; outptr1[RGB_RED] = range_limit[y + cred]; outptr1[RGB_GREEN] = range_limit[y + 
cgreen]; outptr1[RGB_BLUE] = range_limit[y + cblue]; diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdphuff.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdphuff.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdphuff.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdphuff.c 2021-11-20 03:41:33.394600514 +0000 @@ -4,7 +4,7 @@ * This file was part of the Independent JPEG Group's software: * Copyright (C) 1995-1997, Thomas G. Lane. * libjpeg-turbo Modifications: - * Copyright (C) 2015-2016, 2018, D. R. Commander. + * Copyright (C) 2015-2016, 2018-2021, D. R. Commander. * For conditions of distribution and use, see the accompanying README.ijg * file. * @@ -41,25 +41,6 @@ int last_dc_val[MAX_COMPS_IN_SCAN]; /* last DC coef for each component */ } savable_state; -/* This macro is to work around compilers with missing or broken - * structure assignment. You'll need to fix this code if you have - * such a compiler and you change MAX_COMPS_IN_SCAN. 
- */
-
-#ifndef NO_STRUCT_ASSIGN
-#define ASSIGN_STATE(dest, src)  ((dest) = (src))
-#else
-#if MAX_COMPS_IN_SCAN == 4
-#define ASSIGN_STATE(dest, src) \
-  ((dest).EOBRUN = (src).EOBRUN, \
-   (dest).last_dc_val[0] = (src).last_dc_val[0], \
-   (dest).last_dc_val[1] = (src).last_dc_val[1], \
-   (dest).last_dc_val[2] = (src).last_dc_val[2], \
-   (dest).last_dc_val[3] = (src).last_dc_val[3])
-#endif
-#endif
-
-
 typedef struct {
   struct jpeg_entropy_decoder pub; /* public fields */
 
@@ -102,7 +83,7 @@
   boolean is_DC_band, bad;
   int ci, coefi, tbl;
   d_derived_tbl **pdtbl;
-  int *coef_bit_ptr;
+  int *coef_bit_ptr, *prev_coef_bit_ptr;
   jpeg_component_info *compptr;
 
   is_DC_band = (cinfo->Ss == 0);
@@ -143,8 +124,15 @@
   for (ci = 0; ci < cinfo->comps_in_scan; ci++) {
     int cindex = cinfo->cur_comp_info[ci]->component_index;
     coef_bit_ptr = &cinfo->coef_bits[cindex][0];
+    prev_coef_bit_ptr = &cinfo->coef_bits[cindex + cinfo->num_components][0];
     if (!is_DC_band && coef_bit_ptr[0] < 0) /* AC without prior DC scan */
       WARNMS2(cinfo, JWRN_BOGUS_PROGRESSION, cindex, 0);
+    for (coefi = MIN(cinfo->Ss, 1); coefi <= MAX(cinfo->Se, 9); coefi++) {
+      if (cinfo->input_scan_number > 1)
+        prev_coef_bit_ptr[coefi] = coef_bit_ptr[coefi];
+      else
+        prev_coef_bit_ptr[coefi] = 0;
+    }
     for (coefi = cinfo->Ss; coefi <= cinfo->Se; coefi++) {
       int expected = (coef_bit_ptr[coefi] < 0) ? 0 : coef_bit_ptr[coefi];
       if (cinfo->Ah != expected)
@@ -323,7 +311,7 @@
 
   /* Load up working state */
   BITREAD_LOAD_STATE(cinfo, entropy->bitstate);
-  ASSIGN_STATE(state, entropy->saved);
+  state = entropy->saved;
 
   /* Outer loop handles each block in the MCU */
 
@@ -356,11 +344,12 @@
 
     /* Completed MCU, so update state */
     BITREAD_SAVE_STATE(cinfo, entropy->bitstate);
-    ASSIGN_STATE(entropy->saved, state);
+    entropy->saved = state;
   }
 
   /* Account for restart interval (no-op if not using restarts) */
-  entropy->restarts_to_go--;
+  if (cinfo->restart_interval)
+    entropy->restarts_to_go--;
 
   return TRUE;
 }
@@ -444,7 +433,8 @@
   }
 
   /* Account for restart interval (no-op if not using restarts) */
-  entropy->restarts_to_go--;
+  if (cinfo->restart_interval)
+    entropy->restarts_to_go--;
 
   return TRUE;
 }
@@ -495,7 +485,8 @@
   BITREAD_SAVE_STATE(cinfo, entropy->bitstate);
 
   /* Account for restart interval (no-op if not using restarts) */
-  entropy->restarts_to_go--;
+  if (cinfo->restart_interval)
+    entropy->restarts_to_go--;
 
   return TRUE;
 }
@@ -638,7 +629,8 @@
   }
 
   /* Account for restart interval (no-op if not using restarts) */
-  entropy->restarts_to_go--;
+  if (cinfo->restart_interval)
+    entropy->restarts_to_go--;
 
   return TRUE;
 
@@ -676,7 +668,7 @@
   /* Create progression status table */
   cinfo->coef_bits = (int (*)[DCTSIZE2])
     (*cinfo->mem->alloc_small) ((j_common_ptr)cinfo, JPOOL_IMAGE,
-                                cinfo->num_components * DCTSIZE2 *
+                                cinfo->num_components * 2 * DCTSIZE2 *
                                 sizeof(int));
   coef_bit_ptr = &cinfo->coef_bits[0][0];
   for (ci = 0; ci < cinfo->num_components; ci++)
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdsample.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdsample.c
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdsample.c	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdsample.c	2021-11-20 03:41:33.394600514 +0000
@@ -8,7 +8,7 @@
  * Copyright (C) 2010, 2015-2016, D. R. Commander.
* Copyright (C) 2014, MIPS Technologies, Inc., California. * Copyright (C) 2015, Google, Inc. - * Copyright (C) 2019, Arm Limited. + * Copyright (C) 2019-2020, Arm Limited. * For conditions of distribution and use, see the accompanying README.ijg * file. * @@ -177,7 +177,7 @@ outptr = output_data[outrow]; outend = outptr + cinfo->output_width; while (outptr < outend) { - invalue = *inptr++; /* don't need GETJSAMPLE() here */ + invalue = *inptr++; for (h = h_expand; h > 0; h--) { *outptr++ = invalue; } @@ -213,7 +213,7 @@ outptr = output_data[inrow]; outend = outptr + cinfo->output_width; while (outptr < outend) { - invalue = *inptr++; /* don't need GETJSAMPLE() here */ + invalue = *inptr++; *outptr++ = invalue; *outptr++ = invalue; } @@ -242,7 +242,7 @@ outptr = output_data[outrow]; outend = outptr + cinfo->output_width; while (outptr < outend) { - invalue = *inptr++; /* don't need GETJSAMPLE() here */ + invalue = *inptr++; *outptr++ = invalue; *outptr++ = invalue; } @@ -283,20 +283,20 @@ inptr = input_data[inrow]; outptr = output_data[inrow]; /* Special case for first column */ - invalue = GETJSAMPLE(*inptr++); + invalue = *inptr++; *outptr++ = (JSAMPLE)invalue; - *outptr++ = (JSAMPLE)((invalue * 3 + GETJSAMPLE(*inptr) + 2) >> 2); + *outptr++ = (JSAMPLE)((invalue * 3 + inptr[0] + 2) >> 2); for (colctr = compptr->downsampled_width - 2; colctr > 0; colctr--) { /* General case: 3/4 * nearer pixel + 1/4 * further pixel */ - invalue = GETJSAMPLE(*inptr++) * 3; - *outptr++ = (JSAMPLE)((invalue + GETJSAMPLE(inptr[-2]) + 1) >> 2); - *outptr++ = (JSAMPLE)((invalue + GETJSAMPLE(*inptr) + 2) >> 2); + invalue = (*inptr++) * 3; + *outptr++ = (JSAMPLE)((invalue + inptr[-2] + 1) >> 2); + *outptr++ = (JSAMPLE)((invalue + inptr[0] + 2) >> 2); } /* Special case for last column */ - invalue = GETJSAMPLE(*inptr); - *outptr++ = (JSAMPLE)((invalue * 3 + GETJSAMPLE(inptr[-1]) + 1) >> 2); + invalue = *inptr; + *outptr++ = (JSAMPLE)((invalue * 3 + inptr[-1] + 1) >> 2); *outptr++ = 
(JSAMPLE)invalue; } } @@ -338,7 +338,7 @@ outptr = output_data[outrow++]; for (colctr = 0; colctr < compptr->downsampled_width; colctr++) { - thiscolsum = GETJSAMPLE(*inptr0++) * 3 + GETJSAMPLE(*inptr1++); + thiscolsum = (*inptr0++) * 3 + (*inptr1++); *outptr++ = (JSAMPLE)((thiscolsum + bias) >> 2); } } @@ -381,8 +381,8 @@ outptr = output_data[outrow++]; /* Special case for first column */ - thiscolsum = GETJSAMPLE(*inptr0++) * 3 + GETJSAMPLE(*inptr1++); - nextcolsum = GETJSAMPLE(*inptr0++) * 3 + GETJSAMPLE(*inptr1++); + thiscolsum = (*inptr0++) * 3 + (*inptr1++); + nextcolsum = (*inptr0++) * 3 + (*inptr1++); *outptr++ = (JSAMPLE)((thiscolsum * 4 + 8) >> 4); *outptr++ = (JSAMPLE)((thiscolsum * 3 + nextcolsum + 7) >> 4); lastcolsum = thiscolsum; thiscolsum = nextcolsum; @@ -390,7 +390,7 @@ for (colctr = compptr->downsampled_width - 2; colctr > 0; colctr--) { /* General case: 3/4 * nearer pixel + 1/4 * further pixel in each */ /* dimension, thus 9/16, 3/16, 3/16, 1/16 overall */ - nextcolsum = GETJSAMPLE(*inptr0++) * 3 + GETJSAMPLE(*inptr1++); + nextcolsum = (*inptr0++) * 3 + (*inptr1++); *outptr++ = (JSAMPLE)((thiscolsum * 3 + lastcolsum + 8) >> 4); *outptr++ = (JSAMPLE)((thiscolsum * 3 + nextcolsum + 7) >> 4); lastcolsum = thiscolsum; thiscolsum = nextcolsum; @@ -477,9 +477,12 @@ } else if (h_in_group == h_out_group && v_in_group * 2 == v_out_group && do_fancy) { /* Non-fancy upsampling is handled by the generic method */ +#if defined(__arm__) || defined(__aarch64__) || \ + defined(_M_ARM) || defined(_M_ARM64) if (jsimd_can_h1v2_fancy_upsample()) upsample->methods[ci] = jsimd_h1v2_fancy_upsample; else +#endif upsample->methods[ci] = h1v2_fancy_upsample; upsample->pub.need_context_rows = TRUE; } else if (h_in_group * 2 == h_out_group && diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdtrans.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdtrans.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jdtrans.c 2021-08-24 12:54:05.000000000 +0100 
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jdtrans.c 2021-11-20 03:41:33.394600514 +0000 @@ -3,8 +3,8 @@ * * This file was part of the Independent JPEG Group's software: * Copyright (C) 1995-1997, Thomas G. Lane. - * It was modified by The libjpeg-turbo Project to include only code relevant - * to libjpeg-turbo. + * libjpeg-turbo Modifications: + * Copyright (C) 2020, D. R. Commander. * For conditions of distribution and use, see the accompanying README.ijg * file. * @@ -16,6 +16,7 @@ #define JPEG_INTERNALS #include "jinclude.h" #include "jpeglib.h" +#include "jpegcomp.h" /* Forward declarations */ diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jerror.h b/src/3rdparty/chromium/third_party/libjpeg_turbo/jerror.h --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jerror.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jerror.h 2021-11-20 03:41:33.394600514 +0000 @@ -207,6 +207,10 @@ #endif #endif JMESSAGE(JWRN_BOGUS_ICC, "Corrupt JPEG data: bad ICC marker") +#if JPEG_LIB_VERSION < 70 +JMESSAGE(JERR_BAD_DROP_SAMPLING, + "Component index %d: mismatching sampling ratio %d:%d, %d:%d, %c") +#endif #ifdef JMAKE_ENUM_LIST @@ -252,6 +256,15 @@ (cinfo)->err->msg_parm.i[2] = (p3), \ (cinfo)->err->msg_parm.i[3] = (p4), \ (*(cinfo)->err->error_exit) ((j_common_ptr)(cinfo))) +#define ERREXIT6(cinfo, code, p1, p2, p3, p4, p5, p6) \ + ((cinfo)->err->msg_code = (code), \ + (cinfo)->err->msg_parm.i[0] = (p1), \ + (cinfo)->err->msg_parm.i[1] = (p2), \ + (cinfo)->err->msg_parm.i[2] = (p3), \ + (cinfo)->err->msg_parm.i[3] = (p4), \ + (cinfo)->err->msg_parm.i[4] = (p5), \ + (cinfo)->err->msg_parm.i[5] = (p6), \ + (*(cinfo)->err->error_exit) ((j_common_ptr)(cinfo))) #define ERREXITS(cinfo, code, str) \ ((cinfo)->err->msg_code = (code), \ strncpy((cinfo)->err->msg_parm.s, (str), JMSG_STR_PARM_MAX), \ diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jfdctint.c 
b/src/3rdparty/chromium/third_party/libjpeg_turbo/jfdctint.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jfdctint.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jfdctint.c 2021-11-20 03:41:33.394600514 +0000 @@ -4,11 +4,11 @@ * This file was part of the Independent JPEG Group's software: * Copyright (C) 1991-1996, Thomas G. Lane. * libjpeg-turbo Modifications: - * Copyright (C) 2015, D. R. Commander. + * Copyright (C) 2015, 2020, D. R. Commander. * For conditions of distribution and use, see the accompanying README.ijg * file. * - * This file contains a slow-but-accurate integer implementation of the + * This file contains a slower but more accurate integer implementation of the * forward DCT (Discrete Cosine Transform). * * A 2-D DCT can be done by 1-D DCT on each row followed by 1-D DCT diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jidctint.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jidctint.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jidctint.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jidctint.c 2021-11-20 03:41:33.395600498 +0000 @@ -3,13 +3,13 @@ * * This file was part of the Independent JPEG Group's software: * Copyright (C) 1991-1998, Thomas G. Lane. - * Modification developed 2002-2009 by Guido Vollbeding. + * Modification developed 2002-2018 by Guido Vollbeding. * libjpeg-turbo Modifications: - * Copyright (C) 2015, D. R. Commander. + * Copyright (C) 2015, 2020, D. R. Commander. * For conditions of distribution and use, see the accompanying README.ijg * file. * - * This file contains a slow-but-accurate integer implementation of the + * This file contains a slower but more accurate integer implementation of the * inverse DCT (Discrete Cosine Transform). In the IJG code, this routine * must also perform dequantization of the input coefficients. 
* @@ -417,7 +417,7 @@ /* * Perform dequantization and inverse DCT on one block of coefficients, - * producing a 7x7 output block. + * producing a reduced-size 7x7 output block. * * Optimized algorithm with 12 multiplications in the 1-D kernel. * cK represents sqrt(2) * cos(K*pi/14). @@ -1258,7 +1258,7 @@ /* * Perform dequantization and inverse DCT on one block of coefficients, - * producing a 11x11 output block. + * producing an 11x11 output block. * * Optimized algorithm with 24 multiplications in the 1-D kernel. * cK represents sqrt(2) * cos(K*pi/22). @@ -2398,7 +2398,7 @@ tmp0 = DEQUANTIZE(inptr[DCTSIZE * 0], quantptr[DCTSIZE * 0]); tmp0 = LEFT_SHIFT(tmp0, CONST_BITS); /* Add fudge factor here for final descale. */ - tmp0 += 1 << (CONST_BITS - PASS1_BITS - 1); + tmp0 += ONE << (CONST_BITS - PASS1_BITS - 1); z1 = DEQUANTIZE(inptr[DCTSIZE * 4], quantptr[DCTSIZE * 4]); tmp1 = MULTIPLY(z1, FIX(1.306562965)); /* c4[16] = c2[8] */ diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jmemmgr.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jmemmgr.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jmemmgr.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jmemmgr.c 2021-11-20 03:41:33.395600498 +0000 @@ -4,7 +4,7 @@ * This file was part of the Independent JPEG Group's software: * Copyright (C) 1991-1997, Thomas G. Lane. * libjpeg-turbo Modifications: - * Copyright (C) 2016, D. R. Commander. + * Copyright (C) 2016, 2021, D. R. Commander. * For conditions of distribution and use, see the accompanying README.ijg * file. * @@ -95,19 +95,6 @@ #endif #endif -#ifdef WITH_SIMD -#if (ALIGN_SIZE % 16) - #error "ALIGN_SIZE is not a multiple of 16 bytes - required for SIMD instructions." -#endif -#if defined(__AVX2__) && (ALIGN_SIZE % 32) - #error "AVX2 requires 32-byte alignment. ALIGN_SIZE is not a multiple of 32 bytes." 
-#elif defined(__ARM_NEON) && (ALIGN_SIZE % 32) - /* 32-byte alignment allows us to extract more performance from */ - /* fancy-upsampling algorithms when using NEON. */ - #error "NEON optimizations rely on 32-byte alignment. ALIGN_SIZE is not a multiple of 32 bytes." -#endif -#endif - /* * We allocate objects from "pools", where each pool is gotten with a single * request to jpeg_get_small() or jpeg_get_large(). There is no per-object @@ -1045,7 +1032,7 @@ large_pool_ptr next_lhdr_ptr = lhdr_ptr->next; space_freed = lhdr_ptr->bytes_used + lhdr_ptr->bytes_left + - sizeof(large_pool_hdr); + sizeof(large_pool_hdr) + ALIGN_SIZE - 1; jpeg_free_large(cinfo, (void *)lhdr_ptr, space_freed); mem->total_space_allocated -= space_freed; lhdr_ptr = next_lhdr_ptr; @@ -1058,7 +1045,7 @@ while (shdr_ptr != NULL) { small_pool_ptr next_shdr_ptr = shdr_ptr->next; space_freed = shdr_ptr->bytes_used + shdr_ptr->bytes_left + - sizeof(small_pool_hdr); + sizeof(small_pool_hdr) + ALIGN_SIZE - 1; jpeg_free_small(cinfo, (void *)shdr_ptr, space_freed); mem->total_space_allocated -= space_freed; shdr_ptr = next_shdr_ptr; diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jmorecfg.h b/src/3rdparty/chromium/third_party/libjpeg_turbo/jmorecfg.h --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jmorecfg.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jmorecfg.h 2021-11-20 03:41:33.395600498 +0000 @@ -5,7 +5,7 @@ * Copyright (C) 1991-1997, Thomas G. Lane. * Modified 1997-2009 by Guido Vollbeding. * libjpeg-turbo Modifications: - * Copyright (C) 2009, 2011, 2014-2015, 2018, D. R. Commander. + * Copyright (C) 2009, 2011, 2014-2015, 2018, 2020, D. R. Commander. * For conditions of distribution and use, see the accompanying README.ijg * file. * @@ -43,25 +43,11 @@ #if BITS_IN_JSAMPLE == 8 /* JSAMPLE should be the smallest type that will hold the values 0..255. - * You can use a signed char by having GETJSAMPLE mask it with 0xFF. 
*/ -#ifdef HAVE_UNSIGNED_CHAR - typedef unsigned char JSAMPLE; #define GETJSAMPLE(value) ((int)(value)) -#else /* not HAVE_UNSIGNED_CHAR */ - -typedef char JSAMPLE; -#ifdef __CHAR_UNSIGNED__ -#define GETJSAMPLE(value) ((int)(value)) -#else -#define GETJSAMPLE(value) ((int)(value) & 0xFF) -#endif /* __CHAR_UNSIGNED__ */ - -#endif /* HAVE_UNSIGNED_CHAR */ - #define MAXJSAMPLE 255 #define CENTERJSAMPLE 128 @@ -97,22 +83,9 @@ * managers, this is also the data type passed to fread/fwrite. */ -#ifdef HAVE_UNSIGNED_CHAR - typedef unsigned char JOCTET; #define GETJOCTET(value) (value) -#else /* not HAVE_UNSIGNED_CHAR */ - -typedef char JOCTET; -#ifdef __CHAR_UNSIGNED__ -#define GETJOCTET(value) (value) -#else -#define GETJOCTET(value) ((value) & 0xFF) -#endif /* __CHAR_UNSIGNED__ */ - -#endif /* HAVE_UNSIGNED_CHAR */ - /* These typedefs are used for various table entries and so forth. * They must be at least as wide as specified; but making them too big @@ -123,15 +96,7 @@ /* UINT8 must hold at least the values 0..255. */ -#ifdef HAVE_UNSIGNED_CHAR typedef unsigned char UINT8; -#else /* not HAVE_UNSIGNED_CHAR */ -#ifdef __CHAR_UNSIGNED__ -typedef char UINT8; -#else /* not __CHAR_UNSIGNED__ */ -typedef short UINT8; -#endif /* __CHAR_UNSIGNED__ */ -#endif /* HAVE_UNSIGNED_CHAR */ /* UINT16 must hold at least the values 0..65535. 
*/ @@ -273,9 +238,9 @@ /* Capability options common to encoder and decoder: */ -#define DCT_ISLOW_SUPPORTED /* slow but accurate integer algorithm */ -#define DCT_IFAST_SUPPORTED /* faster, less accurate integer method */ -#define DCT_FLOAT_SUPPORTED /* floating-point: accurate, fast on fast HW */ +#define DCT_ISLOW_SUPPORTED /* accurate integer method */ +#define DCT_IFAST_SUPPORTED /* less accurate int method [legacy feature] */ +#define DCT_FLOAT_SUPPORTED /* floating-point method [legacy feature] */ /* Encoder capability options: */ diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jpegcomp.h b/src/3rdparty/chromium/third_party/libjpeg_turbo/jpegcomp.h --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jpegcomp.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jpegcomp.h 2021-11-20 03:41:33.395600498 +0000 @@ -1,7 +1,7 @@ /* * jpegcomp.h * - * Copyright (C) 2010, D. R. Commander. + * Copyright (C) 2010, 2020, D. R. Commander. * For conditions of distribution and use, see the accompanying README.ijg * file. * @@ -19,6 +19,7 @@ #define _min_DCT_v_scaled_size min_DCT_v_scaled_size #define _jpeg_width jpeg_width #define _jpeg_height jpeg_height +#define JERR_ARITH_NOTIMPL JERR_NOT_COMPILED #else #define _DCT_scaled_size DCT_scaled_size #define _DCT_h_scaled_size DCT_scaled_size diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jpegint.h b/src/3rdparty/chromium/third_party/libjpeg_turbo/jpegint.h --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jpegint.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jpegint.h 2021-11-20 03:41:33.395600498 +0000 @@ -5,8 +5,9 @@ * Copyright (C) 1991-1997, Thomas G. Lane. * Modified 1997-2009 by Guido Vollbeding. * libjpeg-turbo Modifications: - * Copyright (C) 2015-2016, D. R. Commander. + * Copyright (C) 2015-2016, 2019, 2021, D. R. Commander. * Copyright (C) 2015, Google, Inc. + * Copyright (C) 2021, Alex Richardson. 
  * For conditions of distribution and use, see the accompanying README.ijg
  * file.
  *
@@ -47,6 +48,18 @@
 /* JLONG must hold at least signed 32-bit values. */
 typedef long JLONG;
 
+/* JUINTPTR must hold pointer values. */
+#ifdef __UINTPTR_TYPE__
+/*
+ * __UINTPTR_TYPE__ is GNU-specific and available in GCC 4.6+ and Clang 3.0+.
+ * Fortunately, that is sufficient to support the few architectures for which
+ * sizeof(void *) != sizeof(size_t).  The only other options would require C99
+ * or Clang-specific builtins.
+ */
+typedef __UINTPTR_TYPE__ JUINTPTR;
+#else
+typedef size_t JUINTPTR;
+#endif
 
 /*
  * Left shift macro that handles a negative operand without causing any
@@ -158,6 +171,9 @@
   JDIMENSION first_MCU_col[MAX_COMPONENTS];
   JDIMENSION last_MCU_col[MAX_COMPONENTS];
   boolean jinit_upsampler_no_alloc;
+
+  /* Last iMCU row that was successfully decoded */
+  JDIMENSION last_good_iMCU_row;
 };
 
 /* Input control module */
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jpeglib.h b/src/3rdparty/chromium/third_party/libjpeg_turbo/jpeglib.h
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jpeglib.h	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jpeglib.h	2021-11-20 03:41:33.395600498 +0000
@@ -5,7 +5,7 @@
  * Copyright (C) 1991-1998, Thomas G. Lane.
  * Modified 2002-2009 by Guido Vollbeding.
  * libjpeg-turbo Modifications:
- * Copyright (C) 2009-2011, 2013-2014, 2016-2017, D. R. Commander.
+ * Copyright (C) 2009-2011, 2013-2014, 2016-2017, 2020, D. R. Commander.
  * Copyright (C) 2015, Google, Inc.
  * For conditions of distribution and use, see the accompanying README.ijg
  * file.
@@ -19,7 +19,9 @@
 #define JPEGLIB_H
 
 /* Begin chromium edits */
+#ifdef MANGLE_JPEG_NAMES
 #include "jpeglibmangler.h"
+#endif
 /* End chromium edits */
 
 /*
@@ -248,9 +250,9 @@
 /* DCT/IDCT algorithm options.
*/ typedef enum { - JDCT_ISLOW, /* slow but accurate integer algorithm */ - JDCT_IFAST, /* faster, less accurate integer method */ - JDCT_FLOAT /* floating-point: accurate, fast on fast HW */ + JDCT_ISLOW, /* accurate integer method */ + JDCT_IFAST, /* less accurate integer method [legacy feature] */ + JDCT_FLOAT /* floating-point method [legacy feature] */ } J_DCT_METHOD; #ifndef JDCT_DEFAULT /* may be overridden in jconfig.h */ diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jpegtran.1 b/src/3rdparty/chromium/third_party/libjpeg_turbo/jpegtran.1 --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jpegtran.1 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jpegtran.1 2021-11-20 03:41:33.395600498 +0000 @@ -1,4 +1,4 @@ -.TH JPEGTRAN 1 "18 March 2017" +.TH JPEGTRAN 1 "13 July 2021" .SH NAME jpegtran \- lossless transformation of JPEG files .SH SYNOPSIS @@ -161,13 +161,13 @@ .PP This version of \fBjpegtran\fR also offers a lossless crop option, which discards data outside of a given image region but losslessly preserves what is -inside. Like the rotate and flip transforms, lossless crop is restricted by the -current JPEG format; the upper left corner of the selected region must fall on -an iMCU boundary. If it doesn't, then it is silently moved up and/or left to -the nearest iMCU boundary (the lower right corner is unchanged.) Thus, the +inside. Like the rotate and flip transforms, lossless crop is restricted by +the current JPEG format; the upper left corner of the selected region must fall +on an iMCU boundary. If it doesn't, then it is silently moved up and/or left +to the nearest iMCU boundary (the lower right corner is unchanged.) Thus, the output image covers at least the requested region, but it may cover more. The -adjustment of the region dimensions may be optionally disabled by attaching -an 'f' character ("force") to the width or height number. 
+adjustment of the region dimensions may be optionally disabled by attaching an +'f' character ("force") to the width or height number. The image can be losslessly cropped by giving the switch: .TP @@ -180,6 +180,47 @@ doesn't, then it is silently moved up and/or left to the nearest iMCU boundary (the lower right corner is unchanged.) .PP +If W or H is larger than the width/height of the input image, then the output +image is expanded in size, and the expanded region is filled in with zeros +(neutral gray). Attaching an 'f' character ("flatten") to the width number +will cause each block in the expanded region to be filled in with the DC +coefficient of the nearest block in the input image rather than grayed out. +Attaching an 'r' character ("reflect") to the width number will cause the +expanded region to be filled in with repeated reflections of the input image +rather than grayed out. +.PP +A complementary lossless wipe option is provided to discard (gray out) data +inside a given image region while losslessly preserving what is outside: +.TP +.B \-wipe WxH+X+Y +Wipe (gray out) a rectangular region of width W and height H from the input +image, starting at point X,Y. +.PP +Attaching an 'f' character ("flatten") to the width number will cause the +region to be filled with the average of adjacent blocks rather than grayed out. +If the wipe region and the region outside the wipe region, when adjusted to the +nearest iMCU boundary, form two horizontally adjacent rectangles, then +attaching an 'r' character ("reflect") to the width number will cause the wipe +region to be filled with repeated reflections of the outside region rather than +grayed out. 
+.PP +A lossless drop option is also provided, which allows another JPEG image to be +inserted ("dropped") into the input image data at a given position, replacing +the existing image data at that position: +.TP +.B \-drop +X+Y filename +Drop (insert) another image at point X,Y +.PP +Both the input image and the drop image must have the same subsampling level. +It is best if they also have the same quantization (quality.) Otherwise, the +quantization of the output image will be adapted to accommodate the higher of +the input image quality and the drop image quality. The trim option can be +used with the drop option to requantize the drop image to match the input +image. Note that a grayscale image can be dropped into a full-color image or +vice versa, as long as the full-color image has no vertical subsampling. If +the input image is grayscale and the drop image is full-color, then the +chrominance channels from the drop image will be discarded. +.PP Other not-strictly-lossless transformation switches are: .TP .B \-grayscale @@ -206,6 +247,10 @@ Copy only comment markers. This setting copies comments from the source file but discards any other metadata. .TP +.B \-copy icc +Copy only ICC profile markers. This setting copies the ICC profile from the +source file but discards any other metadata. +.TP .B \-copy all Copy all extra markers. This setting preserves miscellaneous markers found in the source file, such as JFIF thumbnails, Exif data, and Photoshop @@ -220,7 +265,7 @@ .BI \-icc " file" Embed ICC color management profile contained in the specified file. Note that this will cause \fBjpegtran\fR to ignore any APP2 markers in the input file, -even if \fB-copy all\fR is specified. +even if \fB-copy all\fR or \fB-copy icc\fR is specified. .TP .BI \-maxmemory " N" Set limit for amount of memory to use in processing large images. Value is @@ -229,9 +274,31 @@ .B \-max 4m selects 4000000 bytes. If more space is needed, an error will occur. 
.TP +.BI \-maxscans " N" +Abort if the input image contains more than +.I N +scans. This feature demonstrates a method by which applications can guard +against denial-of-service attacks instigated by specially-crafted malformed +JPEG images containing numerous scans with missing image data or image data +consisting only of "EOB runs" (a feature of progressive JPEG images that allows +potentially hundreds of thousands of adjoining zero-value pixels to be +represented using only a few bytes.) Attempting to transform such malformed +JPEG images can cause excessive CPU activity, since the decompressor must fully +process each scan (even if the scan is corrupt) before it can proceed to the +next scan. +.TP .BI \-outfile " name" Send output image to the named file, not to standard output. .TP +.BI \-report +Report transformation progress. +.TP +.BI \-strict +Treat all warnings as fatal. This feature also demonstrates a method by which +applications can guard against attacks instigated by specially-crafted +malformed JPEG images. Enabling this option will cause the decompressor to +abort if the input image contains incomplete or corrupt image data. +.TP .B \-verbose Enable debug printout. More .BR \-v 's diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jpegtran.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jpegtran.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jpegtran.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jpegtran.c 2021-11-20 03:41:33.395600498 +0000 @@ -2,9 +2,9 @@ * jpegtran.c * * This file was part of the Independent JPEG Group's software: - * Copyright (C) 1995-2010, Thomas G. Lane, Guido Vollbeding. + * Copyright (C) 1995-2019, Thomas G. Lane, Guido Vollbeding. * libjpeg-turbo Modifications: - * Copyright (C) 2010, 2014, 2017, D. R. Commander. + * Copyright (C) 2010, 2014, 2017, 2019-2021, D. R. Commander. 
* For conditions of distribution and use, see the accompanying README.ijg * file. * @@ -41,7 +41,11 @@ static const char *progname; /* program name for error messages */ static char *icc_filename; /* for -icc switch */ +static JDIMENSION max_scans; /* for -maxscans switch */ static char *outfilename; /* for -outfile switch */ +static char *dropfilename; /* for -drop switch */ +static boolean report; /* for -report switch */ +static boolean strict; /* for -strict switch */ static JCOPY_OPTION copyoption; /* -copy switch */ static jpeg_transform_info transformoption; /* image transformation options */ @@ -60,6 +64,7 @@ fprintf(stderr, "Switches (names may be abbreviated):\n"); fprintf(stderr, " -copy none Copy no extra markers from source file\n"); fprintf(stderr, " -copy comments Copy only comment markers (default)\n"); + fprintf(stderr, " -copy icc Copy only ICC profile markers\n"); fprintf(stderr, " -copy all Copy all extra markers\n"); #ifdef ENTROPY_OPT_SUPPORTED fprintf(stderr, " -optimize Optimize Huffman table (smaller file, but slow compression)\n"); @@ -69,9 +74,10 @@ #endif fprintf(stderr, "Switches for modifying the image:\n"); #if TRANSFORMS_SUPPORTED - fprintf(stderr, " -crop WxH+X+Y Crop to a rectangular subarea\n"); - fprintf(stderr, " -grayscale Reduce to grayscale (omit color data)\n"); + fprintf(stderr, " -crop WxH+X+Y Crop to a rectangular region\n"); + fprintf(stderr, " -drop +X+Y filename Drop (insert) another image\n"); fprintf(stderr, " -flip [horizontal|vertical] Mirror image (left-right or top-bottom)\n"); + fprintf(stderr, " -grayscale Reduce to grayscale (omit color data)\n"); fprintf(stderr, " -perfect Fail if there is non-transformable edge blocks\n"); fprintf(stderr, " -rotate [90|180|270] Rotate image (degrees clockwise)\n"); #endif @@ -79,6 +85,8 @@ fprintf(stderr, " -transpose Transpose image\n"); fprintf(stderr, " -transverse Transverse transpose image\n"); fprintf(stderr, " -trim Drop non-transformable edge blocks\n"); + 
fprintf(stderr, " with -drop: Requantize drop file to match source file\n"); + fprintf(stderr, " -wipe WxH+X+Y Wipe (gray out) a rectangular region\n"); #endif fprintf(stderr, "Switches for advanced users:\n"); #ifdef C_ARITH_CODING_SUPPORTED @@ -87,7 +95,10 @@ fprintf(stderr, " -icc FILE Embed ICC profile contained in FILE\n"); fprintf(stderr, " -restart N Set restart interval in rows, or in blocks with B\n"); fprintf(stderr, " -maxmemory N Maximum memory to use (in kbytes)\n"); + fprintf(stderr, " -maxscans N Maximum number of scans to allow in input file\n"); fprintf(stderr, " -outfile name Specify name for output file\n"); + fprintf(stderr, " -report Report transformation progress\n"); + fprintf(stderr, " -strict Treat all warnings as fatal\n"); fprintf(stderr, " -verbose or -debug Emit debug output\n"); fprintf(stderr, " -version Print version information and exit\n"); fprintf(stderr, "Switches for wizards:\n"); @@ -141,7 +152,10 @@ /* Set up default JPEG parameters. */ simple_progressive = FALSE; icc_filename = NULL; + max_scans = 0; outfilename = NULL; + report = FALSE; + strict = FALSE; copyoption = JCOPYOPT_DEFAULT; transformoption.transform = JXFORM_NONE; transformoption.perfect = FALSE; @@ -183,6 +197,8 @@ copyoption = JCOPYOPT_NONE; } else if (keymatch(argv[argn], "comments", 1)) { copyoption = JCOPYOPT_COMMENTS; + } else if (keymatch(argv[argn], "icc", 1)) { + copyoption = JCOPYOPT_ICC; } else if (keymatch(argv[argn], "all", 1)) { copyoption = JCOPYOPT_ALL; } else @@ -193,7 +209,8 @@ #if TRANSFORMS_SUPPORTED if (++argn >= argc) /* advance to next argument */ usage(); - if (!jtransform_parse_crop_spec(&transformoption, argv[argn])) { + if (transformoption.crop /* reject multiple crop/drop/wipe requests */ || + !jtransform_parse_crop_spec(&transformoption, argv[argn])) { fprintf(stderr, "%s: bogus -crop argument '%s'\n", progname, argv[argn]); exit(EXIT_FAILURE); @@ -202,6 +219,26 @@ select_transform(JXFORM_NONE); /* force an error */ #endif + } else if 
(keymatch(arg, "drop", 2)) { +#if TRANSFORMS_SUPPORTED + if (++argn >= argc) /* advance to next argument */ + usage(); + if (transformoption.crop /* reject multiple crop/drop/wipe requests */ || + !jtransform_parse_crop_spec(&transformoption, argv[argn]) || + transformoption.crop_width_set != JCROP_UNSET || + transformoption.crop_height_set != JCROP_UNSET) { + fprintf(stderr, "%s: bogus -drop argument '%s'\n", + progname, argv[argn]); + exit(EXIT_FAILURE); + } + if (++argn >= argc) /* advance to next argument */ + usage(); + dropfilename = argv[argn]; + select_transform(JXFORM_DROP); +#else + select_transform(JXFORM_NONE); /* force an error */ +#endif + } else if (keymatch(arg, "debug", 1) || keymatch(arg, "verbose", 1)) { /* Enable debug printouts. */ /* On first -d, print version identification */ @@ -261,6 +298,12 @@ lval *= 1000L; cinfo->mem->max_memory_to_use = lval * 1000L; + } else if (keymatch(arg, "maxscans", 4)) { + if (++argn >= argc) /* advance to next argument */ + usage(); + if (sscanf(argv[argn], "%u", &max_scans) != 1) + usage(); + } else if (keymatch(arg, "optimize", 1) || keymatch(arg, "optimise", 1)) { /* Enable entropy parm optimization. */ #ifdef ENTROPY_OPT_SUPPORTED @@ -293,6 +336,9 @@ exit(EXIT_FAILURE); #endif + } else if (keymatch(arg, "report", 3)) { + report = TRUE; + } else if (keymatch(arg, "restart", 1)) { /* Restart interval in MCU rows (or in MCUs with 'b'). */ long lval; @@ -338,6 +384,9 @@ exit(EXIT_FAILURE); #endif + } else if (keymatch(arg, "strict", 2)) { + strict = TRUE; + } else if (keymatch(arg, "transpose", 1)) { /* Transpose (across UL-to-LR axis). */ select_transform(JXFORM_TRANSPOSE); @@ -350,6 +399,21 @@ /* Trim off any partial edge MCUs that the transform can't handle. 
*/ transformoption.trim = TRUE; + } else if (keymatch(arg, "wipe", 1)) { +#if TRANSFORMS_SUPPORTED + if (++argn >= argc) /* advance to next argument */ + usage(); + if (transformoption.crop /* reject multiple crop/drop/wipe requests */ || + !jtransform_parse_crop_spec(&transformoption, argv[argn])) { + fprintf(stderr, "%s: bogus -wipe argument '%s'\n", + progname, argv[argn]); + exit(EXIT_FAILURE); + } + select_transform(JXFORM_WIPE); +#else + select_transform(JXFORM_NONE); /* force an error */ +#endif + } else { usage(); /* bogus switch */ } @@ -375,6 +439,19 @@ } +METHODDEF(void) +my_emit_message(j_common_ptr cinfo, int msg_level) +{ + if (msg_level < 0) { + /* Treat warning as fatal */ + cinfo->err->error_exit(cinfo); + } else { + if (cinfo->err->trace_level >= msg_level) + cinfo->err->output_message(cinfo); + } +} + + /* * The main program. */ @@ -387,11 +464,14 @@ #endif { struct jpeg_decompress_struct srcinfo; +#if TRANSFORMS_SUPPORTED + struct jpeg_decompress_struct dropinfo; + struct jpeg_error_mgr jdroperr; + FILE *drop_file; +#endif struct jpeg_compress_struct dstinfo; struct jpeg_error_mgr jsrcerr, jdsterr; -#ifdef PROGRESS_REPORT - struct cdjpeg_progress_mgr progress; -#endif + struct cdjpeg_progress_mgr src_progress, dst_progress; jvirt_barray_ptr *src_coef_arrays; jvirt_barray_ptr *dst_coef_arrays; int file_index; @@ -424,13 +504,16 @@ * values read here are mostly ignored; we will rescan the switches after * opening the input file. Also note that most of the switches affect the * destination JPEG object, so we parse into that and then copy over what - * needs to affects the source too. + * needs to affect the source too. 
*/ file_index = parse_switches(&dstinfo, argc, argv, 0, FALSE); jsrcerr.trace_level = jdsterr.trace_level; srcinfo.mem->max_memory_to_use = dstinfo.mem->max_memory_to_use; + if (strict) + jsrcerr.emit_message = my_emit_message; + #ifdef TWO_FILE_COMMANDLINE /* Must have either -outfile switch or explicit output file name */ if (outfilename == NULL) { @@ -494,10 +577,33 @@ fclose(icc_file); if (copyoption == JCOPYOPT_ALL) copyoption = JCOPYOPT_ALL_EXCEPT_ICC; + if (copyoption == JCOPYOPT_ICC) + copyoption = JCOPYOPT_NONE; } -#ifdef PROGRESS_REPORT - start_progress_monitor((j_common_ptr)&dstinfo, &progress); + if (report) { + start_progress_monitor((j_common_ptr)&dstinfo, &dst_progress); + dst_progress.report = report; + } + if (report || max_scans != 0) { + start_progress_monitor((j_common_ptr)&srcinfo, &src_progress); + src_progress.report = report; + src_progress.max_scans = max_scans; + } +#if TRANSFORMS_SUPPORTED + /* Open the drop file. */ + if (dropfilename != NULL) { + if ((drop_file = fopen(dropfilename, READ_BINARY)) == NULL) { + fprintf(stderr, "%s: can't open %s for reading\n", progname, + dropfilename); + return EXIT_FAILURE; + } + dropinfo.err = jpeg_std_error(&jdroperr); + jpeg_create_decompress(&dropinfo); + jpeg_stdio_src(&dropinfo, drop_file); + } else { + drop_file = NULL; + } #endif /* Specify data source for decompression */ @@ -509,6 +615,17 @@ /* Read file header */ (void)jpeg_read_header(&srcinfo, TRUE); +#if TRANSFORMS_SUPPORTED + if (dropfilename != NULL) { + (void)jpeg_read_header(&dropinfo, TRUE); + transformoption.crop_width = dropinfo.image_width; + transformoption.crop_width_set = JCROP_POS; + transformoption.crop_height = dropinfo.image_height; + transformoption.crop_height_set = JCROP_POS; + transformoption.drop_ptr = &dropinfo; + } +#endif + /* Any space needed by a transform option must be requested before * jpeg_read_coefficients so that memory allocation will be done right. 
*/ @@ -524,6 +641,12 @@ /* Read source file as DCT coefficients */ src_coef_arrays = jpeg_read_coefficients(&srcinfo); +#if TRANSFORMS_SUPPORTED + if (dropfilename != NULL) { + transformoption.drop_coef_arrays = jpeg_read_coefficients(&dropinfo); + } +#endif + /* Initialize destination compression parameters from source values */ jpeg_copy_critical_parameters(&srcinfo, &dstinfo); @@ -584,20 +707,36 @@ /* Finish compression and release memory */ jpeg_finish_compress(&dstinfo); jpeg_destroy_compress(&dstinfo); +#if TRANSFORMS_SUPPORTED + if (dropfilename != NULL) { + (void)jpeg_finish_decompress(&dropinfo); + jpeg_destroy_decompress(&dropinfo); + } +#endif (void)jpeg_finish_decompress(&srcinfo); jpeg_destroy_decompress(&srcinfo); /* Close output file, if we opened it */ if (fp != stdout) fclose(fp); - -#ifdef PROGRESS_REPORT - end_progress_monitor((j_common_ptr)&dstinfo); +#if TRANSFORMS_SUPPORTED + if (drop_file != NULL) + fclose(drop_file); #endif + if (report) + end_progress_monitor((j_common_ptr)&dstinfo); + if (report || max_scans != 0) + end_progress_monitor((j_common_ptr)&srcinfo); + free(icc_profile); /* All done. */ +#if TRANSFORMS_SUPPORTED + if (dropfilename != NULL) + return (jsrcerr.num_warnings + jdroperr.num_warnings + + jdsterr.num_warnings ? EXIT_WARNING : EXIT_SUCCESS); +#endif return (jsrcerr.num_warnings + jdsterr.num_warnings ? 
EXIT_WARNING : EXIT_SUCCESS); } diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jquant1.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jquant1.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jquant1.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jquant1.c 2021-11-20 03:41:33.395600498 +0000 @@ -479,7 +479,7 @@ for (col = width; col > 0; col--) { pixcode = 0; for (ci = 0; ci < nc; ci++) { - pixcode += GETJSAMPLE(colorindex[ci][GETJSAMPLE(*ptrin++)]); + pixcode += colorindex[ci][*ptrin++]; } *ptrout++ = (JSAMPLE)pixcode; } @@ -506,9 +506,9 @@ ptrin = input_buf[row]; ptrout = output_buf[row]; for (col = width; col > 0; col--) { - pixcode = GETJSAMPLE(colorindex0[GETJSAMPLE(*ptrin++)]); - pixcode += GETJSAMPLE(colorindex1[GETJSAMPLE(*ptrin++)]); - pixcode += GETJSAMPLE(colorindex2[GETJSAMPLE(*ptrin++)]); + pixcode = colorindex0[*ptrin++]; + pixcode += colorindex1[*ptrin++]; + pixcode += colorindex2[*ptrin++]; *ptrout++ = (JSAMPLE)pixcode; } } @@ -552,7 +552,7 @@ * required amount of padding. */ *output_ptr += - colorindex_ci[GETJSAMPLE(*input_ptr) + dither[col_index]]; + colorindex_ci[*input_ptr + dither[col_index]]; input_ptr += nc; output_ptr++; col_index = (col_index + 1) & ODITHER_MASK; @@ -595,12 +595,9 @@ col_index = 0; for (col = width; col > 0; col--) { - pixcode = - GETJSAMPLE(colorindex0[GETJSAMPLE(*input_ptr++) + dither0[col_index]]); - pixcode += - GETJSAMPLE(colorindex1[GETJSAMPLE(*input_ptr++) + dither1[col_index]]); - pixcode += - GETJSAMPLE(colorindex2[GETJSAMPLE(*input_ptr++) + dither2[col_index]]); + pixcode = colorindex0[(*input_ptr++) + dither0[col_index]]; + pixcode += colorindex1[(*input_ptr++) + dither1[col_index]]; + pixcode += colorindex2[(*input_ptr++) + dither2[col_index]]; *output_ptr++ = (JSAMPLE)pixcode; col_index = (col_index + 1) & ODITHER_MASK; } @@ -677,15 +674,15 @@ * The maximum error is +- MAXJSAMPLE; this sets the required size * of the range_limit array. 
*/ - cur += GETJSAMPLE(*input_ptr); - cur = GETJSAMPLE(range_limit[cur]); + cur += *input_ptr; + cur = range_limit[cur]; /* Select output value, accumulate into output code for this pixel */ - pixcode = GETJSAMPLE(colorindex_ci[cur]); + pixcode = colorindex_ci[cur]; *output_ptr += (JSAMPLE)pixcode; /* Compute actual representation error at this pixel */ /* Note: we can do this even though we don't have the final */ /* pixel code, because the colormap is orthogonal. */ - cur -= GETJSAMPLE(colormap_ci[pixcode]); + cur -= colormap_ci[pixcode]; /* Compute error fractions to be propagated to adjacent pixels. * Add these into the running sums, and simultaneously shift the * next-line error sums left by 1 column. diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jquant2.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jquant2.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jquant2.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jquant2.c 2021-11-20 03:41:33.396600482 +0000 @@ -4,7 +4,7 @@ * This file was part of the Independent JPEG Group's software: * Copyright (C) 1991-1996, Thomas G. Lane. * libjpeg-turbo Modifications: - * Copyright (C) 2009, 2014-2015, D. R. Commander. + * Copyright (C) 2009, 2014-2015, 2020, D. R. Commander. * For conditions of distribution and use, see the accompanying README.ijg * file. * @@ -215,9 +215,9 @@ ptr = input_buf[row]; for (col = width; col > 0; col--) { /* get pixel value and index into the histogram */ - histp = &histogram[GETJSAMPLE(ptr[0]) >> C0_SHIFT] - [GETJSAMPLE(ptr[1]) >> C1_SHIFT] - [GETJSAMPLE(ptr[2]) >> C2_SHIFT]; + histp = &histogram[ptr[0] >> C0_SHIFT] + [ptr[1] >> C1_SHIFT] + [ptr[2] >> C2_SHIFT]; /* increment, check for overflow and undo increment if so. */ if (++(*histp) <= 0) (*histp)--; @@ -665,7 +665,7 @@ for (i = 0; i < numcolors; i++) { /* We compute the squared-c0-distance term, then add in the other two. 
*/ - x = GETJSAMPLE(cinfo->colormap[0][i]); + x = cinfo->colormap[0][i]; if (x < minc0) { tdist = (x - minc0) * C0_SCALE; min_dist = tdist * tdist; @@ -688,7 +688,7 @@ } } - x = GETJSAMPLE(cinfo->colormap[1][i]); + x = cinfo->colormap[1][i]; if (x < minc1) { tdist = (x - minc1) * C1_SCALE; min_dist += tdist * tdist; @@ -710,7 +710,7 @@ } } - x = GETJSAMPLE(cinfo->colormap[2][i]); + x = cinfo->colormap[2][i]; if (x < minc2) { tdist = (x - minc2) * C2_SCALE; min_dist += tdist * tdist; @@ -788,13 +788,13 @@ #define STEP_C2 ((1 << C2_SHIFT) * C2_SCALE) for (i = 0; i < numcolors; i++) { - icolor = GETJSAMPLE(colorlist[i]); + icolor = colorlist[i]; /* Compute (square of) distance from minc0/c1/c2 to this color */ - inc0 = (minc0 - GETJSAMPLE(cinfo->colormap[0][icolor])) * C0_SCALE; + inc0 = (minc0 - cinfo->colormap[0][icolor]) * C0_SCALE; dist0 = inc0 * inc0; - inc1 = (minc1 - GETJSAMPLE(cinfo->colormap[1][icolor])) * C1_SCALE; + inc1 = (minc1 - cinfo->colormap[1][icolor]) * C1_SCALE; dist0 += inc1 * inc1; - inc2 = (minc2 - GETJSAMPLE(cinfo->colormap[2][icolor])) * C2_SCALE; + inc2 = (minc2 - cinfo->colormap[2][icolor]) * C2_SCALE; dist0 += inc2 * inc2; /* Form the initial difference increments */ inc0 = inc0 * (2 * STEP_C0) + STEP_C0 * STEP_C0; @@ -879,7 +879,7 @@ for (ic1 = 0; ic1 < BOX_C1_ELEMS; ic1++) { cachep = &histogram[c0 + ic0][c1 + ic1][c2]; for (ic2 = 0; ic2 < BOX_C2_ELEMS; ic2++) { - *cachep++ = (histcell)(GETJSAMPLE(*cptr++) + 1); + *cachep++ = (histcell)((*cptr++) + 1); } } } @@ -909,9 +909,9 @@ outptr = output_buf[row]; for (col = width; col > 0; col--) { /* get pixel value and index into the cache */ - c0 = GETJSAMPLE(*inptr++) >> C0_SHIFT; - c1 = GETJSAMPLE(*inptr++) >> C1_SHIFT; - c2 = GETJSAMPLE(*inptr++) >> C2_SHIFT; + c0 = (*inptr++) >> C0_SHIFT; + c1 = (*inptr++) >> C1_SHIFT; + c2 = (*inptr++) >> C2_SHIFT; cachep = &histogram[c0][c1][c2]; /* If we have not seen this color before, find nearest colormap entry */ /* and update the cache */ @@ -996,12 
+996,12 @@ * The maximum error is +- MAXJSAMPLE (or less with error limiting); * this sets the required size of the range_limit array. */ - cur0 += GETJSAMPLE(inptr[0]); - cur1 += GETJSAMPLE(inptr[1]); - cur2 += GETJSAMPLE(inptr[2]); - cur0 = GETJSAMPLE(range_limit[cur0]); - cur1 = GETJSAMPLE(range_limit[cur1]); - cur2 = GETJSAMPLE(range_limit[cur2]); + cur0 += inptr[0]; + cur1 += inptr[1]; + cur2 += inptr[2]; + cur0 = range_limit[cur0]; + cur1 = range_limit[cur1]; + cur2 = range_limit[cur2]; /* Index into the cache with adjusted pixel value */ cachep = &histogram[cur0 >> C0_SHIFT][cur1 >> C1_SHIFT][cur2 >> C2_SHIFT]; @@ -1015,9 +1015,9 @@ register int pixcode = *cachep - 1; *outptr = (JSAMPLE)pixcode; /* Compute representation error for this pixel */ - cur0 -= GETJSAMPLE(colormap0[pixcode]); - cur1 -= GETJSAMPLE(colormap1[pixcode]); - cur2 -= GETJSAMPLE(colormap2[pixcode]); + cur0 -= colormap0[pixcode]; + cur1 -= colormap1[pixcode]; + cur2 -= colormap2[pixcode]; } /* Compute error fractions to be propagated to adjacent pixels. * Add these into the running sums, and simultaneously shift the @@ -1145,7 +1145,7 @@ int i; /* Only F-S dithering or no dithering is supported. */ - /* If user asks for ordered dither, give him F-S. */ + /* If user asks for ordered dither, give them F-S. */ if (cinfo->dither_mode != JDITHER_NONE) cinfo->dither_mode = JDITHER_FS; @@ -1263,7 +1263,7 @@ cquantize->sv_colormap = NULL; /* Only F-S dithering or no dithering is supported. */ - /* If user asks for ordered dither, give him F-S. */ + /* If user asks for ordered dither, give them F-S. 
*/ if (cinfo->dither_mode != JDITHER_NONE) cinfo->dither_mode = JDITHER_FS; diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jsimd.h b/src/3rdparty/chromium/third_party/libjpeg_turbo/jsimd.h --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jsimd.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jsimd.h 2021-11-20 03:41:33.396600482 +0000 @@ -4,6 +4,7 @@ * Copyright 2009 Pierre Ossman for Cendio AB * Copyright (C) 2011, 2014, D. R. Commander. * Copyright (C) 2015-2016, 2018, Matthieu Darbois. + * Copyright (C) 2020, Arm Limited. * * Based on the x86 SIMD extension for IJG JPEG library, * Copyright (C) 1999-2006, MIYASAKA Masaru. diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jsimd_none.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/jsimd_none.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jsimd_none.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jsimd_none.c 2021-11-20 03:41:33.396600482 +0000 @@ -4,6 +4,7 @@ * Copyright 2009 Pierre Ossman for Cendio AB * Copyright (C) 2009-2011, 2014, D. R. Commander. * Copyright (C) 2015-2016, 2018, Matthieu Darbois. + * Copyright (C) 2020, Arm Limited. * * Based on the x86 SIMD extension for IJG JPEG library, * Copyright (C) 1999-2006, MIYASAKA Masaru. diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/jversion.h b/src/3rdparty/chromium/third_party/libjpeg_turbo/jversion.h --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/jversion.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/jversion.h 2021-11-20 03:41:33.396600482 +0000 @@ -2,9 +2,9 @@ * jversion.h * * This file was part of the Independent JPEG Group's software: - * Copyright (C) 1991-2012, Thomas G. Lane, Guido Vollbeding. + * Copyright (C) 1991-2020, Thomas G. Lane, Guido Vollbeding. * libjpeg-turbo Modifications: - * Copyright (C) 2010, 2012-2020, D. R. Commander. 
+ * Copyright (C) 2010, 2012-2021, D. R. Commander. * For conditions of distribution and use, see the accompanying README.ijg * file. * @@ -30,23 +30,25 @@ * NOTE: It is our convention to place the authors in the following order: * - libjpeg-turbo authors (2009-) in descending order of the date of their * most recent contribution to the project, then in ascending order of the - * date of their first contribution to the project + * date of their first contribution to the project, then in alphabetical + * order * - Upstream authors in descending order of the date of the first inclusion of * their code */ #define JCOPYRIGHT \ - "Copyright (C) 2009-2020 D. R. Commander\n" \ - "Copyright (C) 2011-2016 Siarhei Siamashka\n" \ + "Copyright (C) 2009-2021 D. R. Commander\n" \ + "Copyright (C) 2015, 2020 Google, Inc.\n" \ + "Copyright (C) 2019-2020 Arm Limited\n" \ "Copyright (C) 2015-2016, 2018 Matthieu Darbois\n" \ + "Copyright (C) 2011-2016 Siarhei Siamashka\n" \ "Copyright (C) 2015 Intel Corporation\n" \ - "Copyright (C) 2015 Google, Inc.\n" \ + "Copyright (C) 2013-2014 Linaro Limited\n" \ "Copyright (C) 2013-2014 MIPS Technologies, Inc.\n" \ - "Copyright (C) 2013 Linaro Limited\n" \ + "Copyright (C) 2009, 2012 Pierre Ossman for Cendio AB\n" \ "Copyright (C) 2009-2011 Nokia Corporation and/or its subsidiary(-ies)\n" \ - "Copyright (C) 2009 Pierre Ossman for Cendio AB\n" \ "Copyright (C) 1999-2006 MIYASAKA Masaru\n" \ - "Copyright (C) 1991-2016 Thomas G. Lane, Guido Vollbeding" + "Copyright (C) 1991-2020 Thomas G. 
Lane, Guido Vollbeding" #define JCOPYRIGHT_SHORT \ - "Copyright (C) 1991-2020 The libjpeg-turbo Project and many others" + "Copyright (C) 1991-2021 The libjpeg-turbo Project and many others" diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/libjpeg.txt b/src/3rdparty/chromium/third_party/libjpeg_turbo/libjpeg.txt --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/libjpeg.txt 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/libjpeg.txt 2021-11-20 03:41:33.396600482 +0000 @@ -3,7 +3,7 @@ This file was part of the Independent JPEG Group's software: Copyright (C) 1994-2013, Thomas G. Lane, Guido Vollbeding. libjpeg-turbo Modifications: -Copyright (C) 2010, 2014-2018, D. R. Commander. +Copyright (C) 2010, 2014-2018, 2020, D. R. Commander. Copyright (C) 2015, Google, Inc. For conditions of distribution and use, see the accompanying README.ijg file. @@ -750,7 +750,9 @@ Suspending data sources are not supported by this function. Calling jpeg_skip_scanlines() with a suspending data source will result in undefined -behavior. +behavior. Two-pass color quantization is also not supported by this function. +Calling jpeg_skip_scanlines() with two-pass color quantization enabled will +result in an error. jpeg_skip_scanlines() will not allow skipping past the bottom of the image. If the value of num_lines is large enough to skip past the bottom of the image, @@ -967,30 +969,38 @@ J_DCT_METHOD dct_method Selects the algorithm used for the DCT step. 
Choices are: - JDCT_ISLOW: slow but accurate integer algorithm - JDCT_IFAST: faster, less accurate integer method - JDCT_FLOAT: floating-point method + JDCT_ISLOW: accurate integer method + JDCT_IFAST: less accurate integer method [legacy feature] + JDCT_FLOAT: floating-point method [legacy feature] JDCT_DEFAULT: default method (normally JDCT_ISLOW) JDCT_FASTEST: fastest method (normally JDCT_IFAST) - In libjpeg-turbo, JDCT_IFAST is generally about 5-15% faster than - JDCT_ISLOW when using the x86/x86-64 SIMD extensions (results may vary - with other SIMD implementations, or when using libjpeg-turbo without - SIMD extensions.) For quality levels of 90 and below, there should be - little or no perceptible difference between the two algorithms. For - quality levels above 90, however, the difference between JDCT_IFAST and + When the Independent JPEG Group's software was first released in 1991, + the compression time for a 1-megapixel JPEG image on a mainstream PC + was measured in minutes. Thus, JDCT_IFAST provided noticeable + performance benefits. On modern CPUs running libjpeg-turbo, however, + the compression time for a 1-megapixel JPEG image is measured in + milliseconds, and thus the performance benefits of JDCT_IFAST are much + less noticeable. On modern x86/x86-64 CPUs that support AVX2 + instructions, JDCT_IFAST and JDCT_ISLOW have similar performance. On + other types of CPUs, JDCT_IFAST is generally about 5-15% faster than + JDCT_ISLOW. + + For quality levels of 90 and below, there should be little or no + perceptible quality difference between the two algorithms. For quality + levels above 90, however, the difference between JDCT_IFAST and JDCT_ISLOW becomes more pronounced. With quality=97, for instance, - JDCT_IFAST incurs generally about a 1-3 dB loss (in PSNR) relative to + JDCT_IFAST incurs generally about a 1-3 dB loss in PSNR relative to JDCT_ISLOW, but this can be larger for some images. Do not use JDCT_IFAST with quality levels above 97. 
The algorithm often degenerates at quality=98 and above and can actually produce a more lossy image than if lower quality levels had been used. Also, in libjpeg-turbo, JDCT_IFAST is not fully accelerated for quality levels - above 97, so it will be slower than JDCT_ISLOW. JDCT_FLOAT is mainly a - legacy feature. It does not produce significantly more accurate - results than the ISLOW method, and it is much slower. The FLOAT method - may also give different results on different machines due to varying - roundoff behavior, whereas the integer methods should give the same - results on all machines. + above 97, so it will be slower than JDCT_ISLOW. + + JDCT_FLOAT does not produce significantly more accurate results than + JDCT_ISLOW, and it is much slower. JDCT_FLOAT may also give different + results on different machines due to varying roundoff behavior, whereas + the integer methods should give the same results on all machines. J_COLOR_SPACE jpeg_color_space int num_components @@ -1268,31 +1278,39 @@ J_DCT_METHOD dct_method Selects the algorithm used for the DCT step. Choices are: - JDCT_ISLOW: slow but accurate integer algorithm - JDCT_IFAST: faster, less accurate integer method - JDCT_FLOAT: floating-point method + JDCT_ISLOW: accurate integer method + JDCT_IFAST: less accurate integer method [legacy feature] + JDCT_FLOAT: floating-point method [legacy feature] JDCT_DEFAULT: default method (normally JDCT_ISLOW) JDCT_FASTEST: fastest method (normally JDCT_IFAST) - In libjpeg-turbo, JDCT_IFAST is generally about 5-15% faster than - JDCT_ISLOW when using the x86/x86-64 SIMD extensions (results may vary - with other SIMD implementations, or when using libjpeg-turbo without - SIMD extensions.) If the JPEG image was compressed using a quality - level of 85 or below, then there should be little or no perceptible - difference between the two algorithms. 
When decompressing images that - were compressed using quality levels above 85, however, the difference + When the Independent JPEG Group's software was first released in 1991, + the decompression time for a 1-megapixel JPEG image on a mainstream PC + was measured in minutes. Thus, JDCT_IFAST provided noticeable + performance benefits. On modern CPUs running libjpeg-turbo, however, + the decompression time for a 1-megapixel JPEG image is measured in + milliseconds, and thus the performance benefits of JDCT_IFAST are much + less noticeable. On modern x86/x86-64 CPUs that support AVX2 + instructions, JDCT_IFAST and JDCT_ISLOW have similar performance. On + other types of CPUs, JDCT_IFAST is generally about 5-15% faster than + JDCT_ISLOW. + + If the JPEG image was compressed using a quality level of 85 or below, + then there should be little or no perceptible quality difference + between the two algorithms. When decompressing images that were + compressed using quality levels above 85, however, the difference between JDCT_IFAST and JDCT_ISLOW becomes more pronounced. With images compressed using quality=97, for instance, JDCT_IFAST incurs generally - about a 4-6 dB loss (in PSNR) relative to JDCT_ISLOW, but this can be + about a 4-6 dB loss in PSNR relative to JDCT_ISLOW, but this can be larger for some images. If you can avoid it, do not use JDCT_IFAST when decompressing images that were compressed using quality levels above 97. The algorithm often degenerates for such images and can actually produce a more lossy output image than if the JPEG image had - been compressed using lower quality levels. JDCT_FLOAT is mainly a - legacy feature. It does not produce significantly more accurate - results than the ISLOW method, and it is much slower. The FLOAT method - may also give different results on different machines due to varying - roundoff behavior, whereas the integer methods should give the same - results on all machines. + been compressed using lower quality levels. 
+ + JDCT_FLOAT does not produce significantly more accurate results than + JDCT_ISLOW, and it is much slower. JDCT_FLOAT may also give different + results on different machines due to varying roundoff behavior, whereas + the integer methods should give the same results on all machines. boolean do_fancy_upsampling If TRUE, do careful upsampling of chroma components. If FALSE, diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/LICENSE.md b/src/3rdparty/chromium/third_party/libjpeg_turbo/LICENSE.md --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/LICENSE.md 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/LICENSE.md 2021-11-20 03:41:33.389600594 +0000 @@ -91,7 +91,7 @@ The Modified (3-clause) BSD License =================================== -Copyright (C)2009-2020 D. R. Commander. All Rights Reserved. +Copyright (C)2009-2021 D. R. Commander. All Rights Reserved.
Copyright (C)2015 Viktor Szathmáry. All Rights Reserved. Redistribution and use in source and binary forms, with or without diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/OWNERS b/src/3rdparty/chromium/third_party/libjpeg_turbo/OWNERS --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/OWNERS 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/OWNERS 2021-11-20 03:41:33.389600594 +0000 @@ -1,5 +1,3 @@ scroggo@google.com cblume@chromium.org jonathan.wright@arm.com - -# COMPONENT: Internals>Images>Codecs diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/rdbmp.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/rdbmp.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/rdbmp.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/rdbmp.c 2021-11-20 03:41:33.396600482 +0000 @@ -6,13 +6,13 @@ * Modified 2009-2017 by Guido Vollbeding. * libjpeg-turbo Modifications: * Modified 2011 by Siarhei Siamashka. - * Copyright (C) 2015, 2017-2018, D. R. Commander. + * Copyright (C) 2015, 2017-2018, 2021, D. R. Commander. * For conditions of distribution and use, see the accompanying README.ijg * file. * * This file contains routines to read input images in Microsoft "BMP" * format (MS Windows 3.x, OS/2 1.x, and OS/2 2.x flavors). - * Currently, only 8-bit and 24-bit images are supported, not 1-bit or + * Currently, only 8-, 24-, and 32-bit images are supported, not 1-bit or * 4-bit (feeding such low-depth images into JPEG would be silly anyway). * Also, we don't support RLE-compressed files.
* @@ -34,18 +34,8 @@ /* Macros to deal with unsigned chars as efficiently as compiler allows */ -#ifdef HAVE_UNSIGNED_CHAR typedef unsigned char U_CHAR; #define UCH(x) ((int)(x)) -#else /* !HAVE_UNSIGNED_CHAR */ -#ifdef __CHAR_UNSIGNED__ -typedef char U_CHAR; -#define UCH(x) ((int)(x)) -#else -typedef char U_CHAR; -#define UCH(x) ((int)(x) & 0xFF) -#endif -#endif /* HAVE_UNSIGNED_CHAR */ #define ReadOK(file, buffer, len) \ @@ -71,7 +61,7 @@ JDIMENSION source_row; /* Current source row number */ JDIMENSION row_width; /* Physical width of scanlines in file */ - int bits_per_pixel; /* remembers 8- or 24-bit format */ + int bits_per_pixel; /* remembers 8-, 24-, or 32-bit format */ int cmap_length; /* colormap length */ boolean use_inversion_array; /* TRUE = preload the whole image, which is @@ -179,14 +169,14 @@ outptr = source->pub.buffer[0]; if (cinfo->in_color_space == JCS_GRAYSCALE) { for (col = cinfo->image_width; col > 0; col--) { - t = GETJSAMPLE(*inptr++); + t = *inptr++; if (t >= cmaplen) ERREXIT(cinfo, JERR_BMP_OUTOFRANGE); *outptr++ = colormap[0][t]; } } else if (cinfo->in_color_space == JCS_CMYK) { for (col = cinfo->image_width; col > 0; col--) { - t = GETJSAMPLE(*inptr++); + t = *inptr++; if (t >= cmaplen) ERREXIT(cinfo, JERR_BMP_OUTOFRANGE); rgb_to_cmyk(colormap[0][t], colormap[1][t], colormap[2][t], outptr, @@ -202,7 +192,7 @@ if (aindex >= 0) { for (col = cinfo->image_width; col > 0; col--) { - t = GETJSAMPLE(*inptr++); + t = *inptr++; if (t >= cmaplen) ERREXIT(cinfo, JERR_BMP_OUTOFRANGE); outptr[rindex] = colormap[0][t]; @@ -213,7 +203,7 @@ } } else { for (col = cinfo->image_width; col > 0; col--) { - t = GETJSAMPLE(*inptr++); + t = *inptr++; if (t >= cmaplen) ERREXIT(cinfo, JERR_BMP_OUTOFRANGE); outptr[rindex] = colormap[0][t]; @@ -258,7 +248,6 @@ MEMCOPY(outptr, inptr, source->row_width); } else if (cinfo->in_color_space == JCS_CMYK) { for (col = cinfo->image_width; col > 0; col--) { - /* can omit GETJSAMPLE() safely */ JSAMPLE b = *inptr++, g = 
*inptr++, r = *inptr++; rgb_to_cmyk(r, g, b, outptr, outptr + 1, outptr + 2, outptr + 3); outptr += 4; @@ -272,7 +261,7 @@ if (aindex >= 0) { for (col = cinfo->image_width; col > 0; col--) { - outptr[bindex] = *inptr++; /* can omit GETJSAMPLE() safely */ + outptr[bindex] = *inptr++; outptr[gindex] = *inptr++; outptr[rindex] = *inptr++; outptr[aindex] = 0xFF; @@ -280,7 +269,7 @@ } } else { for (col = cinfo->image_width; col > 0; col--) { - outptr[bindex] = *inptr++; /* can omit GETJSAMPLE() safely */ + outptr[bindex] = *inptr++; outptr[gindex] = *inptr++; outptr[rindex] = *inptr++; outptr += ps; @@ -323,7 +312,6 @@ MEMCOPY(outptr, inptr, source->row_width); } else if (cinfo->in_color_space == JCS_CMYK) { for (col = cinfo->image_width; col > 0; col--) { - /* can omit GETJSAMPLE() safely */ JSAMPLE b = *inptr++, g = *inptr++, r = *inptr++; rgb_to_cmyk(r, g, b, outptr, outptr + 1, outptr + 2, outptr + 3); inptr++; /* skip the 4th byte (Alpha channel) */ @@ -338,7 +326,7 @@ if (aindex >= 0) { for (col = cinfo->image_width; col > 0; col--) { - outptr[bindex] = *inptr++; /* can omit GETJSAMPLE() safely */ + outptr[bindex] = *inptr++; outptr[gindex] = *inptr++; outptr[rindex] = *inptr++; outptr[aindex] = *inptr++; @@ -346,7 +334,7 @@ } } else { for (col = cinfo->image_width; col > 0; col--) { - outptr[bindex] = *inptr++; /* can omit GETJSAMPLE() safely */ + outptr[bindex] = *inptr++; outptr[gindex] = *inptr++; outptr[rindex] = *inptr++; inptr++; /* skip the 4th byte (Alpha channel) */ @@ -436,14 +424,14 @@ (((unsigned int)UCH(array[offset + 2])) << 16) + \ (((unsigned int)UCH(array[offset + 3])) << 24)) - unsigned int bfOffBits; - unsigned int headerSize; + int bfOffBits; + int headerSize; int biWidth; int biHeight; unsigned short biPlanes; unsigned int biCompression; int biXPelsPerMeter, biYPelsPerMeter; - unsigned int biClrUsed = 0; + int biClrUsed = 0; int mapentrysize = 0; /* 0 indicates no colormap */ int bPad; JDIMENSION row_width = 0; @@ -462,7 +450,7 @@ if 
(!ReadOK(source->pub.input_file, bmpinfoheader, 4)) ERREXIT(cinfo, JERR_INPUT_EOF); headerSize = GET_4B(bmpinfoheader, 0); - if (headerSize < 12 || headerSize > 64) + if (headerSize < 12 || headerSize > 64 || (headerSize + 14) > bfOffBits) ERREXIT(cinfo, JERR_BMP_BADHEADER); if (!ReadOK(source->pub.input_file, bmpinfoheader + 4, headerSize - 4)) ERREXIT(cinfo, JERR_INPUT_EOF); @@ -481,7 +469,9 @@ TRACEMS2(cinfo, 1, JTRC_BMP_OS2_MAPPED, biWidth, biHeight); break; case 24: /* RGB image */ - TRACEMS2(cinfo, 1, JTRC_BMP_OS2, biWidth, biHeight); + case 32: /* RGB image + Alpha channel */ + TRACEMS3(cinfo, 1, JTRC_BMP_OS2, biWidth, biHeight, + source->bits_per_pixel); break; default: ERREXIT(cinfo, JERR_BMP_BADDEPTH); @@ -508,10 +498,8 @@ TRACEMS2(cinfo, 1, JTRC_BMP_MAPPED, biWidth, biHeight); break; case 24: /* RGB image */ - TRACEMS2(cinfo, 1, JTRC_BMP, biWidth, biHeight); - break; case 32: /* RGB image + Alpha channel */ - TRACEMS2(cinfo, 1, JTRC_BMP, biWidth, biHeight); + TRACEMS3(cinfo, 1, JTRC_BMP, biWidth, biHeight, source->bits_per_pixel); break; default: ERREXIT(cinfo, JERR_BMP_BADDEPTH); @@ -534,6 +522,11 @@ if (biWidth <= 0 || biHeight <= 0) ERREXIT(cinfo, JERR_BMP_EMPTY); +#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION + if (sinfo->max_pixels && + (unsigned long long)biWidth * biHeight > sinfo->max_pixels) + ERREXIT(cinfo, JERR_WIDTH_OVERFLOW); +#endif if (biPlanes != 1) ERREXIT(cinfo, JERR_BMP_BADPLANES); @@ -587,7 +580,9 @@ cinfo->input_components = 4; else ERREXIT(cinfo, JERR_BAD_IN_COLORSPACE); - row_width = (JDIMENSION)(biWidth * 3); + if ((unsigned long long)biWidth * 3ULL > 0xFFFFFFFFULL) + ERREXIT(cinfo, JERR_WIDTH_OVERFLOW); + row_width = (JDIMENSION)biWidth * 3; break; case 32: if (cinfo->in_color_space == JCS_UNKNOWN) @@ -598,7 +593,9 @@ cinfo->input_components = 4; else ERREXIT(cinfo, JERR_BAD_IN_COLORSPACE); - row_width = (JDIMENSION)(biWidth * 4); + if ((unsigned long long)biWidth * 4ULL > 0xFFFFFFFFULL) + ERREXIT(cinfo, JERR_WIDTH_OVERFLOW); + 
row_width = (JDIMENSION)biWidth * 4; break; default: ERREXIT(cinfo, JERR_BMP_BADDEPTH); @@ -643,7 +640,7 @@ /* Allocate one-row buffer for returned data */ source->pub.buffer = (*cinfo->mem->alloc_sarray) ((j_common_ptr)cinfo, JPOOL_IMAGE, - (JDIMENSION)(biWidth * cinfo->input_components), (JDIMENSION)1); + (JDIMENSION)biWidth * (JDIMENSION)cinfo->input_components, (JDIMENSION)1); source->pub.buffer_height = 1; cinfo->data_precision = 8; @@ -680,6 +677,9 @@ /* Fill in method ptrs, except get_pixel_rows which start_input sets */ source->pub.start_input = start_input_bmp; source->pub.finish_input = finish_input_bmp; +#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION + source->pub.max_pixels = 0; +#endif source->use_inversion_array = use_inversion_array; diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/rdcolmap.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/rdcolmap.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/rdcolmap.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/rdcolmap.c 2021-11-20 03:41:33.396600482 +0000 @@ -54,9 +54,8 @@ /* Check for duplicate color. */ for (index = 0; index < ncolors; index++) { - if (GETJSAMPLE(colormap0[index]) == R && - GETJSAMPLE(colormap1[index]) == G && - GETJSAMPLE(colormap2[index]) == B) + if (colormap0[index] == R && colormap1[index] == G && + colormap2[index] == B) return; /* color is already in map */ } diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/rdgif.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/rdgif.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/rdgif.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/rdgif.c 2021-11-20 03:41:33.397600466 +0000 @@ -1,29 +1,673 @@ /* * rdgif.c * + * This file was part of the Independent JPEG Group's software: * Copyright (C) 1991-1997, Thomas G. Lane. - * This file is part of the Independent JPEG Group's software. + * Modified 2019 by Guido Vollbeding. 
+ * libjpeg-turbo Modifications: + * Copyright (C) 2021, D. R. Commander. * For conditions of distribution and use, see the accompanying README.ijg * file. * * This file contains routines to read input images in GIF format. * - ***************************************************************************** - * NOTE: to avoid entanglements with Unisys' patent on LZW compression, * - * the ability to read GIF files has been removed from the IJG distribution. * - * Sorry about that. * - ***************************************************************************** - * - * We are required to state that - * "The Graphics Interchange Format(c) is the Copyright property of - * CompuServe Incorporated. GIF(sm) is a Service Mark property of - * CompuServe Incorporated." + * These routines may need modification for non-Unix environments or + * specialized applications. As they stand, they assume input from + * an ordinary stdio stream. They further assume that reading begins + * at the start of the file; start_input may need work if the + * user interface has already read some data (e.g., to determine that + * the file is indeed GIF format). + */ + +/* + * This code is loosely based on giftoppm from the PBMPLUS distribution + * of Feb. 1991. That file contains the following copyright notice: + * +-------------------------------------------------------------------+ + * | Copyright 1990, David Koblas. | + * | Permission to use, copy, modify, and distribute this software | + * | and its documentation for any purpose and without fee is hereby | + * | granted, provided that the above copyright notice appear in all | + * | copies and that both that copyright notice and this permission | + * | notice appear in supporting documentation. This software is | + * | provided "as is" without express or implied warranty. 
| + * +-------------------------------------------------------------------+ */ #include "cdjpeg.h" /* Common decls for cjpeg/djpeg applications */ #ifdef GIF_SUPPORTED + +/* Macros to deal with unsigned chars as efficiently as compiler allows */ + +typedef unsigned char U_CHAR; +#define UCH(x) ((int)(x)) + + +#define ReadOK(file, buffer, len) \ + (JFREAD(file, buffer, len) == ((size_t)(len))) + + +#define MAXCOLORMAPSIZE 256 /* max # of colors in a GIF colormap */ +#define NUMCOLORS 3 /* # of colors */ +#define CM_RED 0 /* color component numbers */ +#define CM_GREEN 1 +#define CM_BLUE 2 + +#define MAX_LZW_BITS 12 /* maximum LZW code size */ +#define LZW_TABLE_SIZE (1 << MAX_LZW_BITS) /* # of possible LZW symbols */ + +/* Macros for extracting header data --- note we assume chars may be signed */ + +#define LM_to_uint(array, offset) \ + ((unsigned int)UCH(array[offset]) + \ + (((unsigned int)UCH(array[offset + 1])) << 8)) + +#define BitSet(byte, bit) ((byte) & (bit)) +#define INTERLACE 0x40 /* mask for bit signifying interlaced image */ +#define COLORMAPFLAG 0x80 /* mask for bit signifying colormap presence */ + + +/* + * LZW decompression tables look like this: + * symbol_head[K] = prefix symbol of any LZW symbol K (0..LZW_TABLE_SIZE-1) + * symbol_tail[K] = suffix byte of any LZW symbol K (0..LZW_TABLE_SIZE-1) + * Note that entries 0..end_code of the above tables are not used, + * since those symbols represent raw bytes or special codes. + * + * The stack represents the not-yet-used expansion of the last LZW symbol. + * In the worst case, a symbol could expand to as many bytes as there are + * LZW symbols, so we allocate LZW_TABLE_SIZE bytes for the stack. + * (This is conservative since that number includes the raw-byte symbols.) 
+ */ + + +/* Private version of data source object */ + +typedef struct { + struct cjpeg_source_struct pub; /* public fields */ + + j_compress_ptr cinfo; /* back link saves passing separate parm */ + + JSAMPARRAY colormap; /* GIF colormap (converted to my format) */ + + /* State for GetCode and LZWReadByte */ + U_CHAR code_buf[256 + 4]; /* current input data block */ + int last_byte; /* # of bytes in code_buf */ + int last_bit; /* # of bits in code_buf */ + int cur_bit; /* next bit index to read */ + boolean first_time; /* flags first call to GetCode */ + boolean out_of_blocks; /* TRUE if hit terminator data block */ + + int input_code_size; /* codesize given in GIF file */ + int clear_code, end_code; /* values for Clear and End codes */ + + int code_size; /* current actual code size */ + int limit_code; /* 2^code_size */ + int max_code; /* first unused code value */ + + /* Private state for LZWReadByte */ + int oldcode; /* previous LZW symbol */ + int firstcode; /* first byte of oldcode's expansion */ + + /* LZW symbol table and expansion stack */ + UINT16 *symbol_head; /* => table of prefix symbols */ + UINT8 *symbol_tail; /* => table of suffix bytes */ + UINT8 *symbol_stack; /* => stack for symbol expansions */ + UINT8 *sp; /* stack pointer */ + + /* State for interlaced image processing */ + boolean is_interlaced; /* TRUE if have interlaced image */ + jvirt_sarray_ptr interlaced_image; /* full image in interlaced order */ + JDIMENSION cur_row_number; /* need to know actual row number */ + JDIMENSION pass2_offset; /* # of pixel rows in pass 1 */ + JDIMENSION pass3_offset; /* # of pixel rows in passes 1&2 */ + JDIMENSION pass4_offset; /* # of pixel rows in passes 1,2,3 */ +} gif_source_struct; + +typedef gif_source_struct *gif_source_ptr; + + +/* Forward declarations */ +METHODDEF(JDIMENSION) get_pixel_rows(j_compress_ptr cinfo, + cjpeg_source_ptr sinfo); +METHODDEF(JDIMENSION) load_interlaced_image(j_compress_ptr cinfo, + cjpeg_source_ptr sinfo); 
+METHODDEF(JDIMENSION) get_interlaced_row(j_compress_ptr cinfo, + cjpeg_source_ptr sinfo); + + +LOCAL(int) +ReadByte(gif_source_ptr sinfo) +/* Read next byte from GIF file */ +{ + register FILE *infile = sinfo->pub.input_file; + register int c; + + if ((c = getc(infile)) == EOF) + ERREXIT(sinfo->cinfo, JERR_INPUT_EOF); + return c; +} + + +LOCAL(int) +GetDataBlock(gif_source_ptr sinfo, U_CHAR *buf) +/* Read a GIF data block, which has a leading count byte */ +/* A zero-length block marks the end of a data block sequence */ +{ + int count; + + count = ReadByte(sinfo); + if (count > 0) { + if (!ReadOK(sinfo->pub.input_file, buf, count)) + ERREXIT(sinfo->cinfo, JERR_INPUT_EOF); + } + return count; +} + + +LOCAL(void) +SkipDataBlocks(gif_source_ptr sinfo) +/* Skip a series of data blocks, until a block terminator is found */ +{ + U_CHAR buf[256]; + + while (GetDataBlock(sinfo, buf) > 0) + /* skip */; +} + + +LOCAL(void) +ReInitLZW(gif_source_ptr sinfo) +/* (Re)initialize LZW state; shared code for startup and Clear processing */ +{ + sinfo->code_size = sinfo->input_code_size + 1; + sinfo->limit_code = sinfo->clear_code << 1; /* 2^code_size */ + sinfo->max_code = sinfo->clear_code + 2; /* first unused code value */ + sinfo->sp = sinfo->symbol_stack; /* init stack to empty */ +} + + +LOCAL(void) +InitLZWCode(gif_source_ptr sinfo) +/* Initialize for a series of LZWReadByte (and hence GetCode) calls */ +{ + /* GetCode initialization */ + sinfo->last_byte = 2; /* make safe to "recopy last two bytes" */ + sinfo->code_buf[0] = 0; + sinfo->code_buf[1] = 0; + sinfo->last_bit = 0; /* nothing in the buffer */ + sinfo->cur_bit = 0; /* force buffer load on first call */ + sinfo->first_time = TRUE; + sinfo->out_of_blocks = FALSE; + + /* LZWReadByte initialization: */ + /* compute special code values (note that these do not change later) */ + sinfo->clear_code = 1 << sinfo->input_code_size; + sinfo->end_code = sinfo->clear_code + 1; + ReInitLZW(sinfo); +} + + +LOCAL(int) 
+GetCode(gif_source_ptr sinfo) +/* Fetch the next code_size bits from the GIF data */ +/* We assume code_size is less than 16 */ +{ + register int accum; + int offs, count; + + while (sinfo->cur_bit + sinfo->code_size > sinfo->last_bit) { + /* Time to reload the buffer */ + /* First time, share code with Clear case */ + if (sinfo->first_time) { + sinfo->first_time = FALSE; + return sinfo->clear_code; + } + if (sinfo->out_of_blocks) { + WARNMS(sinfo->cinfo, JWRN_GIF_NOMOREDATA); + return sinfo->end_code; /* fake something useful */ + } + /* preserve last two bytes of what we have -- assume code_size <= 16 */ + sinfo->code_buf[0] = sinfo->code_buf[sinfo->last_byte-2]; + sinfo->code_buf[1] = sinfo->code_buf[sinfo->last_byte-1]; + /* Load more bytes; set flag if we reach the terminator block */ + if ((count = GetDataBlock(sinfo, &sinfo->code_buf[2])) == 0) { + sinfo->out_of_blocks = TRUE; + WARNMS(sinfo->cinfo, JWRN_GIF_NOMOREDATA); + return sinfo->end_code; /* fake something useful */ + } + /* Reset counters */ + sinfo->cur_bit = (sinfo->cur_bit - sinfo->last_bit) + 16; + sinfo->last_byte = 2 + count; + sinfo->last_bit = sinfo->last_byte * 8; + } + + /* Form up next 24 bits in accum */ + offs = sinfo->cur_bit >> 3; /* byte containing cur_bit */ + accum = UCH(sinfo->code_buf[offs + 2]); + accum <<= 8; + accum |= UCH(sinfo->code_buf[offs + 1]); + accum <<= 8; + accum |= UCH(sinfo->code_buf[offs]); + + /* Right-align cur_bit in accum, then mask off desired number of bits */ + accum >>= (sinfo->cur_bit & 7); + sinfo->cur_bit += sinfo->code_size; + return accum & ((1 << sinfo->code_size) - 1); +} + + +LOCAL(int) +LZWReadByte(gif_source_ptr sinfo) +/* Read an LZW-compressed byte */ +{ + register int code; /* current working code */ + int incode; /* saves actual input code */ + + /* If any codes are stacked from a previously read symbol, return them */ + if (sinfo->sp > sinfo->symbol_stack) + return (int)(*(--sinfo->sp)); + + /* Time to read a new symbol */ + code = 
GetCode(sinfo); + + if (code == sinfo->clear_code) { + /* Reinit state, swallow any extra Clear codes, and */ + /* return next code, which is expected to be a raw byte. */ + ReInitLZW(sinfo); + do { + code = GetCode(sinfo); + } while (code == sinfo->clear_code); + if (code > sinfo->clear_code) { /* make sure it is a raw byte */ + WARNMS(sinfo->cinfo, JWRN_GIF_BADDATA); + code = 0; /* use something valid */ + } + /* make firstcode, oldcode valid! */ + sinfo->firstcode = sinfo->oldcode = code; + return code; + } + + if (code == sinfo->end_code) { + /* Skip the rest of the image, unless GetCode already read terminator */ + if (!sinfo->out_of_blocks) { + SkipDataBlocks(sinfo); + sinfo->out_of_blocks = TRUE; + } + /* Complain that there's not enough data */ + WARNMS(sinfo->cinfo, JWRN_GIF_ENDCODE); + /* Pad data with 0's */ + return 0; /* fake something usable */ + } + + /* Got normal raw byte or LZW symbol */ + incode = code; /* save for a moment */ + + if (code >= sinfo->max_code) { /* special case for not-yet-defined symbol */ + /* code == max_code is OK; anything bigger is bad data */ + if (code > sinfo->max_code) { + WARNMS(sinfo->cinfo, JWRN_GIF_BADDATA); + incode = 0; /* prevent creation of loops in symbol table */ + } + /* this symbol will be defined as oldcode/firstcode */ + *(sinfo->sp++) = (UINT8)sinfo->firstcode; + code = sinfo->oldcode; + } + + /* If it's a symbol, expand it into the stack */ + while (code >= sinfo->clear_code) { + *(sinfo->sp++) = sinfo->symbol_tail[code]; /* tail is a byte value */ + code = sinfo->symbol_head[code]; /* head is another LZW symbol */ + } + /* At this point code just represents a raw byte */ + sinfo->firstcode = code; /* save for possible future use */ + + /* If there's room in table... 
*/ + if ((code = sinfo->max_code) < LZW_TABLE_SIZE) { + /* Define a new symbol = prev sym + head of this sym's expansion */ + sinfo->symbol_head[code] = (UINT16)sinfo->oldcode; + sinfo->symbol_tail[code] = (UINT8)sinfo->firstcode; + sinfo->max_code++; + /* Is it time to increase code_size? */ + if (sinfo->max_code >= sinfo->limit_code && + sinfo->code_size < MAX_LZW_BITS) { + sinfo->code_size++; + sinfo->limit_code <<= 1; /* keep equal to 2^code_size */ + } + } + + sinfo->oldcode = incode; /* save last input symbol for future use */ + return sinfo->firstcode; /* return first byte of symbol's expansion */ +} + + +LOCAL(void) +ReadColorMap(gif_source_ptr sinfo, int cmaplen, JSAMPARRAY cmap) +/* Read a GIF colormap */ +{ + int i; + + for (i = 0; i < cmaplen; i++) { +#if BITS_IN_JSAMPLE == 8 +#define UPSCALE(x) (x) +#else +#define UPSCALE(x) ((x) << (BITS_IN_JSAMPLE - 8)) +#endif + cmap[CM_RED][i] = (JSAMPLE)UPSCALE(ReadByte(sinfo)); + cmap[CM_GREEN][i] = (JSAMPLE)UPSCALE(ReadByte(sinfo)); + cmap[CM_BLUE][i] = (JSAMPLE)UPSCALE(ReadByte(sinfo)); + } +} + + +LOCAL(void) +DoExtension(gif_source_ptr sinfo) +/* Process an extension block */ +/* Currently we ignore 'em all */ +{ + int extlabel; + + /* Read extension label byte */ + extlabel = ReadByte(sinfo); + TRACEMS1(sinfo->cinfo, 1, JTRC_GIF_EXTENSION, extlabel); + /* Skip the data block(s) associated with the extension */ + SkipDataBlocks(sinfo); +} + + +/* + * Read the file header; return image size and component count. 
+ */ + +METHODDEF(void) +start_input_gif(j_compress_ptr cinfo, cjpeg_source_ptr sinfo) +{ + gif_source_ptr source = (gif_source_ptr)sinfo; + U_CHAR hdrbuf[10]; /* workspace for reading control blocks */ + unsigned int width, height; /* image dimensions */ + int colormaplen, aspectRatio; + int c; + + /* Read and verify GIF Header */ + if (!ReadOK(source->pub.input_file, hdrbuf, 6)) + ERREXIT(cinfo, JERR_GIF_NOT); + if (hdrbuf[0] != 'G' || hdrbuf[1] != 'I' || hdrbuf[2] != 'F') + ERREXIT(cinfo, JERR_GIF_NOT); + /* Check for expected version numbers. + * If unknown version, give warning and try to process anyway; + * this is per recommendation in GIF89a standard. + */ + if ((hdrbuf[3] != '8' || hdrbuf[4] != '7' || hdrbuf[5] != 'a') && + (hdrbuf[3] != '8' || hdrbuf[4] != '9' || hdrbuf[5] != 'a')) + TRACEMS3(cinfo, 1, JTRC_GIF_BADVERSION, hdrbuf[3], hdrbuf[4], hdrbuf[5]); + + /* Read and decipher Logical Screen Descriptor */ + if (!ReadOK(source->pub.input_file, hdrbuf, 7)) + ERREXIT(cinfo, JERR_INPUT_EOF); + width = LM_to_uint(hdrbuf, 0); + height = LM_to_uint(hdrbuf, 2); + if (width == 0 || height == 0) + ERREXIT(cinfo, JERR_GIF_EMPTY); +#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION + if (sinfo->max_pixels && + (unsigned long long)width * height > sinfo->max_pixels) + ERREXIT(cinfo, JERR_WIDTH_OVERFLOW); +#endif + /* we ignore the color resolution, sort flag, and background color index */ + aspectRatio = UCH(hdrbuf[6]); + if (aspectRatio != 0 && aspectRatio != 49) + TRACEMS(cinfo, 1, JTRC_GIF_NONSQUARE); + + /* Allocate space to store the colormap */ + source->colormap = (*cinfo->mem->alloc_sarray) + ((j_common_ptr)cinfo, JPOOL_IMAGE, (JDIMENSION)MAXCOLORMAPSIZE, + (JDIMENSION)NUMCOLORS); + colormaplen = 0; /* indicate initialization */ + + /* Read global colormap if header indicates it is present */ + if (BitSet(hdrbuf[4], COLORMAPFLAG)) { + colormaplen = 2 << (hdrbuf[4] & 0x07); + ReadColorMap(source, colormaplen, source->colormap); + } + + /* Scan until we reach 
start of desired image. + * We don't currently support skipping images, but could add it easily. + */ + for (;;) { + c = ReadByte(source); + + if (c == ';') /* GIF terminator?? */ + ERREXIT(cinfo, JERR_GIF_IMAGENOTFOUND); + + if (c == '!') { /* Extension */ + DoExtension(source); + continue; + } + + if (c != ',') { /* Not an image separator? */ + WARNMS1(cinfo, JWRN_GIF_CHAR, c); + continue; + } + + /* Read and decipher Local Image Descriptor */ + if (!ReadOK(source->pub.input_file, hdrbuf, 9)) + ERREXIT(cinfo, JERR_INPUT_EOF); + /* we ignore top/left position info, also sort flag */ + width = LM_to_uint(hdrbuf, 4); + height = LM_to_uint(hdrbuf, 6); + if (width == 0 || height == 0) + ERREXIT(cinfo, JERR_GIF_EMPTY); +#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION + if (sinfo->max_pixels && + (unsigned long long)width * height > sinfo->max_pixels) + ERREXIT(cinfo, JERR_WIDTH_OVERFLOW); +#endif + source->is_interlaced = (BitSet(hdrbuf[8], INTERLACE) != 0); + + /* Read local colormap if header indicates it is present */ + /* Note: if we wanted to support skipping images, */ + /* we'd need to skip rather than read colormap for ignored images */ + if (BitSet(hdrbuf[8], COLORMAPFLAG)) { + colormaplen = 2 << (hdrbuf[8] & 0x07); + ReadColorMap(source, colormaplen, source->colormap); + } + + source->input_code_size = ReadByte(source); /* get min-code-size byte */ + if (source->input_code_size < 2 || source->input_code_size > 8) + ERREXIT1(cinfo, JERR_GIF_CODESIZE, source->input_code_size); + + /* Reached desired image, so break out of loop */ + /* If we wanted to skip this image, */ + /* we'd call SkipDataBlocks and then continue the loop */ + break; + } + + /* Prepare to read selected image: first initialize LZW decompressor */ + source->symbol_head = (UINT16 *) + (*cinfo->mem->alloc_large) ((j_common_ptr)cinfo, JPOOL_IMAGE, + LZW_TABLE_SIZE * sizeof(UINT16)); + source->symbol_tail = (UINT8 *) + (*cinfo->mem->alloc_large) ((j_common_ptr)cinfo, JPOOL_IMAGE, + LZW_TABLE_SIZE 
* sizeof(UINT8)); + source->symbol_stack = (UINT8 *) + (*cinfo->mem->alloc_large) ((j_common_ptr)cinfo, JPOOL_IMAGE, + LZW_TABLE_SIZE * sizeof(UINT8)); + InitLZWCode(source); + + /* + * If image is interlaced, we read it into a full-size sample array, + * decompressing as we go; then get_interlaced_row selects rows from the + * sample array in the proper order. + */ + if (source->is_interlaced) { + /* We request the virtual array now, but can't access it until virtual + * arrays have been allocated. Hence, the actual work of reading the + * image is postponed until the first call to get_pixel_rows. + */ + source->interlaced_image = (*cinfo->mem->request_virt_sarray) + ((j_common_ptr)cinfo, JPOOL_IMAGE, FALSE, + (JDIMENSION)width, (JDIMENSION)height, (JDIMENSION)1); + if (cinfo->progress != NULL) { + cd_progress_ptr progress = (cd_progress_ptr)cinfo->progress; + progress->total_extra_passes++; /* count file input as separate pass */ + } + source->pub.get_pixel_rows = load_interlaced_image; + } else { + source->pub.get_pixel_rows = get_pixel_rows; + } + + /* Create compressor input buffer. */ + source->pub.buffer = (*cinfo->mem->alloc_sarray) + ((j_common_ptr)cinfo, JPOOL_IMAGE, (JDIMENSION)width * NUMCOLORS, + (JDIMENSION)1); + source->pub.buffer_height = 1; + + /* Pad colormap for safety. */ + for (c = colormaplen; c < source->clear_code; c++) { + source->colormap[CM_RED][c] = + source->colormap[CM_GREEN][c] = + source->colormap[CM_BLUE][c] = CENTERJSAMPLE; + } + + /* Return info about the image. */ + cinfo->in_color_space = JCS_RGB; + cinfo->input_components = NUMCOLORS; + cinfo->data_precision = BITS_IN_JSAMPLE; /* we always rescale data to this */ + cinfo->image_width = width; + cinfo->image_height = height; + + TRACEMS3(cinfo, 1, JTRC_GIF, width, height, colormaplen); +} + + +/* + * Read one row of pixels. + * This version is used for noninterlaced GIF images: + * we read directly from the GIF file. 
+ */ + +METHODDEF(JDIMENSION) +get_pixel_rows(j_compress_ptr cinfo, cjpeg_source_ptr sinfo) +{ + gif_source_ptr source = (gif_source_ptr)sinfo; + register int c; + register JSAMPROW ptr; + register JDIMENSION col; + register JSAMPARRAY colormap = source->colormap; + + ptr = source->pub.buffer[0]; + for (col = cinfo->image_width; col > 0; col--) { + c = LZWReadByte(source); + *ptr++ = colormap[CM_RED][c]; + *ptr++ = colormap[CM_GREEN][c]; + *ptr++ = colormap[CM_BLUE][c]; + } + return 1; +} + + +/* + * Read one row of pixels. + * This version is used for the first call on get_pixel_rows when + * reading an interlaced GIF file: we read the whole image into memory. + */ + +METHODDEF(JDIMENSION) +load_interlaced_image(j_compress_ptr cinfo, cjpeg_source_ptr sinfo) +{ + gif_source_ptr source = (gif_source_ptr)sinfo; + register JSAMPROW sptr; + register JDIMENSION col; + JDIMENSION row; + cd_progress_ptr progress = (cd_progress_ptr)cinfo->progress; + + /* Read the interlaced image into the virtual array we've created. */ + for (row = 0; row < cinfo->image_height; row++) { + if (progress != NULL) { + progress->pub.pass_counter = (long)row; + progress->pub.pass_limit = (long)cinfo->image_height; + (*progress->pub.progress_monitor) ((j_common_ptr)cinfo); + } + sptr = *(*cinfo->mem->access_virt_sarray) + ((j_common_ptr)cinfo, source->interlaced_image, row, (JDIMENSION)1, + TRUE); + for (col = cinfo->image_width; col > 0; col--) { + *sptr++ = (JSAMPLE)LZWReadByte(source); + } + } + if (progress != NULL) + progress->completed_extra_passes++; + + /* Replace method pointer so subsequent calls don't come here. */ + source->pub.get_pixel_rows = get_interlaced_row; + /* Initialize for get_interlaced_row, and perform first call on it. 
*/ + source->cur_row_number = 0; + source->pass2_offset = (cinfo->image_height + 7) / 8; + source->pass3_offset = source->pass2_offset + (cinfo->image_height + 3) / 8; + source->pass4_offset = source->pass3_offset + (cinfo->image_height + 1) / 4; + + return get_interlaced_row(cinfo, sinfo); +} + + +/* + * Read one row of pixels. + * This version is used for interlaced GIF images: + * we read from the virtual array. + */ + +METHODDEF(JDIMENSION) +get_interlaced_row(j_compress_ptr cinfo, cjpeg_source_ptr sinfo) +{ + gif_source_ptr source = (gif_source_ptr)sinfo; + register int c; + register JSAMPROW sptr, ptr; + register JDIMENSION col; + register JSAMPARRAY colormap = source->colormap; + JDIMENSION irow; + + /* Figure out which row of interlaced image is needed, and access it. */ + switch ((int)(source->cur_row_number & 7)) { + case 0: /* first-pass row */ + irow = source->cur_row_number >> 3; + break; + case 4: /* second-pass row */ + irow = (source->cur_row_number >> 3) + source->pass2_offset; + break; + case 2: /* third-pass row */ + case 6: + irow = (source->cur_row_number >> 2) + source->pass3_offset; + break; + default: /* fourth-pass row */ + irow = (source->cur_row_number >> 1) + source->pass4_offset; + } + sptr = *(*cinfo->mem->access_virt_sarray) + ((j_common_ptr)cinfo, source->interlaced_image, irow, (JDIMENSION)1, + FALSE); + /* Scan the row, expand colormap, and output */ + ptr = source->pub.buffer[0]; + for (col = cinfo->image_width; col > 0; col--) { + c = *sptr++; + *ptr++ = colormap[CM_RED][c]; + *ptr++ = colormap[CM_GREEN][c]; + *ptr++ = colormap[CM_BLUE][c]; + } + source->cur_row_number++; /* for next time */ + return 1; +} + + +/* + * Finish up at the end of the file. + */ + +METHODDEF(void) +finish_input_gif(j_compress_ptr cinfo, cjpeg_source_ptr sinfo) +{ + /* no work */ +} + + /* * The module selection routine for GIF format input. 
*/ @@ -31,9 +675,21 @@ GLOBAL(cjpeg_source_ptr) jinit_read_gif(j_compress_ptr cinfo) { - fprintf(stderr, "GIF input is unsupported for legal reasons. Sorry.\n"); - exit(EXIT_FAILURE); - return NULL; /* keep compiler happy */ + gif_source_ptr source; + + /* Create module interface object */ + source = (gif_source_ptr) + (*cinfo->mem->alloc_small) ((j_common_ptr)cinfo, JPOOL_IMAGE, + sizeof(gif_source_struct)); + source->cinfo = cinfo; /* make back link for subroutines */ + /* Fill in method ptrs, except get_pixel_rows which start_input sets */ + source->pub.start_input = start_input_gif; + source->pub.finish_input = finish_input_gif; +#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION + source->pub.max_pixels = 0; +#endif + + return (cjpeg_source_ptr)source; } #endif /* GIF_SUPPORTED */ diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/rdppm.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/rdppm.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/rdppm.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/rdppm.c 2021-11-20 03:41:33.397600466 +0000 @@ -5,7 +5,7 @@ * Copyright (C) 1991-1997, Thomas G. Lane. * Modified 2009 by Bill Allombert, Guido Vollbeding. * libjpeg-turbo Modifications: - * Copyright (C) 2015-2017, 2020, D. R. Commander. + * Copyright (C) 2015-2017, 2020-2021, D. R. Commander. * For conditions of distribution and use, see the accompanying README.ijg * file. 
* @@ -43,18 +43,8 @@ /* Macros to deal with unsigned chars as efficiently as compiler allows */ -#ifdef HAVE_UNSIGNED_CHAR typedef unsigned char U_CHAR; #define UCH(x) ((int)(x)) -#else /* !HAVE_UNSIGNED_CHAR */ -#ifdef __CHAR_UNSIGNED__ -typedef char U_CHAR; -#define UCH(x) ((int)(x)) -#else -typedef char U_CHAR; -#define UCH(x) ((int)(x) & 0xFF) -#endif -#endif /* HAVE_UNSIGNED_CHAR */ #define ReadOK(file, buffer, len) \ @@ -122,11 +112,10 @@ while ((ch = pbm_getc(infile)) >= '0' && ch <= '9') { val *= 10; val += ch - '0'; + if (val > maxval) + ERREXIT(cinfo, JERR_PPM_OUTOFRANGE); } - if (val > maxval) - ERREXIT(cinfo, JERR_PPM_OUTOFRANGE); - return val; } @@ -526,6 +515,11 @@ register JSAMPLE *rescale = source->rescale; JDIMENSION col; unsigned int maxval = source->maxval; + register int rindex = rgb_red[cinfo->in_color_space]; + register int gindex = rgb_green[cinfo->in_color_space]; + register int bindex = rgb_blue[cinfo->in_color_space]; + register int aindex = alpha_index[cinfo->in_color_space]; + register int ps = rgb_pixelsize[cinfo->in_color_space]; if (!ReadOK(source->pub.input_file, source->iobuffer, source->buffer_width)) ERREXIT(cinfo, JERR_INPUT_EOF); @@ -537,17 +531,20 @@ temp |= UCH(*bufferptr++); if (temp > maxval) ERREXIT(cinfo, JERR_PPM_OUTOFRANGE); - *ptr++ = rescale[temp]; + ptr[rindex] = rescale[temp]; temp = UCH(*bufferptr++) << 8; temp |= UCH(*bufferptr++); if (temp > maxval) ERREXIT(cinfo, JERR_PPM_OUTOFRANGE); - *ptr++ = rescale[temp]; + ptr[gindex] = rescale[temp]; temp = UCH(*bufferptr++) << 8; temp |= UCH(*bufferptr++); if (temp > maxval) ERREXIT(cinfo, JERR_PPM_OUTOFRANGE); - *ptr++ = rescale[temp]; + ptr[bindex] = rescale[temp]; + if (aindex >= 0) + ptr[aindex] = 0xFF; + ptr += ps; } return 1; } @@ -589,6 +586,10 @@ if (w <= 0 || h <= 0 || maxval <= 0) /* error check */ ERREXIT(cinfo, JERR_PPM_NOT); +#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION + if (sinfo->max_pixels && (unsigned long long)w * h > sinfo->max_pixels) + 
ERREXIT(cinfo, JERR_WIDTH_OVERFLOW); +#endif cinfo->data_precision = BITS_IN_JSAMPLE; /* we always rescale data to this */ cinfo->image_width = (JDIMENSION)w; @@ -634,7 +635,10 @@ cinfo->in_color_space = JCS_GRAYSCALE; TRACEMS2(cinfo, 1, JTRC_PGM, w, h); if (maxval > 255) { - source->pub.get_pixel_rows = get_word_gray_row; + if (cinfo->in_color_space == JCS_GRAYSCALE) + source->pub.get_pixel_rows = get_word_gray_row; + else + ERREXIT(cinfo, JERR_BAD_IN_COLORSPACE); } else if (maxval == MAXJSAMPLE && sizeof(JSAMPLE) == sizeof(U_CHAR) && cinfo->in_color_space == JCS_GRAYSCALE) { source->pub.get_pixel_rows = get_raw_row; @@ -657,13 +661,17 @@ cinfo->in_color_space = JCS_EXT_RGB; TRACEMS2(cinfo, 1, JTRC_PPM, w, h); if (maxval > 255) { - source->pub.get_pixel_rows = get_word_rgb_row; + if (IsExtRGB(cinfo->in_color_space)) + source->pub.get_pixel_rows = get_word_rgb_row; + else + ERREXIT(cinfo, JERR_BAD_IN_COLORSPACE); } else if (maxval == MAXJSAMPLE && sizeof(JSAMPLE) == sizeof(U_CHAR) && - (cinfo->in_color_space == JCS_EXT_RGB #if RGB_RED == 0 && RGB_GREEN == 1 && RGB_BLUE == 2 && RGB_PIXELSIZE == 3 - || cinfo->in_color_space == JCS_RGB + (cinfo->in_color_space == JCS_EXT_RGB || + cinfo->in_color_space == JCS_RGB)) { +#else + cinfo->in_color_space == JCS_EXT_RGB) { #endif - )) { source->pub.get_pixel_rows = get_raw_row; use_raw_buffer = TRUE; need_rescale = FALSE; @@ -722,6 +730,8 @@ (*cinfo->mem->alloc_small) ((j_common_ptr)cinfo, JPOOL_IMAGE, (size_t)(((long)MAX(maxval, 255) + 1L) * sizeof(JSAMPLE))); + MEMZERO(source->rescale, (size_t)(((long)MAX(maxval, 255) + 1L) * + sizeof(JSAMPLE))); half_maxval = maxval / 2; for (val = 0; val <= (long)maxval; val++) { /* The multiplication here must be done in 32 bits to avoid overflow */ @@ -759,6 +769,9 @@ /* Fill in method ptrs, except get_pixel_rows which start_input sets */ source->pub.start_input = start_input_ppm; source->pub.finish_input = finish_input_ppm; +#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION + 
source->pub.max_pixels = 0; +#endif return (cjpeg_source_ptr)source; } diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/rdtarga.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/rdtarga.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/rdtarga.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/rdtarga.c 2021-11-20 03:41:33.397600466 +0000 @@ -5,7 +5,7 @@ * Copyright (C) 1991-1996, Thomas G. Lane. * Modified 2017 by Guido Vollbeding. * libjpeg-turbo Modifications: - * Copyright (C) 2018, D. R. Commander. + * Copyright (C) 2018, 2021, D. R. Commander. * For conditions of distribution and use, see the accompanying README.ijg * file. * @@ -28,18 +28,8 @@ /* Macros to deal with unsigned chars as efficiently as compiler allows */ -#ifdef HAVE_UNSIGNED_CHAR typedef unsigned char U_CHAR; #define UCH(x) ((int)(x)) -#else /* !HAVE_UNSIGNED_CHAR */ -#ifdef __CHAR_UNSIGNED__ -typedef char U_CHAR; -#define UCH(x) ((int)(x)) -#else -typedef char U_CHAR; -#define UCH(x) ((int)(x) & 0xFF) -#endif -#endif /* HAVE_UNSIGNED_CHAR */ #define ReadOK(file, buffer, len) \ @@ -344,8 +334,9 @@ unsigned int width, height, maplen; boolean is_bottom_up; -#define GET_2B(offset) ((unsigned int)UCH(targaheader[offset]) + \ - (((unsigned int)UCH(targaheader[offset + 1])) << 8)) +#define GET_2B(offset) \ + ((unsigned int)UCH(targaheader[offset]) + \ + (((unsigned int)UCH(targaheader[offset + 1])) << 8)) if (!ReadOK(source->pub.input_file, targaheader, 18)) ERREXIT(cinfo, JERR_INPUT_EOF); @@ -372,6 +363,11 @@ interlace_type != 0 || /* currently don't allow interlaced image */ width == 0 || height == 0) /* image width/height must be non-zero */ ERREXIT(cinfo, JERR_TGA_BADPARMS); +#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION + if (sinfo->max_pixels && + (unsigned long long)width * height > sinfo->max_pixels) + ERREXIT(cinfo, JERR_WIDTH_OVERFLOW); +#endif if (subtype > 8) { /* It's an RLE-coded file */ @@ -502,6 +498,9 @@ /* Fill in method 
ptrs, except get_pixel_rows which start_input sets */ source->pub.start_input = start_input_tga; source->pub.finish_input = finish_input_tga; +#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION + source->pub.max_pixels = 0; +#endif return (cjpeg_source_ptr)source; } diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/README.chromium b/src/3rdparty/chromium/third_party/libjpeg_turbo/README.chromium --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/README.chromium 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/README.chromium 2021-11-20 03:41:33.390600578 +0000 @@ -1,6 +1,6 @@ Name: libjpeg-turbo URL: https://github.com/libjpeg-turbo/libjpeg-turbo/ -Version: 2.0.5 +Version: 2.1.1 License: Custom license License File: LICENSE.md Security Critical: yes @@ -8,19 +8,21 @@ Description: This consists of the components: -* libjpeg-turbo 2.0.5 +* libjpeg-turbo 2.1.1 * This file (README.chromium) * A build file (BUILD.gn) * An OWNERS file * A codereview.settings file * Patched header files used by Chromium -* Cherry picked a fix from upstream master to enable AArch64 Windows builds: - https://github.com/libjpeg-turbo/libjpeg-turbo/commit/6ee5d5f568fda1a7c6a49dd8995f2d89866ee42d -* Deleted unused directories: ci, cmakescripts, doc, java, release, sharedlib, - simd/loongson, simd/mips, simd/powerpc, and win +* Deleted unused directories: cmakescripts, doc, fuzz, java, release, + sharedlib, simd/mips, simd/mips64, simd/powerpc, and win * Deleted unused files: appveyor.yml, CMakeLists.txt, doxygen.config, - doxygen-extra.css, .gitattributes, tjexample.c, tjexampletest.java.in, + doxygen-extra.css, .gitattributes, md5/CMakeLists.txt, md5/md5cmp.c, + simd/CMakeLists.txt, tjexample.c, tjexampletest.in, tjexampletest.java.in and .travis.yml +* Deleted legacy Arm Neon assembly files (supporting old compiler versions that + do not generate performant code from intrinsics): + simd/arm/aarch32/jsimd_neon.S, 
simd/arm/aarch64/jsimd_neon.S. This libjpeg-turbo can replace our libjpeg-6b without any modifications in the Chromium code. @@ -31,14 +33,8 @@ arise when system libraries attempt to use our libjpeg. Also, we applied the following changes which are not merged to upstream: -* Fix libjpeg_turbo svn r64 libjpeg6b compat issue: make the fast path Huffman - decoder fallback to slow decoding if the Huffman decoding bit sentinel > 16, - this to match the exact behavior of jpeg_huff_decode(). - http://crbug.com/398235 - The patch in the above bug removed "& 0xFF". It has been restored from upstream - https://github.com/libjpeg-turbo/libjpeg-turbo/commit/fa1d18385d904d530b4aec83ab7757a33397de6e -* Configuration files jconfig.h and jconfigint.h were generated and then altered - manually to be compatible on all of Chromium's platforms. +* Configuration files jconfig.h, jconfigint.h and neon-compat.h were generated + and then altered manually to be compatible on all of Chromium's platforms. http://crbug.com/608347 * Fix static const data duplication of jpeg_nbits_table. A unique copy was in the jchuff.obj and jcphuff.obj resulting in an added 65k in @@ -51,53 +47,21 @@ lld) arising from attempts to reference the table from assembler on 32-bit x86. This only affects shared libraries, but that's important for downstream Android builds. -* Arm NEON patches to improve performance and maintainability. 
These changes - are tracked by the following Chromium issue: https://crbug.com/922430 - - Add memory alignment size check in jmemmgr.c - - Add 32-byte memory alignment check for Arm NEON - - Add Arm NEON implementation of h2v2_fancy_upsample - - Add SIMD function stubs for h1v2_fancy_upsample - - Add Arm NEON implementation of h1v2_fancy_upsample - - Add Arm NEON implementation of h2v1_fancy_upsample - - Add extra guard for loop unroll pragma on AArch64 - - Add Arm NEON implementation of h2v1_upsample - - Add Arm NEON implementation of h2v2_upsample - - Implement YCbCr->RGB using Arm NEON intrinsics - - Implement YCbCr->RGB565 using Arm NEON intrinsics - - Add Arm NEON implementation of h2v1_merged_upsample - - Add Arm NEON implementation of h2v2_merged_upsample - - Implement 2x2 IDCT using Arm NEON intrinsics - - Implement 4x4 IDCT using Arm NEON intrinsics - - Implement slow IDCT using Arm NEON intrinsics - - Precompute DCT block output pointers in IDCT functions - - Implement fast IDCT using Arm NEON intrinsics - - Add Arm NEON implementation of h2v1_downsample - - Add Arm NEON implementation of h2v2_downsample - - Implement RGB->YCbCr using Arm NEON intrinsics - - Add Arm NEON implementation of RGB->Grayscale - - Add compiler-independent alignment macro - - Implement sample conversion using Arm NEON intrinsics - - Implement quantization using Arm NEON intrinsics - - Implement fast DCT using Arm NEON intrinsics - - Implement accurate DCT using Arm NEON intrinsics -* Patches to enable running the upstream unit tests through gtest. +* Patches to enable running the upstream unit tests through GTest. 
The upstream unit tests are defined here under the section 'TESTS': https://github.com/libjpeg-turbo/libjpeg-turbo/blob/master/CMakeLists.txt These changes are tracked by Chromium issue: https://crbug.com/993876 - Refactor tjunittest.c to provide test interface - - Add gtest wrapper for tjunittests - Move tjunittest logs from stdout to stderr - Refactor tjbench.c to provide test interface - - Add gtest wrapper for tjbench tests - Move tbench logs from stdout to stderr - Write tjunittest output files to sdcard on Android - Refactor cjpeg.c to provide test interface - - Add gtest wrapper for cjpeg tests - Refactor jpegtran.c to provide test interface - Add input JPEG images for djpeg and jpegtran tests - - Add gtest wrapper for jpegtran tests - Refactor djpeg.c to provide test interface - - Add gtest wrapper for djpeg tests + A new gtest directory contains GTest wrappers (and associated utilities) for + each of tjunittest, tjbench, cjpeg, djpeg and jpegtran. Refer to working-with-nested-repos [1] for details of how to setup your git svn client to update the code (for making local changes, cherry picking from diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/README.ijg b/src/3rdparty/chromium/third_party/libjpeg_turbo/README.ijg --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/README.ijg 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/README.ijg 2021-11-20 03:41:33.390600578 +0000 @@ -128,7 +128,7 @@ fitness for a particular purpose. This software is provided "AS IS", and you, its user, assume the entire risk as to its quality and accuracy. -This software is copyright (C) 1991-2016, Thomas G. Lane, Guido Vollbeding. +This software is copyright (C) 1991-2020, Thomas G. Lane, Guido Vollbeding. All Rights Reserved except as specified below. Permission is hereby granted to use, copy, modify, and distribute this @@ -159,19 +159,6 @@ assumed by the product vendor. 
-The IJG distribution formerly included code to read and write GIF files. -To avoid entanglement with the Unisys LZW patent (now expired), GIF reading -support has been removed altogether, and the GIF writer has been simplified -to produce "uncompressed GIFs". This technique does not use the LZW -algorithm; the resulting GIF files are larger than usual, but are readable -by all standard GIF decoders. - -We are required to state that - "The Graphics Interchange Format(c) is the Copyright property of - CompuServe Incorporated. GIF(sm) is a Service Mark property of - CompuServe Incorporated." - - REFERENCES ========== @@ -223,12 +210,12 @@ A PDF file of the older JFIF 1.02 specification is available at http://www.w3.org/Graphics/JPEG/jfif3.pdf. -The TIFF 6.0 file format specification can be obtained by FTP from -ftp://ftp.sgi.com/graphics/tiff/TIFF6.ps.gz. The JPEG incorporation scheme -found in the TIFF 6.0 spec of 3-June-92 has a number of serious problems. -IJG does not recommend use of the TIFF 6.0 design (TIFF Compression tag 6). -Instead, we recommend the JPEG design proposed by TIFF Technical Note #2 -(Compression tag 7). Copies of this Note can be obtained from +The TIFF 6.0 file format specification can be obtained from +http://mirrors.ctan.org/graphics/tiff/TIFF6.ps.gz. The JPEG incorporation +scheme found in the TIFF 6.0 spec of 3-June-92 has a number of serious +problems. IJG does not recommend use of the TIFF 6.0 design (TIFF Compression +tag 6). Instead, we recommend the JPEG design proposed by TIFF Technical Note +#2 (Compression tag 7). Copies of this Note can be obtained from http://www.ijg.org/files/. It is expected that the next revision of the TIFF spec will replace the 6.0 JPEG design with the Note's design. Although IJG's own code does not support TIFF/JPEG, the free libtiff library @@ -243,14 +230,8 @@ directory "files". The JPEG FAQ (Frequently Asked Questions) article is a source of some -general information about JPEG. 
-It is available on the World Wide Web at http://www.faqs.org/faqs/jpeg-faq/ -and other news.answers archive sites, including the official news.answers -archive at rtfm.mit.edu: ftp://rtfm.mit.edu/pub/usenet/news.answers/jpeg-faq/. -If you don't have Web or FTP access, send e-mail to mail-server@rtfm.mit.edu -with body - send usenet/news.answers/jpeg-faq/part1 - send usenet/news.answers/jpeg-faq/part2 +general information about JPEG. It is available at +http://www.faqs.org/faqs/jpeg-faq. FILE FORMAT COMPATIBILITY diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/README.md b/src/3rdparty/chromium/third_party/libjpeg_turbo/README.md --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/README.md 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/README.md 2021-11-20 03:41:33.390600578 +0000 @@ -2,8 +2,8 @@ ========== libjpeg-turbo is a JPEG image codec that uses SIMD instructions to accelerate -baseline JPEG compression and decompression on x86, x86-64, ARM, PowerPC, and -MIPS systems, as well as progressive JPEG compression on x86 and x86-64 +baseline JPEG compression and decompression on x86, x86-64, Arm, PowerPC, and +MIPS systems, as well as progressive JPEG compression on x86, x86-64, and Arm systems. On such systems, libjpeg-turbo is generally 2-6x as fast as libjpeg, all else being equal. On other types of systems, libjpeg-turbo can still outperform libjpeg by a significant amount, by virtue of its highly-optimized @@ -179,8 +179,8 @@ NOTE: As of this writing, extensive research has been conducted into the usefulness of DCT scaling as a means of data reduction and SmartScale as a -means of quality improvement. The reader is invited to peruse the research at -<http://www.libjpeg-turbo.org/About/SmartScale> and draw his/her own conclusions, +means of quality improvement. 
Readers are invited to peruse the research at +<http://www.libjpeg-turbo.org/About/SmartScale> and draw their own conclusions, but it is the general belief of our project that these features have not demonstrated sufficient usefulness to justify inclusion in libjpeg-turbo. @@ -287,12 +287,13 @@ (and slightly faster) floating point IDCT algorithm introduced in libjpeg v8a as opposed to the algorithm used in libjpeg v6b. It should be noted, however, that this algorithm basically brings the accuracy of the floating - point IDCT in line with the accuracy of the slow integer IDCT. The floating - point DCT/IDCT algorithms are mainly a legacy feature, and they do not - produce significantly more accuracy than the slow integer algorithms (to put - numbers on this, the typical difference in PNSR between the two algorithms - is less than 0.10 dB, whereas changing the quality level by 1 in the upper - range of the quality scale is typically more like a 1.0 dB difference.) + point IDCT in line with the accuracy of the accurate integer IDCT. The + floating point DCT/IDCT algorithms are mainly a legacy feature, and they do + not produce significantly more accuracy than the accurate integer algorithms + (to put numbers on this, the typical difference in PNSR between the two + algorithms is less than 0.10 dB, whereas changing the quality level by 1 in + the upper range of the quality scale is typically more like a 1.0 dB + difference.) - If the floating point algorithms in libjpeg-turbo are not implemented using SIMD instructions on a particular platform, then the accuracy of the @@ -340,7 +341,7 @@ correct results whenever the fast integer forward DCT is used along with a JPEG quality of 98-100. Thus, libjpeg-turbo must use the non-SIMD quantization function in those cases. This causes performance to drop by as much as 40%. 
-It is therefore strongly advised that you use the slow integer forward DCT +It is therefore strongly advised that you use the accurate integer forward DCT whenever encoding images with a JPEG quality of 98 or higher. diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/aarch32/jccolext-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/aarch32/jccolext-neon.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/aarch32/jccolext-neon.c 1970-01-01 01:00:00.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/aarch32/jccolext-neon.c 2021-11-20 03:41:33.397600466 +0000 @@ -0,0 +1,148 @@ +/* + * jccolext-neon.c - colorspace conversion (32-bit Arm Neon) + * + * Copyright (C) 2020, Arm Limited. All Rights Reserved. + * Copyright (C) 2020, D. R. Commander. All Rights Reserved. + * + * This software is provided 'as-is', without any express or implied + * warranty. In no event will the authors be held liable for any damages + * arising from the use of this software. + * + * Permission is granted to anyone to use this software for any purpose, + * including commercial applications, and to alter it and redistribute it + * freely, subject to the following restrictions: + * + * 1. The origin of this software must not be misrepresented; you must not + * claim that you wrote the original software. If you use this software + * in a product, an acknowledgment in the product documentation would be + * appreciated but is not required. + * 2. Altered source versions must be plainly marked as such, and must not be + * misrepresented as being the original software. + * 3. This notice may not be removed or altered from any source distribution. 
+ */ + +/* This file is included by jccolor-neon.c */ + + +/* RGB -> YCbCr conversion is defined by the following equations: + * Y = 0.29900 * R + 0.58700 * G + 0.11400 * B + * Cb = -0.16874 * R - 0.33126 * G + 0.50000 * B + 128 + * Cr = 0.50000 * R - 0.41869 * G - 0.08131 * B + 128 + * + * Avoid floating point arithmetic by using shifted integer constants: + * 0.29899597 = 19595 * 2^-16 + * 0.58700561 = 38470 * 2^-16 + * 0.11399841 = 7471 * 2^-16 + * 0.16874695 = 11059 * 2^-16 + * 0.33125305 = 21709 * 2^-16 + * 0.50000000 = 32768 * 2^-16 + * 0.41868592 = 27439 * 2^-16 + * 0.08131409 = 5329 * 2^-16 + * These constants are defined in jccolor-neon.c + * + * We add the fixed-point equivalent of 0.5 to Cb and Cr, which effectively + * rounds up or down the result via integer truncation. + */ + +void jsimd_rgb_ycc_convert_neon(JDIMENSION image_width, JSAMPARRAY input_buf, + JSAMPIMAGE output_buf, JDIMENSION output_row, + int num_rows) +{ + /* Pointer to RGB(X/A) input data */ + JSAMPROW inptr; + /* Pointers to Y, Cb, and Cr output data */ + JSAMPROW outptr0, outptr1, outptr2; + /* Allocate temporary buffer for final (image_width % 8) pixels in row. */ + ALIGN(16) uint8_t tmp_buf[8 * RGB_PIXELSIZE]; + + /* Set up conversion constants. */ +#ifdef HAVE_VLD1_U16_X2 + const uint16x4x2_t consts = vld1_u16_x2(jsimd_rgb_ycc_neon_consts); +#else + /* GCC does not currently support the intrinsic vld1_<type>_x2(). 
*/ + const uint16x4_t consts1 = vld1_u16(jsimd_rgb_ycc_neon_consts); + const uint16x4_t consts2 = vld1_u16(jsimd_rgb_ycc_neon_consts + 4); + const uint16x4x2_t consts = { { consts1, consts2 } }; +#endif + const uint32x4_t scaled_128_5 = vdupq_n_u32((128 << 16) + 32767); + + while (--num_rows >= 0) { + inptr = *input_buf++; + outptr0 = output_buf[0][output_row]; + outptr1 = output_buf[1][output_row]; + outptr2 = output_buf[2][output_row]; + output_row++; + + int cols_remaining = image_width; + for (; cols_remaining > 0; cols_remaining -= 8) { + + /* To prevent buffer overread by the vector load instructions, the last + * (image_width % 8) columns of data are first memcopied to a temporary + * buffer large enough to accommodate the vector load. + */ + if (cols_remaining < 8) { + memcpy(tmp_buf, inptr, cols_remaining * RGB_PIXELSIZE); + inptr = tmp_buf; + } + +#if RGB_PIXELSIZE == 4 + uint8x8x4_t input_pixels = vld4_u8(inptr); +#else + uint8x8x3_t input_pixels = vld3_u8(inptr); +#endif + uint16x8_t r = vmovl_u8(input_pixels.val[RGB_RED]); + uint16x8_t g = vmovl_u8(input_pixels.val[RGB_GREEN]); + uint16x8_t b = vmovl_u8(input_pixels.val[RGB_BLUE]); + + /* Compute Y = 0.29900 * R + 0.58700 * G + 0.11400 * B */ + uint32x4_t y_low = vmull_lane_u16(vget_low_u16(r), consts.val[0], 0); + y_low = vmlal_lane_u16(y_low, vget_low_u16(g), consts.val[0], 1); + y_low = vmlal_lane_u16(y_low, vget_low_u16(b), consts.val[0], 2); + uint32x4_t y_high = vmull_lane_u16(vget_high_u16(r), consts.val[0], 0); + y_high = vmlal_lane_u16(y_high, vget_high_u16(g), consts.val[0], 1); + y_high = vmlal_lane_u16(y_high, vget_high_u16(b), consts.val[0], 2); + + /* Compute Cb = -0.16874 * R - 0.33126 * G + 0.50000 * B + 128 */ + uint32x4_t cb_low = scaled_128_5; + cb_low = vmlsl_lane_u16(cb_low, vget_low_u16(r), consts.val[0], 3); + cb_low = vmlsl_lane_u16(cb_low, vget_low_u16(g), consts.val[1], 0); + cb_low = vmlal_lane_u16(cb_low, vget_low_u16(b), consts.val[1], 1); + uint32x4_t cb_high = 
scaled_128_5; + cb_high = vmlsl_lane_u16(cb_high, vget_high_u16(r), consts.val[0], 3); + cb_high = vmlsl_lane_u16(cb_high, vget_high_u16(g), consts.val[1], 0); + cb_high = vmlal_lane_u16(cb_high, vget_high_u16(b), consts.val[1], 1); + + /* Compute Cr = 0.50000 * R - 0.41869 * G - 0.08131 * B + 128 */ + uint32x4_t cr_low = scaled_128_5; + cr_low = vmlal_lane_u16(cr_low, vget_low_u16(r), consts.val[1], 1); + cr_low = vmlsl_lane_u16(cr_low, vget_low_u16(g), consts.val[1], 2); + cr_low = vmlsl_lane_u16(cr_low, vget_low_u16(b), consts.val[1], 3); + uint32x4_t cr_high = scaled_128_5; + cr_high = vmlal_lane_u16(cr_high, vget_high_u16(r), consts.val[1], 1); + cr_high = vmlsl_lane_u16(cr_high, vget_high_u16(g), consts.val[1], 2); + cr_high = vmlsl_lane_u16(cr_high, vget_high_u16(b), consts.val[1], 3); + + /* Descale Y values (rounding right shift) and narrow to 16-bit. */ + uint16x8_t y_u16 = vcombine_u16(vrshrn_n_u32(y_low, 16), + vrshrn_n_u32(y_high, 16)); + /* Descale Cb values (right shift) and narrow to 16-bit. */ + uint16x8_t cb_u16 = vcombine_u16(vshrn_n_u32(cb_low, 16), + vshrn_n_u32(cb_high, 16)); + /* Descale Cr values (right shift) and narrow to 16-bit. */ + uint16x8_t cr_u16 = vcombine_u16(vshrn_n_u32(cr_low, 16), + vshrn_n_u32(cr_high, 16)); + /* Narrow Y, Cb, and Cr values to 8-bit and store to memory. Buffer + * overwrite is permitted up to the next multiple of ALIGN_SIZE bytes. + */ + vst1_u8(outptr0, vmovn_u16(y_u16)); + vst1_u8(outptr1, vmovn_u16(cb_u16)); + vst1_u8(outptr2, vmovn_u16(cr_u16)); + + /* Increment pointers. 
*/ + inptr += (8 * RGB_PIXELSIZE); + outptr0 += 8; + outptr1 += 8; + outptr2 += 8; + } + } +} diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/aarch32/jchuff-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/aarch32/jchuff-neon.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/aarch32/jchuff-neon.c 1970-01-01 01:00:00.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/aarch32/jchuff-neon.c 2021-11-20 03:41:33.397600466 +0000 @@ -0,0 +1,334 @@ +/* + * jchuff-neon.c - Huffman entropy encoding (32-bit Arm Neon) + * + * Copyright (C) 2020, Arm Limited. All Rights Reserved. + * + * This software is provided 'as-is', without any express or implied + * warranty. In no event will the authors be held liable for any damages + * arising from the use of this software. + * + * Permission is granted to anyone to use this software for any purpose, + * including commercial applications, and to alter it and redistribute it + * freely, subject to the following restrictions: + * + * 1. The origin of this software must not be misrepresented; you must not + * claim that you wrote the original software. If you use this software + * in a product, an acknowledgment in the product documentation would be + * appreciated but is not required. + * 2. Altered source versions must be plainly marked as such, and must not be + * misrepresented as being the original software. + * 3. This notice may not be removed or altered from any source distribution. + * + * NOTE: All referenced figures are from + * Recommendation ITU-T T.81 (1992) | ISO/IEC 10918-1:1994. 
+ */ + +#define JPEG_INTERNALS +#include "../../../jinclude.h" +#include "../../../jpeglib.h" +#include "../../../jsimd.h" +#include "../../../jdct.h" +#include "../../../jsimddct.h" +#include "../../jsimd.h" +#include "../jchuff.h" +#include "neon-compat.h" + +#include <limits.h> + +#include <arm_neon.h> + + +JOCTET *jsimd_huff_encode_one_block_neon(void *state, JOCTET *buffer, + JCOEFPTR block, int last_dc_val, + c_derived_tbl *dctbl, + c_derived_tbl *actbl) +{ + uint8_t block_nbits[DCTSIZE2]; + uint16_t block_diff[DCTSIZE2]; + + /* Load rows of coefficients from DCT block in zig-zag order. */ + + /* Compute DC coefficient difference value. (F.1.1.5.1) */ + int16x8_t row0 = vdupq_n_s16(block[0] - last_dc_val); + row0 = vld1q_lane_s16(block + 1, row0, 1); + row0 = vld1q_lane_s16(block + 8, row0, 2); + row0 = vld1q_lane_s16(block + 16, row0, 3); + row0 = vld1q_lane_s16(block + 9, row0, 4); + row0 = vld1q_lane_s16(block + 2, row0, 5); + row0 = vld1q_lane_s16(block + 3, row0, 6); + row0 = vld1q_lane_s16(block + 10, row0, 7); + + int16x8_t row1 = vld1q_dup_s16(block + 17); + row1 = vld1q_lane_s16(block + 24, row1, 1); + row1 = vld1q_lane_s16(block + 32, row1, 2); + row1 = vld1q_lane_s16(block + 25, row1, 3); + row1 = vld1q_lane_s16(block + 18, row1, 4); + row1 = vld1q_lane_s16(block + 11, row1, 5); + row1 = vld1q_lane_s16(block + 4, row1, 6); + row1 = vld1q_lane_s16(block + 5, row1, 7); + + int16x8_t row2 = vld1q_dup_s16(block + 12); + row2 = vld1q_lane_s16(block + 19, row2, 1); + row2 = vld1q_lane_s16(block + 26, row2, 2); + row2 = vld1q_lane_s16(block + 33, row2, 3); + row2 = vld1q_lane_s16(block + 40, row2, 4); + row2 = vld1q_lane_s16(block + 48, row2, 5); + row2 = vld1q_lane_s16(block + 41, row2, 6); + row2 = vld1q_lane_s16(block + 34, row2, 7); + + int16x8_t row3 = vld1q_dup_s16(block + 27); + row3 = vld1q_lane_s16(block + 20, row3, 1); + row3 = vld1q_lane_s16(block + 13, row3, 2); + row3 = vld1q_lane_s16(block + 6, row3, 3); + row3 = vld1q_lane_s16(block + 7, row3, 4); + row3 = 
vld1q_lane_s16(block + 14, row3, 5); + row3 = vld1q_lane_s16(block + 21, row3, 6); + row3 = vld1q_lane_s16(block + 28, row3, 7); + + int16x8_t abs_row0 = vabsq_s16(row0); + int16x8_t abs_row1 = vabsq_s16(row1); + int16x8_t abs_row2 = vabsq_s16(row2); + int16x8_t abs_row3 = vabsq_s16(row3); + + int16x8_t row0_lz = vclzq_s16(abs_row0); + int16x8_t row1_lz = vclzq_s16(abs_row1); + int16x8_t row2_lz = vclzq_s16(abs_row2); + int16x8_t row3_lz = vclzq_s16(abs_row3); + + /* Compute number of bits required to represent each coefficient. */ + uint8x8_t row0_nbits = vsub_u8(vdup_n_u8(16), + vmovn_u16(vreinterpretq_u16_s16(row0_lz))); + uint8x8_t row1_nbits = vsub_u8(vdup_n_u8(16), + vmovn_u16(vreinterpretq_u16_s16(row1_lz))); + uint8x8_t row2_nbits = vsub_u8(vdup_n_u8(16), + vmovn_u16(vreinterpretq_u16_s16(row2_lz))); + uint8x8_t row3_nbits = vsub_u8(vdup_n_u8(16), + vmovn_u16(vreinterpretq_u16_s16(row3_lz))); + + vst1_u8(block_nbits + 0 * DCTSIZE, row0_nbits); + vst1_u8(block_nbits + 1 * DCTSIZE, row1_nbits); + vst1_u8(block_nbits + 2 * DCTSIZE, row2_nbits); + vst1_u8(block_nbits + 3 * DCTSIZE, row3_nbits); + + uint16x8_t row0_mask = + vshlq_u16(vreinterpretq_u16_s16(vshrq_n_s16(row0, 15)), + vnegq_s16(row0_lz)); + uint16x8_t row1_mask = + vshlq_u16(vreinterpretq_u16_s16(vshrq_n_s16(row1, 15)), + vnegq_s16(row1_lz)); + uint16x8_t row2_mask = + vshlq_u16(vreinterpretq_u16_s16(vshrq_n_s16(row2, 15)), + vnegq_s16(row2_lz)); + uint16x8_t row3_mask = + vshlq_u16(vreinterpretq_u16_s16(vshrq_n_s16(row3, 15)), + vnegq_s16(row3_lz)); + + uint16x8_t row0_diff = veorq_u16(vreinterpretq_u16_s16(abs_row0), row0_mask); + uint16x8_t row1_diff = veorq_u16(vreinterpretq_u16_s16(abs_row1), row1_mask); + uint16x8_t row2_diff = veorq_u16(vreinterpretq_u16_s16(abs_row2), row2_mask); + uint16x8_t row3_diff = veorq_u16(vreinterpretq_u16_s16(abs_row3), row3_mask); + + /* Store diff values for rows 0, 1, 2, and 3. 
*/ + vst1q_u16(block_diff + 0 * DCTSIZE, row0_diff); + vst1q_u16(block_diff + 1 * DCTSIZE, row1_diff); + vst1q_u16(block_diff + 2 * DCTSIZE, row2_diff); + vst1q_u16(block_diff + 3 * DCTSIZE, row3_diff); + + /* Load last four rows of coefficients from DCT block in zig-zag order. */ + int16x8_t row4 = vld1q_dup_s16(block + 35); + row4 = vld1q_lane_s16(block + 42, row4, 1); + row4 = vld1q_lane_s16(block + 49, row4, 2); + row4 = vld1q_lane_s16(block + 56, row4, 3); + row4 = vld1q_lane_s16(block + 57, row4, 4); + row4 = vld1q_lane_s16(block + 50, row4, 5); + row4 = vld1q_lane_s16(block + 43, row4, 6); + row4 = vld1q_lane_s16(block + 36, row4, 7); + + int16x8_t row5 = vld1q_dup_s16(block + 29); + row5 = vld1q_lane_s16(block + 22, row5, 1); + row5 = vld1q_lane_s16(block + 15, row5, 2); + row5 = vld1q_lane_s16(block + 23, row5, 3); + row5 = vld1q_lane_s16(block + 30, row5, 4); + row5 = vld1q_lane_s16(block + 37, row5, 5); + row5 = vld1q_lane_s16(block + 44, row5, 6); + row5 = vld1q_lane_s16(block + 51, row5, 7); + + int16x8_t row6 = vld1q_dup_s16(block + 58); + row6 = vld1q_lane_s16(block + 59, row6, 1); + row6 = vld1q_lane_s16(block + 52, row6, 2); + row6 = vld1q_lane_s16(block + 45, row6, 3); + row6 = vld1q_lane_s16(block + 38, row6, 4); + row6 = vld1q_lane_s16(block + 31, row6, 5); + row6 = vld1q_lane_s16(block + 39, row6, 6); + row6 = vld1q_lane_s16(block + 46, row6, 7); + + int16x8_t row7 = vld1q_dup_s16(block + 53); + row7 = vld1q_lane_s16(block + 60, row7, 1); + row7 = vld1q_lane_s16(block + 61, row7, 2); + row7 = vld1q_lane_s16(block + 54, row7, 3); + row7 = vld1q_lane_s16(block + 47, row7, 4); + row7 = vld1q_lane_s16(block + 55, row7, 5); + row7 = vld1q_lane_s16(block + 62, row7, 6); + row7 = vld1q_lane_s16(block + 63, row7, 7); + + int16x8_t abs_row4 = vabsq_s16(row4); + int16x8_t abs_row5 = vabsq_s16(row5); + int16x8_t abs_row6 = vabsq_s16(row6); + int16x8_t abs_row7 = vabsq_s16(row7); + + int16x8_t row4_lz = vclzq_s16(abs_row4); + int16x8_t row5_lz = 
vclzq_s16(abs_row5); + int16x8_t row6_lz = vclzq_s16(abs_row6); + int16x8_t row7_lz = vclzq_s16(abs_row7); + + /* Compute number of bits required to represent each coefficient. */ + uint8x8_t row4_nbits = vsub_u8(vdup_n_u8(16), + vmovn_u16(vreinterpretq_u16_s16(row4_lz))); + uint8x8_t row5_nbits = vsub_u8(vdup_n_u8(16), + vmovn_u16(vreinterpretq_u16_s16(row5_lz))); + uint8x8_t row6_nbits = vsub_u8(vdup_n_u8(16), + vmovn_u16(vreinterpretq_u16_s16(row6_lz))); + uint8x8_t row7_nbits = vsub_u8(vdup_n_u8(16), + vmovn_u16(vreinterpretq_u16_s16(row7_lz))); + + vst1_u8(block_nbits + 4 * DCTSIZE, row4_nbits); + vst1_u8(block_nbits + 5 * DCTSIZE, row5_nbits); + vst1_u8(block_nbits + 6 * DCTSIZE, row6_nbits); + vst1_u8(block_nbits + 7 * DCTSIZE, row7_nbits); + + uint16x8_t row4_mask = + vshlq_u16(vreinterpretq_u16_s16(vshrq_n_s16(row4, 15)), + vnegq_s16(row4_lz)); + uint16x8_t row5_mask = + vshlq_u16(vreinterpretq_u16_s16(vshrq_n_s16(row5, 15)), + vnegq_s16(row5_lz)); + uint16x8_t row6_mask = + vshlq_u16(vreinterpretq_u16_s16(vshrq_n_s16(row6, 15)), + vnegq_s16(row6_lz)); + uint16x8_t row7_mask = + vshlq_u16(vreinterpretq_u16_s16(vshrq_n_s16(row7, 15)), + vnegq_s16(row7_lz)); + + uint16x8_t row4_diff = veorq_u16(vreinterpretq_u16_s16(abs_row4), row4_mask); + uint16x8_t row5_diff = veorq_u16(vreinterpretq_u16_s16(abs_row5), row5_mask); + uint16x8_t row6_diff = veorq_u16(vreinterpretq_u16_s16(abs_row6), row6_mask); + uint16x8_t row7_diff = veorq_u16(vreinterpretq_u16_s16(abs_row7), row7_mask); + + /* Store diff values for rows 4, 5, 6, and 7. */ + vst1q_u16(block_diff + 4 * DCTSIZE, row4_diff); + vst1q_u16(block_diff + 5 * DCTSIZE, row5_diff); + vst1q_u16(block_diff + 6 * DCTSIZE, row6_diff); + vst1q_u16(block_diff + 7 * DCTSIZE, row7_diff); + + /* Construct bitmap to accelerate encoding of AC coefficients. A set bit + * means that the corresponding coefficient != 0. 
+ */ + uint8x8_t row0_nbits_gt0 = vcgt_u8(row0_nbits, vdup_n_u8(0)); + uint8x8_t row1_nbits_gt0 = vcgt_u8(row1_nbits, vdup_n_u8(0)); + uint8x8_t row2_nbits_gt0 = vcgt_u8(row2_nbits, vdup_n_u8(0)); + uint8x8_t row3_nbits_gt0 = vcgt_u8(row3_nbits, vdup_n_u8(0)); + uint8x8_t row4_nbits_gt0 = vcgt_u8(row4_nbits, vdup_n_u8(0)); + uint8x8_t row5_nbits_gt0 = vcgt_u8(row5_nbits, vdup_n_u8(0)); + uint8x8_t row6_nbits_gt0 = vcgt_u8(row6_nbits, vdup_n_u8(0)); + uint8x8_t row7_nbits_gt0 = vcgt_u8(row7_nbits, vdup_n_u8(0)); + + /* { 0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01 } */ + const uint8x8_t bitmap_mask = + vreinterpret_u8_u64(vmov_n_u64(0x0102040810204080)); + + row0_nbits_gt0 = vand_u8(row0_nbits_gt0, bitmap_mask); + row1_nbits_gt0 = vand_u8(row1_nbits_gt0, bitmap_mask); + row2_nbits_gt0 = vand_u8(row2_nbits_gt0, bitmap_mask); + row3_nbits_gt0 = vand_u8(row3_nbits_gt0, bitmap_mask); + row4_nbits_gt0 = vand_u8(row4_nbits_gt0, bitmap_mask); + row5_nbits_gt0 = vand_u8(row5_nbits_gt0, bitmap_mask); + row6_nbits_gt0 = vand_u8(row6_nbits_gt0, bitmap_mask); + row7_nbits_gt0 = vand_u8(row7_nbits_gt0, bitmap_mask); + + uint8x8_t bitmap_rows_10 = vpadd_u8(row1_nbits_gt0, row0_nbits_gt0); + uint8x8_t bitmap_rows_32 = vpadd_u8(row3_nbits_gt0, row2_nbits_gt0); + uint8x8_t bitmap_rows_54 = vpadd_u8(row5_nbits_gt0, row4_nbits_gt0); + uint8x8_t bitmap_rows_76 = vpadd_u8(row7_nbits_gt0, row6_nbits_gt0); + uint8x8_t bitmap_rows_3210 = vpadd_u8(bitmap_rows_32, bitmap_rows_10); + uint8x8_t bitmap_rows_7654 = vpadd_u8(bitmap_rows_76, bitmap_rows_54); + uint8x8_t bitmap = vpadd_u8(bitmap_rows_7654, bitmap_rows_3210); + + /* Shift left to remove DC bit. */ + bitmap = vreinterpret_u8_u64(vshl_n_u64(vreinterpret_u64_u8(bitmap), 1)); + /* Move bitmap to 32-bit scalar registers. */ + uint32_t bitmap_1_32 = vget_lane_u32(vreinterpret_u32_u8(bitmap), 1); + uint32_t bitmap_33_63 = vget_lane_u32(vreinterpret_u32_u8(bitmap), 0); + + /* Set up state and bit buffer for output bitstream. 
*/ + working_state *state_ptr = (working_state *)state; + int free_bits = state_ptr->cur.free_bits; + size_t put_buffer = state_ptr->cur.put_buffer; + + /* Encode DC coefficient. */ + + unsigned int nbits = block_nbits[0]; + /* Emit Huffman-coded symbol and additional diff bits. */ + unsigned int diff = block_diff[0]; + PUT_CODE(dctbl->ehufco[nbits], dctbl->ehufsi[nbits], diff) + + /* Encode AC coefficients. */ + + unsigned int r = 0; /* r = run length of zeros */ + unsigned int i = 1; /* i = number of coefficients encoded */ + /* Code and size information for a run length of 16 zero coefficients */ + const unsigned int code_0xf0 = actbl->ehufco[0xf0]; + const unsigned int size_0xf0 = actbl->ehufsi[0xf0]; + + while (bitmap_1_32 != 0) { + r = BUILTIN_CLZ(bitmap_1_32); + i += r; + bitmap_1_32 <<= r; + nbits = block_nbits[i]; + diff = block_diff[i]; + while (r > 15) { + /* If run length > 15, emit special run-length-16 codes. */ + PUT_BITS(code_0xf0, size_0xf0) + r -= 16; + } + /* Emit Huffman symbol for run length / number of bits. (F.1.2.2.1) */ + unsigned int rs = (r << 4) + nbits; + PUT_CODE(actbl->ehufco[rs], actbl->ehufsi[rs], diff) + i++; + bitmap_1_32 <<= 1; + } + + r = 33 - i; + i = 33; + + while (bitmap_33_63 != 0) { + unsigned int leading_zeros = BUILTIN_CLZ(bitmap_33_63); + r += leading_zeros; + i += leading_zeros; + bitmap_33_63 <<= leading_zeros; + nbits = block_nbits[i]; + diff = block_diff[i]; + while (r > 15) { + /* If run length > 15, emit special run-length-16 codes. */ + PUT_BITS(code_0xf0, size_0xf0) + r -= 16; + } + /* Emit Huffman symbol for run length / number of bits. (F.1.2.2.1) */ + unsigned int rs = (r << 4) + nbits; + PUT_CODE(actbl->ehufco[rs], actbl->ehufsi[rs], diff) + r = 0; + i++; + bitmap_33_63 <<= 1; + } + + /* If the last coefficient(s) were zero, emit an end-of-block (EOB) code. + * The value of RS for the EOB code is 0. 
+ */ + if (i != 64) { + PUT_BITS(actbl->ehufco[0], actbl->ehufsi[0]) + } + + state_ptr->cur.put_buffer = put_buffer; + state_ptr->cur.free_bits = free_bits; + + return buffer; +} diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/aarch32/jsimd.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/aarch32/jsimd.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/aarch32/jsimd.c 1970-01-01 01:00:00.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/aarch32/jsimd.c 2021-11-20 03:41:33.397600466 +0000 @@ -0,0 +1,983 @@ +/* + * jsimd_arm.c + * + * Copyright 2009 Pierre Ossman for Cendio AB + * Copyright (C) 2011, Nokia Corporation and/or its subsidiary(-ies). + * Copyright (C) 2009-2011, 2013-2014, 2016, 2018, D. R. Commander. + * Copyright (C) 2015-2016, 2018, Matthieu Darbois. + * Copyright (C) 2019, Google LLC. + * Copyright (C) 2020, Arm Limited. + * + * Based on the x86 SIMD extension for IJG JPEG library, + * Copyright (C) 1999-2006, MIYASAKA Masaru. + * For conditions of distribution and use, see copyright notice in jsimdext.inc + * + * This file contains the interface between the "normal" portions + * of the library and the SIMD implementations when running on a + * 32-bit Arm architecture. 
+ */ + +#define JPEG_INTERNALS +#include "../../../jinclude.h" +#include "../../../jpeglib.h" +#include "../../../jsimd.h" +#include "../../../jdct.h" +#include "../../../jsimddct.h" +#include "../../jsimd.h" + +#include <ctype.h> +#include <stdio.h> +#include <string.h> + +static unsigned int simd_support = ~0; +static unsigned int simd_huffman = 1; + +#if !defined(__ARM_NEON__) && (defined(__linux__) || defined(ANDROID) || defined(__ANDROID__)) + +#define SOMEWHAT_SANE_PROC_CPUINFO_SIZE_LIMIT (1024 * 1024) + +LOCAL(int) +check_feature(char *buffer, char *feature) +{ + char *p; + + if (*feature == 0) + return 0; + if (strncmp(buffer, "Features", 8) != 0) + return 0; + buffer += 8; + while (isspace(*buffer)) + buffer++; + + /* Check if 'feature' is present in the buffer as a separate word */ + while ((p = strstr(buffer, feature))) { + if (p > buffer && !isspace(*(p - 1))) { + buffer++; + continue; + } + p += strlen(feature); + if (*p != 0 && !isspace(*p)) { + buffer++; + continue; + } + return 1; + } + return 0; +} + +LOCAL(int) +parse_proc_cpuinfo(int bufsize) +{ + char *buffer = (char *)malloc(bufsize); + FILE *fd; + + simd_support = 0; + + if (!buffer) + return 0; + + fd = fopen("/proc/cpuinfo", "r"); + if (fd) { + while (fgets(buffer, bufsize, fd)) { + if (!strchr(buffer, '\n') && !feof(fd)) { + /* "impossible" happened - insufficient size of the buffer! */ + fclose(fd); + free(buffer); + return 0; + } + if (check_feature(buffer, "neon")) + simd_support |= JSIMD_NEON; + } + fclose(fd); + } + free(buffer); + return 1; +} + +#endif + +/* + * Check what SIMD accelerations are supported. + * + * FIXME: This code is racy under a multi-threaded environment. 
+ */ +LOCAL(void) +init_simd(void) +{ +#ifndef NO_GETENV + char *env = NULL; +#endif +#if !defined(__ARM_NEON__) && (defined(__linux__) || defined(ANDROID) || defined(__ANDROID__)) + int bufsize = 1024; /* an initial guess for the line buffer size limit */ +#endif + + if (simd_support != ~0U) + return; + + simd_support = 0; + +#if defined(__ARM_NEON__) + simd_support |= JSIMD_NEON; +#elif defined(__linux__) || defined(ANDROID) || defined(__ANDROID__) + /* We still have a chance to use Neon regardless of globally used + * -mcpu/-mfpu options passed to gcc by performing runtime detection via + * /proc/cpuinfo parsing on linux/android */ + while (!parse_proc_cpuinfo(bufsize)) { + bufsize *= 2; + if (bufsize > SOMEWHAT_SANE_PROC_CPUINFO_SIZE_LIMIT) + break; + } +#endif + +#ifndef NO_GETENV + /* Force different settings through environment variables */ + env = getenv("JSIMD_FORCENEON"); + if ((env != NULL) && (strcmp(env, "1") == 0)) + simd_support = JSIMD_NEON; + env = getenv("JSIMD_FORCENONE"); + if ((env != NULL) && (strcmp(env, "1") == 0)) + simd_support = 0; + env = getenv("JSIMD_NOHUFFENC"); + if ((env != NULL) && (strcmp(env, "1") == 0)) + simd_huffman = 0; +#endif +} + +GLOBAL(int) +jsimd_can_rgb_ycc(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + if ((RGB_PIXELSIZE != 3) && (RGB_PIXELSIZE != 4)) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_rgb_gray(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + if ((RGB_PIXELSIZE != 3) && (RGB_PIXELSIZE != 4)) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_ycc_rgb(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (BITS_IN_JSAMPLE != 8) + return 0; + if 
(sizeof(JDIMENSION) != 4) + return 0; + if ((RGB_PIXELSIZE != 3) && (RGB_PIXELSIZE != 4)) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_ycc_rgb565(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(void) +jsimd_rgb_ycc_convert(j_compress_ptr cinfo, JSAMPARRAY input_buf, + JSAMPIMAGE output_buf, JDIMENSION output_row, + int num_rows) +{ + void (*neonfct) (JDIMENSION, JSAMPARRAY, JSAMPIMAGE, JDIMENSION, int); + + switch (cinfo->in_color_space) { + case JCS_EXT_RGB: + neonfct = jsimd_extrgb_ycc_convert_neon; + break; + case JCS_EXT_RGBX: + case JCS_EXT_RGBA: + neonfct = jsimd_extrgbx_ycc_convert_neon; + break; + case JCS_EXT_BGR: + neonfct = jsimd_extbgr_ycc_convert_neon; + break; + case JCS_EXT_BGRX: + case JCS_EXT_BGRA: + neonfct = jsimd_extbgrx_ycc_convert_neon; + break; + case JCS_EXT_XBGR: + case JCS_EXT_ABGR: + neonfct = jsimd_extxbgr_ycc_convert_neon; + break; + case JCS_EXT_XRGB: + case JCS_EXT_ARGB: + neonfct = jsimd_extxrgb_ycc_convert_neon; + break; + default: + neonfct = jsimd_extrgb_ycc_convert_neon; + break; + } + + neonfct(cinfo->image_width, input_buf, output_buf, output_row, num_rows); +} + +GLOBAL(void) +jsimd_rgb_gray_convert(j_compress_ptr cinfo, JSAMPARRAY input_buf, + JSAMPIMAGE output_buf, JDIMENSION output_row, + int num_rows) +{ + void (*neonfct) (JDIMENSION, JSAMPARRAY, JSAMPIMAGE, JDIMENSION, int); + + switch (cinfo->in_color_space) { + case JCS_EXT_RGB: + neonfct = jsimd_extrgb_gray_convert_neon; + break; + case JCS_EXT_RGBX: + case JCS_EXT_RGBA: + neonfct = jsimd_extrgbx_gray_convert_neon; + break; + case JCS_EXT_BGR: + neonfct = jsimd_extbgr_gray_convert_neon; + break; + case JCS_EXT_BGRX: + case JCS_EXT_BGRA: + neonfct = jsimd_extbgrx_gray_convert_neon; + break; + case JCS_EXT_XBGR: + case 
JCS_EXT_ABGR: + neonfct = jsimd_extxbgr_gray_convert_neon; + break; + case JCS_EXT_XRGB: + case JCS_EXT_ARGB: + neonfct = jsimd_extxrgb_gray_convert_neon; + break; + default: + neonfct = jsimd_extrgb_gray_convert_neon; + break; + } + + neonfct(cinfo->image_width, input_buf, output_buf, output_row, num_rows); +} + +GLOBAL(void) +jsimd_ycc_rgb_convert(j_decompress_ptr cinfo, JSAMPIMAGE input_buf, + JDIMENSION input_row, JSAMPARRAY output_buf, + int num_rows) +{ + void (*neonfct) (JDIMENSION, JSAMPIMAGE, JDIMENSION, JSAMPARRAY, int); + + switch (cinfo->out_color_space) { + case JCS_EXT_RGB: + neonfct = jsimd_ycc_extrgb_convert_neon; + break; + case JCS_EXT_RGBX: + case JCS_EXT_RGBA: + neonfct = jsimd_ycc_extrgbx_convert_neon; + break; + case JCS_EXT_BGR: + neonfct = jsimd_ycc_extbgr_convert_neon; + break; + case JCS_EXT_BGRX: + case JCS_EXT_BGRA: + neonfct = jsimd_ycc_extbgrx_convert_neon; + break; + case JCS_EXT_XBGR: + case JCS_EXT_ABGR: + neonfct = jsimd_ycc_extxbgr_convert_neon; + break; + case JCS_EXT_XRGB: + case JCS_EXT_ARGB: + neonfct = jsimd_ycc_extxrgb_convert_neon; + break; + default: + neonfct = jsimd_ycc_extrgb_convert_neon; + break; + } + + neonfct(cinfo->output_width, input_buf, input_row, output_buf, num_rows); +} + +GLOBAL(void) +jsimd_ycc_rgb565_convert(j_decompress_ptr cinfo, JSAMPIMAGE input_buf, + JDIMENSION input_row, JSAMPARRAY output_buf, + int num_rows) +{ + jsimd_ycc_rgb565_convert_neon(cinfo->output_width, input_buf, input_row, + output_buf, num_rows); +} + +GLOBAL(int) +jsimd_can_h2v2_downsample(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (BITS_IN_JSAMPLE != 8) + return 0; + if (DCTSIZE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_h2v1_downsample(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (BITS_IN_JSAMPLE != 8) + return 0; + if (DCTSIZE != 8) + return 0; + if 
(sizeof(JDIMENSION) != 4) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(void) +jsimd_h2v2_downsample(j_compress_ptr cinfo, jpeg_component_info *compptr, + JSAMPARRAY input_data, JSAMPARRAY output_data) +{ + jsimd_h2v2_downsample_neon(cinfo->image_width, cinfo->max_v_samp_factor, + compptr->v_samp_factor, compptr->width_in_blocks, + input_data, output_data); +} + +GLOBAL(void) +jsimd_h2v1_downsample(j_compress_ptr cinfo, jpeg_component_info *compptr, + JSAMPARRAY input_data, JSAMPARRAY output_data) +{ + jsimd_h2v1_downsample_neon(cinfo->image_width, cinfo->max_v_samp_factor, + compptr->v_samp_factor, compptr->width_in_blocks, + input_data, output_data); +} + +GLOBAL(int) +jsimd_can_h2v2_upsample(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_h2v1_upsample(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(void) +jsimd_h2v2_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr, + JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr) +{ + jsimd_h2v2_upsample_neon(cinfo->max_v_samp_factor, cinfo->output_width, + input_data, output_data_ptr); +} + +GLOBAL(void) +jsimd_h2v1_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr, + JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr) +{ + jsimd_h2v1_upsample_neon(cinfo->max_v_samp_factor, cinfo->output_width, + input_data, output_data_ptr); +} + +GLOBAL(int) +jsimd_can_h2v2_fancy_upsample(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; 
+ + return 0; +} + +GLOBAL(int) +jsimd_can_h2v1_fancy_upsample(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_h1v2_fancy_upsample(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(void) +jsimd_h2v2_fancy_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr, + JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr) +{ + jsimd_h2v2_fancy_upsample_neon(cinfo->max_v_samp_factor, + compptr->downsampled_width, input_data, + output_data_ptr); +} + +GLOBAL(void) +jsimd_h2v1_fancy_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr, + JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr) +{ + jsimd_h2v1_fancy_upsample_neon(cinfo->max_v_samp_factor, + compptr->downsampled_width, input_data, + output_data_ptr); +} + +GLOBAL(void) +jsimd_h1v2_fancy_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr, + JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr) +{ + jsimd_h1v2_fancy_upsample_neon(cinfo->max_v_samp_factor, + compptr->downsampled_width, input_data, + output_data_ptr); +} + +GLOBAL(int) +jsimd_can_h2v2_merged_upsample(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_h2v1_merged_upsample(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(void) +jsimd_h2v2_merged_upsample(j_decompress_ptr 
cinfo, JSAMPIMAGE input_buf, + JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf) +{ + void (*neonfct) (JDIMENSION, JSAMPIMAGE, JDIMENSION, JSAMPARRAY); + + switch (cinfo->out_color_space) { + case JCS_EXT_RGB: + neonfct = jsimd_h2v2_extrgb_merged_upsample_neon; + break; + case JCS_EXT_RGBX: + case JCS_EXT_RGBA: + neonfct = jsimd_h2v2_extrgbx_merged_upsample_neon; + break; + case JCS_EXT_BGR: + neonfct = jsimd_h2v2_extbgr_merged_upsample_neon; + break; + case JCS_EXT_BGRX: + case JCS_EXT_BGRA: + neonfct = jsimd_h2v2_extbgrx_merged_upsample_neon; + break; + case JCS_EXT_XBGR: + case JCS_EXT_ABGR: + neonfct = jsimd_h2v2_extxbgr_merged_upsample_neon; + break; + case JCS_EXT_XRGB: + case JCS_EXT_ARGB: + neonfct = jsimd_h2v2_extxrgb_merged_upsample_neon; + break; + default: + neonfct = jsimd_h2v2_extrgb_merged_upsample_neon; + break; + } + + neonfct(cinfo->output_width, input_buf, in_row_group_ctr, output_buf); +} + +GLOBAL(void) +jsimd_h2v1_merged_upsample(j_decompress_ptr cinfo, JSAMPIMAGE input_buf, + JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf) +{ + void (*neonfct) (JDIMENSION, JSAMPIMAGE, JDIMENSION, JSAMPARRAY); + + switch (cinfo->out_color_space) { + case JCS_EXT_RGB: + neonfct = jsimd_h2v1_extrgb_merged_upsample_neon; + break; + case JCS_EXT_RGBX: + case JCS_EXT_RGBA: + neonfct = jsimd_h2v1_extrgbx_merged_upsample_neon; + break; + case JCS_EXT_BGR: + neonfct = jsimd_h2v1_extbgr_merged_upsample_neon; + break; + case JCS_EXT_BGRX: + case JCS_EXT_BGRA: + neonfct = jsimd_h2v1_extbgrx_merged_upsample_neon; + break; + case JCS_EXT_XBGR: + case JCS_EXT_ABGR: + neonfct = jsimd_h2v1_extxbgr_merged_upsample_neon; + break; + case JCS_EXT_XRGB: + case JCS_EXT_ARGB: + neonfct = jsimd_h2v1_extxrgb_merged_upsample_neon; + break; + default: + neonfct = jsimd_h2v1_extrgb_merged_upsample_neon; + break; + } + + neonfct(cinfo->output_width, input_buf, in_row_group_ctr, output_buf); +} + +GLOBAL(int) +jsimd_can_convsamp(void) +{ + init_simd(); + + /* The code is optimised 
for these values only */ + if (DCTSIZE != 8) + return 0; + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + if (sizeof(DCTELEM) != 2) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_convsamp_float(void) +{ + return 0; +} + +GLOBAL(void) +jsimd_convsamp(JSAMPARRAY sample_data, JDIMENSION start_col, + DCTELEM *workspace) +{ + jsimd_convsamp_neon(sample_data, start_col, workspace); +} + +GLOBAL(void) +jsimd_convsamp_float(JSAMPARRAY sample_data, JDIMENSION start_col, + FAST_FLOAT *workspace) +{ +} + +GLOBAL(int) +jsimd_can_fdct_islow(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (DCTSIZE != 8) + return 0; + if (sizeof(DCTELEM) != 2) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_fdct_ifast(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (DCTSIZE != 8) + return 0; + if (sizeof(DCTELEM) != 2) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_fdct_float(void) +{ + return 0; +} + +GLOBAL(void) +jsimd_fdct_islow(DCTELEM *data) +{ + jsimd_fdct_islow_neon(data); +} + +GLOBAL(void) +jsimd_fdct_ifast(DCTELEM *data) +{ + jsimd_fdct_ifast_neon(data); +} + +GLOBAL(void) +jsimd_fdct_float(FAST_FLOAT *data) +{ +} + +GLOBAL(int) +jsimd_can_quantize(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (DCTSIZE != 8) + return 0; + if (sizeof(JCOEF) != 2) + return 0; + if (sizeof(DCTELEM) != 2) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_quantize_float(void) +{ + return 0; +} + +GLOBAL(void) +jsimd_quantize(JCOEFPTR coef_block, DCTELEM *divisors, DCTELEM *workspace) +{ + jsimd_quantize_neon(coef_block, divisors, workspace); +} + +GLOBAL(void) +jsimd_quantize_float(JCOEFPTR coef_block, FAST_FLOAT *divisors, + FAST_FLOAT *workspace) +{ +} 
+ +GLOBAL(int) +jsimd_can_idct_2x2(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (DCTSIZE != 8) + return 0; + if (sizeof(JCOEF) != 2) + return 0; + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + if (sizeof(ISLOW_MULT_TYPE) != 2) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_idct_4x4(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (DCTSIZE != 8) + return 0; + if (sizeof(JCOEF) != 2) + return 0; + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + if (sizeof(ISLOW_MULT_TYPE) != 2) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(void) +jsimd_idct_2x2(j_decompress_ptr cinfo, jpeg_component_info *compptr, + JCOEFPTR coef_block, JSAMPARRAY output_buf, + JDIMENSION output_col) +{ + jsimd_idct_2x2_neon(compptr->dct_table, coef_block, output_buf, output_col); +} + +GLOBAL(void) +jsimd_idct_4x4(j_decompress_ptr cinfo, jpeg_component_info *compptr, + JCOEFPTR coef_block, JSAMPARRAY output_buf, + JDIMENSION output_col) +{ + jsimd_idct_4x4_neon(compptr->dct_table, coef_block, output_buf, output_col); +} + +GLOBAL(int) +jsimd_can_idct_islow(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (DCTSIZE != 8) + return 0; + if (sizeof(JCOEF) != 2) + return 0; + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + if (sizeof(ISLOW_MULT_TYPE) != 2) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_idct_ifast(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (DCTSIZE != 8) + return 0; + if (sizeof(JCOEF) != 2) + return 0; + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + if (sizeof(IFAST_MULT_TYPE) != 2) + return 0; + if (IFAST_SCALE_BITS != 2) + return 0; + + if (simd_support & 
JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_idct_float(void) +{ + return 0; +} + +GLOBAL(void) +jsimd_idct_islow(j_decompress_ptr cinfo, jpeg_component_info *compptr, + JCOEFPTR coef_block, JSAMPARRAY output_buf, + JDIMENSION output_col) +{ + jsimd_idct_islow_neon(compptr->dct_table, coef_block, output_buf, + output_col); +} + +GLOBAL(void) +jsimd_idct_ifast(j_decompress_ptr cinfo, jpeg_component_info *compptr, + JCOEFPTR coef_block, JSAMPARRAY output_buf, + JDIMENSION output_col) +{ + jsimd_idct_ifast_neon(compptr->dct_table, coef_block, output_buf, + output_col); +} + +GLOBAL(void) +jsimd_idct_float(j_decompress_ptr cinfo, jpeg_component_info *compptr, + JCOEFPTR coef_block, JSAMPARRAY output_buf, + JDIMENSION output_col) +{ +} + +GLOBAL(int) +jsimd_can_huff_encode_one_block(void) +{ + init_simd(); + + if (DCTSIZE != 8) + return 0; + if (sizeof(JCOEF) != 2) + return 0; + + if (simd_support & JSIMD_NEON && simd_huffman) + return 1; + + return 0; +} + +GLOBAL(JOCTET *) +jsimd_huff_encode_one_block(void *state, JOCTET *buffer, JCOEFPTR block, + int last_dc_val, c_derived_tbl *dctbl, + c_derived_tbl *actbl) +{ + return jsimd_huff_encode_one_block_neon(state, buffer, block, last_dc_val, + dctbl, actbl); +} + +GLOBAL(int) +jsimd_can_encode_mcu_AC_first_prepare(void) +{ + init_simd(); + + if (DCTSIZE != 8) + return 0; + if (sizeof(JCOEF) != 2) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(void) +jsimd_encode_mcu_AC_first_prepare(const JCOEF *block, + const int *jpeg_natural_order_start, int Sl, + int Al, JCOEF *values, size_t *zerobits) +{ + jsimd_encode_mcu_AC_first_prepare_neon(block, jpeg_natural_order_start, + Sl, Al, values, zerobits); +} + +GLOBAL(int) +jsimd_can_encode_mcu_AC_refine_prepare(void) +{ + init_simd(); + + if (DCTSIZE != 8) + return 0; + if (sizeof(JCOEF) != 2) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) 
+jsimd_encode_mcu_AC_refine_prepare(const JCOEF *block, + const int *jpeg_natural_order_start, int Sl, + int Al, JCOEF *absvalues, size_t *bits) +{ + return jsimd_encode_mcu_AC_refine_prepare_neon(block, + jpeg_natural_order_start, Sl, + Al, absvalues, bits); +} diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/aarch64/jccolext-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/aarch64/jccolext-neon.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/aarch64/jccolext-neon.c 1970-01-01 01:00:00.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/aarch64/jccolext-neon.c 2021-11-20 03:41:33.398600450 +0000 @@ -0,0 +1,316 @@ +/* + * jccolext-neon.c - colorspace conversion (64-bit Arm Neon) + * + * Copyright (C) 2020, Arm Limited. All Rights Reserved. + * + * This software is provided 'as-is', without any express or implied + * warranty. In no event will the authors be held liable for any damages + * arising from the use of this software. + * + * Permission is granted to anyone to use this software for any purpose, + * including commercial applications, and to alter it and redistribute it + * freely, subject to the following restrictions: + * + * 1. The origin of this software must not be misrepresented; you must not + * claim that you wrote the original software. If you use this software + * in a product, an acknowledgment in the product documentation would be + * appreciated but is not required. + * 2. Altered source versions must be plainly marked as such, and must not be + * misrepresented as being the original software. + * 3. This notice may not be removed or altered from any source distribution. 
+ */ + +/* This file is included by jccolor-neon.c */ + + +/* RGB -> YCbCr conversion is defined by the following equations: + * Y = 0.29900 * R + 0.58700 * G + 0.11400 * B + * Cb = -0.16874 * R - 0.33126 * G + 0.50000 * B + 128 + * Cr = 0.50000 * R - 0.41869 * G - 0.08131 * B + 128 + * + * Avoid floating point arithmetic by using shifted integer constants: + * 0.29899597 = 19595 * 2^-16 + * 0.58700561 = 38470 * 2^-16 + * 0.11399841 = 7471 * 2^-16 + * 0.16874695 = 11059 * 2^-16 + * 0.33125305 = 21709 * 2^-16 + * 0.50000000 = 32768 * 2^-16 + * 0.41868592 = 27439 * 2^-16 + * 0.08131409 = 5329 * 2^-16 + * These constants are defined in jccolor-neon.c + * + * We add the fixed-point equivalent of 0.5 to Cb and Cr, which effectively + * rounds up or down the result via integer truncation. + */ + +void jsimd_rgb_ycc_convert_neon(JDIMENSION image_width, JSAMPARRAY input_buf, + JSAMPIMAGE output_buf, JDIMENSION output_row, + int num_rows) +{ + /* Pointer to RGB(X/A) input data */ + JSAMPROW inptr; + /* Pointers to Y, Cb, and Cr output data */ + JSAMPROW outptr0, outptr1, outptr2; + /* Allocate temporary buffer for final (image_width % 16) pixels in row. */ + ALIGN(16) uint8_t tmp_buf[16 * RGB_PIXELSIZE]; + + /* Set up conversion constants. 
*/ + const uint16x8_t consts = vld1q_u16(jsimd_rgb_ycc_neon_consts); + const uint32x4_t scaled_128_5 = vdupq_n_u32((128 << 16) + 32767); + + while (--num_rows >= 0) { + inptr = *input_buf++; + outptr0 = output_buf[0][output_row]; + outptr1 = output_buf[1][output_row]; + outptr2 = output_buf[2][output_row]; + output_row++; + + int cols_remaining = image_width; + for (; cols_remaining >= 16; cols_remaining -= 16) { + +#if RGB_PIXELSIZE == 4 + uint8x16x4_t input_pixels = vld4q_u8(inptr); +#else + uint8x16x3_t input_pixels = vld3q_u8(inptr); +#endif + uint16x8_t r_l = vmovl_u8(vget_low_u8(input_pixels.val[RGB_RED])); + uint16x8_t g_l = vmovl_u8(vget_low_u8(input_pixels.val[RGB_GREEN])); + uint16x8_t b_l = vmovl_u8(vget_low_u8(input_pixels.val[RGB_BLUE])); + uint16x8_t r_h = vmovl_u8(vget_high_u8(input_pixels.val[RGB_RED])); + uint16x8_t g_h = vmovl_u8(vget_high_u8(input_pixels.val[RGB_GREEN])); + uint16x8_t b_h = vmovl_u8(vget_high_u8(input_pixels.val[RGB_BLUE])); + + /* Compute Y = 0.29900 * R + 0.58700 * G + 0.11400 * B */ + uint32x4_t y_ll = vmull_laneq_u16(vget_low_u16(r_l), consts, 0); + y_ll = vmlal_laneq_u16(y_ll, vget_low_u16(g_l), consts, 1); + y_ll = vmlal_laneq_u16(y_ll, vget_low_u16(b_l), consts, 2); + uint32x4_t y_lh = vmull_laneq_u16(vget_high_u16(r_l), consts, 0); + y_lh = vmlal_laneq_u16(y_lh, vget_high_u16(g_l), consts, 1); + y_lh = vmlal_laneq_u16(y_lh, vget_high_u16(b_l), consts, 2); + uint32x4_t y_hl = vmull_laneq_u16(vget_low_u16(r_h), consts, 0); + y_hl = vmlal_laneq_u16(y_hl, vget_low_u16(g_h), consts, 1); + y_hl = vmlal_laneq_u16(y_hl, vget_low_u16(b_h), consts, 2); + uint32x4_t y_hh = vmull_laneq_u16(vget_high_u16(r_h), consts, 0); + y_hh = vmlal_laneq_u16(y_hh, vget_high_u16(g_h), consts, 1); + y_hh = vmlal_laneq_u16(y_hh, vget_high_u16(b_h), consts, 2); + + /* Compute Cb = -0.16874 * R - 0.33126 * G + 0.50000 * B + 128 */ + uint32x4_t cb_ll = scaled_128_5; + cb_ll = vmlsl_laneq_u16(cb_ll, vget_low_u16(r_l), consts, 3); + cb_ll = 
vmlsl_laneq_u16(cb_ll, vget_low_u16(g_l), consts, 4); + cb_ll = vmlal_laneq_u16(cb_ll, vget_low_u16(b_l), consts, 5); + uint32x4_t cb_lh = scaled_128_5; + cb_lh = vmlsl_laneq_u16(cb_lh, vget_high_u16(r_l), consts, 3); + cb_lh = vmlsl_laneq_u16(cb_lh, vget_high_u16(g_l), consts, 4); + cb_lh = vmlal_laneq_u16(cb_lh, vget_high_u16(b_l), consts, 5); + uint32x4_t cb_hl = scaled_128_5; + cb_hl = vmlsl_laneq_u16(cb_hl, vget_low_u16(r_h), consts, 3); + cb_hl = vmlsl_laneq_u16(cb_hl, vget_low_u16(g_h), consts, 4); + cb_hl = vmlal_laneq_u16(cb_hl, vget_low_u16(b_h), consts, 5); + uint32x4_t cb_hh = scaled_128_5; + cb_hh = vmlsl_laneq_u16(cb_hh, vget_high_u16(r_h), consts, 3); + cb_hh = vmlsl_laneq_u16(cb_hh, vget_high_u16(g_h), consts, 4); + cb_hh = vmlal_laneq_u16(cb_hh, vget_high_u16(b_h), consts, 5); + + /* Compute Cr = 0.50000 * R - 0.41869 * G - 0.08131 * B + 128 */ + uint32x4_t cr_ll = scaled_128_5; + cr_ll = vmlal_laneq_u16(cr_ll, vget_low_u16(r_l), consts, 5); + cr_ll = vmlsl_laneq_u16(cr_ll, vget_low_u16(g_l), consts, 6); + cr_ll = vmlsl_laneq_u16(cr_ll, vget_low_u16(b_l), consts, 7); + uint32x4_t cr_lh = scaled_128_5; + cr_lh = vmlal_laneq_u16(cr_lh, vget_high_u16(r_l), consts, 5); + cr_lh = vmlsl_laneq_u16(cr_lh, vget_high_u16(g_l), consts, 6); + cr_lh = vmlsl_laneq_u16(cr_lh, vget_high_u16(b_l), consts, 7); + uint32x4_t cr_hl = scaled_128_5; + cr_hl = vmlal_laneq_u16(cr_hl, vget_low_u16(r_h), consts, 5); + cr_hl = vmlsl_laneq_u16(cr_hl, vget_low_u16(g_h), consts, 6); + cr_hl = vmlsl_laneq_u16(cr_hl, vget_low_u16(b_h), consts, 7); + uint32x4_t cr_hh = scaled_128_5; + cr_hh = vmlal_laneq_u16(cr_hh, vget_high_u16(r_h), consts, 5); + cr_hh = vmlsl_laneq_u16(cr_hh, vget_high_u16(g_h), consts, 6); + cr_hh = vmlsl_laneq_u16(cr_hh, vget_high_u16(b_h), consts, 7); + + /* Descale Y values (rounding right shift) and narrow to 16-bit. 
*/ + uint16x8_t y_l = vcombine_u16(vrshrn_n_u32(y_ll, 16), + vrshrn_n_u32(y_lh, 16)); + uint16x8_t y_h = vcombine_u16(vrshrn_n_u32(y_hl, 16), + vrshrn_n_u32(y_hh, 16)); + /* Descale Cb values (right shift) and narrow to 16-bit. */ + uint16x8_t cb_l = vcombine_u16(vshrn_n_u32(cb_ll, 16), + vshrn_n_u32(cb_lh, 16)); + uint16x8_t cb_h = vcombine_u16(vshrn_n_u32(cb_hl, 16), + vshrn_n_u32(cb_hh, 16)); + /* Descale Cr values (right shift) and narrow to 16-bit. */ + uint16x8_t cr_l = vcombine_u16(vshrn_n_u32(cr_ll, 16), + vshrn_n_u32(cr_lh, 16)); + uint16x8_t cr_h = vcombine_u16(vshrn_n_u32(cr_hl, 16), + vshrn_n_u32(cr_hh, 16)); + /* Narrow Y, Cb, and Cr values to 8-bit and store to memory. Buffer + * overwrite is permitted up to the next multiple of ALIGN_SIZE bytes. + */ + vst1q_u8(outptr0, vcombine_u8(vmovn_u16(y_l), vmovn_u16(y_h))); + vst1q_u8(outptr1, vcombine_u8(vmovn_u16(cb_l), vmovn_u16(cb_h))); + vst1q_u8(outptr2, vcombine_u8(vmovn_u16(cr_l), vmovn_u16(cr_h))); + + /* Increment pointers. */ + inptr += (16 * RGB_PIXELSIZE); + outptr0 += 16; + outptr1 += 16; + outptr2 += 16; + } + + if (cols_remaining > 8) { + /* To prevent buffer overread by the vector load instructions, the last + * (image_width % 16) columns of data are first memcopied to a temporary + * buffer large enough to accommodate the vector load. 
+ */ + memcpy(tmp_buf, inptr, cols_remaining * RGB_PIXELSIZE); + inptr = tmp_buf; + +#if RGB_PIXELSIZE == 4 + uint8x16x4_t input_pixels = vld4q_u8(inptr); +#else + uint8x16x3_t input_pixels = vld3q_u8(inptr); +#endif + uint16x8_t r_l = vmovl_u8(vget_low_u8(input_pixels.val[RGB_RED])); + uint16x8_t g_l = vmovl_u8(vget_low_u8(input_pixels.val[RGB_GREEN])); + uint16x8_t b_l = vmovl_u8(vget_low_u8(input_pixels.val[RGB_BLUE])); + uint16x8_t r_h = vmovl_u8(vget_high_u8(input_pixels.val[RGB_RED])); + uint16x8_t g_h = vmovl_u8(vget_high_u8(input_pixels.val[RGB_GREEN])); + uint16x8_t b_h = vmovl_u8(vget_high_u8(input_pixels.val[RGB_BLUE])); + + /* Compute Y = 0.29900 * R + 0.58700 * G + 0.11400 * B */ + uint32x4_t y_ll = vmull_laneq_u16(vget_low_u16(r_l), consts, 0); + y_ll = vmlal_laneq_u16(y_ll, vget_low_u16(g_l), consts, 1); + y_ll = vmlal_laneq_u16(y_ll, vget_low_u16(b_l), consts, 2); + uint32x4_t y_lh = vmull_laneq_u16(vget_high_u16(r_l), consts, 0); + y_lh = vmlal_laneq_u16(y_lh, vget_high_u16(g_l), consts, 1); + y_lh = vmlal_laneq_u16(y_lh, vget_high_u16(b_l), consts, 2); + uint32x4_t y_hl = vmull_laneq_u16(vget_low_u16(r_h), consts, 0); + y_hl = vmlal_laneq_u16(y_hl, vget_low_u16(g_h), consts, 1); + y_hl = vmlal_laneq_u16(y_hl, vget_low_u16(b_h), consts, 2); + uint32x4_t y_hh = vmull_laneq_u16(vget_high_u16(r_h), consts, 0); + y_hh = vmlal_laneq_u16(y_hh, vget_high_u16(g_h), consts, 1); + y_hh = vmlal_laneq_u16(y_hh, vget_high_u16(b_h), consts, 2); + + /* Compute Cb = -0.16874 * R - 0.33126 * G + 0.50000 * B + 128 */ + uint32x4_t cb_ll = scaled_128_5; + cb_ll = vmlsl_laneq_u16(cb_ll, vget_low_u16(r_l), consts, 3); + cb_ll = vmlsl_laneq_u16(cb_ll, vget_low_u16(g_l), consts, 4); + cb_ll = vmlal_laneq_u16(cb_ll, vget_low_u16(b_l), consts, 5); + uint32x4_t cb_lh = scaled_128_5; + cb_lh = vmlsl_laneq_u16(cb_lh, vget_high_u16(r_l), consts, 3); + cb_lh = vmlsl_laneq_u16(cb_lh, vget_high_u16(g_l), consts, 4); + cb_lh = vmlal_laneq_u16(cb_lh, vget_high_u16(b_l), consts, 5); 
+ uint32x4_t cb_hl = scaled_128_5; + cb_hl = vmlsl_laneq_u16(cb_hl, vget_low_u16(r_h), consts, 3); + cb_hl = vmlsl_laneq_u16(cb_hl, vget_low_u16(g_h), consts, 4); + cb_hl = vmlal_laneq_u16(cb_hl, vget_low_u16(b_h), consts, 5); + uint32x4_t cb_hh = scaled_128_5; + cb_hh = vmlsl_laneq_u16(cb_hh, vget_high_u16(r_h), consts, 3); + cb_hh = vmlsl_laneq_u16(cb_hh, vget_high_u16(g_h), consts, 4); + cb_hh = vmlal_laneq_u16(cb_hh, vget_high_u16(b_h), consts, 5); + + /* Compute Cr = 0.50000 * R - 0.41869 * G - 0.08131 * B + 128 */ + uint32x4_t cr_ll = scaled_128_5; + cr_ll = vmlal_laneq_u16(cr_ll, vget_low_u16(r_l), consts, 5); + cr_ll = vmlsl_laneq_u16(cr_ll, vget_low_u16(g_l), consts, 6); + cr_ll = vmlsl_laneq_u16(cr_ll, vget_low_u16(b_l), consts, 7); + uint32x4_t cr_lh = scaled_128_5; + cr_lh = vmlal_laneq_u16(cr_lh, vget_high_u16(r_l), consts, 5); + cr_lh = vmlsl_laneq_u16(cr_lh, vget_high_u16(g_l), consts, 6); + cr_lh = vmlsl_laneq_u16(cr_lh, vget_high_u16(b_l), consts, 7); + uint32x4_t cr_hl = scaled_128_5; + cr_hl = vmlal_laneq_u16(cr_hl, vget_low_u16(r_h), consts, 5); + cr_hl = vmlsl_laneq_u16(cr_hl, vget_low_u16(g_h), consts, 6); + cr_hl = vmlsl_laneq_u16(cr_hl, vget_low_u16(b_h), consts, 7); + uint32x4_t cr_hh = scaled_128_5; + cr_hh = vmlal_laneq_u16(cr_hh, vget_high_u16(r_h), consts, 5); + cr_hh = vmlsl_laneq_u16(cr_hh, vget_high_u16(g_h), consts, 6); + cr_hh = vmlsl_laneq_u16(cr_hh, vget_high_u16(b_h), consts, 7); + + /* Descale Y values (rounding right shift) and narrow to 16-bit. */ + uint16x8_t y_l = vcombine_u16(vrshrn_n_u32(y_ll, 16), + vrshrn_n_u32(y_lh, 16)); + uint16x8_t y_h = vcombine_u16(vrshrn_n_u32(y_hl, 16), + vrshrn_n_u32(y_hh, 16)); + /* Descale Cb values (right shift) and narrow to 16-bit. */ + uint16x8_t cb_l = vcombine_u16(vshrn_n_u32(cb_ll, 16), + vshrn_n_u32(cb_lh, 16)); + uint16x8_t cb_h = vcombine_u16(vshrn_n_u32(cb_hl, 16), + vshrn_n_u32(cb_hh, 16)); + /* Descale Cr values (right shift) and narrow to 16-bit. 
*/ + uint16x8_t cr_l = vcombine_u16(vshrn_n_u32(cr_ll, 16), + vshrn_n_u32(cr_lh, 16)); + uint16x8_t cr_h = vcombine_u16(vshrn_n_u32(cr_hl, 16), + vshrn_n_u32(cr_hh, 16)); + /* Narrow Y, Cb, and Cr values to 8-bit and store to memory. Buffer + * overwrite is permitted up to the next multiple of ALIGN_SIZE bytes. + */ + vst1q_u8(outptr0, vcombine_u8(vmovn_u16(y_l), vmovn_u16(y_h))); + vst1q_u8(outptr1, vcombine_u8(vmovn_u16(cb_l), vmovn_u16(cb_h))); + vst1q_u8(outptr2, vcombine_u8(vmovn_u16(cr_l), vmovn_u16(cr_h))); + + } else if (cols_remaining > 0) { + /* To prevent buffer overread by the vector load instructions, the last + * (image_width % 8) columns of data are first memcopied to a temporary + * buffer large enough to accommodate the vector load. + */ + memcpy(tmp_buf, inptr, cols_remaining * RGB_PIXELSIZE); + inptr = tmp_buf; + +#if RGB_PIXELSIZE == 4 + uint8x8x4_t input_pixels = vld4_u8(inptr); +#else + uint8x8x3_t input_pixels = vld3_u8(inptr); +#endif + uint16x8_t r = vmovl_u8(input_pixels.val[RGB_RED]); + uint16x8_t g = vmovl_u8(input_pixels.val[RGB_GREEN]); + uint16x8_t b = vmovl_u8(input_pixels.val[RGB_BLUE]); + + /* Compute Y = 0.29900 * R + 0.58700 * G + 0.11400 * B */ + uint32x4_t y_l = vmull_laneq_u16(vget_low_u16(r), consts, 0); + y_l = vmlal_laneq_u16(y_l, vget_low_u16(g), consts, 1); + y_l = vmlal_laneq_u16(y_l, vget_low_u16(b), consts, 2); + uint32x4_t y_h = vmull_laneq_u16(vget_high_u16(r), consts, 0); + y_h = vmlal_laneq_u16(y_h, vget_high_u16(g), consts, 1); + y_h = vmlal_laneq_u16(y_h, vget_high_u16(b), consts, 2); + + /* Compute Cb = -0.16874 * R - 0.33126 * G + 0.50000 * B + 128 */ + uint32x4_t cb_l = scaled_128_5; + cb_l = vmlsl_laneq_u16(cb_l, vget_low_u16(r), consts, 3); + cb_l = vmlsl_laneq_u16(cb_l, vget_low_u16(g), consts, 4); + cb_l = vmlal_laneq_u16(cb_l, vget_low_u16(b), consts, 5); + uint32x4_t cb_h = scaled_128_5; + cb_h = vmlsl_laneq_u16(cb_h, vget_high_u16(r), consts, 3); + cb_h = vmlsl_laneq_u16(cb_h, vget_high_u16(g), consts, 
4); + cb_h = vmlal_laneq_u16(cb_h, vget_high_u16(b), consts, 5); + + /* Compute Cr = 0.50000 * R - 0.41869 * G - 0.08131 * B + 128 */ + uint32x4_t cr_l = scaled_128_5; + cr_l = vmlal_laneq_u16(cr_l, vget_low_u16(r), consts, 5); + cr_l = vmlsl_laneq_u16(cr_l, vget_low_u16(g), consts, 6); + cr_l = vmlsl_laneq_u16(cr_l, vget_low_u16(b), consts, 7); + uint32x4_t cr_h = scaled_128_5; + cr_h = vmlal_laneq_u16(cr_h, vget_high_u16(r), consts, 5); + cr_h = vmlsl_laneq_u16(cr_h, vget_high_u16(g), consts, 6); + cr_h = vmlsl_laneq_u16(cr_h, vget_high_u16(b), consts, 7); + + /* Descale Y values (rounding right shift) and narrow to 16-bit. */ + uint16x8_t y_u16 = vcombine_u16(vrshrn_n_u32(y_l, 16), + vrshrn_n_u32(y_h, 16)); + /* Descale Cb values (right shift) and narrow to 16-bit. */ + uint16x8_t cb_u16 = vcombine_u16(vshrn_n_u32(cb_l, 16), + vshrn_n_u32(cb_h, 16)); + /* Descale Cr values (right shift) and narrow to 16-bit. */ + uint16x8_t cr_u16 = vcombine_u16(vshrn_n_u32(cr_l, 16), + vshrn_n_u32(cr_h, 16)); + /* Narrow Y, Cb, and Cr values to 8-bit and store to memory. Buffer + * overwrite is permitted up to the next multiple of ALIGN_SIZE bytes. + */ + vst1_u8(outptr0, vmovn_u16(y_u16)); + vst1_u8(outptr1, vmovn_u16(cb_u16)); + vst1_u8(outptr2, vmovn_u16(cr_u16)); + } + } +} diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/aarch64/jchuff-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/aarch64/jchuff-neon.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/aarch64/jchuff-neon.c 1970-01-01 01:00:00.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/aarch64/jchuff-neon.c 2021-11-20 03:41:33.398600450 +0000 @@ -0,0 +1,403 @@ +/* + * jchuff-neon.c - Huffman entropy encoding (64-bit Arm Neon) + * + * Copyright (C) 2020-2021, Arm Limited. All Rights Reserved. + * Copyright (C) 2020, D. R. Commander. All Rights Reserved. 
+ * + * This software is provided 'as-is', without any express or implied + * warranty. In no event will the authors be held liable for any damages + * arising from the use of this software. + * + * Permission is granted to anyone to use this software for any purpose, + * including commercial applications, and to alter it and redistribute it + * freely, subject to the following restrictions: + * + * 1. The origin of this software must not be misrepresented; you must not + * claim that you wrote the original software. If you use this software + * in a product, an acknowledgment in the product documentation would be + * appreciated but is not required. + * 2. Altered source versions must be plainly marked as such, and must not be + * misrepresented as being the original software. + * 3. This notice may not be removed or altered from any source distribution. + * + * NOTE: All referenced figures are from + * Recommendation ITU-T T.81 (1992) | ISO/IEC 10918-1:1994. + */ + +#define JPEG_INTERNALS +#include "../../../jinclude.h" +#include "../../../jpeglib.h" +#include "../../../jsimd.h" +#include "../../../jdct.h" +#include "../../../jsimddct.h" +#include "../../jsimd.h" +#include "../align.h" +#include "../jchuff.h" +#include "neon-compat.h" + +#include <limits.h> + +#include <arm_neon.h> + + +ALIGN(16) static const uint8_t jsimd_huff_encode_one_block_consts[] = { + 0, 1, 2, 3, 16, 17, 32, 33, + 18, 19, 4, 5, 6, 7, 20, 21, + 34, 35, 48, 49, 255, 255, 50, 51, + 36, 37, 22, 23, 8, 9, 10, 11, + 255, 255, 6, 7, 20, 21, 34, 35, + 48, 49, 255, 255, 50, 51, 36, 37, + 54, 55, 40, 41, 26, 27, 12, 13, + 14, 15, 28, 29, 42, 43, 56, 57, + 6, 7, 20, 21, 34, 35, 48, 49, + 50, 51, 36, 37, 22, 23, 8, 9, + 26, 27, 12, 13, 255, 255, 14, 15, + 28, 29, 42, 43, 56, 57, 255, 255, + 52, 53, 54, 55, 40, 41, 26, 27, + 12, 13, 255, 255, 14, 15, 28, 29, + 26, 27, 40, 41, 42, 43, 28, 29, + 14, 15, 30, 31, 44, 45, 46, 47 +}; + +JOCTET *jsimd_huff_encode_one_block_neon(void *state, JOCTET *buffer, + JCOEFPTR block, int
last_dc_val, + c_derived_tbl *dctbl, + c_derived_tbl *actbl) +{ + uint16_t block_diff[DCTSIZE2]; + + /* Load lookup table indices for rows of zig-zag ordering. */ +#ifdef HAVE_VLD1Q_U8_X4 + const uint8x16x4_t idx_rows_0123 = + vld1q_u8_x4(jsimd_huff_encode_one_block_consts + 0 * DCTSIZE); + const uint8x16x4_t idx_rows_4567 = + vld1q_u8_x4(jsimd_huff_encode_one_block_consts + 8 * DCTSIZE); +#else + /* GCC does not currently support intrinsics vld1q_<type>_x4(). */ + const uint8x16x4_t idx_rows_0123 = { { + vld1q_u8(jsimd_huff_encode_one_block_consts + 0 * DCTSIZE), + vld1q_u8(jsimd_huff_encode_one_block_consts + 2 * DCTSIZE), + vld1q_u8(jsimd_huff_encode_one_block_consts + 4 * DCTSIZE), + vld1q_u8(jsimd_huff_encode_one_block_consts + 6 * DCTSIZE) + } }; + const uint8x16x4_t idx_rows_4567 = { { + vld1q_u8(jsimd_huff_encode_one_block_consts + 8 * DCTSIZE), + vld1q_u8(jsimd_huff_encode_one_block_consts + 10 * DCTSIZE), + vld1q_u8(jsimd_huff_encode_one_block_consts + 12 * DCTSIZE), + vld1q_u8(jsimd_huff_encode_one_block_consts + 14 * DCTSIZE) + } }; +#endif + + /* Load 8x8 block of DCT coefficients. */ +#ifdef HAVE_VLD1Q_U8_X4 + const int8x16x4_t tbl_rows_0123 = + vld1q_s8_x4((int8_t *)(block + 0 * DCTSIZE)); + const int8x16x4_t tbl_rows_4567 = + vld1q_s8_x4((int8_t *)(block + 4 * DCTSIZE)); +#else + const int8x16x4_t tbl_rows_0123 = { { + vld1q_s8((int8_t *)(block + 0 * DCTSIZE)), + vld1q_s8((int8_t *)(block + 1 * DCTSIZE)), + vld1q_s8((int8_t *)(block + 2 * DCTSIZE)), + vld1q_s8((int8_t *)(block + 3 * DCTSIZE)) + } }; + const int8x16x4_t tbl_rows_4567 = { { + vld1q_s8((int8_t *)(block + 4 * DCTSIZE)), + vld1q_s8((int8_t *)(block + 5 * DCTSIZE)), + vld1q_s8((int8_t *)(block + 6 * DCTSIZE)), + vld1q_s8((int8_t *)(block + 7 * DCTSIZE)) + } }; +#endif + + /* Initialise extra lookup tables.
*/ + const int8x16x4_t tbl_rows_2345 = { { + tbl_rows_0123.val[2], tbl_rows_0123.val[3], + tbl_rows_4567.val[0], tbl_rows_4567.val[1] + } }; + const int8x16x3_t tbl_rows_567 = + { { tbl_rows_4567.val[1], tbl_rows_4567.val[2], tbl_rows_4567.val[3] } }; + + /* Shuffle coefficients into zig-zag order. */ + int16x8_t row0 = + vreinterpretq_s16_s8(vqtbl4q_s8(tbl_rows_0123, idx_rows_0123.val[0])); + int16x8_t row1 = + vreinterpretq_s16_s8(vqtbl4q_s8(tbl_rows_0123, idx_rows_0123.val[1])); + int16x8_t row2 = + vreinterpretq_s16_s8(vqtbl4q_s8(tbl_rows_2345, idx_rows_0123.val[2])); + int16x8_t row3 = + vreinterpretq_s16_s8(vqtbl4q_s8(tbl_rows_0123, idx_rows_0123.val[3])); + int16x8_t row4 = + vreinterpretq_s16_s8(vqtbl4q_s8(tbl_rows_4567, idx_rows_4567.val[0])); + int16x8_t row5 = + vreinterpretq_s16_s8(vqtbl4q_s8(tbl_rows_2345, idx_rows_4567.val[1])); + int16x8_t row6 = + vreinterpretq_s16_s8(vqtbl4q_s8(tbl_rows_4567, idx_rows_4567.val[2])); + int16x8_t row7 = + vreinterpretq_s16_s8(vqtbl3q_s8(tbl_rows_567, idx_rows_4567.val[3])); + + /* Compute DC coefficient difference value (F.1.1.5.1). */ + row0 = vsetq_lane_s16(block[0] - last_dc_val, row0, 0); + /* Initialize AC coefficient lanes not reachable by lookup tables. */ + row1 = + vsetq_lane_s16(vgetq_lane_s16(vreinterpretq_s16_s8(tbl_rows_4567.val[0]), + 0), row1, 2); + row2 = + vsetq_lane_s16(vgetq_lane_s16(vreinterpretq_s16_s8(tbl_rows_0123.val[1]), + 4), row2, 0); + row2 = + vsetq_lane_s16(vgetq_lane_s16(vreinterpretq_s16_s8(tbl_rows_4567.val[2]), + 0), row2, 5); + row5 = + vsetq_lane_s16(vgetq_lane_s16(vreinterpretq_s16_s8(tbl_rows_0123.val[1]), + 7), row5, 2); + row5 = + vsetq_lane_s16(vgetq_lane_s16(vreinterpretq_s16_s8(tbl_rows_4567.val[2]), + 3), row5, 7); + row6 = + vsetq_lane_s16(vgetq_lane_s16(vreinterpretq_s16_s8(tbl_rows_0123.val[3]), + 7), row6, 5); + + /* DCT block is now in zig-zag order; start Huffman encoding process. 
*/ + int16x8_t abs_row0 = vabsq_s16(row0); + int16x8_t abs_row1 = vabsq_s16(row1); + int16x8_t abs_row2 = vabsq_s16(row2); + int16x8_t abs_row3 = vabsq_s16(row3); + int16x8_t abs_row4 = vabsq_s16(row4); + int16x8_t abs_row5 = vabsq_s16(row5); + int16x8_t abs_row6 = vabsq_s16(row6); + int16x8_t abs_row7 = vabsq_s16(row7); + + /* For negative coeffs: diff = abs(coeff) -1 = ~abs(coeff) */ + uint16x8_t row0_diff = + vreinterpretq_u16_s16(veorq_s16(abs_row0, vshrq_n_s16(row0, 15))); + uint16x8_t row1_diff = + vreinterpretq_u16_s16(veorq_s16(abs_row1, vshrq_n_s16(row1, 15))); + uint16x8_t row2_diff = + vreinterpretq_u16_s16(veorq_s16(abs_row2, vshrq_n_s16(row2, 15))); + uint16x8_t row3_diff = + vreinterpretq_u16_s16(veorq_s16(abs_row3, vshrq_n_s16(row3, 15))); + uint16x8_t row4_diff = + vreinterpretq_u16_s16(veorq_s16(abs_row4, vshrq_n_s16(row4, 15))); + uint16x8_t row5_diff = + vreinterpretq_u16_s16(veorq_s16(abs_row5, vshrq_n_s16(row5, 15))); + uint16x8_t row6_diff = + vreinterpretq_u16_s16(veorq_s16(abs_row6, vshrq_n_s16(row6, 15))); + uint16x8_t row7_diff = + vreinterpretq_u16_s16(veorq_s16(abs_row7, vshrq_n_s16(row7, 15))); + + /* Construct bitmap to accelerate encoding of AC coefficients. A set bit + * means that the corresponding coefficient != 0. 
+ */ + uint8x8_t abs_row0_gt0 = vmovn_u16(vcgtq_u16(vreinterpretq_u16_s16(abs_row0), + vdupq_n_u16(0))); + uint8x8_t abs_row1_gt0 = vmovn_u16(vcgtq_u16(vreinterpretq_u16_s16(abs_row1), + vdupq_n_u16(0))); + uint8x8_t abs_row2_gt0 = vmovn_u16(vcgtq_u16(vreinterpretq_u16_s16(abs_row2), + vdupq_n_u16(0))); + uint8x8_t abs_row3_gt0 = vmovn_u16(vcgtq_u16(vreinterpretq_u16_s16(abs_row3), + vdupq_n_u16(0))); + uint8x8_t abs_row4_gt0 = vmovn_u16(vcgtq_u16(vreinterpretq_u16_s16(abs_row4), + vdupq_n_u16(0))); + uint8x8_t abs_row5_gt0 = vmovn_u16(vcgtq_u16(vreinterpretq_u16_s16(abs_row5), + vdupq_n_u16(0))); + uint8x8_t abs_row6_gt0 = vmovn_u16(vcgtq_u16(vreinterpretq_u16_s16(abs_row6), + vdupq_n_u16(0))); + uint8x8_t abs_row7_gt0 = vmovn_u16(vcgtq_u16(vreinterpretq_u16_s16(abs_row7), + vdupq_n_u16(0))); + + /* { 0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01 } */ + const uint8x8_t bitmap_mask = + vreinterpret_u8_u64(vmov_n_u64(0x0102040810204080)); + + abs_row0_gt0 = vand_u8(abs_row0_gt0, bitmap_mask); + abs_row1_gt0 = vand_u8(abs_row1_gt0, bitmap_mask); + abs_row2_gt0 = vand_u8(abs_row2_gt0, bitmap_mask); + abs_row3_gt0 = vand_u8(abs_row3_gt0, bitmap_mask); + abs_row4_gt0 = vand_u8(abs_row4_gt0, bitmap_mask); + abs_row5_gt0 = vand_u8(abs_row5_gt0, bitmap_mask); + abs_row6_gt0 = vand_u8(abs_row6_gt0, bitmap_mask); + abs_row7_gt0 = vand_u8(abs_row7_gt0, bitmap_mask); + + uint8x8_t bitmap_rows_10 = vpadd_u8(abs_row1_gt0, abs_row0_gt0); + uint8x8_t bitmap_rows_32 = vpadd_u8(abs_row3_gt0, abs_row2_gt0); + uint8x8_t bitmap_rows_54 = vpadd_u8(abs_row5_gt0, abs_row4_gt0); + uint8x8_t bitmap_rows_76 = vpadd_u8(abs_row7_gt0, abs_row6_gt0); + uint8x8_t bitmap_rows_3210 = vpadd_u8(bitmap_rows_32, bitmap_rows_10); + uint8x8_t bitmap_rows_7654 = vpadd_u8(bitmap_rows_76, bitmap_rows_54); + uint8x8_t bitmap_all = vpadd_u8(bitmap_rows_7654, bitmap_rows_3210); + + /* Shift left to remove DC bit. 
*/ + bitmap_all = + vreinterpret_u8_u64(vshl_n_u64(vreinterpret_u64_u8(bitmap_all), 1)); + /* Count bits set (number of non-zero coefficients) in bitmap. */ + unsigned int non_zero_coefficients = vaddv_u8(vcnt_u8(bitmap_all)); + /* Move bitmap to 64-bit scalar register. */ + uint64_t bitmap = vget_lane_u64(vreinterpret_u64_u8(bitmap_all), 0); + + /* Set up state and bit buffer for output bitstream. */ + working_state *state_ptr = (working_state *)state; + int free_bits = state_ptr->cur.free_bits; + size_t put_buffer = state_ptr->cur.put_buffer; + + /* Encode DC coefficient. */ + + /* Find nbits required to specify sign and amplitude of coefficient. */ +#if defined(_MSC_VER) && !defined(__clang__) + unsigned int lz = BUILTIN_CLZ(vgetq_lane_s16(abs_row0, 0)); +#else + unsigned int lz; + __asm__("clz %w0, %w1" : "=r"(lz) : "r"(vgetq_lane_s16(abs_row0, 0))); +#endif + unsigned int nbits = 32 - lz; + /* Emit Huffman-coded symbol and additional diff bits. */ + unsigned int diff = (unsigned int)(vgetq_lane_u16(row0_diff, 0) << lz) >> lz; + PUT_CODE(dctbl->ehufco[nbits], dctbl->ehufsi[nbits], diff) + + /* Encode AC coefficients. */ + + unsigned int r = 0; /* r = run length of zeros */ + unsigned int i = 1; /* i = number of coefficients encoded */ + /* Code and size information for a run length of 16 zero coefficients */ + const unsigned int code_0xf0 = actbl->ehufco[0xf0]; + const unsigned int size_0xf0 = actbl->ehufsi[0xf0]; + + /* The most efficient method of computing nbits and diff depends on the + * number of non-zero coefficients. If the bitmap is not too sparse (> 8 + * non-zero AC coefficients), it is beneficial to use Neon; else we compute + * nbits and diff on demand using scalar code. 
+ */ + if (non_zero_coefficients > 8) { + uint8_t block_nbits[DCTSIZE2]; + + int16x8_t row0_lz = vclzq_s16(abs_row0); + int16x8_t row1_lz = vclzq_s16(abs_row1); + int16x8_t row2_lz = vclzq_s16(abs_row2); + int16x8_t row3_lz = vclzq_s16(abs_row3); + int16x8_t row4_lz = vclzq_s16(abs_row4); + int16x8_t row5_lz = vclzq_s16(abs_row5); + int16x8_t row6_lz = vclzq_s16(abs_row6); + int16x8_t row7_lz = vclzq_s16(abs_row7); + /* Compute nbits needed to specify magnitude of each coefficient. */ + uint8x8_t row0_nbits = vsub_u8(vdup_n_u8(16), + vmovn_u16(vreinterpretq_u16_s16(row0_lz))); + uint8x8_t row1_nbits = vsub_u8(vdup_n_u8(16), + vmovn_u16(vreinterpretq_u16_s16(row1_lz))); + uint8x8_t row2_nbits = vsub_u8(vdup_n_u8(16), + vmovn_u16(vreinterpretq_u16_s16(row2_lz))); + uint8x8_t row3_nbits = vsub_u8(vdup_n_u8(16), + vmovn_u16(vreinterpretq_u16_s16(row3_lz))); + uint8x8_t row4_nbits = vsub_u8(vdup_n_u8(16), + vmovn_u16(vreinterpretq_u16_s16(row4_lz))); + uint8x8_t row5_nbits = vsub_u8(vdup_n_u8(16), + vmovn_u16(vreinterpretq_u16_s16(row5_lz))); + uint8x8_t row6_nbits = vsub_u8(vdup_n_u8(16), + vmovn_u16(vreinterpretq_u16_s16(row6_lz))); + uint8x8_t row7_nbits = vsub_u8(vdup_n_u8(16), + vmovn_u16(vreinterpretq_u16_s16(row7_lz))); + /* Store nbits. */ + vst1_u8(block_nbits + 0 * DCTSIZE, row0_nbits); + vst1_u8(block_nbits + 1 * DCTSIZE, row1_nbits); + vst1_u8(block_nbits + 2 * DCTSIZE, row2_nbits); + vst1_u8(block_nbits + 3 * DCTSIZE, row3_nbits); + vst1_u8(block_nbits + 4 * DCTSIZE, row4_nbits); + vst1_u8(block_nbits + 5 * DCTSIZE, row5_nbits); + vst1_u8(block_nbits + 6 * DCTSIZE, row6_nbits); + vst1_u8(block_nbits + 7 * DCTSIZE, row7_nbits); + /* Mask bits not required to specify sign and amplitude of diff. 
*/ + row0_diff = vshlq_u16(row0_diff, row0_lz); + row1_diff = vshlq_u16(row1_diff, row1_lz); + row2_diff = vshlq_u16(row2_diff, row2_lz); + row3_diff = vshlq_u16(row3_diff, row3_lz); + row4_diff = vshlq_u16(row4_diff, row4_lz); + row5_diff = vshlq_u16(row5_diff, row5_lz); + row6_diff = vshlq_u16(row6_diff, row6_lz); + row7_diff = vshlq_u16(row7_diff, row7_lz); + row0_diff = vshlq_u16(row0_diff, vnegq_s16(row0_lz)); + row1_diff = vshlq_u16(row1_diff, vnegq_s16(row1_lz)); + row2_diff = vshlq_u16(row2_diff, vnegq_s16(row2_lz)); + row3_diff = vshlq_u16(row3_diff, vnegq_s16(row3_lz)); + row4_diff = vshlq_u16(row4_diff, vnegq_s16(row4_lz)); + row5_diff = vshlq_u16(row5_diff, vnegq_s16(row5_lz)); + row6_diff = vshlq_u16(row6_diff, vnegq_s16(row6_lz)); + row7_diff = vshlq_u16(row7_diff, vnegq_s16(row7_lz)); + /* Store diff bits. */ + vst1q_u16(block_diff + 0 * DCTSIZE, row0_diff); + vst1q_u16(block_diff + 1 * DCTSIZE, row1_diff); + vst1q_u16(block_diff + 2 * DCTSIZE, row2_diff); + vst1q_u16(block_diff + 3 * DCTSIZE, row3_diff); + vst1q_u16(block_diff + 4 * DCTSIZE, row4_diff); + vst1q_u16(block_diff + 5 * DCTSIZE, row5_diff); + vst1q_u16(block_diff + 6 * DCTSIZE, row6_diff); + vst1q_u16(block_diff + 7 * DCTSIZE, row7_diff); + + while (bitmap != 0) { + r = BUILTIN_CLZLL(bitmap); + i += r; + bitmap <<= r; + nbits = block_nbits[i]; + diff = block_diff[i]; + while (r > 15) { + /* If run length > 15, emit special run-length-16 codes. */ + PUT_BITS(code_0xf0, size_0xf0) + r -= 16; + } + /* Emit Huffman symbol for run length / number of bits. (F.1.2.2.1) */ + unsigned int rs = (r << 4) + nbits; + PUT_CODE(actbl->ehufco[rs], actbl->ehufsi[rs], diff) + i++; + bitmap <<= 1; + } + } else if (bitmap != 0) { + uint16_t block_abs[DCTSIZE2]; + /* Store absolute value of coefficients. 
*/ + vst1q_u16(block_abs + 0 * DCTSIZE, vreinterpretq_u16_s16(abs_row0)); + vst1q_u16(block_abs + 1 * DCTSIZE, vreinterpretq_u16_s16(abs_row1)); + vst1q_u16(block_abs + 2 * DCTSIZE, vreinterpretq_u16_s16(abs_row2)); + vst1q_u16(block_abs + 3 * DCTSIZE, vreinterpretq_u16_s16(abs_row3)); + vst1q_u16(block_abs + 4 * DCTSIZE, vreinterpretq_u16_s16(abs_row4)); + vst1q_u16(block_abs + 5 * DCTSIZE, vreinterpretq_u16_s16(abs_row5)); + vst1q_u16(block_abs + 6 * DCTSIZE, vreinterpretq_u16_s16(abs_row6)); + vst1q_u16(block_abs + 7 * DCTSIZE, vreinterpretq_u16_s16(abs_row7)); + /* Store diff bits. */ + vst1q_u16(block_diff + 0 * DCTSIZE, row0_diff); + vst1q_u16(block_diff + 1 * DCTSIZE, row1_diff); + vst1q_u16(block_diff + 2 * DCTSIZE, row2_diff); + vst1q_u16(block_diff + 3 * DCTSIZE, row3_diff); + vst1q_u16(block_diff + 4 * DCTSIZE, row4_diff); + vst1q_u16(block_diff + 5 * DCTSIZE, row5_diff); + vst1q_u16(block_diff + 6 * DCTSIZE, row6_diff); + vst1q_u16(block_diff + 7 * DCTSIZE, row7_diff); + + /* Same as above but must mask diff bits and compute nbits on demand. */ + while (bitmap != 0) { + r = BUILTIN_CLZLL(bitmap); + i += r; + bitmap <<= r; + lz = BUILTIN_CLZ(block_abs[i]); + nbits = 32 - lz; + diff = (unsigned int)(block_diff[i] << lz) >> lz; + while (r > 15) { + /* If run length > 15, emit special run-length-16 codes. */ + PUT_BITS(code_0xf0, size_0xf0) + r -= 16; + } + /* Emit Huffman symbol for run length / number of bits. (F.1.2.2.1) */ + unsigned int rs = (r << 4) + nbits; + PUT_CODE(actbl->ehufco[rs], actbl->ehufsi[rs], diff) + i++; + bitmap <<= 1; + } + } + + /* If the last coefficient(s) were zero, emit an end-of-block (EOB) code. + * The value of RS for the EOB code is 0. 
+ */ + if (i != 64) { + PUT_BITS(actbl->ehufco[0], actbl->ehufsi[0]) + } + + state_ptr->cur.put_buffer = put_buffer; + state_ptr->cur.free_bits = free_bits; + + return buffer; +} diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/aarch64/jsimd.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/aarch64/jsimd.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/aarch64/jsimd.c 1970-01-01 01:00:00.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/aarch64/jsimd.c 2021-11-20 03:41:33.398600450 +0000 @@ -0,0 +1,1063 @@ +/* + * jsimd_arm64.c + * + * Copyright 2009 Pierre Ossman for Cendio AB + * Copyright (C) 2011, Nokia Corporation and/or its subsidiary(-ies). + * Copyright (C) 2009-2011, 2013-2014, 2016, 2018, 2020, D. R. Commander. + * Copyright (C) 2015-2016, 2018, Matthieu Darbois. + * Copyright (C) 2020, Arm Limited. + * + * Based on the x86 SIMD extension for IJG JPEG library, + * Copyright (C) 1999-2006, MIYASAKA Masaru. + * For conditions of distribution and use, see copyright notice in jsimdext.inc + * + * This file contains the interface between the "normal" portions + * of the library and the SIMD implementations when running on a + * 64-bit Arm architecture. 
+ */ + +#define JPEG_INTERNALS +#include "../../../jinclude.h" +#include "../../../jpeglib.h" +#include "../../../jsimd.h" +#include "../../../jdct.h" +#include "../../../jsimddct.h" +#include "../../jsimd.h" +#include "jconfigint.h" + +#include <stdio.h> +#include <string.h> +#include <ctype.h> + +#define JSIMD_FASTLD3 1 +#define JSIMD_FASTST3 2 +#define JSIMD_FASTTBL 4 + +static unsigned int simd_support = ~0; +static unsigned int simd_huffman = 1; +static unsigned int simd_features = JSIMD_FASTLD3 | JSIMD_FASTST3 | + JSIMD_FASTTBL; + +#if defined(__linux__) || defined(ANDROID) || defined(__ANDROID__) + +#define SOMEWHAT_SANE_PROC_CPUINFO_SIZE_LIMIT (1024 * 1024) + +LOCAL(int) +check_cpuinfo(char *buffer, const char *field, char *value) +{ + char *p; + + if (*value == 0) + return 0; + if (strncmp(buffer, field, strlen(field)) != 0) + return 0; + buffer += strlen(field); + while (isspace(*buffer)) + buffer++; + + /* Check if 'value' is present in the buffer as a separate word */ + while ((p = strstr(buffer, value))) { + if (p > buffer && !isspace(*(p - 1))) { + buffer++; + continue; + } + p += strlen(value); + if (*p != 0 && !isspace(*p)) { + buffer++; + continue; + } + return 1; + } + return 0; +} + +LOCAL(int) +parse_proc_cpuinfo(int bufsize) +{ + char *buffer = (char *)malloc(bufsize); + FILE *fd; + + if (!buffer) + return 0; + + fd = fopen("/proc/cpuinfo", "r"); + if (fd) { + while (fgets(buffer, bufsize, fd)) { + if (!strchr(buffer, '\n') && !feof(fd)) { + /* "impossible" happened - insufficient size of the buffer! */ + fclose(fd); + free(buffer); + return 0; + } + if (check_cpuinfo(buffer, "CPU part", "0xd03") || + check_cpuinfo(buffer, "CPU part", "0xd07")) + /* The Cortex-A53 has a slow tbl implementation. We can gain a few + percent speedup by disabling the use of that instruction. The + speedup on Cortex-A57 is more subtle but still measurable.
*/ + simd_features &= ~JSIMD_FASTTBL; + else if (check_cpuinfo(buffer, "CPU part", "0x0a1")) + /* The SIMD version of Huffman encoding is slower than the C version on + Cavium ThunderX. Also, ld3 and st3 are abysmally slow on that + CPU. */ + simd_huffman = simd_features = 0; + } + fclose(fd); + } + free(buffer); + return 1; +} + +#endif + +/* + * Check what SIMD accelerations are supported. + * + * FIXME: This code is racy under a multi-threaded environment. + */ + +/* + * Armv8 architectures support Neon extensions by default. + * It is no longer optional as it was with Armv7. + */ + + +LOCAL(void) +init_simd(void) +{ +#ifndef NO_GETENV + char *env = NULL; +#endif +#if defined(__linux__) || defined(ANDROID) || defined(__ANDROID__) + int bufsize = 1024; /* an initial guess for the line buffer size limit */ +#endif + + if (simd_support != ~0U) + return; + + simd_support = 0; + + simd_support |= JSIMD_NEON; +#if defined(__linux__) || defined(ANDROID) || defined(__ANDROID__) + while (!parse_proc_cpuinfo(bufsize)) { + bufsize *= 2; + if (bufsize > SOMEWHAT_SANE_PROC_CPUINFO_SIZE_LIMIT) + break; + } +#endif + +#ifndef NO_GETENV + /* Force different settings through environment variables */ + env = getenv("JSIMD_FORCENEON"); + if ((env != NULL) && (strcmp(env, "1") == 0)) + simd_support = JSIMD_NEON; + env = getenv("JSIMD_FORCENONE"); + if ((env != NULL) && (strcmp(env, "1") == 0)) + simd_support = 0; + env = getenv("JSIMD_NOHUFFENC"); + if ((env != NULL) && (strcmp(env, "1") == 0)) + simd_huffman = 0; + env = getenv("JSIMD_FASTLD3"); + if ((env != NULL) && (strcmp(env, "1") == 0)) + simd_features |= JSIMD_FASTLD3; + if ((env != NULL) && (strcmp(env, "0") == 0)) + simd_features &= ~JSIMD_FASTLD3; + env = getenv("JSIMD_FASTST3"); + if ((env != NULL) && (strcmp(env, "1") == 0)) + simd_features |= JSIMD_FASTST3; + if ((env != NULL) && (strcmp(env, "0") == 0)) + simd_features &= ~JSIMD_FASTST3; +#endif +} + +GLOBAL(int) +jsimd_can_rgb_ycc(void) +{ + init_simd(); + + /* The 
code is optimised for these values only */ + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + if ((RGB_PIXELSIZE != 3) && (RGB_PIXELSIZE != 4)) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_rgb_gray(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + if ((RGB_PIXELSIZE != 3) && (RGB_PIXELSIZE != 4)) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_ycc_rgb(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + if ((RGB_PIXELSIZE != 3) && (RGB_PIXELSIZE != 4)) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_ycc_rgb565(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(void) +jsimd_rgb_ycc_convert(j_compress_ptr cinfo, JSAMPARRAY input_buf, + JSAMPIMAGE output_buf, JDIMENSION output_row, + int num_rows) +{ + void (*neonfct) (JDIMENSION, JSAMPARRAY, JSAMPIMAGE, JDIMENSION, int); + + switch (cinfo->in_color_space) { + case JCS_EXT_RGB: +#ifndef NEON_INTRINSICS + if (simd_features & JSIMD_FASTLD3) +#endif + neonfct = jsimd_extrgb_ycc_convert_neon; +#ifndef NEON_INTRINSICS + else + neonfct = jsimd_extrgb_ycc_convert_neon_slowld3; +#endif + break; + case JCS_EXT_RGBX: + case JCS_EXT_RGBA: + neonfct = jsimd_extrgbx_ycc_convert_neon; + break; + case JCS_EXT_BGR: +#ifndef NEON_INTRINSICS + if (simd_features & JSIMD_FASTLD3) +#endif + neonfct = jsimd_extbgr_ycc_convert_neon; +#ifndef NEON_INTRINSICS + else + neonfct = jsimd_extbgr_ycc_convert_neon_slowld3; +#endif + break; + case 
JCS_EXT_BGRX: + case JCS_EXT_BGRA: + neonfct = jsimd_extbgrx_ycc_convert_neon; + break; + case JCS_EXT_XBGR: + case JCS_EXT_ABGR: + neonfct = jsimd_extxbgr_ycc_convert_neon; + break; + case JCS_EXT_XRGB: + case JCS_EXT_ARGB: + neonfct = jsimd_extxrgb_ycc_convert_neon; + break; + default: +#ifndef NEON_INTRINSICS + if (simd_features & JSIMD_FASTLD3) +#endif + neonfct = jsimd_extrgb_ycc_convert_neon; +#ifndef NEON_INTRINSICS + else + neonfct = jsimd_extrgb_ycc_convert_neon_slowld3; +#endif + break; + } + + neonfct(cinfo->image_width, input_buf, output_buf, output_row, num_rows); +} + +GLOBAL(void) +jsimd_rgb_gray_convert(j_compress_ptr cinfo, JSAMPARRAY input_buf, + JSAMPIMAGE output_buf, JDIMENSION output_row, + int num_rows) +{ + void (*neonfct) (JDIMENSION, JSAMPARRAY, JSAMPIMAGE, JDIMENSION, int); + + switch (cinfo->in_color_space) { + case JCS_EXT_RGB: + neonfct = jsimd_extrgb_gray_convert_neon; + break; + case JCS_EXT_RGBX: + case JCS_EXT_RGBA: + neonfct = jsimd_extrgbx_gray_convert_neon; + break; + case JCS_EXT_BGR: + neonfct = jsimd_extbgr_gray_convert_neon; + break; + case JCS_EXT_BGRX: + case JCS_EXT_BGRA: + neonfct = jsimd_extbgrx_gray_convert_neon; + break; + case JCS_EXT_XBGR: + case JCS_EXT_ABGR: + neonfct = jsimd_extxbgr_gray_convert_neon; + break; + case JCS_EXT_XRGB: + case JCS_EXT_ARGB: + neonfct = jsimd_extxrgb_gray_convert_neon; + break; + default: + neonfct = jsimd_extrgb_gray_convert_neon; + break; + } + + neonfct(cinfo->image_width, input_buf, output_buf, output_row, num_rows); +} + +GLOBAL(void) +jsimd_ycc_rgb_convert(j_decompress_ptr cinfo, JSAMPIMAGE input_buf, + JDIMENSION input_row, JSAMPARRAY output_buf, + int num_rows) +{ + void (*neonfct) (JDIMENSION, JSAMPIMAGE, JDIMENSION, JSAMPARRAY, int); + + switch (cinfo->out_color_space) { + case JCS_EXT_RGB: +#ifndef NEON_INTRINSICS + if (simd_features & JSIMD_FASTST3) +#endif + neonfct = jsimd_ycc_extrgb_convert_neon; +#ifndef NEON_INTRINSICS + else + neonfct = 
jsimd_ycc_extrgb_convert_neon_slowst3; +#endif + break; + case JCS_EXT_RGBX: + case JCS_EXT_RGBA: + neonfct = jsimd_ycc_extrgbx_convert_neon; + break; + case JCS_EXT_BGR: +#ifndef NEON_INTRINSICS + if (simd_features & JSIMD_FASTST3) +#endif + neonfct = jsimd_ycc_extbgr_convert_neon; +#ifndef NEON_INTRINSICS + else + neonfct = jsimd_ycc_extbgr_convert_neon_slowst3; +#endif + break; + case JCS_EXT_BGRX: + case JCS_EXT_BGRA: + neonfct = jsimd_ycc_extbgrx_convert_neon; + break; + case JCS_EXT_XBGR: + case JCS_EXT_ABGR: + neonfct = jsimd_ycc_extxbgr_convert_neon; + break; + case JCS_EXT_XRGB: + case JCS_EXT_ARGB: + neonfct = jsimd_ycc_extxrgb_convert_neon; + break; + default: +#ifndef NEON_INTRINSICS + if (simd_features & JSIMD_FASTST3) +#endif + neonfct = jsimd_ycc_extrgb_convert_neon; +#ifndef NEON_INTRINSICS + else + neonfct = jsimd_ycc_extrgb_convert_neon_slowst3; +#endif + break; + } + + neonfct(cinfo->output_width, input_buf, input_row, output_buf, num_rows); +} + +GLOBAL(void) +jsimd_ycc_rgb565_convert(j_decompress_ptr cinfo, JSAMPIMAGE input_buf, + JDIMENSION input_row, JSAMPARRAY output_buf, + int num_rows) +{ + jsimd_ycc_rgb565_convert_neon(cinfo->output_width, input_buf, input_row, + output_buf, num_rows); +} + +GLOBAL(int) +jsimd_can_h2v2_downsample(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (BITS_IN_JSAMPLE != 8) + return 0; + if (DCTSIZE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_h2v1_downsample(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (BITS_IN_JSAMPLE != 8) + return 0; + if (DCTSIZE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(void) +jsimd_h2v2_downsample(j_compress_ptr cinfo, jpeg_component_info *compptr, + JSAMPARRAY input_data, JSAMPARRAY output_data) +{ + 
jsimd_h2v2_downsample_neon(cinfo->image_width, cinfo->max_v_samp_factor, + compptr->v_samp_factor, compptr->width_in_blocks, + input_data, output_data); +} + +GLOBAL(void) +jsimd_h2v1_downsample(j_compress_ptr cinfo, jpeg_component_info *compptr, + JSAMPARRAY input_data, JSAMPARRAY output_data) +{ + jsimd_h2v1_downsample_neon(cinfo->image_width, cinfo->max_v_samp_factor, + compptr->v_samp_factor, compptr->width_in_blocks, + input_data, output_data); +} + +GLOBAL(int) +jsimd_can_h2v2_upsample(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_h2v1_upsample(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(void) +jsimd_h2v2_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr, + JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr) +{ + jsimd_h2v2_upsample_neon(cinfo->max_v_samp_factor, cinfo->output_width, + input_data, output_data_ptr); +} + +GLOBAL(void) +jsimd_h2v1_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr, + JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr) +{ + jsimd_h2v1_upsample_neon(cinfo->max_v_samp_factor, cinfo->output_width, + input_data, output_data_ptr); +} + +GLOBAL(int) +jsimd_can_h2v2_fancy_upsample(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_h2v1_fancy_upsample(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + + if (simd_support & 
JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_h1v2_fancy_upsample(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(void) +jsimd_h2v2_fancy_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr, + JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr) +{ + jsimd_h2v2_fancy_upsample_neon(cinfo->max_v_samp_factor, + compptr->downsampled_width, input_data, + output_data_ptr); +} + +GLOBAL(void) +jsimd_h2v1_fancy_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr, + JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr) +{ + jsimd_h2v1_fancy_upsample_neon(cinfo->max_v_samp_factor, + compptr->downsampled_width, input_data, + output_data_ptr); +} + +GLOBAL(void) +jsimd_h1v2_fancy_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr, + JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr) +{ + jsimd_h1v2_fancy_upsample_neon(cinfo->max_v_samp_factor, + compptr->downsampled_width, input_data, + output_data_ptr); +} + +GLOBAL(int) +jsimd_can_h2v2_merged_upsample(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_h2v1_merged_upsample(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(void) +jsimd_h2v2_merged_upsample(j_decompress_ptr cinfo, JSAMPIMAGE input_buf, + JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf) +{ + void (*neonfct) (JDIMENSION, JSAMPIMAGE, JDIMENSION, JSAMPARRAY); + + switch (cinfo->out_color_space) { + case JCS_EXT_RGB: + neonfct = 
jsimd_h2v2_extrgb_merged_upsample_neon; + break; + case JCS_EXT_RGBX: + case JCS_EXT_RGBA: + neonfct = jsimd_h2v2_extrgbx_merged_upsample_neon; + break; + case JCS_EXT_BGR: + neonfct = jsimd_h2v2_extbgr_merged_upsample_neon; + break; + case JCS_EXT_BGRX: + case JCS_EXT_BGRA: + neonfct = jsimd_h2v2_extbgrx_merged_upsample_neon; + break; + case JCS_EXT_XBGR: + case JCS_EXT_ABGR: + neonfct = jsimd_h2v2_extxbgr_merged_upsample_neon; + break; + case JCS_EXT_XRGB: + case JCS_EXT_ARGB: + neonfct = jsimd_h2v2_extxrgb_merged_upsample_neon; + break; + default: + neonfct = jsimd_h2v2_extrgb_merged_upsample_neon; + break; + } + + neonfct(cinfo->output_width, input_buf, in_row_group_ctr, output_buf); +} + +GLOBAL(void) +jsimd_h2v1_merged_upsample(j_decompress_ptr cinfo, JSAMPIMAGE input_buf, + JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf) +{ + void (*neonfct) (JDIMENSION, JSAMPIMAGE, JDIMENSION, JSAMPARRAY); + + switch (cinfo->out_color_space) { + case JCS_EXT_RGB: + neonfct = jsimd_h2v1_extrgb_merged_upsample_neon; + break; + case JCS_EXT_RGBX: + case JCS_EXT_RGBA: + neonfct = jsimd_h2v1_extrgbx_merged_upsample_neon; + break; + case JCS_EXT_BGR: + neonfct = jsimd_h2v1_extbgr_merged_upsample_neon; + break; + case JCS_EXT_BGRX: + case JCS_EXT_BGRA: + neonfct = jsimd_h2v1_extbgrx_merged_upsample_neon; + break; + case JCS_EXT_XBGR: + case JCS_EXT_ABGR: + neonfct = jsimd_h2v1_extxbgr_merged_upsample_neon; + break; + case JCS_EXT_XRGB: + case JCS_EXT_ARGB: + neonfct = jsimd_h2v1_extxrgb_merged_upsample_neon; + break; + default: + neonfct = jsimd_h2v1_extrgb_merged_upsample_neon; + break; + } + + neonfct(cinfo->output_width, input_buf, in_row_group_ctr, output_buf); +} + +GLOBAL(int) +jsimd_can_convsamp(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (DCTSIZE != 8) + return 0; + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + if (sizeof(DCTELEM) != 2) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; 
+ + return 0; +} + +GLOBAL(int) +jsimd_can_convsamp_float(void) +{ + return 0; +} + +GLOBAL(void) +jsimd_convsamp(JSAMPARRAY sample_data, JDIMENSION start_col, + DCTELEM *workspace) +{ + jsimd_convsamp_neon(sample_data, start_col, workspace); +} + +GLOBAL(void) +jsimd_convsamp_float(JSAMPARRAY sample_data, JDIMENSION start_col, + FAST_FLOAT *workspace) +{ +} + +GLOBAL(int) +jsimd_can_fdct_islow(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (DCTSIZE != 8) + return 0; + if (sizeof(DCTELEM) != 2) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_fdct_ifast(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (DCTSIZE != 8) + return 0; + if (sizeof(DCTELEM) != 2) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_fdct_float(void) +{ + return 0; +} + +GLOBAL(void) +jsimd_fdct_islow(DCTELEM *data) +{ + jsimd_fdct_islow_neon(data); +} + +GLOBAL(void) +jsimd_fdct_ifast(DCTELEM *data) +{ + jsimd_fdct_ifast_neon(data); +} + +GLOBAL(void) +jsimd_fdct_float(FAST_FLOAT *data) +{ +} + +GLOBAL(int) +jsimd_can_quantize(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (DCTSIZE != 8) + return 0; + if (sizeof(JCOEF) != 2) + return 0; + if (sizeof(DCTELEM) != 2) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_quantize_float(void) +{ + return 0; +} + +GLOBAL(void) +jsimd_quantize(JCOEFPTR coef_block, DCTELEM *divisors, DCTELEM *workspace) +{ + jsimd_quantize_neon(coef_block, divisors, workspace); +} + +GLOBAL(void) +jsimd_quantize_float(JCOEFPTR coef_block, FAST_FLOAT *divisors, + FAST_FLOAT *workspace) +{ +} + +GLOBAL(int) +jsimd_can_idct_2x2(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (DCTSIZE != 8) + return 0; + if (sizeof(JCOEF) != 2) + return 0; + if (BITS_IN_JSAMPLE != 8) + return 0; + if 
(sizeof(JDIMENSION) != 4) + return 0; + if (sizeof(ISLOW_MULT_TYPE) != 2) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_idct_4x4(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (DCTSIZE != 8) + return 0; + if (sizeof(JCOEF) != 2) + return 0; + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + if (sizeof(ISLOW_MULT_TYPE) != 2) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(void) +jsimd_idct_2x2(j_decompress_ptr cinfo, jpeg_component_info *compptr, + JCOEFPTR coef_block, JSAMPARRAY output_buf, + JDIMENSION output_col) +{ + jsimd_idct_2x2_neon(compptr->dct_table, coef_block, output_buf, output_col); +} + +GLOBAL(void) +jsimd_idct_4x4(j_decompress_ptr cinfo, jpeg_component_info *compptr, + JCOEFPTR coef_block, JSAMPARRAY output_buf, + JDIMENSION output_col) +{ + jsimd_idct_4x4_neon(compptr->dct_table, coef_block, output_buf, output_col); +} + +GLOBAL(int) +jsimd_can_idct_islow(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (DCTSIZE != 8) + return 0; + if (sizeof(JCOEF) != 2) + return 0; + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + if (sizeof(ISLOW_MULT_TYPE) != 2) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_idct_ifast(void) +{ + init_simd(); + + /* The code is optimised for these values only */ + if (DCTSIZE != 8) + return 0; + if (sizeof(JCOEF) != 2) + return 0; + if (BITS_IN_JSAMPLE != 8) + return 0; + if (sizeof(JDIMENSION) != 4) + return 0; + if (sizeof(IFAST_MULT_TYPE) != 2) + return 0; + if (IFAST_SCALE_BITS != 2) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(int) +jsimd_can_idct_float(void) +{ + return 0; +} + +GLOBAL(void) +jsimd_idct_islow(j_decompress_ptr cinfo, jpeg_component_info *compptr, + JCOEFPTR coef_block, JSAMPARRAY 
output_buf, + JDIMENSION output_col) +{ + jsimd_idct_islow_neon(compptr->dct_table, coef_block, output_buf, + output_col); +} + +GLOBAL(void) +jsimd_idct_ifast(j_decompress_ptr cinfo, jpeg_component_info *compptr, + JCOEFPTR coef_block, JSAMPARRAY output_buf, + JDIMENSION output_col) +{ + jsimd_idct_ifast_neon(compptr->dct_table, coef_block, output_buf, + output_col); +} + +GLOBAL(void) +jsimd_idct_float(j_decompress_ptr cinfo, jpeg_component_info *compptr, + JCOEFPTR coef_block, JSAMPARRAY output_buf, + JDIMENSION output_col) +{ +} + +GLOBAL(int) +jsimd_can_huff_encode_one_block(void) +{ + init_simd(); + + if (DCTSIZE != 8) + return 0; + if (sizeof(JCOEF) != 2) + return 0; + + if (simd_support & JSIMD_NEON && simd_huffman) + return 1; + + return 0; +} + +GLOBAL(JOCTET *) +jsimd_huff_encode_one_block(void *state, JOCTET *buffer, JCOEFPTR block, + int last_dc_val, c_derived_tbl *dctbl, + c_derived_tbl *actbl) +{ +#ifndef NEON_INTRINSICS + if (simd_features & JSIMD_FASTTBL) +#endif + return jsimd_huff_encode_one_block_neon(state, buffer, block, last_dc_val, + dctbl, actbl); +#ifndef NEON_INTRINSICS + else + return jsimd_huff_encode_one_block_neon_slowtbl(state, buffer, block, + last_dc_val, dctbl, actbl); +#endif +} + +GLOBAL(int) +jsimd_can_encode_mcu_AC_first_prepare(void) +{ + init_simd(); + + if (DCTSIZE != 8) + return 0; + if (sizeof(JCOEF) != 2) + return 0; + if (SIZEOF_SIZE_T != 8) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + return 0; +} + +GLOBAL(void) +jsimd_encode_mcu_AC_first_prepare(const JCOEF *block, + const int *jpeg_natural_order_start, int Sl, + int Al, JCOEF *values, size_t *zerobits) +{ + jsimd_encode_mcu_AC_first_prepare_neon(block, jpeg_natural_order_start, + Sl, Al, values, zerobits); +} + +GLOBAL(int) +jsimd_can_encode_mcu_AC_refine_prepare(void) +{ + init_simd(); + + if (DCTSIZE != 8) + return 0; + if (sizeof(JCOEF) != 2) + return 0; + if (SIZEOF_SIZE_T != 8) + return 0; + + if (simd_support & JSIMD_NEON) + return 1; + + 
return 0; +} + +GLOBAL(int) +jsimd_encode_mcu_AC_refine_prepare(const JCOEF *block, + const int *jpeg_natural_order_start, int Sl, + int Al, JCOEF *absvalues, size_t *bits) +{ + return jsimd_encode_mcu_AC_refine_prepare_neon(block, + jpeg_natural_order_start, + Sl, Al, absvalues, bits); +} diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/align.h b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/align.h --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/align.h 1970-01-01 01:00:00.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/align.h 2021-11-20 03:41:33.398600450 +0000 @@ -0,0 +1,28 @@ +/* + * Copyright (C) 2020, Arm Limited. All Rights Reserved. + * + * This software is provided 'as-is', without any express or implied + * warranty. In no event will the authors be held liable for any damages + * arising from the use of this software. + * + * Permission is granted to anyone to use this software for any purpose, + * including commercial applications, and to alter it and redistribute it + * freely, subject to the following restrictions: + * + * 1. The origin of this software must not be misrepresented; you must not + * claim that you wrote the original software. If you use this software + * in a product, an acknowledgment in the product documentation would be + * appreciated but is not required. + * 2. Altered source versions must be plainly marked as such, and must not be + * misrepresented as being the original software. + * 3. This notice may not be removed or altered from any source distribution. 
+ */ + +/* How to obtain memory alignment for structures and variables */ +#if defined(_MSC_VER) +#define ALIGN(alignment) __declspec(align(alignment)) +#elif defined(__clang__) || defined(__GNUC__) +#define ALIGN(alignment) __attribute__((aligned(alignment))) +#else +#error "Unknown compiler" +#endif diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jccolor-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jccolor-neon.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jccolor-neon.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jccolor-neon.c 1970-01-01 01:00:00.000000000 +0100 @@ -1,158 +0,0 @@ -/* - * jccolor-neon.c - colorspace conversion (Arm Neon) - * - * Copyright 2020 The Chromium Authors. All Rights Reserved. - * - * This software is provided 'as-is', without any express or implied - * warranty. In no event will the authors be held liable for any damages - * arising from the use of this software. - * - * Permission is granted to anyone to use this software for any purpose, - * including commercial applications, and to alter it and redistribute it - * freely, subject to the following restrictions: - * - * 1. The origin of this software must not be misrepresented; you must not - * claim that you wrote the original software. If you use this software - * in a product, an acknowledgment in the product documentation would be - * appreciated but is not required. - * 2. Altered source versions must be plainly marked as such, and must not be - * misrepresented as being the original software. - * 3. This notice may not be removed or altered from any source distribution. 
- */ - -#define JPEG_INTERNALS -#include "../../../jconfigint.h" -#include "../../../jinclude.h" -#include "../../../jpeglib.h" -#include "../../../jsimd.h" -#include "../../../jdct.h" -#include "../../../jsimddct.h" -#include "../../jsimd.h" - -#include <arm_neon.h> - -/* RGB -> YCbCr conversion constants. */ - -#define F_0_298 19595 -#define F_0_587 38470 -#define F_0_113 7471 -#define F_0_168 11059 -#define F_0_331 21709 -#define F_0_500 32768 -#define F_0_418 27439 -#define F_0_081 5329 - -ALIGN(16) static const uint16_t jsimd_rgb_ycc_neon_consts[] = { - F_0_298, F_0_587, - F_0_113, F_0_168, - F_0_331, F_0_500, - F_0_418, F_0_081 - }; - -/* Include inline routines for colorspace extensions. */ - -#if defined(__aarch64__) -#include "../arm64/jccolext-neon.c" -#else -#include "../arm/jccolext-neon.c" -#endif -#undef RGB_RED -#undef RGB_GREEN -#undef RGB_BLUE -#undef RGB_PIXELSIZE - -#define RGB_RED EXT_RGB_RED -#define RGB_GREEN EXT_RGB_GREEN -#define RGB_BLUE EXT_RGB_BLUE -#define RGB_PIXELSIZE EXT_RGB_PIXELSIZE -#define jsimd_rgb_ycc_convert_neon jsimd_extrgb_ycc_convert_neon -#if defined(__aarch64__) -#include "../arm64/jccolext-neon.c" -#else -#include "../arm/jccolext-neon.c" -#endif -#undef RGB_RED -#undef RGB_GREEN -#undef RGB_BLUE -#undef RGB_PIXELSIZE -#undef jsimd_rgb_ycc_convert_neon - -#define RGB_RED EXT_RGBX_RED -#define RGB_GREEN EXT_RGBX_GREEN -#define RGB_BLUE EXT_RGBX_BLUE -#define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE -#define jsimd_rgb_ycc_convert_neon jsimd_extrgbx_ycc_convert_neon -#if defined(__aarch64__) -#include "../arm64/jccolext-neon.c" -#else -#include "../arm/jccolext-neon.c" -#endif -#undef RGB_RED -#undef RGB_GREEN -#undef RGB_BLUE -#undef RGB_PIXELSIZE -#undef jsimd_rgb_ycc_convert_neon - -#define RGB_RED EXT_BGR_RED -#define RGB_GREEN EXT_BGR_GREEN -#define RGB_BLUE EXT_BGR_BLUE -#define RGB_PIXELSIZE EXT_BGR_PIXELSIZE -#define jsimd_rgb_ycc_convert_neon jsimd_extbgr_ycc_convert_neon -#if defined(__aarch64__) -#include "../arm64/jccolext-neon.c" 
-#else -#include "../arm/jccolext-neon.c" -#endif -#undef RGB_RED -#undef RGB_GREEN -#undef RGB_BLUE -#undef RGB_PIXELSIZE -#undef jsimd_rgb_ycc_convert_neon - -#define RGB_RED EXT_BGRX_RED -#define RGB_GREEN EXT_BGRX_GREEN -#define RGB_BLUE EXT_BGRX_BLUE -#define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE -#define jsimd_rgb_ycc_convert_neon jsimd_extbgrx_ycc_convert_neon -#if defined(__aarch64__) -#include "../arm64/jccolext-neon.c" -#else -#include "../arm/jccolext-neon.c" -#endif -#undef RGB_RED -#undef RGB_GREEN -#undef RGB_BLUE -#undef RGB_PIXELSIZE -#undef jsimd_rgb_ycc_convert_neon - -#define RGB_RED EXT_XBGR_RED -#define RGB_GREEN EXT_XBGR_GREEN -#define RGB_BLUE EXT_XBGR_BLUE -#define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE -#define jsimd_rgb_ycc_convert_neon jsimd_extxbgr_ycc_convert_neon -#if defined(__aarch64__) -#include "../arm64/jccolext-neon.c" -#else -#include "../arm/jccolext-neon.c" -#endif -#undef RGB_RED -#undef RGB_GREEN -#undef RGB_BLUE -#undef RGB_PIXELSIZE -#undef jsimd_rgb_ycc_convert_neon - -#define RGB_RED EXT_XRGB_RED -#define RGB_GREEN EXT_XRGB_GREEN -#define RGB_BLUE EXT_XRGB_BLUE -#define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE -#define jsimd_rgb_ycc_convert_neon jsimd_extxrgb_ycc_convert_neon -#if defined(__aarch64__) -#include "../arm64/jccolext-neon.c" -#else -#include "../arm/jccolext-neon.c" -#endif -#undef RGB_RED -#undef RGB_GREEN -#undef RGB_BLUE -#undef RGB_PIXELSIZE -#undef jsimd_rgb_ycc_convert_neon diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jcgray-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jcgray-neon.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jcgray-neon.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jcgray-neon.c 1970-01-01 01:00:00.000000000 +0100 @@ -1,118 +0,0 @@ -/* - * jcgray-neon.c - grayscale colorspace conversion (Arm NEON) - * - * Copyright 2020 The Chromium Authors. 
All Rights Reserved. - * - * This software is provided 'as-is', without any express or implied - * warranty. In no event will the authors be held liable for any damages - * arising from the use of this software. - * - * Permission is granted to anyone to use this software for any purpose, - * including commercial applications, and to alter it and redistribute it - * freely, subject to the following restrictions: - * - * 1. The origin of this software must not be misrepresented; you must not - * claim that you wrote the original software. If you use this software - * in a product, an acknowledgment in the product documentation would be - * appreciated but is not required. - * 2. Altered source versions must be plainly marked as such, and must not be - * misrepresented as being the original software. - * 3. This notice may not be removed or altered from any source distribution. - */ - -#define JPEG_INTERNALS -#include "../../../jconfigint.h" -#include "../../../jinclude.h" -#include "../../../jpeglib.h" -#include "../../../jsimd.h" -#include "../../../jdct.h" -#include "../../../jsimddct.h" -#include "../../jsimd.h" - -#include <arm_neon.h> - -/* RGB -> Grayscale conversion constants. */ - -#define F_0_298 19595 -#define F_0_587 38470 -#define F_0_113 7471 - -/* Include inline routines for colorspace extensions. 
*/
-
-#include "jcgryext-neon.c"
-#undef RGB_RED
-#undef RGB_GREEN
-#undef RGB_BLUE
-#undef RGB_PIXELSIZE
-
-#define RGB_RED EXT_RGB_RED
-#define RGB_GREEN EXT_RGB_GREEN
-#define RGB_BLUE EXT_RGB_BLUE
-#define RGB_PIXELSIZE EXT_RGB_PIXELSIZE
-#define jsimd_rgb_gray_convert_neon jsimd_extrgb_gray_convert_neon
-#include "jcgryext-neon.c"
-#undef RGB_RED
-#undef RGB_GREEN
-#undef RGB_BLUE
-#undef RGB_PIXELSIZE
-#undef jsimd_rgb_gray_convert_neon
-
-#define RGB_RED EXT_RGBX_RED
-#define RGB_GREEN EXT_RGBX_GREEN
-#define RGB_BLUE EXT_RGBX_BLUE
-#define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE
-#define jsimd_rgb_gray_convert_neon jsimd_extrgbx_gray_convert_neon
-#include "jcgryext-neon.c"
-#undef RGB_RED
-#undef RGB_GREEN
-#undef RGB_BLUE
-#undef RGB_PIXELSIZE
-#undef jsimd_rgb_gray_convert_neon
-
-#define RGB_RED EXT_BGR_RED
-#define RGB_GREEN EXT_BGR_GREEN
-#define RGB_BLUE EXT_BGR_BLUE
-#define RGB_PIXELSIZE EXT_BGR_PIXELSIZE
-#define jsimd_rgb_gray_convert_neon jsimd_extbgr_gray_convert_neon
-#include "jcgryext-neon.c"
-#undef RGB_RED
-#undef RGB_GREEN
-#undef RGB_BLUE
-#undef RGB_PIXELSIZE
-#undef jsimd_rgb_gray_convert_neon
-
-#define RGB_RED EXT_BGRX_RED
-#define RGB_GREEN EXT_BGRX_GREEN
-#define RGB_BLUE EXT_BGRX_BLUE
-#define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE
-#define jsimd_rgb_gray_convert_neon jsimd_extbgrx_gray_convert_neon
-#include "jcgryext-neon.c"
-#undef RGB_RED
-#undef RGB_GREEN
-#undef RGB_BLUE
-#undef RGB_PIXELSIZE
-#undef jsimd_rgb_gray_convert_neon
-
-#define RGB_RED EXT_XBGR_RED
-#define RGB_GREEN EXT_XBGR_GREEN
-#define RGB_BLUE EXT_XBGR_BLUE
-#define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE
-#define jsimd_rgb_gray_convert_neon jsimd_extxbgr_gray_convert_neon
-#include "jcgryext-neon.c"
-#undef RGB_RED
-#undef RGB_GREEN
-#undef RGB_BLUE
-#undef RGB_PIXELSIZE
-#undef jsimd_rgb_gray_convert_neon
-
-#define RGB_RED EXT_XRGB_RED
-#define RGB_GREEN EXT_XRGB_GREEN
-#define RGB_BLUE EXT_XRGB_BLUE
-#define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE
-#define jsimd_rgb_gray_convert_neon jsimd_extxrgb_gray_convert_neon
-#include "jcgryext-neon.c"
-#undef RGB_RED
-#undef RGB_GREEN
-#undef RGB_BLUE
-#undef RGB_PIXELSIZE
-#undef jsimd_rgb_gray_convert_neon
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jcgryext-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jcgryext-neon.c
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jcgryext-neon.c	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jcgryext-neon.c	1970-01-01 01:00:00.000000000 +0100
@@ -1,107 +0,0 @@
-/*
- * jcgryext-neon.c - grayscale colorspace conversion (Arm NEON)
- *
- * Copyright 2020 The Chromium Authors. All Rights Reserved.
- *
- * This software is provided 'as-is', without any express or implied
- * warranty. In no event will the authors be held liable for any damages
- * arising from the use of this software.
- *
- * Permission is granted to anyone to use this software for any purpose,
- * including commercial applications, and to alter it and redistribute it
- * freely, subject to the following restrictions:
- *
- * 1. The origin of this software must not be misrepresented; you must not
- *    claim that you wrote the original software. If you use this software
- *    in a product, an acknowledgment in the product documentation would be
- *    appreciated but is not required.
- * 2. Altered source versions must be plainly marked as such, and must not be
- *    misrepresented as being the original software.
- * 3. This notice may not be removed or altered from any source distribution.
- */
-
-/* This file is included by jcgray-neon.c */
-
-/*
- * RGB -> Grayscale conversion is defined by the following equation:
- *    Y = 0.29900 * R + 0.58700 * G + 0.11400 * B
- *
- * Avoid floating point arithmetic by using shifted integer constants:
- *    0.29899597 = 19595 * 2^-16
- *    0.58700561 = 38470 * 2^-16
- *    0.11399841 = 7471 * 2^-16
- * These constants are defined in jcgray-neon.c
- *
- * We use rounding later to get correct values.
- *
- * This is the same computation as the RGB -> Y portion of RGB -> YCbCr.
- */
-
-void jsimd_rgb_gray_convert_neon(JDIMENSION image_width,
-                                 JSAMPARRAY input_buf,
-                                 JSAMPIMAGE output_buf,
-                                 JDIMENSION output_row,
-                                 int num_rows)
-{
-  JSAMPROW inptr;
-  JSAMPROW outptr;
-
-  while (--num_rows >= 0) {
-    inptr = *input_buf++;
-    outptr = output_buf[0][output_row];
-    output_row++;
-
-    int cols_remaining = image_width;
-    for (; cols_remaining > 0; cols_remaining -= 16) {
-
-      /* To prevent buffer overread by the vector load instructions, the */
-      /* last (image_width % 16) columns of data are first memcopied to a */
-      /* temporary buffer large enough to accommodate the vector load. */
-      if (cols_remaining < 16) {
-        ALIGN(16) uint8_t tmp_buf[16 * RGB_PIXELSIZE];
-        memcpy(tmp_buf, inptr, cols_remaining * RGB_PIXELSIZE);
-        inptr = tmp_buf;
-      }
-
-#if RGB_PIXELSIZE == 4
-      uint8x16x4_t input_pixels = vld4q_u8(inptr);
-#else
-      uint8x16x3_t input_pixels = vld3q_u8(inptr);
-#endif
-      uint16x8_t r_l = vmovl_u8(vget_low_u8(input_pixels.val[RGB_RED]));
-      uint16x8_t r_h = vmovl_u8(vget_high_u8(input_pixels.val[RGB_RED]));
-      uint16x8_t g_l = vmovl_u8(vget_low_u8(input_pixels.val[RGB_GREEN]));
-      uint16x8_t g_h = vmovl_u8(vget_high_u8(input_pixels.val[RGB_GREEN]));
-      uint16x8_t b_l = vmovl_u8(vget_low_u8(input_pixels.val[RGB_BLUE]));
-      uint16x8_t b_h = vmovl_u8(vget_high_u8(input_pixels.val[RGB_BLUE]));
-
-      /* Compute Y = 0.29900 * R + 0.58700 * G + 0.11400 * B */
-      uint32x4_t y_ll = vmull_n_u16(vget_low_u16(r_l), F_0_298);
-      uint32x4_t y_lh = vmull_n_u16(vget_high_u16(r_l), F_0_298);
-      uint32x4_t y_hl = vmull_n_u16(vget_low_u16(r_h), F_0_298);
-      uint32x4_t y_hh = vmull_n_u16(vget_high_u16(r_h), F_0_298);
-      y_ll = vmlal_n_u16(y_ll, vget_low_u16(g_l), F_0_587);
-      y_lh = vmlal_n_u16(y_lh, vget_high_u16(g_l), F_0_587);
-      y_hl = vmlal_n_u16(y_hl, vget_low_u16(g_h), F_0_587);
-      y_hh = vmlal_n_u16(y_hh, vget_high_u16(g_h), F_0_587);
-      y_ll = vmlal_n_u16(y_ll, vget_low_u16(b_l), F_0_113);
-      y_lh = vmlal_n_u16(y_lh, vget_high_u16(b_l), F_0_113);
-      y_hl = vmlal_n_u16(y_hl, vget_low_u16(b_h), F_0_113);
-      y_hh = vmlal_n_u16(y_hh, vget_high_u16(b_h), F_0_113);
-
-      /* Descale Y values (rounding right shift) and narrow to 16-bit. */
-      uint16x8_t y_l = vcombine_u16(vrshrn_n_u32(y_ll, 16),
-                                    vrshrn_n_u32(y_lh, 16));
-      uint16x8_t y_h = vcombine_u16(vrshrn_n_u32(y_hl, 16),
-                                    vrshrn_n_u32(y_hh, 16));
-
-      /* Narrow Y values to 8-bit and store to memory. Buffer overwrite is */
-      /* permitted up to the next multiple of ALIGN_SIZE bytes. */
-      vst1q_u8(outptr, vcombine_u8(vmovn_u16(y_l), vmovn_u16(y_h)));
-
-      /* Increment pointers. */
-      inptr += (16 * RGB_PIXELSIZE);
-      outptr += 16;
-    }
-  }
-}
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jcsample-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jcsample-neon.c
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jcsample-neon.c	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jcsample-neon.c	1970-01-01 01:00:00.000000000 +0100
@@ -1,191 +0,0 @@
-/*
- * jcsample-neon.c - downsampling (Arm NEON)
- *
- * Copyright 2020 The Chromium Authors. All Rights Reserved.
- *
- * This software is provided 'as-is', without any express or implied
- * warranty. In no event will the authors be held liable for any damages
- * arising from the use of this software.
- *
- * Permission is granted to anyone to use this software for any purpose,
- * including commercial applications, and to alter it and redistribute it
- * freely, subject to the following restrictions:
- *
- * 1. The origin of this software must not be misrepresented; you must not
- *    claim that you wrote the original software. If you use this software
- *    in a product, an acknowledgment in the product documentation would be
- *    appreciated but is not required.
- * 2. Altered source versions must be plainly marked as such, and must not be
- *    misrepresented as being the original software.
- * 3. This notice may not be removed or altered from any source distribution.
- */
-
-#define JPEG_INTERNALS
-#include "../../../jconfigint.h"
-#include "../../../jinclude.h"
-#include "../../../jpeglib.h"
-#include "../../../jsimd.h"
-#include "../../../jdct.h"
-#include "../../../jsimddct.h"
-#include "../../jsimd.h"
-
-#include <arm_neon.h>
-
-
-ALIGN(16) static const uint8_t jsimd_h2_downsample_consts[] = {
-  0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* Pad 0 */
-  0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
-  0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* Pad 1 */
-  0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0E,
-  0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* Pad 2 */
-  0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0D, 0x0D,
-  0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* Pad 3 */
-  0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0C, 0x0C, 0x0C,
-  0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* Pad 4 */
-  0x08, 0x09, 0x0A, 0x0B, 0x0B, 0x0B, 0x0B, 0x0B,
-  0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* Pad 5 */
-  0x08, 0x09, 0x0A, 0x0A, 0x0A, 0x0A, 0x0A, 0x0A,
-  0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* Pad 6 */
-  0x08, 0x09, 0x09, 0x09, 0x09, 0x09, 0x09, 0x09,
-  0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* Pad 7 */
-  0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08,
-  0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* Pad 8 */
-  0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
-  0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x06, /* Pad 9 */
-  0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
-  0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x05, 0x05, /* Pad 10 */
-  0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
-  0x00, 0x01, 0x02, 0x03, 0x04, 0x04, 0x04, 0x04, /* Pad 11 */
-  0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04,
-  0x00, 0x01, 0x02, 0x03, 0x03, 0x03, 0x03, 0x03, /* Pad 12 */
-  0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03,
-  0x00, 0x01, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02, /* Pad 13 */
-  0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02,
-  0x00, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, /* Pad 14 */
-  0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* Pad 15 */
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
-};
-
-
-/*
- * Downsample pixel values of a single chroma component i.e. Cb, Cr.
- * This version handles the common case of 2:1 horizontal and 1:1 vertical,
- * without smoothing.
- */
-
-void jsimd_h2v1_downsample_neon(JDIMENSION image_width,
-                                int max_v_samp_factor,
-                                JDIMENSION v_samp_factor,
-                                JDIMENSION width_in_blocks,
-                                JSAMPARRAY input_data,
-                                JSAMPARRAY output_data)
-{
-  JSAMPROW inptr, outptr;
-  /* Load expansion mask to pad remaining elements of last DCT block. */
-  const int mask_offset = 16 * ((width_in_blocks * 2 * DCTSIZE) - image_width);
-  const uint8x16_t expand_mask = vld1q_u8(
-      &jsimd_h2_downsample_consts[mask_offset]);
-  /* Load bias pattern alternating every pixel. */
-  const uint16x8_t bias = { 0, 1, 0, 1, 0, 1, 0, 1 };
-
-  for (unsigned outrow = 0; outrow < v_samp_factor; outrow++) {
-    outptr = output_data[outrow];
-    inptr = input_data[outrow];
-
-    /* Downsample all but the last DCT block of pixels. */
-    for (unsigned i = 0; i < width_in_blocks - 1; i++) {
-      uint8x16_t pixels = vld1q_u8(inptr + i * 2 * DCTSIZE);
-      /* Add adjacent pixel values, widen to 16-bit and add bias. */
-      uint16x8_t samples_u16 = vpadalq_u8(bias, pixels);
-      /* Divide total by 2 and narrow to 8-bit. */
-      uint8x8_t samples_u8 = vshrn_n_u16(samples_u16, 1);
-      /* Store samples to memory. */
-      vst1_u8(outptr + i * DCTSIZE, samples_u8);
-    }
-
-    /* Load pixels in last DCT block into a table. */
-    uint8x16_t pixels = vld1q_u8(inptr + (width_in_blocks - 1) * 2 * DCTSIZE);
-#if defined(__aarch64__)
-    /* Pad the empty elements with the value of the last pixel. */
-    pixels = vqtbl1q_u8(pixels, expand_mask);
-#else
-    uint8x8x2_t table = { vget_low_u8(pixels), vget_high_u8(pixels) };
-    pixels = vcombine_u8(vtbl2_u8(table, vget_low_u8(expand_mask)),
-                         vtbl2_u8(table, vget_high_u8(expand_mask)));
-#endif
-    /* Add adjacent pixel values, widen to 16-bit and add bias. */
-    uint16x8_t samples_u16 = vpadalq_u8(bias, pixels);
-    /* Divide total by 2, narrow to 8-bit and store. */
-    uint8x8_t samples_u8 = vshrn_n_u16(samples_u16, 1);
-    vst1_u8(outptr + (width_in_blocks - 1) * DCTSIZE, samples_u8);
-  }
-}
-
-
-/*
- * Downsample pixel values of a single chroma component i.e. Cb, Cr.
- * This version handles the standard case of 2:1 horizontal and 2:1 vertical,
- * without smoothing.
- */
-
-void jsimd_h2v2_downsample_neon(JDIMENSION image_width,
-                                int max_v_samp_factor,
-                                JDIMENSION v_samp_factor,
-                                JDIMENSION width_in_blocks,
-                                JSAMPARRAY input_data,
-                                JSAMPARRAY output_data)
-{
-  JSAMPROW inptr0, inptr1, outptr;
-  /* Load expansion mask to pad remaining elements of last DCT block. */
-  const int mask_offset = 16 * ((width_in_blocks * 2 * DCTSIZE) - image_width);
-  const uint8x16_t expand_mask = vld1q_u8(
-      &jsimd_h2_downsample_consts[mask_offset]);
-  /* Load bias pattern alternating every pixel. */
-  const uint16x8_t bias = { 1, 2, 1, 2, 1, 2, 1, 2 };
-
-  for (unsigned outrow = 0; outrow < v_samp_factor; outrow++) {
-    outptr = output_data[outrow];
-    inptr0 = input_data[outrow];
-    inptr1 = input_data[outrow + 1];
-
-    /* Downsample all but the last DCT block of pixels. */
-    for (unsigned i = 0; i < width_in_blocks - 1; i++) {
-      uint8x16_t pixels_r0 = vld1q_u8(inptr0 + i * 2 * DCTSIZE);
-      uint8x16_t pixels_r1 = vld1q_u8(inptr1 + i * 2 * DCTSIZE);
-      /* Add adjacent pixel values in row 0, widen to 16-bit and add bias. */
-      uint16x8_t samples_u16 = vpadalq_u8(bias, pixels_r0);
-      /* Add adjacent pixel values in row 1, widen to 16-bit and accumulate. */
-      samples_u16 = vpadalq_u8(samples_u16, pixels_r1);
-      /* Divide total by 4 and narrow to 8-bit. */
-      uint8x8_t samples_u8 = vshrn_n_u16(samples_u16, 2);
-      /* Store samples to memory and increment pointers. */
-      vst1_u8(outptr + i * DCTSIZE, samples_u8);
-    }
-
-    /* Load pixels in last DCT block into a table. */
-    uint8x16_t pixels_r0 = vld1q_u8(
-        inptr0 + (width_in_blocks - 1) * 2 * DCTSIZE);
-    uint8x16_t pixels_r1 = vld1q_u8(
-        inptr1 + (width_in_blocks - 1) * 2 * DCTSIZE);
-#if defined(__aarch64__)
-    /* Pad the empty elements with the value of the last pixel. */
-    pixels_r0 = vqtbl1q_u8(pixels_r0, expand_mask);
-    pixels_r1 = vqtbl1q_u8(pixels_r1, expand_mask);
-#else
-    uint8x8x2_t table_r0 = { vget_low_u8(pixels_r0), vget_high_u8(pixels_r0) };
-    uint8x8x2_t table_r1 = { vget_low_u8(pixels_r1), vget_high_u8(pixels_r1) };
-    pixels_r0 = vcombine_u8(vtbl2_u8(table_r0, vget_low_u8(expand_mask)),
-                            vtbl2_u8(table_r0, vget_high_u8(expand_mask)));
-    pixels_r1 = vcombine_u8(vtbl2_u8(table_r1, vget_low_u8(expand_mask)),
-                            vtbl2_u8(table_r1, vget_high_u8(expand_mask)));
-#endif
-    /* Add adjacent pixel values in row 0, widen to 16-bit and add bias. */
-    uint16x8_t samples_u16 = vpadalq_u8(bias, pixels_r0);
-    /* Add adjacent pixel values in row 1, widen to 16-bit and accumulate. */
-    samples_u16 = vpadalq_u8(samples_u16, pixels_r1);
-    /* Divide total by 4, narrow to 8-bit and store. */
-    uint8x8_t samples_u8 = vshrn_n_u16(samples_u16, 2);
-    vst1_u8(outptr + (width_in_blocks - 1) * DCTSIZE, samples_u8);
-  }
-}
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jdcolext-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jdcolext-neon.c
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jdcolext-neon.c	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jdcolext-neon.c	1970-01-01 01:00:00.000000000 +0100
@@ -1,330 +0,0 @@
-/*
- * jdcolext-neon.c - colorspace conversion (Arm NEON)
- *
- * Copyright 2019 The Chromium Authors. All Rights Reserved.
- *
- * This software is provided 'as-is', without any express or implied
- * warranty. In no event will the authors be held liable for any damages
- * arising from the use of this software.
- *
- * Permission is granted to anyone to use this software for any purpose,
- * including commercial applications, and to alter it and redistribute it
- * freely, subject to the following restrictions:
- *
- * 1. The origin of this software must not be misrepresented; you must not
- *    claim that you wrote the original software. If you use this software
- *    in a product, an acknowledgment in the product documentation would be
- *    appreciated but is not required.
- * 2. Altered source versions must be plainly marked as such, and must not be
- *    misrepresented as being the original software.
- * 3. This notice may not be removed or altered from any source distribution.
- */
-
-/* This file is included by jdcolor-neon.c. */
-
-/*
- * YCbCr -> RGB conversion is defined by the following equations:
- *    R = Y + 1.40200 * (Cr - 128)
- *    G = Y - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128)
- *    B = Y + 1.77200 * (Cb - 128)
- *
- * Scaled integer constants are used to avoid floating-point arithmetic:
- *    0.3441467 = 11277 * 2^-15
- *    0.7141418 = 23401 * 2^-15
- *    1.4020386 = 22971 * 2^-14
- *    1.7720337 = 29033 * 2^-14
- * These constants are defined in jdcolor-neon.c.
- *
- * Rounding is used when descaling to ensure correct results.
- */
-
-/*
- * Notes on safe memory access for YCbCr -> RGB conversion routines:
- *
- * Input memory buffers can be safely overread up to the next multiple of
- * ALIGN_SIZE bytes since they are always allocated by alloc_sarray() in
- * jmemmgr.c.
- *
- * The output buffer cannot safely be written beyond output_width since the
- * TurboJPEG API permits it to be allocated with or without padding up to the
- * next multiple of ALIGN_SIZE bytes.
- */
-
-void jsimd_ycc_rgb_convert_neon(JDIMENSION output_width,
-                                JSAMPIMAGE input_buf,
-                                JDIMENSION input_row,
-                                JSAMPARRAY output_buf,
-                                int num_rows)
-{
-  JSAMPROW outptr;
-  /* Pointers to Y, Cb and Cr data. */
-  JSAMPROW inptr0, inptr1, inptr2;
-
-  const int16x8_t neg_128 = vdupq_n_s16(-128);
-
-  while (--num_rows >= 0) {
-    inptr0 = input_buf[0][input_row];
-    inptr1 = input_buf[1][input_row];
-    inptr2 = input_buf[2][input_row];
-    input_row++;
-    outptr = *output_buf++;
-    int cols_remaining = output_width;
-    for (; cols_remaining >= 16; cols_remaining -= 16) {
-      uint8x16_t y = vld1q_u8(inptr0);
-      uint8x16_t cb = vld1q_u8(inptr1);
-      uint8x16_t cr = vld1q_u8(inptr2);
-      /* Subtract 128 from Cb and Cr. */
-      int16x8_t cr_128_l = vreinterpretq_s16_u16(
-          vaddw_u8(vreinterpretq_u16_s16(neg_128), vget_low_u8(cr)));
-      int16x8_t cr_128_h = vreinterpretq_s16_u16(
-          vaddw_u8(vreinterpretq_u16_s16(neg_128), vget_high_u8(cr)));
-      int16x8_t cb_128_l = vreinterpretq_s16_u16(
-          vaddw_u8(vreinterpretq_u16_s16(neg_128), vget_low_u8(cb)));
-      int16x8_t cb_128_h = vreinterpretq_s16_u16(
-          vaddw_u8(vreinterpretq_u16_s16(neg_128), vget_high_u8(cb)));
-      /* Compute G-Y: - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128) */
-      int32x4_t g_sub_y_ll = vmull_n_s16(vget_low_s16(cb_128_l), -F_0_344);
-      int32x4_t g_sub_y_lh = vmull_n_s16(vget_high_s16(cb_128_l), -F_0_344);
-      int32x4_t g_sub_y_hl = vmull_n_s16(vget_low_s16(cb_128_h), -F_0_344);
-      int32x4_t g_sub_y_hh = vmull_n_s16(vget_high_s16(cb_128_h), -F_0_344);
-      g_sub_y_ll = vmlsl_n_s16(g_sub_y_ll, vget_low_s16(cr_128_l), F_0_714);
-      g_sub_y_lh = vmlsl_n_s16(g_sub_y_lh, vget_high_s16(cr_128_l), F_0_714);
-      g_sub_y_hl = vmlsl_n_s16(g_sub_y_hl, vget_low_s16(cr_128_h), F_0_714);
-      g_sub_y_hh = vmlsl_n_s16(g_sub_y_hh, vget_high_s16(cr_128_h), F_0_714);
-      /* Descale G components: shift right 15, round and narrow to 16-bit. */
-      int16x8_t g_sub_y_l = vcombine_s16(vrshrn_n_s32(g_sub_y_ll, 15),
-                                         vrshrn_n_s32(g_sub_y_lh, 15));
-      int16x8_t g_sub_y_h = vcombine_s16(vrshrn_n_s32(g_sub_y_hl, 15),
-                                         vrshrn_n_s32(g_sub_y_hh, 15));
-      /* Compute R-Y: 1.40200 * (Cr - 128) */
-      int16x8_t r_sub_y_l = vqrdmulhq_n_s16(vshlq_n_s16(cr_128_l, 1), F_1_402);
-      int16x8_t r_sub_y_h = vqrdmulhq_n_s16(vshlq_n_s16(cr_128_h, 1), F_1_402);
-      /* Compute B-Y: 1.77200 * (Cb - 128) */
-      int16x8_t b_sub_y_l = vqrdmulhq_n_s16(vshlq_n_s16(cb_128_l, 1), F_1_772);
-      int16x8_t b_sub_y_h = vqrdmulhq_n_s16(vshlq_n_s16(cb_128_h, 1), F_1_772);
-      /* Add Y. */
-      int16x8_t r_l = vreinterpretq_s16_u16(
-          vaddw_u8(vreinterpretq_u16_s16(r_sub_y_l), vget_low_u8(y)));
-      int16x8_t r_h = vreinterpretq_s16_u16(
-          vaddw_u8(vreinterpretq_u16_s16(r_sub_y_h), vget_high_u8(y)));
-      int16x8_t b_l = vreinterpretq_s16_u16(
-          vaddw_u8(vreinterpretq_u16_s16(b_sub_y_l), vget_low_u8(y)));
-      int16x8_t b_h = vreinterpretq_s16_u16(
-          vaddw_u8(vreinterpretq_u16_s16(b_sub_y_h), vget_high_u8(y)));
-      int16x8_t g_l = vreinterpretq_s16_u16(
-          vaddw_u8(vreinterpretq_u16_s16(g_sub_y_l), vget_low_u8(y)));
-      int16x8_t g_h = vreinterpretq_s16_u16(
-          vaddw_u8(vreinterpretq_u16_s16(g_sub_y_h), vget_high_u8(y)));
-
-#if RGB_PIXELSIZE == 4
-      uint8x16x4_t rgba;
-      /* Convert each component to unsigned and narrow, clamping to [0-255]. */
-      rgba.val[RGB_RED] = vcombine_u8(vqmovun_s16(r_l), vqmovun_s16(r_h));
-      rgba.val[RGB_GREEN] = vcombine_u8(vqmovun_s16(g_l), vqmovun_s16(g_h));
-      rgba.val[RGB_BLUE] = vcombine_u8(vqmovun_s16(b_l), vqmovun_s16(b_h));
-      /* Set alpha channel to opaque (0xFF). */
-      rgba.val[RGB_ALPHA] = vdupq_n_u8(0xFF);
-      /* Store RGBA pixel data to memory. */
-      vst4q_u8(outptr, rgba);
-#elif RGB_PIXELSIZE == 3
-      uint8x16x3_t rgb;
-      /* Convert each component to unsigned and narrow, clamping to [0-255]. */
-      rgb.val[RGB_RED] = vcombine_u8(vqmovun_s16(r_l), vqmovun_s16(r_h));
-      rgb.val[RGB_GREEN] = vcombine_u8(vqmovun_s16(g_l), vqmovun_s16(g_h));
-      rgb.val[RGB_BLUE] = vcombine_u8(vqmovun_s16(b_l), vqmovun_s16(b_h));
-      /* Store RGB pixel data to memory. */
-      vst3q_u8(outptr, rgb);
-#else /* RGB565 */
-      /* Pack R, G and B values in ratio 5:6:5. */
-      uint16x8_t rgb565_l = vqshluq_n_s16(r_l, 8);
-      rgb565_l = vsriq_n_u16(rgb565_l, vqshluq_n_s16(g_l, 8), 5);
-      rgb565_l = vsriq_n_u16(rgb565_l, vqshluq_n_s16(b_l, 8), 11);
-      uint16x8_t rgb565_h = vqshluq_n_s16(r_h, 8);
-      rgb565_h = vsriq_n_u16(rgb565_h, vqshluq_n_s16(g_h, 8), 5);
-      rgb565_h = vsriq_n_u16(rgb565_h, vqshluq_n_s16(b_h, 8), 11);
-      /* Store RGB pixel data to memory. */
-      vst1q_u16((uint16_t *)outptr, rgb565_l);
-      vst1q_u16(((uint16_t *)outptr) + 8, rgb565_h);
-#endif /* RGB565 */
-
-      /* Increment pointers. */
-      inptr0 += 16;
-      inptr1 += 16;
-      inptr2 += 16;
-      outptr += (RGB_PIXELSIZE * 16);
-    }
-
-    if (cols_remaining >= 8) {
-      uint8x8_t y = vld1_u8(inptr0);
-      uint8x8_t cb = vld1_u8(inptr1);
-      uint8x8_t cr = vld1_u8(inptr2);
-      /* Subtract 128 from Cb and Cr. */
-      int16x8_t cr_128 = vreinterpretq_s16_u16(
-          vaddw_u8(vreinterpretq_u16_s16(neg_128), cr));
-      int16x8_t cb_128 = vreinterpretq_s16_u16(
-          vaddw_u8(vreinterpretq_u16_s16(neg_128), cb));
-      /* Compute G-Y: - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128) */
-      int32x4_t g_sub_y_l = vmull_n_s16(vget_low_s16(cb_128), -F_0_344);
-      int32x4_t g_sub_y_h = vmull_n_s16(vget_high_s16(cb_128), -F_0_344);
-      g_sub_y_l = vmlsl_n_s16(g_sub_y_l, vget_low_s16(cr_128), F_0_714);
-      g_sub_y_h = vmlsl_n_s16(g_sub_y_h, vget_high_s16(cr_128), F_0_714);
-      /* Descale G components: shift right 15, round and narrow to 16-bit. */
-      int16x8_t g_sub_y = vcombine_s16(vrshrn_n_s32(g_sub_y_l, 15),
-                                       vrshrn_n_s32(g_sub_y_h, 15));
-      /* Compute R-Y: 1.40200 * (Cr - 128) */
-      int16x8_t r_sub_y = vqrdmulhq_n_s16(vshlq_n_s16(cr_128, 1), F_1_402);
-      /* Compute B-Y: 1.77200 * (Cb - 128) */
-      int16x8_t b_sub_y = vqrdmulhq_n_s16(vshlq_n_s16(cb_128, 1), F_1_772);
-      /* Add Y. */
-      int16x8_t r = vreinterpretq_s16_u16(
-          vaddw_u8(vreinterpretq_u16_s16(r_sub_y), y));
-      int16x8_t b = vreinterpretq_s16_u16(
-          vaddw_u8(vreinterpretq_u16_s16(b_sub_y), y));
-      int16x8_t g = vreinterpretq_s16_u16(
-          vaddw_u8(vreinterpretq_u16_s16(g_sub_y), y));
-
-#if RGB_PIXELSIZE == 4
-      uint8x8x4_t rgba;
-      /* Convert each component to unsigned and narrow, clamping to [0-255]. */
-      rgba.val[RGB_RED] = vqmovun_s16(r);
-      rgba.val[RGB_GREEN] = vqmovun_s16(g);
-      rgba.val[RGB_BLUE] = vqmovun_s16(b);
-      /* Set alpha channel to opaque (0xFF). */
-      rgba.val[RGB_ALPHA] = vdup_n_u8(0xFF);
-      /* Store RGBA pixel data to memory. */
-      vst4_u8(outptr, rgba);
-#elif RGB_PIXELSIZE == 3
-      uint8x8x3_t rgb;
-      /* Convert each component to unsigned and narrow, clamping to [0-255]. */
-      rgb.val[RGB_RED] = vqmovun_s16(r);
-      rgb.val[RGB_GREEN] = vqmovun_s16(g);
-      rgb.val[RGB_BLUE] = vqmovun_s16(b);
-      /* Store RGB pixel data to memory. */
-      vst3_u8(outptr, rgb);
-#else /* RGB565 */
-      /* Pack R, G and B values in ratio 5:6:5. */
-      uint16x8_t rgb565 = vqshluq_n_s16(r, 8);
-      rgb565 = vsriq_n_u16(rgb565, vqshluq_n_s16(g, 8), 5);
-      rgb565 = vsriq_n_u16(rgb565, vqshluq_n_s16(b, 8), 11);
-      /* Store RGB pixel data to memory. */
-      vst1q_u16((uint16_t *)outptr, rgb565);
-#endif /* RGB565 */
-
-      /* Increment pointers. */
-      inptr0 += 8;
-      inptr1 += 8;
-      inptr2 += 8;
-      outptr += (RGB_PIXELSIZE * 8);
-      cols_remaining -= 8;
-    }
-
-    /* Handle the tail elements. */
-    if (cols_remaining > 0) {
-      uint8x8_t y = vld1_u8(inptr0);
-      uint8x8_t cb = vld1_u8(inptr1);
-      uint8x8_t cr = vld1_u8(inptr2);
-      /* Subtract 128 from Cb and Cr. */
-      int16x8_t cr_128 = vreinterpretq_s16_u16(
-          vaddw_u8(vreinterpretq_u16_s16(neg_128), cr));
-      int16x8_t cb_128 = vreinterpretq_s16_u16(
-          vaddw_u8(vreinterpretq_u16_s16(neg_128), cb));
-      /* Compute G-Y: - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128) */
-      int32x4_t g_sub_y_l = vmull_n_s16(vget_low_s16(cb_128), -F_0_344);
-      int32x4_t g_sub_y_h = vmull_n_s16(vget_high_s16(cb_128), -F_0_344);
-      g_sub_y_l = vmlsl_n_s16(g_sub_y_l, vget_low_s16(cr_128), F_0_714);
-      g_sub_y_h = vmlsl_n_s16(g_sub_y_h, vget_high_s16(cr_128), F_0_714);
-      /* Descale G components: shift right 15, round and narrow to 16-bit. */
-      int16x8_t g_sub_y = vcombine_s16(vrshrn_n_s32(g_sub_y_l, 15),
-                                       vrshrn_n_s32(g_sub_y_h, 15));
-      /* Compute R-Y: 1.40200 * (Cr - 128) */
-      int16x8_t r_sub_y = vqrdmulhq_n_s16(vshlq_n_s16(cr_128, 1), F_1_402);
-      /* Compute B-Y: 1.77200 * (Cb - 128) */
-      int16x8_t b_sub_y = vqrdmulhq_n_s16(vshlq_n_s16(cb_128, 1), F_1_772);
-      /* Add Y. */
-      int16x8_t r = vreinterpretq_s16_u16(
-          vaddw_u8(vreinterpretq_u16_s16(r_sub_y), y));
-      int16x8_t b = vreinterpretq_s16_u16(
-          vaddw_u8(vreinterpretq_u16_s16(b_sub_y), y));
-      int16x8_t g = vreinterpretq_s16_u16(
-          vaddw_u8(vreinterpretq_u16_s16(g_sub_y), y));
-
-#if RGB_PIXELSIZE == 4
-      uint8x8x4_t rgba;
-      /* Convert each component to unsigned and narrow, clamping to [0-255]. */
-      rgba.val[RGB_RED] = vqmovun_s16(r);
-      rgba.val[RGB_GREEN] = vqmovun_s16(g);
-      rgba.val[RGB_BLUE] = vqmovun_s16(b);
-      /* Set alpha channel to opaque (0xFF). */
-      rgba.val[RGB_ALPHA] = vdup_n_u8(0xFF);
-      /* Store RGBA pixel data to memory. */
-      switch (cols_remaining) {
-      case 7 :
-        vst4_lane_u8(outptr + 6 * RGB_PIXELSIZE, rgba, 6);
-      case 6 :
-        vst4_lane_u8(outptr + 5 * RGB_PIXELSIZE, rgba, 5);
-      case 5 :
-        vst4_lane_u8(outptr + 4 * RGB_PIXELSIZE, rgba, 4);
-      case 4 :
-        vst4_lane_u8(outptr + 3 * RGB_PIXELSIZE, rgba, 3);
-      case 3 :
-        vst4_lane_u8(outptr + 2 * RGB_PIXELSIZE, rgba, 2);
-      case 2 :
-        vst4_lane_u8(outptr + RGB_PIXELSIZE, rgba, 1);
-      case 1 :
-        vst4_lane_u8(outptr, rgba, 0);
-      default:
-        break;
-      }
-#elif RGB_PIXELSIZE == 3
-      uint8x8x3_t rgb;
-      /* Convert each component to unsigned and narrow, clamping to [0-255]. */
-      rgb.val[RGB_RED] = vqmovun_s16(r);
-      rgb.val[RGB_GREEN] = vqmovun_s16(g);
-      rgb.val[RGB_BLUE] = vqmovun_s16(b);
-      /* Store RGB pixel data to memory. */
-      switch (cols_remaining) {
-      case 7 :
-        vst3_lane_u8(outptr + 6 * RGB_PIXELSIZE, rgb, 6);
-      case 6 :
-        vst3_lane_u8(outptr + 5 * RGB_PIXELSIZE, rgb, 5);
-      case 5 :
-        vst3_lane_u8(outptr + 4 * RGB_PIXELSIZE, rgb, 4);
-      case 4 :
-        vst3_lane_u8(outptr + 3 * RGB_PIXELSIZE, rgb, 3);
-      case 3 :
-        vst3_lane_u8(outptr + 2 * RGB_PIXELSIZE, rgb, 2);
-      case 2 :
-        vst3_lane_u8(outptr + RGB_PIXELSIZE, rgb, 1);
-      case 1 :
-        vst3_lane_u8(outptr, rgb, 0);
-      default:
-        break;
-      }
-#else /* RGB565 */
-      /* Pack R, G and B values in ratio 5:6:5. */
-      uint16x8_t rgb565 = vqshluq_n_s16(r, 8);
-      rgb565 = vsriq_n_u16(rgb565, vqshluq_n_s16(g, 8), 5);
-      rgb565 = vsriq_n_u16(rgb565, vqshluq_n_s16(b, 8), 11);
-      /* Store RGB565 pixel data to memory. */
-      switch (cols_remaining) {
-      case 7 :
-        vst1q_lane_u16(outptr + 6 * RGB_PIXELSIZE, rgb565, 6);
-      case 6 :
-        vst1q_lane_u16(outptr + 5 * RGB_PIXELSIZE, rgb565, 5);
-      case 5 :
-        vst1q_lane_u16(outptr + 4 * RGB_PIXELSIZE, rgb565, 4);
-      case 4 :
-        vst1q_lane_u16(outptr + 3 * RGB_PIXELSIZE, rgb565, 3);
-      case 3 :
-        vst1q_lane_u16(outptr + 2 * RGB_PIXELSIZE, rgb565, 2);
-      case 2 :
-        vst1q_lane_u16(outptr + RGB_PIXELSIZE, rgb565, 1);
-      case 1 :
-        vst1q_lane_u16(outptr, rgb565, 0);
-      default:
-        break;
-      }
-#endif /* RGB565 */
-    }
-  }
-}
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jdcolor-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jdcolor-neon.c
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jdcolor-neon.c	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jdcolor-neon.c	1970-01-01 01:00:00.000000000 +0100
@@ -1,134 +0,0 @@
-/*
- * jdcolor-neon.c - colorspace conversion (Arm NEON)
- *
- * Copyright 2019 The Chromium Authors. All Rights Reserved.
- *
- * This software is provided 'as-is', without any express or implied
- * warranty. In no event will the authors be held liable for any damages
- * arising from the use of this software.
- *
- * Permission is granted to anyone to use this software for any purpose,
- * including commercial applications, and to alter it and redistribute it
- * freely, subject to the following restrictions:
- *
- * 1. The origin of this software must not be misrepresented; you must not
- *    claim that you wrote the original software. If you use this software
- *    in a product, an acknowledgment in the product documentation would be
- *    appreciated but is not required.
- * 2. Altered source versions must be plainly marked as such, and must not be
- *    misrepresented as being the original software.
- * 3. This notice may not be removed or altered from any source distribution.
- */
-
-#define JPEG_INTERNALS
-#include "../../../jinclude.h"
-#include "../../../jpeglib.h"
-#include "../../../jsimd.h"
-#include "../../../jdct.h"
-#include "../../../jsimddct.h"
-#include "../../jsimd.h"
-
-#include <arm_neon.h>
-
-/* YCbCr -> RGB conversion constants. */
-
-#define F_0_344 11277 /* 0.3441467 = 11277 * 2^-15 */
-#define F_0_714 23401 /* 0.7141418 = 23401 * 2^-15 */
-#define F_1_402 22971 /* 1.4020386 = 22971 * 2^-14 */
-#define F_1_772 29033 /* 1.7720337 = 29033 * 2^-14 */
-
-/* Include inline routines for colorspace extensions. */
-
-#include "jdcolext-neon.c"
-#undef RGB_RED
-#undef RGB_GREEN
-#undef RGB_BLUE
-#undef RGB_PIXELSIZE
-
-#define RGB_RED EXT_RGB_RED
-#define RGB_GREEN EXT_RGB_GREEN
-#define RGB_BLUE EXT_RGB_BLUE
-#define RGB_PIXELSIZE EXT_RGB_PIXELSIZE
-#define jsimd_ycc_rgb_convert_neon jsimd_ycc_extrgb_convert_neon
-#include "jdcolext-neon.c"
-#undef RGB_RED
-#undef RGB_GREEN
-#undef RGB_BLUE
-#undef RGB_PIXELSIZE
-#undef jsimd_ycc_rgb_convert_neon
-
-#define RGB_RED EXT_RGBX_RED
-#define RGB_GREEN EXT_RGBX_GREEN
-#define RGB_BLUE EXT_RGBX_BLUE
-#define RGB_ALPHA 3
-#define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE
-#define jsimd_ycc_rgb_convert_neon jsimd_ycc_extrgbx_convert_neon
-#include "jdcolext-neon.c"
-#undef RGB_RED
-#undef RGB_GREEN
-#undef RGB_BLUE
-#undef RGB_ALPHA
-#undef RGB_PIXELSIZE
-#undef jsimd_ycc_rgb_convert_neon
-
-#define RGB_RED EXT_BGR_RED
-#define RGB_GREEN EXT_BGR_GREEN
-#define RGB_BLUE EXT_BGR_BLUE
-#define RGB_PIXELSIZE EXT_BGR_PIXELSIZE
-#define jsimd_ycc_rgb_convert_neon jsimd_ycc_extbgr_convert_neon
-#include "jdcolext-neon.c"
-#undef RGB_RED
-#undef RGB_GREEN
-#undef RGB_BLUE
-#undef RGB_PIXELSIZE
-#undef jsimd_ycc_rgb_convert_neon
-
-#define RGB_RED EXT_BGRX_RED
-#define RGB_GREEN EXT_BGRX_GREEN
-#define RGB_BLUE EXT_BGRX_BLUE
-#define RGB_ALPHA 3
-#define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE
-#define jsimd_ycc_rgb_convert_neon jsimd_ycc_extbgrx_convert_neon
-#include "jdcolext-neon.c"
-#undef RGB_RED
-#undef RGB_GREEN
-#undef RGB_BLUE
-#undef RGB_ALPHA
-#undef RGB_PIXELSIZE
-#undef jsimd_ycc_rgb_convert_neon
-
-#define RGB_RED EXT_XBGR_RED
-#define RGB_GREEN EXT_XBGR_GREEN
-#define RGB_BLUE EXT_XBGR_BLUE
-#define RGB_ALPHA 0
-#define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE
-#define jsimd_ycc_rgb_convert_neon jsimd_ycc_extxbgr_convert_neon
-#include "jdcolext-neon.c"
-#undef RGB_RED
-#undef RGB_GREEN
-#undef RGB_BLUE
-#undef RGB_ALPHA
-#undef RGB_PIXELSIZE
-#undef jsimd_ycc_rgb_convert_neon
-
-#define RGB_RED EXT_XRGB_RED
-#define RGB_GREEN EXT_XRGB_GREEN
-#define RGB_BLUE EXT_XRGB_BLUE
-#define RGB_ALPHA 0
-#define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE
-#define jsimd_ycc_rgb_convert_neon jsimd_ycc_extxrgb_convert_neon
-#include "jdcolext-neon.c"
-#undef RGB_RED
-#undef RGB_GREEN
-#undef RGB_BLUE
-#undef RGB_ALPHA
-#undef RGB_PIXELSIZE
-#undef jsimd_ycc_rgb_convert_neon
-
-/* YCbCr -> RGB565 Conversion. */
-
-#define RGB_PIXELSIZE 2
-#define jsimd_ycc_rgb_convert_neon jsimd_ycc_rgb565_convert_neon
-#include "jdcolext-neon.c"
-#undef RGB_PIXELSIZE
-#undef jsimd_ycc_rgb_convert_neon
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jdmerge-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jdmerge-neon.c
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jdmerge-neon.c	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jdmerge-neon.c	1970-01-01 01:00:00.000000000 +0100
@@ -1,138 +0,0 @@
-/*
- * jdmerge-neon.c - merged upsampling/color conversion (Arm NEON)
- *
- * Copyright 2019 The Chromium Authors. All Rights Reserved.
- *
- * This software is provided 'as-is', without any express or implied
- * warranty. In no event will the authors be held liable for any damages
- * arising from the use of this software.
- *
- * Permission is granted to anyone to use this software for any purpose,
- * including commercial applications, and to alter it and redistribute it
- * freely, subject to the following restrictions:
- *
- * 1. The origin of this software must not be misrepresented; you must not
- *    claim that you wrote the original software. If you use this software
- *    in a product, an acknowledgment in the product documentation would be
- *    appreciated but is not required.
- * 2. Altered source versions must be plainly marked as such, and must not be
- *    misrepresented as being the original software.
- * 3. This notice may not be removed or altered from any source distribution.
- */
-
-#define JPEG_INTERNALS
-#include "../../../jinclude.h"
-#include "../../../jpeglib.h"
-#include "../../../jsimd.h"
-#include "../../../jdct.h"
-#include "../../../jsimddct.h"
-#include "../../jsimd.h"
-
-#include <arm_neon.h>
-
-/* YCbCr -> RGB conversion constants. */
-
-#define F_0_344 11277 /* 0.3441467 = 11277 * 2^-15 */
-#define F_0_714 23401 /* 0.7141418 = 23401 * 2^-15 */
-#define F_1_402 22971 /* 1.4020386 = 22971 * 2^-14 */
-#define F_1_772 29033 /* 1.7720337 = 29033 * 2^-14 */
-
-/* Include inline routines for colorspace extensions */
-
-#include "jdmrgext-neon.c"
-#undef RGB_RED
-#undef RGB_GREEN
-#undef RGB_BLUE
-#undef RGB_PIXELSIZE
-
-#define RGB_RED EXT_RGB_RED
-#define RGB_GREEN EXT_RGB_GREEN
-#define RGB_BLUE EXT_RGB_BLUE
-#define RGB_PIXELSIZE EXT_RGB_PIXELSIZE
-#define jsimd_h2v1_merged_upsample_neon jsimd_h2v1_extrgb_merged_upsample_neon
-#define jsimd_h2v2_merged_upsample_neon jsimd_h2v2_extrgb_merged_upsample_neon
-#include "jdmrgext-neon.c"
-#undef RGB_RED
-#undef RGB_GREEN
-#undef RGB_BLUE
-#undef RGB_PIXELSIZE
-#undef jsimd_h2v1_merged_upsample_neon
-#undef jsimd_h2v2_merged_upsample_neon
-
-#define RGB_RED EXT_RGBX_RED
-#define RGB_GREEN EXT_RGBX_GREEN
-#define RGB_BLUE EXT_RGBX_BLUE
-#define RGB_ALPHA 3
-#define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE
-#define jsimd_h2v1_merged_upsample_neon jsimd_h2v1_extrgbx_merged_upsample_neon
-#define jsimd_h2v2_merged_upsample_neon jsimd_h2v2_extrgbx_merged_upsample_neon
-#include "jdmrgext-neon.c"
-#undef RGB_RED
-#undef RGB_GREEN
-#undef RGB_BLUE
-#undef RGB_ALPHA
-#undef RGB_PIXELSIZE
-#undef jsimd_h2v1_merged_upsample_neon
-#undef jsimd_h2v2_merged_upsample_neon
-
-#define RGB_RED EXT_BGR_RED
-#define RGB_GREEN EXT_BGR_GREEN
-#define RGB_BLUE EXT_BGR_BLUE
-#define RGB_PIXELSIZE EXT_BGR_PIXELSIZE
-#define jsimd_h2v1_merged_upsample_neon jsimd_h2v1_extbgr_merged_upsample_neon
-#define jsimd_h2v2_merged_upsample_neon jsimd_h2v2_extbgr_merged_upsample_neon
-#include "jdmrgext-neon.c"
-#undef RGB_RED
-#undef RGB_GREEN
-#undef RGB_BLUE
-#undef RGB_PIXELSIZE
-#undef jsimd_h2v1_merged_upsample_neon
-#undef jsimd_h2v2_merged_upsample_neon
-
-#define RGB_RED EXT_BGRX_RED
-#define RGB_GREEN EXT_BGRX_GREEN
-#define RGB_BLUE EXT_BGRX_BLUE
-#define RGB_ALPHA 3
-#define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE
-#define jsimd_h2v1_merged_upsample_neon jsimd_h2v1_extbgrx_merged_upsample_neon
-#define jsimd_h2v2_merged_upsample_neon jsimd_h2v2_extbgrx_merged_upsample_neon
-#include "jdmrgext-neon.c"
-#undef RGB_RED
-#undef RGB_GREEN
-#undef RGB_BLUE
-#undef RGB_ALPHA
-#undef RGB_PIXELSIZE
-#undef jsimd_h2v1_merged_upsample_neon
-#undef jsimd_h2v2_merged_upsample_neon
-
-#define RGB_RED EXT_XBGR_RED
-#define RGB_GREEN EXT_XBGR_GREEN
-#define RGB_BLUE EXT_XBGR_BLUE
-#define RGB_ALPHA 0
-#define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE
-#define jsimd_h2v1_merged_upsample_neon jsimd_h2v1_extxbgr_merged_upsample_neon
-#define jsimd_h2v2_merged_upsample_neon jsimd_h2v2_extxbgr_merged_upsample_neon
-#include "jdmrgext-neon.c"
-#undef RGB_RED
-#undef RGB_GREEN
-#undef RGB_BLUE
-#undef RGB_ALPHA
-#undef RGB_PIXELSIZE
-#undef jsimd_h2v1_merged_upsample_neon
-#undef jsimd_h2v2_merged_upsample_neon
-
-#define RGB_RED EXT_XRGB_RED
-#define RGB_GREEN EXT_XRGB_GREEN
-#define RGB_BLUE EXT_XRGB_BLUE
-#define RGB_ALPHA 0
-#define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE
-#define jsimd_h2v1_merged_upsample_neon jsimd_h2v1_extxrgb_merged_upsample_neon
-#define jsimd_h2v2_merged_upsample_neon jsimd_h2v2_extxrgb_merged_upsample_neon
-#include "jdmrgext-neon.c"
-#undef RGB_RED
-#undef RGB_GREEN
-#undef RGB_BLUE
-#undef RGB_ALPHA
-#undef RGB_PIXELSIZE
-#undef jsimd_h2v1_merged_upsample_neon
-#undef jsimd_h2v2_merged_upsample_neon
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jdmrgext-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jdmrgext-neon.c
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jdmrgext-neon.c	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jdmrgext-neon.c	1970-01-01 01:00:00.000000000 +0100
@@ -1,607 +0,0 @@
-/*
- * jdmrgext-neon.c - merged upsampling/color conversion (Arm NEON)
- *
- * Copyright 2019 The Chromium Authors. All Rights Reserved.
- *
- * This software is provided 'as-is', without any express or implied
- * warranty. In no event will the authors be held liable for any damages
- * arising from the use of this software.
- *
- * Permission is granted to anyone to use this software for any purpose,
- * including commercial applications, and to alter it and redistribute it
- * freely, subject to the following restrictions:
- *
- * 1. The origin of this software must not be misrepresented; you must not
- *    claim that you wrote the original software. If you use this software
- *    in a product, an acknowledgment in the product documentation would be
- *    appreciated but is not required.
- * 2. Altered source versions must be plainly marked as such, and must not be
- *    misrepresented as being the original software.
- * 3. This notice may not be removed or altered from any source distribution.
- */
-
-/* This file is included by jdmerge-neon.c. */
-
-/*
- * These routines perform simple chroma upsampling - h2v1 or h2v2 - followed by
- * YCbCr -> RGB color conversion all in the same function.
- *
- * As with the standalone functions, YCbCr -> RGB conversion is defined by the
- * following equations:
- *    R = Y + 1.40200 * (Cr - 128)
- *    G = Y - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128)
- *    B = Y + 1.77200 * (Cb - 128)
- *
- * Scaled integer constants are used to avoid floating-point arithmetic:
- *    0.3441467 = 11277 * 2^-15
- *    0.7141418 = 23401 * 2^-15
- *    1.4020386 = 22971 * 2^-14
- *    1.7720337 = 29033 * 2^-14
- * These constants are defined in jdmerge-neon.c.
- *
- * Rounding is used when descaling to ensure correct results.
- */
-
-/*
- * Notes on safe memory access for merged upsampling/YCbCr -> RGB conversion
- * routines:
- *
- * Input memory buffers can be safely overread up to the next multiple of
- * ALIGN_SIZE bytes since they are always allocated by alloc_sarray() in
- * jmemmgr.c.
- * - * The output buffer cannot safely be written beyond output_width since the - * TurboJPEG API permits it to be allocated with or without padding up to the - * next multiple of ALIGN_SIZE bytes. - */ - -/* - * Upsample and color convert from YCbCr -> RGB for the case of 2:1 horizontal. - */ - -void jsimd_h2v1_merged_upsample_neon(JDIMENSION output_width, - JSAMPIMAGE input_buf, - JDIMENSION in_row_group_ctr, - JSAMPARRAY output_buf) -{ - JSAMPROW outptr; - /* Pointers to Y, Cb and Cr data. */ - JSAMPROW inptr0, inptr1, inptr2; - - int16x8_t neg_128 = vdupq_n_s16(-128); - - inptr0 = input_buf[0][in_row_group_ctr]; - inptr1 = input_buf[1][in_row_group_ctr]; - inptr2 = input_buf[2][in_row_group_ctr]; - outptr = output_buf[0]; - - int cols_remaining = output_width; - for (; cols_remaining >= 16; cols_remaining -= 16) { - /* Load Y-values such that even pixel indices are in one vector and odd */ - /* pixel indices are in another vector. */ - uint8x8x2_t y = vld2_u8(inptr0); - uint8x8_t cb = vld1_u8(inptr1); - uint8x8_t cr = vld1_u8(inptr2); - /* Subtract 128 from Cb and Cr. */ - int16x8_t cr_128 = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(neg_128), cr)); - int16x8_t cb_128 = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(neg_128), cb)); - /* Compute G-Y: - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128) */ - int32x4_t g_sub_y_l = vmull_n_s16(vget_low_s16(cb_128), -F_0_344); - int32x4_t g_sub_y_h = vmull_n_s16(vget_high_s16(cb_128), -F_0_344); - g_sub_y_l = vmlsl_n_s16(g_sub_y_l, vget_low_s16(cr_128), F_0_714); - g_sub_y_h = vmlsl_n_s16(g_sub_y_h, vget_high_s16(cr_128), F_0_714); - /* Descale G components: shift right 15, round and narrow to 16-bit. 
*/ - int16x8_t g_sub_y = vcombine_s16(vrshrn_n_s32(g_sub_y_l, 15), - vrshrn_n_s32(g_sub_y_h, 15)); - /* Compute R-Y: 1.40200 * (Cr - 128) */ - int16x8_t r_sub_y = vqrdmulhq_n_s16(vshlq_n_s16(cr_128, 1), F_1_402); - /* Compute B-Y: 1.77200 * (Cb - 128) */ - int16x8_t b_sub_y = vqrdmulhq_n_s16(vshlq_n_s16(cb_128, 1), F_1_772); - /* Add Y and duplicate chroma components; upsampling horizontally. */ - int16x8_t g_even = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(g_sub_y), y.val[0])); - int16x8_t r_even = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(r_sub_y), y.val[0])); - int16x8_t b_even = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(b_sub_y), y.val[0])); - int16x8_t g_odd = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(g_sub_y), y.val[1])); - int16x8_t r_odd = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(r_sub_y), y.val[1])); - int16x8_t b_odd = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(b_sub_y), y.val[1])); - /* Convert each component to unsigned and narrow, clamping to [0-255]. */ - /* Interleave pixel channel values having odd and even pixel indices. */ - uint8x8x2_t r = vzip_u8(vqmovun_s16(r_even), vqmovun_s16(r_odd)); - uint8x8x2_t g = vzip_u8(vqmovun_s16(g_even), vqmovun_s16(g_odd)); - uint8x8x2_t b = vzip_u8(vqmovun_s16(b_even), vqmovun_s16(b_odd)); - -#ifdef RGB_ALPHA - uint8x16x4_t rgba; - rgba.val[RGB_RED] = vcombine_u8(r.val[0], r.val[1]); - rgba.val[RGB_GREEN] = vcombine_u8(g.val[0], g.val[1]); - rgba.val[RGB_BLUE] = vcombine_u8(b.val[0], b.val[1]); - /* Set alpha channel to opaque (0xFF). */ - rgba.val[RGB_ALPHA] = vdupq_n_u8(0xFF); - /* Store RGBA pixel data to memory. */ - vst4q_u8(outptr, rgba); -#else - uint8x16x3_t rgb; - rgb.val[RGB_RED] = vcombine_u8(r.val[0], r.val[1]); - rgb.val[RGB_GREEN] = vcombine_u8(g.val[0], g.val[1]); - rgb.val[RGB_BLUE] = vcombine_u8(b.val[0], b.val[1]); - /* Store RGB pixel data to memory. */ - vst3q_u8(outptr, rgb); -#endif - - /* Increment pointers. 
*/ - inptr0 += 16; - inptr1 += 8; - inptr2 += 8; - outptr += (RGB_PIXELSIZE * 16); - } - - if (cols_remaining > 0) { - /* Load y-values such that even pixel indices are in one vector and odd */ - /* pixel indices are in another vector. */ - uint8x8x2_t y = vld2_u8(inptr0); - uint8x8_t cb = vld1_u8(inptr1); - uint8x8_t cr = vld1_u8(inptr2); - /* Subtract 128 from Cb and Cr. */ - int16x8_t cr_128 = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(neg_128), cr)); - int16x8_t cb_128 = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(neg_128), cb)); - /* Compute G-Y: - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128) */ - int32x4_t g_sub_y_l = vmull_n_s16(vget_low_s16(cb_128), -F_0_344); - int32x4_t g_sub_y_h = vmull_n_s16(vget_high_s16(cb_128), -F_0_344); - g_sub_y_l = vmlsl_n_s16(g_sub_y_l, vget_low_s16(cr_128), F_0_714); - g_sub_y_h = vmlsl_n_s16(g_sub_y_h, vget_high_s16(cr_128), F_0_714); - /* Descale G components: shift right 15, round and narrow to 16-bit. */ - int16x8_t g_sub_y = vcombine_s16(vrshrn_n_s32(g_sub_y_l, 15), - vrshrn_n_s32(g_sub_y_h, 15)); - /* Compute R-Y: 1.40200 * (Cr - 128) */ - int16x8_t r_sub_y = vqrdmulhq_n_s16(vshlq_n_s16(cr_128, 1), F_1_402); - /* Compute B-Y: 1.77200 * (Cb - 128) */ - int16x8_t b_sub_y = vqrdmulhq_n_s16(vshlq_n_s16(cb_128, 1), F_1_772); - /* Add Y and duplicate chroma components - upsample horizontally. 
*/ - int16x8_t g_even = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(g_sub_y), y.val[0])); - int16x8_t r_even = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(r_sub_y), y.val[0])); - int16x8_t b_even = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(b_sub_y), y.val[0])); - int16x8_t g_odd = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(g_sub_y), y.val[1])); - int16x8_t r_odd = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(r_sub_y), y.val[1])); - int16x8_t b_odd = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(b_sub_y), y.val[1])); - /* Convert each component to unsigned and narrow, clamping to [0-255]. */ - /* Interleave pixel channel values having odd and even pixel indices. */ - uint8x8x2_t r = vzip_u8(vqmovun_s16(r_even), vqmovun_s16(r_odd)); - uint8x8x2_t g = vzip_u8(vqmovun_s16(g_even), vqmovun_s16(g_odd)); - uint8x8x2_t b = vzip_u8(vqmovun_s16(b_even), vqmovun_s16(b_odd)); - -#ifdef RGB_ALPHA - uint8x8x4_t rgba_h; - rgba_h.val[RGB_RED] = r.val[1]; - rgba_h.val[RGB_GREEN] = g.val[1]; - rgba_h.val[RGB_BLUE] = b.val[1]; - /* Set alpha channel to opaque (0xFF). */ - rgba_h.val[RGB_ALPHA] = vdup_n_u8(0xFF); - uint8x8x4_t rgba_l; - rgba_l.val[RGB_RED] = r.val[0]; - rgba_l.val[RGB_GREEN] = g.val[0]; - rgba_l.val[RGB_BLUE] = b.val[0]; - /* Set alpha channel to opaque (0xFF). */ - rgba_l.val[RGB_ALPHA] = vdup_n_u8(0xFF); - /* Store RGBA pixel data to memory. 
*/ - switch (cols_remaining) { - case 15 : - vst4_lane_u8(outptr + 14 * RGB_PIXELSIZE, rgba_h, 6); - case 14 : - vst4_lane_u8(outptr + 13 * RGB_PIXELSIZE, rgba_h, 5); - case 13 : - vst4_lane_u8(outptr + 12 * RGB_PIXELSIZE, rgba_h, 4); - case 12 : - vst4_lane_u8(outptr + 11 * RGB_PIXELSIZE, rgba_h, 3); - case 11 : - vst4_lane_u8(outptr + 10 * RGB_PIXELSIZE, rgba_h, 2); - case 10 : - vst4_lane_u8(outptr + 9 * RGB_PIXELSIZE, rgba_h, 1); - case 9 : - vst4_lane_u8(outptr + 8 * RGB_PIXELSIZE, rgba_h, 0); - case 8 : - vst4_u8(outptr, rgba_l); - break; - case 7 : - vst4_lane_u8(outptr + 6 * RGB_PIXELSIZE, rgba_l, 6); - case 6 : - vst4_lane_u8(outptr + 5 * RGB_PIXELSIZE, rgba_l, 5); - case 5 : - vst4_lane_u8(outptr + 4 * RGB_PIXELSIZE, rgba_l, 4); - case 4 : - vst4_lane_u8(outptr + 3 * RGB_PIXELSIZE, rgba_l, 3); - case 3 : - vst4_lane_u8(outptr + 2 * RGB_PIXELSIZE, rgba_l, 2); - case 2 : - vst4_lane_u8(outptr + RGB_PIXELSIZE, rgba_l, 1); - case 1 : - vst4_lane_u8(outptr, rgba_l, 0); - default : - break; - } -#else - uint8x8x3_t rgb_h; - rgb_h.val[RGB_RED] = r.val[1]; - rgb_h.val[RGB_GREEN] = g.val[1]; - rgb_h.val[RGB_BLUE] = b.val[1]; - uint8x8x3_t rgb_l; - rgb_l.val[RGB_RED] = r.val[0]; - rgb_l.val[RGB_GREEN] = g.val[0]; - rgb_l.val[RGB_BLUE] = b.val[0]; - /* Store RGB pixel data to memory. 
*/ - switch (cols_remaining) { - case 15 : - vst3_lane_u8(outptr + 14 * RGB_PIXELSIZE, rgb_h, 6); - case 14 : - vst3_lane_u8(outptr + 13 * RGB_PIXELSIZE, rgb_h, 5); - case 13 : - vst3_lane_u8(outptr + 12 * RGB_PIXELSIZE, rgb_h, 4); - case 12 : - vst3_lane_u8(outptr + 11 * RGB_PIXELSIZE, rgb_h, 3); - case 11 : - vst3_lane_u8(outptr + 10 * RGB_PIXELSIZE, rgb_h, 2); - case 10 : - vst3_lane_u8(outptr + 9 * RGB_PIXELSIZE, rgb_h, 1); - case 9 : - vst3_lane_u8(outptr + 8 * RGB_PIXELSIZE, rgb_h, 0); - case 8 : - vst3_u8(outptr, rgb_l); - break; - case 7 : - vst3_lane_u8(outptr + 6 * RGB_PIXELSIZE, rgb_l, 6); - case 6 : - vst3_lane_u8(outptr + 5 * RGB_PIXELSIZE, rgb_l, 5); - case 5 : - vst3_lane_u8(outptr + 4 * RGB_PIXELSIZE, rgb_l, 4); - case 4 : - vst3_lane_u8(outptr + 3 * RGB_PIXELSIZE, rgb_l, 3); - case 3 : - vst3_lane_u8(outptr + 2 * RGB_PIXELSIZE, rgb_l, 2); - case 2 : - vst3_lane_u8(outptr + RGB_PIXELSIZE, rgb_l, 1); - case 1 : - vst3_lane_u8(outptr, rgb_l, 0); - default : - break; - } -#endif - } -} - - -/* - * Upsample and color convert from YCbCr -> RGB for the case of 2:1 horizontal - * and 2:1 vertical. - * - * See above for details of color conversion and safe memory buffer access. - */ - -void jsimd_h2v2_merged_upsample_neon(JDIMENSION output_width, - JSAMPIMAGE input_buf, - JDIMENSION in_row_group_ctr, - JSAMPARRAY output_buf) -{ - JSAMPROW outptr0, outptr1; - /* Pointers to Y (both rows), Cb and Cr data. */ - JSAMPROW inptr0_0, inptr0_1, inptr1, inptr2; - - int16x8_t neg_128 = vdupq_n_s16(-128); - - inptr0_0 = input_buf[0][in_row_group_ctr * 2]; - inptr0_1 = input_buf[0][in_row_group_ctr * 2 + 1]; - inptr1 = input_buf[1][in_row_group_ctr]; - inptr2 = input_buf[2][in_row_group_ctr]; - outptr0 = output_buf[0]; - outptr1 = output_buf[1]; - - int cols_remaining = output_width; - for (; cols_remaining >= 16; cols_remaining -= 16) { - /* Load Y-values such that even pixel indices are in one vector and odd */ - /* pixel indices are in another vector. 
*/ - uint8x8x2_t y0 = vld2_u8(inptr0_0); - uint8x8x2_t y1 = vld2_u8(inptr0_1); - uint8x8_t cb = vld1_u8(inptr1); - uint8x8_t cr = vld1_u8(inptr2); - /* Subtract 128 from Cb and Cr. */ - int16x8_t cr_128 = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(neg_128), cr)); - int16x8_t cb_128 = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(neg_128), cb)); - /* Compute G-Y: - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128) */ - int32x4_t g_sub_y_l = vmull_n_s16(vget_low_s16(cb_128), -F_0_344); - int32x4_t g_sub_y_h = vmull_n_s16(vget_high_s16(cb_128), -F_0_344); - g_sub_y_l = vmlsl_n_s16(g_sub_y_l, vget_low_s16(cr_128), F_0_714); - g_sub_y_h = vmlsl_n_s16(g_sub_y_h, vget_high_s16(cr_128), F_0_714); - /* Descale G components: shift right 15, round and narrow to 16-bit. */ - int16x8_t g_sub_y = vcombine_s16(vrshrn_n_s32(g_sub_y_l, 15), - vrshrn_n_s32(g_sub_y_h, 15)); - /* Compute R-Y: 1.40200 * (Cr - 128) */ - int16x8_t r_sub_y = vqrdmulhq_n_s16(vshlq_n_s16(cr_128, 1), F_1_402); - /* Compute B-Y: 1.77200 * (Cb - 128) */ - int16x8_t b_sub_y = vqrdmulhq_n_s16(vshlq_n_s16(cb_128, 1), F_1_772); - /* Add Y and duplicate chroma components - upsample horizontally. 
*/ - int16x8_t g0_even = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(g_sub_y), y0.val[0])); - int16x8_t r0_even = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(r_sub_y), y0.val[0])); - int16x8_t b0_even = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(b_sub_y), y0.val[0])); - int16x8_t g0_odd = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(g_sub_y), y0.val[1])); - int16x8_t r0_odd = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(r_sub_y), y0.val[1])); - int16x8_t b0_odd = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(b_sub_y), y0.val[1])); - int16x8_t g1_even = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(g_sub_y), y1.val[0])); - int16x8_t r1_even = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(r_sub_y), y1.val[0])); - int16x8_t b1_even = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(b_sub_y), y1.val[0])); - int16x8_t g1_odd = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(g_sub_y), y1.val[1])); - int16x8_t r1_odd = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(r_sub_y), y1.val[1])); - int16x8_t b1_odd = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(b_sub_y), y1.val[1])); - /* Convert each component to unsigned and narrow, clamping to [0-255]. */ - /* Interleave pixel channel values having odd and even pixel indices. 
*/ - uint8x8x2_t r0 = vzip_u8(vqmovun_s16(r0_even), vqmovun_s16(r0_odd)); - uint8x8x2_t r1 = vzip_u8(vqmovun_s16(r1_even), vqmovun_s16(r1_odd)); - uint8x8x2_t g0 = vzip_u8(vqmovun_s16(g0_even), vqmovun_s16(g0_odd)); - uint8x8x2_t g1 = vzip_u8(vqmovun_s16(g1_even), vqmovun_s16(g1_odd)); - uint8x8x2_t b0 = vzip_u8(vqmovun_s16(b0_even), vqmovun_s16(b0_odd)); - uint8x8x2_t b1 = vzip_u8(vqmovun_s16(b1_even), vqmovun_s16(b1_odd)); - -#ifdef RGB_ALPHA - uint8x16x4_t rgba0, rgba1; - rgba0.val[RGB_RED] = vcombine_u8(r0.val[0], r0.val[1]); - rgba1.val[RGB_RED] = vcombine_u8(r1.val[0], r1.val[1]); - rgba0.val[RGB_GREEN] = vcombine_u8(g0.val[0], g0.val[1]); - rgba1.val[RGB_GREEN] = vcombine_u8(g1.val[0], g1.val[1]); - rgba0.val[RGB_BLUE] = vcombine_u8(b0.val[0], b0.val[1]); - rgba1.val[RGB_BLUE] = vcombine_u8(b1.val[0], b1.val[1]); - /* Set alpha channel to opaque (0xFF). */ - rgba0.val[RGB_ALPHA] = vdupq_n_u8(0xFF); - rgba1.val[RGB_ALPHA] = vdupq_n_u8(0xFF); - /* Store RGBA pixel data to memory. */ - vst4q_u8(outptr0, rgba0); - vst4q_u8(outptr1, rgba1); -#else - uint8x16x3_t rgb0, rgb1; - rgb0.val[RGB_RED] = vcombine_u8(r0.val[0], r0.val[1]); - rgb1.val[RGB_RED] = vcombine_u8(r1.val[0], r1.val[1]); - rgb0.val[RGB_GREEN] = vcombine_u8(g0.val[0], g0.val[1]); - rgb1.val[RGB_GREEN] = vcombine_u8(g1.val[0], g1.val[1]); - rgb0.val[RGB_BLUE] = vcombine_u8(b0.val[0], b0.val[1]); - rgb1.val[RGB_BLUE] = vcombine_u8(b1.val[0], b1.val[1]); - /* Store RGB pixel data to memory. */ - vst3q_u8(outptr0, rgb0); - vst3q_u8(outptr1, rgb1); -#endif - - /* Increment pointers. */ - inptr0_0 += 16; - inptr0_1 += 16; - inptr1 += 8; - inptr2 += 8; - outptr0 += (RGB_PIXELSIZE * 16); - outptr1 += (RGB_PIXELSIZE * 16); - } - - if (cols_remaining > 0) { - /* Load Y-values such that even pixel indices are in one vector and */ - /* odd pixel indices are in another vector. 
*/ - uint8x8x2_t y0 = vld2_u8(inptr0_0); - uint8x8x2_t y1 = vld2_u8(inptr0_1); - uint8x8_t cb = vld1_u8(inptr1); - uint8x8_t cr = vld1_u8(inptr2); - /* Subtract 128 from Cb and Cr. */ - int16x8_t cr_128 = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(neg_128), cr)); - int16x8_t cb_128 = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(neg_128), cb)); - /* Compute G-Y: - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128) */ - int32x4_t g_sub_y_l = vmull_n_s16(vget_low_s16(cb_128), -F_0_344); - int32x4_t g_sub_y_h = vmull_n_s16(vget_high_s16(cb_128), -F_0_344); - g_sub_y_l = vmlsl_n_s16(g_sub_y_l, vget_low_s16(cr_128), F_0_714); - g_sub_y_h = vmlsl_n_s16(g_sub_y_h, vget_high_s16(cr_128), F_0_714); - /* Descale G components: shift right 15, round and narrow to 16-bit. */ - int16x8_t g_sub_y = vcombine_s16(vrshrn_n_s32(g_sub_y_l, 15), - vrshrn_n_s32(g_sub_y_h, 15)); - /* Compute R-Y: 1.40200 * (Cr - 128) */ - int16x8_t r_sub_y = vqrdmulhq_n_s16(vshlq_n_s16(cr_128, 1), F_1_402); - /* Compute B-Y: 1.77200 * (Cb - 128) */ - int16x8_t b_sub_y = vqrdmulhq_n_s16(vshlq_n_s16(cb_128, 1), F_1_772); - /* Add Y and duplicate chroma components - upsample horizontally. 
*/ - int16x8_t g0_even = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(g_sub_y), y0.val[0])); - int16x8_t r0_even = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(r_sub_y), y0.val[0])); - int16x8_t b0_even = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(b_sub_y), y0.val[0])); - int16x8_t g0_odd = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(g_sub_y), y0.val[1])); - int16x8_t r0_odd = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(r_sub_y), y0.val[1])); - int16x8_t b0_odd = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(b_sub_y), y0.val[1])); - int16x8_t g1_even = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(g_sub_y), y1.val[0])); - int16x8_t r1_even = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(r_sub_y), y1.val[0])); - int16x8_t b1_even = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(b_sub_y), y1.val[0])); - int16x8_t g1_odd = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(g_sub_y), y1.val[1])); - int16x8_t r1_odd = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(r_sub_y), y1.val[1])); - int16x8_t b1_odd = vreinterpretq_s16_u16( - vaddw_u8(vreinterpretq_u16_s16(b_sub_y), y1.val[1])); - /* Convert each component to unsigned and narrow, clamping to [0-255]. */ - /* Interleave pixel channel values having odd and even pixel indices. 
*/ - uint8x8x2_t r0 = vzip_u8(vqmovun_s16(r0_even), vqmovun_s16(r0_odd)); - uint8x8x2_t r1 = vzip_u8(vqmovun_s16(r1_even), vqmovun_s16(r1_odd)); - uint8x8x2_t g0 = vzip_u8(vqmovun_s16(g0_even), vqmovun_s16(g0_odd)); - uint8x8x2_t g1 = vzip_u8(vqmovun_s16(g1_even), vqmovun_s16(g1_odd)); - uint8x8x2_t b0 = vzip_u8(vqmovun_s16(b0_even), vqmovun_s16(b0_odd)); - uint8x8x2_t b1 = vzip_u8(vqmovun_s16(b1_even), vqmovun_s16(b1_odd)); - -#ifdef RGB_ALPHA - uint8x8x4_t rgba0_h, rgba1_h; - rgba0_h.val[RGB_RED] = r0.val[1]; - rgba1_h.val[RGB_RED] = r1.val[1]; - rgba0_h.val[RGB_GREEN] = g0.val[1]; - rgba1_h.val[RGB_GREEN] = g1.val[1]; - rgba0_h.val[RGB_BLUE] = b0.val[1]; - rgba1_h.val[RGB_BLUE] = b1.val[1]; - /* Set alpha channel to opaque (0xFF). */ - rgba0_h.val[RGB_ALPHA] = vdup_n_u8(0xFF); - rgba1_h.val[RGB_ALPHA] = vdup_n_u8(0xFF); - - uint8x8x4_t rgba0_l, rgba1_l; - rgba0_l.val[RGB_RED] = r0.val[0]; - rgba1_l.val[RGB_RED] = r1.val[0]; - rgba0_l.val[RGB_GREEN] = g0.val[0]; - rgba1_l.val[RGB_GREEN] = g1.val[0]; - rgba0_l.val[RGB_BLUE] = b0.val[0]; - rgba1_l.val[RGB_BLUE] = b1.val[0]; - /* Set alpha channel to opaque (0xFF). */ - rgba0_l.val[RGB_ALPHA] = vdup_n_u8(0xFF); - rgba1_l.val[RGB_ALPHA] = vdup_n_u8(0xFF); - /* Store RGBA pixel data to memory. 
*/ - switch (cols_remaining) { - case 15 : - vst4_lane_u8(outptr0 + 14 * RGB_PIXELSIZE, rgba0_h, 6); - vst4_lane_u8(outptr1 + 14 * RGB_PIXELSIZE, rgba1_h, 6); - case 14 : - vst4_lane_u8(outptr0 + 13 * RGB_PIXELSIZE, rgba0_h, 5); - vst4_lane_u8(outptr1 + 13 * RGB_PIXELSIZE, rgba1_h, 5); - case 13 : - vst4_lane_u8(outptr0 + 12 * RGB_PIXELSIZE, rgba0_h, 4); - vst4_lane_u8(outptr1 + 12 * RGB_PIXELSIZE, rgba1_h, 4); - case 12 : - vst4_lane_u8(outptr0 + 11 * RGB_PIXELSIZE, rgba0_h, 3); - vst4_lane_u8(outptr1 + 11 * RGB_PIXELSIZE, rgba1_h, 3); - case 11 : - vst4_lane_u8(outptr0 + 10 * RGB_PIXELSIZE, rgba0_h, 2); - vst4_lane_u8(outptr1 + 10 * RGB_PIXELSIZE, rgba1_h, 2); - case 10 : - vst4_lane_u8(outptr0 + 9 * RGB_PIXELSIZE, rgba0_h, 1); - vst4_lane_u8(outptr1 + 9 * RGB_PIXELSIZE, rgba1_h, 1); - case 9 : - vst4_lane_u8(outptr0 + 8 * RGB_PIXELSIZE, rgba0_h, 0); - vst4_lane_u8(outptr1 + 8 * RGB_PIXELSIZE, rgba1_h, 0); - case 8 : - vst4_u8(outptr0, rgba0_l); - vst4_u8(outptr1, rgba1_l); - break; - case 7 : - vst4_lane_u8(outptr0 + 6 * RGB_PIXELSIZE, rgba0_l, 6); - vst4_lane_u8(outptr1 + 6 * RGB_PIXELSIZE, rgba1_l, 6); - case 6 : - vst4_lane_u8(outptr0 + 5 * RGB_PIXELSIZE, rgba0_l, 5); - vst4_lane_u8(outptr1 + 5 * RGB_PIXELSIZE, rgba1_l, 5); - case 5 : - vst4_lane_u8(outptr0 + 4 * RGB_PIXELSIZE, rgba0_l, 4); - vst4_lane_u8(outptr1 + 4 * RGB_PIXELSIZE, rgba1_l, 4); - case 4 : - vst4_lane_u8(outptr0 + 3 * RGB_PIXELSIZE, rgba0_l, 3); - vst4_lane_u8(outptr1 + 3 * RGB_PIXELSIZE, rgba1_l, 3); - case 3 : - vst4_lane_u8(outptr0 + 2 * RGB_PIXELSIZE, rgba0_l, 2); - vst4_lane_u8(outptr1 + 2 * RGB_PIXELSIZE, rgba1_l, 2); - case 2 : - vst4_lane_u8(outptr0 + 1 * RGB_PIXELSIZE, rgba0_l, 1); - vst4_lane_u8(outptr1 + 1 * RGB_PIXELSIZE, rgba1_l, 1); - case 1 : - vst4_lane_u8(outptr0, rgba0_l, 0); - vst4_lane_u8(outptr1, rgba1_l, 0); - default : - break; - } -#else - uint8x8x3_t rgb0_h, rgb1_h; - rgb0_h.val[RGB_RED] = r0.val[1]; - rgb1_h.val[RGB_RED] = r1.val[1]; - rgb0_h.val[RGB_GREEN] = 
g0.val[1]; - rgb1_h.val[RGB_GREEN] = g1.val[1]; - rgb0_h.val[RGB_BLUE] = b0.val[1]; - rgb1_h.val[RGB_BLUE] = b1.val[1]; - - uint8x8x3_t rgb0_l, rgb1_l; - rgb0_l.val[RGB_RED] = r0.val[0]; - rgb1_l.val[RGB_RED] = r1.val[0]; - rgb0_l.val[RGB_GREEN] = g0.val[0]; - rgb1_l.val[RGB_GREEN] = g1.val[0]; - rgb0_l.val[RGB_BLUE] = b0.val[0]; - rgb1_l.val[RGB_BLUE] = b1.val[0]; - /* Store RGB pixel data to memory. */ - switch (cols_remaining) { - case 15 : - vst3_lane_u8(outptr0 + 14 * RGB_PIXELSIZE, rgb0_h, 6); - vst3_lane_u8(outptr1 + 14 * RGB_PIXELSIZE, rgb1_h, 6); - case 14 : - vst3_lane_u8(outptr0 + 13 * RGB_PIXELSIZE, rgb0_h, 5); - vst3_lane_u8(outptr1 + 13 * RGB_PIXELSIZE, rgb1_h, 5); - case 13 : - vst3_lane_u8(outptr0 + 12 * RGB_PIXELSIZE, rgb0_h, 4); - vst3_lane_u8(outptr1 + 12 * RGB_PIXELSIZE, rgb1_h, 4); - case 12 : - vst3_lane_u8(outptr0 + 11 * RGB_PIXELSIZE, rgb0_h, 3); - vst3_lane_u8(outptr1 + 11 * RGB_PIXELSIZE, rgb1_h, 3); - case 11 : - vst3_lane_u8(outptr0 + 10 * RGB_PIXELSIZE, rgb0_h, 2); - vst3_lane_u8(outptr1 + 10 * RGB_PIXELSIZE, rgb1_h, 2); - case 10 : - vst3_lane_u8(outptr0 + 9 * RGB_PIXELSIZE, rgb0_h, 1); - vst3_lane_u8(outptr1 + 9 * RGB_PIXELSIZE, rgb1_h, 1); - case 9 : - vst3_lane_u8(outptr0 + 8 * RGB_PIXELSIZE, rgb0_h, 0); - vst3_lane_u8(outptr1 + 8 * RGB_PIXELSIZE, rgb1_h, 0); - case 8 : - vst3_u8(outptr0, rgb0_l); - vst3_u8(outptr1, rgb1_l); - break; - case 7 : - vst3_lane_u8(outptr0 + 6 * RGB_PIXELSIZE, rgb0_l, 6); - vst3_lane_u8(outptr1 + 6 * RGB_PIXELSIZE, rgb1_l, 6); - case 6 : - vst3_lane_u8(outptr0 + 5 * RGB_PIXELSIZE, rgb0_l, 5); - vst3_lane_u8(outptr1 + 5 * RGB_PIXELSIZE, rgb1_l, 5); - case 5 : - vst3_lane_u8(outptr0 + 4 * RGB_PIXELSIZE, rgb0_l, 4); - vst3_lane_u8(outptr1 + 4 * RGB_PIXELSIZE, rgb1_l, 4); - case 4 : - vst3_lane_u8(outptr0 + 3 * RGB_PIXELSIZE, rgb0_l, 3); - vst3_lane_u8(outptr1 + 3 * RGB_PIXELSIZE, rgb1_l, 3); - case 3 : - vst3_lane_u8(outptr0 + 2 * RGB_PIXELSIZE, rgb0_l, 2); - vst3_lane_u8(outptr1 + 2 * RGB_PIXELSIZE, rgb1_l, 
2); - case 2 : - vst3_lane_u8(outptr0 + 1 * RGB_PIXELSIZE, rgb0_l, 1); - vst3_lane_u8(outptr1 + 1 * RGB_PIXELSIZE, rgb1_l, 1); - case 1 : - vst3_lane_u8(outptr0, rgb0_l, 0); - vst3_lane_u8(outptr1, rgb1_l, 0); - default : - break; - } -#endif - } -} diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jdsample-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jdsample-neon.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jdsample-neon.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jdsample-neon.c 1970-01-01 01:00:00.000000000 +0100 @@ -1,557 +0,0 @@ -/* - * jdsample-neon.c - upsampling (Arm NEON) - * - * Copyright 2019 The Chromium Authors. All Rights Reserved. - * - * This software is provided 'as-is', without any express or implied - * warranty. In no event will the authors be held liable for any damages - * arising from the use of this software. - * - * Permission is granted to anyone to use this software for any purpose, - * including commercial applications, and to alter it and redistribute it - * freely, subject to the following restrictions: - * - * 1. The origin of this software must not be misrepresented; you must not - * claim that you wrote the original software. If you use this software - * in a product, an acknowledgment in the product documentation would be - * appreciated but is not required. - * 2. Altered source versions must be plainly marked as such, and must not be - * misrepresented as being the original software. - * 3. This notice may not be removed or altered from any source distribution. - */ - -#define JPEG_INTERNALS -#include "../../../jinclude.h" -#include "../../../jpeglib.h" -#include "../../../jsimd.h" -#include "../../../jdct.h" -#include "../../../jsimddct.h" -#include "../../jsimd.h" - -#include - -/* - * The diagram below shows a row of samples (luma or chroma) produced by h2v1 - * downsampling. 
- * - * s0 s1 s2 - * +---------+---------+---------+ - * | | | | - * | p0 p1 | p2 p3 | p4 p5 | - * | | | | - * +---------+---------+---------+ - * - * Each sample contains two of the original pixel channel values. These pixel - * channel values are centred at positions p0, p1, p2, p3, p4 and p5 above. To - * compute the channel values of the original image, we proportionally blend - * the adjacent samples in each row. - * - * There are three cases to consider: - * - * 1) The first pixel in the original image. - * Pixel channel value p0 contains only a component from sample s0, so we - * set p0 = s0. - * 2) The last pixel in the original image. - * Pixel channel value p5 contains only a component from sample s2, so we - * set p5 = s2. - * 3) General case (all other pixels in the row). - * Apart from the first and last pixels, every other pixel channel value is - * computed by blending the containing sample and the nearest neigbouring - * sample in the ratio 3:1. - * For example, the pixel channel value centred at p1 would be computed as - * follows: - * 3/4 * s0 + 1/4 * s1 - * while the pixel channel value centred at p2 would be: - * 3/4 * s1 + 1/4 * s0 - */ - -void jsimd_h2v1_fancy_upsample_neon(int max_v_samp_factor, - JDIMENSION downsampled_width, - JSAMPARRAY input_data, - JSAMPARRAY *output_data_ptr) -{ - JSAMPARRAY output_data = *output_data_ptr; - JSAMPROW inptr, outptr; - /* Setup constants. */ - const uint16x8_t one_u16 = vdupq_n_u16(1); - const uint8x8_t three_u8 = vdup_n_u8(3); - - for (int inrow = 0; inrow < max_v_samp_factor; inrow++) { - inptr = input_data[inrow]; - outptr = output_data[inrow]; - /* Case 1: first pixel channel value in this row of the original image. */ - *outptr = (JSAMPLE)GETJSAMPLE(*inptr); - - /* General case: */ - /* 3/4 * containing sample + 1/4 * nearest neighbouring sample */ - /* For p1: containing sample = s0, nearest neighbouring sample = s1. */ - /* For p2: containing sample = s1, nearest neighbouring sample = s0. 
*/ - uint8x16_t s0 = vld1q_u8(inptr); - uint8x16_t s1 = vld1q_u8(inptr + 1); - /* Multiplication makes vectors twice as wide: '_l' and '_h' suffixes */ - /* denote low half and high half respectively. */ - uint16x8_t s1_add_3s0_l = vmlal_u8(vmovl_u8(vget_low_u8(s1)), - vget_low_u8(s0), three_u8); - uint16x8_t s1_add_3s0_h = vmlal_u8(vmovl_u8(vget_high_u8(s1)), - vget_high_u8(s0), three_u8); - uint16x8_t s0_add_3s1_l = vmlal_u8(vmovl_u8(vget_low_u8(s0)), - vget_low_u8(s1), three_u8); - uint16x8_t s0_add_3s1_h = vmlal_u8(vmovl_u8(vget_high_u8(s0)), - vget_high_u8(s1), three_u8); - /* Add ordered dithering bias to odd pixel values. */ - s0_add_3s1_l = vaddq_u16(s0_add_3s1_l, one_u16); - s0_add_3s1_h = vaddq_u16(s0_add_3s1_h, one_u16); - - /* Initially 1 - due to having already stored the first pixel of the */ - /* image. However, in subsequent iterations of the SIMD loop this offset */ - /* is (2 * colctr - 1) to stay within the bounds of the sample buffers */ - /* without having to resort to a slow scalar tail case for the last */ - /* (downsampled_width % 16) samples. See "Creation of 2-D sample arrays" */ - /* in jmemmgr.c for details. */ - unsigned outptr_offset = 1; - uint8x16x2_t output_pixels; - -#if defined(__aarch64__) && defined(__clang__) && !defined(__OPTIMIZE_SIZE__) - /* Unrolling by four is beneficial on AArch64 as there are 16 additional */ - /* 128-bit SIMD registers to accommodate the extra data in flight. */ - #pragma clang loop unroll_count(4) -#endif - /* We use software pipelining to maximise performance. The code indented */ - /* an extra 6 spaces begins the next iteration of the loop. */ - for (unsigned colctr = 16; colctr < downsampled_width; colctr += 16) { - s0 = vld1q_u8(inptr + colctr - 1); - s1 = vld1q_u8(inptr + colctr); - /* Right-shift by 2 (divide by 4), narrow to 8-bit and combine. 
*/ - output_pixels.val[0] = vcombine_u8(vrshrn_n_u16(s1_add_3s0_l, 2), - vrshrn_n_u16(s1_add_3s0_h, 2)); - output_pixels.val[1] = vcombine_u8(vshrn_n_u16(s0_add_3s1_l, 2), - vshrn_n_u16(s0_add_3s1_h, 2)); - /* Multiplication makes vectors twice as wide: '_l' and '_h' */ - /* suffixes denote low half and high half respectively. */ - s1_add_3s0_l = vmlal_u8(vmovl_u8(vget_low_u8(s1)), - vget_low_u8(s0), three_u8); - s1_add_3s0_h = vmlal_u8(vmovl_u8(vget_high_u8(s1)), - vget_high_u8(s0), three_u8); - s0_add_3s1_l = vmlal_u8(vmovl_u8(vget_low_u8(s0)), - vget_low_u8(s1), three_u8); - s0_add_3s1_h = vmlal_u8(vmovl_u8(vget_high_u8(s0)), - vget_high_u8(s1), three_u8); - /* Add ordered dithering bias to odd pixel values. */ - s0_add_3s1_l = vaddq_u16(s0_add_3s1_l, one_u16); - s0_add_3s1_h = vaddq_u16(s0_add_3s1_h, one_u16); - /* Store pixel channel values to memory. */ - vst2q_u8(outptr + outptr_offset, output_pixels); - outptr_offset = 2 * colctr - 1; - } - - /* Complete the last iteration of the loop. */ - /* Right-shift by 2 (divide by 4), narrow to 8-bit and combine. */ - output_pixels.val[0] = vcombine_u8(vrshrn_n_u16(s1_add_3s0_l, 2), - vrshrn_n_u16(s1_add_3s0_h, 2)); - output_pixels.val[1] = vcombine_u8(vshrn_n_u16(s0_add_3s1_l, 2), - vshrn_n_u16(s0_add_3s1_h, 2)); - /* Store pixel channel values to memory. */ - vst2q_u8(outptr + outptr_offset, output_pixels); - - /* Case 2: last pixel channel value in this row of the original image. */ - outptr[2 * downsampled_width - 1] = - GETJSAMPLE(inptr[downsampled_width - 1]); - } -} - - -/* - * The diagram below shows a grid-window of samples (luma or chroma) produced - * by h2v2 downsampling. 
- *
- *            s0        s1
- *      +---------+---------+
- *      | p0   p1 | p2   p3 |
- *   r0 |         |         |
- *      | p4   p5 | p6   p7 |
- *      +---------+---------+
- *      | p8   p9 | p10  p11|
- *   r1 |         |         |
- *      | p12  p13| p14  p15|
- *      +---------+---------+
- *      | p16  p17| p18  p19|
- *   r2 |         |         |
- *      | p20  p21| p22  p23|
- *      +---------+---------+
- *
- * Every sample contains four of the original pixel channel values. The pixels'
- * channel values are centred at positions p0, p1, p2,..., p23 above. For a
- * given grid-window position, r1 is always used to denote the row of samples
- * containing the pixel channel values we are computing. For the top row of
- * pixel channel values in r1 (p8-p11), the nearest neighbouring samples are in
- * the row above - denoted by r0. Likewise, for the bottom row of pixels in r1
- * (p12-p15), the nearest neighbouring samples are in the row below - denoted
- * by r2.
- *
- * To compute the pixel channel values of the original image, we proportionally
- * blend the sample containing the pixel centre with the nearest neighbouring
- * samples in each row, column and diagonal.
- *
- * There are three cases to consider:
- *
- * 1) The first pixel in this row of the original image.
- *    Pixel channel value p8 only contains components from sample column s0.
- *    Its value is computed by blending samples s0r1 and s0r0 in the ratio 3:1.
- * 2) The last pixel in this row of the original image.
- *    Pixel channel value p11 only contains components from sample column s1.
- *    Its value is computed by blending samples s1r1 and s1r0 in the ratio 3:1.
- * 3) General case (all other pixels in the row).
- *    Apart from the first and last pixels, every other pixel channel value in
- *    the row contains components from samples in adjacent columns.
- *
- * For example, the pixel centred at p9 would be computed as follows:
- *     (9/16 * s0r1) + (3/16 * s0r0) + (3/16 * s1r1) + (1/16 * s1r0)
- *
- * This can be broken down into two steps:
- * 1) Blend samples vertically in columns s0 and s1 in the ratio 3:1:
- *     s0colsum = 3/4 * s0r1 + 1/4 * s0r0
- *     s1colsum = 3/4 * s1r1 + 1/4 * s1r0
- * 2) Blend the already-blended columns in the ratio 3:1:
- *     p9 = 3/4 * s0colsum + 1/4 * s1colsum
- *
- * The bottom row of pixel channel values in row r1 can be computed in the same
- * way for each of the three cases, only using samples in row r2 instead of row
- * r0 - as r2 is the nearest neighbouring row.
- */
-
-void jsimd_h2v2_fancy_upsample_neon(int max_v_samp_factor,
-                                    JDIMENSION downsampled_width,
-                                    JSAMPARRAY input_data,
-                                    JSAMPARRAY *output_data_ptr)
-{
-  JSAMPARRAY output_data = *output_data_ptr;
-  JSAMPROW inptr0, inptr1, inptr2, outptr0, outptr1;
-  int inrow, outrow;
-  /* Setup constants. */
-  const uint16x8_t seven_u16 = vdupq_n_u16(7);
-  const uint8x8_t three_u8 = vdup_n_u8(3);
-  const uint16x8_t three_u16 = vdupq_n_u16(3);
-
-  inrow = outrow = 0;
-  while (outrow < max_v_samp_factor) {
-    inptr0 = input_data[inrow - 1];
-    inptr1 = input_data[inrow];
-    inptr2 = input_data[inrow + 1];
-    /* Suffixes 0 and 1 denote the top and bottom rows of output pixels */
-    /* respectively. */
-    outptr0 = output_data[outrow++];
-    outptr1 = output_data[outrow++];
-
-    /* Case 1: first pixel channel value in this row of original image. */
-    int s0colsum0 = GETJSAMPLE(*inptr1) * 3 + GETJSAMPLE(*inptr0);
-    *outptr0 = (JSAMPLE)((s0colsum0 * 4 + 8) >> 4);
-    int s0colsum1 = GETJSAMPLE(*inptr1) * 3 + GETJSAMPLE(*inptr2);
-    *outptr1 = (JSAMPLE)((s0colsum1 * 4 + 8) >> 4);
-
-    /* General case as described above. */
-    /* Step 1: Blend samples vertically in columns s0 and s1. */
-    /* Leave the divide by 4 to the end when it can be done for both */
-    /* dimensions at once, right-shifting by 4. */
-
-    /* Load and compute s0colsum0 and s0colsum1. */
-    uint8x16_t s0r0 = vld1q_u8(inptr0);
-    uint8x16_t s0r1 = vld1q_u8(inptr1);
-    uint8x16_t s0r2 = vld1q_u8(inptr2);
-    /* Multiplication makes vectors twice as wide: '_l' and '_h' suffixes */
-    /* denote low half and high half respectively. */
-    uint16x8_t s0colsum0_l = vmlal_u8(vmovl_u8(vget_low_u8(s0r0)),
-                                      vget_low_u8(s0r1), three_u8);
-    uint16x8_t s0colsum0_h = vmlal_u8(vmovl_u8(vget_high_u8(s0r0)),
-                                      vget_high_u8(s0r1), three_u8);
-    uint16x8_t s0colsum1_l = vmlal_u8(vmovl_u8(vget_low_u8(s0r2)),
-                                      vget_low_u8(s0r1), three_u8);
-    uint16x8_t s0colsum1_h = vmlal_u8(vmovl_u8(vget_high_u8(s0r2)),
-                                      vget_high_u8(s0r1), three_u8);
-    /* Load and compute s1colsum0 and s1colsum1. */
-    uint8x16_t s1r0 = vld1q_u8(inptr0 + 1);
-    uint8x16_t s1r1 = vld1q_u8(inptr1 + 1);
-    uint8x16_t s1r2 = vld1q_u8(inptr2 + 1);
-    uint16x8_t s1colsum0_l = vmlal_u8(vmovl_u8(vget_low_u8(s1r0)),
-                                      vget_low_u8(s1r1), three_u8);
-    uint16x8_t s1colsum0_h = vmlal_u8(vmovl_u8(vget_high_u8(s1r0)),
-                                      vget_high_u8(s1r1), three_u8);
-    uint16x8_t s1colsum1_l = vmlal_u8(vmovl_u8(vget_low_u8(s1r2)),
-                                      vget_low_u8(s1r1), three_u8);
-    uint16x8_t s1colsum1_h = vmlal_u8(vmovl_u8(vget_high_u8(s1r2)),
-                                      vget_high_u8(s1r1), three_u8);
-    /* Step 2: Blend the already-blended columns. */
-    uint16x8_t output0_p1_l = vmlaq_u16(s1colsum0_l, s0colsum0_l, three_u16);
-    uint16x8_t output0_p1_h = vmlaq_u16(s1colsum0_h, s0colsum0_h, three_u16);
-    uint16x8_t output0_p2_l = vmlaq_u16(s0colsum0_l, s1colsum0_l, three_u16);
-    uint16x8_t output0_p2_h = vmlaq_u16(s0colsum0_h, s1colsum0_h, three_u16);
-    uint16x8_t output1_p1_l = vmlaq_u16(s1colsum1_l, s0colsum1_l, three_u16);
-    uint16x8_t output1_p1_h = vmlaq_u16(s1colsum1_h, s0colsum1_h, three_u16);
-    uint16x8_t output1_p2_l = vmlaq_u16(s0colsum1_l, s1colsum1_l, three_u16);
-    uint16x8_t output1_p2_h = vmlaq_u16(s0colsum1_h, s1colsum1_h, three_u16);
-    /* Add ordered dithering bias to odd pixel values. */
-    output0_p1_l = vaddq_u16(output0_p1_l, seven_u16);
-    output0_p1_h = vaddq_u16(output0_p1_h, seven_u16);
-    output1_p1_l = vaddq_u16(output1_p1_l, seven_u16);
-    output1_p1_h = vaddq_u16(output1_p1_h, seven_u16);
-    /* Right-shift by 4 (divide by 16), narrow to 8-bit and combine. */
-    uint8x16x2_t output_pixels0 = { vcombine_u8(vshrn_n_u16(output0_p1_l, 4),
-                                                vshrn_n_u16(output0_p1_h, 4)),
-                                    vcombine_u8(vrshrn_n_u16(output0_p2_l, 4),
-                                                vrshrn_n_u16(output0_p2_h, 4))
-                                  };
-    uint8x16x2_t output_pixels1 = { vcombine_u8(vshrn_n_u16(output1_p1_l, 4),
-                                                vshrn_n_u16(output1_p1_h, 4)),
-                                    vcombine_u8(vrshrn_n_u16(output1_p2_l, 4),
-                                                vrshrn_n_u16(output1_p2_h, 4))
-                                  };
-    /* Store pixel channel values to memory. */
-    /* The minimum size of the output buffer for each row is 64 bytes => no */
-    /* need to worry about buffer overflow here. See "Creation of 2-D sample */
-    /* arrays" in jmemmgr.c for details. */
-    vst2q_u8(outptr0 + 1, output_pixels0);
-    vst2q_u8(outptr1 + 1, output_pixels1);
-
-    /* The first pixel of the image shifted our loads and stores by one */
-    /* byte. We have to re-align on a 32-byte boundary at some point before */
-    /* the end of the row (we do it now on the 32/33 pixel boundary) to stay */
-    /* within the bounds of the sample buffers without having to resort to a */
-    /* slow scalar tail case for the last (downsampled_width % 16) samples. */
-    /* See "Creation of 2-D sample arrays" in jmemmgr.c for details.*/
-    for (unsigned colctr = 16; colctr < downsampled_width; colctr += 16) {
-      /* Step 1: Blend samples vertically in columns s0 and s1. */
-      /* Load and compute s0colsum0 and s0colsum1. */
-      s0r0 = vld1q_u8(inptr0 + colctr - 1);
-      s0r1 = vld1q_u8(inptr1 + colctr - 1);
-      s0r2 = vld1q_u8(inptr2 + colctr - 1);
-      s0colsum0_l = vmlal_u8(vmovl_u8(vget_low_u8(s0r0)),
-                             vget_low_u8(s0r1), three_u8);
-      s0colsum0_h = vmlal_u8(vmovl_u8(vget_high_u8(s0r0)),
-                             vget_high_u8(s0r1), three_u8);
-      s0colsum1_l = vmlal_u8(vmovl_u8(vget_low_u8(s0r2)),
-                             vget_low_u8(s0r1), three_u8);
-      s0colsum1_h = vmlal_u8(vmovl_u8(vget_high_u8(s0r2)),
-                             vget_high_u8(s0r1), three_u8);
-      /* Load and compute s1colsum0 and s1colsum1. */
-      s1r0 = vld1q_u8(inptr0 + colctr);
-      s1r1 = vld1q_u8(inptr1 + colctr);
-      s1r2 = vld1q_u8(inptr2 + colctr);
-      s1colsum0_l = vmlal_u8(vmovl_u8(vget_low_u8(s1r0)),
-                             vget_low_u8(s1r1), three_u8);
-      s1colsum0_h = vmlal_u8(vmovl_u8(vget_high_u8(s1r0)),
-                             vget_high_u8(s1r1), three_u8);
-      s1colsum1_l = vmlal_u8(vmovl_u8(vget_low_u8(s1r2)),
-                             vget_low_u8(s1r1), three_u8);
-      s1colsum1_h = vmlal_u8(vmovl_u8(vget_high_u8(s1r2)),
-                             vget_high_u8(s1r1), three_u8);
-      /* Step 2: Blend the already-blended columns. */
-      output0_p1_l = vmlaq_u16(s1colsum0_l, s0colsum0_l, three_u16);
-      output0_p1_h = vmlaq_u16(s1colsum0_h, s0colsum0_h, three_u16);
-      output0_p2_l = vmlaq_u16(s0colsum0_l, s1colsum0_l, three_u16);
-      output0_p2_h = vmlaq_u16(s0colsum0_h, s1colsum0_h, three_u16);
-      output1_p1_l = vmlaq_u16(s1colsum1_l, s0colsum1_l, three_u16);
-      output1_p1_h = vmlaq_u16(s1colsum1_h, s0colsum1_h, three_u16);
-      output1_p2_l = vmlaq_u16(s0colsum1_l, s1colsum1_l, three_u16);
-      output1_p2_h = vmlaq_u16(s0colsum1_h, s1colsum1_h, three_u16);
-      /* Add ordered dithering bias to odd pixel values. */
-      output0_p1_l = vaddq_u16(output0_p1_l, seven_u16);
-      output0_p1_h = vaddq_u16(output0_p1_h, seven_u16);
-      output1_p1_l = vaddq_u16(output1_p1_l, seven_u16);
-      output1_p1_h = vaddq_u16(output1_p1_h, seven_u16);
-      /* Right-shift by 4 (divide by 16), narrow to 8-bit and combine. */
-      output_pixels0.val[0] = vcombine_u8(vshrn_n_u16(output0_p1_l, 4),
-                                          vshrn_n_u16(output0_p1_h, 4));
-      output_pixels0.val[1] = vcombine_u8(vrshrn_n_u16(output0_p2_l, 4),
-                                          vrshrn_n_u16(output0_p2_h, 4));
-      output_pixels1.val[0] = vcombine_u8(vshrn_n_u16(output1_p1_l, 4),
-                                          vshrn_n_u16(output1_p1_h, 4));
-      output_pixels1.val[1] = vcombine_u8(vrshrn_n_u16(output1_p2_l, 4),
-                                          vrshrn_n_u16(output1_p2_h, 4));
-      /* Store pixel channel values to memory. */
-      vst2q_u8(outptr0 + 2 * colctr - 1, output_pixels0);
-      vst2q_u8(outptr1 + 2 * colctr - 1, output_pixels1);
-    }
-
-    /* Case 2: last pixel channel value in this row of the original image. */
-    int s1colsum0 = GETJSAMPLE(inptr1[downsampled_width - 1]) * 3 +
-                    GETJSAMPLE(inptr0[downsampled_width - 1]);
-    outptr0[2 * downsampled_width - 1] = (JSAMPLE)((s1colsum0 * 4 + 7) >> 4);
-    int s1colsum1 = GETJSAMPLE(inptr1[downsampled_width - 1]) * 3 +
-                    GETJSAMPLE(inptr2[downsampled_width - 1]);
-    outptr1[2 * downsampled_width - 1] = (JSAMPLE)((s1colsum1 * 4 + 7) >> 4);
-    inrow++;
-  }
-}
-
-
-/*
- * The diagram below shows a grid-window of samples (luma or chroma) produced
- * by h2v1 downsampling; which has been subsequently rotated 90 degrees. (The
- * usual use of h1v2 upsampling is upsampling rotated or transposed h2v1
- * downsampled images.)
- *
- *            s0        s1
- *      +---------+---------+
- *      |   p0    |   p1    |
- *   r0 |         |         |
- *      |   p2    |   p3    |
- *      +---------+---------+
- *      |   p4    |   p5    |
- *   r1 |         |         |
- *      |   p6    |   p7    |
- *      +---------+---------+
- *      |   p8    |   p9    |
- *   r2 |         |         |
- *      |   p10   |   p11   |
- *      +---------+---------+
- *
- * Every sample contains two of the original pixel channel values. The pixels'
- * channel values are centred at positions p0, p1, p2,..., p11 above. For a
- * given grid-window position, r1 is always used to denote the row of samples
- * containing the pixel channel values we are computing. For the top row of
- * pixel channel values in r1 (p4 and p5), the nearest neighbouring samples are
- * in the row above - denoted by r0. Likewise, for the bottom row of pixels in
- * r1 (p6 and p7), the nearest neighbouring samples are in the row below -
- * denoted by r2.
- *
- * To compute the pixel channel values of the original image, we proportionally
- * blend the adjacent samples in each column.
- *
- * For example, the pixel channel value centred at p4 would be computed as
- * follows:
- *     3/4 * s0r1 + 1/4 * s0r0
- * while the pixel channel value centred at p6 would be:
- *     3/4 * s0r1 + 1/4 * s0r2
- */
-
-void jsimd_h1v2_fancy_upsample_neon(int max_v_samp_factor,
-                                    JDIMENSION downsampled_width,
-                                    JSAMPARRAY input_data,
-                                    JSAMPARRAY *output_data_ptr)
-{
-  JSAMPARRAY output_data = *output_data_ptr;
-  JSAMPROW inptr0, inptr1, inptr2, outptr0, outptr1;
-  int inrow, outrow;
-  /* Setup constants. */
-  const uint16x8_t one_u16 = vdupq_n_u16(1);
-  const uint8x8_t three_u8 = vdup_n_u8(3);
-
-  inrow = outrow = 0;
-  while (outrow < max_v_samp_factor) {
-    inptr0 = input_data[inrow - 1];
-    inptr1 = input_data[inrow];
-    inptr2 = input_data[inrow + 1];
-    /* Suffixes 0 and 1 denote the top and bottom rows of output pixels */
-    /* respectively. */
-    outptr0 = output_data[outrow++];
-    outptr1 = output_data[outrow++];
-    inrow++;
-
-    /* The size of the input and output buffers is always a multiple of 32 */
-    /* bytes => no need to worry about buffer overflow when reading/writing */
-    /* memory. See "Creation of 2-D sample arrays" in jmemmgr.c for details. */
-    for (unsigned colctr = 0; colctr < downsampled_width; colctr += 16) {
-      /* Load samples. */
-      uint8x16_t r0 = vld1q_u8(inptr0 + colctr);
-      uint8x16_t r1 = vld1q_u8(inptr1 + colctr);
-      uint8x16_t r2 = vld1q_u8(inptr2 + colctr);
-      /* Blend samples vertically. */
-      uint16x8_t colsum0_l = vmlal_u8(vmovl_u8(vget_low_u8(r0)),
-                                      vget_low_u8(r1), three_u8);
-      uint16x8_t colsum0_h = vmlal_u8(vmovl_u8(vget_high_u8(r0)),
-                                      vget_high_u8(r1), three_u8);
-      uint16x8_t colsum1_l = vmlal_u8(vmovl_u8(vget_low_u8(r2)),
-                                      vget_low_u8(r1), three_u8);
-      uint16x8_t colsum1_h = vmlal_u8(vmovl_u8(vget_high_u8(r2)),
-                                      vget_high_u8(r1), three_u8);
-      /* Add ordered dithering bias to pixel values in even output rows. */
-      colsum0_l = vaddq_u16(colsum0_l, one_u16);
-      colsum0_h = vaddq_u16(colsum0_h, one_u16);
-      /* Right-shift by 2 (divide by 4), narrow to 8-bit and combine. */
-      uint8x16_t output_pixels0 = vcombine_u8(vshrn_n_u16(colsum0_l, 2),
-                                              vshrn_n_u16(colsum0_h, 2));
-      uint8x16_t output_pixels1 = vcombine_u8(vrshrn_n_u16(colsum1_l, 2),
-                                              vrshrn_n_u16(colsum1_h, 2));
-      /* Store pixel channel values to memory. */
-      vst1q_u8(outptr0 + colctr, output_pixels0);
-      vst1q_u8(outptr1 + colctr, output_pixels1);
-    }
-  }
-}
-
-
-/*
- * The diagram below shows the operation of h2v1 (simple) upsampling. Each
- * sample in the row is duplicated to form two output pixel channel values.
- *
- *                     p0 p1 p2 p3
- * +----+----+     +----+----+----+----+
- * | s0 | s1 | ->  | s0 | s0 | s1 | s1 |
- * +----+----+     +----+----+----+----+
- */
-
-void jsimd_h2v1_upsample_neon(int max_v_samp_factor,
-                              JDIMENSION output_width,
-                              JSAMPARRAY input_data,
-                              JSAMPARRAY *output_data_ptr)
-{
-  JSAMPARRAY output_data = *output_data_ptr;
-  JSAMPROW inptr, outptr;
-
-  for (int inrow = 0; inrow < max_v_samp_factor; inrow++) {
-    inptr = input_data[inrow];
-    outptr = output_data[inrow];
-    for (unsigned colctr = 0; 2 * colctr < output_width; colctr += 16) {
-      uint8x16_t samples = vld1q_u8(inptr + colctr);
-      /* Duplicate the samples - the store interleaves them to produce the */
-      /* pattern in the diagram above. */
-      uint8x16x2_t output_pixels = { samples, samples };
-      /* Store pixel values to memory. */
-      /* Due to the way sample buffers are allocated, we don't need to worry */
-      /* about tail cases when output_width is not a multiple of 32. */
-      /* See "Creation of 2-D sample arrays" in jmemmgr.c for details. */
-      vst2q_u8(outptr + 2 * colctr, output_pixels);
-    }
-  }
-}
-
-
-/*
- * The diagram below shows the operation of h2v2 (simple) upsampling. Each
- * sample in the row is duplicated to form two output pixel channel values.
- * This horizontally-upsampled row is then also duplicated.
- *
- *                        p0 p1 p2 p3
- * +-----+-----+     +-----+-----+-----+-----+
- * |  s0 |  s1 |  -> |  s0 |  s0 |  s1 |  s1 |
- * +-----+-----+     +-----+-----+-----+-----+
- *                   |  s0 |  s0 |  s1 |  s1 |
- *                   +-----+-----+-----+-----+
- */
-
-void jsimd_h2v2_upsample_neon(int max_v_samp_factor,
-                              JDIMENSION output_width,
-                              JSAMPARRAY input_data,
-                              JSAMPARRAY *output_data_ptr)
-{
-  JSAMPARRAY output_data = *output_data_ptr;
-  JSAMPROW inptr, outptr0, outptr1;
-
-  for (int inrow = 0, outrow = 0; outrow < max_v_samp_factor; inrow++) {
-    inptr = input_data[inrow];
-    outptr0 = output_data[outrow++];
-    outptr1 = output_data[outrow++];
-
-    for (unsigned colctr = 0; 2 * colctr < output_width; colctr += 16) {
-      uint8x16_t samples = vld1q_u8(inptr + colctr);
-      /* Duplicate the samples - the store interleaves them to produce the */
-      /* pattern in the diagram above. */
-      uint8x16x2_t output_pixels = { samples, samples };
-      /* Store pixel values to memory for both output rows. */
-      /* Due to the way sample buffers are allocated, we don't need to worry */
-      /* about tail cases when output_width is not a multiple of 32. */
-      /* See "Creation of 2-D sample arrays" in jmemmgr.c for details. */
-      vst2q_u8(outptr0 + 2 * colctr, output_pixels);
-      vst2q_u8(outptr1 + 2 * colctr, output_pixels);
-    }
-  }
-}
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jfdctfst-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jfdctfst-neon.c
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jfdctfst-neon.c	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jfdctfst-neon.c	1970-01-01 01:00:00.000000000 +0100
@@ -1,211 +0,0 @@
-/*
- * jfdctfst-neon.c - fast DCT (Arm NEON)
- *
- * Copyright 2020 The Chromium Authors. All Rights Reserved.
- *
- * This software is provided 'as-is', without any express or implied
- * warranty. In no event will the authors be held liable for any damages
- * arising from the use of this software.
- *
- * Permission is granted to anyone to use this software for any purpose,
- * including commercial applications, and to alter it and redistribute it
- * freely, subject to the following restrictions:
- *
- * 1. The origin of this software must not be misrepresented; you must not
- *    claim that you wrote the original software. If you use this software
- *    in a product, an acknowledgment in the product documentation would be
- *    appreciated but is not required.
- * 2. Altered source versions must be plainly marked as such, and must not be
- *    misrepresented as being the original software.
- * 3. This notice may not be removed or altered from any source distribution.
- */
-
-#define JPEG_INTERNALS
-#include "../../../jconfigint.h"
-#include "../../../jinclude.h"
-#include "../../../jpeglib.h"
-#include "../../../jsimd.h"
-#include "../../../jdct.h"
-#include "../../../jsimddct.h"
-#include "../../jsimd.h"
-
-#include <arm_neon.h>
-
-/*
- * 'jsimd_fdct_ifast_neon' performs a fast, not so accurate forward DCT
- * (Discrete Cosine Transform) on one block of samples. It uses the same
- * calculations and produces exactly the same output as IJG's original
- * 'jpeg_fdct_ifast' function, which can be found in jfdctfst.c.
- *
- * Scaled integer constants are used to avoid floating-point arithmetic:
- *    0.382683433 = 12544 * 2^-15
- *    0.541196100 = 17795 * 2^-15
- *    0.707106781 = 23168 * 2^-15
- *    0.306562965 =  9984 * 2^-15
- *
- * See jfdctfst.c for further details of the IDCT algorithm. Where possible,
- * the variable names and comments here in 'jsimd_fdct_ifast_neon' match up
- * with those in 'jpeg_fdct_ifast'.
- */
-
-#define F_0_382  12544
-#define F_0_541  17792
-#define F_0_707  23168
-#define F_0_306   9984
-
-ALIGN(16) static const int16_t jsimd_fdct_ifast_neon_consts[] = {
-  F_0_382, F_0_541, F_0_707, F_0_306
-};
-
-void jsimd_fdct_ifast_neon(DCTELEM *data)
-{
-  /* Load an 8x8 block of samples into Neon registers. De-interleaving loads */
-  /* are used followed by vuzp to transpose the block such that we have a */
-  /* column of samples per vector - allowing all rows to be processed at */
-  /* once. */
-  int16x8x4_t data1 = vld4q_s16(data);
-  int16x8x4_t data2 = vld4q_s16(data + 4 * DCTSIZE);
-
-  int16x8x2_t cols_04 = vuzpq_s16(data1.val[0], data2.val[0]);
-  int16x8x2_t cols_15 = vuzpq_s16(data1.val[1], data2.val[1]);
-  int16x8x2_t cols_26 = vuzpq_s16(data1.val[2], data2.val[2]);
-  int16x8x2_t cols_37 = vuzpq_s16(data1.val[3], data2.val[3]);
-
-  int16x8_t col0 = cols_04.val[0];
-  int16x8_t col1 = cols_15.val[0];
-  int16x8_t col2 = cols_26.val[0];
-  int16x8_t col3 = cols_37.val[0];
-  int16x8_t col4 = cols_04.val[1];
-  int16x8_t col5 = cols_15.val[1];
-  int16x8_t col6 = cols_26.val[1];
-  int16x8_t col7 = cols_37.val[1];
-
-  /* Load DCT conversion constants. */
-  const int16x4_t consts = vld1_s16(jsimd_fdct_ifast_neon_consts);
-
-  /* Pass 1: process rows. */
-  int16x8_t tmp0 = vaddq_s16(col0, col7);
-  int16x8_t tmp7 = vsubq_s16(col0, col7);
-  int16x8_t tmp1 = vaddq_s16(col1, col6);
-  int16x8_t tmp6 = vsubq_s16(col1, col6);
-  int16x8_t tmp2 = vaddq_s16(col2, col5);
-  int16x8_t tmp5 = vsubq_s16(col2, col5);
-  int16x8_t tmp3 = vaddq_s16(col3, col4);
-  int16x8_t tmp4 = vsubq_s16(col3, col4);
-
-  /* Even part */
-  int16x8_t tmp10 = vaddq_s16(tmp0, tmp3);    /* phase 2 */
-  int16x8_t tmp13 = vsubq_s16(tmp0, tmp3);
-  int16x8_t tmp11 = vaddq_s16(tmp1, tmp2);
-  int16x8_t tmp12 = vsubq_s16(tmp1, tmp2);
-
-  col0 = vaddq_s16(tmp10, tmp11);             /* phase 3 */
-  col4 = vsubq_s16(tmp10, tmp11);
-
-  int16x8_t z1 = vqdmulhq_lane_s16(vaddq_s16(tmp12, tmp13), consts, 2);
-  col2 = vaddq_s16(tmp13, z1);                /* phase 5 */
-  col6 = vsubq_s16(tmp13, z1);
-
-  /* Odd part */
-  tmp10 = vaddq_s16(tmp4, tmp5);              /* phase 2 */
-  tmp11 = vaddq_s16(tmp5, tmp6);
-  tmp12 = vaddq_s16(tmp6, tmp7);
-
-  int16x8_t z5 = vqdmulhq_lane_s16(vsubq_s16(tmp10, tmp12), consts, 0);
-  int16x8_t z2 = vqdmulhq_lane_s16(tmp10, consts, 1);
-  z2 = vaddq_s16(z2, z5);
-  int16x8_t z4 = vqdmulhq_lane_s16(tmp12, consts, 3);
-  z5 = vaddq_s16(tmp12, z5);
-  z4 = vaddq_s16(z4, z5);
-  int16x8_t z3 = vqdmulhq_lane_s16(tmp11, consts, 2);
-
-  int16x8_t z11 = vaddq_s16(tmp7, z3);        /* phase 5 */
-  int16x8_t z13 = vsubq_s16(tmp7, z3);
-
-  col5 = vaddq_s16(z13, z2);                  /* phase 6 */
-  col3 = vsubq_s16(z13, z2);
-  col1 = vaddq_s16(z11, z4);
-  col7 = vsubq_s16(z11, z4);
-
-  /* Transpose to work on columns in pass 2. */
-  int16x8x2_t cols_01 = vtrnq_s16(col0, col1);
-  int16x8x2_t cols_23 = vtrnq_s16(col2, col3);
-  int16x8x2_t cols_45 = vtrnq_s16(col4, col5);
-  int16x8x2_t cols_67 = vtrnq_s16(col6, col7);
-
-  int32x4x2_t cols_0145_l = vtrnq_s32(vreinterpretq_s32_s16(cols_01.val[0]),
-                                      vreinterpretq_s32_s16(cols_45.val[0]));
-  int32x4x2_t cols_0145_h = vtrnq_s32(vreinterpretq_s32_s16(cols_01.val[1]),
-                                      vreinterpretq_s32_s16(cols_45.val[1]));
-  int32x4x2_t cols_2367_l = vtrnq_s32(vreinterpretq_s32_s16(cols_23.val[0]),
-                                      vreinterpretq_s32_s16(cols_67.val[0]));
-  int32x4x2_t cols_2367_h = vtrnq_s32(vreinterpretq_s32_s16(cols_23.val[1]),
-                                      vreinterpretq_s32_s16(cols_67.val[1]));
-
-  int32x4x2_t rows_04 = vzipq_s32(cols_0145_l.val[0], cols_2367_l.val[0]);
-  int32x4x2_t rows_15 = vzipq_s32(cols_0145_h.val[0], cols_2367_h.val[0]);
-  int32x4x2_t rows_26 = vzipq_s32(cols_0145_l.val[1], cols_2367_l.val[1]);
-  int32x4x2_t rows_37 = vzipq_s32(cols_0145_h.val[1], cols_2367_h.val[1]);
-
-  int16x8_t row0 = vreinterpretq_s16_s32(rows_04.val[0]);
-  int16x8_t row1 = vreinterpretq_s16_s32(rows_15.val[0]);
-  int16x8_t row2 = vreinterpretq_s16_s32(rows_26.val[0]);
-  int16x8_t row3 = vreinterpretq_s16_s32(rows_37.val[0]);
-  int16x8_t row4 = vreinterpretq_s16_s32(rows_04.val[1]);
-  int16x8_t row5 = vreinterpretq_s16_s32(rows_15.val[1]);
-  int16x8_t row6 = vreinterpretq_s16_s32(rows_26.val[1]);
-  int16x8_t row7 = vreinterpretq_s16_s32(rows_37.val[1]);
-
-  /* Pass 2: process columns. */
-  tmp0 = vaddq_s16(row0, row7);
-  tmp7 = vsubq_s16(row0, row7);
-  tmp1 = vaddq_s16(row1, row6);
-  tmp6 = vsubq_s16(row1, row6);
-  tmp2 = vaddq_s16(row2, row5);
-  tmp5 = vsubq_s16(row2, row5);
-  tmp3 = vaddq_s16(row3, row4);
-  tmp4 = vsubq_s16(row3, row4);
-
-  /* Even part */
-  tmp10 = vaddq_s16(tmp0, tmp3);              /* phase 2 */
-  tmp13 = vsubq_s16(tmp0, tmp3);
-  tmp11 = vaddq_s16(tmp1, tmp2);
-  tmp12 = vsubq_s16(tmp1, tmp2);
-
-  row0 = vaddq_s16(tmp10, tmp11);             /* phase 3 */
-  row4 = vsubq_s16(tmp10, tmp11);
-
-  z1 = vqdmulhq_lane_s16(vaddq_s16(tmp12, tmp13), consts, 2);
-  row2 = vaddq_s16(tmp13, z1);                /* phase 5 */
-  row6 = vsubq_s16(tmp13, z1);
-
-  /* Odd part */
-  tmp10 = vaddq_s16(tmp4, tmp5);              /* phase 2 */
-  tmp11 = vaddq_s16(tmp5, tmp6);
-  tmp12 = vaddq_s16(tmp6, tmp7);
-
-  z5 = vqdmulhq_lane_s16(vsubq_s16(tmp10, tmp12), consts, 0);
-  z2 = vqdmulhq_lane_s16(tmp10, consts, 1);
-  z2 = vaddq_s16(z2, z5);
-  z4 = vqdmulhq_lane_s16(tmp12, consts, 3);
-  z5 = vaddq_s16(tmp12, z5);
-  z4 = vaddq_s16(z4, z5);
-  z3 = vqdmulhq_lane_s16(tmp11, consts, 2);
-
-  z11 = vaddq_s16(tmp7, z3);                  /* phase 5 */
-  z13 = vsubq_s16(tmp7, z3);
-
-  row5 = vaddq_s16(z13, z2);                  /* phase 6 */
-  row3 = vsubq_s16(z13, z2);
-  row1 = vaddq_s16(z11, z4);
-  row7 = vsubq_s16(z11, z4);
-
-  vst1q_s16(data + 0 * DCTSIZE, row0);
-  vst1q_s16(data + 1 * DCTSIZE, row1);
-  vst1q_s16(data + 2 * DCTSIZE, row2);
-  vst1q_s16(data + 3 * DCTSIZE, row3);
-  vst1q_s16(data + 4 * DCTSIZE, row4);
-  vst1q_s16(data + 5 * DCTSIZE, row5);
-  vst1q_s16(data + 6 * DCTSIZE, row6);
-  vst1q_s16(data + 7 * DCTSIZE, row7);
-}
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jfdctint-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jfdctint-neon.c
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jfdctint-neon.c	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jfdctint-neon.c	1970-01-01 01:00:00.000000000 +0100
@@ -1,371 +0,0 @@
-/*
- * jfdctint-neon.c - accurate DCT (Arm NEON)
- *
- * Copyright 2020 The Chromium Aruthors. All Rights Reserved.
- *
- * This software is provided 'as-is', without any express or implied
- * warranty. In no event will the authors be held liable for any damages
- * arising from the use of this software.
- *
- * Permission is granted to anyone to use this software for any purpose,
- * including commercial applications, and to alter it and redistribute it
- * freely, subject to the following restrictions:
- *
- * 1. The origin of this software must not be misrepresented; you must not
- *    claim that you wrote the original software. If you use this software
- *    in a product, an acknowledgment in the product documentation would be
- *    appreciated but is not required.
- * 2. Altered source versions must be plainly marked as such, and must not be
- *    misrepresented as being the original software.
- * 3. This notice may not be removed or altered from any source distribution.
- */
-
-#define JPEG_INTERNALS
-#include "../../../jconfigint.h"
-#include "../../../jinclude.h"
-#include "../../../jpeglib.h"
-#include "../../../jsimd.h"
-#include "../../../jdct.h"
-#include "../../../jsimddct.h"
-#include "../../jsimd.h"
-
-#include <arm_neon.h>
-
-/*
- * 'jsimd_fdct_islow_neon' performs a slow-but-accurate forward DCT (Discrete
- * Cosine Transform) on one block of samples. It uses the same calculations
- * and produces exactly the same output as IJG's original 'jpeg_fdct_islow'
- * function, which can be found in jfdctint.c.
- *
- * Scaled integer constants are used to avoid floating-point arithmetic:
- *    0.298631336 =  2446 * 2^-13
- *    0.390180644 =  3196 * 2^-13
- *    0.541196100 =  4433 * 2^-13
- *    0.765366865 =  6270 * 2^-13
- *    0.899976223 =  7373 * 2^-13
- *    1.175875602 =  9633 * 2^-13
- *    1.501321110 = 12299 * 2^-13
- *    1.847759065 = 15137 * 2^-13
- *    1.961570560 = 16069 * 2^-13
- *    2.053119869 = 16819 * 2^-13
- *    2.562915447 = 20995 * 2^-13
- *    3.072711026 = 25172 * 2^-13
- *
- * See jfdctint.c for further details of the DCT algorithm. Where possible,
- * the variable names and comments here in 'jsimd_fdct_islow_neon' match up
- * with those in 'jpeg_fdct_islow'.
- */
-
-#define CONST_BITS 13
-#define PASS1_BITS 2
-
-#define DESCALE_P1 (CONST_BITS - PASS1_BITS)
-#define DESCALE_P2 (CONST_BITS + PASS1_BITS)
-
-#define F_0_298  2446
-#define F_0_390  3196
-#define F_0_541  4433
-#define F_0_765  6270
-#define F_0_899  7373
-#define F_1_175  9633
-#define F_1_501 12299
-#define F_1_847 15137
-#define F_1_961 16069
-#define F_2_053 16819
-#define F_2_562 20995
-#define F_3_072 25172
-
-ALIGN(16) static const int16_t jsimd_fdct_islow_neon_consts[] = {
-  F_0_298, -F_0_390, F_0_541, F_0_765,
-  -F_0_899, F_1_175, F_1_501, -F_1_847,
-  -F_1_961, F_2_053, -F_2_562, F_3_072
-};
-
-void jsimd_fdct_islow_neon(DCTELEM *data)
-{
-  /* Load DCT constants. */
-#if defined(__clang__) || defined(_MSC_VER)
-  const int16x4x3_t consts = vld1_s16_x3(jsimd_fdct_islow_neon_consts);
-#else
-  /* GCC does not currently support the intrinsic vld1_<type>_x3(). */
-  const int16x4_t consts1 = vld1_s16(jsimd_fdct_islow_neon_consts);
-  const int16x4_t consts2 = vld1_s16(jsimd_fdct_islow_neon_consts + 4);
-  const int16x4_t consts3 = vld1_s16(jsimd_fdct_islow_neon_consts + 8);
-  const int16x4x3_t consts = { consts1, consts2, consts3 };
-#endif
-
-  /* Load an 8x8 block of samples into Neon registers. De-interleaving loads */
-  /* are used followed by vuzp to transpose the block such that we have a */
-  /* column of samples per vector - allowing all rows to be processed at */
-  /* once. */
-  int16x8x4_t s_rows_0123 = vld4q_s16(data);
-  int16x8x4_t s_rows_4567 = vld4q_s16(data + 4 * DCTSIZE);
-
-  int16x8x2_t cols_04 = vuzpq_s16(s_rows_0123.val[0], s_rows_4567.val[0]);
-  int16x8x2_t cols_15 = vuzpq_s16(s_rows_0123.val[1], s_rows_4567.val[1]);
-  int16x8x2_t cols_26 = vuzpq_s16(s_rows_0123.val[2], s_rows_4567.val[2]);
-  int16x8x2_t cols_37 = vuzpq_s16(s_rows_0123.val[3], s_rows_4567.val[3]);
-
-  int16x8_t col0 = cols_04.val[0];
-  int16x8_t col1 = cols_15.val[0];
-  int16x8_t col2 = cols_26.val[0];
-  int16x8_t col3 = cols_37.val[0];
-  int16x8_t col4 = cols_04.val[1];
-  int16x8_t col5 = cols_15.val[1];
-  int16x8_t col6 = cols_26.val[1];
-  int16x8_t col7 = cols_37.val[1];
-
-  /* Pass 1: process rows. */
-  int16x8_t tmp0 = vaddq_s16(col0, col7);
-  int16x8_t tmp7 = vsubq_s16(col0, col7);
-  int16x8_t tmp1 = vaddq_s16(col1, col6);
-  int16x8_t tmp6 = vsubq_s16(col1, col6);
-  int16x8_t tmp2 = vaddq_s16(col2, col5);
-  int16x8_t tmp5 = vsubq_s16(col2, col5);
-  int16x8_t tmp3 = vaddq_s16(col3, col4);
-  int16x8_t tmp4 = vsubq_s16(col3, col4);
-
-  /* Even part. */
-  int16x8_t tmp10 = vaddq_s16(tmp0, tmp3);
-  int16x8_t tmp13 = vsubq_s16(tmp0, tmp3);
-  int16x8_t tmp11 = vaddq_s16(tmp1, tmp2);
-  int16x8_t tmp12 = vsubq_s16(tmp1, tmp2);
-
-  col0 = vshlq_n_s16(vaddq_s16(tmp10, tmp11), PASS1_BITS);
-  col4 = vshlq_n_s16(vsubq_s16(tmp10, tmp11), PASS1_BITS);
-
-  int16x8_t tmp12_add_tmp13 = vaddq_s16(tmp12, tmp13);
-  int32x4_t z1_l = vmull_lane_s16(vget_low_s16(tmp12_add_tmp13),
-                                  consts.val[0], 2);
-  int32x4_t z1_h = vmull_lane_s16(vget_high_s16(tmp12_add_tmp13),
-                                  consts.val[0], 2);
-
-  int32x4_t col2_scaled_l = vmlal_lane_s16(z1_l, vget_low_s16(tmp13),
-                                           consts.val[0], 3);
-  int32x4_t col2_scaled_h = vmlal_lane_s16(z1_h, vget_high_s16(tmp13),
-                                           consts.val[0], 3);
-  col2 = vcombine_s16(vrshrn_n_s32(col2_scaled_l, DESCALE_P1),
-                      vrshrn_n_s32(col2_scaled_h, DESCALE_P1));
-
-  int32x4_t col6_scaled_l = vmlal_lane_s16(z1_l, vget_low_s16(tmp12),
-                                           consts.val[1], 3);
-  int32x4_t col6_scaled_h = vmlal_lane_s16(z1_h, vget_high_s16(tmp12),
-                                           consts.val[1], 3);
-  col6 = vcombine_s16(vrshrn_n_s32(col6_scaled_l, DESCALE_P1),
-                      vrshrn_n_s32(col6_scaled_h, DESCALE_P1));
-
-  /* Odd part. */
-  int16x8_t z1 = vaddq_s16(tmp4, tmp7);
-  int16x8_t z2 = vaddq_s16(tmp5, tmp6);
-  int16x8_t z3 = vaddq_s16(tmp4, tmp6);
-  int16x8_t z4 = vaddq_s16(tmp5, tmp7);
-  /* sqrt(2) * c3 */
-  int32x4_t z5_l = vmull_lane_s16(vget_low_s16(z3), consts.val[1], 1);
-  int32x4_t z5_h = vmull_lane_s16(vget_high_s16(z3), consts.val[1], 1);
-  z5_l = vmlal_lane_s16(z5_l, vget_low_s16(z4), consts.val[1], 1);
-  z5_h = vmlal_lane_s16(z5_h, vget_high_s16(z4), consts.val[1], 1);
-
-  /* sqrt(2) * (-c1+c3+c5-c7) */
-  int32x4_t tmp4_l = vmull_lane_s16(vget_low_s16(tmp4), consts.val[0], 0);
-  int32x4_t tmp4_h = vmull_lane_s16(vget_high_s16(tmp4), consts.val[0], 0);
-  /* sqrt(2) * ( c1+c3-c5+c7) */
-  int32x4_t tmp5_l = vmull_lane_s16(vget_low_s16(tmp5), consts.val[2], 1);
-  int32x4_t tmp5_h = vmull_lane_s16(vget_high_s16(tmp5), consts.val[2], 1);
-  /* sqrt(2) * ( c1+c3+c5-c7) */
-  int32x4_t tmp6_l = vmull_lane_s16(vget_low_s16(tmp6), consts.val[2], 3);
-  int32x4_t tmp6_h = vmull_lane_s16(vget_high_s16(tmp6), consts.val[2], 3);
-  /* sqrt(2) * ( c1+c3-c5-c7) */
-  int32x4_t tmp7_l = vmull_lane_s16(vget_low_s16(tmp7), consts.val[1], 2);
-  int32x4_t tmp7_h = vmull_lane_s16(vget_high_s16(tmp7), consts.val[1], 2);
-
-  /* sqrt(2) * (c7-c3) */
-  z1_l = vmull_lane_s16(vget_low_s16(z1), consts.val[1], 0);
-  z1_h = vmull_lane_s16(vget_high_s16(z1), consts.val[1], 0);
-  /* sqrt(2) * (-c1-c3) */
-  int32x4_t z2_l = vmull_lane_s16(vget_low_s16(z2), consts.val[2], 2);
-  int32x4_t z2_h = vmull_lane_s16(vget_high_s16(z2), consts.val[2], 2);
-  /* sqrt(2) * (-c3-c5) */
-  int32x4_t z3_l = vmull_lane_s16(vget_low_s16(z3), consts.val[2], 0);
-  int32x4_t z3_h = vmull_lane_s16(vget_high_s16(z3), consts.val[2], 0);
-  /* sqrt(2) * (c5-c3) */
-  int32x4_t z4_l = vmull_lane_s16(vget_low_s16(z4), consts.val[0], 1);
-  int32x4_t z4_h = vmull_lane_s16(vget_high_s16(z4), consts.val[0], 1);
-
-  z3_l = vaddq_s32(z3_l, z5_l);
-  z3_h = vaddq_s32(z3_h, z5_h);
-  z4_l = vaddq_s32(z4_l, z5_l);
-  z4_h = vaddq_s32(z4_h, z5_h);
-
-  tmp4_l = vaddq_s32(tmp4_l, z1_l);
-  tmp4_h = vaddq_s32(tmp4_h, z1_h);
-  tmp4_l = vaddq_s32(tmp4_l, z3_l);
-  tmp4_h = vaddq_s32(tmp4_h, z3_h);
-  col7 = vcombine_s16(vrshrn_n_s32(tmp4_l, DESCALE_P1),
-                      vrshrn_n_s32(tmp4_h, DESCALE_P1));
-
-  tmp5_l = vaddq_s32(tmp5_l, z2_l);
-  tmp5_h = vaddq_s32(tmp5_h, z2_h);
-  tmp5_l = vaddq_s32(tmp5_l, z4_l);
-  tmp5_h = vaddq_s32(tmp5_h, z4_h);
-  col5 = vcombine_s16(vrshrn_n_s32(tmp5_l, DESCALE_P1),
-                      vrshrn_n_s32(tmp5_h, DESCALE_P1));
-
-  tmp6_l = vaddq_s32(tmp6_l, z2_l);
-  tmp6_h = vaddq_s32(tmp6_h, z2_h);
-  tmp6_l = vaddq_s32(tmp6_l, z3_l);
-  tmp6_h = vaddq_s32(tmp6_h, z3_h);
-  col3 = vcombine_s16(vrshrn_n_s32(tmp6_l, DESCALE_P1),
-                      vrshrn_n_s32(tmp6_h, DESCALE_P1));
-
-  tmp7_l = vaddq_s32(tmp7_l, z1_l);
-  tmp7_h = vaddq_s32(tmp7_h, z1_h);
-  tmp7_l = vaddq_s32(tmp7_l, z4_l);
-  tmp7_h = vaddq_s32(tmp7_h, z4_h);
-  col1 = vcombine_s16(vrshrn_n_s32(tmp7_l, DESCALE_P1),
-                      vrshrn_n_s32(tmp7_h, DESCALE_P1));
-
-  /* Transpose to work on columns in pass 2. */
-  int16x8x2_t cols_01 = vtrnq_s16(col0, col1);
-  int16x8x2_t cols_23 = vtrnq_s16(col2, col3);
-  int16x8x2_t cols_45 = vtrnq_s16(col4, col5);
-  int16x8x2_t cols_67 = vtrnq_s16(col6, col7);
-
-  int32x4x2_t cols_0145_l = vtrnq_s32(vreinterpretq_s32_s16(cols_01.val[0]),
-                                      vreinterpretq_s32_s16(cols_45.val[0]));
-  int32x4x2_t cols_0145_h = vtrnq_s32(vreinterpretq_s32_s16(cols_01.val[1]),
-                                      vreinterpretq_s32_s16(cols_45.val[1]));
-  int32x4x2_t cols_2367_l = vtrnq_s32(vreinterpretq_s32_s16(cols_23.val[0]),
-                                      vreinterpretq_s32_s16(cols_67.val[0]));
-  int32x4x2_t cols_2367_h = vtrnq_s32(vreinterpretq_s32_s16(cols_23.val[1]),
-                                      vreinterpretq_s32_s16(cols_67.val[1]));
-
-  int32x4x2_t rows_04 = vzipq_s32(cols_0145_l.val[0], cols_2367_l.val[0]);
-  int32x4x2_t rows_15 = vzipq_s32(cols_0145_h.val[0], cols_2367_h.val[0]);
-  int32x4x2_t rows_26 = vzipq_s32(cols_0145_l.val[1], cols_2367_l.val[1]);
-  int32x4x2_t rows_37 = vzipq_s32(cols_0145_h.val[1], cols_2367_h.val[1]);
-
-  int16x8_t row0 = vreinterpretq_s16_s32(rows_04.val[0]);
-  int16x8_t row1 = vreinterpretq_s16_s32(rows_15.val[0]);
-  int16x8_t row2 = vreinterpretq_s16_s32(rows_26.val[0]);
-  int16x8_t row3 = vreinterpretq_s16_s32(rows_37.val[0]);
-  int16x8_t row4 = vreinterpretq_s16_s32(rows_04.val[1]);
-  int16x8_t row5 = vreinterpretq_s16_s32(rows_15.val[1]);
-  int16x8_t row6 = vreinterpretq_s16_s32(rows_26.val[1]);
-  int16x8_t row7 = vreinterpretq_s16_s32(rows_37.val[1]);
-
-  /* Pass 2. */
-  tmp0 = vaddq_s16(row0, row7);
-  tmp7 = vsubq_s16(row0, row7);
-  tmp1 = vaddq_s16(row1, row6);
-  tmp6 = vsubq_s16(row1, row6);
-  tmp2 = vaddq_s16(row2, row5);
-  tmp5 = vsubq_s16(row2, row5);
-  tmp3 = vaddq_s16(row3, row4);
-  tmp4 = vsubq_s16(row3, row4);
-
-  /* Even part. */
-  tmp10 = vaddq_s16(tmp0, tmp3);
-  tmp13 = vsubq_s16(tmp0, tmp3);
-  tmp11 = vaddq_s16(tmp1, tmp2);
-  tmp12 = vsubq_s16(tmp1, tmp2);
-
-  row0 = vrshrq_n_s16(vaddq_s16(tmp10, tmp11), PASS1_BITS);
-  row4 = vrshrq_n_s16(vsubq_s16(tmp10, tmp11), PASS1_BITS);
-
-  tmp12_add_tmp13 = vaddq_s16(tmp12, tmp13);
-  z1_l = vmull_lane_s16(vget_low_s16(tmp12_add_tmp13), consts.val[0], 2);
-  z1_h = vmull_lane_s16(vget_high_s16(tmp12_add_tmp13), consts.val[0], 2);
-
-  int32x4_t row2_scaled_l = vmlal_lane_s16(z1_l, vget_low_s16(tmp13),
-                                           consts.val[0], 3);
-  int32x4_t row2_scaled_h = vmlal_lane_s16(z1_h, vget_high_s16(tmp13),
-                                           consts.val[0], 3);
-  row2 = vcombine_s16(vrshrn_n_s32(row2_scaled_l, DESCALE_P2),
-                      vrshrn_n_s32(row2_scaled_h, DESCALE_P2));
-
-  int32x4_t row6_scaled_l = vmlal_lane_s16(z1_l, vget_low_s16(tmp12),
-                                           consts.val[1], 3);
-  int32x4_t row6_scaled_h = vmlal_lane_s16(z1_h, vget_high_s16(tmp12),
-                                           consts.val[1], 3);
-  row6 = vcombine_s16(vrshrn_n_s32(row6_scaled_l, DESCALE_P2),
-                      vrshrn_n_s32(row6_scaled_h, DESCALE_P2));
-
-  /* Odd part.
*/ - z1 = vaddq_s16(tmp4, tmp7); - z2 = vaddq_s16(tmp5, tmp6); - z3 = vaddq_s16(tmp4, tmp6); - z4 = vaddq_s16(tmp5, tmp7); - /* sqrt(2) * c3 */ - z5_l = vmull_lane_s16(vget_low_s16(z3), consts.val[1], 1); - z5_h = vmull_lane_s16(vget_high_s16(z3), consts.val[1], 1); - z5_l = vmlal_lane_s16(z5_l, vget_low_s16(z4), consts.val[1], 1); - z5_h = vmlal_lane_s16(z5_h, vget_high_s16(z4), consts.val[1], 1); - - /* sqrt(2) * (-c1+c3+c5-c7) */ - tmp4_l = vmull_lane_s16(vget_low_s16(tmp4), consts.val[0], 0); - tmp4_h = vmull_lane_s16(vget_high_s16(tmp4), consts.val[0], 0); - /* sqrt(2) * ( c1+c3-c5+c7) */ - tmp5_l = vmull_lane_s16(vget_low_s16(tmp5), consts.val[2], 1); - tmp5_h = vmull_lane_s16(vget_high_s16(tmp5), consts.val[2], 1); - /* sqrt(2) * ( c1+c3+c5-c7) */ - tmp6_l = vmull_lane_s16(vget_low_s16(tmp6), consts.val[2], 3); - tmp6_h = vmull_lane_s16(vget_high_s16(tmp6), consts.val[2], 3); - /* sqrt(2) * ( c1+c3-c5-c7) */ - tmp7_l = vmull_lane_s16(vget_low_s16(tmp7), consts.val[1], 2); - tmp7_h = vmull_lane_s16(vget_high_s16(tmp7), consts.val[1], 2); - - /* sqrt(2) * (c7-c3) */ - z1_l = vmull_lane_s16(vget_low_s16(z1), consts.val[1], 0); - z1_h = vmull_lane_s16(vget_high_s16(z1), consts.val[1], 0); - /* sqrt(2) * (-c1-c3) */ - z2_l = vmull_lane_s16(vget_low_s16(z2), consts.val[2], 2); - z2_h = vmull_lane_s16(vget_high_s16(z2), consts.val[2], 2); - /* sqrt(2) * (-c3-c5) */ - z3_l = vmull_lane_s16(vget_low_s16(z3), consts.val[2], 0); - z3_h = vmull_lane_s16(vget_high_s16(z3), consts.val[2], 0); - /* sqrt(2) * (c5-c3) */ - z4_l = vmull_lane_s16(vget_low_s16(z4), consts.val[0], 1); - z4_h = vmull_lane_s16(vget_high_s16(z4), consts.val[0], 1); - - z3_l = vaddq_s32(z3_l, z5_l); - z3_h = vaddq_s32(z3_h, z5_h); - z4_l = vaddq_s32(z4_l, z5_l); - z4_h = vaddq_s32(z4_h, z5_h); - - tmp4_l = vaddq_s32(tmp4_l, z1_l); - tmp4_h = vaddq_s32(tmp4_h, z1_h); - tmp4_l = vaddq_s32(tmp4_l, z3_l); - tmp4_h = vaddq_s32(tmp4_h, z3_h); - row7 = vcombine_s16(vrshrn_n_s32(tmp4_l, DESCALE_P2), - 
vrshrn_n_s32(tmp4_h, DESCALE_P2)); - - tmp5_l = vaddq_s32(tmp5_l, z2_l); - tmp5_h = vaddq_s32(tmp5_h, z2_h); - tmp5_l = vaddq_s32(tmp5_l, z4_l); - tmp5_h = vaddq_s32(tmp5_h, z4_h); - row5 = vcombine_s16(vrshrn_n_s32(tmp5_l, DESCALE_P2), - vrshrn_n_s32(tmp5_h, DESCALE_P2)); - - tmp6_l = vaddq_s32(tmp6_l, z2_l); - tmp6_h = vaddq_s32(tmp6_h, z2_h); - tmp6_l = vaddq_s32(tmp6_l, z3_l); - tmp6_h = vaddq_s32(tmp6_h, z3_h); - row3 = vcombine_s16(vrshrn_n_s32(tmp6_l, DESCALE_P2), - vrshrn_n_s32(tmp6_h, DESCALE_P2)); - - tmp7_l = vaddq_s32(tmp7_l, z1_l); - tmp7_h = vaddq_s32(tmp7_h, z1_h); - tmp7_l = vaddq_s32(tmp7_l, z4_l); - tmp7_h = vaddq_s32(tmp7_h, z4_h); - row1 = vcombine_s16(vrshrn_n_s32(tmp7_l, DESCALE_P2), - vrshrn_n_s32(tmp7_h, DESCALE_P2)); - - vst1q_s16(data + 0 * DCTSIZE, row0); - vst1q_s16(data + 1 * DCTSIZE, row1); - vst1q_s16(data + 2 * DCTSIZE, row2); - vst1q_s16(data + 3 * DCTSIZE, row3); - vst1q_s16(data + 4 * DCTSIZE, row4); - vst1q_s16(data + 5 * DCTSIZE, row5); - vst1q_s16(data + 6 * DCTSIZE, row6); - vst1q_s16(data + 7 * DCTSIZE, row7); -} diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jidctfst-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jidctfst-neon.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jidctfst-neon.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jidctfst-neon.c 1970-01-01 01:00:00.000000000 +0100 @@ -1,454 +0,0 @@ -/* - * jidctfst-neon.c - fast IDCT (Arm NEON) - * - * Copyright 2019 The Chromium Authors. All Rights Reserved. - * - * This software is provided 'as-is', without any express or implied - * warranty. In no event will the authors be held liable for any damages - * arising from the use of this software. 
- * - * Permission is granted to anyone to use this software for any purpose, - * including commercial applications, and to alter it and redistribute it - * freely, subject to the following restrictions: - * - * 1. The origin of this software must not be misrepresented; you must not - * claim that you wrote the original software. If you use this software - * in a product, an acknowledgment in the product documentation would be - * appreciated but is not required. - * 2. Altered source versions must be plainly marked as such, and must not be - * misrepresented as being the original software. - * 3. This notice may not be removed or altered from any source distribution. - */ - -#define JPEG_INTERNALS -#include "../../../jinclude.h" -#include "../../../jpeglib.h" -#include "../../../jsimd.h" -#include "../../../jdct.h" -#include "../../../jsimddct.h" -#include "../../jsimd.h" - -#include - -/* - * 'jsimd_idct_ifast_neon' performs dequantization and a fast, not so accurate - * inverse DCT (Discrete Cosine Transform) on one block of coefficients. It - * uses the same calculations and produces exactly the same output as IJG's - * original 'jpeg_idct_ifast' function, which can be found in jidctfst.c. - * - * Scaled integer constants are used to avoid floating-point arithmetic: - * 0.082392200 = 2688 * 2^-15 - * 0.414213562 = 13568 * 2^-15 - * 0.847759065 = 27776 * 2^-15 - * 0.613125930 = 20096 * 2^-15 - * - * See jidctfst.c for further details of the IDCT algorithm. Where possible, - * the variable names and comments here in 'jsimd_idct_ifast_neon' match up - * with those in 'jpeg_idct_ifast'. - */ - -#define PASS1_BITS 2 - -#define F_0_082 2688 -#define F_0_414 13568 -#define F_0_847 27776 -#define F_0_613 20096 - -void jsimd_idct_ifast_neon(void *dct_table, - JCOEFPTR coef_block, - JSAMPARRAY output_buf, - JDIMENSION output_col) -{ - IFAST_MULT_TYPE *quantptr = dct_table; - - /* Load DCT coefficients. 
*/ - int16x8_t row0 = vld1q_s16(coef_block + 0 * DCTSIZE); - int16x8_t row1 = vld1q_s16(coef_block + 1 * DCTSIZE); - int16x8_t row2 = vld1q_s16(coef_block + 2 * DCTSIZE); - int16x8_t row3 = vld1q_s16(coef_block + 3 * DCTSIZE); - int16x8_t row4 = vld1q_s16(coef_block + 4 * DCTSIZE); - int16x8_t row5 = vld1q_s16(coef_block + 5 * DCTSIZE); - int16x8_t row6 = vld1q_s16(coef_block + 6 * DCTSIZE); - int16x8_t row7 = vld1q_s16(coef_block + 7 * DCTSIZE); - - /* Load quantization table values for DC coefficients. */ - int16x8_t quant_row0 = vld1q_s16(quantptr + 0 * DCTSIZE); - /* Dequantize DC coefficients. */ - row0 = vmulq_s16(row0, quant_row0); - - /* Construct bitmap to test if all AC coefficients are 0. */ - int16x8_t bitmap = vorrq_s16(row1, row2); - bitmap = vorrq_s16(bitmap, row3); - bitmap = vorrq_s16(bitmap, row4); - bitmap = vorrq_s16(bitmap, row5); - bitmap = vorrq_s16(bitmap, row6); - bitmap = vorrq_s16(bitmap, row7); - - int64_t left_ac_bitmap = vgetq_lane_s64(vreinterpretq_s64_s16(bitmap), 0); - int64_t right_ac_bitmap = vgetq_lane_s64(vreinterpretq_s64_s16(bitmap), 1); - - if (left_ac_bitmap == 0 && right_ac_bitmap == 0) { - /* All AC coefficients are zero. */ - /* Compute DC values and duplicate into vectors. */ - int16x8_t dcval = row0; - row1 = dcval; - row2 = dcval; - row3 = dcval; - row4 = dcval; - row5 = dcval; - row6 = dcval; - row7 = dcval; - } else if (left_ac_bitmap == 0) { - /* AC coefficients are zero for columns 0, 1, 2 and 3. */ - /* Use DC values for these columns. */ - int16x4_t dcval = vget_low_s16(row0); - - /* Commence regular fast IDCT computation for columns 4, 5, 6 and 7. 
*/ - /* Load quantization table.*/ - int16x4_t quant_row1 = vld1_s16(quantptr + 1 * DCTSIZE + 4); - int16x4_t quant_row2 = vld1_s16(quantptr + 2 * DCTSIZE + 4); - int16x4_t quant_row3 = vld1_s16(quantptr + 3 * DCTSIZE + 4); - int16x4_t quant_row4 = vld1_s16(quantptr + 4 * DCTSIZE + 4); - int16x4_t quant_row5 = vld1_s16(quantptr + 5 * DCTSIZE + 4); - int16x4_t quant_row6 = vld1_s16(quantptr + 6 * DCTSIZE + 4); - int16x4_t quant_row7 = vld1_s16(quantptr + 7 * DCTSIZE + 4); - - /* Even part: dequantize DCT coefficients. */ - int16x4_t tmp0 = vget_high_s16(row0); - int16x4_t tmp1 = vmul_s16(vget_high_s16(row2), quant_row2); - int16x4_t tmp2 = vmul_s16(vget_high_s16(row4), quant_row4); - int16x4_t tmp3 = vmul_s16(vget_high_s16(row6), quant_row6); - - int16x4_t tmp10 = vadd_s16(tmp0, tmp2); /* phase 3 */ - int16x4_t tmp11 = vsub_s16(tmp0, tmp2); - - int16x4_t tmp13 = vadd_s16(tmp1, tmp3); /* phases 5-3 */ - int16x4_t tmp1_sub_tmp3 = vsub_s16(tmp1, tmp3); - int16x4_t tmp12 = vqdmulh_n_s16(tmp1_sub_tmp3, F_0_414); - tmp12 = vadd_s16(tmp12, tmp1_sub_tmp3); - tmp12 = vsub_s16(tmp12, tmp13); - - tmp0 = vadd_s16(tmp10, tmp13); /* phase 2 */ - tmp3 = vsub_s16(tmp10, tmp13); - tmp1 = vadd_s16(tmp11, tmp12); - tmp2 = vsub_s16(tmp11, tmp12); - - /* Odd part: dequantize DCT coefficients. 
*/ - int16x4_t tmp4 = vmul_s16(vget_high_s16(row1), quant_row1); - int16x4_t tmp5 = vmul_s16(vget_high_s16(row3), quant_row3); - int16x4_t tmp6 = vmul_s16(vget_high_s16(row5), quant_row5); - int16x4_t tmp7 = vmul_s16(vget_high_s16(row7), quant_row7); - - int16x4_t z13 = vadd_s16(tmp6, tmp5); /* phase 6 */ - int16x4_t neg_z10 = vsub_s16(tmp5, tmp6); - int16x4_t z11 = vadd_s16(tmp4, tmp7); - int16x4_t z12 = vsub_s16(tmp4, tmp7); - - tmp7 = vadd_s16(z11, z13); /* phase 5 */ - int16x4_t z11_sub_z13 = vsub_s16(z11, z13); - tmp11 = vqdmulh_n_s16(z11_sub_z13, F_0_414); - tmp11 = vadd_s16(tmp11, z11_sub_z13); - - int16x4_t z10_add_z12 = vsub_s16(z12, neg_z10); - int16x4_t z5 = vqdmulh_n_s16(z10_add_z12, F_0_847); - z5 = vadd_s16(z5, z10_add_z12); - tmp10 = vqdmulh_n_s16(z12, F_0_082); - tmp10 = vadd_s16(tmp10, z12); - tmp10 = vsub_s16(tmp10, z5); - tmp12 = vqdmulh_n_s16(neg_z10, F_0_613); - tmp12 = vadd_s16(tmp12, vadd_s16(neg_z10, neg_z10)); - tmp12 = vadd_s16(tmp12, z5); - - tmp6 = vsub_s16(tmp12, tmp7); /* phase 2 */ - tmp5 = vsub_s16(tmp11, tmp6); - tmp4 = vadd_s16(tmp10, tmp5); - - row0 = vcombine_s16(dcval, vadd_s16(tmp0, tmp7)); - row7 = vcombine_s16(dcval, vsub_s16(tmp0, tmp7)); - row1 = vcombine_s16(dcval, vadd_s16(tmp1, tmp6)); - row6 = vcombine_s16(dcval, vsub_s16(tmp1, tmp6)); - row2 = vcombine_s16(dcval, vadd_s16(tmp2, tmp5)); - row5 = vcombine_s16(dcval, vsub_s16(tmp2, tmp5)); - row4 = vcombine_s16(dcval, vadd_s16(tmp3, tmp4)); - row3 = vcombine_s16(dcval, vsub_s16(tmp3, tmp4)); - } else if (right_ac_bitmap == 0) { - /* AC coefficients are zero for columns 4, 5, 6 and 7. */ - /* Use DC values for these columns. */ - int16x4_t dcval = vget_high_s16(row0); - - /* Commence regular fast IDCT computation for columns 0, 1, 2 and 3. 
*/ - /* Load quantization table.*/ - int16x4_t quant_row1 = vld1_s16(quantptr + 1 * DCTSIZE); - int16x4_t quant_row2 = vld1_s16(quantptr + 2 * DCTSIZE); - int16x4_t quant_row3 = vld1_s16(quantptr + 3 * DCTSIZE); - int16x4_t quant_row4 = vld1_s16(quantptr + 4 * DCTSIZE); - int16x4_t quant_row5 = vld1_s16(quantptr + 5 * DCTSIZE); - int16x4_t quant_row6 = vld1_s16(quantptr + 6 * DCTSIZE); - int16x4_t quant_row7 = vld1_s16(quantptr + 7 * DCTSIZE); - - /* Even part: dequantize DCT coefficients. */ - int16x4_t tmp0 = vget_low_s16(row0); - int16x4_t tmp1 = vmul_s16(vget_low_s16(row2), quant_row2); - int16x4_t tmp2 = vmul_s16(vget_low_s16(row4), quant_row4); - int16x4_t tmp3 = vmul_s16(vget_low_s16(row6), quant_row6); - - int16x4_t tmp10 = vadd_s16(tmp0, tmp2); /* phase 3 */ - int16x4_t tmp11 = vsub_s16(tmp0, tmp2); - - int16x4_t tmp13 = vadd_s16(tmp1, tmp3); /* phases 5-3 */ - int16x4_t tmp1_sub_tmp3 = vsub_s16(tmp1, tmp3); - int16x4_t tmp12 = vqdmulh_n_s16(tmp1_sub_tmp3, F_0_414); - tmp12 = vadd_s16(tmp12, tmp1_sub_tmp3); - tmp12 = vsub_s16(tmp12, tmp13); - - tmp0 = vadd_s16(tmp10, tmp13); /* phase 2 */ - tmp3 = vsub_s16(tmp10, tmp13); - tmp1 = vadd_s16(tmp11, tmp12); - tmp2 = vsub_s16(tmp11, tmp12); - - /* Odd part: dequantize DCT coefficients. 
*/ - int16x4_t tmp4 = vmul_s16(vget_low_s16(row1), quant_row1); - int16x4_t tmp5 = vmul_s16(vget_low_s16(row3), quant_row3); - int16x4_t tmp6 = vmul_s16(vget_low_s16(row5), quant_row5); - int16x4_t tmp7 = vmul_s16(vget_low_s16(row7), quant_row7); - - int16x4_t z13 = vadd_s16(tmp6, tmp5); /* phase 6 */ - int16x4_t neg_z10 = vsub_s16(tmp5, tmp6); - int16x4_t z11 = vadd_s16(tmp4, tmp7); - int16x4_t z12 = vsub_s16(tmp4, tmp7); - - tmp7 = vadd_s16(z11, z13); /* phase 5 */ - int16x4_t z11_sub_z13 = vsub_s16(z11, z13); - tmp11 = vqdmulh_n_s16(z11_sub_z13, F_0_414); - tmp11 = vadd_s16(tmp11, z11_sub_z13); - - int16x4_t z10_add_z12 = vsub_s16(z12, neg_z10); - int16x4_t z5 = vqdmulh_n_s16(z10_add_z12, F_0_847); - z5 = vadd_s16(z5, z10_add_z12); - tmp10 = vqdmulh_n_s16(z12, F_0_082); - tmp10 = vadd_s16(tmp10, z12); - tmp10 = vsub_s16(tmp10, z5); - tmp12 = vqdmulh_n_s16(neg_z10, F_0_613); - tmp12 = vadd_s16(tmp12, vadd_s16(neg_z10, neg_z10)); - tmp12 = vadd_s16(tmp12, z5); - - tmp6 = vsub_s16(tmp12, tmp7); /* phase 2 */ - tmp5 = vsub_s16(tmp11, tmp6); - tmp4 = vadd_s16(tmp10, tmp5); - - row0 = vcombine_s16(vadd_s16(tmp0, tmp7), dcval); - row7 = vcombine_s16(vsub_s16(tmp0, tmp7), dcval); - row1 = vcombine_s16(vadd_s16(tmp1, tmp6), dcval); - row6 = vcombine_s16(vsub_s16(tmp1, tmp6), dcval); - row2 = vcombine_s16(vadd_s16(tmp2, tmp5), dcval); - row5 = vcombine_s16(vsub_s16(tmp2, tmp5), dcval); - row4 = vcombine_s16(vadd_s16(tmp3, tmp4), dcval); - row3 = vcombine_s16(vsub_s16(tmp3, tmp4), dcval); - } else { - /* Some AC coefficients are non-zero; full IDCT calculation required. 
*/ - /* Load quantization table.*/ - int16x8_t quant_row1 = vld1q_s16(quantptr + 1 * DCTSIZE); - int16x8_t quant_row2 = vld1q_s16(quantptr + 2 * DCTSIZE); - int16x8_t quant_row3 = vld1q_s16(quantptr + 3 * DCTSIZE); - int16x8_t quant_row4 = vld1q_s16(quantptr + 4 * DCTSIZE); - int16x8_t quant_row5 = vld1q_s16(quantptr + 5 * DCTSIZE); - int16x8_t quant_row6 = vld1q_s16(quantptr + 6 * DCTSIZE); - int16x8_t quant_row7 = vld1q_s16(quantptr + 7 * DCTSIZE); - - /* Even part: dequantize DCT coefficients. */ - int16x8_t tmp0 = row0; - int16x8_t tmp1 = vmulq_s16(row2, quant_row2); - int16x8_t tmp2 = vmulq_s16(row4, quant_row4); - int16x8_t tmp3 = vmulq_s16(row6, quant_row6); - - int16x8_t tmp10 = vaddq_s16(tmp0, tmp2); /* phase 3 */ - int16x8_t tmp11 = vsubq_s16(tmp0, tmp2); - - int16x8_t tmp13 = vaddq_s16(tmp1, tmp3); /* phases 5-3 */ - int16x8_t tmp1_sub_tmp3 = vsubq_s16(tmp1, tmp3); - int16x8_t tmp12 = vqdmulhq_n_s16(tmp1_sub_tmp3, F_0_414); - tmp12 = vaddq_s16(tmp12, tmp1_sub_tmp3); - tmp12 = vsubq_s16(tmp12, tmp13); - - tmp0 = vaddq_s16(tmp10, tmp13); /* phase 2 */ - tmp3 = vsubq_s16(tmp10, tmp13); - tmp1 = vaddq_s16(tmp11, tmp12); - tmp2 = vsubq_s16(tmp11, tmp12); - - /* Odd part: dequantize DCT coefficients. 
*/ - int16x8_t tmp4 = vmulq_s16(row1, quant_row1); - int16x8_t tmp5 = vmulq_s16(row3, quant_row3); - int16x8_t tmp6 = vmulq_s16(row5, quant_row5); - int16x8_t tmp7 = vmulq_s16(row7, quant_row7); - - int16x8_t z13 = vaddq_s16(tmp6, tmp5); /* phase 6 */ - int16x8_t neg_z10 = vsubq_s16(tmp5, tmp6); - int16x8_t z11 = vaddq_s16(tmp4, tmp7); - int16x8_t z12 = vsubq_s16(tmp4, tmp7); - - tmp7 = vaddq_s16(z11, z13); /* phase 5 */ - int16x8_t z11_sub_z13 = vsubq_s16(z11, z13); - tmp11 = vqdmulhq_n_s16(z11_sub_z13, F_0_414); - tmp11 = vaddq_s16(tmp11, z11_sub_z13); - - int16x8_t z10_add_z12 = vsubq_s16(z12, neg_z10); - int16x8_t z5 = vqdmulhq_n_s16(z10_add_z12, F_0_847); - z5 = vaddq_s16(z5, z10_add_z12); - tmp10 = vqdmulhq_n_s16(z12, F_0_082); - tmp10 = vaddq_s16(tmp10, z12); - tmp10 = vsubq_s16(tmp10, z5); - tmp12 = vqdmulhq_n_s16(neg_z10, F_0_613); - tmp12 = vaddq_s16(tmp12, vaddq_s16(neg_z10, neg_z10)); - tmp12 = vaddq_s16(tmp12, z5); - - tmp6 = vsubq_s16(tmp12, tmp7); /* phase 2 */ - tmp5 = vsubq_s16(tmp11, tmp6); - tmp4 = vaddq_s16(tmp10, tmp5); - - row0 = vaddq_s16(tmp0, tmp7); - row7 = vsubq_s16(tmp0, tmp7); - row1 = vaddq_s16(tmp1, tmp6); - row6 = vsubq_s16(tmp1, tmp6); - row2 = vaddq_s16(tmp2, tmp5); - row5 = vsubq_s16(tmp2, tmp5); - row4 = vaddq_s16(tmp3, tmp4); - row3 = vsubq_s16(tmp3, tmp4); - } - - /* Tranpose rows to work on columns in pass 2. 
*/ - int16x8x2_t rows_01 = vtrnq_s16(row0, row1); - int16x8x2_t rows_23 = vtrnq_s16(row2, row3); - int16x8x2_t rows_45 = vtrnq_s16(row4, row5); - int16x8x2_t rows_67 = vtrnq_s16(row6, row7); - - int32x4x2_t rows_0145_l = vtrnq_s32(vreinterpretq_s32_s16(rows_01.val[0]), - vreinterpretq_s32_s16(rows_45.val[0])); - int32x4x2_t rows_0145_h = vtrnq_s32(vreinterpretq_s32_s16(rows_01.val[1]), - vreinterpretq_s32_s16(rows_45.val[1])); - int32x4x2_t rows_2367_l = vtrnq_s32(vreinterpretq_s32_s16(rows_23.val[0]), - vreinterpretq_s32_s16(rows_67.val[0])); - int32x4x2_t rows_2367_h = vtrnq_s32(vreinterpretq_s32_s16(rows_23.val[1]), - vreinterpretq_s32_s16(rows_67.val[1])); - - int32x4x2_t cols_04 = vzipq_s32(rows_0145_l.val[0], rows_2367_l.val[0]); - int32x4x2_t cols_15 = vzipq_s32(rows_0145_h.val[0], rows_2367_h.val[0]); - int32x4x2_t cols_26 = vzipq_s32(rows_0145_l.val[1], rows_2367_l.val[1]); - int32x4x2_t cols_37 = vzipq_s32(rows_0145_h.val[1], rows_2367_h.val[1]); - - int16x8_t col0 = vreinterpretq_s16_s32(cols_04.val[0]); - int16x8_t col1 = vreinterpretq_s16_s32(cols_15.val[0]); - int16x8_t col2 = vreinterpretq_s16_s32(cols_26.val[0]); - int16x8_t col3 = vreinterpretq_s16_s32(cols_37.val[0]); - int16x8_t col4 = vreinterpretq_s16_s32(cols_04.val[1]); - int16x8_t col5 = vreinterpretq_s16_s32(cols_15.val[1]); - int16x8_t col6 = vreinterpretq_s16_s32(cols_26.val[1]); - int16x8_t col7 = vreinterpretq_s16_s32(cols_37.val[1]); - - /* 1-D IDCT, pass 2. */ - /* Even part. */ - int16x8_t tmp10 = vaddq_s16(col0, col4); - int16x8_t tmp11 = vsubq_s16(col0, col4); - - int16x8_t tmp13 = vaddq_s16(col2, col6); - int16x8_t col2_sub_col6 = vsubq_s16(col2, col6); - int16x8_t tmp12 = vqdmulhq_n_s16(col2_sub_col6, F_0_414); - tmp12 = vaddq_s16(tmp12, col2_sub_col6); - tmp12 = vsubq_s16(tmp12, tmp13); - - int16x8_t tmp0 = vaddq_s16(tmp10, tmp13); - int16x8_t tmp3 = vsubq_s16(tmp10, tmp13); - int16x8_t tmp1 = vaddq_s16(tmp11, tmp12); - int16x8_t tmp2 = vsubq_s16(tmp11, tmp12); - - /* Odd part. 
*/ - int16x8_t z13 = vaddq_s16(col5, col3); - int16x8_t neg_z10 = vsubq_s16(col3, col5); - int16x8_t z11 = vaddq_s16(col1, col7); - int16x8_t z12 = vsubq_s16(col1, col7); - - int16x8_t tmp7 = vaddq_s16(z11, z13); /* phase 5 */ - int16x8_t z11_sub_z13 = vsubq_s16(z11, z13); - tmp11 = vqdmulhq_n_s16(z11_sub_z13, F_0_414); - tmp11 = vaddq_s16(tmp11, z11_sub_z13); - - int16x8_t z10_add_z12 = vsubq_s16(z12, neg_z10); - int16x8_t z5 = vqdmulhq_n_s16(z10_add_z12, F_0_847); - z5 = vaddq_s16(z5, z10_add_z12); - tmp10 = vqdmulhq_n_s16(z12, F_0_082); - tmp10 = vaddq_s16(tmp10, z12); - tmp10 = vsubq_s16(tmp10, z5); - tmp12 = vqdmulhq_n_s16(neg_z10, F_0_613); - tmp12 = vaddq_s16(tmp12, vaddq_s16(neg_z10, neg_z10)); - tmp12 = vaddq_s16(tmp12, z5); - - int16x8_t tmp6 = vsubq_s16(tmp12, tmp7); /* phase 2 */ - int16x8_t tmp5 = vsubq_s16(tmp11, tmp6); - int16x8_t tmp4 = vaddq_s16(tmp10, tmp5); - - col0 = vaddq_s16(tmp0, tmp7); - col7 = vsubq_s16(tmp0, tmp7); - col1 = vaddq_s16(tmp1, tmp6); - col6 = vsubq_s16(tmp1, tmp6); - col2 = vaddq_s16(tmp2, tmp5); - col5 = vsubq_s16(tmp2, tmp5); - col4 = vaddq_s16(tmp3, tmp4); - col3 = vsubq_s16(tmp3, tmp4); - - /* Scale down by factor of 8, narrowing to 8-bit. */ - int8x16_t cols_01_s8 = vcombine_s8(vqshrn_n_s16(col0, PASS1_BITS + 3), - vqshrn_n_s16(col1, PASS1_BITS + 3)); - int8x16_t cols_45_s8 = vcombine_s8(vqshrn_n_s16(col4, PASS1_BITS + 3), - vqshrn_n_s16(col5, PASS1_BITS + 3)); - int8x16_t cols_23_s8 = vcombine_s8(vqshrn_n_s16(col2, PASS1_BITS + 3), - vqshrn_n_s16(col3, PASS1_BITS + 3)); - int8x16_t cols_67_s8 = vcombine_s8(vqshrn_n_s16(col6, PASS1_BITS + 3), - vqshrn_n_s16(col7, PASS1_BITS + 3)); - /* Clamp to range [0-255]. 
*/ - uint8x16_t cols_01 = vreinterpretq_u8_s8( - vaddq_s8(cols_01_s8, vreinterpretq_s8_u8(vdupq_n_u8(CENTERJSAMPLE)))); - uint8x16_t cols_45 = vreinterpretq_u8_s8( - vaddq_s8(cols_45_s8, vreinterpretq_s8_u8(vdupq_n_u8(CENTERJSAMPLE)))); - uint8x16_t cols_23 = vreinterpretq_u8_s8( - vaddq_s8(cols_23_s8, vreinterpretq_s8_u8(vdupq_n_u8(CENTERJSAMPLE)))); - uint8x16_t cols_67 = vreinterpretq_u8_s8( - vaddq_s8(cols_67_s8, vreinterpretq_s8_u8(vdupq_n_u8(CENTERJSAMPLE)))); - - /* Transpose block ready for store. */ - uint32x4x2_t cols_0415 = vzipq_u32(vreinterpretq_u32_u8(cols_01), - vreinterpretq_u32_u8(cols_45)); - uint32x4x2_t cols_2637 = vzipq_u32(vreinterpretq_u32_u8(cols_23), - vreinterpretq_u32_u8(cols_67)); - - uint8x16x2_t cols_0145 = vtrnq_u8(vreinterpretq_u8_u32(cols_0415.val[0]), - vreinterpretq_u8_u32(cols_0415.val[1])); - uint8x16x2_t cols_2367 = vtrnq_u8(vreinterpretq_u8_u32(cols_2637.val[0]), - vreinterpretq_u8_u32(cols_2637.val[1])); - uint16x8x2_t rows_0426 = vtrnq_u16(vreinterpretq_u16_u8(cols_0145.val[0]), - vreinterpretq_u16_u8(cols_2367.val[0])); - uint16x8x2_t rows_1537 = vtrnq_u16(vreinterpretq_u16_u8(cols_0145.val[1]), - vreinterpretq_u16_u8(cols_2367.val[1])); - - uint8x16_t rows_04 = vreinterpretq_u8_u16(rows_0426.val[0]); - uint8x16_t rows_15 = vreinterpretq_u8_u16(rows_1537.val[0]); - uint8x16_t rows_26 = vreinterpretq_u8_u16(rows_0426.val[1]); - uint8x16_t rows_37 = vreinterpretq_u8_u16(rows_1537.val[1]); - - JSAMPROW outptr0 = output_buf[0] + output_col; - JSAMPROW outptr1 = output_buf[1] + output_col; - JSAMPROW outptr2 = output_buf[2] + output_col; - JSAMPROW outptr3 = output_buf[3] + output_col; - JSAMPROW outptr4 = output_buf[4] + output_col; - JSAMPROW outptr5 = output_buf[5] + output_col; - JSAMPROW outptr6 = output_buf[6] + output_col; - JSAMPROW outptr7 = output_buf[7] + output_col; - - /* Store DCT block to memory. 
*/ - vst1q_lane_u64((uint64_t *)outptr0, vreinterpretq_u64_u8(rows_04), 0); - vst1q_lane_u64((uint64_t *)outptr1, vreinterpretq_u64_u8(rows_15), 0); - vst1q_lane_u64((uint64_t *)outptr2, vreinterpretq_u64_u8(rows_26), 0); - vst1q_lane_u64((uint64_t *)outptr3, vreinterpretq_u64_u8(rows_37), 0); - vst1q_lane_u64((uint64_t *)outptr4, vreinterpretq_u64_u8(rows_04), 1); - vst1q_lane_u64((uint64_t *)outptr5, vreinterpretq_u64_u8(rows_15), 1); - vst1q_lane_u64((uint64_t *)outptr6, vreinterpretq_u64_u8(rows_26), 1); - vst1q_lane_u64((uint64_t *)outptr7, vreinterpretq_u64_u8(rows_37), 1); -} diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jidctint-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jidctint-neon.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jidctint-neon.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jidctint-neon.c 1970-01-01 01:00:00.000000000 +0100 @@ -1,782 +0,0 @@ -/* - * jidctint-neon.c - slow IDCT (Arm NEON) - * - * Copyright 2019 The Chromium Authors. All Rights Reserved. - * - * This software is provided 'as-is', without any express or implied - * warranty. In no event will the authors be held liable for any damages - * arising from the use of this software. - * - * Permission is granted to anyone to use this software for any purpose, - * including commercial applications, and to alter it and redistribute it - * freely, subject to the following restrictions: - * - * 1. The origin of this software must not be misrepresented; you must not - * claim that you wrote the original software. If you use this software - * in a product, an acknowledgment in the product documentation would be - * appreciated but is not required. - * 2. Altered source versions must be plainly marked as such, and must not be - * misrepresented as being the original software. - * 3. 
This notice may not be removed or altered from any source distribution. - */ - -#define JPEG_INTERNALS -#include "../../../jconfigint.h" -#include "../../../jinclude.h" -#include "../../../jpeglib.h" -#include "../../../jsimd.h" -#include "../../../jdct.h" -#include "../../../jsimddct.h" -#include "../../jsimd.h" - -#include - -#define CONST_BITS 13 -#define PASS1_BITS 2 - -#define DESCALE_P1 (CONST_BITS - PASS1_BITS) -#define DESCALE_P2 (CONST_BITS + PASS1_BITS + 3) - -/* The computation of the inverse DCT requires the use of constants known at - * compile-time. Scaled integer constants are used to avoid floating-point - * arithmetic: - * 0.298631336 = 2446 * 2^-13 - * 0.390180644 = 3196 * 2^-13 - * 0.541196100 = 4433 * 2^-13 - * 0.765366865 = 6270 * 2^-13 - * 0.899976223 = 7373 * 2^-13 - * 1.175875602 = 9633 * 2^-13 - * 1.501321110 = 12299 * 2^-13 - * 1.847759065 = 15137 * 2^-13 - * 1.961570560 = 16069 * 2^-13 - * 2.053119869 = 16819 * 2^-13 - * 2.562915447 = 20995 * 2^-13 - * 3.072711026 = 25172 * 2^-13 - */ - -#define F_0_298 2446 -#define F_0_390 3196 -#define F_0_541 4433 -#define F_0_765 6270 -#define F_0_899 7373 -#define F_1_175 9633 -#define F_1_501 12299 -#define F_1_847 15137 -#define F_1_961 16069 -#define F_2_053 16819 -#define F_2_562 20995 -#define F_3_072 25172 - -#define F_1_175_MINUS_1_961 (F_1_175 - F_1_961) -#define F_1_175_MINUS_0_390 (F_1_175 - F_0_390) -#define F_0_541_MINUS_1_847 (F_0_541 - F_1_847) -#define F_3_072_MINUS_2_562 (F_3_072 - F_2_562) -#define F_0_298_MINUS_0_899 (F_0_298 - F_0_899) -#define F_1_501_MINUS_0_899 (F_1_501 - F_0_899) -#define F_2_053_MINUS_2_562 (F_2_053 - F_2_562) -#define F_0_541_PLUS_0_765 (F_0_541 + F_0_765) - -ALIGN(16) static const int16_t jsimd_idct_islow_neon_consts[] = { - F_0_899, F_0_541, - F_2_562, F_0_298_MINUS_0_899, - F_1_501_MINUS_0_899, F_2_053_MINUS_2_562, - F_0_541_PLUS_0_765, F_1_175, - F_1_175_MINUS_0_390, F_0_541_MINUS_1_847, - F_3_072_MINUS_2_562, F_1_175_MINUS_1_961, - 0, 0, 0, 0 - }; - -/* 
Forward declaration of regular and sparse IDCT helper functions. */ - -static inline void jsimd_idct_islow_pass1_regular(int16x4_t row0, - int16x4_t row1, - int16x4_t row2, - int16x4_t row3, - int16x4_t row4, - int16x4_t row5, - int16x4_t row6, - int16x4_t row7, - int16x4_t quant_row0, - int16x4_t quant_row1, - int16x4_t quant_row2, - int16x4_t quant_row3, - int16x4_t quant_row4, - int16x4_t quant_row5, - int16x4_t quant_row6, - int16x4_t quant_row7, - int16_t *workspace_1, - int16_t *workspace_2); - -static inline void jsimd_idct_islow_pass1_sparse(int16x4_t row0, - int16x4_t row1, - int16x4_t row2, - int16x4_t row3, - int16x4_t quant_row0, - int16x4_t quant_row1, - int16x4_t quant_row2, - int16x4_t quant_row3, - int16_t *workspace_1, - int16_t *workspace_2); - -static inline void jsimd_idct_islow_pass2_regular(int16_t *workspace, - JSAMPARRAY output_buf, - JDIMENSION output_col, - unsigned buf_offset); - -static inline void jsimd_idct_islow_pass2_sparse(int16_t *workspace, - JSAMPARRAY output_buf, - JDIMENSION output_col, - unsigned buf_offset); - - -/* Performs dequantization and inverse DCT on one block of coefficients. For - * reference, the C implementation 'jpeg_idct_slow' can be found jidctint.c. - * - * Optimization techniques used for data access: - * - * In each pass, the inverse DCT is computed on the left and right 4x8 halves - * of the DCT block. This avoids spilling due to register pressure and the - * increased granularity allows an optimized calculation depending on the - * values of the DCT coefficients. Between passes, intermediate data is stored - * in 4x8 workspace buffers. - * - * Transposing the 8x8 DCT block after each pass can be achieved by transposing - * each of the four 4x4 quadrants, and swapping quadrants 1 and 2 (in the - * diagram below.) Swapping quadrants is cheap as the second pass can just load - * from the other workspace buffer. 
- * - * +-------+-------+ +-------+-------+ - * | | | | | | - * | 0 | 1 | | 0 | 2 | - * | | | transpose | | | - * +-------+-------+ ------> +-------+-------+ - * | | | | | | - * | 2 | 3 | | 1 | 3 | - * | | | | | | - * +-------+-------+ +-------+-------+ - * - * Optimization techniques used to accelerate the inverse DCT calculation: - * - * In a DCT coefficient block, the coefficients are increasingly likely to be 0 - * moving diagonally from top left to bottom right. If whole rows of - * coefficients are 0, the inverse DCT calculation can be simplified. In this - * NEON implementation, on the first pass of the inverse DCT, we test for three - * special cases before defaulting to a full 'regular' inverse DCT: - * - * i) AC and DC coefficients are all zero. (Only tested for the right 4x8 - * half of the DCT coefficient block.) In this case the inverse DCT result - * is all zero. We do no work here, signalling that the 'sparse' case is - * required in the second pass. - * ii) AC coefficients (all but the top row) are zero. In this case, the value - * of the inverse DCT of the AC coefficients is just the DC coefficients. - * iii) Coefficients of rows 4, 5, 6 and 7 are all zero. In this case we opt to - * execute a 'sparse' simplified inverse DCT. - * - * In the second pass, only a single special case is tested: whether the the AC - * and DC coefficients were all zero in the right 4x8 block in the first pass - * (case 'i'). If this is the case, a 'sparse' variant of the second pass - * inverse DCT is executed for both the left and right halves of the DCT block. - * (The transposition after the first pass would have made the bottom half of - * the block all zero.) - */ - -void jsimd_idct_islow_neon(void *dct_table, - JCOEFPTR coef_block, - JSAMPARRAY output_buf, - JDIMENSION output_col) -{ - ISLOW_MULT_TYPE *quantptr = dct_table; - - int16_t workspace_l[8 * DCTSIZE / 2]; - int16_t workspace_r[8 * DCTSIZE / 2]; - - /* Compute IDCT first pass on left 4x8 coefficient block. 
*/ - /* Load DCT coefficients in left 4x8 block. */ - int16x4_t row0 = vld1_s16(coef_block + 0 * DCTSIZE); - int16x4_t row1 = vld1_s16(coef_block + 1 * DCTSIZE); - int16x4_t row2 = vld1_s16(coef_block + 2 * DCTSIZE); - int16x4_t row3 = vld1_s16(coef_block + 3 * DCTSIZE); - int16x4_t row4 = vld1_s16(coef_block + 4 * DCTSIZE); - int16x4_t row5 = vld1_s16(coef_block + 5 * DCTSIZE); - int16x4_t row6 = vld1_s16(coef_block + 6 * DCTSIZE); - int16x4_t row7 = vld1_s16(coef_block + 7 * DCTSIZE); - - /* Load quantization table for left 4x8 block. */ - int16x4_t quant_row0 = vld1_s16(quantptr + 0 * DCTSIZE); - int16x4_t quant_row1 = vld1_s16(quantptr + 1 * DCTSIZE); - int16x4_t quant_row2 = vld1_s16(quantptr + 2 * DCTSIZE); - int16x4_t quant_row3 = vld1_s16(quantptr + 3 * DCTSIZE); - int16x4_t quant_row4 = vld1_s16(quantptr + 4 * DCTSIZE); - int16x4_t quant_row5 = vld1_s16(quantptr + 5 * DCTSIZE); - int16x4_t quant_row6 = vld1_s16(quantptr + 6 * DCTSIZE); - int16x4_t quant_row7 = vld1_s16(quantptr + 7 * DCTSIZE); - - /* Construct bitmap to test if DCT coefficients in left 4x8 block are 0. */ - int16x4_t bitmap = vorr_s16(row7, row6); - bitmap = vorr_s16(bitmap, row5); - bitmap = vorr_s16(bitmap, row4); - int64_t bitmap_rows_4567 = vget_lane_s64(vreinterpret_s64_s16(bitmap), 0); - - if (bitmap_rows_4567 == 0) { - bitmap = vorr_s16(bitmap, row3); - bitmap = vorr_s16(bitmap, row2); - bitmap = vorr_s16(bitmap, row1); - int64_t left_ac_bitmap = vget_lane_s64(vreinterpret_s64_s16(bitmap), 0); - - if (left_ac_bitmap == 0) { - int16x4_t dcval = vshl_n_s16(vmul_s16(row0, quant_row0), PASS1_BITS); - int16x4x4_t quadrant = { dcval, dcval, dcval, dcval }; - /* Store 4x4 blocks to workspace, transposing in the process. 
*/ - vst4_s16(workspace_l, quadrant); - vst4_s16(workspace_r, quadrant); - } else { - jsimd_idct_islow_pass1_sparse(row0, row1, row2, row3, quant_row0, - quant_row1, quant_row2, quant_row3, - workspace_l, workspace_r); - } - } else { - jsimd_idct_islow_pass1_regular(row0, row1, row2, row3, row4, row5, - row6, row7, quant_row0, quant_row1, - quant_row2, quant_row3, quant_row4, - quant_row5, quant_row6, quant_row7, - workspace_l, workspace_r); - } - - /* Compute IDCT first pass on right 4x8 coefficient block.*/ - /* Load DCT coefficients for right 4x8 block. */ - row0 = vld1_s16(coef_block + 0 * DCTSIZE + 4); - row1 = vld1_s16(coef_block + 1 * DCTSIZE + 4); - row2 = vld1_s16(coef_block + 2 * DCTSIZE + 4); - row3 = vld1_s16(coef_block + 3 * DCTSIZE + 4); - row4 = vld1_s16(coef_block + 4 * DCTSIZE + 4); - row5 = vld1_s16(coef_block + 5 * DCTSIZE + 4); - row6 = vld1_s16(coef_block + 6 * DCTSIZE + 4); - row7 = vld1_s16(coef_block + 7 * DCTSIZE + 4); - - /* Load quantization table for right 4x8 block. */ - quant_row0 = vld1_s16(quantptr + 0 * DCTSIZE + 4); - quant_row1 = vld1_s16(quantptr + 1 * DCTSIZE + 4); - quant_row2 = vld1_s16(quantptr + 2 * DCTSIZE + 4); - quant_row3 = vld1_s16(quantptr + 3 * DCTSIZE + 4); - quant_row4 = vld1_s16(quantptr + 4 * DCTSIZE + 4); - quant_row5 = vld1_s16(quantptr + 5 * DCTSIZE + 4); - quant_row6 = vld1_s16(quantptr + 6 * DCTSIZE + 4); - quant_row7 = vld1_s16(quantptr + 7 * DCTSIZE + 4); - - /* Construct bitmap to test if DCT coefficients in right 4x8 block are 0. */ - bitmap = vorr_s16(row7, row6); - bitmap = vorr_s16(bitmap, row5); - bitmap = vorr_s16(bitmap, row4); - bitmap_rows_4567 = vget_lane_s64(vreinterpret_s64_s16(bitmap), 0); - bitmap = vorr_s16(bitmap, row3); - bitmap = vorr_s16(bitmap, row2); - bitmap = vorr_s16(bitmap, row1); - int64_t right_ac_bitmap = vget_lane_s64(vreinterpret_s64_s16(bitmap), 0); - - /* Initialise to non-zero value: defaults to regular second pass. 
*/ - int64_t right_ac_dc_bitmap = 1; - - if (right_ac_bitmap == 0) { - bitmap = vorr_s16(bitmap, row0); - right_ac_dc_bitmap = vget_lane_s64(vreinterpret_s64_s16(bitmap), 0); - - if (right_ac_dc_bitmap != 0) { - int16x4_t dcval = vshl_n_s16(vmul_s16(row0, quant_row0), PASS1_BITS); - int16x4x4_t quadrant = { dcval, dcval, dcval, dcval }; - /* Store 4x4 blocks to workspace, transposing in the process. */ - vst4_s16(workspace_l + 4 * DCTSIZE / 2, quadrant); - vst4_s16(workspace_r + 4 * DCTSIZE / 2, quadrant); - } - } else { - if (bitmap_rows_4567 == 0) { - jsimd_idct_islow_pass1_sparse(row0, row1, row2, row3, quant_row0, - quant_row1, quant_row2, quant_row3, - workspace_l + 4 * DCTSIZE / 2, - workspace_r + 4 * DCTSIZE / 2); - } else { - jsimd_idct_islow_pass1_regular(row0, row1, row2, row3, row4, row5, - row6, row7, quant_row0, quant_row1, - quant_row2, quant_row3, quant_row4, - quant_row5, quant_row6, quant_row7, - workspace_l + 4 * DCTSIZE / 2, - workspace_r + 4 * DCTSIZE / 2); - } - } - - /* Second pass: compute IDCT on rows in workspace. */ - /* If all coefficients in right 4x8 block are 0, use 'sparse' second pass. */ - if (right_ac_dc_bitmap == 0) { - jsimd_idct_islow_pass2_sparse(workspace_l, output_buf, output_col, 0); - jsimd_idct_islow_pass2_sparse(workspace_r, output_buf, output_col, 4); - } else { - jsimd_idct_islow_pass2_regular(workspace_l, output_buf, output_col, 0); - jsimd_idct_islow_pass2_regular(workspace_r, output_buf, output_col, 4); - } -} - - -/* Performs dequantization and the first pass of the slow-but-accurate inverse - * DCT on a 4x8 block of coefficients. (To process the full 8x8 DCT block this - * function - or some other optimized variant - needs to be called on both the - * right and left 4x8 blocks.) - * - * This 'regular' version assumes that no optimization can be made to the IDCT - * calculation since no useful set of AC coefficients are all 0. 
- * - * The original C implementation of the slow IDCT 'jpeg_idct_slow' can be found - * in jidctint.c. Algorithmic changes made here are documented inline. - */ - -static inline void jsimd_idct_islow_pass1_regular(int16x4_t row0, - int16x4_t row1, - int16x4_t row2, - int16x4_t row3, - int16x4_t row4, - int16x4_t row5, - int16x4_t row6, - int16x4_t row7, - int16x4_t quant_row0, - int16x4_t quant_row1, - int16x4_t quant_row2, - int16x4_t quant_row3, - int16x4_t quant_row4, - int16x4_t quant_row5, - int16x4_t quant_row6, - int16x4_t quant_row7, - int16_t *workspace_1, - int16_t *workspace_2) -{ - /* Load constants for IDCT calculation. */ -#if defined(__aarch64__) || defined(__ARM64__) || defined(_M_ARM64) - const int16x4x3_t consts = vld1_s16_x3(jsimd_idct_islow_neon_consts); -#else - const int16x4x3_t consts = { vld1_s16(jsimd_idct_islow_neon_consts), - vld1_s16(jsimd_idct_islow_neon_consts + 4), - vld1_s16(jsimd_idct_islow_neon_consts + 8) }; -#endif - - /* Even part. */ - int16x4_t z2_s16 = vmul_s16(row2, quant_row2); - int16x4_t z3_s16 = vmul_s16(row6, quant_row6); - - int32x4_t tmp2 = vmull_lane_s16(z2_s16, consts.val[0], 1); - int32x4_t tmp3 = vmull_lane_s16(z2_s16, consts.val[1], 2); - tmp2 = vmlal_lane_s16(tmp2, z3_s16, consts.val[2], 1); - tmp3 = vmlal_lane_s16(tmp3, z3_s16, consts.val[0], 1); - - z2_s16 = vmul_s16(row0, quant_row0); - z3_s16 = vmul_s16(row4, quant_row4); - - int32x4_t tmp0 = vshll_n_s16(vadd_s16(z2_s16, z3_s16), CONST_BITS); - int32x4_t tmp1 = vshll_n_s16(vsub_s16(z2_s16, z3_s16), CONST_BITS); - - int32x4_t tmp10 = vaddq_s32(tmp0, tmp3); - int32x4_t tmp13 = vsubq_s32(tmp0, tmp3); - int32x4_t tmp11 = vaddq_s32(tmp1, tmp2); - int32x4_t tmp12 = vsubq_s32(tmp1, tmp2); - - /* Odd part. 
*/ - int16x4_t tmp0_s16 = vmul_s16(row7, quant_row7); - int16x4_t tmp1_s16 = vmul_s16(row5, quant_row5); - int16x4_t tmp2_s16 = vmul_s16(row3, quant_row3); - int16x4_t tmp3_s16 = vmul_s16(row1, quant_row1); - - z3_s16 = vadd_s16(tmp0_s16, tmp2_s16); - int16x4_t z4_s16 = vadd_s16(tmp1_s16, tmp3_s16); - - /* Implementation as per 'jpeg_idct_islow' in jidctint.c: - * z5 = (z3 + z4) * 1.175875602; - * z3 = z3 * -1.961570560; z4 = z4 * -0.390180644; - * z3 += z5; z4 += z5; - * - * This implementation: - * z3 = z3 * (1.175875602 - 1.961570560) + z4 * 1.175875602; - * z4 = z3 * 1.175875602 + z4 * (1.175875602 - 0.390180644); - */ - - int32x4_t z3 = vmull_lane_s16(z3_s16, consts.val[2], 3); - int32x4_t z4 = vmull_lane_s16(z3_s16, consts.val[1], 3); - z3 = vmlal_lane_s16(z3, z4_s16, consts.val[1], 3); - z4 = vmlal_lane_s16(z4, z4_s16, consts.val[2], 0); - - /* Implementation as per 'jpeg_idct_islow' in jidctint.c: - * z1 = tmp0 + tmp3; z2 = tmp1 + tmp2; - * tmp0 = tmp0 * 0.298631336; tmp1 = tmp1 * 2.053119869; - * tmp2 = tmp2 * 3.072711026; tmp3 = tmp3 * 1.501321110; - * z1 = z1 * -0.899976223; z2 = z2 * -2.562915447; - * tmp0 += z1 + z3; tmp1 += z2 + z4; - * tmp2 += z2 + z3; tmp3 += z1 + z4; - * - * This implementation: - * tmp0 = tmp0 * (0.298631336 - 0.899976223) + tmp3 * -0.899976223; - * tmp1 = tmp1 * (2.053119869 - 2.562915447) + tmp2 * -2.562915447; - * tmp2 = tmp1 * -2.562915447 + tmp2 * (3.072711026 - 2.562915447); - * tmp3 = tmp0 * -0.899976223 + tmp3 * (1.501321110 - 0.899976223); - * tmp0 += z3; tmp1 += z4; - * tmp2 += z3; tmp3 += z4; - */ - - tmp0 = vmull_lane_s16(tmp0_s16, consts.val[0], 3); - tmp1 = vmull_lane_s16(tmp1_s16, consts.val[1], 1); - tmp2 = vmull_lane_s16(tmp2_s16, consts.val[2], 2); - tmp3 = vmull_lane_s16(tmp3_s16, consts.val[1], 0); - - tmp0 = vmlsl_lane_s16(tmp0, tmp3_s16, consts.val[0], 0); - tmp1 = vmlsl_lane_s16(tmp1, tmp2_s16, consts.val[0], 2); - tmp2 = vmlsl_lane_s16(tmp2, tmp1_s16, consts.val[0], 2); - tmp3 = vmlsl_lane_s16(tmp3, 
tmp0_s16, consts.val[0], 0); - - tmp0 = vaddq_s32(tmp0, z3); - tmp1 = vaddq_s32(tmp1, z4); - tmp2 = vaddq_s32(tmp2, z3); - tmp3 = vaddq_s32(tmp3, z4); - - /* Final output stage: descale and narrow to 16-bit. */ - int16x4x4_t rows_0123 = { vrshrn_n_s32(vaddq_s32(tmp10, tmp3), DESCALE_P1), - vrshrn_n_s32(vaddq_s32(tmp11, tmp2), DESCALE_P1), - vrshrn_n_s32(vaddq_s32(tmp12, tmp1), DESCALE_P1), - vrshrn_n_s32(vaddq_s32(tmp13, tmp0), DESCALE_P1) - }; - int16x4x4_t rows_4567 = { vrshrn_n_s32(vsubq_s32(tmp13, tmp0), DESCALE_P1), - vrshrn_n_s32(vsubq_s32(tmp12, tmp1), DESCALE_P1), - vrshrn_n_s32(vsubq_s32(tmp11, tmp2), DESCALE_P1), - vrshrn_n_s32(vsubq_s32(tmp10, tmp3), DESCALE_P1) - }; - - /* Store 4x4 blocks to the intermediate workspace ready for second pass. */ - /* (VST4 transposes the blocks - we need to operate on rows in next pass.) */ - vst4_s16(workspace_1, rows_0123); - vst4_s16(workspace_2, rows_4567); -} - - -/* Performs dequantization and the first pass of the slow-but-accurate inverse - * DCT on a 4x8 block of coefficients. - * - * This 'sparse' version assumes that the AC coefficients in rows 4, 5, 6 and 7 - * are all 0. This simplifies the IDCT calculation, accelerating overall - * performance. - */ - -static inline void jsimd_idct_islow_pass1_sparse(int16x4_t row0, - int16x4_t row1, - int16x4_t row2, - int16x4_t row3, - int16x4_t quant_row0, - int16x4_t quant_row1, - int16x4_t quant_row2, - int16x4_t quant_row3, - int16_t *workspace_1, - int16_t *workspace_2) -{ - /* Load constants for IDCT computation. */ -#if defined(__aarch64__) || defined(__ARM64__) || defined(_M_ARM64) - const int16x4x3_t consts = vld1_s16_x3(jsimd_idct_islow_neon_consts); -#else - const int16x4x3_t consts = { vld1_s16(jsimd_idct_islow_neon_consts), - vld1_s16(jsimd_idct_islow_neon_consts + 4), - vld1_s16(jsimd_idct_islow_neon_consts + 8) }; -#endif - - /* Even part. */ - int16x4_t z2_s16 = vmul_s16(row2, quant_row2); - /* z3 is all 0. 
*/ - - int32x4_t tmp2 = vmull_lane_s16(z2_s16, consts.val[0], 1); - int32x4_t tmp3 = vmull_lane_s16(z2_s16, consts.val[1], 2); - - z2_s16 = vmul_s16(row0, quant_row0); - int32x4_t tmp0 = vshll_n_s16(z2_s16, CONST_BITS); - int32x4_t tmp1 = vshll_n_s16(z2_s16, CONST_BITS); - - int32x4_t tmp10 = vaddq_s32(tmp0, tmp3); - int32x4_t tmp13 = vsubq_s32(tmp0, tmp3); - int32x4_t tmp11 = vaddq_s32(tmp1, tmp2); - int32x4_t tmp12 = vsubq_s32(tmp1, tmp2); - - /* Odd part. */ - /* tmp0 and tmp1 are both all 0. */ - int16x4_t tmp2_s16 = vmul_s16(row3, quant_row3); - int16x4_t tmp3_s16 = vmul_s16(row1, quant_row1); - - int16x4_t z3_s16 = tmp2_s16; - int16x4_t z4_s16 = tmp3_s16; - - int32x4_t z3 = vmull_lane_s16(z3_s16, consts.val[2], 3); - int32x4_t z4 = vmull_lane_s16(z3_s16, consts.val[1], 3); - z3 = vmlal_lane_s16(z3, z4_s16, consts.val[1], 3); - z4 = vmlal_lane_s16(z4, z4_s16, consts.val[2], 0); - - tmp0 = vmlsl_lane_s16(z3, tmp3_s16, consts.val[0], 0); - tmp1 = vmlsl_lane_s16(z4, tmp2_s16, consts.val[0], 2); - tmp2 = vmlal_lane_s16(z3, tmp2_s16, consts.val[2], 2); - tmp3 = vmlal_lane_s16(z4, tmp3_s16, consts.val[1], 0); - - /* Final output stage: descale and narrow to 16-bit. */ - int16x4x4_t rows_0123 = { vrshrn_n_s32(vaddq_s32(tmp10, tmp3), DESCALE_P1), - vrshrn_n_s32(vaddq_s32(tmp11, tmp2), DESCALE_P1), - vrshrn_n_s32(vaddq_s32(tmp12, tmp1), DESCALE_P1), - vrshrn_n_s32(vaddq_s32(tmp13, tmp0), DESCALE_P1) - }; - int16x4x4_t rows_4567 = { vrshrn_n_s32(vsubq_s32(tmp13, tmp0), DESCALE_P1), - vrshrn_n_s32(vsubq_s32(tmp12, tmp1), DESCALE_P1), - vrshrn_n_s32(vsubq_s32(tmp11, tmp2), DESCALE_P1), - vrshrn_n_s32(vsubq_s32(tmp10, tmp3), DESCALE_P1) - }; - - /* Store 4x4 blocks to the intermediate workspace ready for second pass. */ - /* (VST4 transposes the blocks - we need to operate on rows in next pass.) 
*/ - vst4_s16(workspace_1, rows_0123); - vst4_s16(workspace_2, rows_4567); -} - - -/* Performs the second pass of the slow-but-accurate inverse DCT on a 4x8 block - * of coefficients. (To process the full 8x8 DCT block this function - or some - * other optimized variant - needs to be called on both the right and left 4x8 - * blocks.) - * - * This 'regular' version assumes that no optimization can be made to the IDCT - * calculation since no useful set of coefficient values are all 0 after the - * first pass. - * - * Again, the original C implementation of the slow IDCT 'jpeg_idct_slow' can - * be found in jidctint.c. Algorithmic changes made here are documented inline. - */ - -static inline void jsimd_idct_islow_pass2_regular(int16_t *workspace, - JSAMPARRAY output_buf, - JDIMENSION output_col, - unsigned buf_offset) -{ - /* Load constants for IDCT computation. */ -#if defined(__aarch64__) || defined(__ARM64__) || defined(_M_ARM64) - const int16x4x3_t consts = vld1_s16_x3(jsimd_idct_islow_neon_consts); -#else - const int16x4x3_t consts = { vld1_s16(jsimd_idct_islow_neon_consts), - vld1_s16(jsimd_idct_islow_neon_consts + 4), - vld1_s16(jsimd_idct_islow_neon_consts + 8) }; -#endif - - /* Even part. 
*/ - int16x4_t z2_s16 = vld1_s16(workspace + 2 * DCTSIZE / 2); - int16x4_t z3_s16 = vld1_s16(workspace + 6 * DCTSIZE / 2); - - int32x4_t tmp2 = vmull_lane_s16(z2_s16, consts.val[0], 1); - int32x4_t tmp3 = vmull_lane_s16(z2_s16, consts.val[1], 2); - tmp2 = vmlal_lane_s16(tmp2, z3_s16, consts.val[2], 1); - tmp3 = vmlal_lane_s16(tmp3, z3_s16, consts.val[0], 1); - - z2_s16 = vld1_s16(workspace + 0 * DCTSIZE / 2); - z3_s16 = vld1_s16(workspace + 4 * DCTSIZE / 2); - - int32x4_t tmp0 = vshll_n_s16(vadd_s16(z2_s16, z3_s16), CONST_BITS); - int32x4_t tmp1 = vshll_n_s16(vsub_s16(z2_s16, z3_s16), CONST_BITS); - - int32x4_t tmp10 = vaddq_s32(tmp0, tmp3); - int32x4_t tmp13 = vsubq_s32(tmp0, tmp3); - int32x4_t tmp11 = vaddq_s32(tmp1, tmp2); - int32x4_t tmp12 = vsubq_s32(tmp1, tmp2); - - /* Odd part. */ - int16x4_t tmp0_s16 = vld1_s16(workspace + 7 * DCTSIZE / 2); - int16x4_t tmp1_s16 = vld1_s16(workspace + 5 * DCTSIZE / 2); - int16x4_t tmp2_s16 = vld1_s16(workspace + 3 * DCTSIZE / 2); - int16x4_t tmp3_s16 = vld1_s16(workspace + 1 * DCTSIZE / 2); - - z3_s16 = vadd_s16(tmp0_s16, tmp2_s16); - int16x4_t z4_s16 = vadd_s16(tmp1_s16, tmp3_s16); - - /* Implementation as per 'jpeg_idct_islow' in jidctint.c: - * z5 = (z3 + z4) * 1.175875602; - * z3 = z3 * -1.961570560; z4 = z4 * -0.390180644; - * z3 += z5; z4 += z5; - * - * This implementation: - * z3 = z3 * (1.175875602 - 1.961570560) + z4 * 1.175875602; - * z4 = z3 * 1.175875602 + z4 * (1.175875602 - 0.390180644); - */ - - int32x4_t z3 = vmull_lane_s16(z3_s16, consts.val[2], 3); - int32x4_t z4 = vmull_lane_s16(z3_s16, consts.val[1], 3); - z3 = vmlal_lane_s16(z3, z4_s16, consts.val[1], 3); - z4 = vmlal_lane_s16(z4, z4_s16, consts.val[2], 0); - - /* Implementation as per 'jpeg_idct_islow' in jidctint.c: - * z1 = tmp0 + tmp3; z2 = tmp1 + tmp2; - * tmp0 = tmp0 * 0.298631336; tmp1 = tmp1 * 2.053119869; - * tmp2 = tmp2 * 3.072711026; tmp3 = tmp3 * 1.501321110; - * z1 = z1 * -0.899976223; z2 = z2 * -2.562915447; - * tmp0 += z1 + z3; tmp1 += z2 
+ z4; - * tmp2 += z2 + z3; tmp3 += z1 + z4; - * - * This implementation: - * tmp0 = tmp0 * (0.298631336 - 0.899976223) + tmp3 * -0.899976223; - * tmp1 = tmp1 * (2.053119869 - 2.562915447) + tmp2 * -2.562915447; - * tmp2 = tmp1 * -2.562915447 + tmp2 * (3.072711026 - 2.562915447); - * tmp3 = tmp0 * -0.899976223 + tmp3 * (1.501321110 - 0.899976223); - * tmp0 += z3; tmp1 += z4; - * tmp2 += z3; tmp3 += z4; - */ - - tmp0 = vmull_lane_s16(tmp0_s16, consts.val[0], 3); - tmp1 = vmull_lane_s16(tmp1_s16, consts.val[1], 1); - tmp2 = vmull_lane_s16(tmp2_s16, consts.val[2], 2); - tmp3 = vmull_lane_s16(tmp3_s16, consts.val[1], 0); - - tmp0 = vmlsl_lane_s16(tmp0, tmp3_s16, consts.val[0], 0); - tmp1 = vmlsl_lane_s16(tmp1, tmp2_s16, consts.val[0], 2); - tmp2 = vmlsl_lane_s16(tmp2, tmp1_s16, consts.val[0], 2); - tmp3 = vmlsl_lane_s16(tmp3, tmp0_s16, consts.val[0], 0); - - tmp0 = vaddq_s32(tmp0, z3); - tmp1 = vaddq_s32(tmp1, z4); - tmp2 = vaddq_s32(tmp2, z3); - tmp3 = vaddq_s32(tmp3, z4); - - /* Final output stage: descale and narrow to 16-bit. */ - int16x8_t cols_02_s16 = vcombine_s16(vaddhn_s32(tmp10, tmp3), - vaddhn_s32(tmp12, tmp1)); - int16x8_t cols_13_s16 = vcombine_s16(vaddhn_s32(tmp11, tmp2), - vaddhn_s32(tmp13, tmp0)); - int16x8_t cols_46_s16 = vcombine_s16(vsubhn_s32(tmp13, tmp0), - vsubhn_s32(tmp11, tmp2)); - int16x8_t cols_57_s16 = vcombine_s16(vsubhn_s32(tmp12, tmp1), - vsubhn_s32(tmp10, tmp3)); - /* Descale and narrow to 8-bit. */ - int8x8_t cols_02_s8 = vqrshrn_n_s16(cols_02_s16, DESCALE_P2 - 16); - int8x8_t cols_13_s8 = vqrshrn_n_s16(cols_13_s16, DESCALE_P2 - 16); - int8x8_t cols_46_s8 = vqrshrn_n_s16(cols_46_s16, DESCALE_P2 - 16); - int8x8_t cols_57_s8 = vqrshrn_n_s16(cols_57_s16, DESCALE_P2 - 16); - /* Clamp to range [0-255]. 
*/ - uint8x8_t cols_02_u8 = vadd_u8(vreinterpret_u8_s8(cols_02_s8), - vdup_n_u8(CENTERJSAMPLE)); - uint8x8_t cols_13_u8 = vadd_u8(vreinterpret_u8_s8(cols_13_s8), - vdup_n_u8(CENTERJSAMPLE)); - uint8x8_t cols_46_u8 = vadd_u8(vreinterpret_u8_s8(cols_46_s8), - vdup_n_u8(CENTERJSAMPLE)); - uint8x8_t cols_57_u8 = vadd_u8(vreinterpret_u8_s8(cols_57_s8), - vdup_n_u8(CENTERJSAMPLE)); - - /* Transpose 4x8 block and store to memory. */ - /* Zipping adjacent columns together allows us to store 16-bit elements. */ - uint8x8x2_t cols_01_23 = vzip_u8(cols_02_u8, cols_13_u8); - uint8x8x2_t cols_45_67 = vzip_u8(cols_46_u8, cols_57_u8); - uint16x4x4_t cols_01_23_45_67 = { vreinterpret_u16_u8(cols_01_23.val[0]), - vreinterpret_u16_u8(cols_01_23.val[1]), - vreinterpret_u16_u8(cols_45_67.val[0]), - vreinterpret_u16_u8(cols_45_67.val[1]) - }; - - JSAMPROW outptr0 = output_buf[buf_offset + 0] + output_col; - JSAMPROW outptr1 = output_buf[buf_offset + 1] + output_col; - JSAMPROW outptr2 = output_buf[buf_offset + 2] + output_col; - JSAMPROW outptr3 = output_buf[buf_offset + 3] + output_col; - /* VST4 of 16-bit elements completes the transpose. */ - vst4_lane_u16((uint16_t *)outptr0, cols_01_23_45_67, 0); - vst4_lane_u16((uint16_t *)outptr1, cols_01_23_45_67, 1); - vst4_lane_u16((uint16_t *)outptr2, cols_01_23_45_67, 2); - vst4_lane_u16((uint16_t *)outptr3, cols_01_23_45_67, 3); -} - - -/* Performs the second pass of the slow-but-accurate inverse DCT on a 4x8 block - * of coefficients. - * - * This 'sparse' version assumes that the coefficient values (after the first - * pass) in rows 4, 5, 6 and 7 are all 0. This simplifies the IDCT calculation, - * accelerating overall performance. - */ - -static inline void jsimd_idct_islow_pass2_sparse(int16_t *workspace, - JSAMPARRAY output_buf, - JDIMENSION output_col, - unsigned buf_offset) -{ - /* Load constants for IDCT computation. 
*/ -#if defined(__aarch64__) || defined(__ARM64__) || defined(_M_ARM64) - const int16x4x3_t consts = vld1_s16_x3(jsimd_idct_islow_neon_consts); -#else - const int16x4x3_t consts = { vld1_s16(jsimd_idct_islow_neon_consts), - vld1_s16(jsimd_idct_islow_neon_consts + 4), - vld1_s16(jsimd_idct_islow_neon_consts + 8) }; -#endif - - /* Even part. */ - int16x4_t z2_s16 = vld1_s16(workspace + 2 * DCTSIZE / 2); - /* z3 is all 0. */ - - int32x4_t tmp2 = vmull_lane_s16(z2_s16, consts.val[0], 1); - int32x4_t tmp3 = vmull_lane_s16(z2_s16, consts.val[1], 2); - - z2_s16 = vld1_s16(workspace + 0 * DCTSIZE / 2); - int32x4_t tmp0 = vshll_n_s16(z2_s16, CONST_BITS); - int32x4_t tmp1 = vshll_n_s16(z2_s16, CONST_BITS); - - int32x4_t tmp10 = vaddq_s32(tmp0, tmp3); - int32x4_t tmp13 = vsubq_s32(tmp0, tmp3); - int32x4_t tmp11 = vaddq_s32(tmp1, tmp2); - int32x4_t tmp12 = vsubq_s32(tmp1, tmp2); - - /* Odd part. */ - /* tmp0 and tmp1 are both all 0. */ - int16x4_t tmp2_s16 = vld1_s16(workspace + 3 * DCTSIZE / 2); - int16x4_t tmp3_s16 = vld1_s16(workspace + 1 * DCTSIZE / 2); - - int16x4_t z3_s16 = tmp2_s16; - int16x4_t z4_s16 = tmp3_s16; - - int32x4_t z3 = vmull_lane_s16(z3_s16, consts.val[2], 3); - z3 = vmlal_lane_s16(z3, z4_s16, consts.val[1], 3); - int32x4_t z4 = vmull_lane_s16(z3_s16, consts.val[1], 3); - z4 = vmlal_lane_s16(z4, z4_s16, consts.val[2], 0); - - tmp0 = vmlsl_lane_s16(z3, tmp3_s16, consts.val[0], 0); - tmp1 = vmlsl_lane_s16(z4, tmp2_s16, consts.val[0], 2); - tmp2 = vmlal_lane_s16(z3, tmp2_s16, consts.val[2], 2); - tmp3 = vmlal_lane_s16(z4, tmp3_s16, consts.val[1], 0); - - /* Final output stage: descale and narrow to 16-bit. 
*/ - int16x8_t cols_02_s16 = vcombine_s16(vaddhn_s32(tmp10, tmp3), - vaddhn_s32(tmp12, tmp1)); - int16x8_t cols_13_s16 = vcombine_s16(vaddhn_s32(tmp11, tmp2), - vaddhn_s32(tmp13, tmp0)); - int16x8_t cols_46_s16 = vcombine_s16(vsubhn_s32(tmp13, tmp0), - vsubhn_s32(tmp11, tmp2)); - int16x8_t cols_57_s16 = vcombine_s16(vsubhn_s32(tmp12, tmp1), - vsubhn_s32(tmp10, tmp3)); - /* Descale and narrow to 8-bit. */ - int8x8_t cols_02_s8 = vqrshrn_n_s16(cols_02_s16, DESCALE_P2 - 16); - int8x8_t cols_13_s8 = vqrshrn_n_s16(cols_13_s16, DESCALE_P2 - 16); - int8x8_t cols_46_s8 = vqrshrn_n_s16(cols_46_s16, DESCALE_P2 - 16); - int8x8_t cols_57_s8 = vqrshrn_n_s16(cols_57_s16, DESCALE_P2 - 16); - /* Clamp to range [0-255]. */ - uint8x8_t cols_02_u8 = vadd_u8(vreinterpret_u8_s8(cols_02_s8), - vdup_n_u8(CENTERJSAMPLE)); - uint8x8_t cols_13_u8 = vadd_u8(vreinterpret_u8_s8(cols_13_s8), - vdup_n_u8(CENTERJSAMPLE)); - uint8x8_t cols_46_u8 = vadd_u8(vreinterpret_u8_s8(cols_46_s8), - vdup_n_u8(CENTERJSAMPLE)); - uint8x8_t cols_57_u8 = vadd_u8(vreinterpret_u8_s8(cols_57_s8), - vdup_n_u8(CENTERJSAMPLE)); - - /* Transpose 4x8 block and store to memory. */ - /* Zipping adjacent columns together allow us to store 16-bit elements. */ - uint8x8x2_t cols_01_23 = vzip_u8(cols_02_u8, cols_13_u8); - uint8x8x2_t cols_45_67 = vzip_u8(cols_46_u8, cols_57_u8); - uint16x4x4_t cols_01_23_45_67 = { vreinterpret_u16_u8(cols_01_23.val[0]), - vreinterpret_u16_u8(cols_01_23.val[1]), - vreinterpret_u16_u8(cols_45_67.val[0]), - vreinterpret_u16_u8(cols_45_67.val[1]) - }; - - JSAMPROW outptr0 = output_buf[buf_offset + 0] + output_col; - JSAMPROW outptr1 = output_buf[buf_offset + 1] + output_col; - JSAMPROW outptr2 = output_buf[buf_offset + 2] + output_col; - JSAMPROW outptr3 = output_buf[buf_offset + 3] + output_col; - /* VST4 of 16-bit elements completes the transpose. 
*/ - vst4_lane_u16((uint16_t *)outptr0, cols_01_23_45_67, 0); - vst4_lane_u16((uint16_t *)outptr1, cols_01_23_45_67, 1); - vst4_lane_u16((uint16_t *)outptr2, cols_01_23_45_67, 2); - vst4_lane_u16((uint16_t *)outptr3, cols_01_23_45_67, 3); -} diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jidctred-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jidctred-neon.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jidctred-neon.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jidctred-neon.c 1970-01-01 01:00:00.000000000 +0100 @@ -1,475 +0,0 @@ -/* - * jidctred-neon.c - reduced-size IDCT (Arm NEON) - * - * Copyright 2019 The Chromium Authors. All Rights Reserved. - * - * This software is provided 'as-is', without any express or implied - * warranty. In no event will the authors be held liable for any damages - * arising from the use of this software. - * - * Permission is granted to anyone to use this software for any purpose, - * including commercial applications, and to alter it and redistribute it - * freely, subject to the following restrictions: - * - * 1. The origin of this software must not be misrepresented; you must not - * claim that you wrote the original software. If you use this software - * in a product, an acknowledgment in the product documentation would be - * appreciated but is not required. - * 2. Altered source versions must be plainly marked as such, and must not be - * misrepresented as being the original software. - * 3. This notice may not be removed or altered from any source distribution. 
- */ - -#define JPEG_INTERNALS -#include "../../../jconfigint.h" -#include "../../../jinclude.h" -#include "../../../jpeglib.h" -#include "../../../jsimd.h" -#include "../../../jdct.h" -#include "../../../jsimddct.h" -#include "../../jsimd.h" - -#include - -#define CONST_BITS 13 -#define PASS1_BITS 2 - -#define F_0_211 1730 -#define F_0_509 4176 -#define F_0_601 4926 -#define F_0_720 5906 -#define F_0_765 6270 -#define F_0_850 6967 -#define F_0_899 7373 -#define F_1_061 8697 -#define F_1_272 10426 -#define F_1_451 11893 -#define F_1_847 15137 -#define F_2_172 17799 -#define F_2_562 20995 -#define F_3_624 29692 - -/* - * 'jsimd_idct_2x2_neon' is an inverse-DCT function for getting reduced-size - * 2x2 pixels output from an 8x8 DCT block. It uses the same calculations and - * produces exactly the same output as IJG's original 'jpeg_idct_2x2' function - * from jpeg-6b, which can be found in jidctred.c. - * - * Scaled integer constants are used to avoid floating-point arithmetic: - * 0.720959822 = 5906 * 2^-13 - * 0.850430095 = 6967 * 2^-13 - * 1.272758580 = 10426 * 2^-13 - * 3.624509785 = 29692 * 2^-13 - * - * See jidctred.c for further details of the 2x2 reduced IDCT algorithm. Where - * possible, the variable names and comments here in 'jsimd_idct_2x2_neon' - * match up with those in 'jpeg_idct_2x2'. - * - * NOTE: jpeg-8 has an improved implementation of the 2x2 inverse-DCT which - * requires fewer arithmetic operations and hence should be faster. The - * primary purpose of this particular NEON optimized function is bit - * exact compatibility with jpeg-6b. - */ - -void jsimd_idct_2x2_neon(void *dct_table, - JCOEFPTR coef_block, - JSAMPARRAY restrict output_buf, - JDIMENSION output_col) -{ - ISLOW_MULT_TYPE *quantptr = dct_table; - - /* Load DCT coefficients. 
*/ - int16x8_t row0 = vld1q_s16(coef_block + 0 * DCTSIZE); - int16x8_t row1 = vld1q_s16(coef_block + 1 * DCTSIZE); - int16x8_t row3 = vld1q_s16(coef_block + 3 * DCTSIZE); - int16x8_t row5 = vld1q_s16(coef_block + 5 * DCTSIZE); - int16x8_t row7 = vld1q_s16(coef_block + 7 * DCTSIZE); - - /* Load DCT quantization table. */ - int16x8_t quant_row0 = vld1q_s16(quantptr + 0 * DCTSIZE); - int16x8_t quant_row1 = vld1q_s16(quantptr + 1 * DCTSIZE); - int16x8_t quant_row3 = vld1q_s16(quantptr + 3 * DCTSIZE); - int16x8_t quant_row5 = vld1q_s16(quantptr + 5 * DCTSIZE); - int16x8_t quant_row7 = vld1q_s16(quantptr + 7 * DCTSIZE); - - /* Dequantize DCT coefficients. */ - row0 = vmulq_s16(row0, quant_row0); - row1 = vmulq_s16(row1, quant_row1); - row3 = vmulq_s16(row3, quant_row3); - row5 = vmulq_s16(row5, quant_row5); - row7 = vmulq_s16(row7, quant_row7); - - /* Pass 1: process input columns; put results in vectors row0 and row1. */ - /* Even part. */ - int32x4_t tmp10_l = vshll_n_s16(vget_low_s16(row0), CONST_BITS + 2); - int32x4_t tmp10_h = vshll_n_s16(vget_high_s16(row0), CONST_BITS + 2); - - /* Odd part. */ - int32x4_t tmp0_l = vmull_n_s16(vget_low_s16(row1), F_3_624); - tmp0_l = vmlal_n_s16(tmp0_l, vget_low_s16(row3), -F_1_272); - tmp0_l = vmlal_n_s16(tmp0_l, vget_low_s16(row5), F_0_850); - tmp0_l = vmlal_n_s16(tmp0_l, vget_low_s16(row7), -F_0_720); - int32x4_t tmp0_h = vmull_n_s16(vget_high_s16(row1), F_3_624); - tmp0_h = vmlal_n_s16(tmp0_h, vget_high_s16(row3), -F_1_272); - tmp0_h = vmlal_n_s16(tmp0_h, vget_high_s16(row5), F_0_850); - tmp0_h = vmlal_n_s16(tmp0_h, vget_high_s16(row7), -F_0_720); - - /* Final output stage: descale and narrow to 16-bit. */ - row0 = vcombine_s16(vrshrn_n_s32(vaddq_s32(tmp10_l, tmp0_l), CONST_BITS), - vrshrn_n_s32(vaddq_s32(tmp10_h, tmp0_h), CONST_BITS)); - row1 = vcombine_s16(vrshrn_n_s32(vsubq_s32(tmp10_l, tmp0_l), CONST_BITS), - vrshrn_n_s32(vsubq_s32(tmp10_h, tmp0_h), CONST_BITS)); - - /* Transpose two rows ready for second pass. 
*/ - int16x8x2_t cols_0246_1357 = vtrnq_s16(row0, row1); - int16x8_t cols_0246 = cols_0246_1357.val[0]; - int16x8_t cols_1357 = cols_0246_1357.val[1]; - /* Duplicate columns such that each is accessible in its own vector. */ - int32x4x2_t cols_1155_3377 = vtrnq_s32(vreinterpretq_s32_s16(cols_1357), - vreinterpretq_s32_s16(cols_1357)); - int16x8_t cols_1155 = vreinterpretq_s16_s32(cols_1155_3377.val[0]); - int16x8_t cols_3377 = vreinterpretq_s16_s32(cols_1155_3377.val[1]); - - /* Pass 2: process 2 rows, store to output array. */ - /* Even part: only interested in col0; top half of tmp10 is "don't care". */ - int32x4_t tmp10 = vshll_n_s16(vget_low_s16(cols_0246), CONST_BITS + 2); - - /* Odd part. Only interested in bottom half of tmp0. */ - int32x4_t tmp0 = vmull_n_s16(vget_low_s16(cols_1155), F_3_624); - tmp0 = vmlal_n_s16(tmp0, vget_low_s16(cols_3377), -F_1_272); - tmp0 = vmlal_n_s16(tmp0, vget_high_s16(cols_1155), F_0_850); - tmp0 = vmlal_n_s16(tmp0, vget_high_s16(cols_3377), -F_0_720); - - /* Final output stage: descale and clamp to range [0-255]. */ - int16x8_t output_s16 = vcombine_s16(vaddhn_s32(tmp10, tmp0), - vsubhn_s32(tmp10, tmp0)); - output_s16 = vrsraq_n_s16(vdupq_n_s16(CENTERJSAMPLE), output_s16, - CONST_BITS + PASS1_BITS + 3 + 2 - 16); - /* Narrow to 8-bit and convert to unsigned. */ - uint8x8_t output_u8 = vqmovun_s16(output_s16); - - /* Store 2x2 block to memory. */ - vst1_lane_u8(output_buf[0] + output_col, output_u8, 0); - vst1_lane_u8(output_buf[1] + output_col, output_u8, 1); - vst1_lane_u8(output_buf[0] + output_col + 1, output_u8, 4); - vst1_lane_u8(output_buf[1] + output_col + 1, output_u8, 5); -} - - -/* - * 'jsimd_idct_4x4_neon' is an inverse-DCT function for getting reduced-size - * 4x4 pixels output from an 8x8 DCT block. It uses the same calculations and - * produces exactly the same output as IJG's original 'jpeg_idct_4x4' function - * from jpeg-6b, which can be found in jidctred.c. 
- * - * Scaled integer constants are used to avoid floating-point arithmetic: - * 0.211164243 = 1730 * 2^-13 - * 0.509795579 = 4176 * 2^-13 - * 0.601344887 = 4926 * 2^-13 - * 0.765366865 = 6270 * 2^-13 - * 0.899976223 = 7373 * 2^-13 - * 1.061594337 = 8697 * 2^-13 - * 1.451774981 = 11893 * 2^-13 - * 1.847759065 = 15137 * 2^-13 - * 2.172734803 = 17799 * 2^-13 - * 2.562915447 = 20995 * 2^-13 - * - * See jidctred.c for further details of the 4x4 reduced IDCT algorithm. Where - * possible, the variable names and comments here in 'jsimd_idct_4x4_neon' - * match up with those in 'jpeg_idct_4x4'. - * - * NOTE: jpeg-8 has an improved implementation of the 4x4 inverse-DCT which - * requires fewer arithmetic operations and hence should be faster. The - * primary purpose of this particular NEON optimized function is bit - * exact compatibility with jpeg-6b. - */ - -ALIGN(16) static const int16_t jsimd_idct_4x4_neon_consts[] = { - F_1_847, -F_0_765, -F_0_211, F_1_451, - -F_2_172, F_1_061, -F_0_509, -F_0_601, - F_0_899, F_2_562, 0, 0 - }; - -void jsimd_idct_4x4_neon(void *dct_table, - JCOEFPTR coef_block, - JSAMPARRAY restrict output_buf, - JDIMENSION output_col) -{ - ISLOW_MULT_TYPE *quantptr = dct_table; - - /* Load DCT coefficients. */ - int16x8_t row0 = vld1q_s16(coef_block + 0 * DCTSIZE); - int16x8_t row1 = vld1q_s16(coef_block + 1 * DCTSIZE); - int16x8_t row2 = vld1q_s16(coef_block + 2 * DCTSIZE); - int16x8_t row3 = vld1q_s16(coef_block + 3 * DCTSIZE); - int16x8_t row5 = vld1q_s16(coef_block + 5 * DCTSIZE); - int16x8_t row6 = vld1q_s16(coef_block + 6 * DCTSIZE); - int16x8_t row7 = vld1q_s16(coef_block + 7 * DCTSIZE); - - /* Load quantization table values for DC coefficients. */ - int16x8_t quant_row0 = vld1q_s16(quantptr + 0 * DCTSIZE); - /* Dequantize DC coefficients. */ - row0 = vmulq_s16(row0, quant_row0); - - /* Construct bitmap to test if all AC coefficients are 0. 
*/ - int16x8_t bitmap = vorrq_s16(row1, row2); - bitmap = vorrq_s16(bitmap, row3); - bitmap = vorrq_s16(bitmap, row5); - bitmap = vorrq_s16(bitmap, row6); - bitmap = vorrq_s16(bitmap, row7); - - int64_t left_ac_bitmap = vgetq_lane_s64(vreinterpretq_s64_s16(bitmap), 0); - int64_t right_ac_bitmap = vgetq_lane_s64(vreinterpretq_s64_s16(bitmap), 1); - - /* Load constants for IDCT computation. */ -#if defined(__aarch64__) || defined(__ARM64__) || defined(_M_ARM64) - const int16x4x3_t consts = vld1_s16_x3(jsimd_idct_4x4_neon_consts); -#else - const int16x4x3_t consts = { vld1_s16(jsimd_idct_4x4_neon_consts), - vld1_s16(jsimd_idct_4x4_neon_consts + 4), - vld1_s16(jsimd_idct_4x4_neon_consts + 8) }; -#endif - - if (left_ac_bitmap == 0 && right_ac_bitmap == 0) { - /* All AC coefficients are zero. */ - /* Compute DC values and duplicate into row vectors 0, 1, 2 and 3. */ - int16x8_t dcval = vshlq_n_s16(row0, PASS1_BITS); - row0 = dcval; - row1 = dcval; - row2 = dcval; - row3 = dcval; - } else if (left_ac_bitmap == 0) { - /* AC coefficients are zero for columns 0, 1, 2 and 3. */ - /* Compute DC values for these columns. */ - int16x4_t dcval = vshl_n_s16(vget_low_s16(row0), PASS1_BITS); - - /* Commence regular IDCT computation for columns 4, 5, 6 and 7. */ - /* Load quantization table. */ - int16x4_t quant_row1 = vld1_s16(quantptr + 1 * DCTSIZE + 4); - int16x4_t quant_row2 = vld1_s16(quantptr + 2 * DCTSIZE + 4); - int16x4_t quant_row3 = vld1_s16(quantptr + 3 * DCTSIZE + 4); - int16x4_t quant_row5 = vld1_s16(quantptr + 5 * DCTSIZE + 4); - int16x4_t quant_row6 = vld1_s16(quantptr + 6 * DCTSIZE + 4); - int16x4_t quant_row7 = vld1_s16(quantptr + 7 * DCTSIZE + 4); - - /* Even part. 
*/ - int32x4_t tmp0 = vshll_n_s16(vget_high_s16(row0), CONST_BITS + 1); - - int16x4_t z2 = vmul_s16(vget_high_s16(row2), quant_row2); - int16x4_t z3 = vmul_s16(vget_high_s16(row6), quant_row6); - - int32x4_t tmp2 = vmull_lane_s16(z2, consts.val[0], 0); - tmp2 = vmlal_lane_s16(tmp2, z3, consts.val[0], 1); - - int32x4_t tmp10 = vaddq_s32(tmp0, tmp2); - int32x4_t tmp12 = vsubq_s32(tmp0, tmp2); - - /* Odd part. */ - int16x4_t z1 = vmul_s16(vget_high_s16(row7), quant_row7); - z2 = vmul_s16(vget_high_s16(row5), quant_row5); - z3 = vmul_s16(vget_high_s16(row3), quant_row3); - int16x4_t z4 = vmul_s16(vget_high_s16(row1), quant_row1); - - tmp0 = vmull_lane_s16(z1, consts.val[0], 2); - tmp0 = vmlal_lane_s16(tmp0, z2, consts.val[0], 3); - tmp0 = vmlal_lane_s16(tmp0, z3, consts.val[1], 0); - tmp0 = vmlal_lane_s16(tmp0, z4, consts.val[1], 1); - - tmp2 = vmull_lane_s16(z1, consts.val[1], 2); - tmp2 = vmlal_lane_s16(tmp2, z2, consts.val[1], 3); - tmp2 = vmlal_lane_s16(tmp2, z3, consts.val[2], 0); - tmp2 = vmlal_lane_s16(tmp2, z4, consts.val[2], 1); - - /* Final output stage: descale and narrow to 16-bit. */ - row0 = vcombine_s16(dcval, vrshrn_n_s32(vaddq_s32(tmp10, tmp2), - CONST_BITS - PASS1_BITS + 1)); - row3 = vcombine_s16(dcval, vrshrn_n_s32(vsubq_s32(tmp10, tmp2), - CONST_BITS - PASS1_BITS + 1)); - row1 = vcombine_s16(dcval, vrshrn_n_s32(vaddq_s32(tmp12, tmp0), - CONST_BITS - PASS1_BITS + 1)); - row2 = vcombine_s16(dcval, vrshrn_n_s32(vsubq_s32(tmp12, tmp0), - CONST_BITS - PASS1_BITS + 1)); - } else if (right_ac_bitmap == 0) { - /* AC coefficients are zero for columns 4, 5, 6 and 7. */ - /* Compute DC values for these columns. */ - int16x4_t dcval = vshl_n_s16(vget_high_s16(row0), PASS1_BITS); - - /* Commence regular IDCT computation for columns 0, 1, 2 and 3. */ - /* Load quantization table. 
*/ - int16x4_t quant_row1 = vld1_s16(quantptr + 1 * DCTSIZE); - int16x4_t quant_row2 = vld1_s16(quantptr + 2 * DCTSIZE); - int16x4_t quant_row3 = vld1_s16(quantptr + 3 * DCTSIZE); - int16x4_t quant_row5 = vld1_s16(quantptr + 5 * DCTSIZE); - int16x4_t quant_row6 = vld1_s16(quantptr + 6 * DCTSIZE); - int16x4_t quant_row7 = vld1_s16(quantptr + 7 * DCTSIZE); - - /* Even part. */ - int32x4_t tmp0 = vshll_n_s16(vget_low_s16(row0), CONST_BITS + 1); - - int16x4_t z2 = vmul_s16(vget_low_s16(row2), quant_row2); - int16x4_t z3 = vmul_s16(vget_low_s16(row6), quant_row6); - - int32x4_t tmp2 = vmull_lane_s16(z2, consts.val[0], 0); - tmp2 = vmlal_lane_s16(tmp2, z3, consts.val[0], 1); - - int32x4_t tmp10 = vaddq_s32(tmp0, tmp2); - int32x4_t tmp12 = vsubq_s32(tmp0, tmp2); - - /* Odd part. */ - int16x4_t z1 = vmul_s16(vget_low_s16(row7), quant_row7); - z2 = vmul_s16(vget_low_s16(row5), quant_row5); - z3 = vmul_s16(vget_low_s16(row3), quant_row3); - int16x4_t z4 = vmul_s16(vget_low_s16(row1), quant_row1); - - tmp0 = vmull_lane_s16(z1, consts.val[0], 2); - tmp0 = vmlal_lane_s16(tmp0, z2, consts.val[0], 3); - tmp0 = vmlal_lane_s16(tmp0, z3, consts.val[1], 0); - tmp0 = vmlal_lane_s16(tmp0, z4, consts.val[1], 1); - - tmp2 = vmull_lane_s16(z1, consts.val[1], 2); - tmp2 = vmlal_lane_s16(tmp2, z2, consts.val[1], 3); - tmp2 = vmlal_lane_s16(tmp2, z3, consts.val[2], 0); - tmp2 = vmlal_lane_s16(tmp2, z4, consts.val[2], 1); - - /* Final output stage: descale and narrow to 16-bit. */ - row0 = vcombine_s16(vrshrn_n_s32(vaddq_s32(tmp10, tmp2), - CONST_BITS - PASS1_BITS + 1), dcval); - row3 = vcombine_s16(vrshrn_n_s32(vsubq_s32(tmp10, tmp2), - CONST_BITS - PASS1_BITS + 1), dcval); - row1 = vcombine_s16(vrshrn_n_s32(vaddq_s32(tmp12, tmp0), - CONST_BITS - PASS1_BITS + 1), dcval); - row2 = vcombine_s16(vrshrn_n_s32(vsubq_s32(tmp12, tmp0), - CONST_BITS - PASS1_BITS + 1), dcval); - } else { - /* All AC coefficients are non-zero; full IDCT calculation required. 
*/ - int16x8_t quant_row1 = vld1q_s16(quantptr + 1 * DCTSIZE); - int16x8_t quant_row2 = vld1q_s16(quantptr + 2 * DCTSIZE); - int16x8_t quant_row3 = vld1q_s16(quantptr + 3 * DCTSIZE); - int16x8_t quant_row5 = vld1q_s16(quantptr + 5 * DCTSIZE); - int16x8_t quant_row6 = vld1q_s16(quantptr + 6 * DCTSIZE); - int16x8_t quant_row7 = vld1q_s16(quantptr + 7 * DCTSIZE); - - /* Even part. */ - int32x4_t tmp0_l = vshll_n_s16(vget_low_s16(row0), CONST_BITS + 1); - int32x4_t tmp0_h = vshll_n_s16(vget_high_s16(row0), CONST_BITS + 1); - - int16x8_t z2 = vmulq_s16(row2, quant_row2); - int16x8_t z3 = vmulq_s16(row6, quant_row6); - - int32x4_t tmp2_l = vmull_lane_s16(vget_low_s16(z2), consts.val[0], 0); - int32x4_t tmp2_h = vmull_lane_s16(vget_high_s16(z2), consts.val[0], 0); - tmp2_l = vmlal_lane_s16(tmp2_l, vget_low_s16(z3), consts.val[0], 1); - tmp2_h = vmlal_lane_s16(tmp2_h, vget_high_s16(z3), consts.val[0], 1); - - int32x4_t tmp10_l = vaddq_s32(tmp0_l, tmp2_l); - int32x4_t tmp10_h = vaddq_s32(tmp0_h, tmp2_h); - int32x4_t tmp12_l = vsubq_s32(tmp0_l, tmp2_l); - int32x4_t tmp12_h = vsubq_s32(tmp0_h, tmp2_h); - - /* Odd part. 
*/ - int16x8_t z1 = vmulq_s16(row7, quant_row7); - z2 = vmulq_s16(row5, quant_row5); - z3 = vmulq_s16(row3, quant_row3); - int16x8_t z4 = vmulq_s16(row1, quant_row1); - - tmp0_l = vmull_lane_s16(vget_low_s16(z1), consts.val[0], 2); - tmp0_l = vmlal_lane_s16(tmp0_l, vget_low_s16(z2), consts.val[0], 3); - tmp0_l = vmlal_lane_s16(tmp0_l, vget_low_s16(z3), consts.val[1], 0); - tmp0_l = vmlal_lane_s16(tmp0_l, vget_low_s16(z4), consts.val[1], 1); - tmp0_h = vmull_lane_s16(vget_high_s16(z1), consts.val[0], 2); - tmp0_h = vmlal_lane_s16(tmp0_h, vget_high_s16(z2), consts.val[0], 3); - tmp0_h = vmlal_lane_s16(tmp0_h, vget_high_s16(z3), consts.val[1], 0); - tmp0_h = vmlal_lane_s16(tmp0_h, vget_high_s16(z4), consts.val[1], 1); - - tmp2_l = vmull_lane_s16(vget_low_s16(z1), consts.val[1], 2); - tmp2_l = vmlal_lane_s16(tmp2_l, vget_low_s16(z2), consts.val[1], 3); - tmp2_l = vmlal_lane_s16(tmp2_l, vget_low_s16(z3), consts.val[2], 0); - tmp2_l = vmlal_lane_s16(tmp2_l, vget_low_s16(z4), consts.val[2], 1); - tmp2_h = vmull_lane_s16(vget_high_s16(z1), consts.val[1], 2); - tmp2_h = vmlal_lane_s16(tmp2_h, vget_high_s16(z2), consts.val[1], 3); - tmp2_h = vmlal_lane_s16(tmp2_h, vget_high_s16(z3), consts.val[2], 0); - tmp2_h = vmlal_lane_s16(tmp2_h, vget_high_s16(z4), consts.val[2], 1); - - /* Final output stage: descale and narrow to 16-bit. 
*/ - row0 = vcombine_s16(vrshrn_n_s32(vaddq_s32(tmp10_l, tmp2_l), - CONST_BITS - PASS1_BITS + 1), - vrshrn_n_s32(vaddq_s32(tmp10_h, tmp2_h), - CONST_BITS - PASS1_BITS + 1)); - row3 = vcombine_s16(vrshrn_n_s32(vsubq_s32(tmp10_l, tmp2_l), - CONST_BITS - PASS1_BITS + 1), - vrshrn_n_s32(vsubq_s32(tmp10_h, tmp2_h), - CONST_BITS - PASS1_BITS + 1)); - row1 = vcombine_s16(vrshrn_n_s32(vaddq_s32(tmp12_l, tmp0_l), - CONST_BITS - PASS1_BITS + 1), - vrshrn_n_s32(vaddq_s32(tmp12_h, tmp0_h), - CONST_BITS - PASS1_BITS + 1)); - row2 = vcombine_s16(vrshrn_n_s32(vsubq_s32(tmp12_l, tmp0_l), - CONST_BITS - PASS1_BITS + 1), - vrshrn_n_s32(vsubq_s32(tmp12_h, tmp0_h), - CONST_BITS - PASS1_BITS + 1)); - } - - /* Transpose 8x4 block to perform IDCT on rows in second pass. */ - int16x8x2_t row_01 = vtrnq_s16(row0, row1); - int16x8x2_t row_23 = vtrnq_s16(row2, row3); - - int32x4x2_t cols_0426 = vtrnq_s32(vreinterpretq_s32_s16(row_01.val[0]), - vreinterpretq_s32_s16(row_23.val[0])); - int32x4x2_t cols_1537 = vtrnq_s32(vreinterpretq_s32_s16(row_01.val[1]), - vreinterpretq_s32_s16(row_23.val[1])); - - int16x4_t col0 = vreinterpret_s16_s32(vget_low_s32(cols_0426.val[0])); - int16x4_t col1 = vreinterpret_s16_s32(vget_low_s32(cols_1537.val[0])); - int16x4_t col2 = vreinterpret_s16_s32(vget_low_s32(cols_0426.val[1])); - int16x4_t col3 = vreinterpret_s16_s32(vget_low_s32(cols_1537.val[1])); - int16x4_t col5 = vreinterpret_s16_s32(vget_high_s32(cols_1537.val[0])); - int16x4_t col6 = vreinterpret_s16_s32(vget_high_s32(cols_0426.val[1])); - int16x4_t col7 = vreinterpret_s16_s32(vget_high_s32(cols_1537.val[1])); - - /* Commence second pass of IDCT. */ - /* Even part. */ - int32x4_t tmp0 = vshll_n_s16(col0, CONST_BITS + 1); - int32x4_t tmp2 = vmull_lane_s16(col2, consts.val[0], 0); - tmp2 = vmlal_lane_s16(tmp2, col6, consts.val[0], 1); - - int32x4_t tmp10 = vaddq_s32(tmp0, tmp2); - int32x4_t tmp12 = vsubq_s32(tmp0, tmp2); - - /* Odd part. 
*/ - tmp0 = vmull_lane_s16(col7, consts.val[0], 2); - tmp0 = vmlal_lane_s16(tmp0, col5, consts.val[0], 3); - tmp0 = vmlal_lane_s16(tmp0, col3, consts.val[1], 0); - tmp0 = vmlal_lane_s16(tmp0, col1, consts.val[1], 1); - - tmp2 = vmull_lane_s16(col7, consts.val[1], 2); - tmp2 = vmlal_lane_s16(tmp2, col5, consts.val[1], 3); - tmp2 = vmlal_lane_s16(tmp2, col3, consts.val[2], 0); - tmp2 = vmlal_lane_s16(tmp2, col1, consts.val[2], 1); - - /* Final output stage: descale and clamp to range [0-255]. */ - int16x8_t output_cols_02 = vcombine_s16(vaddhn_s32(tmp10, tmp2), - vsubhn_s32(tmp12, tmp0)); - int16x8_t output_cols_13 = vcombine_s16(vaddhn_s32(tmp12, tmp0), - vsubhn_s32(tmp10, tmp2)); - output_cols_02 = vrsraq_n_s16(vdupq_n_s16(CENTERJSAMPLE), output_cols_02, - CONST_BITS + PASS1_BITS + 3 + 1 - 16); - output_cols_13 = vrsraq_n_s16(vdupq_n_s16(CENTERJSAMPLE), output_cols_13, - CONST_BITS + PASS1_BITS + 3 + 1 - 16); - /* Narrow to 8-bit and convert to unsigned while zipping 8-bit elements. */ - /* Interleaving store completes the transpose. */ - uint8x8x2_t output_0123 = vzip_u8(vqmovun_s16(output_cols_02), - vqmovun_s16(output_cols_13)); - uint16x4x2_t output_01_23 = { vreinterpret_u16_u8(output_0123.val[0]), - vreinterpret_u16_u8(output_0123.val[1]) - }; - - /* Store 4x4 block to memory. 
*/ - JSAMPROW outptr0 = output_buf[0] + output_col; - JSAMPROW outptr1 = output_buf[1] + output_col; - JSAMPROW outptr2 = output_buf[2] + output_col; - JSAMPROW outptr3 = output_buf[3] + output_col; - vst2_lane_u16((uint16_t *)outptr0, output_01_23, 0); - vst2_lane_u16((uint16_t *)outptr1, output_01_23, 1); - vst2_lane_u16((uint16_t *)outptr2, output_01_23, 2); - vst2_lane_u16((uint16_t *)outptr3, output_01_23, 3); -} diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jquanti-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jquanti-neon.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jquanti-neon.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/common/jquanti-neon.c 1970-01-01 01:00:00.000000000 +0100 @@ -1,190 +0,0 @@ -/* - * jquanti-neon.c - sample conversion and integer quantization (Arm NEON) - * - * Copyright 2020 The Chromium Authors. All Rights Reserved. - * - * This software is provided 'as-is', without any express or implied - * warranty. In no event will the authors be held liable for any damages - * arising from the use of this software. - * - * Permission is granted to anyone to use this software for any purpose, - * including commercial applications, and to alter it and redistribute it - * freely, subject to the following restrictions: - * - * 1. The origin of this software must not be misrepresented; you must not - * claim that you wrote the original software. If you use this software - * in a product, an acknowledgment in the product documentation would be - * appreciated but is not required. - * 2. Altered source versions must be plainly marked as such, and must not be - * misrepresented as being the original software. - * 3. This notice may not be removed or altered from any source distribution. 
- */ - -#define JPEG_INTERNALS -#include "../../../jinclude.h" -#include "../../../jpeglib.h" -#include "../../../jsimd.h" -#include "../../../jdct.h" -#include "../../../jsimddct.h" -#include "../../jsimd.h" - -#include - -/* - * Pixel channel sample values have range [0,255]. The Discrete Cosine - * Transform (DCT) operates on values centered around 0. - * - * To prepare sample values for the DCT, load samples into a DCT workspace, - * subtracting CENTREJSAMPLE (128). The samples, now in range [-128, 127], - * are also widened from 8- to 16-bit. - * - * The equivalent scalar C function 'convsamp' can be found in jcdctmgr.c. - */ - -void jsimd_convsamp_neon(JSAMPARRAY sample_data, - JDIMENSION start_col, - DCTELEM *workspace) -{ - uint8x8_t samp_row0 = vld1_u8(sample_data[0] + start_col); - uint8x8_t samp_row1 = vld1_u8(sample_data[1] + start_col); - uint8x8_t samp_row2 = vld1_u8(sample_data[2] + start_col); - uint8x8_t samp_row3 = vld1_u8(sample_data[3] + start_col); - uint8x8_t samp_row4 = vld1_u8(sample_data[4] + start_col); - uint8x8_t samp_row5 = vld1_u8(sample_data[5] + start_col); - uint8x8_t samp_row6 = vld1_u8(sample_data[6] + start_col); - uint8x8_t samp_row7 = vld1_u8(sample_data[7] + start_col); - - int16x8_t row0 = vreinterpretq_s16_u16(vsubl_u8(samp_row0, - vdup_n_u8(CENTERJSAMPLE))); - int16x8_t row1 = vreinterpretq_s16_u16(vsubl_u8(samp_row1, - vdup_n_u8(CENTERJSAMPLE))); - int16x8_t row2 = vreinterpretq_s16_u16(vsubl_u8(samp_row2, - vdup_n_u8(CENTERJSAMPLE))); - int16x8_t row3 = vreinterpretq_s16_u16(vsubl_u8(samp_row3, - vdup_n_u8(CENTERJSAMPLE))); - int16x8_t row4 = vreinterpretq_s16_u16(vsubl_u8(samp_row4, - vdup_n_u8(CENTERJSAMPLE))); - int16x8_t row5 = vreinterpretq_s16_u16(vsubl_u8(samp_row5, - vdup_n_u8(CENTERJSAMPLE))); - int16x8_t row6 = vreinterpretq_s16_u16(vsubl_u8(samp_row6, - vdup_n_u8(CENTERJSAMPLE))); - int16x8_t row7 = vreinterpretq_s16_u16(vsubl_u8(samp_row7, - vdup_n_u8(CENTERJSAMPLE))); - - vst1q_s16(workspace + 0 * DCTSIZE, 
row0); - vst1q_s16(workspace + 1 * DCTSIZE, row1); - vst1q_s16(workspace + 2 * DCTSIZE, row2); - vst1q_s16(workspace + 3 * DCTSIZE, row3); - vst1q_s16(workspace + 4 * DCTSIZE, row4); - vst1q_s16(workspace + 5 * DCTSIZE, row5); - vst1q_s16(workspace + 6 * DCTSIZE, row6); - vst1q_s16(workspace + 7 * DCTSIZE, row7); -} - - -/* - * After the DCT, the resulting coefficient values need to be divided by a - * quantization value. - * - * To avoid a slow division operation, the DCT coefficients are multiplied by - * the (scaled) reciprocal of the quantization values and then right-shifted. - * - * The equivalent scalar C function 'quantize' can be found in jcdctmgr.c. - */ - -void jsimd_quantize_neon(JCOEFPTR coef_block, - DCTELEM *divisors, - DCTELEM *workspace) -{ - JCOEFPTR out_ptr = coef_block; - UDCTELEM *recip_ptr = (UDCTELEM *)divisors; - UDCTELEM *corr_ptr = (UDCTELEM *)divisors + DCTSIZE2; - DCTELEM *shift_ptr = divisors + 3 * DCTSIZE2; - - for (int i = 0; i < DCTSIZE; i += DCTSIZE / 2) { - /* Load DCT coefficients. */ - int16x8_t row0 = vld1q_s16(workspace + (i + 0) * DCTSIZE); - int16x8_t row1 = vld1q_s16(workspace + (i + 1) * DCTSIZE); - int16x8_t row2 = vld1q_s16(workspace + (i + 2) * DCTSIZE); - int16x8_t row3 = vld1q_s16(workspace + (i + 3) * DCTSIZE); - /* Load reciprocals of quantization values. 
*/ - uint16x8_t recip0 = vld1q_u16(recip_ptr + (i + 0) * DCTSIZE); - uint16x8_t recip1 = vld1q_u16(recip_ptr + (i + 1) * DCTSIZE); - uint16x8_t recip2 = vld1q_u16(recip_ptr + (i + 2) * DCTSIZE); - uint16x8_t recip3 = vld1q_u16(recip_ptr + (i + 3) * DCTSIZE); - uint16x8_t corr0 = vld1q_u16(corr_ptr + (i + 0) * DCTSIZE); - uint16x8_t corr1 = vld1q_u16(corr_ptr + (i + 1) * DCTSIZE); - uint16x8_t corr2 = vld1q_u16(corr_ptr + (i + 2) * DCTSIZE); - uint16x8_t corr3 = vld1q_u16(corr_ptr + (i + 3) * DCTSIZE); - int16x8_t shift0 = vld1q_s16(shift_ptr + (i + 0) * DCTSIZE); - int16x8_t shift1 = vld1q_s16(shift_ptr + (i + 1) * DCTSIZE); - int16x8_t shift2 = vld1q_s16(shift_ptr + (i + 2) * DCTSIZE); - int16x8_t shift3 = vld1q_s16(shift_ptr + (i + 3) * DCTSIZE); - - /* Extract sign from coefficients. */ - int16x8_t sign_row0 = vshrq_n_s16(row0, 15); - int16x8_t sign_row1 = vshrq_n_s16(row1, 15); - int16x8_t sign_row2 = vshrq_n_s16(row2, 15); - int16x8_t sign_row3 = vshrq_n_s16(row3, 15); - /* Get absolute value of DCT coefficients. */ - uint16x8_t abs_row0 = vreinterpretq_u16_s16(vabsq_s16(row0)); - uint16x8_t abs_row1 = vreinterpretq_u16_s16(vabsq_s16(row1)); - uint16x8_t abs_row2 = vreinterpretq_u16_s16(vabsq_s16(row2)); - uint16x8_t abs_row3 = vreinterpretq_u16_s16(vabsq_s16(row3)); - /* Add correction. */ - abs_row0 = vaddq_u16(abs_row0, corr0); - abs_row1 = vaddq_u16(abs_row1, corr1); - abs_row2 = vaddq_u16(abs_row2, corr2); - abs_row3 = vaddq_u16(abs_row3, corr3); - - /* Multiply DCT coefficients by quantization reciprocal. 
*/ - int32x4_t row0_l = vreinterpretq_s32_u32(vmull_u16(vget_low_u16(abs_row0), - vget_low_u16(recip0))); - int32x4_t row0_h = vreinterpretq_s32_u32(vmull_u16(vget_high_u16(abs_row0), - vget_high_u16(recip0))); - int32x4_t row1_l = vreinterpretq_s32_u32(vmull_u16(vget_low_u16(abs_row1), - vget_low_u16(recip1))); - int32x4_t row1_h = vreinterpretq_s32_u32(vmull_u16(vget_high_u16(abs_row1), - vget_high_u16(recip1))); - int32x4_t row2_l = vreinterpretq_s32_u32(vmull_u16(vget_low_u16(abs_row2), - vget_low_u16(recip2))); - int32x4_t row2_h = vreinterpretq_s32_u32(vmull_u16(vget_high_u16(abs_row2), - vget_high_u16(recip2))); - int32x4_t row3_l = vreinterpretq_s32_u32(vmull_u16(vget_low_u16(abs_row3), - vget_low_u16(recip3))); - int32x4_t row3_h = vreinterpretq_s32_u32(vmull_u16(vget_high_u16(abs_row3), - vget_high_u16(recip3))); - /* Narrow back to 16-bit. */ - row0 = vcombine_s16(vshrn_n_s32(row0_l, 16), vshrn_n_s32(row0_h, 16)); - row1 = vcombine_s16(vshrn_n_s32(row1_l, 16), vshrn_n_s32(row1_h, 16)); - row2 = vcombine_s16(vshrn_n_s32(row2_l, 16), vshrn_n_s32(row2_h, 16)); - row3 = vcombine_s16(vshrn_n_s32(row3_l, 16), vshrn_n_s32(row3_h, 16)); - - /* Since VSHR only supports an immediate as its second argument, negate */ - /* the shift value and shift left. */ - row0 = vreinterpretq_s16_u16(vshlq_u16(vreinterpretq_u16_s16(row0), - vnegq_s16(shift0))); - row1 = vreinterpretq_s16_u16(vshlq_u16(vreinterpretq_u16_s16(row1), - vnegq_s16(shift1))); - row2 = vreinterpretq_s16_u16(vshlq_u16(vreinterpretq_u16_s16(row2), - vnegq_s16(shift2))); - row3 = vreinterpretq_s16_u16(vshlq_u16(vreinterpretq_u16_s16(row3), - vnegq_s16(shift3))); - - /* Restore sign to original product. 
*/ - row0 = veorq_s16(row0, sign_row0); - row0 = vsubq_s16(row0, sign_row0); - row1 = veorq_s16(row1, sign_row1); - row1 = vsubq_s16(row1, sign_row1); - row2 = veorq_s16(row2, sign_row2); - row2 = vsubq_s16(row2, sign_row2); - row3 = veorq_s16(row3, sign_row3); - row3 = vsubq_s16(row3, sign_row3); - - /* Store quantized coefficients to memory. */ - vst1q_s16(out_ptr + (i + 0) * DCTSIZE, row0); - vst1q_s16(out_ptr + (i + 1) * DCTSIZE, row1); - vst1q_s16(out_ptr + (i + 2) * DCTSIZE, row2); - vst1q_s16(out_ptr + (i + 3) * DCTSIZE, row3); - } -} diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jccolor-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jccolor-neon.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jccolor-neon.c 1970-01-01 01:00:00.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jccolor-neon.c 2021-11-20 03:41:33.398600450 +0000 @@ -0,0 +1,160 @@ +/* + * jccolor-neon.c - colorspace conversion (Arm Neon) + * + * Copyright (C) 2020, Arm Limited. All Rights Reserved. + * Copyright (C) 2020, D. R. Commander. All Rights Reserved. + * + * This software is provided 'as-is', without any express or implied + * warranty. In no event will the authors be held liable for any damages + * arising from the use of this software. + * + * Permission is granted to anyone to use this software for any purpose, + * including commercial applications, and to alter it and redistribute it + * freely, subject to the following restrictions: + * + * 1. The origin of this software must not be misrepresented; you must not + * claim that you wrote the original software. If you use this software + * in a product, an acknowledgment in the product documentation would be + * appreciated but is not required. + * 2. Altered source versions must be plainly marked as such, and must not be + * misrepresented as being the original software. + * 3. 
This notice may not be removed or altered from any source distribution. + */ + +#define JPEG_INTERNALS +#include "../../jinclude.h" +#include "../../jpeglib.h" +#include "../../jsimd.h" +#include "../../jdct.h" +#include "../../jsimddct.h" +#include "../jsimd.h" +#include "align.h" +#include "neon-compat.h" + +#include + + +/* RGB -> YCbCr conversion constants */ + +#define F_0_298 19595 +#define F_0_587 38470 +#define F_0_113 7471 +#define F_0_168 11059 +#define F_0_331 21709 +#define F_0_500 32768 +#define F_0_418 27439 +#define F_0_081 5329 + +ALIGN(16) static const uint16_t jsimd_rgb_ycc_neon_consts[] = { + F_0_298, F_0_587, F_0_113, F_0_168, + F_0_331, F_0_500, F_0_418, F_0_081 +}; + + +/* Include inline routines for colorspace extensions. */ + +#if defined(__aarch64__) || defined(_M_ARM64) +#include "aarch64/jccolext-neon.c" +#else +#include "aarch32/jccolext-neon.c" +#endif +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_PIXELSIZE + +#define RGB_RED EXT_RGB_RED +#define RGB_GREEN EXT_RGB_GREEN +#define RGB_BLUE EXT_RGB_BLUE +#define RGB_PIXELSIZE EXT_RGB_PIXELSIZE +#define jsimd_rgb_ycc_convert_neon jsimd_extrgb_ycc_convert_neon +#if defined(__aarch64__) || defined(_M_ARM64) +#include "aarch64/jccolext-neon.c" +#else +#include "aarch32/jccolext-neon.c" +#endif +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_PIXELSIZE +#undef jsimd_rgb_ycc_convert_neon + +#define RGB_RED EXT_RGBX_RED +#define RGB_GREEN EXT_RGBX_GREEN +#define RGB_BLUE EXT_RGBX_BLUE +#define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE +#define jsimd_rgb_ycc_convert_neon jsimd_extrgbx_ycc_convert_neon +#if defined(__aarch64__) || defined(_M_ARM64) +#include "aarch64/jccolext-neon.c" +#else +#include "aarch32/jccolext-neon.c" +#endif +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_PIXELSIZE +#undef jsimd_rgb_ycc_convert_neon + +#define RGB_RED EXT_BGR_RED +#define RGB_GREEN EXT_BGR_GREEN +#define RGB_BLUE EXT_BGR_BLUE +#define RGB_PIXELSIZE EXT_BGR_PIXELSIZE 
+#define jsimd_rgb_ycc_convert_neon jsimd_extbgr_ycc_convert_neon +#if defined(__aarch64__) || defined(_M_ARM64) +#include "aarch64/jccolext-neon.c" +#else +#include "aarch32/jccolext-neon.c" +#endif +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_PIXELSIZE +#undef jsimd_rgb_ycc_convert_neon + +#define RGB_RED EXT_BGRX_RED +#define RGB_GREEN EXT_BGRX_GREEN +#define RGB_BLUE EXT_BGRX_BLUE +#define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE +#define jsimd_rgb_ycc_convert_neon jsimd_extbgrx_ycc_convert_neon +#if defined(__aarch64__) || defined(_M_ARM64) +#include "aarch64/jccolext-neon.c" +#else +#include "aarch32/jccolext-neon.c" +#endif +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_PIXELSIZE +#undef jsimd_rgb_ycc_convert_neon + +#define RGB_RED EXT_XBGR_RED +#define RGB_GREEN EXT_XBGR_GREEN +#define RGB_BLUE EXT_XBGR_BLUE +#define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE +#define jsimd_rgb_ycc_convert_neon jsimd_extxbgr_ycc_convert_neon +#if defined(__aarch64__) || defined(_M_ARM64) +#include "aarch64/jccolext-neon.c" +#else +#include "aarch32/jccolext-neon.c" +#endif +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_PIXELSIZE +#undef jsimd_rgb_ycc_convert_neon + +#define RGB_RED EXT_XRGB_RED +#define RGB_GREEN EXT_XRGB_GREEN +#define RGB_BLUE EXT_XRGB_BLUE +#define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE +#define jsimd_rgb_ycc_convert_neon jsimd_extxrgb_ycc_convert_neon +#if defined(__aarch64__) || defined(_M_ARM64) +#include "aarch64/jccolext-neon.c" +#else +#include "aarch32/jccolext-neon.c" +#endif +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_PIXELSIZE +#undef jsimd_rgb_ycc_convert_neon diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jcgray-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jcgray-neon.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jcgray-neon.c 1970-01-01 01:00:00.000000000 +0100 +++ 
b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jcgray-neon.c 2021-11-20 03:41:33.398600450 +0000 @@ -0,0 +1,120 @@ +/* + * jcgray-neon.c - grayscale colorspace conversion (Arm Neon) + * + * Copyright (C) 2020, Arm Limited. All Rights Reserved. + * + * This software is provided 'as-is', without any express or implied + * warranty. In no event will the authors be held liable for any damages + * arising from the use of this software. + * + * Permission is granted to anyone to use this software for any purpose, + * including commercial applications, and to alter it and redistribute it + * freely, subject to the following restrictions: + * + * 1. The origin of this software must not be misrepresented; you must not + * claim that you wrote the original software. If you use this software + * in a product, an acknowledgment in the product documentation would be + * appreciated but is not required. + * 2. Altered source versions must be plainly marked as such, and must not be + * misrepresented as being the original software. + * 3. This notice may not be removed or altered from any source distribution. + */ + +#define JPEG_INTERNALS +#include "../../jinclude.h" +#include "../../jpeglib.h" +#include "../../jsimd.h" +#include "../../jdct.h" +#include "../../jsimddct.h" +#include "../jsimd.h" +#include "align.h" + +#include + + +/* RGB -> Grayscale conversion constants */ + +#define F_0_298 19595 +#define F_0_587 38470 +#define F_0_113 7471 + + +/* Include inline routines for colorspace extensions. 
*/ + +#include "jcgryext-neon.c" +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_PIXELSIZE + +#define RGB_RED EXT_RGB_RED +#define RGB_GREEN EXT_RGB_GREEN +#define RGB_BLUE EXT_RGB_BLUE +#define RGB_PIXELSIZE EXT_RGB_PIXELSIZE +#define jsimd_rgb_gray_convert_neon jsimd_extrgb_gray_convert_neon +#include "jcgryext-neon.c" +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_PIXELSIZE +#undef jsimd_rgb_gray_convert_neon + +#define RGB_RED EXT_RGBX_RED +#define RGB_GREEN EXT_RGBX_GREEN +#define RGB_BLUE EXT_RGBX_BLUE +#define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE +#define jsimd_rgb_gray_convert_neon jsimd_extrgbx_gray_convert_neon +#include "jcgryext-neon.c" +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_PIXELSIZE +#undef jsimd_rgb_gray_convert_neon + +#define RGB_RED EXT_BGR_RED +#define RGB_GREEN EXT_BGR_GREEN +#define RGB_BLUE EXT_BGR_BLUE +#define RGB_PIXELSIZE EXT_BGR_PIXELSIZE +#define jsimd_rgb_gray_convert_neon jsimd_extbgr_gray_convert_neon +#include "jcgryext-neon.c" +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_PIXELSIZE +#undef jsimd_rgb_gray_convert_neon + +#define RGB_RED EXT_BGRX_RED +#define RGB_GREEN EXT_BGRX_GREEN +#define RGB_BLUE EXT_BGRX_BLUE +#define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE +#define jsimd_rgb_gray_convert_neon jsimd_extbgrx_gray_convert_neon +#include "jcgryext-neon.c" +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_PIXELSIZE +#undef jsimd_rgb_gray_convert_neon + +#define RGB_RED EXT_XBGR_RED +#define RGB_GREEN EXT_XBGR_GREEN +#define RGB_BLUE EXT_XBGR_BLUE +#define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE +#define jsimd_rgb_gray_convert_neon jsimd_extxbgr_gray_convert_neon +#include "jcgryext-neon.c" +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_PIXELSIZE +#undef jsimd_rgb_gray_convert_neon + +#define RGB_RED EXT_XRGB_RED +#define RGB_GREEN EXT_XRGB_GREEN +#define RGB_BLUE EXT_XRGB_BLUE +#define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE +#define 
jsimd_rgb_gray_convert_neon jsimd_extxrgb_gray_convert_neon +#include "jcgryext-neon.c" +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_PIXELSIZE +#undef jsimd_rgb_gray_convert_neon diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jcgryext-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jcgryext-neon.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jcgryext-neon.c 1970-01-01 01:00:00.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jcgryext-neon.c 2021-11-20 03:41:33.399600434 +0000 @@ -0,0 +1,106 @@ +/* + * jcgryext-neon.c - grayscale colorspace conversion (Arm Neon) + * + * Copyright (C) 2020, Arm Limited. All Rights Reserved. + * + * This software is provided 'as-is', without any express or implied + * warranty. In no event will the authors be held liable for any damages + * arising from the use of this software. + * + * Permission is granted to anyone to use this software for any purpose, + * including commercial applications, and to alter it and redistribute it + * freely, subject to the following restrictions: + * + * 1. The origin of this software must not be misrepresented; you must not + * claim that you wrote the original software. If you use this software + * in a product, an acknowledgment in the product documentation would be + * appreciated but is not required. + * 2. Altered source versions must be plainly marked as such, and must not be + * misrepresented as being the original software. + * 3. This notice may not be removed or altered from any source distribution. 
+ */ + +/* This file is included by jcgray-neon.c */ + + +/* RGB -> Grayscale conversion is defined by the following equation: + * Y = 0.29900 * R + 0.58700 * G + 0.11400 * B + * + * Avoid floating point arithmetic by using shifted integer constants: + * 0.29899597 = 19595 * 2^-16 + * 0.58700561 = 38470 * 2^-16 + * 0.11399841 = 7471 * 2^-16 + * These constants are defined in jcgray-neon.c + * + * This is the same computation as the RGB -> Y portion of RGB -> YCbCr. + */ + +void jsimd_rgb_gray_convert_neon(JDIMENSION image_width, JSAMPARRAY input_buf, + JSAMPIMAGE output_buf, JDIMENSION output_row, + int num_rows) +{ + JSAMPROW inptr; + JSAMPROW outptr; + /* Allocate temporary buffer for final (image_width % 16) pixels in row. */ + ALIGN(16) uint8_t tmp_buf[16 * RGB_PIXELSIZE]; + + while (--num_rows >= 0) { + inptr = *input_buf++; + outptr = output_buf[0][output_row]; + output_row++; + + int cols_remaining = image_width; + for (; cols_remaining > 0; cols_remaining -= 16) { + + /* To prevent buffer overread by the vector load instructions, the last + * (image_width % 16) columns of data are first memcopied to a temporary + * buffer large enough to accommodate the vector load. 
+ */ + if (cols_remaining < 16) { + memcpy(tmp_buf, inptr, cols_remaining * RGB_PIXELSIZE); + inptr = tmp_buf; + } + +#if RGB_PIXELSIZE == 4 + uint8x16x4_t input_pixels = vld4q_u8(inptr); +#else + uint8x16x3_t input_pixels = vld3q_u8(inptr); +#endif + uint16x8_t r_l = vmovl_u8(vget_low_u8(input_pixels.val[RGB_RED])); + uint16x8_t r_h = vmovl_u8(vget_high_u8(input_pixels.val[RGB_RED])); + uint16x8_t g_l = vmovl_u8(vget_low_u8(input_pixels.val[RGB_GREEN])); + uint16x8_t g_h = vmovl_u8(vget_high_u8(input_pixels.val[RGB_GREEN])); + uint16x8_t b_l = vmovl_u8(vget_low_u8(input_pixels.val[RGB_BLUE])); + uint16x8_t b_h = vmovl_u8(vget_high_u8(input_pixels.val[RGB_BLUE])); + + /* Compute Y = 0.29900 * R + 0.58700 * G + 0.11400 * B */ + uint32x4_t y_ll = vmull_n_u16(vget_low_u16(r_l), F_0_298); + uint32x4_t y_lh = vmull_n_u16(vget_high_u16(r_l), F_0_298); + uint32x4_t y_hl = vmull_n_u16(vget_low_u16(r_h), F_0_298); + uint32x4_t y_hh = vmull_n_u16(vget_high_u16(r_h), F_0_298); + y_ll = vmlal_n_u16(y_ll, vget_low_u16(g_l), F_0_587); + y_lh = vmlal_n_u16(y_lh, vget_high_u16(g_l), F_0_587); + y_hl = vmlal_n_u16(y_hl, vget_low_u16(g_h), F_0_587); + y_hh = vmlal_n_u16(y_hh, vget_high_u16(g_h), F_0_587); + y_ll = vmlal_n_u16(y_ll, vget_low_u16(b_l), F_0_113); + y_lh = vmlal_n_u16(y_lh, vget_high_u16(b_l), F_0_113); + y_hl = vmlal_n_u16(y_hl, vget_low_u16(b_h), F_0_113); + y_hh = vmlal_n_u16(y_hh, vget_high_u16(b_h), F_0_113); + + /* Descale Y values (rounding right shift) and narrow to 16-bit. */ + uint16x8_t y_l = vcombine_u16(vrshrn_n_u32(y_ll, 16), + vrshrn_n_u32(y_lh, 16)); + uint16x8_t y_h = vcombine_u16(vrshrn_n_u32(y_hl, 16), + vrshrn_n_u32(y_hh, 16)); + + /* Narrow Y values to 8-bit and store to memory. Buffer overwrite is + * permitted up to the next multiple of ALIGN_SIZE bytes. + */ + vst1q_u8(outptr, vcombine_u8(vmovn_u16(y_l), vmovn_u16(y_h))); + + /* Increment pointers. 
*/ + inptr += (16 * RGB_PIXELSIZE); + outptr += 16; + } + } +} diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jchuff.h b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jchuff.h --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jchuff.h 1970-01-01 01:00:00.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jchuff.h 2021-11-20 03:41:33.399600434 +0000 @@ -0,0 +1,131 @@ +/* + * jchuff.h + * + * This file was part of the Independent JPEG Group's software: + * Copyright (C) 1991-1997, Thomas G. Lane. + * libjpeg-turbo Modifications: + * Copyright (C) 2009, 2018, 2021, D. R. Commander. + * Copyright (C) 2018, Matthias Räncker. + * Copyright (C) 2020-2021, Arm Limited. + * For conditions of distribution and use, see the accompanying README.ijg + * file. + */ + +/* Expanded entropy encoder object for Huffman encoding. + * + * The savable_state subrecord contains fields that change within an MCU, + * but must not be updated permanently until we complete the MCU. + */ + +#if defined(__aarch64__) || defined(_M_ARM64) +#define BIT_BUF_SIZE 64 +#else +#define BIT_BUF_SIZE 32 +#endif + +typedef struct { + size_t put_buffer; /* current bit accumulation buffer */ + int free_bits; /* # of bits available in it */ + int last_dc_val[MAX_COMPS_IN_SCAN]; /* last DC coef for each component */ +} savable_state; + +typedef struct { + JOCTET *next_output_byte; /* => next byte to write in buffer */ + size_t free_in_buffer; /* # of byte spaces remaining in buffer */ + savable_state cur; /* Current bit buffer & DC state */ + j_compress_ptr cinfo; /* dump_buffer needs access to this */ + int simd; +} working_state; + +/* Outputting bits to the file */ + +/* Output byte b and, speculatively, an additional 0 byte. 0xFF must be encoded + * as 0xFF 0x00, so the output buffer pointer is advanced by 2 if the byte is + * 0xFF. 
Otherwise, the output buffer pointer is advanced by 1, and the + * speculative 0 byte will be overwritten by the next byte. + */ +#define EMIT_BYTE(b) { \ + buffer[0] = (JOCTET)(b); \ + buffer[1] = 0; \ + buffer -= -2 + ((JOCTET)(b) < 0xFF); \ +} + +/* Output the entire bit buffer. If there are no 0xFF bytes in it, then write + * directly to the output buffer. Otherwise, use the EMIT_BYTE() macro to + * encode 0xFF as 0xFF 0x00. + */ +#if defined(__aarch64__) || defined(_M_ARM64) + +#define FLUSH() { \ + if (put_buffer & 0x8080808080808080 & ~(put_buffer + 0x0101010101010101)) { \ + EMIT_BYTE(put_buffer >> 56) \ + EMIT_BYTE(put_buffer >> 48) \ + EMIT_BYTE(put_buffer >> 40) \ + EMIT_BYTE(put_buffer >> 32) \ + EMIT_BYTE(put_buffer >> 24) \ + EMIT_BYTE(put_buffer >> 16) \ + EMIT_BYTE(put_buffer >> 8) \ + EMIT_BYTE(put_buffer ) \ + } else { \ + *((uint64_t *)buffer) = BUILTIN_BSWAP64(put_buffer); \ + buffer += 8; \ + } \ +} + +#else + +#if defined(_MSC_VER) && !defined(__clang__) +#define SPLAT() { \ + buffer[0] = (JOCTET)(put_buffer >> 24); \ + buffer[1] = (JOCTET)(put_buffer >> 16); \ + buffer[2] = (JOCTET)(put_buffer >> 8); \ + buffer[3] = (JOCTET)(put_buffer ); \ + buffer += 4; \ +} +#else +#define SPLAT() { \ + put_buffer = __builtin_bswap32(put_buffer); \ + __asm__("str %1, [%0], #4" : "+r" (buffer) : "r" (put_buffer)); \ +} +#endif + +#define FLUSH() { \ + if (put_buffer & 0x80808080 & ~(put_buffer + 0x01010101)) { \ + EMIT_BYTE(put_buffer >> 24) \ + EMIT_BYTE(put_buffer >> 16) \ + EMIT_BYTE(put_buffer >> 8) \ + EMIT_BYTE(put_buffer ) \ + } else { \ + SPLAT(); \ + } \ +} + +#endif + +/* Fill the bit buffer to capacity with the leading bits from code, then output + * the bit buffer and put the remaining bits from code into the bit buffer. 
+ */ +#define PUT_AND_FLUSH(code, size) { \ + put_buffer = (put_buffer << (size + free_bits)) | (code >> -free_bits); \ + FLUSH() \ + free_bits += BIT_BUF_SIZE; \ + put_buffer = code; \ +} + +/* Insert code into the bit buffer and output the bit buffer if needed. + * NOTE: We can't flush with free_bits == 0, since the left shift in + * PUT_AND_FLUSH() would have undefined behavior. + */ +#define PUT_BITS(code, size) { \ + free_bits -= size; \ + if (free_bits < 0) \ + PUT_AND_FLUSH(code, size) \ + else \ + put_buffer = (put_buffer << size) | code; \ +} + +#define PUT_CODE(code, size, diff) { \ + diff |= code << nbits; \ + nbits += size; \ + PUT_BITS(diff, nbits) \ +} diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jcphuff-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jcphuff-neon.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jcphuff-neon.c 1970-01-01 01:00:00.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jcphuff-neon.c 2021-11-20 03:41:33.399600434 +0000 @@ -0,0 +1,622 @@ +/* + * jcphuff-neon.c - prepare data for progressive Huffman encoding (Arm Neon) + * + * Copyright (C) 2020-2021, Arm Limited. All Rights Reserved. + * + * This software is provided 'as-is', without any express or implied + * warranty. In no event will the authors be held liable for any damages + * arising from the use of this software. + * + * Permission is granted to anyone to use this software for any purpose, + * including commercial applications, and to alter it and redistribute it + * freely, subject to the following restrictions: + * + * 1. The origin of this software must not be misrepresented; you must not + * claim that you wrote the original software. If you use this software + * in a product, an acknowledgment in the product documentation would be + * appreciated but is not required. + * 2. 
Altered source versions must be plainly marked as such, and must not be + * misrepresented as being the original software. + * 3. This notice may not be removed or altered from any source distribution. + */ + +#define JPEG_INTERNALS +#include "jconfigint.h" +#include "../../jinclude.h" +#include "../../jpeglib.h" +#include "../../jsimd.h" +#include "../../jdct.h" +#include "../../jsimddct.h" +#include "../jsimd.h" +#include "neon-compat.h" + +#include <arm_neon.h> + + +/* Data preparation for encode_mcu_AC_first(). + * + * The equivalent scalar C function (encode_mcu_AC_first_prepare()) can be + * found in jcphuff.c. + */ + +void jsimd_encode_mcu_AC_first_prepare_neon + (const JCOEF *block, const int *jpeg_natural_order_start, int Sl, int Al, + JCOEF *values, size_t *zerobits) +{ + JCOEF *values_ptr = values; + JCOEF *diff_values_ptr = values + DCTSIZE2; + + /* Rows of coefficients to zero (since they haven't been processed) */ + int i, rows_to_zero = 8; + + for (i = 0; i < Sl / 16; i++) { + int16x8_t coefs1 = vld1q_dup_s16(block + jpeg_natural_order_start[0]); + coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[1], coefs1, 1); + coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[2], coefs1, 2); + coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[3], coefs1, 3); + coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[4], coefs1, 4); + coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[5], coefs1, 5); + coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[6], coefs1, 6); + coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[7], coefs1, 7); + int16x8_t coefs2 = vld1q_dup_s16(block + jpeg_natural_order_start[8]); + coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[9], coefs2, 1); + coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[10], coefs2, 2); + coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[11], coefs2, 3); + coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[12], coefs2, 4); + coefs2 = vld1q_lane_s16(block + 
jpeg_natural_order_start[13], coefs2, 5); + coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[14], coefs2, 6); + coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[15], coefs2, 7); + + /* Isolate sign of coefficients. */ + int16x8_t sign_coefs1 = vshrq_n_s16(coefs1, 15); + int16x8_t sign_coefs2 = vshrq_n_s16(coefs2, 15); + /* Compute absolute value of coefficients and apply point transform Al. */ + int16x8_t abs_coefs1 = vabsq_s16(coefs1); + int16x8_t abs_coefs2 = vabsq_s16(coefs2); + coefs1 = vshlq_s16(abs_coefs1, vdupq_n_s16(-Al)); + coefs2 = vshlq_s16(abs_coefs2, vdupq_n_s16(-Al)); + + /* Compute diff values. */ + int16x8_t diff1 = veorq_s16(coefs1, sign_coefs1); + int16x8_t diff2 = veorq_s16(coefs2, sign_coefs2); + + /* Store transformed coefficients and diff values. */ + vst1q_s16(values_ptr, coefs1); + vst1q_s16(values_ptr + DCTSIZE, coefs2); + vst1q_s16(diff_values_ptr, diff1); + vst1q_s16(diff_values_ptr + DCTSIZE, diff2); + values_ptr += 16; + diff_values_ptr += 16; + jpeg_natural_order_start += 16; + rows_to_zero -= 2; + } + + /* Same operation but for remaining partial vector */ + int remaining_coefs = Sl % 16; + if (remaining_coefs > 8) { + int16x8_t coefs1 = vld1q_dup_s16(block + jpeg_natural_order_start[0]); + coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[1], coefs1, 1); + coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[2], coefs1, 2); + coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[3], coefs1, 3); + coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[4], coefs1, 4); + coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[5], coefs1, 5); + coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[6], coefs1, 6); + coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[7], coefs1, 7); + int16x8_t coefs2 = vdupq_n_s16(0); + switch (remaining_coefs) { + case 15: + coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[14], coefs2, 6); + FALLTHROUGH /*FALLTHROUGH*/ + case 14: + coefs2 = 
vld1q_lane_s16(block + jpeg_natural_order_start[13], coefs2, 5); + FALLTHROUGH /*FALLTHROUGH*/ + case 13: + coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[12], coefs2, 4); + FALLTHROUGH /*FALLTHROUGH*/ + case 12: + coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[11], coefs2, 3); + FALLTHROUGH /*FALLTHROUGH*/ + case 11: + coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[10], coefs2, 2); + FALLTHROUGH /*FALLTHROUGH*/ + case 10: + coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[9], coefs2, 1); + FALLTHROUGH /*FALLTHROUGH*/ + case 9: + coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[8], coefs2, 0); + FALLTHROUGH /*FALLTHROUGH*/ + default: + break; + } + + /* Isolate sign of coefficients. */ + int16x8_t sign_coefs1 = vshrq_n_s16(coefs1, 15); + int16x8_t sign_coefs2 = vshrq_n_s16(coefs2, 15); + /* Compute absolute value of coefficients and apply point transform Al. */ + int16x8_t abs_coefs1 = vabsq_s16(coefs1); + int16x8_t abs_coefs2 = vabsq_s16(coefs2); + coefs1 = vshlq_s16(abs_coefs1, vdupq_n_s16(-Al)); + coefs2 = vshlq_s16(abs_coefs2, vdupq_n_s16(-Al)); + + /* Compute diff values. */ + int16x8_t diff1 = veorq_s16(coefs1, sign_coefs1); + int16x8_t diff2 = veorq_s16(coefs2, sign_coefs2); + + /* Store transformed coefficients and diff values. 
*/ + vst1q_s16(values_ptr, coefs1); + vst1q_s16(values_ptr + DCTSIZE, coefs2); + vst1q_s16(diff_values_ptr, diff1); + vst1q_s16(diff_values_ptr + DCTSIZE, diff2); + values_ptr += 16; + diff_values_ptr += 16; + rows_to_zero -= 2; + + } else if (remaining_coefs > 0) { + int16x8_t coefs = vdupq_n_s16(0); + + switch (remaining_coefs) { + case 8: + coefs = vld1q_lane_s16(block + jpeg_natural_order_start[7], coefs, 7); + FALLTHROUGH /*FALLTHROUGH*/ + case 7: + coefs = vld1q_lane_s16(block + jpeg_natural_order_start[6], coefs, 6); + FALLTHROUGH /*FALLTHROUGH*/ + case 6: + coefs = vld1q_lane_s16(block + jpeg_natural_order_start[5], coefs, 5); + FALLTHROUGH /*FALLTHROUGH*/ + case 5: + coefs = vld1q_lane_s16(block + jpeg_natural_order_start[4], coefs, 4); + FALLTHROUGH /*FALLTHROUGH*/ + case 4: + coefs = vld1q_lane_s16(block + jpeg_natural_order_start[3], coefs, 3); + FALLTHROUGH /*FALLTHROUGH*/ + case 3: + coefs = vld1q_lane_s16(block + jpeg_natural_order_start[2], coefs, 2); + FALLTHROUGH /*FALLTHROUGH*/ + case 2: + coefs = vld1q_lane_s16(block + jpeg_natural_order_start[1], coefs, 1); + FALLTHROUGH /*FALLTHROUGH*/ + case 1: + coefs = vld1q_lane_s16(block + jpeg_natural_order_start[0], coefs, 0); + FALLTHROUGH /*FALLTHROUGH*/ + default: + break; + } + + /* Isolate sign of coefficients. */ + int16x8_t sign_coefs = vshrq_n_s16(coefs, 15); + /* Compute absolute value of coefficients and apply point transform Al. */ + int16x8_t abs_coefs = vabsq_s16(coefs); + coefs = vshlq_s16(abs_coefs, vdupq_n_s16(-Al)); + + /* Compute diff values. */ + int16x8_t diff = veorq_s16(coefs, sign_coefs); + + /* Store transformed coefficients and diff values. */ + vst1q_s16(values_ptr, coefs); + vst1q_s16(diff_values_ptr, diff); + values_ptr += 8; + diff_values_ptr += 8; + rows_to_zero--; + } + + /* Zero remaining memory in the values and diff_values blocks. 
*/ + for (i = 0; i < rows_to_zero; i++) { + vst1q_s16(values_ptr, vdupq_n_s16(0)); + vst1q_s16(diff_values_ptr, vdupq_n_s16(0)); + values_ptr += 8; + diff_values_ptr += 8; + } + + /* Construct zerobits bitmap. A set bit means that the corresponding + * coefficient != 0. + */ + int16x8_t row0 = vld1q_s16(values + 0 * DCTSIZE); + int16x8_t row1 = vld1q_s16(values + 1 * DCTSIZE); + int16x8_t row2 = vld1q_s16(values + 2 * DCTSIZE); + int16x8_t row3 = vld1q_s16(values + 3 * DCTSIZE); + int16x8_t row4 = vld1q_s16(values + 4 * DCTSIZE); + int16x8_t row5 = vld1q_s16(values + 5 * DCTSIZE); + int16x8_t row6 = vld1q_s16(values + 6 * DCTSIZE); + int16x8_t row7 = vld1q_s16(values + 7 * DCTSIZE); + + uint8x8_t row0_eq0 = vmovn_u16(vceqq_s16(row0, vdupq_n_s16(0))); + uint8x8_t row1_eq0 = vmovn_u16(vceqq_s16(row1, vdupq_n_s16(0))); + uint8x8_t row2_eq0 = vmovn_u16(vceqq_s16(row2, vdupq_n_s16(0))); + uint8x8_t row3_eq0 = vmovn_u16(vceqq_s16(row3, vdupq_n_s16(0))); + uint8x8_t row4_eq0 = vmovn_u16(vceqq_s16(row4, vdupq_n_s16(0))); + uint8x8_t row5_eq0 = vmovn_u16(vceqq_s16(row5, vdupq_n_s16(0))); + uint8x8_t row6_eq0 = vmovn_u16(vceqq_s16(row6, vdupq_n_s16(0))); + uint8x8_t row7_eq0 = vmovn_u16(vceqq_s16(row7, vdupq_n_s16(0))); + + /* { 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80 } */ + const uint8x8_t bitmap_mask = + vreinterpret_u8_u64(vmov_n_u64(0x8040201008040201)); + + row0_eq0 = vand_u8(row0_eq0, bitmap_mask); + row1_eq0 = vand_u8(row1_eq0, bitmap_mask); + row2_eq0 = vand_u8(row2_eq0, bitmap_mask); + row3_eq0 = vand_u8(row3_eq0, bitmap_mask); + row4_eq0 = vand_u8(row4_eq0, bitmap_mask); + row5_eq0 = vand_u8(row5_eq0, bitmap_mask); + row6_eq0 = vand_u8(row6_eq0, bitmap_mask); + row7_eq0 = vand_u8(row7_eq0, bitmap_mask); + + uint8x8_t bitmap_rows_01 = vpadd_u8(row0_eq0, row1_eq0); + uint8x8_t bitmap_rows_23 = vpadd_u8(row2_eq0, row3_eq0); + uint8x8_t bitmap_rows_45 = vpadd_u8(row4_eq0, row5_eq0); + uint8x8_t bitmap_rows_67 = vpadd_u8(row6_eq0, row7_eq0); + uint8x8_t 
bitmap_rows_0123 = vpadd_u8(bitmap_rows_01, bitmap_rows_23); + uint8x8_t bitmap_rows_4567 = vpadd_u8(bitmap_rows_45, bitmap_rows_67); + uint8x8_t bitmap_all = vpadd_u8(bitmap_rows_0123, bitmap_rows_4567); + +#if defined(__aarch64__) || defined(_M_ARM64) + /* Move bitmap to a 64-bit scalar register. */ + uint64_t bitmap = vget_lane_u64(vreinterpret_u64_u8(bitmap_all), 0); + /* Store zerobits bitmap. */ + *zerobits = ~bitmap; +#else + /* Move bitmap to two 32-bit scalar registers. */ + uint32_t bitmap0 = vget_lane_u32(vreinterpret_u32_u8(bitmap_all), 0); + uint32_t bitmap1 = vget_lane_u32(vreinterpret_u32_u8(bitmap_all), 1); + /* Store zerobits bitmap. */ + zerobits[0] = ~bitmap0; + zerobits[1] = ~bitmap1; +#endif +} + + +/* Data preparation for encode_mcu_AC_refine(). + * + * The equivalent scalar C function (encode_mcu_AC_refine_prepare()) can be + * found in jcphuff.c. + */ + +int jsimd_encode_mcu_AC_refine_prepare_neon + (const JCOEF *block, const int *jpeg_natural_order_start, int Sl, int Al, + JCOEF *absvalues, size_t *bits) +{ + /* Temporary storage buffers for data used to compute the signbits bitmap and + * the end-of-block (EOB) position + */ + uint8_t coef_sign_bits[64]; + uint8_t coef_eq1_bits[64]; + + JCOEF *absvalues_ptr = absvalues; + uint8_t *coef_sign_bits_ptr = coef_sign_bits; + uint8_t *eq1_bits_ptr = coef_eq1_bits; + + /* Rows of coefficients to zero (since they haven't been processed) */ + int i, rows_to_zero = 8; + + for (i = 0; i < Sl / 16; i++) { + int16x8_t coefs1 = vld1q_dup_s16(block + jpeg_natural_order_start[0]); + coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[1], coefs1, 1); + coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[2], coefs1, 2); + coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[3], coefs1, 3); + coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[4], coefs1, 4); + coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[5], coefs1, 5); + coefs1 = vld1q_lane_s16(block + 
jpeg_natural_order_start[6], coefs1, 6); + coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[7], coefs1, 7); + int16x8_t coefs2 = vld1q_dup_s16(block + jpeg_natural_order_start[8]); + coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[9], coefs2, 1); + coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[10], coefs2, 2); + coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[11], coefs2, 3); + coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[12], coefs2, 4); + coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[13], coefs2, 5); + coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[14], coefs2, 6); + coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[15], coefs2, 7); + + /* Compute and store data for signbits bitmap. */ + uint8x8_t sign_coefs1 = + vmovn_u16(vreinterpretq_u16_s16(vshrq_n_s16(coefs1, 15))); + uint8x8_t sign_coefs2 = + vmovn_u16(vreinterpretq_u16_s16(vshrq_n_s16(coefs2, 15))); + vst1_u8(coef_sign_bits_ptr, sign_coefs1); + vst1_u8(coef_sign_bits_ptr + DCTSIZE, sign_coefs2); + + /* Compute absolute value of coefficients and apply point transform Al. */ + int16x8_t abs_coefs1 = vabsq_s16(coefs1); + int16x8_t abs_coefs2 = vabsq_s16(coefs2); + coefs1 = vshlq_s16(abs_coefs1, vdupq_n_s16(-Al)); + coefs2 = vshlq_s16(abs_coefs2, vdupq_n_s16(-Al)); + vst1q_s16(absvalues_ptr, coefs1); + vst1q_s16(absvalues_ptr + DCTSIZE, coefs2); + + /* Test whether transformed coefficient values == 1 (used to find EOB + * position.) 
+ */ + uint8x8_t coefs_eq11 = vmovn_u16(vceqq_s16(coefs1, vdupq_n_s16(1))); + uint8x8_t coefs_eq12 = vmovn_u16(vceqq_s16(coefs2, vdupq_n_s16(1))); + vst1_u8(eq1_bits_ptr, coefs_eq11); + vst1_u8(eq1_bits_ptr + DCTSIZE, coefs_eq12); + + absvalues_ptr += 16; + coef_sign_bits_ptr += 16; + eq1_bits_ptr += 16; + jpeg_natural_order_start += 16; + rows_to_zero -= 2; + } + + /* Same operation but for remaining partial vector */ + int remaining_coefs = Sl % 16; + if (remaining_coefs > 8) { + int16x8_t coefs1 = vld1q_dup_s16(block + jpeg_natural_order_start[0]); + coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[1], coefs1, 1); + coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[2], coefs1, 2); + coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[3], coefs1, 3); + coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[4], coefs1, 4); + coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[5], coefs1, 5); + coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[6], coefs1, 6); + coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[7], coefs1, 7); + int16x8_t coefs2 = vdupq_n_s16(0); + switch (remaining_coefs) { + case 15: + coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[14], coefs2, 6); + FALLTHROUGH /*FALLTHROUGH*/ + case 14: + coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[13], coefs2, 5); + FALLTHROUGH /*FALLTHROUGH*/ + case 13: + coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[12], coefs2, 4); + FALLTHROUGH /*FALLTHROUGH*/ + case 12: + coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[11], coefs2, 3); + FALLTHROUGH /*FALLTHROUGH*/ + case 11: + coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[10], coefs2, 2); + FALLTHROUGH /*FALLTHROUGH*/ + case 10: + coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[9], coefs2, 1); + FALLTHROUGH /*FALLTHROUGH*/ + case 9: + coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[8], coefs2, 0); + FALLTHROUGH /*FALLTHROUGH*/ + default: + break; + } + + /* 
Compute and store data for signbits bitmap. */ + uint8x8_t sign_coefs1 = + vmovn_u16(vreinterpretq_u16_s16(vshrq_n_s16(coefs1, 15))); + uint8x8_t sign_coefs2 = + vmovn_u16(vreinterpretq_u16_s16(vshrq_n_s16(coefs2, 15))); + vst1_u8(coef_sign_bits_ptr, sign_coefs1); + vst1_u8(coef_sign_bits_ptr + DCTSIZE, sign_coefs2); + + /* Compute absolute value of coefficients and apply point transform Al. */ + int16x8_t abs_coefs1 = vabsq_s16(coefs1); + int16x8_t abs_coefs2 = vabsq_s16(coefs2); + coefs1 = vshlq_s16(abs_coefs1, vdupq_n_s16(-Al)); + coefs2 = vshlq_s16(abs_coefs2, vdupq_n_s16(-Al)); + vst1q_s16(absvalues_ptr, coefs1); + vst1q_s16(absvalues_ptr + DCTSIZE, coefs2); + + /* Test whether transformed coefficient values == 1 (used to find EOB + * position.) + */ + uint8x8_t coefs_eq11 = vmovn_u16(vceqq_s16(coefs1, vdupq_n_s16(1))); + uint8x8_t coefs_eq12 = vmovn_u16(vceqq_s16(coefs2, vdupq_n_s16(1))); + vst1_u8(eq1_bits_ptr, coefs_eq11); + vst1_u8(eq1_bits_ptr + DCTSIZE, coefs_eq12); + + absvalues_ptr += 16; + coef_sign_bits_ptr += 16; + eq1_bits_ptr += 16; + jpeg_natural_order_start += 16; + rows_to_zero -= 2; + + } else if (remaining_coefs > 0) { + int16x8_t coefs = vdupq_n_s16(0); + + switch (remaining_coefs) { + case 8: + coefs = vld1q_lane_s16(block + jpeg_natural_order_start[7], coefs, 7); + FALLTHROUGH /*FALLTHROUGH*/ + case 7: + coefs = vld1q_lane_s16(block + jpeg_natural_order_start[6], coefs, 6); + FALLTHROUGH /*FALLTHROUGH*/ + case 6: + coefs = vld1q_lane_s16(block + jpeg_natural_order_start[5], coefs, 5); + FALLTHROUGH /*FALLTHROUGH*/ + case 5: + coefs = vld1q_lane_s16(block + jpeg_natural_order_start[4], coefs, 4); + FALLTHROUGH /*FALLTHROUGH*/ + case 4: + coefs = vld1q_lane_s16(block + jpeg_natural_order_start[3], coefs, 3); + FALLTHROUGH /*FALLTHROUGH*/ + case 3: + coefs = vld1q_lane_s16(block + jpeg_natural_order_start[2], coefs, 2); + FALLTHROUGH /*FALLTHROUGH*/ + case 2: + coefs = vld1q_lane_s16(block + jpeg_natural_order_start[1], coefs, 1); + 
FALLTHROUGH /*FALLTHROUGH*/ + case 1: + coefs = vld1q_lane_s16(block + jpeg_natural_order_start[0], coefs, 0); + FALLTHROUGH /*FALLTHROUGH*/ + default: + break; + } + + /* Compute and store data for signbits bitmap. */ + uint8x8_t sign_coefs = + vmovn_u16(vreinterpretq_u16_s16(vshrq_n_s16(coefs, 15))); + vst1_u8(coef_sign_bits_ptr, sign_coefs); + + /* Compute absolute value of coefficients and apply point transform Al. */ + int16x8_t abs_coefs = vabsq_s16(coefs); + coefs = vshlq_s16(abs_coefs, vdupq_n_s16(-Al)); + vst1q_s16(absvalues_ptr, coefs); + + /* Test whether transformed coefficient values == 1 (used to find EOB + * position.) + */ + uint8x8_t coefs_eq1 = vmovn_u16(vceqq_s16(coefs, vdupq_n_s16(1))); + vst1_u8(eq1_bits_ptr, coefs_eq1); + + absvalues_ptr += 8; + coef_sign_bits_ptr += 8; + eq1_bits_ptr += 8; + rows_to_zero--; + } + + /* Zero remaining memory in blocks. */ + for (i = 0; i < rows_to_zero; i++) { + vst1q_s16(absvalues_ptr, vdupq_n_s16(0)); + vst1_u8(coef_sign_bits_ptr, vdup_n_u8(0)); + vst1_u8(eq1_bits_ptr, vdup_n_u8(0)); + absvalues_ptr += 8; + coef_sign_bits_ptr += 8; + eq1_bits_ptr += 8; + } + + /* Construct zerobits bitmap. 
*/ + int16x8_t abs_row0 = vld1q_s16(absvalues + 0 * DCTSIZE); + int16x8_t abs_row1 = vld1q_s16(absvalues + 1 * DCTSIZE); + int16x8_t abs_row2 = vld1q_s16(absvalues + 2 * DCTSIZE); + int16x8_t abs_row3 = vld1q_s16(absvalues + 3 * DCTSIZE); + int16x8_t abs_row4 = vld1q_s16(absvalues + 4 * DCTSIZE); + int16x8_t abs_row5 = vld1q_s16(absvalues + 5 * DCTSIZE); + int16x8_t abs_row6 = vld1q_s16(absvalues + 6 * DCTSIZE); + int16x8_t abs_row7 = vld1q_s16(absvalues + 7 * DCTSIZE); + + uint8x8_t abs_row0_eq0 = vmovn_u16(vceqq_s16(abs_row0, vdupq_n_s16(0))); + uint8x8_t abs_row1_eq0 = vmovn_u16(vceqq_s16(abs_row1, vdupq_n_s16(0))); + uint8x8_t abs_row2_eq0 = vmovn_u16(vceqq_s16(abs_row2, vdupq_n_s16(0))); + uint8x8_t abs_row3_eq0 = vmovn_u16(vceqq_s16(abs_row3, vdupq_n_s16(0))); + uint8x8_t abs_row4_eq0 = vmovn_u16(vceqq_s16(abs_row4, vdupq_n_s16(0))); + uint8x8_t abs_row5_eq0 = vmovn_u16(vceqq_s16(abs_row5, vdupq_n_s16(0))); + uint8x8_t abs_row6_eq0 = vmovn_u16(vceqq_s16(abs_row6, vdupq_n_s16(0))); + uint8x8_t abs_row7_eq0 = vmovn_u16(vceqq_s16(abs_row7, vdupq_n_s16(0))); + + /* { 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80 } */ + const uint8x8_t bitmap_mask = + vreinterpret_u8_u64(vmov_n_u64(0x8040201008040201)); + + abs_row0_eq0 = vand_u8(abs_row0_eq0, bitmap_mask); + abs_row1_eq0 = vand_u8(abs_row1_eq0, bitmap_mask); + abs_row2_eq0 = vand_u8(abs_row2_eq0, bitmap_mask); + abs_row3_eq0 = vand_u8(abs_row3_eq0, bitmap_mask); + abs_row4_eq0 = vand_u8(abs_row4_eq0, bitmap_mask); + abs_row5_eq0 = vand_u8(abs_row5_eq0, bitmap_mask); + abs_row6_eq0 = vand_u8(abs_row6_eq0, bitmap_mask); + abs_row7_eq0 = vand_u8(abs_row7_eq0, bitmap_mask); + + uint8x8_t bitmap_rows_01 = vpadd_u8(abs_row0_eq0, abs_row1_eq0); + uint8x8_t bitmap_rows_23 = vpadd_u8(abs_row2_eq0, abs_row3_eq0); + uint8x8_t bitmap_rows_45 = vpadd_u8(abs_row4_eq0, abs_row5_eq0); + uint8x8_t bitmap_rows_67 = vpadd_u8(abs_row6_eq0, abs_row7_eq0); + uint8x8_t bitmap_rows_0123 = vpadd_u8(bitmap_rows_01, bitmap_rows_23); + 
uint8x8_t bitmap_rows_4567 = vpadd_u8(bitmap_rows_45, bitmap_rows_67); + uint8x8_t bitmap_all = vpadd_u8(bitmap_rows_0123, bitmap_rows_4567); + +#if defined(__aarch64__) || defined(_M_ARM64) + /* Move bitmap to a 64-bit scalar register. */ + uint64_t bitmap = vget_lane_u64(vreinterpret_u64_u8(bitmap_all), 0); + /* Store zerobits bitmap. */ + bits[0] = ~bitmap; +#else + /* Move bitmap to two 32-bit scalar registers. */ + uint32_t bitmap0 = vget_lane_u32(vreinterpret_u32_u8(bitmap_all), 0); + uint32_t bitmap1 = vget_lane_u32(vreinterpret_u32_u8(bitmap_all), 1); + /* Store zerobits bitmap. */ + bits[0] = ~bitmap0; + bits[1] = ~bitmap1; +#endif + + /* Construct signbits bitmap. */ + uint8x8_t signbits_row0 = vld1_u8(coef_sign_bits + 0 * DCTSIZE); + uint8x8_t signbits_row1 = vld1_u8(coef_sign_bits + 1 * DCTSIZE); + uint8x8_t signbits_row2 = vld1_u8(coef_sign_bits + 2 * DCTSIZE); + uint8x8_t signbits_row3 = vld1_u8(coef_sign_bits + 3 * DCTSIZE); + uint8x8_t signbits_row4 = vld1_u8(coef_sign_bits + 4 * DCTSIZE); + uint8x8_t signbits_row5 = vld1_u8(coef_sign_bits + 5 * DCTSIZE); + uint8x8_t signbits_row6 = vld1_u8(coef_sign_bits + 6 * DCTSIZE); + uint8x8_t signbits_row7 = vld1_u8(coef_sign_bits + 7 * DCTSIZE); + + signbits_row0 = vand_u8(signbits_row0, bitmap_mask); + signbits_row1 = vand_u8(signbits_row1, bitmap_mask); + signbits_row2 = vand_u8(signbits_row2, bitmap_mask); + signbits_row3 = vand_u8(signbits_row3, bitmap_mask); + signbits_row4 = vand_u8(signbits_row4, bitmap_mask); + signbits_row5 = vand_u8(signbits_row5, bitmap_mask); + signbits_row6 = vand_u8(signbits_row6, bitmap_mask); + signbits_row7 = vand_u8(signbits_row7, bitmap_mask); + + bitmap_rows_01 = vpadd_u8(signbits_row0, signbits_row1); + bitmap_rows_23 = vpadd_u8(signbits_row2, signbits_row3); + bitmap_rows_45 = vpadd_u8(signbits_row4, signbits_row5); + bitmap_rows_67 = vpadd_u8(signbits_row6, signbits_row7); + bitmap_rows_0123 = vpadd_u8(bitmap_rows_01, bitmap_rows_23); + bitmap_rows_4567 = 
vpadd_u8(bitmap_rows_45, bitmap_rows_67); + bitmap_all = vpadd_u8(bitmap_rows_0123, bitmap_rows_4567); + +#if defined(__aarch64__) || defined(_M_ARM64) + /* Move bitmap to a 64-bit scalar register. */ + bitmap = vget_lane_u64(vreinterpret_u64_u8(bitmap_all), 0); + /* Store signbits bitmap. */ + bits[1] = ~bitmap; +#else + /* Move bitmap to two 32-bit scalar registers. */ + bitmap0 = vget_lane_u32(vreinterpret_u32_u8(bitmap_all), 0); + bitmap1 = vget_lane_u32(vreinterpret_u32_u8(bitmap_all), 1); + /* Store signbits bitmap. */ + bits[2] = ~bitmap0; + bits[3] = ~bitmap1; +#endif + + /* Construct bitmap to find EOB position (the index of the last coefficient + * equal to 1.) + */ + uint8x8_t row0_eq1 = vld1_u8(coef_eq1_bits + 0 * DCTSIZE); + uint8x8_t row1_eq1 = vld1_u8(coef_eq1_bits + 1 * DCTSIZE); + uint8x8_t row2_eq1 = vld1_u8(coef_eq1_bits + 2 * DCTSIZE); + uint8x8_t row3_eq1 = vld1_u8(coef_eq1_bits + 3 * DCTSIZE); + uint8x8_t row4_eq1 = vld1_u8(coef_eq1_bits + 4 * DCTSIZE); + uint8x8_t row5_eq1 = vld1_u8(coef_eq1_bits + 5 * DCTSIZE); + uint8x8_t row6_eq1 = vld1_u8(coef_eq1_bits + 6 * DCTSIZE); + uint8x8_t row7_eq1 = vld1_u8(coef_eq1_bits + 7 * DCTSIZE); + + row0_eq1 = vand_u8(row0_eq1, bitmap_mask); + row1_eq1 = vand_u8(row1_eq1, bitmap_mask); + row2_eq1 = vand_u8(row2_eq1, bitmap_mask); + row3_eq1 = vand_u8(row3_eq1, bitmap_mask); + row4_eq1 = vand_u8(row4_eq1, bitmap_mask); + row5_eq1 = vand_u8(row5_eq1, bitmap_mask); + row6_eq1 = vand_u8(row6_eq1, bitmap_mask); + row7_eq1 = vand_u8(row7_eq1, bitmap_mask); + + bitmap_rows_01 = vpadd_u8(row0_eq1, row1_eq1); + bitmap_rows_23 = vpadd_u8(row2_eq1, row3_eq1); + bitmap_rows_45 = vpadd_u8(row4_eq1, row5_eq1); + bitmap_rows_67 = vpadd_u8(row6_eq1, row7_eq1); + bitmap_rows_0123 = vpadd_u8(bitmap_rows_01, bitmap_rows_23); + bitmap_rows_4567 = vpadd_u8(bitmap_rows_45, bitmap_rows_67); + bitmap_all = vpadd_u8(bitmap_rows_0123, bitmap_rows_4567); + +#if defined(__aarch64__) || defined(_M_ARM64) + /* Move bitmap to a 64-bit 
scalar register. */ + bitmap = vget_lane_u64(vreinterpret_u64_u8(bitmap_all), 0); + + /* Return EOB position. */ + if (bitmap == 0) { + /* EOB position is defined to be 0 if all coefficients != 1. */ + return 0; + } else { + return 63 - BUILTIN_CLZLL(bitmap); + } +#else + /* Move bitmap to two 32-bit scalar registers. */ + bitmap0 = vget_lane_u32(vreinterpret_u32_u8(bitmap_all), 0); + bitmap1 = vget_lane_u32(vreinterpret_u32_u8(bitmap_all), 1); + + /* Return EOB position. */ + if (bitmap0 == 0 && bitmap1 == 0) { + return 0; + } else if (bitmap1 != 0) { + return 63 - BUILTIN_CLZ(bitmap1); + } else { + return 31 - BUILTIN_CLZ(bitmap0); + } +#endif +} diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jcsample-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jcsample-neon.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jcsample-neon.c 1970-01-01 01:00:00.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jcsample-neon.c 2021-11-20 03:41:33.399600434 +0000 @@ -0,0 +1,192 @@ +/* + * jcsample-neon.c - downsampling (Arm Neon) + * + * Copyright (C) 2020, Arm Limited. All Rights Reserved. + * + * This software is provided 'as-is', without any express or implied + * warranty. In no event will the authors be held liable for any damages + * arising from the use of this software. + * + * Permission is granted to anyone to use this software for any purpose, + * including commercial applications, and to alter it and redistribute it + * freely, subject to the following restrictions: + * + * 1. The origin of this software must not be misrepresented; you must not + * claim that you wrote the original software. If you use this software + * in a product, an acknowledgment in the product documentation would be + * appreciated but is not required. + * 2. Altered source versions must be plainly marked as such, and must not be + * misrepresented as being the original software. + * 3. 
This notice may not be removed or altered from any source distribution. + */ + +#define JPEG_INTERNALS +#include "../../jinclude.h" +#include "../../jpeglib.h" +#include "../../jsimd.h" +#include "../../jdct.h" +#include "../../jsimddct.h" +#include "../jsimd.h" +#include "align.h" + +#include <arm_neon.h> + + +ALIGN(16) static const uint8_t jsimd_h2_downsample_consts[] = { + 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* Pad 0 */ + 0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F, + 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* Pad 1 */ + 0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0E, + 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* Pad 2 */ + 0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0D, 0x0D, + 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* Pad 3 */ + 0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0C, 0x0C, 0x0C, + 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* Pad 4 */ + 0x08, 0x09, 0x0A, 0x0B, 0x0B, 0x0B, 0x0B, 0x0B, + 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* Pad 5 */ + 0x08, 0x09, 0x0A, 0x0A, 0x0A, 0x0A, 0x0A, 0x0A, + 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* Pad 6 */ + 0x08, 0x09, 0x09, 0x09, 0x09, 0x09, 0x09, 0x09, + 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* Pad 7 */ + 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, + 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* Pad 8 */ + 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, + 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x06, /* Pad 9 */ + 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, + 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x05, 0x05, /* Pad 10 */ + 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, + 0x00, 0x01, 0x02, 0x03, 0x04, 0x04, 0x04, 0x04, /* Pad 11 */ + 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, + 0x00, 0x01, 0x02, 0x03, 0x03, 0x03, 0x03, 0x03, /* Pad 12 */ + 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, + 0x00, 0x01, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02, /* Pad 13 */ + 0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02, + 0x00, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, /* Pad 14 */ 
+ 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* Pad 15 */ + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 +}; + + +/* Downsample pixel values of a single component. + * This version handles the common case of 2:1 horizontal and 1:1 vertical, + * without smoothing. + */ + +void jsimd_h2v1_downsample_neon(JDIMENSION image_width, int max_v_samp_factor, + JDIMENSION v_samp_factor, + JDIMENSION width_in_blocks, + JSAMPARRAY input_data, JSAMPARRAY output_data) +{ + JSAMPROW inptr, outptr; + /* Load expansion mask to pad remaining elements of last DCT block. */ + const int mask_offset = 16 * ((width_in_blocks * 2 * DCTSIZE) - image_width); + const uint8x16_t expand_mask = + vld1q_u8(&jsimd_h2_downsample_consts[mask_offset]); + /* Load bias pattern (alternating every pixel.) */ + /* { 0, 1, 0, 1, 0, 1, 0, 1 } */ + const uint16x8_t bias = vreinterpretq_u16_u32(vdupq_n_u32(0x00010000)); + unsigned i, outrow; + + for (outrow = 0; outrow < v_samp_factor; outrow++) { + outptr = output_data[outrow]; + inptr = input_data[outrow]; + + /* Downsample all but the last DCT block of pixels. */ + for (i = 0; i < width_in_blocks - 1; i++) { + uint8x16_t pixels = vld1q_u8(inptr + i * 2 * DCTSIZE); + /* Add adjacent pixel values, widen to 16-bit, and add bias. */ + uint16x8_t samples_u16 = vpadalq_u8(bias, pixels); + /* Divide total by 2 and narrow to 8-bit. */ + uint8x8_t samples_u8 = vshrn_n_u16(samples_u16, 1); + /* Store samples to memory. */ + vst1_u8(outptr + i * DCTSIZE, samples_u8); + } + + /* Load pixels in last DCT block into a table. */ + uint8x16_t pixels = vld1q_u8(inptr + (width_in_blocks - 1) * 2 * DCTSIZE); +#if defined(__aarch64__) || defined(_M_ARM64) + /* Pad the empty elements with the value of the last pixel. 
*/ + pixels = vqtbl1q_u8(pixels, expand_mask); +#else + uint8x8x2_t table = { { vget_low_u8(pixels), vget_high_u8(pixels) } }; + pixels = vcombine_u8(vtbl2_u8(table, vget_low_u8(expand_mask)), + vtbl2_u8(table, vget_high_u8(expand_mask))); +#endif + /* Add adjacent pixel values, widen to 16-bit, and add bias. */ + uint16x8_t samples_u16 = vpadalq_u8(bias, pixels); + /* Divide total by 2, narrow to 8-bit, and store. */ + uint8x8_t samples_u8 = vshrn_n_u16(samples_u16, 1); + vst1_u8(outptr + (width_in_blocks - 1) * DCTSIZE, samples_u8); + } +} + + +/* Downsample pixel values of a single component. + * This version handles the standard case of 2:1 horizontal and 2:1 vertical, + * without smoothing. + */ + +void jsimd_h2v2_downsample_neon(JDIMENSION image_width, int max_v_samp_factor, + JDIMENSION v_samp_factor, + JDIMENSION width_in_blocks, + JSAMPARRAY input_data, JSAMPARRAY output_data) +{ + JSAMPROW inptr0, inptr1, outptr; + /* Load expansion mask to pad remaining elements of last DCT block. */ + const int mask_offset = 16 * ((width_in_blocks * 2 * DCTSIZE) - image_width); + const uint8x16_t expand_mask = + vld1q_u8(&jsimd_h2_downsample_consts[mask_offset]); + /* Load bias pattern (alternating every pixel.) */ + /* { 1, 2, 1, 2, 1, 2, 1, 2 } */ + const uint16x8_t bias = vreinterpretq_u16_u32(vdupq_n_u32(0x00020001)); + unsigned i, outrow; + + for (outrow = 0; outrow < v_samp_factor; outrow++) { + outptr = output_data[outrow]; + inptr0 = input_data[outrow]; + inptr1 = input_data[outrow + 1]; + + /* Downsample all but the last DCT block of pixels. */ + for (i = 0; i < width_in_blocks - 1; i++) { + uint8x16_t pixels_r0 = vld1q_u8(inptr0 + i * 2 * DCTSIZE); + uint8x16_t pixels_r1 = vld1q_u8(inptr1 + i * 2 * DCTSIZE); + /* Add adjacent pixel values in row 0, widen to 16-bit, and add bias. */ + uint16x8_t samples_u16 = vpadalq_u8(bias, pixels_r0); + /* Add adjacent pixel values in row 1, widen to 16-bit, and accumulate. 
+ */ + samples_u16 = vpadalq_u8(samples_u16, pixels_r1); + /* Divide total by 4 and narrow to 8-bit. */ + uint8x8_t samples_u8 = vshrn_n_u16(samples_u16, 2); + /* Store samples to memory and increment pointers. */ + vst1_u8(outptr + i * DCTSIZE, samples_u8); + } + + /* Load pixels in last DCT block into a table. */ + uint8x16_t pixels_r0 = + vld1q_u8(inptr0 + (width_in_blocks - 1) * 2 * DCTSIZE); + uint8x16_t pixels_r1 = + vld1q_u8(inptr1 + (width_in_blocks - 1) * 2 * DCTSIZE); +#if defined(__aarch64__) || defined(_M_ARM64) + /* Pad the empty elements with the value of the last pixel. */ + pixels_r0 = vqtbl1q_u8(pixels_r0, expand_mask); + pixels_r1 = vqtbl1q_u8(pixels_r1, expand_mask); +#else + uint8x8x2_t table_r0 = + { { vget_low_u8(pixels_r0), vget_high_u8(pixels_r0) } }; + uint8x8x2_t table_r1 = + { { vget_low_u8(pixels_r1), vget_high_u8(pixels_r1) } }; + pixels_r0 = vcombine_u8(vtbl2_u8(table_r0, vget_low_u8(expand_mask)), + vtbl2_u8(table_r0, vget_high_u8(expand_mask))); + pixels_r1 = vcombine_u8(vtbl2_u8(table_r1, vget_low_u8(expand_mask)), + vtbl2_u8(table_r1, vget_high_u8(expand_mask))); +#endif + /* Add adjacent pixel values in row 0, widen to 16-bit, and add bias. */ + uint16x8_t samples_u16 = vpadalq_u8(bias, pixels_r0); + /* Add adjacent pixel values in row 1, widen to 16-bit, and accumulate. */ + samples_u16 = vpadalq_u8(samples_u16, pixels_r1); + /* Divide total by 4, narrow to 8-bit, and store. 
*/ + uint8x8_t samples_u8 = vshrn_n_u16(samples_u16, 2); + vst1_u8(outptr + (width_in_blocks - 1) * DCTSIZE, samples_u8); + } +} diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jdcolext-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jdcolext-neon.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jdcolext-neon.c 1970-01-01 01:00:00.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jdcolext-neon.c 2021-11-20 03:41:33.399600434 +0000 @@ -0,0 +1,374 @@ +/* + * jdcolext-neon.c - colorspace conversion (Arm Neon) + * + * Copyright (C) 2020, Arm Limited. All Rights Reserved. + * Copyright (C) 2020, D. R. Commander. All Rights Reserved. + * + * This software is provided 'as-is', without any express or implied + * warranty. In no event will the authors be held liable for any damages + * arising from the use of this software. + * + * Permission is granted to anyone to use this software for any purpose, + * including commercial applications, and to alter it and redistribute it + * freely, subject to the following restrictions: + * + * 1. The origin of this software must not be misrepresented; you must not + * claim that you wrote the original software. If you use this software + * in a product, an acknowledgment in the product documentation would be + * appreciated but is not required. + * 2. Altered source versions must be plainly marked as such, and must not be + * misrepresented as being the original software. + * 3. This notice may not be removed or altered from any source distribution. + */ + +/* This file is included by jdcolor-neon.c. 
*/ + + +/* YCbCr -> RGB conversion is defined by the following equations: + * R = Y + 1.40200 * (Cr - 128) + * G = Y - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128) + * B = Y + 1.77200 * (Cb - 128) + * + * Scaled integer constants are used to avoid floating-point arithmetic: + * 0.3441467 = 11277 * 2^-15 + * 0.7141418 = 23401 * 2^-15 + * 1.4020386 = 22971 * 2^-14 + * 1.7720337 = 29033 * 2^-14 + * These constants are defined in jdcolor-neon.c. + * + * To ensure correct results, rounding is used when descaling. + */ + +/* Notes on safe memory access for YCbCr -> RGB conversion routines: + * + * Input memory buffers can be safely overread up to the next multiple of + * ALIGN_SIZE bytes, since they are always allocated by alloc_sarray() in + * jmemmgr.c. + * + * The output buffer cannot safely be written beyond output_width, since + * output_buf points to a possibly unpadded row in the decompressed image + * buffer allocated by the calling program. + */ + +void jsimd_ycc_rgb_convert_neon(JDIMENSION output_width, JSAMPIMAGE input_buf, + JDIMENSION input_row, JSAMPARRAY output_buf, + int num_rows) +{ + JSAMPROW outptr; + /* Pointers to Y, Cb, and Cr data */ + JSAMPROW inptr0, inptr1, inptr2; + + const int16x4_t consts = vld1_s16(jsimd_ycc_rgb_convert_neon_consts); + const int16x8_t neg_128 = vdupq_n_s16(-128); + + while (--num_rows >= 0) { + inptr0 = input_buf[0][input_row]; + inptr1 = input_buf[1][input_row]; + inptr2 = input_buf[2][input_row]; + input_row++; + outptr = *output_buf++; + int cols_remaining = output_width; + for (; cols_remaining >= 16; cols_remaining -= 16) { + uint8x16_t y = vld1q_u8(inptr0); + uint8x16_t cb = vld1q_u8(inptr1); + uint8x16_t cr = vld1q_u8(inptr2); + /* Subtract 128 from Cb and Cr. 
*/ + int16x8_t cr_128_l = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), + vget_low_u8(cr))); + int16x8_t cr_128_h = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), + vget_high_u8(cr))); + int16x8_t cb_128_l = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), + vget_low_u8(cb))); + int16x8_t cb_128_h = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), + vget_high_u8(cb))); + /* Compute G-Y: - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128) */ + int32x4_t g_sub_y_ll = vmull_lane_s16(vget_low_s16(cb_128_l), consts, 0); + int32x4_t g_sub_y_lh = vmull_lane_s16(vget_high_s16(cb_128_l), + consts, 0); + int32x4_t g_sub_y_hl = vmull_lane_s16(vget_low_s16(cb_128_h), consts, 0); + int32x4_t g_sub_y_hh = vmull_lane_s16(vget_high_s16(cb_128_h), + consts, 0); + g_sub_y_ll = vmlsl_lane_s16(g_sub_y_ll, vget_low_s16(cr_128_l), + consts, 1); + g_sub_y_lh = vmlsl_lane_s16(g_sub_y_lh, vget_high_s16(cr_128_l), + consts, 1); + g_sub_y_hl = vmlsl_lane_s16(g_sub_y_hl, vget_low_s16(cr_128_h), + consts, 1); + g_sub_y_hh = vmlsl_lane_s16(g_sub_y_hh, vget_high_s16(cr_128_h), + consts, 1); + /* Descale G components: shift right 15, round, and narrow to 16-bit. */ + int16x8_t g_sub_y_l = vcombine_s16(vrshrn_n_s32(g_sub_y_ll, 15), + vrshrn_n_s32(g_sub_y_lh, 15)); + int16x8_t g_sub_y_h = vcombine_s16(vrshrn_n_s32(g_sub_y_hl, 15), + vrshrn_n_s32(g_sub_y_hh, 15)); + /* Compute R-Y: 1.40200 * (Cr - 128) */ + int16x8_t r_sub_y_l = vqrdmulhq_lane_s16(vshlq_n_s16(cr_128_l, 1), + consts, 2); + int16x8_t r_sub_y_h = vqrdmulhq_lane_s16(vshlq_n_s16(cr_128_h, 1), + consts, 2); + /* Compute B-Y: 1.77200 * (Cb - 128) */ + int16x8_t b_sub_y_l = vqrdmulhq_lane_s16(vshlq_n_s16(cb_128_l, 1), + consts, 3); + int16x8_t b_sub_y_h = vqrdmulhq_lane_s16(vshlq_n_s16(cb_128_h, 1), + consts, 3); + /* Add Y. 
*/ + int16x8_t r_l = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y_l), + vget_low_u8(y))); + int16x8_t r_h = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y_h), + vget_high_u8(y))); + int16x8_t b_l = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y_l), + vget_low_u8(y))); + int16x8_t b_h = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y_h), + vget_high_u8(y))); + int16x8_t g_l = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y_l), + vget_low_u8(y))); + int16x8_t g_h = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y_h), + vget_high_u8(y))); + +#if RGB_PIXELSIZE == 4 + uint8x16x4_t rgba; + /* Convert each component to unsigned and narrow, clamping to [0-255]. */ + rgba.val[RGB_RED] = vcombine_u8(vqmovun_s16(r_l), vqmovun_s16(r_h)); + rgba.val[RGB_GREEN] = vcombine_u8(vqmovun_s16(g_l), vqmovun_s16(g_h)); + rgba.val[RGB_BLUE] = vcombine_u8(vqmovun_s16(b_l), vqmovun_s16(b_h)); + /* Set alpha channel to opaque (0xFF). */ + rgba.val[RGB_ALPHA] = vdupq_n_u8(0xFF); + /* Store RGBA pixel data to memory. */ + vst4q_u8(outptr, rgba); +#elif RGB_PIXELSIZE == 3 + uint8x16x3_t rgb; + /* Convert each component to unsigned and narrow, clamping to [0-255]. */ + rgb.val[RGB_RED] = vcombine_u8(vqmovun_s16(r_l), vqmovun_s16(r_h)); + rgb.val[RGB_GREEN] = vcombine_u8(vqmovun_s16(g_l), vqmovun_s16(g_h)); + rgb.val[RGB_BLUE] = vcombine_u8(vqmovun_s16(b_l), vqmovun_s16(b_h)); + /* Store RGB pixel data to memory. */ + vst3q_u8(outptr, rgb); +#else + /* Pack R, G, and B values in ratio 5:6:5. */ + uint16x8_t rgb565_l = vqshluq_n_s16(r_l, 8); + rgb565_l = vsriq_n_u16(rgb565_l, vqshluq_n_s16(g_l, 8), 5); + rgb565_l = vsriq_n_u16(rgb565_l, vqshluq_n_s16(b_l, 8), 11); + uint16x8_t rgb565_h = vqshluq_n_s16(r_h, 8); + rgb565_h = vsriq_n_u16(rgb565_h, vqshluq_n_s16(g_h, 8), 5); + rgb565_h = vsriq_n_u16(rgb565_h, vqshluq_n_s16(b_h, 8), 11); + /* Store RGB pixel data to memory. 
*/ + vst1q_u16((uint16_t *)outptr, rgb565_l); + vst1q_u16(((uint16_t *)outptr) + 8, rgb565_h); +#endif + + /* Increment pointers. */ + inptr0 += 16; + inptr1 += 16; + inptr2 += 16; + outptr += (RGB_PIXELSIZE * 16); + } + + if (cols_remaining >= 8) { + uint8x8_t y = vld1_u8(inptr0); + uint8x8_t cb = vld1_u8(inptr1); + uint8x8_t cr = vld1_u8(inptr2); + /* Subtract 128 from Cb and Cr. */ + int16x8_t cr_128 = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), cr)); + int16x8_t cb_128 = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), cb)); + /* Compute G-Y: - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128) */ + int32x4_t g_sub_y_l = vmull_lane_s16(vget_low_s16(cb_128), consts, 0); + int32x4_t g_sub_y_h = vmull_lane_s16(vget_high_s16(cb_128), consts, 0); + g_sub_y_l = vmlsl_lane_s16(g_sub_y_l, vget_low_s16(cr_128), consts, 1); + g_sub_y_h = vmlsl_lane_s16(g_sub_y_h, vget_high_s16(cr_128), consts, 1); + /* Descale G components: shift right 15, round, and narrow to 16-bit. */ + int16x8_t g_sub_y = vcombine_s16(vrshrn_n_s32(g_sub_y_l, 15), + vrshrn_n_s32(g_sub_y_h, 15)); + /* Compute R-Y: 1.40200 * (Cr - 128) */ + int16x8_t r_sub_y = vqrdmulhq_lane_s16(vshlq_n_s16(cr_128, 1), + consts, 2); + /* Compute B-Y: 1.77200 * (Cb - 128) */ + int16x8_t b_sub_y = vqrdmulhq_lane_s16(vshlq_n_s16(cb_128, 1), + consts, 3); + /* Add Y. */ + int16x8_t r = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y), y)); + int16x8_t b = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y), y)); + int16x8_t g = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y), y)); + +#if RGB_PIXELSIZE == 4 + uint8x8x4_t rgba; + /* Convert each component to unsigned and narrow, clamping to [0-255]. */ + rgba.val[RGB_RED] = vqmovun_s16(r); + rgba.val[RGB_GREEN] = vqmovun_s16(g); + rgba.val[RGB_BLUE] = vqmovun_s16(b); + /* Set alpha channel to opaque (0xFF). */ + rgba.val[RGB_ALPHA] = vdup_n_u8(0xFF); + /* Store RGBA pixel data to memory. 
*/ + vst4_u8(outptr, rgba); +#elif RGB_PIXELSIZE == 3 + uint8x8x3_t rgb; + /* Convert each component to unsigned and narrow, clamping to [0-255]. */ + rgb.val[RGB_RED] = vqmovun_s16(r); + rgb.val[RGB_GREEN] = vqmovun_s16(g); + rgb.val[RGB_BLUE] = vqmovun_s16(b); + /* Store RGB pixel data to memory. */ + vst3_u8(outptr, rgb); +#else + /* Pack R, G, and B values in ratio 5:6:5. */ + uint16x8_t rgb565 = vqshluq_n_s16(r, 8); + rgb565 = vsriq_n_u16(rgb565, vqshluq_n_s16(g, 8), 5); + rgb565 = vsriq_n_u16(rgb565, vqshluq_n_s16(b, 8), 11); + /* Store RGB pixel data to memory. */ + vst1q_u16((uint16_t *)outptr, rgb565); +#endif + + /* Increment pointers. */ + inptr0 += 8; + inptr1 += 8; + inptr2 += 8; + outptr += (RGB_PIXELSIZE * 8); + cols_remaining -= 8; + } + + /* Handle the tail elements. */ + if (cols_remaining > 0) { + uint8x8_t y = vld1_u8(inptr0); + uint8x8_t cb = vld1_u8(inptr1); + uint8x8_t cr = vld1_u8(inptr2); + /* Subtract 128 from Cb and Cr. */ + int16x8_t cr_128 = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), cr)); + int16x8_t cb_128 = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), cb)); + /* Compute G-Y: - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128) */ + int32x4_t g_sub_y_l = vmull_lane_s16(vget_low_s16(cb_128), consts, 0); + int32x4_t g_sub_y_h = vmull_lane_s16(vget_high_s16(cb_128), consts, 0); + g_sub_y_l = vmlsl_lane_s16(g_sub_y_l, vget_low_s16(cr_128), consts, 1); + g_sub_y_h = vmlsl_lane_s16(g_sub_y_h, vget_high_s16(cr_128), consts, 1); + /* Descale G components: shift right 15, round, and narrow to 16-bit. */ + int16x8_t g_sub_y = vcombine_s16(vrshrn_n_s32(g_sub_y_l, 15), + vrshrn_n_s32(g_sub_y_h, 15)); + /* Compute R-Y: 1.40200 * (Cr - 128) */ + int16x8_t r_sub_y = vqrdmulhq_lane_s16(vshlq_n_s16(cr_128, 1), + consts, 2); + /* Compute B-Y: 1.77200 * (Cb - 128) */ + int16x8_t b_sub_y = vqrdmulhq_lane_s16(vshlq_n_s16(cb_128, 1), + consts, 3); + /* Add Y. 
*/ + int16x8_t r = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y), y)); + int16x8_t b = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y), y)); + int16x8_t g = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y), y)); + +#if RGB_PIXELSIZE == 4 + uint8x8x4_t rgba; + /* Convert each component to unsigned and narrow, clamping to [0-255]. */ + rgba.val[RGB_RED] = vqmovun_s16(r); + rgba.val[RGB_GREEN] = vqmovun_s16(g); + rgba.val[RGB_BLUE] = vqmovun_s16(b); + /* Set alpha channel to opaque (0xFF). */ + rgba.val[RGB_ALPHA] = vdup_n_u8(0xFF); + /* Store RGBA pixel data to memory. */ + switch (cols_remaining) { + case 7: + vst4_lane_u8(outptr + 6 * RGB_PIXELSIZE, rgba, 6); + FALLTHROUGH /*FALLTHROUGH*/ + case 6: + vst4_lane_u8(outptr + 5 * RGB_PIXELSIZE, rgba, 5); + FALLTHROUGH /*FALLTHROUGH*/ + case 5: + vst4_lane_u8(outptr + 4 * RGB_PIXELSIZE, rgba, 4); + FALLTHROUGH /*FALLTHROUGH*/ + case 4: + vst4_lane_u8(outptr + 3 * RGB_PIXELSIZE, rgba, 3); + FALLTHROUGH /*FALLTHROUGH*/ + case 3: + vst4_lane_u8(outptr + 2 * RGB_PIXELSIZE, rgba, 2); + FALLTHROUGH /*FALLTHROUGH*/ + case 2: + vst4_lane_u8(outptr + RGB_PIXELSIZE, rgba, 1); + FALLTHROUGH /*FALLTHROUGH*/ + case 1: + vst4_lane_u8(outptr, rgba, 0); + FALLTHROUGH /*FALLTHROUGH*/ + default: + break; + } +#elif RGB_PIXELSIZE == 3 + uint8x8x3_t rgb; + /* Convert each component to unsigned and narrow, clamping to [0-255]. */ + rgb.val[RGB_RED] = vqmovun_s16(r); + rgb.val[RGB_GREEN] = vqmovun_s16(g); + rgb.val[RGB_BLUE] = vqmovun_s16(b); + /* Store RGB pixel data to memory. 
*/ + switch (cols_remaining) { + case 7: + vst3_lane_u8(outptr + 6 * RGB_PIXELSIZE, rgb, 6); + FALLTHROUGH /*FALLTHROUGH*/ + case 6: + vst3_lane_u8(outptr + 5 * RGB_PIXELSIZE, rgb, 5); + FALLTHROUGH /*FALLTHROUGH*/ + case 5: + vst3_lane_u8(outptr + 4 * RGB_PIXELSIZE, rgb, 4); + FALLTHROUGH /*FALLTHROUGH*/ + case 4: + vst3_lane_u8(outptr + 3 * RGB_PIXELSIZE, rgb, 3); + FALLTHROUGH /*FALLTHROUGH*/ + case 3: + vst3_lane_u8(outptr + 2 * RGB_PIXELSIZE, rgb, 2); + FALLTHROUGH /*FALLTHROUGH*/ + case 2: + vst3_lane_u8(outptr + RGB_PIXELSIZE, rgb, 1); + FALLTHROUGH /*FALLTHROUGH*/ + case 1: + vst3_lane_u8(outptr, rgb, 0); + FALLTHROUGH /*FALLTHROUGH*/ + default: + break; + } +#else + /* Pack R, G, and B values in ratio 5:6:5. */ + uint16x8_t rgb565 = vqshluq_n_s16(r, 8); + rgb565 = vsriq_n_u16(rgb565, vqshluq_n_s16(g, 8), 5); + rgb565 = vsriq_n_u16(rgb565, vqshluq_n_s16(b, 8), 11); + /* Store RGB565 pixel data to memory. */ + switch (cols_remaining) { + case 7: + vst1q_lane_u16((uint16_t *)(outptr + 6 * RGB_PIXELSIZE), rgb565, 6); + FALLTHROUGH /*FALLTHROUGH*/ + case 6: + vst1q_lane_u16((uint16_t *)(outptr + 5 * RGB_PIXELSIZE), rgb565, 5); + FALLTHROUGH /*FALLTHROUGH*/ + case 5: + vst1q_lane_u16((uint16_t *)(outptr + 4 * RGB_PIXELSIZE), rgb565, 4); + FALLTHROUGH /*FALLTHROUGH*/ + case 4: + vst1q_lane_u16((uint16_t *)(outptr + 3 * RGB_PIXELSIZE), rgb565, 3); + FALLTHROUGH /*FALLTHROUGH*/ + case 3: + vst1q_lane_u16((uint16_t *)(outptr + 2 * RGB_PIXELSIZE), rgb565, 2); + FALLTHROUGH /*FALLTHROUGH*/ + case 2: + vst1q_lane_u16((uint16_t *)(outptr + RGB_PIXELSIZE), rgb565, 1); + FALLTHROUGH /*FALLTHROUGH*/ + case 1: + vst1q_lane_u16((uint16_t *)outptr, rgb565, 0); + FALLTHROUGH /*FALLTHROUGH*/ + default: + break; + } +#endif + } + } +} diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jdcolor-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jdcolor-neon.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jdcolor-neon.c 1970-01-01 
01:00:00.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jdcolor-neon.c 2021-11-20 03:41:33.399600434 +0000 @@ -0,0 +1,142 @@ +/* + * jdcolor-neon.c - colorspace conversion (Arm Neon) + * + * Copyright (C) 2020, Arm Limited. All Rights Reserved. + * + * This software is provided 'as-is', without any express or implied + * warranty. In no event will the authors be held liable for any damages + * arising from the use of this software. + * + * Permission is granted to anyone to use this software for any purpose, + * including commercial applications, and to alter it and redistribute it + * freely, subject to the following restrictions: + * + * 1. The origin of this software must not be misrepresented; you must not + * claim that you wrote the original software. If you use this software + * in a product, an acknowledgment in the product documentation would be + * appreciated but is not required. + * 2. Altered source versions must be plainly marked as such, and must not be + * misrepresented as being the original software. + * 3. This notice may not be removed or altered from any source distribution. + */ + +#define JPEG_INTERNALS +#include "jconfigint.h" +#include "../../jinclude.h" +#include "../../jpeglib.h" +#include "../../jsimd.h" +#include "../../jdct.h" +#include "../../jsimddct.h" +#include "../jsimd.h" +#include "align.h" + +#include <arm_neon.h> + + +/* YCbCr -> RGB conversion constants */ + +#define F_0_344 11277 /* 0.3441467 = 11277 * 2^-15 */ +#define F_0_714 23401 /* 0.7141418 = 23401 * 2^-15 */ +#define F_1_402 22971 /* 1.4020386 = 22971 * 2^-14 */ +#define F_1_772 29033 /* 1.7720337 = 29033 * 2^-14 */ +
ALIGN(16) static const int16_t jsimd_ycc_rgb_convert_neon_consts[] = { + -F_0_344, F_0_714, F_1_402, F_1_772 +}; + + +/* Include inline routines for colorspace extensions.
*/ + +#include "jdcolext-neon.c" +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_PIXELSIZE + +#define RGB_RED EXT_RGB_RED +#define RGB_GREEN EXT_RGB_GREEN +#define RGB_BLUE EXT_RGB_BLUE +#define RGB_PIXELSIZE EXT_RGB_PIXELSIZE +#define jsimd_ycc_rgb_convert_neon jsimd_ycc_extrgb_convert_neon +#include "jdcolext-neon.c" +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_PIXELSIZE +#undef jsimd_ycc_rgb_convert_neon + +#define RGB_RED EXT_RGBX_RED +#define RGB_GREEN EXT_RGBX_GREEN +#define RGB_BLUE EXT_RGBX_BLUE +#define RGB_ALPHA 3 +#define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE +#define jsimd_ycc_rgb_convert_neon jsimd_ycc_extrgbx_convert_neon +#include "jdcolext-neon.c" +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_ALPHA +#undef RGB_PIXELSIZE +#undef jsimd_ycc_rgb_convert_neon + +#define RGB_RED EXT_BGR_RED +#define RGB_GREEN EXT_BGR_GREEN +#define RGB_BLUE EXT_BGR_BLUE +#define RGB_PIXELSIZE EXT_BGR_PIXELSIZE +#define jsimd_ycc_rgb_convert_neon jsimd_ycc_extbgr_convert_neon +#include "jdcolext-neon.c" +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_PIXELSIZE +#undef jsimd_ycc_rgb_convert_neon + +#define RGB_RED EXT_BGRX_RED +#define RGB_GREEN EXT_BGRX_GREEN +#define RGB_BLUE EXT_BGRX_BLUE +#define RGB_ALPHA 3 +#define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE +#define jsimd_ycc_rgb_convert_neon jsimd_ycc_extbgrx_convert_neon +#include "jdcolext-neon.c" +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_ALPHA +#undef RGB_PIXELSIZE +#undef jsimd_ycc_rgb_convert_neon + +#define RGB_RED EXT_XBGR_RED +#define RGB_GREEN EXT_XBGR_GREEN +#define RGB_BLUE EXT_XBGR_BLUE +#define RGB_ALPHA 0 +#define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE +#define jsimd_ycc_rgb_convert_neon jsimd_ycc_extxbgr_convert_neon +#include "jdcolext-neon.c" +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_ALPHA +#undef RGB_PIXELSIZE +#undef jsimd_ycc_rgb_convert_neon + +#define RGB_RED EXT_XRGB_RED +#define RGB_GREEN EXT_XRGB_GREEN 
+#define RGB_BLUE EXT_XRGB_BLUE +#define RGB_ALPHA 0 +#define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE +#define jsimd_ycc_rgb_convert_neon jsimd_ycc_extxrgb_convert_neon +#include "jdcolext-neon.c" +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_ALPHA +#undef RGB_PIXELSIZE +#undef jsimd_ycc_rgb_convert_neon + +/* YCbCr -> RGB565 Conversion */ + +#define RGB_PIXELSIZE 2 +#define jsimd_ycc_rgb_convert_neon jsimd_ycc_rgb565_convert_neon +#include "jdcolext-neon.c" +#undef RGB_PIXELSIZE +#undef jsimd_ycc_rgb_convert_neon diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jdmerge-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jdmerge-neon.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jdmerge-neon.c 1970-01-01 01:00:00.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jdmerge-neon.c 2021-11-20 03:41:33.399600434 +0000 @@ -0,0 +1,145 @@ +/* + * jdmerge-neon.c - merged upsampling/color conversion (Arm Neon) + * + * Copyright (C) 2020, Arm Limited. All Rights Reserved. + * + * This software is provided 'as-is', without any express or implied + * warranty. In no event will the authors be held liable for any damages + * arising from the use of this software. + * + * Permission is granted to anyone to use this software for any purpose, + * including commercial applications, and to alter it and redistribute it + * freely, subject to the following restrictions: + * + * 1. The origin of this software must not be misrepresented; you must not + * claim that you wrote the original software. If you use this software + * in a product, an acknowledgment in the product documentation would be + * appreciated but is not required. + * 2. Altered source versions must be plainly marked as such, and must not be + * misrepresented as being the original software. + * 3. This notice may not be removed or altered from any source distribution. 
+ */ + +#define JPEG_INTERNALS +#include "jconfigint.h" +#include "../../jinclude.h" +#include "../../jpeglib.h" +#include "../../jsimd.h" +#include "../../jdct.h" +#include "../../jsimddct.h" +#include "../jsimd.h" +#include "align.h" + +#include <arm_neon.h> + + +/* YCbCr -> RGB conversion constants */ + +#define F_0_344 11277 /* 0.3441467 = 11277 * 2^-15 */ +#define F_0_714 23401 /* 0.7141418 = 23401 * 2^-15 */ +#define F_1_402 22971 /* 1.4020386 = 22971 * 2^-14 */ +#define F_1_772 29033 /* 1.7720337 = 29033 * 2^-14 */ + +ALIGN(16) static const int16_t jsimd_ycc_rgb_convert_neon_consts[] = { + -F_0_344, F_0_714, F_1_402, F_1_772 +}; + + +/* Include inline routines for colorspace extensions. */ + +#include "jdmrgext-neon.c" +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_PIXELSIZE + +#define RGB_RED EXT_RGB_RED +#define RGB_GREEN EXT_RGB_GREEN +#define RGB_BLUE EXT_RGB_BLUE +#define RGB_PIXELSIZE EXT_RGB_PIXELSIZE +#define jsimd_h2v1_merged_upsample_neon jsimd_h2v1_extrgb_merged_upsample_neon +#define jsimd_h2v2_merged_upsample_neon jsimd_h2v2_extrgb_merged_upsample_neon +#include "jdmrgext-neon.c" +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_PIXELSIZE +#undef jsimd_h2v1_merged_upsample_neon +#undef jsimd_h2v2_merged_upsample_neon + +#define RGB_RED EXT_RGBX_RED +#define RGB_GREEN EXT_RGBX_GREEN +#define RGB_BLUE EXT_RGBX_BLUE +#define RGB_ALPHA 3 +#define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE +#define jsimd_h2v1_merged_upsample_neon jsimd_h2v1_extrgbx_merged_upsample_neon +#define jsimd_h2v2_merged_upsample_neon jsimd_h2v2_extrgbx_merged_upsample_neon +#include "jdmrgext-neon.c" +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_ALPHA +#undef RGB_PIXELSIZE +#undef jsimd_h2v1_merged_upsample_neon +#undef jsimd_h2v2_merged_upsample_neon + +#define RGB_RED EXT_BGR_RED +#define RGB_GREEN EXT_BGR_GREEN +#define RGB_BLUE EXT_BGR_BLUE +#define RGB_PIXELSIZE EXT_BGR_PIXELSIZE +#define jsimd_h2v1_merged_upsample_neon
jsimd_h2v1_extbgr_merged_upsample_neon +#define jsimd_h2v2_merged_upsample_neon jsimd_h2v2_extbgr_merged_upsample_neon +#include "jdmrgext-neon.c" +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_PIXELSIZE +#undef jsimd_h2v1_merged_upsample_neon +#undef jsimd_h2v2_merged_upsample_neon + +#define RGB_RED EXT_BGRX_RED +#define RGB_GREEN EXT_BGRX_GREEN +#define RGB_BLUE EXT_BGRX_BLUE +#define RGB_ALPHA 3 +#define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE +#define jsimd_h2v1_merged_upsample_neon jsimd_h2v1_extbgrx_merged_upsample_neon +#define jsimd_h2v2_merged_upsample_neon jsimd_h2v2_extbgrx_merged_upsample_neon +#include "jdmrgext-neon.c" +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_ALPHA +#undef RGB_PIXELSIZE +#undef jsimd_h2v1_merged_upsample_neon +#undef jsimd_h2v2_merged_upsample_neon + +#define RGB_RED EXT_XBGR_RED +#define RGB_GREEN EXT_XBGR_GREEN +#define RGB_BLUE EXT_XBGR_BLUE +#define RGB_ALPHA 0 +#define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE +#define jsimd_h2v1_merged_upsample_neon jsimd_h2v1_extxbgr_merged_upsample_neon +#define jsimd_h2v2_merged_upsample_neon jsimd_h2v2_extxbgr_merged_upsample_neon +#include "jdmrgext-neon.c" +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_ALPHA +#undef RGB_PIXELSIZE +#undef jsimd_h2v1_merged_upsample_neon +#undef jsimd_h2v2_merged_upsample_neon + +#define RGB_RED EXT_XRGB_RED +#define RGB_GREEN EXT_XRGB_GREEN +#define RGB_BLUE EXT_XRGB_BLUE +#define RGB_ALPHA 0 +#define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE +#define jsimd_h2v1_merged_upsample_neon jsimd_h2v1_extxrgb_merged_upsample_neon +#define jsimd_h2v2_merged_upsample_neon jsimd_h2v2_extxrgb_merged_upsample_neon +#include "jdmrgext-neon.c" +#undef RGB_RED +#undef RGB_GREEN +#undef RGB_BLUE +#undef RGB_ALPHA +#undef RGB_PIXELSIZE +#undef jsimd_h2v1_merged_upsample_neon diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jdmrgext-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jdmrgext-neon.c --- 
a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jdmrgext-neon.c 1970-01-01 01:00:00.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jdmrgext-neon.c 2021-11-20 03:41:33.399600434 +0000 @@ -0,0 +1,723 @@ +/* + * jdmrgext-neon.c - merged upsampling/color conversion (Arm Neon) + * + * Copyright (C) 2020, Arm Limited. All Rights Reserved. + * Copyright (C) 2020, D. R. Commander. All Rights Reserved. + * + * This software is provided 'as-is', without any express or implied + * warranty. In no event will the authors be held liable for any damages + * arising from the use of this software. + * + * Permission is granted to anyone to use this software for any purpose, + * including commercial applications, and to alter it and redistribute it + * freely, subject to the following restrictions: + * + * 1. The origin of this software must not be misrepresented; you must not + * claim that you wrote the original software. If you use this software + * in a product, an acknowledgment in the product documentation would be + * appreciated but is not required. + * 2. Altered source versions must be plainly marked as such, and must not be + * misrepresented as being the original software. + * 3. This notice may not be removed or altered from any source distribution. + */ + +/* This file is included by jdmerge-neon.c. */ + + +/* These routines combine simple (non-fancy, i.e. non-smooth) h2v1 or h2v2 + * chroma upsampling and YCbCr -> RGB color conversion into a single function. 
+ * + * As with the standalone functions, YCbCr -> RGB conversion is defined by the + * following equations: + * R = Y + 1.40200 * (Cr - 128) + * G = Y - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128) + * B = Y + 1.77200 * (Cb - 128) + * + * Scaled integer constants are used to avoid floating-point arithmetic: + * 0.3441467 = 11277 * 2^-15 + * 0.7141418 = 23401 * 2^-15 + * 1.4020386 = 22971 * 2^-14 + * 1.7720337 = 29033 * 2^-14 + * These constants are defined in jdmerge-neon.c. + * + * To ensure correct results, rounding is used when descaling. + */ + +/* Notes on safe memory access for merged upsampling/YCbCr -> RGB conversion + * routines: + * + * Input memory buffers can be safely overread up to the next multiple of + * ALIGN_SIZE bytes, since they are always allocated by alloc_sarray() in + * jmemmgr.c. + * + * The output buffer cannot safely be written beyond output_width, since + * output_buf points to a possibly unpadded row in the decompressed image + * buffer allocated by the calling program. + */ + +/* Upsample and color convert for the case of 2:1 horizontal and 1:1 vertical. + */ + +void jsimd_h2v1_merged_upsample_neon(JDIMENSION output_width, + JSAMPIMAGE input_buf, + JDIMENSION in_row_group_ctr, + JSAMPARRAY output_buf) +{ + JSAMPROW outptr; + /* Pointers to Y, Cb, and Cr data */ + JSAMPROW inptr0, inptr1, inptr2; + + const int16x4_t consts = vld1_s16(jsimd_ycc_rgb_convert_neon_consts); + const int16x8_t neg_128 = vdupq_n_s16(-128); + + inptr0 = input_buf[0][in_row_group_ctr]; + inptr1 = input_buf[1][in_row_group_ctr]; + inptr2 = input_buf[2][in_row_group_ctr]; + outptr = output_buf[0]; + + int cols_remaining = output_width; + for (; cols_remaining >= 16; cols_remaining -= 16) { + /* De-interleave Y component values into two separate vectors, one + * containing the component values with even-numbered indices and one + * containing the component values with odd-numbered indices. 
+ */ + uint8x8x2_t y = vld2_u8(inptr0); + uint8x8_t cb = vld1_u8(inptr1); + uint8x8_t cr = vld1_u8(inptr2); + /* Subtract 128 from Cb and Cr. */ + int16x8_t cr_128 = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), cr)); + int16x8_t cb_128 = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), cb)); + /* Compute G-Y: - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128) */ + int32x4_t g_sub_y_l = vmull_lane_s16(vget_low_s16(cb_128), consts, 0); + int32x4_t g_sub_y_h = vmull_lane_s16(vget_high_s16(cb_128), consts, 0); + g_sub_y_l = vmlsl_lane_s16(g_sub_y_l, vget_low_s16(cr_128), consts, 1); + g_sub_y_h = vmlsl_lane_s16(g_sub_y_h, vget_high_s16(cr_128), consts, 1); + /* Descale G components: shift right 15, round, and narrow to 16-bit. */ + int16x8_t g_sub_y = vcombine_s16(vrshrn_n_s32(g_sub_y_l, 15), + vrshrn_n_s32(g_sub_y_h, 15)); + /* Compute R-Y: 1.40200 * (Cr - 128) */ + int16x8_t r_sub_y = vqrdmulhq_lane_s16(vshlq_n_s16(cr_128, 1), consts, 2); + /* Compute B-Y: 1.77200 * (Cb - 128) */ + int16x8_t b_sub_y = vqrdmulhq_lane_s16(vshlq_n_s16(cb_128, 1), consts, 3); + /* Add the chroma-derived values (G-Y, R-Y, and B-Y) to both the "even" and + * "odd" Y component values. This effectively upsamples the chroma + * components horizontally. + */ + int16x8_t g_even = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y), + y.val[0])); + int16x8_t r_even = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y), + y.val[0])); + int16x8_t b_even = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y), + y.val[0])); + int16x8_t g_odd = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y), + y.val[1])); + int16x8_t r_odd = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y), + y.val[1])); + int16x8_t b_odd = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y), + y.val[1])); + /* Convert each component to unsigned and narrow, clamping to [0-255]. 
+ * Re-interleave the "even" and "odd" component values. + */ + uint8x8x2_t r = vzip_u8(vqmovun_s16(r_even), vqmovun_s16(r_odd)); + uint8x8x2_t g = vzip_u8(vqmovun_s16(g_even), vqmovun_s16(g_odd)); + uint8x8x2_t b = vzip_u8(vqmovun_s16(b_even), vqmovun_s16(b_odd)); + +#ifdef RGB_ALPHA + uint8x16x4_t rgba; + rgba.val[RGB_RED] = vcombine_u8(r.val[0], r.val[1]); + rgba.val[RGB_GREEN] = vcombine_u8(g.val[0], g.val[1]); + rgba.val[RGB_BLUE] = vcombine_u8(b.val[0], b.val[1]); + /* Set alpha channel to opaque (0xFF). */ + rgba.val[RGB_ALPHA] = vdupq_n_u8(0xFF); + /* Store RGBA pixel data to memory. */ + vst4q_u8(outptr, rgba); +#else + uint8x16x3_t rgb; + rgb.val[RGB_RED] = vcombine_u8(r.val[0], r.val[1]); + rgb.val[RGB_GREEN] = vcombine_u8(g.val[0], g.val[1]); + rgb.val[RGB_BLUE] = vcombine_u8(b.val[0], b.val[1]); + /* Store RGB pixel data to memory. */ + vst3q_u8(outptr, rgb); +#endif + + /* Increment pointers. */ + inptr0 += 16; + inptr1 += 8; + inptr2 += 8; + outptr += (RGB_PIXELSIZE * 16); + } + + if (cols_remaining > 0) { + /* De-interleave Y component values into two separate vectors, one + * containing the component values with even-numbered indices and one + * containing the component values with odd-numbered indices. + */ + uint8x8x2_t y = vld2_u8(inptr0); + uint8x8_t cb = vld1_u8(inptr1); + uint8x8_t cr = vld1_u8(inptr2); + /* Subtract 128 from Cb and Cr. 
*/ + int16x8_t cr_128 = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), cr)); + int16x8_t cb_128 = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), cb)); + /* Compute G-Y: - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128) */ + int32x4_t g_sub_y_l = vmull_lane_s16(vget_low_s16(cb_128), consts, 0); + int32x4_t g_sub_y_h = vmull_lane_s16(vget_high_s16(cb_128), consts, 0); + g_sub_y_l = vmlsl_lane_s16(g_sub_y_l, vget_low_s16(cr_128), consts, 1); + g_sub_y_h = vmlsl_lane_s16(g_sub_y_h, vget_high_s16(cr_128), consts, 1); + /* Descale G components: shift right 15, round, and narrow to 16-bit. */ + int16x8_t g_sub_y = vcombine_s16(vrshrn_n_s32(g_sub_y_l, 15), + vrshrn_n_s32(g_sub_y_h, 15)); + /* Compute R-Y: 1.40200 * (Cr - 128) */ + int16x8_t r_sub_y = vqrdmulhq_lane_s16(vshlq_n_s16(cr_128, 1), consts, 2); + /* Compute B-Y: 1.77200 * (Cb - 128) */ + int16x8_t b_sub_y = vqrdmulhq_lane_s16(vshlq_n_s16(cb_128, 1), consts, 3); + /* Add the chroma-derived values (G-Y, R-Y, and B-Y) to both the "even" and + * "odd" Y component values. This effectively upsamples the chroma + * components horizontally. + */ + int16x8_t g_even = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y), + y.val[0])); + int16x8_t r_even = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y), + y.val[0])); + int16x8_t b_even = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y), + y.val[0])); + int16x8_t g_odd = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y), + y.val[1])); + int16x8_t r_odd = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y), + y.val[1])); + int16x8_t b_odd = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y), + y.val[1])); + /* Convert each component to unsigned and narrow, clamping to [0-255]. + * Re-interleave the "even" and "odd" component values. 
+ */ + uint8x8x2_t r = vzip_u8(vqmovun_s16(r_even), vqmovun_s16(r_odd)); + uint8x8x2_t g = vzip_u8(vqmovun_s16(g_even), vqmovun_s16(g_odd)); + uint8x8x2_t b = vzip_u8(vqmovun_s16(b_even), vqmovun_s16(b_odd)); + +#ifdef RGB_ALPHA + uint8x8x4_t rgba_h; + rgba_h.val[RGB_RED] = r.val[1]; + rgba_h.val[RGB_GREEN] = g.val[1]; + rgba_h.val[RGB_BLUE] = b.val[1]; + /* Set alpha channel to opaque (0xFF). */ + rgba_h.val[RGB_ALPHA] = vdup_n_u8(0xFF); + uint8x8x4_t rgba_l; + rgba_l.val[RGB_RED] = r.val[0]; + rgba_l.val[RGB_GREEN] = g.val[0]; + rgba_l.val[RGB_BLUE] = b.val[0]; + /* Set alpha channel to opaque (0xFF). */ + rgba_l.val[RGB_ALPHA] = vdup_n_u8(0xFF); + /* Store RGBA pixel data to memory. */ + switch (cols_remaining) { + case 15: + vst4_lane_u8(outptr + 14 * RGB_PIXELSIZE, rgba_h, 6); + FALLTHROUGH /*FALLTHROUGH*/ + case 14: + vst4_lane_u8(outptr + 13 * RGB_PIXELSIZE, rgba_h, 5); + FALLTHROUGH /*FALLTHROUGH*/ + case 13: + vst4_lane_u8(outptr + 12 * RGB_PIXELSIZE, rgba_h, 4); + FALLTHROUGH /*FALLTHROUGH*/ + case 12: + vst4_lane_u8(outptr + 11 * RGB_PIXELSIZE, rgba_h, 3); + FALLTHROUGH /*FALLTHROUGH*/ + case 11: + vst4_lane_u8(outptr + 10 * RGB_PIXELSIZE, rgba_h, 2); + FALLTHROUGH /*FALLTHROUGH*/ + case 10: + vst4_lane_u8(outptr + 9 * RGB_PIXELSIZE, rgba_h, 1); + FALLTHROUGH /*FALLTHROUGH*/ + case 9: + vst4_lane_u8(outptr + 8 * RGB_PIXELSIZE, rgba_h, 0); + FALLTHROUGH /*FALLTHROUGH*/ + case 8: + vst4_u8(outptr, rgba_l); + break; + case 7: + vst4_lane_u8(outptr + 6 * RGB_PIXELSIZE, rgba_l, 6); + FALLTHROUGH /*FALLTHROUGH*/ + case 6: + vst4_lane_u8(outptr + 5 * RGB_PIXELSIZE, rgba_l, 5); + FALLTHROUGH /*FALLTHROUGH*/ + case 5: + vst4_lane_u8(outptr + 4 * RGB_PIXELSIZE, rgba_l, 4); + FALLTHROUGH /*FALLTHROUGH*/ + case 4: + vst4_lane_u8(outptr + 3 * RGB_PIXELSIZE, rgba_l, 3); + FALLTHROUGH /*FALLTHROUGH*/ + case 3: + vst4_lane_u8(outptr + 2 * RGB_PIXELSIZE, rgba_l, 2); + FALLTHROUGH /*FALLTHROUGH*/ + case 2: + vst4_lane_u8(outptr + RGB_PIXELSIZE, rgba_l, 1); + FALLTHROUGH 
/*FALLTHROUGH*/ + case 1: + vst4_lane_u8(outptr, rgba_l, 0); + FALLTHROUGH /*FALLTHROUGH*/ + default: + break; + } +#else + uint8x8x3_t rgb_h; + rgb_h.val[RGB_RED] = r.val[1]; + rgb_h.val[RGB_GREEN] = g.val[1]; + rgb_h.val[RGB_BLUE] = b.val[1]; + uint8x8x3_t rgb_l; + rgb_l.val[RGB_RED] = r.val[0]; + rgb_l.val[RGB_GREEN] = g.val[0]; + rgb_l.val[RGB_BLUE] = b.val[0]; + /* Store RGB pixel data to memory. */ + switch (cols_remaining) { + case 15: + vst3_lane_u8(outptr + 14 * RGB_PIXELSIZE, rgb_h, 6); + FALLTHROUGH /*FALLTHROUGH*/ + case 14: + vst3_lane_u8(outptr + 13 * RGB_PIXELSIZE, rgb_h, 5); + FALLTHROUGH /*FALLTHROUGH*/ + case 13: + vst3_lane_u8(outptr + 12 * RGB_PIXELSIZE, rgb_h, 4); + FALLTHROUGH /*FALLTHROUGH*/ + case 12: + vst3_lane_u8(outptr + 11 * RGB_PIXELSIZE, rgb_h, 3); + FALLTHROUGH /*FALLTHROUGH*/ + case 11: + vst3_lane_u8(outptr + 10 * RGB_PIXELSIZE, rgb_h, 2); + FALLTHROUGH /*FALLTHROUGH*/ + case 10: + vst3_lane_u8(outptr + 9 * RGB_PIXELSIZE, rgb_h, 1); + FALLTHROUGH /*FALLTHROUGH*/ + case 9: + vst3_lane_u8(outptr + 8 * RGB_PIXELSIZE, rgb_h, 0); + FALLTHROUGH /*FALLTHROUGH*/ + case 8: + vst3_u8(outptr, rgb_l); + break; + case 7: + vst3_lane_u8(outptr + 6 * RGB_PIXELSIZE, rgb_l, 6); + FALLTHROUGH /*FALLTHROUGH*/ + case 6: + vst3_lane_u8(outptr + 5 * RGB_PIXELSIZE, rgb_l, 5); + FALLTHROUGH /*FALLTHROUGH*/ + case 5: + vst3_lane_u8(outptr + 4 * RGB_PIXELSIZE, rgb_l, 4); + FALLTHROUGH /*FALLTHROUGH*/ + case 4: + vst3_lane_u8(outptr + 3 * RGB_PIXELSIZE, rgb_l, 3); + FALLTHROUGH /*FALLTHROUGH*/ + case 3: + vst3_lane_u8(outptr + 2 * RGB_PIXELSIZE, rgb_l, 2); + FALLTHROUGH /*FALLTHROUGH*/ + case 2: + vst3_lane_u8(outptr + RGB_PIXELSIZE, rgb_l, 1); + FALLTHROUGH /*FALLTHROUGH*/ + case 1: + vst3_lane_u8(outptr, rgb_l, 0); + FALLTHROUGH /*FALLTHROUGH*/ + default: + break; + } +#endif + } +} + + +/* Upsample and color convert for the case of 2:1 horizontal and 2:1 vertical. 
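The merged routines above implement the YCbCr -> RGB equations with the scaled integer constants listed earlier. A scalar model of that fixed-point math (an illustrative sketch using the same constants, not the NEON code path in the patch) looks like:

```c
#include <stdint.h>

/* Scalar model of the fixed-point YCbCr -> RGB conversion used above.
 * Constants mirror jsimd_ycc_rgb_convert_neon_consts:
 *   0.34414 ~= 11277 * 2^-15,  0.71414 ~= 23401 * 2^-15,
 *   1.40200 ~= 22971 * 2^-14,  1.77200 ~= 29033 * 2^-14 */
static uint8_t clamp255(int v)
{
  return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

static void ycc_to_rgb(int y, int cb, int cr, uint8_t rgb[3])
{
  int cb_128 = cb - 128, cr_128 = cr - 128;
  /* Descale with rounding: add half the scale factor before shifting. */
  int r_sub_y = (22971 * cr_128 + (1 << 13)) >> 14;
  int g_sub_y = (-11277 * cb_128 - 23401 * cr_128 + (1 << 14)) >> 15;
  int b_sub_y = (29033 * cb_128 + (1 << 13)) >> 14;
  rgb[0] = clamp255(y + r_sub_y);
  rgb[1] = clamp255(y + g_sub_y);
  rgb[2] = clamp255(y + b_sub_y);
}
```

The final clamp corresponds to the `vqmovun_s16` saturating narrow in the vector code, which clamps each component to [0-255] while narrowing to 8-bit.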
+ * + * See comments above for details regarding color conversion and safe memory + * access. + */ + +void jsimd_h2v2_merged_upsample_neon(JDIMENSION output_width, + JSAMPIMAGE input_buf, + JDIMENSION in_row_group_ctr, + JSAMPARRAY output_buf) +{ + JSAMPROW outptr0, outptr1; + /* Pointers to Y (both rows), Cb, and Cr data */ + JSAMPROW inptr0_0, inptr0_1, inptr1, inptr2; + + const int16x4_t consts = vld1_s16(jsimd_ycc_rgb_convert_neon_consts); + const int16x8_t neg_128 = vdupq_n_s16(-128); + + inptr0_0 = input_buf[0][in_row_group_ctr * 2]; + inptr0_1 = input_buf[0][in_row_group_ctr * 2 + 1]; + inptr1 = input_buf[1][in_row_group_ctr]; + inptr2 = input_buf[2][in_row_group_ctr]; + outptr0 = output_buf[0]; + outptr1 = output_buf[1]; + + int cols_remaining = output_width; + for (; cols_remaining >= 16; cols_remaining -= 16) { + /* For each row, de-interleave Y component values into two separate + * vectors, one containing the component values with even-numbered indices + * and one containing the component values with odd-numbered indices. + */ + uint8x8x2_t y0 = vld2_u8(inptr0_0); + uint8x8x2_t y1 = vld2_u8(inptr0_1); + uint8x8_t cb = vld1_u8(inptr1); + uint8x8_t cr = vld1_u8(inptr2); + /* Subtract 128 from Cb and Cr. */ + int16x8_t cr_128 = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), cr)); + int16x8_t cb_128 = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), cb)); + /* Compute G-Y: - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128) */ + int32x4_t g_sub_y_l = vmull_lane_s16(vget_low_s16(cb_128), consts, 0); + int32x4_t g_sub_y_h = vmull_lane_s16(vget_high_s16(cb_128), consts, 0); + g_sub_y_l = vmlsl_lane_s16(g_sub_y_l, vget_low_s16(cr_128), consts, 1); + g_sub_y_h = vmlsl_lane_s16(g_sub_y_h, vget_high_s16(cr_128), consts, 1); + /* Descale G components: shift right 15, round, and narrow to 16-bit. 
*/ + int16x8_t g_sub_y = vcombine_s16(vrshrn_n_s32(g_sub_y_l, 15), + vrshrn_n_s32(g_sub_y_h, 15)); + /* Compute R-Y: 1.40200 * (Cr - 128) */ + int16x8_t r_sub_y = vqrdmulhq_lane_s16(vshlq_n_s16(cr_128, 1), consts, 2); + /* Compute B-Y: 1.77200 * (Cb - 128) */ + int16x8_t b_sub_y = vqrdmulhq_lane_s16(vshlq_n_s16(cb_128, 1), consts, 3); + /* For each row, add the chroma-derived values (G-Y, R-Y, and B-Y) to both + * the "even" and "odd" Y component values. This effectively upsamples the + * chroma components both horizontally and vertically. + */ + int16x8_t g0_even = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y), + y0.val[0])); + int16x8_t r0_even = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y), + y0.val[0])); + int16x8_t b0_even = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y), + y0.val[0])); + int16x8_t g0_odd = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y), + y0.val[1])); + int16x8_t r0_odd = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y), + y0.val[1])); + int16x8_t b0_odd = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y), + y0.val[1])); + int16x8_t g1_even = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y), + y1.val[0])); + int16x8_t r1_even = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y), + y1.val[0])); + int16x8_t b1_even = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y), + y1.val[0])); + int16x8_t g1_odd = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y), + y1.val[1])); + int16x8_t r1_odd = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y), + y1.val[1])); + int16x8_t b1_odd = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y), + y1.val[1])); + /* Convert each component to unsigned and narrow, clamping to [0-255]. + * Re-interleave the "even" and "odd" component values. 
+ */ + uint8x8x2_t r0 = vzip_u8(vqmovun_s16(r0_even), vqmovun_s16(r0_odd)); + uint8x8x2_t r1 = vzip_u8(vqmovun_s16(r1_even), vqmovun_s16(r1_odd)); + uint8x8x2_t g0 = vzip_u8(vqmovun_s16(g0_even), vqmovun_s16(g0_odd)); + uint8x8x2_t g1 = vzip_u8(vqmovun_s16(g1_even), vqmovun_s16(g1_odd)); + uint8x8x2_t b0 = vzip_u8(vqmovun_s16(b0_even), vqmovun_s16(b0_odd)); + uint8x8x2_t b1 = vzip_u8(vqmovun_s16(b1_even), vqmovun_s16(b1_odd)); + +#ifdef RGB_ALPHA + uint8x16x4_t rgba0, rgba1; + rgba0.val[RGB_RED] = vcombine_u8(r0.val[0], r0.val[1]); + rgba1.val[RGB_RED] = vcombine_u8(r1.val[0], r1.val[1]); + rgba0.val[RGB_GREEN] = vcombine_u8(g0.val[0], g0.val[1]); + rgba1.val[RGB_GREEN] = vcombine_u8(g1.val[0], g1.val[1]); + rgba0.val[RGB_BLUE] = vcombine_u8(b0.val[0], b0.val[1]); + rgba1.val[RGB_BLUE] = vcombine_u8(b1.val[0], b1.val[1]); + /* Set alpha channel to opaque (0xFF). */ + rgba0.val[RGB_ALPHA] = vdupq_n_u8(0xFF); + rgba1.val[RGB_ALPHA] = vdupq_n_u8(0xFF); + /* Store RGBA pixel data to memory. */ + vst4q_u8(outptr0, rgba0); + vst4q_u8(outptr1, rgba1); +#else + uint8x16x3_t rgb0, rgb1; + rgb0.val[RGB_RED] = vcombine_u8(r0.val[0], r0.val[1]); + rgb1.val[RGB_RED] = vcombine_u8(r1.val[0], r1.val[1]); + rgb0.val[RGB_GREEN] = vcombine_u8(g0.val[0], g0.val[1]); + rgb1.val[RGB_GREEN] = vcombine_u8(g1.val[0], g1.val[1]); + rgb0.val[RGB_BLUE] = vcombine_u8(b0.val[0], b0.val[1]); + rgb1.val[RGB_BLUE] = vcombine_u8(b1.val[0], b1.val[1]); + /* Store RGB pixel data to memory. */ + vst3q_u8(outptr0, rgb0); + vst3q_u8(outptr1, rgb1); +#endif + + /* Increment pointers. */ + inptr0_0 += 16; + inptr0_1 += 16; + inptr1 += 8; + inptr2 += 8; + outptr0 += (RGB_PIXELSIZE * 16); + outptr1 += (RGB_PIXELSIZE * 16); + } + + if (cols_remaining > 0) { + /* For each row, de-interleave Y component values into two separate + * vectors, one containing the component values with even-numbered indices + * and one containing the component values with odd-numbered indices. 
+ */ + uint8x8x2_t y0 = vld2_u8(inptr0_0); + uint8x8x2_t y1 = vld2_u8(inptr0_1); + uint8x8_t cb = vld1_u8(inptr1); + uint8x8_t cr = vld1_u8(inptr2); + /* Subtract 128 from Cb and Cr. */ + int16x8_t cr_128 = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), cr)); + int16x8_t cb_128 = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), cb)); + /* Compute G-Y: - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128) */ + int32x4_t g_sub_y_l = vmull_lane_s16(vget_low_s16(cb_128), consts, 0); + int32x4_t g_sub_y_h = vmull_lane_s16(vget_high_s16(cb_128), consts, 0); + g_sub_y_l = vmlsl_lane_s16(g_sub_y_l, vget_low_s16(cr_128), consts, 1); + g_sub_y_h = vmlsl_lane_s16(g_sub_y_h, vget_high_s16(cr_128), consts, 1); + /* Descale G components: shift right 15, round, and narrow to 16-bit. */ + int16x8_t g_sub_y = vcombine_s16(vrshrn_n_s32(g_sub_y_l, 15), + vrshrn_n_s32(g_sub_y_h, 15)); + /* Compute R-Y: 1.40200 * (Cr - 128) */ + int16x8_t r_sub_y = vqrdmulhq_lane_s16(vshlq_n_s16(cr_128, 1), consts, 2); + /* Compute B-Y: 1.77200 * (Cb - 128) */ + int16x8_t b_sub_y = vqrdmulhq_lane_s16(vshlq_n_s16(cb_128, 1), consts, 3); + /* For each row, add the chroma-derived values (G-Y, R-Y, and B-Y) to both + * the "even" and "odd" Y component values. This effectively upsamples the + * chroma components both horizontally and vertically. 
+ */ + int16x8_t g0_even = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y), + y0.val[0])); + int16x8_t r0_even = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y), + y0.val[0])); + int16x8_t b0_even = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y), + y0.val[0])); + int16x8_t g0_odd = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y), + y0.val[1])); + int16x8_t r0_odd = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y), + y0.val[1])); + int16x8_t b0_odd = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y), + y0.val[1])); + int16x8_t g1_even = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y), + y1.val[0])); + int16x8_t r1_even = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y), + y1.val[0])); + int16x8_t b1_even = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y), + y1.val[0])); + int16x8_t g1_odd = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y), + y1.val[1])); + int16x8_t r1_odd = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y), + y1.val[1])); + int16x8_t b1_odd = + vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y), + y1.val[1])); + /* Convert each component to unsigned and narrow, clamping to [0-255]. + * Re-interleave the "even" and "odd" component values. 
+ */ + uint8x8x2_t r0 = vzip_u8(vqmovun_s16(r0_even), vqmovun_s16(r0_odd)); + uint8x8x2_t r1 = vzip_u8(vqmovun_s16(r1_even), vqmovun_s16(r1_odd)); + uint8x8x2_t g0 = vzip_u8(vqmovun_s16(g0_even), vqmovun_s16(g0_odd)); + uint8x8x2_t g1 = vzip_u8(vqmovun_s16(g1_even), vqmovun_s16(g1_odd)); + uint8x8x2_t b0 = vzip_u8(vqmovun_s16(b0_even), vqmovun_s16(b0_odd)); + uint8x8x2_t b1 = vzip_u8(vqmovun_s16(b1_even), vqmovun_s16(b1_odd)); + +#ifdef RGB_ALPHA + uint8x8x4_t rgba0_h, rgba1_h; + rgba0_h.val[RGB_RED] = r0.val[1]; + rgba1_h.val[RGB_RED] = r1.val[1]; + rgba0_h.val[RGB_GREEN] = g0.val[1]; + rgba1_h.val[RGB_GREEN] = g1.val[1]; + rgba0_h.val[RGB_BLUE] = b0.val[1]; + rgba1_h.val[RGB_BLUE] = b1.val[1]; + /* Set alpha channel to opaque (0xFF). */ + rgba0_h.val[RGB_ALPHA] = vdup_n_u8(0xFF); + rgba1_h.val[RGB_ALPHA] = vdup_n_u8(0xFF); + + uint8x8x4_t rgba0_l, rgba1_l; + rgba0_l.val[RGB_RED] = r0.val[0]; + rgba1_l.val[RGB_RED] = r1.val[0]; + rgba0_l.val[RGB_GREEN] = g0.val[0]; + rgba1_l.val[RGB_GREEN] = g1.val[0]; + rgba0_l.val[RGB_BLUE] = b0.val[0]; + rgba1_l.val[RGB_BLUE] = b1.val[0]; + /* Set alpha channel to opaque (0xFF). */ + rgba0_l.val[RGB_ALPHA] = vdup_n_u8(0xFF); + rgba1_l.val[RGB_ALPHA] = vdup_n_u8(0xFF); + /* Store RGBA pixel data to memory. 
*/ + switch (cols_remaining) { + case 15: + vst4_lane_u8(outptr0 + 14 * RGB_PIXELSIZE, rgba0_h, 6); + vst4_lane_u8(outptr1 + 14 * RGB_PIXELSIZE, rgba1_h, 6); + FALLTHROUGH /*FALLTHROUGH*/ + case 14: + vst4_lane_u8(outptr0 + 13 * RGB_PIXELSIZE, rgba0_h, 5); + vst4_lane_u8(outptr1 + 13 * RGB_PIXELSIZE, rgba1_h, 5); + FALLTHROUGH /*FALLTHROUGH*/ + case 13: + vst4_lane_u8(outptr0 + 12 * RGB_PIXELSIZE, rgba0_h, 4); + vst4_lane_u8(outptr1 + 12 * RGB_PIXELSIZE, rgba1_h, 4); + FALLTHROUGH /*FALLTHROUGH*/ + case 12: + vst4_lane_u8(outptr0 + 11 * RGB_PIXELSIZE, rgba0_h, 3); + vst4_lane_u8(outptr1 + 11 * RGB_PIXELSIZE, rgba1_h, 3); + FALLTHROUGH /*FALLTHROUGH*/ + case 11: + vst4_lane_u8(outptr0 + 10 * RGB_PIXELSIZE, rgba0_h, 2); + vst4_lane_u8(outptr1 + 10 * RGB_PIXELSIZE, rgba1_h, 2); + FALLTHROUGH /*FALLTHROUGH*/ + case 10: + vst4_lane_u8(outptr0 + 9 * RGB_PIXELSIZE, rgba0_h, 1); + vst4_lane_u8(outptr1 + 9 * RGB_PIXELSIZE, rgba1_h, 1); + FALLTHROUGH /*FALLTHROUGH*/ + case 9: + vst4_lane_u8(outptr0 + 8 * RGB_PIXELSIZE, rgba0_h, 0); + vst4_lane_u8(outptr1 + 8 * RGB_PIXELSIZE, rgba1_h, 0); + FALLTHROUGH /*FALLTHROUGH*/ + case 8: + vst4_u8(outptr0, rgba0_l); + vst4_u8(outptr1, rgba1_l); + break; + case 7: + vst4_lane_u8(outptr0 + 6 * RGB_PIXELSIZE, rgba0_l, 6); + vst4_lane_u8(outptr1 + 6 * RGB_PIXELSIZE, rgba1_l, 6); + FALLTHROUGH /*FALLTHROUGH*/ + case 6: + vst4_lane_u8(outptr0 + 5 * RGB_PIXELSIZE, rgba0_l, 5); + vst4_lane_u8(outptr1 + 5 * RGB_PIXELSIZE, rgba1_l, 5); + FALLTHROUGH /*FALLTHROUGH*/ + case 5: + vst4_lane_u8(outptr0 + 4 * RGB_PIXELSIZE, rgba0_l, 4); + vst4_lane_u8(outptr1 + 4 * RGB_PIXELSIZE, rgba1_l, 4); + FALLTHROUGH /*FALLTHROUGH*/ + case 4: + vst4_lane_u8(outptr0 + 3 * RGB_PIXELSIZE, rgba0_l, 3); + vst4_lane_u8(outptr1 + 3 * RGB_PIXELSIZE, rgba1_l, 3); + FALLTHROUGH /*FALLTHROUGH*/ + case 3: + vst4_lane_u8(outptr0 + 2 * RGB_PIXELSIZE, rgba0_l, 2); + vst4_lane_u8(outptr1 + 2 * RGB_PIXELSIZE, rgba1_l, 2); + FALLTHROUGH /*FALLTHROUGH*/ + case 2: + 
vst4_lane_u8(outptr0 + 1 * RGB_PIXELSIZE, rgba0_l, 1); + vst4_lane_u8(outptr1 + 1 * RGB_PIXELSIZE, rgba1_l, 1); + FALLTHROUGH /*FALLTHROUGH*/ + case 1: + vst4_lane_u8(outptr0, rgba0_l, 0); + vst4_lane_u8(outptr1, rgba1_l, 0); + FALLTHROUGH /*FALLTHROUGH*/ + default: + break; + } +#else + uint8x8x3_t rgb0_h, rgb1_h; + rgb0_h.val[RGB_RED] = r0.val[1]; + rgb1_h.val[RGB_RED] = r1.val[1]; + rgb0_h.val[RGB_GREEN] = g0.val[1]; + rgb1_h.val[RGB_GREEN] = g1.val[1]; + rgb0_h.val[RGB_BLUE] = b0.val[1]; + rgb1_h.val[RGB_BLUE] = b1.val[1]; + + uint8x8x3_t rgb0_l, rgb1_l; + rgb0_l.val[RGB_RED] = r0.val[0]; + rgb1_l.val[RGB_RED] = r1.val[0]; + rgb0_l.val[RGB_GREEN] = g0.val[0]; + rgb1_l.val[RGB_GREEN] = g1.val[0]; + rgb0_l.val[RGB_BLUE] = b0.val[0]; + rgb1_l.val[RGB_BLUE] = b1.val[0]; + /* Store RGB pixel data to memory. */ + switch (cols_remaining) { + case 15: + vst3_lane_u8(outptr0 + 14 * RGB_PIXELSIZE, rgb0_h, 6); + vst3_lane_u8(outptr1 + 14 * RGB_PIXELSIZE, rgb1_h, 6); + FALLTHROUGH /*FALLTHROUGH*/ + case 14: + vst3_lane_u8(outptr0 + 13 * RGB_PIXELSIZE, rgb0_h, 5); + vst3_lane_u8(outptr1 + 13 * RGB_PIXELSIZE, rgb1_h, 5); + FALLTHROUGH /*FALLTHROUGH*/ + case 13: + vst3_lane_u8(outptr0 + 12 * RGB_PIXELSIZE, rgb0_h, 4); + vst3_lane_u8(outptr1 + 12 * RGB_PIXELSIZE, rgb1_h, 4); + FALLTHROUGH /*FALLTHROUGH*/ + case 12: + vst3_lane_u8(outptr0 + 11 * RGB_PIXELSIZE, rgb0_h, 3); + vst3_lane_u8(outptr1 + 11 * RGB_PIXELSIZE, rgb1_h, 3); + FALLTHROUGH /*FALLTHROUGH*/ + case 11: + vst3_lane_u8(outptr0 + 10 * RGB_PIXELSIZE, rgb0_h, 2); + vst3_lane_u8(outptr1 + 10 * RGB_PIXELSIZE, rgb1_h, 2); + FALLTHROUGH /*FALLTHROUGH*/ + case 10: + vst3_lane_u8(outptr0 + 9 * RGB_PIXELSIZE, rgb0_h, 1); + vst3_lane_u8(outptr1 + 9 * RGB_PIXELSIZE, rgb1_h, 1); + FALLTHROUGH /*FALLTHROUGH*/ + case 9: + vst3_lane_u8(outptr0 + 8 * RGB_PIXELSIZE, rgb0_h, 0); + vst3_lane_u8(outptr1 + 8 * RGB_PIXELSIZE, rgb1_h, 0); + FALLTHROUGH /*FALLTHROUGH*/ + case 8: + vst3_u8(outptr0, rgb0_l); + vst3_u8(outptr1, rgb1_l); + 
break; + case 7: + vst3_lane_u8(outptr0 + 6 * RGB_PIXELSIZE, rgb0_l, 6); + vst3_lane_u8(outptr1 + 6 * RGB_PIXELSIZE, rgb1_l, 6); + FALLTHROUGH /*FALLTHROUGH*/ + case 6: + vst3_lane_u8(outptr0 + 5 * RGB_PIXELSIZE, rgb0_l, 5); + vst3_lane_u8(outptr1 + 5 * RGB_PIXELSIZE, rgb1_l, 5); + FALLTHROUGH /*FALLTHROUGH*/ + case 5: + vst3_lane_u8(outptr0 + 4 * RGB_PIXELSIZE, rgb0_l, 4); + vst3_lane_u8(outptr1 + 4 * RGB_PIXELSIZE, rgb1_l, 4); + FALLTHROUGH /*FALLTHROUGH*/ + case 4: + vst3_lane_u8(outptr0 + 3 * RGB_PIXELSIZE, rgb0_l, 3); + vst3_lane_u8(outptr1 + 3 * RGB_PIXELSIZE, rgb1_l, 3); + FALLTHROUGH /*FALLTHROUGH*/ + case 3: + vst3_lane_u8(outptr0 + 2 * RGB_PIXELSIZE, rgb0_l, 2); + vst3_lane_u8(outptr1 + 2 * RGB_PIXELSIZE, rgb1_l, 2); + FALLTHROUGH /*FALLTHROUGH*/ + case 2: + vst3_lane_u8(outptr0 + 1 * RGB_PIXELSIZE, rgb0_l, 1); + vst3_lane_u8(outptr1 + 1 * RGB_PIXELSIZE, rgb1_l, 1); + FALLTHROUGH /*FALLTHROUGH*/ + case 1: + vst3_lane_u8(outptr0, rgb0_l, 0); + vst3_lane_u8(outptr1, rgb1_l, 0); + FALLTHROUGH /*FALLTHROUGH*/ + default: + break; + } +#endif + } +} diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jdsample-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jdsample-neon.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jdsample-neon.c 1970-01-01 01:00:00.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jdsample-neon.c 2021-11-20 03:41:33.400600418 +0000 @@ -0,0 +1,569 @@ +/* + * jdsample-neon.c - upsampling (Arm Neon) + * + * Copyright (C) 2020, Arm Limited. All Rights Reserved. + * Copyright (C) 2020, D. R. Commander. All Rights Reserved. + * + * This software is provided 'as-is', without any express or implied + * warranty. In no event will the authors be held liable for any damages + * arising from the use of this software. 
+ * + * Permission is granted to anyone to use this software for any purpose, + * including commercial applications, and to alter it and redistribute it + * freely, subject to the following restrictions: + * + * 1. The origin of this software must not be misrepresented; you must not + * claim that you wrote the original software. If you use this software + * in a product, an acknowledgment in the product documentation would be + * appreciated but is not required. + * 2. Altered source versions must be plainly marked as such, and must not be + * misrepresented as being the original software. + * 3. This notice may not be removed or altered from any source distribution. + */ + +#define JPEG_INTERNALS +#include "../../jinclude.h" +#include "../../jpeglib.h" +#include "../../jsimd.h" +#include "../../jdct.h" +#include "../../jsimddct.h" +#include "../jsimd.h" + +#include <arm_neon.h> + + +/* The diagram below shows a row of samples produced by h2v1 downsampling. + * + * s0 s1 s2 + * +---------+---------+---------+ + * | | | | + * | p0 p1 | p2 p3 | p4 p5 | + * | | | | + * +---------+---------+---------+ + * + * Samples s0-s2 were created by averaging the original pixel component values + * centered at positions p0-p5 above. To approximate those original pixel + * component values, we proportionally blend the adjacent samples in each row. + * + * An upsampled pixel component value is computed by blending the sample + * containing the pixel center with the nearest neighboring sample, in the + * ratio 3:1.
For example: + * p1(upsampled) = 3/4 * s0 + 1/4 * s1 + * p2(upsampled) = 3/4 * s1 + 1/4 * s0 + * When computing the first and last pixel component values in the row, there + * is no adjacent sample to blend, so: + * p0(upsampled) = s0 + * p5(upsampled) = s2 + */ + +void jsimd_h2v1_fancy_upsample_neon(int max_v_samp_factor, + JDIMENSION downsampled_width, + JSAMPARRAY input_data, + JSAMPARRAY *output_data_ptr) +{ + JSAMPARRAY output_data = *output_data_ptr; + JSAMPROW inptr, outptr; + int inrow; + unsigned colctr; + /* Set up constants. */ + const uint16x8_t one_u16 = vdupq_n_u16(1); + const uint8x8_t three_u8 = vdup_n_u8(3); + + for (inrow = 0; inrow < max_v_samp_factor; inrow++) { + inptr = input_data[inrow]; + outptr = output_data[inrow]; + /* First pixel component value in this row of the original image */ + *outptr = (JSAMPLE)GETJSAMPLE(*inptr); + + /* 3/4 * containing sample + 1/4 * nearest neighboring sample + * For p1: containing sample = s0, nearest neighboring sample = s1 + * For p2: containing sample = s1, nearest neighboring sample = s0 + */ + uint8x16_t s0 = vld1q_u8(inptr); + uint8x16_t s1 = vld1q_u8(inptr + 1); + /* Multiplication makes vectors twice as wide. '_l' and '_h' suffixes + * denote low half and high half respectively. + */ + uint16x8_t s1_add_3s0_l = + vmlal_u8(vmovl_u8(vget_low_u8(s1)), vget_low_u8(s0), three_u8); + uint16x8_t s1_add_3s0_h = + vmlal_u8(vmovl_u8(vget_high_u8(s1)), vget_high_u8(s0), three_u8); + uint16x8_t s0_add_3s1_l = + vmlal_u8(vmovl_u8(vget_low_u8(s0)), vget_low_u8(s1), three_u8); + uint16x8_t s0_add_3s1_h = + vmlal_u8(vmovl_u8(vget_high_u8(s0)), vget_high_u8(s1), three_u8); + /* Add ordered dithering bias to odd pixel values. */ + s0_add_3s1_l = vaddq_u16(s0_add_3s1_l, one_u16); + s0_add_3s1_h = vaddq_u16(s0_add_3s1_h, one_u16); + + /* The offset is initially 1, because the first pixel component has already + * been stored. 
However, in subsequent iterations of the SIMD loop, this + * offset is (2 * colctr - 1) to stay within the bounds of the sample + * buffers without having to resort to a slow scalar tail case for the last + * (downsampled_width % 16) samples. See "Creation of 2-D sample arrays" + * in jmemmgr.c for more details. + */ + unsigned outptr_offset = 1; + uint8x16x2_t output_pixels; + + /* We use software pipelining to maximise performance. The code indented + * an extra two spaces begins the next iteration of the loop. + */ + for (colctr = 16; colctr < downsampled_width; colctr += 16) { + + s0 = vld1q_u8(inptr + colctr - 1); + s1 = vld1q_u8(inptr + colctr); + + /* Right-shift by 2 (divide by 4), narrow to 8-bit, and combine. */ + output_pixels.val[0] = vcombine_u8(vrshrn_n_u16(s1_add_3s0_l, 2), + vrshrn_n_u16(s1_add_3s0_h, 2)); + output_pixels.val[1] = vcombine_u8(vshrn_n_u16(s0_add_3s1_l, 2), + vshrn_n_u16(s0_add_3s1_h, 2)); + + /* Multiplication makes vectors twice as wide. '_l' and '_h' suffixes + * denote low half and high half respectively. + */ + s1_add_3s0_l = + vmlal_u8(vmovl_u8(vget_low_u8(s1)), vget_low_u8(s0), three_u8); + s1_add_3s0_h = + vmlal_u8(vmovl_u8(vget_high_u8(s1)), vget_high_u8(s0), three_u8); + s0_add_3s1_l = + vmlal_u8(vmovl_u8(vget_low_u8(s0)), vget_low_u8(s1), three_u8); + s0_add_3s1_h = + vmlal_u8(vmovl_u8(vget_high_u8(s0)), vget_high_u8(s1), three_u8); + /* Add ordered dithering bias to odd pixel values. */ + s0_add_3s1_l = vaddq_u16(s0_add_3s1_l, one_u16); + s0_add_3s1_h = vaddq_u16(s0_add_3s1_h, one_u16); + + /* Store pixel component values to memory. */ + vst2q_u8(outptr + outptr_offset, output_pixels); + outptr_offset = 2 * colctr - 1; + } + + /* Complete the last iteration of the loop. */ + + /* Right-shift by 2 (divide by 4), narrow to 8-bit, and combine. 
*/ + output_pixels.val[0] = vcombine_u8(vrshrn_n_u16(s1_add_3s0_l, 2), + vrshrn_n_u16(s1_add_3s0_h, 2)); + output_pixels.val[1] = vcombine_u8(vshrn_n_u16(s0_add_3s1_l, 2), + vshrn_n_u16(s0_add_3s1_h, 2)); + /* Store pixel component values to memory. */ + vst2q_u8(outptr + outptr_offset, output_pixels); + + /* Last pixel component value in this row of the original image */ + outptr[2 * downsampled_width - 1] = + GETJSAMPLE(inptr[downsampled_width - 1]); + } +} + + +/* The diagram below shows an array of samples produced by h2v2 downsampling. + * + * s0 s1 s2 + * +---------+---------+---------+ + * | p0 p1 | p2 p3 | p4 p5 | + * sA | | | | + * | p6 p7 | p8 p9 | p10 p11| + * +---------+---------+---------+ + * | p12 p13| p14 p15| p16 p17| + * sB | | | | + * | p18 p19| p20 p21| p22 p23| + * +---------+---------+---------+ + * | p24 p25| p26 p27| p28 p29| + * sC | | | | + * | p30 p31| p32 p33| p34 p35| + * +---------+---------+---------+ + * + * Samples s0A-s2C were created by averaging the original pixel component + * values centered at positions p0-p35 above. To approximate one of those + * original pixel component values, we proportionally blend the sample + * containing the pixel center with the nearest neighboring samples in each + * row, column, and diagonal. + * + * An upsampled pixel component value is computed by first blending the sample + * containing the pixel center with the nearest neighboring samples in the + * same column, in the ratio 3:1, and then blending each column sum with the + * nearest neighboring column sum, in the ratio 3:1. 
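The 3:1 blend described for h2v1 fancy upsampling is easier to audit in scalar form. A sketch of one row, following the classic scalar jdsample.c pattern (the +1/+2 terms are the ordered-dither bias; `n` is the downsampled width and must be at least 2):

```c
#include <stdint.h>

/* Scalar model of h2v1 fancy upsampling: each downsampled sample s[i]
 * expands to two pixels, each blended 3:1 with the nearest neighbour. */
static void h2v1_fancy_row(const uint8_t *s, int n, uint8_t *out)
{
  out[0] = s[0];                                    /* p0 = s0: no left neighbour */
  out[1] = (uint8_t)((3 * s[0] + s[1] + 2) >> 2);
  for (int i = 1; i < n - 1; i++) {
    out[2 * i]     = (uint8_t)((3 * s[i] + s[i - 1] + 1) >> 2);
    out[2 * i + 1] = (uint8_t)((3 * s[i] + s[i + 1] + 2) >> 2);
  }
  out[2 * n - 2] = (uint8_t)((3 * s[n - 1] + s[n - 2] + 1) >> 2);
  out[2 * n - 1] = s[n - 1];                        /* last pixel: no right neighbour */
}
```

On a constant row this reproduces the input exactly, matching the "p0(upsampled) = s0" edge cases in the comment above.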
For example: + * p14(upsampled) = 3/4 * (3/4 * s1B + 1/4 * s1A) + + * 1/4 * (3/4 * s0B + 1/4 * s0A) + * = 9/16 * s1B + 3/16 * s1A + 3/16 * s0B + 1/16 * s0A + * When computing the first and last pixel component values in the row, there + * is no horizontally adjacent sample to blend, so: + * p12(upsampled) = 3/4 * s0B + 1/4 * s0A + * p23(upsampled) = 3/4 * s2B + 1/4 * s2C + * When computing the first and last pixel component values in the column, + * there is no vertically adjacent sample to blend, so: + * p2(upsampled) = 3/4 * s1A + 1/4 * s0A + * p33(upsampled) = 3/4 * s1C + 1/4 * s2C + * When computing the corner pixel component values, there is no adjacent + * sample to blend, so: + * p0(upsampled) = s0A + * p35(upsampled) = s2C + */ + +void jsimd_h2v2_fancy_upsample_neon(int max_v_samp_factor, + JDIMENSION downsampled_width, + JSAMPARRAY input_data, + JSAMPARRAY *output_data_ptr) +{ + JSAMPARRAY output_data = *output_data_ptr; + JSAMPROW inptr0, inptr1, inptr2, outptr0, outptr1; + int inrow, outrow; + unsigned colctr; + /* Set up constants. */ + const uint16x8_t seven_u16 = vdupq_n_u16(7); + const uint8x8_t three_u8 = vdup_n_u8(3); + const uint16x8_t three_u16 = vdupq_n_u16(3); + + inrow = outrow = 0; + while (outrow < max_v_samp_factor) { + inptr0 = input_data[inrow - 1]; + inptr1 = input_data[inrow]; + inptr2 = input_data[inrow + 1]; + /* Suffixes 0 and 1 denote the upper and lower rows of output pixels, + * respectively. + */ + outptr0 = output_data[outrow++]; + outptr1 = output_data[outrow++]; + + /* First pixel component value in this row of the original image */ + int s0colsum0 = GETJSAMPLE(*inptr1) * 3 + GETJSAMPLE(*inptr0); + *outptr0 = (JSAMPLE)((s0colsum0 * 4 + 8) >> 4); + int s0colsum1 = GETJSAMPLE(*inptr1) * 3 + GETJSAMPLE(*inptr2); + *outptr1 = (JSAMPLE)((s0colsum1 * 4 + 8) >> 4); + + /* Step 1: Blend samples vertically in columns s0 and s1. 
+ * Leave the divide by 4 until the end, when it can be done for both + * dimensions at once, right-shifting by 4. + */ + + /* Load and compute s0colsum0 and s0colsum1. */ + uint8x16_t s0A = vld1q_u8(inptr0); + uint8x16_t s0B = vld1q_u8(inptr1); + uint8x16_t s0C = vld1q_u8(inptr2); + /* Multiplication makes vectors twice as wide. '_l' and '_h' suffixes + * denote low half and high half respectively. + */ + uint16x8_t s0colsum0_l = vmlal_u8(vmovl_u8(vget_low_u8(s0A)), + vget_low_u8(s0B), three_u8); + uint16x8_t s0colsum0_h = vmlal_u8(vmovl_u8(vget_high_u8(s0A)), + vget_high_u8(s0B), three_u8); + uint16x8_t s0colsum1_l = vmlal_u8(vmovl_u8(vget_low_u8(s0C)), + vget_low_u8(s0B), three_u8); + uint16x8_t s0colsum1_h = vmlal_u8(vmovl_u8(vget_high_u8(s0C)), + vget_high_u8(s0B), three_u8); + /* Load and compute s1colsum0 and s1colsum1. */ + uint8x16_t s1A = vld1q_u8(inptr0 + 1); + uint8x16_t s1B = vld1q_u8(inptr1 + 1); + uint8x16_t s1C = vld1q_u8(inptr2 + 1); + uint16x8_t s1colsum0_l = vmlal_u8(vmovl_u8(vget_low_u8(s1A)), + vget_low_u8(s1B), three_u8); + uint16x8_t s1colsum0_h = vmlal_u8(vmovl_u8(vget_high_u8(s1A)), + vget_high_u8(s1B), three_u8); + uint16x8_t s1colsum1_l = vmlal_u8(vmovl_u8(vget_low_u8(s1C)), + vget_low_u8(s1B), three_u8); + uint16x8_t s1colsum1_h = vmlal_u8(vmovl_u8(vget_high_u8(s1C)), + vget_high_u8(s1B), three_u8); + + /* Step 2: Blend the already-blended columns. 
*/ + + uint16x8_t output0_p1_l = vmlaq_u16(s1colsum0_l, s0colsum0_l, three_u16); + uint16x8_t output0_p1_h = vmlaq_u16(s1colsum0_h, s0colsum0_h, three_u16); + uint16x8_t output0_p2_l = vmlaq_u16(s0colsum0_l, s1colsum0_l, three_u16); + uint16x8_t output0_p2_h = vmlaq_u16(s0colsum0_h, s1colsum0_h, three_u16); + uint16x8_t output1_p1_l = vmlaq_u16(s1colsum1_l, s0colsum1_l, three_u16); + uint16x8_t output1_p1_h = vmlaq_u16(s1colsum1_h, s0colsum1_h, three_u16); + uint16x8_t output1_p2_l = vmlaq_u16(s0colsum1_l, s1colsum1_l, three_u16); + uint16x8_t output1_p2_h = vmlaq_u16(s0colsum1_h, s1colsum1_h, three_u16); + /* Add ordered dithering bias to odd pixel values. */ + output0_p1_l = vaddq_u16(output0_p1_l, seven_u16); + output0_p1_h = vaddq_u16(output0_p1_h, seven_u16); + output1_p1_l = vaddq_u16(output1_p1_l, seven_u16); + output1_p1_h = vaddq_u16(output1_p1_h, seven_u16); + /* Right-shift by 4 (divide by 16), narrow to 8-bit, and combine. */ + uint8x16x2_t output_pixels0 = { { + vcombine_u8(vshrn_n_u16(output0_p1_l, 4), vshrn_n_u16(output0_p1_h, 4)), + vcombine_u8(vrshrn_n_u16(output0_p2_l, 4), vrshrn_n_u16(output0_p2_h, 4)) + } }; + uint8x16x2_t output_pixels1 = { { + vcombine_u8(vshrn_n_u16(output1_p1_l, 4), vshrn_n_u16(output1_p1_h, 4)), + vcombine_u8(vrshrn_n_u16(output1_p2_l, 4), vrshrn_n_u16(output1_p2_h, 4)) + } }; + + /* Store pixel component values to memory. + * The minimum size of the output buffer for each row is 64 bytes => no + * need to worry about buffer overflow here. See "Creation of 2-D sample + * arrays" in jmemmgr.c for more details. + */ + vst2q_u8(outptr0 + 1, output_pixels0); + vst2q_u8(outptr1 + 1, output_pixels1); + + /* The first pixel of the image shifted our loads and stores by one byte. 
+ * We have to re-align on a 32-byte boundary at some point before the end + * of the row (we do it now on the 32/33 pixel boundary) to stay within the + * bounds of the sample buffers without having to resort to a slow scalar + * tail case for the last (downsampled_width % 16) samples. See "Creation + * of 2-D sample arrays" in jmemmgr.c for more details. + */ + for (colctr = 16; colctr < downsampled_width; colctr += 16) { + /* Step 1: Blend samples vertically in columns s0 and s1. */ + + /* Load and compute s0colsum0 and s0colsum1. */ + s0A = vld1q_u8(inptr0 + colctr - 1); + s0B = vld1q_u8(inptr1 + colctr - 1); + s0C = vld1q_u8(inptr2 + colctr - 1); + s0colsum0_l = vmlal_u8(vmovl_u8(vget_low_u8(s0A)), vget_low_u8(s0B), + three_u8); + s0colsum0_h = vmlal_u8(vmovl_u8(vget_high_u8(s0A)), vget_high_u8(s0B), + three_u8); + s0colsum1_l = vmlal_u8(vmovl_u8(vget_low_u8(s0C)), vget_low_u8(s0B), + three_u8); + s0colsum1_h = vmlal_u8(vmovl_u8(vget_high_u8(s0C)), vget_high_u8(s0B), + three_u8); + /* Load and compute s1colsum0 and s1colsum1. */ + s1A = vld1q_u8(inptr0 + colctr); + s1B = vld1q_u8(inptr1 + colctr); + s1C = vld1q_u8(inptr2 + colctr); + s1colsum0_l = vmlal_u8(vmovl_u8(vget_low_u8(s1A)), vget_low_u8(s1B), + three_u8); + s1colsum0_h = vmlal_u8(vmovl_u8(vget_high_u8(s1A)), vget_high_u8(s1B), + three_u8); + s1colsum1_l = vmlal_u8(vmovl_u8(vget_low_u8(s1C)), vget_low_u8(s1B), + three_u8); + s1colsum1_h = vmlal_u8(vmovl_u8(vget_high_u8(s1C)), vget_high_u8(s1B), + three_u8); + + /* Step 2: Blend the already-blended columns. 
*/ + + output0_p1_l = vmlaq_u16(s1colsum0_l, s0colsum0_l, three_u16); + output0_p1_h = vmlaq_u16(s1colsum0_h, s0colsum0_h, three_u16); + output0_p2_l = vmlaq_u16(s0colsum0_l, s1colsum0_l, three_u16); + output0_p2_h = vmlaq_u16(s0colsum0_h, s1colsum0_h, three_u16); + output1_p1_l = vmlaq_u16(s1colsum1_l, s0colsum1_l, three_u16); + output1_p1_h = vmlaq_u16(s1colsum1_h, s0colsum1_h, three_u16); + output1_p2_l = vmlaq_u16(s0colsum1_l, s1colsum1_l, three_u16); + output1_p2_h = vmlaq_u16(s0colsum1_h, s1colsum1_h, three_u16); + /* Add ordered dithering bias to odd pixel values. */ + output0_p1_l = vaddq_u16(output0_p1_l, seven_u16); + output0_p1_h = vaddq_u16(output0_p1_h, seven_u16); + output1_p1_l = vaddq_u16(output1_p1_l, seven_u16); + output1_p1_h = vaddq_u16(output1_p1_h, seven_u16); + /* Right-shift by 4 (divide by 16), narrow to 8-bit, and combine. */ + output_pixels0.val[0] = vcombine_u8(vshrn_n_u16(output0_p1_l, 4), + vshrn_n_u16(output0_p1_h, 4)); + output_pixels0.val[1] = vcombine_u8(vrshrn_n_u16(output0_p2_l, 4), + vrshrn_n_u16(output0_p2_h, 4)); + output_pixels1.val[0] = vcombine_u8(vshrn_n_u16(output1_p1_l, 4), + vshrn_n_u16(output1_p1_h, 4)); + output_pixels1.val[1] = vcombine_u8(vrshrn_n_u16(output1_p2_l, 4), + vrshrn_n_u16(output1_p2_h, 4)); + /* Store pixel component values to memory. 
*/ + vst2q_u8(outptr0 + 2 * colctr - 1, output_pixels0); + vst2q_u8(outptr1 + 2 * colctr - 1, output_pixels1); + } + + /* Last pixel component value in this row of the original image */ + int s1colsum0 = GETJSAMPLE(inptr1[downsampled_width - 1]) * 3 + + GETJSAMPLE(inptr0[downsampled_width - 1]); + outptr0[2 * downsampled_width - 1] = (JSAMPLE)((s1colsum0 * 4 + 7) >> 4); + int s1colsum1 = GETJSAMPLE(inptr1[downsampled_width - 1]) * 3 + + GETJSAMPLE(inptr2[downsampled_width - 1]); + outptr1[2 * downsampled_width - 1] = (JSAMPLE)((s1colsum1 * 4 + 7) >> 4); + inrow++; + } +} + + +/* The diagram below shows a column of samples produced by h1v2 downsampling + * (or by losslessly rotating or transposing an h2v1-downsampled image.) + * + * +---------+ + * | p0 | + * sA | | + * | p1 | + * +---------+ + * | p2 | + * sB | | + * | p3 | + * +---------+ + * | p4 | + * sC | | + * | p5 | + * +---------+ + * + * Samples sA-sC were created by averaging the original pixel component values + * centered at positions p0-p5 above. To approximate those original pixel + * component values, we proportionally blend the adjacent samples in each + * column. + * + * An upsampled pixel component value is computed by blending the sample + * containing the pixel center with the nearest neighboring sample, in the + * ratio 3:1. For example: + * p1(upsampled) = 3/4 * sA + 1/4 * sB + * p2(upsampled) = 3/4 * sB + 1/4 * sA + * When computing the first and last pixel component values in the column, + * there is no adjacent sample to blend, so: + * p0(upsampled) = sA + * p5(upsampled) = sC + */ + +void jsimd_h1v2_fancy_upsample_neon(int max_v_samp_factor, + JDIMENSION downsampled_width, + JSAMPARRAY input_data, + JSAMPARRAY *output_data_ptr) +{ + JSAMPARRAY output_data = *output_data_ptr; + JSAMPROW inptr0, inptr1, inptr2, outptr0, outptr1; + int inrow, outrow; + unsigned colctr; + /* Set up constants. 
*/ + const uint16x8_t one_u16 = vdupq_n_u16(1); + const uint8x8_t three_u8 = vdup_n_u8(3); + + inrow = outrow = 0; + while (outrow < max_v_samp_factor) { + inptr0 = input_data[inrow - 1]; + inptr1 = input_data[inrow]; + inptr2 = input_data[inrow + 1]; + /* Suffixes 0 and 1 denote the upper and lower rows of output pixels, + * respectively. + */ + outptr0 = output_data[outrow++]; + outptr1 = output_data[outrow++]; + inrow++; + + /* The size of the input and output buffers is always a multiple of 32 + * bytes => no need to worry about buffer overflow when reading/writing + * memory. See "Creation of 2-D sample arrays" in jmemmgr.c for more + * details. + */ + for (colctr = 0; colctr < downsampled_width; colctr += 16) { + /* Load samples. */ + uint8x16_t sA = vld1q_u8(inptr0 + colctr); + uint8x16_t sB = vld1q_u8(inptr1 + colctr); + uint8x16_t sC = vld1q_u8(inptr2 + colctr); + /* Blend samples vertically. */ + uint16x8_t colsum0_l = vmlal_u8(vmovl_u8(vget_low_u8(sA)), + vget_low_u8(sB), three_u8); + uint16x8_t colsum0_h = vmlal_u8(vmovl_u8(vget_high_u8(sA)), + vget_high_u8(sB), three_u8); + uint16x8_t colsum1_l = vmlal_u8(vmovl_u8(vget_low_u8(sC)), + vget_low_u8(sB), three_u8); + uint16x8_t colsum1_h = vmlal_u8(vmovl_u8(vget_high_u8(sC)), + vget_high_u8(sB), three_u8); + /* Add ordered dithering bias to pixel values in even output rows. */ + colsum0_l = vaddq_u16(colsum0_l, one_u16); + colsum0_h = vaddq_u16(colsum0_h, one_u16); + /* Right-shift by 2 (divide by 4), narrow to 8-bit, and combine. */ + uint8x16_t output_pixels0 = vcombine_u8(vshrn_n_u16(colsum0_l, 2), + vshrn_n_u16(colsum0_h, 2)); + uint8x16_t output_pixels1 = vcombine_u8(vrshrn_n_u16(colsum1_l, 2), + vrshrn_n_u16(colsum1_h, 2)); + /* Store pixel component values to memory. */ + vst1q_u8(outptr0 + colctr, output_pixels0); + vst1q_u8(outptr1 + colctr, output_pixels1); + } + } +} + + +/* The diagram below shows a row of samples produced by h2v1 downsampling. 
+ * + * s0 s1 + * +---------+---------+ + * | | | + * | p0 p1 | p2 p3 | + * | | | + * +---------+---------+ + * + * Samples s0 and s1 were created by averaging the original pixel component + * values centered at positions p0-p3 above. To approximate those original + * pixel component values, we duplicate the samples horizontally: + * p0(upsampled) = p1(upsampled) = s0 + * p2(upsampled) = p3(upsampled) = s1 + */ + +void jsimd_h2v1_upsample_neon(int max_v_samp_factor, JDIMENSION output_width, + JSAMPARRAY input_data, + JSAMPARRAY *output_data_ptr) +{ + JSAMPARRAY output_data = *output_data_ptr; + JSAMPROW inptr, outptr; + int inrow; + unsigned colctr; + + for (inrow = 0; inrow < max_v_samp_factor; inrow++) { + inptr = input_data[inrow]; + outptr = output_data[inrow]; + for (colctr = 0; 2 * colctr < output_width; colctr += 16) { + uint8x16_t samples = vld1q_u8(inptr + colctr); + /* Duplicate the samples. The store operation below interleaves them so + * that adjacent pixel component values take on the same sample value, + * per above. + */ + uint8x16x2_t output_pixels = { { samples, samples } }; + /* Store pixel component values to memory. + * Due to the way sample buffers are allocated, we don't need to worry + * about tail cases when output_width is not a multiple of 32. See + * "Creation of 2-D sample arrays" in jmemmgr.c for details. + */ + vst2q_u8(outptr + 2 * colctr, output_pixels); + } + } +} + + +/* The diagram below shows an array of samples produced by h2v2 downsampling. + * + * s0 s1 + * +---------+---------+ + * | p0 p1 | p2 p3 | + * sA | | | + * | p4 p5 | p6 p7 | + * +---------+---------+ + * | p8 p9 | p10 p11| + * sB | | | + * | p12 p13| p14 p15| + * +---------+---------+ + * + * Samples s0A-s1B were created by averaging the original pixel component + * values centered at positions p0-p15 above. 
To approximate those original + * pixel component values, we duplicate the samples both horizontally and + * vertically: + * p0(upsampled) = p1(upsampled) = p4(upsampled) = p5(upsampled) = s0A + * p2(upsampled) = p3(upsampled) = p6(upsampled) = p7(upsampled) = s1A + * p8(upsampled) = p9(upsampled) = p12(upsampled) = p13(upsampled) = s0B + * p10(upsampled) = p11(upsampled) = p14(upsampled) = p15(upsampled) = s1B + */ + +void jsimd_h2v2_upsample_neon(int max_v_samp_factor, JDIMENSION output_width, + JSAMPARRAY input_data, + JSAMPARRAY *output_data_ptr) +{ + JSAMPARRAY output_data = *output_data_ptr; + JSAMPROW inptr, outptr0, outptr1; + int inrow, outrow; + unsigned colctr; + + for (inrow = 0, outrow = 0; outrow < max_v_samp_factor; inrow++) { + inptr = input_data[inrow]; + outptr0 = output_data[outrow++]; + outptr1 = output_data[outrow++]; + + for (colctr = 0; 2 * colctr < output_width; colctr += 16) { + uint8x16_t samples = vld1q_u8(inptr + colctr); + /* Duplicate the samples. The store operation below interleaves them so + * that adjacent pixel component values take on the same sample value, + * per above. + */ + uint8x16x2_t output_pixels = { { samples, samples } }; + /* Store pixel component values for both output rows to memory. + * Due to the way sample buffers are allocated, we don't need to worry + * about tail cases when output_width is not a multiple of 32. See + * "Creation of 2-D sample arrays" in jmemmgr.c for details. 
+ */ + vst2q_u8(outptr0 + 2 * colctr, output_pixels); + vst2q_u8(outptr1 + 2 * colctr, output_pixels); + } + } +} diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jfdctfst-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jfdctfst-neon.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jfdctfst-neon.c 1970-01-01 01:00:00.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jfdctfst-neon.c 2021-11-20 03:41:33.400600418 +0000 @@ -0,0 +1,214 @@ +/* + * jfdctfst-neon.c - fast integer FDCT (Arm Neon) + * + * Copyright (C) 2020, Arm Limited. All Rights Reserved. + * + * This software is provided 'as-is', without any express or implied + * warranty. In no event will the authors be held liable for any damages + * arising from the use of this software. + * + * Permission is granted to anyone to use this software for any purpose, + * including commercial applications, and to alter it and redistribute it + * freely, subject to the following restrictions: + * + * 1. The origin of this software must not be misrepresented; you must not + * claim that you wrote the original software. If you use this software + * in a product, an acknowledgment in the product documentation would be + * appreciated but is not required. + * 2. Altered source versions must be plainly marked as such, and must not be + * misrepresented as being the original software. + * 3. This notice may not be removed or altered from any source distribution. + */ + +#define JPEG_INTERNALS +#include "../../jinclude.h" +#include "../../jpeglib.h" +#include "../../jsimd.h" +#include "../../jdct.h" +#include "../../jsimddct.h" +#include "../jsimd.h" +#include "align.h" + +#include <arm_neon.h> + + +/* jsimd_fdct_ifast_neon() performs a fast, not so accurate forward DCT + * (Discrete Cosine Transform) on one block of samples.
It uses the same + * calculations and produces exactly the same output as IJG's original + * jpeg_fdct_ifast() function, which can be found in jfdctfst.c. + * + * Scaled integer constants are used to avoid floating-point arithmetic: + * 0.382683433 = 12544 * 2^-15 + * 0.541196100 = 17792 * 2^-15 + * 0.707106781 = 23168 * 2^-15 + * 0.306562965 = 9984 * 2^-15 + * + * See jfdctfst.c for further details of the DCT algorithm. Where possible, + * the variable names and comments here in jsimd_fdct_ifast_neon() match up + * with those in jpeg_fdct_ifast(). + */ + +#define F_0_382 12544 +#define F_0_541 17792 +#define F_0_707 23168 +#define F_0_306 9984 + + +ALIGN(16) static const int16_t jsimd_fdct_ifast_neon_consts[] = { + F_0_382, F_0_541, F_0_707, F_0_306 +}; + +void jsimd_fdct_ifast_neon(DCTELEM *data) +{ + /* Load an 8x8 block of samples into Neon registers. De-interleaving loads + * are used, followed by vuzp to transpose the block such that we have a + * column of samples per vector - allowing all rows to be processed at once. + */ + int16x8x4_t data1 = vld4q_s16(data); + int16x8x4_t data2 = vld4q_s16(data + 4 * DCTSIZE); + + int16x8x2_t cols_04 = vuzpq_s16(data1.val[0], data2.val[0]); + int16x8x2_t cols_15 = vuzpq_s16(data1.val[1], data2.val[1]); + int16x8x2_t cols_26 = vuzpq_s16(data1.val[2], data2.val[2]); + int16x8x2_t cols_37 = vuzpq_s16(data1.val[3], data2.val[3]); + + int16x8_t col0 = cols_04.val[0]; + int16x8_t col1 = cols_15.val[0]; + int16x8_t col2 = cols_26.val[0]; + int16x8_t col3 = cols_37.val[0]; + int16x8_t col4 = cols_04.val[1]; + int16x8_t col5 = cols_15.val[1]; + int16x8_t col6 = cols_26.val[1]; + int16x8_t col7 = cols_37.val[1]; + + /* Pass 1: process rows. */ + + /* Load DCT conversion constants.
*/ + const int16x4_t consts = vld1_s16(jsimd_fdct_ifast_neon_consts); + + int16x8_t tmp0 = vaddq_s16(col0, col7); + int16x8_t tmp7 = vsubq_s16(col0, col7); + int16x8_t tmp1 = vaddq_s16(col1, col6); + int16x8_t tmp6 = vsubq_s16(col1, col6); + int16x8_t tmp2 = vaddq_s16(col2, col5); + int16x8_t tmp5 = vsubq_s16(col2, col5); + int16x8_t tmp3 = vaddq_s16(col3, col4); + int16x8_t tmp4 = vsubq_s16(col3, col4); + + /* Even part */ + int16x8_t tmp10 = vaddq_s16(tmp0, tmp3); /* phase 2 */ + int16x8_t tmp13 = vsubq_s16(tmp0, tmp3); + int16x8_t tmp11 = vaddq_s16(tmp1, tmp2); + int16x8_t tmp12 = vsubq_s16(tmp1, tmp2); + + col0 = vaddq_s16(tmp10, tmp11); /* phase 3 */ + col4 = vsubq_s16(tmp10, tmp11); + + int16x8_t z1 = vqdmulhq_lane_s16(vaddq_s16(tmp12, tmp13), consts, 2); + col2 = vaddq_s16(tmp13, z1); /* phase 5 */ + col6 = vsubq_s16(tmp13, z1); + + /* Odd part */ + tmp10 = vaddq_s16(tmp4, tmp5); /* phase 2 */ + tmp11 = vaddq_s16(tmp5, tmp6); + tmp12 = vaddq_s16(tmp6, tmp7); + + int16x8_t z5 = vqdmulhq_lane_s16(vsubq_s16(tmp10, tmp12), consts, 0); + int16x8_t z2 = vqdmulhq_lane_s16(tmp10, consts, 1); + z2 = vaddq_s16(z2, z5); + int16x8_t z4 = vqdmulhq_lane_s16(tmp12, consts, 3); + z5 = vaddq_s16(tmp12, z5); + z4 = vaddq_s16(z4, z5); + int16x8_t z3 = vqdmulhq_lane_s16(tmp11, consts, 2); + + int16x8_t z11 = vaddq_s16(tmp7, z3); /* phase 5 */ + int16x8_t z13 = vsubq_s16(tmp7, z3); + + col5 = vaddq_s16(z13, z2); /* phase 6 */ + col3 = vsubq_s16(z13, z2); + col1 = vaddq_s16(z11, z4); + col7 = vsubq_s16(z11, z4); + + /* Transpose to work on columns in pass 2. 
*/ + int16x8x2_t cols_01 = vtrnq_s16(col0, col1); + int16x8x2_t cols_23 = vtrnq_s16(col2, col3); + int16x8x2_t cols_45 = vtrnq_s16(col4, col5); + int16x8x2_t cols_67 = vtrnq_s16(col6, col7); + + int32x4x2_t cols_0145_l = vtrnq_s32(vreinterpretq_s32_s16(cols_01.val[0]), + vreinterpretq_s32_s16(cols_45.val[0])); + int32x4x2_t cols_0145_h = vtrnq_s32(vreinterpretq_s32_s16(cols_01.val[1]), + vreinterpretq_s32_s16(cols_45.val[1])); + int32x4x2_t cols_2367_l = vtrnq_s32(vreinterpretq_s32_s16(cols_23.val[0]), + vreinterpretq_s32_s16(cols_67.val[0])); + int32x4x2_t cols_2367_h = vtrnq_s32(vreinterpretq_s32_s16(cols_23.val[1]), + vreinterpretq_s32_s16(cols_67.val[1])); + + int32x4x2_t rows_04 = vzipq_s32(cols_0145_l.val[0], cols_2367_l.val[0]); + int32x4x2_t rows_15 = vzipq_s32(cols_0145_h.val[0], cols_2367_h.val[0]); + int32x4x2_t rows_26 = vzipq_s32(cols_0145_l.val[1], cols_2367_l.val[1]); + int32x4x2_t rows_37 = vzipq_s32(cols_0145_h.val[1], cols_2367_h.val[1]); + + int16x8_t row0 = vreinterpretq_s16_s32(rows_04.val[0]); + int16x8_t row1 = vreinterpretq_s16_s32(rows_15.val[0]); + int16x8_t row2 = vreinterpretq_s16_s32(rows_26.val[0]); + int16x8_t row3 = vreinterpretq_s16_s32(rows_37.val[0]); + int16x8_t row4 = vreinterpretq_s16_s32(rows_04.val[1]); + int16x8_t row5 = vreinterpretq_s16_s32(rows_15.val[1]); + int16x8_t row6 = vreinterpretq_s16_s32(rows_26.val[1]); + int16x8_t row7 = vreinterpretq_s16_s32(rows_37.val[1]); + + /* Pass 2: process columns. 
*/ + + tmp0 = vaddq_s16(row0, row7); + tmp7 = vsubq_s16(row0, row7); + tmp1 = vaddq_s16(row1, row6); + tmp6 = vsubq_s16(row1, row6); + tmp2 = vaddq_s16(row2, row5); + tmp5 = vsubq_s16(row2, row5); + tmp3 = vaddq_s16(row3, row4); + tmp4 = vsubq_s16(row3, row4); + + /* Even part */ + tmp10 = vaddq_s16(tmp0, tmp3); /* phase 2 */ + tmp13 = vsubq_s16(tmp0, tmp3); + tmp11 = vaddq_s16(tmp1, tmp2); + tmp12 = vsubq_s16(tmp1, tmp2); + + row0 = vaddq_s16(tmp10, tmp11); /* phase 3 */ + row4 = vsubq_s16(tmp10, tmp11); + + z1 = vqdmulhq_lane_s16(vaddq_s16(tmp12, tmp13), consts, 2); + row2 = vaddq_s16(tmp13, z1); /* phase 5 */ + row6 = vsubq_s16(tmp13, z1); + + /* Odd part */ + tmp10 = vaddq_s16(tmp4, tmp5); /* phase 2 */ + tmp11 = vaddq_s16(tmp5, tmp6); + tmp12 = vaddq_s16(tmp6, tmp7); + + z5 = vqdmulhq_lane_s16(vsubq_s16(tmp10, tmp12), consts, 0); + z2 = vqdmulhq_lane_s16(tmp10, consts, 1); + z2 = vaddq_s16(z2, z5); + z4 = vqdmulhq_lane_s16(tmp12, consts, 3); + z5 = vaddq_s16(tmp12, z5); + z4 = vaddq_s16(z4, z5); + z3 = vqdmulhq_lane_s16(tmp11, consts, 2); + + z11 = vaddq_s16(tmp7, z3); /* phase 5 */ + z13 = vsubq_s16(tmp7, z3); + + row5 = vaddq_s16(z13, z2); /* phase 6 */ + row3 = vsubq_s16(z13, z2); + row1 = vaddq_s16(z11, z4); + row7 = vsubq_s16(z11, z4); + + vst1q_s16(data + 0 * DCTSIZE, row0); + vst1q_s16(data + 1 * DCTSIZE, row1); + vst1q_s16(data + 2 * DCTSIZE, row2); + vst1q_s16(data + 3 * DCTSIZE, row3); + vst1q_s16(data + 4 * DCTSIZE, row4); + vst1q_s16(data + 5 * DCTSIZE, row5); + vst1q_s16(data + 6 * DCTSIZE, row6); + vst1q_s16(data + 7 * DCTSIZE, row7); +} diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jfdctint-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jfdctint-neon.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jfdctint-neon.c 1970-01-01 01:00:00.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jfdctint-neon.c 2021-11-20 03:41:33.400600418 +0000 @@ -0,0 +1,376 @@ +/* + * 
jfdctint-neon.c - accurate integer FDCT (Arm Neon) + * + * Copyright (C) 2020, Arm Limited. All Rights Reserved. + * Copyright (C) 2020, D. R. Commander. All Rights Reserved. + * + * This software is provided 'as-is', without any express or implied + * warranty. In no event will the authors be held liable for any damages + * arising from the use of this software. + * + * Permission is granted to anyone to use this software for any purpose, + * including commercial applications, and to alter it and redistribute it + * freely, subject to the following restrictions: + * + * 1. The origin of this software must not be misrepresented; you must not + * claim that you wrote the original software. If you use this software + * in a product, an acknowledgment in the product documentation would be + * appreciated but is not required. + * 2. Altered source versions must be plainly marked as such, and must not be + * misrepresented as being the original software. + * 3. This notice may not be removed or altered from any source distribution. + */ + +#define JPEG_INTERNALS +#include "../../jinclude.h" +#include "../../jpeglib.h" +#include "../../jsimd.h" +#include "../../jdct.h" +#include "../../jsimddct.h" +#include "../jsimd.h" +#include "align.h" +#include "neon-compat.h" + +#include <arm_neon.h> + + +/* jsimd_fdct_islow_neon() performs a slower but more accurate forward DCT + * (Discrete Cosine Transform) on one block of samples. It uses the same + * calculations and produces exactly the same output as IJG's original + * jpeg_fdct_islow() function, which can be found in jfdctint.c.
+ * + * Scaled integer constants are used to avoid floating-point arithmetic: + * 0.298631336 = 2446 * 2^-13 + * 0.390180644 = 3196 * 2^-13 + * 0.541196100 = 4433 * 2^-13 + * 0.765366865 = 6270 * 2^-13 + * 0.899976223 = 7373 * 2^-13 + * 1.175875602 = 9633 * 2^-13 + * 1.501321110 = 12299 * 2^-13 + * 1.847759065 = 15137 * 2^-13 + * 1.961570560 = 16069 * 2^-13 + * 2.053119869 = 16819 * 2^-13 + * 2.562915447 = 20995 * 2^-13 + * 3.072711026 = 25172 * 2^-13 + * + * See jfdctint.c for further details of the DCT algorithm. Where possible, + * the variable names and comments here in jsimd_fdct_islow_neon() match up + * with those in jpeg_fdct_islow(). + */ + +#define CONST_BITS 13 +#define PASS1_BITS 2 + +#define DESCALE_P1 (CONST_BITS - PASS1_BITS) +#define DESCALE_P2 (CONST_BITS + PASS1_BITS) + +#define F_0_298 2446 +#define F_0_390 3196 +#define F_0_541 4433 +#define F_0_765 6270 +#define F_0_899 7373 +#define F_1_175 9633 +#define F_1_501 12299 +#define F_1_847 15137 +#define F_1_961 16069 +#define F_2_053 16819 +#define F_2_562 20995 +#define F_3_072 25172 + + +ALIGN(16) static const int16_t jsimd_fdct_islow_neon_consts[] = { + F_0_298, -F_0_390, F_0_541, F_0_765, + -F_0_899, F_1_175, F_1_501, -F_1_847, + -F_1_961, F_2_053, -F_2_562, F_3_072 +}; + +void jsimd_fdct_islow_neon(DCTELEM *data) +{ + /* Load DCT constants. */ +#ifdef HAVE_VLD1_S16_X3 + const int16x4x3_t consts = vld1_s16_x3(jsimd_fdct_islow_neon_consts); +#else + /* GCC does not currently support the intrinsic vld1_<type>_x3(). */ + const int16x4_t consts1 = vld1_s16(jsimd_fdct_islow_neon_consts); + const int16x4_t consts2 = vld1_s16(jsimd_fdct_islow_neon_consts + 4); + const int16x4_t consts3 = vld1_s16(jsimd_fdct_islow_neon_consts + 8); + const int16x4x3_t consts = { { consts1, consts2, consts3 } }; +#endif + + /* Load an 8x8 block of samples into Neon registers.
De-interleaving loads + * are used, followed by vuzp to transpose the block such that we have a + * column of samples per vector - allowing all rows to be processed at once. + */ + int16x8x4_t s_rows_0123 = vld4q_s16(data); + int16x8x4_t s_rows_4567 = vld4q_s16(data + 4 * DCTSIZE); + + int16x8x2_t cols_04 = vuzpq_s16(s_rows_0123.val[0], s_rows_4567.val[0]); + int16x8x2_t cols_15 = vuzpq_s16(s_rows_0123.val[1], s_rows_4567.val[1]); + int16x8x2_t cols_26 = vuzpq_s16(s_rows_0123.val[2], s_rows_4567.val[2]); + int16x8x2_t cols_37 = vuzpq_s16(s_rows_0123.val[3], s_rows_4567.val[3]); + + int16x8_t col0 = cols_04.val[0]; + int16x8_t col1 = cols_15.val[0]; + int16x8_t col2 = cols_26.val[0]; + int16x8_t col3 = cols_37.val[0]; + int16x8_t col4 = cols_04.val[1]; + int16x8_t col5 = cols_15.val[1]; + int16x8_t col6 = cols_26.val[1]; + int16x8_t col7 = cols_37.val[1]; + + /* Pass 1: process rows. */ + + int16x8_t tmp0 = vaddq_s16(col0, col7); + int16x8_t tmp7 = vsubq_s16(col0, col7); + int16x8_t tmp1 = vaddq_s16(col1, col6); + int16x8_t tmp6 = vsubq_s16(col1, col6); + int16x8_t tmp2 = vaddq_s16(col2, col5); + int16x8_t tmp5 = vsubq_s16(col2, col5); + int16x8_t tmp3 = vaddq_s16(col3, col4); + int16x8_t tmp4 = vsubq_s16(col3, col4); + + /* Even part */ + int16x8_t tmp10 = vaddq_s16(tmp0, tmp3); + int16x8_t tmp13 = vsubq_s16(tmp0, tmp3); + int16x8_t tmp11 = vaddq_s16(tmp1, tmp2); + int16x8_t tmp12 = vsubq_s16(tmp1, tmp2); + + col0 = vshlq_n_s16(vaddq_s16(tmp10, tmp11), PASS1_BITS); + col4 = vshlq_n_s16(vsubq_s16(tmp10, tmp11), PASS1_BITS); + + int16x8_t tmp12_add_tmp13 = vaddq_s16(tmp12, tmp13); + int32x4_t z1_l = + vmull_lane_s16(vget_low_s16(tmp12_add_tmp13), consts.val[0], 2); + int32x4_t z1_h = + vmull_lane_s16(vget_high_s16(tmp12_add_tmp13), consts.val[0], 2); + + int32x4_t col2_scaled_l = + vmlal_lane_s16(z1_l, vget_low_s16(tmp13), consts.val[0], 3); + int32x4_t col2_scaled_h = + vmlal_lane_s16(z1_h, vget_high_s16(tmp13), consts.val[0], 3); + col2 = 
vcombine_s16(vrshrn_n_s32(col2_scaled_l, DESCALE_P1), + vrshrn_n_s32(col2_scaled_h, DESCALE_P1)); + + int32x4_t col6_scaled_l = + vmlal_lane_s16(z1_l, vget_low_s16(tmp12), consts.val[1], 3); + int32x4_t col6_scaled_h = + vmlal_lane_s16(z1_h, vget_high_s16(tmp12), consts.val[1], 3); + col6 = vcombine_s16(vrshrn_n_s32(col6_scaled_l, DESCALE_P1), + vrshrn_n_s32(col6_scaled_h, DESCALE_P1)); + + /* Odd part */ + int16x8_t z1 = vaddq_s16(tmp4, tmp7); + int16x8_t z2 = vaddq_s16(tmp5, tmp6); + int16x8_t z3 = vaddq_s16(tmp4, tmp6); + int16x8_t z4 = vaddq_s16(tmp5, tmp7); + /* sqrt(2) * c3 */ + int32x4_t z5_l = vmull_lane_s16(vget_low_s16(z3), consts.val[1], 1); + int32x4_t z5_h = vmull_lane_s16(vget_high_s16(z3), consts.val[1], 1); + z5_l = vmlal_lane_s16(z5_l, vget_low_s16(z4), consts.val[1], 1); + z5_h = vmlal_lane_s16(z5_h, vget_high_s16(z4), consts.val[1], 1); + + /* sqrt(2) * (-c1+c3+c5-c7) */ + int32x4_t tmp4_l = vmull_lane_s16(vget_low_s16(tmp4), consts.val[0], 0); + int32x4_t tmp4_h = vmull_lane_s16(vget_high_s16(tmp4), consts.val[0], 0); + /* sqrt(2) * ( c1+c3-c5+c7) */ + int32x4_t tmp5_l = vmull_lane_s16(vget_low_s16(tmp5), consts.val[2], 1); + int32x4_t tmp5_h = vmull_lane_s16(vget_high_s16(tmp5), consts.val[2], 1); + /* sqrt(2) * ( c1+c3+c5-c7) */ + int32x4_t tmp6_l = vmull_lane_s16(vget_low_s16(tmp6), consts.val[2], 3); + int32x4_t tmp6_h = vmull_lane_s16(vget_high_s16(tmp6), consts.val[2], 3); + /* sqrt(2) * ( c1+c3-c5-c7) */ + int32x4_t tmp7_l = vmull_lane_s16(vget_low_s16(tmp7), consts.val[1], 2); + int32x4_t tmp7_h = vmull_lane_s16(vget_high_s16(tmp7), consts.val[1], 2); + + /* sqrt(2) * (c7-c3) */ + z1_l = vmull_lane_s16(vget_low_s16(z1), consts.val[1], 0); + z1_h = vmull_lane_s16(vget_high_s16(z1), consts.val[1], 0); + /* sqrt(2) * (-c1-c3) */ + int32x4_t z2_l = vmull_lane_s16(vget_low_s16(z2), consts.val[2], 2); + int32x4_t z2_h = vmull_lane_s16(vget_high_s16(z2), consts.val[2], 2); + /* sqrt(2) * (-c3-c5) */ + int32x4_t z3_l = 
vmull_lane_s16(vget_low_s16(z3), consts.val[2], 0); + int32x4_t z3_h = vmull_lane_s16(vget_high_s16(z3), consts.val[2], 0); + /* sqrt(2) * (c5-c3) */ + int32x4_t z4_l = vmull_lane_s16(vget_low_s16(z4), consts.val[0], 1); + int32x4_t z4_h = vmull_lane_s16(vget_high_s16(z4), consts.val[0], 1); + + z3_l = vaddq_s32(z3_l, z5_l); + z3_h = vaddq_s32(z3_h, z5_h); + z4_l = vaddq_s32(z4_l, z5_l); + z4_h = vaddq_s32(z4_h, z5_h); + + tmp4_l = vaddq_s32(tmp4_l, z1_l); + tmp4_h = vaddq_s32(tmp4_h, z1_h); + tmp4_l = vaddq_s32(tmp4_l, z3_l); + tmp4_h = vaddq_s32(tmp4_h, z3_h); + col7 = vcombine_s16(vrshrn_n_s32(tmp4_l, DESCALE_P1), + vrshrn_n_s32(tmp4_h, DESCALE_P1)); + + tmp5_l = vaddq_s32(tmp5_l, z2_l); + tmp5_h = vaddq_s32(tmp5_h, z2_h); + tmp5_l = vaddq_s32(tmp5_l, z4_l); + tmp5_h = vaddq_s32(tmp5_h, z4_h); + col5 = vcombine_s16(vrshrn_n_s32(tmp5_l, DESCALE_P1), + vrshrn_n_s32(tmp5_h, DESCALE_P1)); + + tmp6_l = vaddq_s32(tmp6_l, z2_l); + tmp6_h = vaddq_s32(tmp6_h, z2_h); + tmp6_l = vaddq_s32(tmp6_l, z3_l); + tmp6_h = vaddq_s32(tmp6_h, z3_h); + col3 = vcombine_s16(vrshrn_n_s32(tmp6_l, DESCALE_P1), + vrshrn_n_s32(tmp6_h, DESCALE_P1)); + + tmp7_l = vaddq_s32(tmp7_l, z1_l); + tmp7_h = vaddq_s32(tmp7_h, z1_h); + tmp7_l = vaddq_s32(tmp7_l, z4_l); + tmp7_h = vaddq_s32(tmp7_h, z4_h); + col1 = vcombine_s16(vrshrn_n_s32(tmp7_l, DESCALE_P1), + vrshrn_n_s32(tmp7_h, DESCALE_P1)); + + /* Transpose to work on columns in pass 2. 
*/ + int16x8x2_t cols_01 = vtrnq_s16(col0, col1); + int16x8x2_t cols_23 = vtrnq_s16(col2, col3); + int16x8x2_t cols_45 = vtrnq_s16(col4, col5); + int16x8x2_t cols_67 = vtrnq_s16(col6, col7); + + int32x4x2_t cols_0145_l = vtrnq_s32(vreinterpretq_s32_s16(cols_01.val[0]), + vreinterpretq_s32_s16(cols_45.val[0])); + int32x4x2_t cols_0145_h = vtrnq_s32(vreinterpretq_s32_s16(cols_01.val[1]), + vreinterpretq_s32_s16(cols_45.val[1])); + int32x4x2_t cols_2367_l = vtrnq_s32(vreinterpretq_s32_s16(cols_23.val[0]), + vreinterpretq_s32_s16(cols_67.val[0])); + int32x4x2_t cols_2367_h = vtrnq_s32(vreinterpretq_s32_s16(cols_23.val[1]), + vreinterpretq_s32_s16(cols_67.val[1])); + + int32x4x2_t rows_04 = vzipq_s32(cols_0145_l.val[0], cols_2367_l.val[0]); + int32x4x2_t rows_15 = vzipq_s32(cols_0145_h.val[0], cols_2367_h.val[0]); + int32x4x2_t rows_26 = vzipq_s32(cols_0145_l.val[1], cols_2367_l.val[1]); + int32x4x2_t rows_37 = vzipq_s32(cols_0145_h.val[1], cols_2367_h.val[1]); + + int16x8_t row0 = vreinterpretq_s16_s32(rows_04.val[0]); + int16x8_t row1 = vreinterpretq_s16_s32(rows_15.val[0]); + int16x8_t row2 = vreinterpretq_s16_s32(rows_26.val[0]); + int16x8_t row3 = vreinterpretq_s16_s32(rows_37.val[0]); + int16x8_t row4 = vreinterpretq_s16_s32(rows_04.val[1]); + int16x8_t row5 = vreinterpretq_s16_s32(rows_15.val[1]); + int16x8_t row6 = vreinterpretq_s16_s32(rows_26.val[1]); + int16x8_t row7 = vreinterpretq_s16_s32(rows_37.val[1]); + + /* Pass 2: process columns. 
*/ + + tmp0 = vaddq_s16(row0, row7); + tmp7 = vsubq_s16(row0, row7); + tmp1 = vaddq_s16(row1, row6); + tmp6 = vsubq_s16(row1, row6); + tmp2 = vaddq_s16(row2, row5); + tmp5 = vsubq_s16(row2, row5); + tmp3 = vaddq_s16(row3, row4); + tmp4 = vsubq_s16(row3, row4); + + /* Even part */ + tmp10 = vaddq_s16(tmp0, tmp3); + tmp13 = vsubq_s16(tmp0, tmp3); + tmp11 = vaddq_s16(tmp1, tmp2); + tmp12 = vsubq_s16(tmp1, tmp2); + + row0 = vrshrq_n_s16(vaddq_s16(tmp10, tmp11), PASS1_BITS); + row4 = vrshrq_n_s16(vsubq_s16(tmp10, tmp11), PASS1_BITS); + + tmp12_add_tmp13 = vaddq_s16(tmp12, tmp13); + z1_l = vmull_lane_s16(vget_low_s16(tmp12_add_tmp13), consts.val[0], 2); + z1_h = vmull_lane_s16(vget_high_s16(tmp12_add_tmp13), consts.val[0], 2); + + int32x4_t row2_scaled_l = + vmlal_lane_s16(z1_l, vget_low_s16(tmp13), consts.val[0], 3); + int32x4_t row2_scaled_h = + vmlal_lane_s16(z1_h, vget_high_s16(tmp13), consts.val[0], 3); + row2 = vcombine_s16(vrshrn_n_s32(row2_scaled_l, DESCALE_P2), + vrshrn_n_s32(row2_scaled_h, DESCALE_P2)); + + int32x4_t row6_scaled_l = + vmlal_lane_s16(z1_l, vget_low_s16(tmp12), consts.val[1], 3); + int32x4_t row6_scaled_h = + vmlal_lane_s16(z1_h, vget_high_s16(tmp12), consts.val[1], 3); + row6 = vcombine_s16(vrshrn_n_s32(row6_scaled_l, DESCALE_P2), + vrshrn_n_s32(row6_scaled_h, DESCALE_P2)); + + /* Odd part */ + z1 = vaddq_s16(tmp4, tmp7); + z2 = vaddq_s16(tmp5, tmp6); + z3 = vaddq_s16(tmp4, tmp6); + z4 = vaddq_s16(tmp5, tmp7); + /* sqrt(2) * c3 */ + z5_l = vmull_lane_s16(vget_low_s16(z3), consts.val[1], 1); + z5_h = vmull_lane_s16(vget_high_s16(z3), consts.val[1], 1); + z5_l = vmlal_lane_s16(z5_l, vget_low_s16(z4), consts.val[1], 1); + z5_h = vmlal_lane_s16(z5_h, vget_high_s16(z4), consts.val[1], 1); + + /* sqrt(2) * (-c1+c3+c5-c7) */ + tmp4_l = vmull_lane_s16(vget_low_s16(tmp4), consts.val[0], 0); + tmp4_h = vmull_lane_s16(vget_high_s16(tmp4), consts.val[0], 0); + /* sqrt(2) * ( c1+c3-c5+c7) */ + tmp5_l = vmull_lane_s16(vget_low_s16(tmp5), consts.val[2], 1); + 
tmp5_h = vmull_lane_s16(vget_high_s16(tmp5), consts.val[2], 1); + /* sqrt(2) * ( c1+c3+c5-c7) */ + tmp6_l = vmull_lane_s16(vget_low_s16(tmp6), consts.val[2], 3); + tmp6_h = vmull_lane_s16(vget_high_s16(tmp6), consts.val[2], 3); + /* sqrt(2) * ( c1+c3-c5-c7) */ + tmp7_l = vmull_lane_s16(vget_low_s16(tmp7), consts.val[1], 2); + tmp7_h = vmull_lane_s16(vget_high_s16(tmp7), consts.val[1], 2); + + /* sqrt(2) * (c7-c3) */ + z1_l = vmull_lane_s16(vget_low_s16(z1), consts.val[1], 0); + z1_h = vmull_lane_s16(vget_high_s16(z1), consts.val[1], 0); + /* sqrt(2) * (-c1-c3) */ + z2_l = vmull_lane_s16(vget_low_s16(z2), consts.val[2], 2); + z2_h = vmull_lane_s16(vget_high_s16(z2), consts.val[2], 2); + /* sqrt(2) * (-c3-c5) */ + z3_l = vmull_lane_s16(vget_low_s16(z3), consts.val[2], 0); + z3_h = vmull_lane_s16(vget_high_s16(z3), consts.val[2], 0); + /* sqrt(2) * (c5-c3) */ + z4_l = vmull_lane_s16(vget_low_s16(z4), consts.val[0], 1); + z4_h = vmull_lane_s16(vget_high_s16(z4), consts.val[0], 1); + + z3_l = vaddq_s32(z3_l, z5_l); + z3_h = vaddq_s32(z3_h, z5_h); + z4_l = vaddq_s32(z4_l, z5_l); + z4_h = vaddq_s32(z4_h, z5_h); + + tmp4_l = vaddq_s32(tmp4_l, z1_l); + tmp4_h = vaddq_s32(tmp4_h, z1_h); + tmp4_l = vaddq_s32(tmp4_l, z3_l); + tmp4_h = vaddq_s32(tmp4_h, z3_h); + row7 = vcombine_s16(vrshrn_n_s32(tmp4_l, DESCALE_P2), + vrshrn_n_s32(tmp4_h, DESCALE_P2)); + + tmp5_l = vaddq_s32(tmp5_l, z2_l); + tmp5_h = vaddq_s32(tmp5_h, z2_h); + tmp5_l = vaddq_s32(tmp5_l, z4_l); + tmp5_h = vaddq_s32(tmp5_h, z4_h); + row5 = vcombine_s16(vrshrn_n_s32(tmp5_l, DESCALE_P2), + vrshrn_n_s32(tmp5_h, DESCALE_P2)); + + tmp6_l = vaddq_s32(tmp6_l, z2_l); + tmp6_h = vaddq_s32(tmp6_h, z2_h); + tmp6_l = vaddq_s32(tmp6_l, z3_l); + tmp6_h = vaddq_s32(tmp6_h, z3_h); + row3 = vcombine_s16(vrshrn_n_s32(tmp6_l, DESCALE_P2), + vrshrn_n_s32(tmp6_h, DESCALE_P2)); + + tmp7_l = vaddq_s32(tmp7_l, z1_l); + tmp7_h = vaddq_s32(tmp7_h, z1_h); + tmp7_l = vaddq_s32(tmp7_l, z4_l); + tmp7_h = vaddq_s32(tmp7_h, z4_h); + row1 = 
vcombine_s16(vrshrn_n_s32(tmp7_l, DESCALE_P2), + vrshrn_n_s32(tmp7_h, DESCALE_P2)); + + vst1q_s16(data + 0 * DCTSIZE, row0); + vst1q_s16(data + 1 * DCTSIZE, row1); + vst1q_s16(data + 2 * DCTSIZE, row2); + vst1q_s16(data + 3 * DCTSIZE, row3); + vst1q_s16(data + 4 * DCTSIZE, row4); + vst1q_s16(data + 5 * DCTSIZE, row5); + vst1q_s16(data + 6 * DCTSIZE, row6); + vst1q_s16(data + 7 * DCTSIZE, row7); +} diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jidctfst-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jidctfst-neon.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jidctfst-neon.c 1970-01-01 01:00:00.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jidctfst-neon.c 2021-11-20 03:41:33.400600418 +0000 @@ -0,0 +1,472 @@ +/* + * jidctfst-neon.c - fast integer IDCT (Arm Neon) + * + * Copyright (C) 2020, Arm Limited. All Rights Reserved. + * + * This software is provided 'as-is', without any express or implied + * warranty. In no event will the authors be held liable for any damages + * arising from the use of this software. + * + * Permission is granted to anyone to use this software for any purpose, + * including commercial applications, and to alter it and redistribute it + * freely, subject to the following restrictions: + * + * 1. The origin of this software must not be misrepresented; you must not + * claim that you wrote the original software. If you use this software + * in a product, an acknowledgment in the product documentation would be + * appreciated but is not required. + * 2. Altered source versions must be plainly marked as such, and must not be + * misrepresented as being the original software. + * 3. This notice may not be removed or altered from any source distribution. 
+ */ + +#define JPEG_INTERNALS +#include "../../jinclude.h" +#include "../../jpeglib.h" +#include "../../jsimd.h" +#include "../../jdct.h" +#include "../../jsimddct.h" +#include "../jsimd.h" +#include "align.h" + +#include + + +/* jsimd_idct_ifast_neon() performs dequantization and a fast, not so accurate + * inverse DCT (Discrete Cosine Transform) on one block of coefficients. It + * uses the same calculations and produces exactly the same output as IJG's + * original jpeg_idct_ifast() function, which can be found in jidctfst.c. + * + * Scaled integer constants are used to avoid floating-point arithmetic: + * 0.082392200 = 2688 * 2^-15 + * 0.414213562 = 13568 * 2^-15 + * 0.847759065 = 27776 * 2^-15 + * 0.613125930 = 20096 * 2^-15 + * + * See jidctfst.c for further details of the IDCT algorithm. Where possible, + * the variable names and comments here in jsimd_idct_ifast_neon() match up + * with those in jpeg_idct_ifast(). + */ + +#define PASS1_BITS 2 + +#define F_0_082 2688 +#define F_0_414 13568 +#define F_0_847 27776 +#define F_0_613 20096 + + +ALIGN(16) static const int16_t jsimd_idct_ifast_neon_consts[] = { + F_0_082, F_0_414, F_0_847, F_0_613 +}; + +void jsimd_idct_ifast_neon(void *dct_table, JCOEFPTR coef_block, + JSAMPARRAY output_buf, JDIMENSION output_col) +{ + IFAST_MULT_TYPE *quantptr = dct_table; + + /* Load DCT coefficients. */ + int16x8_t row0 = vld1q_s16(coef_block + 0 * DCTSIZE); + int16x8_t row1 = vld1q_s16(coef_block + 1 * DCTSIZE); + int16x8_t row2 = vld1q_s16(coef_block + 2 * DCTSIZE); + int16x8_t row3 = vld1q_s16(coef_block + 3 * DCTSIZE); + int16x8_t row4 = vld1q_s16(coef_block + 4 * DCTSIZE); + int16x8_t row5 = vld1q_s16(coef_block + 5 * DCTSIZE); + int16x8_t row6 = vld1q_s16(coef_block + 6 * DCTSIZE); + int16x8_t row7 = vld1q_s16(coef_block + 7 * DCTSIZE); + + /* Load quantization table values for DC coefficients. */ + int16x8_t quant_row0 = vld1q_s16(quantptr + 0 * DCTSIZE); + /* Dequantize DC coefficients. 
*/ + row0 = vmulq_s16(row0, quant_row0); + + /* Construct bitmap to test if all AC coefficients are 0. */ + int16x8_t bitmap = vorrq_s16(row1, row2); + bitmap = vorrq_s16(bitmap, row3); + bitmap = vorrq_s16(bitmap, row4); + bitmap = vorrq_s16(bitmap, row5); + bitmap = vorrq_s16(bitmap, row6); + bitmap = vorrq_s16(bitmap, row7); + + int64_t left_ac_bitmap = vgetq_lane_s64(vreinterpretq_s64_s16(bitmap), 0); + int64_t right_ac_bitmap = vgetq_lane_s64(vreinterpretq_s64_s16(bitmap), 1); + + /* Load IDCT conversion constants. */ + const int16x4_t consts = vld1_s16(jsimd_idct_ifast_neon_consts); + + if (left_ac_bitmap == 0 && right_ac_bitmap == 0) { + /* All AC coefficients are zero. + * Compute DC values and duplicate into vectors. + */ + int16x8_t dcval = row0; + row1 = dcval; + row2 = dcval; + row3 = dcval; + row4 = dcval; + row5 = dcval; + row6 = dcval; + row7 = dcval; + } else if (left_ac_bitmap == 0) { + /* AC coefficients are zero for columns 0, 1, 2, and 3. + * Use DC values for these columns. + */ + int16x4_t dcval = vget_low_s16(row0); + + /* Commence regular fast IDCT computation for columns 4, 5, 6, and 7. */ + + /* Load quantization table. */ + int16x4_t quant_row1 = vld1_s16(quantptr + 1 * DCTSIZE + 4); + int16x4_t quant_row2 = vld1_s16(quantptr + 2 * DCTSIZE + 4); + int16x4_t quant_row3 = vld1_s16(quantptr + 3 * DCTSIZE + 4); + int16x4_t quant_row4 = vld1_s16(quantptr + 4 * DCTSIZE + 4); + int16x4_t quant_row5 = vld1_s16(quantptr + 5 * DCTSIZE + 4); + int16x4_t quant_row6 = vld1_s16(quantptr + 6 * DCTSIZE + 4); + int16x4_t quant_row7 = vld1_s16(quantptr + 7 * DCTSIZE + 4); + + /* Even part: dequantize DCT coefficients. 
*/ + int16x4_t tmp0 = vget_high_s16(row0); + int16x4_t tmp1 = vmul_s16(vget_high_s16(row2), quant_row2); + int16x4_t tmp2 = vmul_s16(vget_high_s16(row4), quant_row4); + int16x4_t tmp3 = vmul_s16(vget_high_s16(row6), quant_row6); + + int16x4_t tmp10 = vadd_s16(tmp0, tmp2); /* phase 3 */ + int16x4_t tmp11 = vsub_s16(tmp0, tmp2); + + int16x4_t tmp13 = vadd_s16(tmp1, tmp3); /* phases 5-3 */ + int16x4_t tmp1_sub_tmp3 = vsub_s16(tmp1, tmp3); + int16x4_t tmp12 = vqdmulh_lane_s16(tmp1_sub_tmp3, consts, 1); + tmp12 = vadd_s16(tmp12, tmp1_sub_tmp3); + tmp12 = vsub_s16(tmp12, tmp13); + + tmp0 = vadd_s16(tmp10, tmp13); /* phase 2 */ + tmp3 = vsub_s16(tmp10, tmp13); + tmp1 = vadd_s16(tmp11, tmp12); + tmp2 = vsub_s16(tmp11, tmp12); + + /* Odd part: dequantize DCT coefficients. */ + int16x4_t tmp4 = vmul_s16(vget_high_s16(row1), quant_row1); + int16x4_t tmp5 = vmul_s16(vget_high_s16(row3), quant_row3); + int16x4_t tmp6 = vmul_s16(vget_high_s16(row5), quant_row5); + int16x4_t tmp7 = vmul_s16(vget_high_s16(row7), quant_row7); + + int16x4_t z13 = vadd_s16(tmp6, tmp5); /* phase 6 */ + int16x4_t neg_z10 = vsub_s16(tmp5, tmp6); + int16x4_t z11 = vadd_s16(tmp4, tmp7); + int16x4_t z12 = vsub_s16(tmp4, tmp7); + + tmp7 = vadd_s16(z11, z13); /* phase 5 */ + int16x4_t z11_sub_z13 = vsub_s16(z11, z13); + tmp11 = vqdmulh_lane_s16(z11_sub_z13, consts, 1); + tmp11 = vadd_s16(tmp11, z11_sub_z13); + + int16x4_t z10_add_z12 = vsub_s16(z12, neg_z10); + int16x4_t z5 = vqdmulh_lane_s16(z10_add_z12, consts, 2); + z5 = vadd_s16(z5, z10_add_z12); + tmp10 = vqdmulh_lane_s16(z12, consts, 0); + tmp10 = vadd_s16(tmp10, z12); + tmp10 = vsub_s16(tmp10, z5); + tmp12 = vqdmulh_lane_s16(neg_z10, consts, 3); + tmp12 = vadd_s16(tmp12, vadd_s16(neg_z10, neg_z10)); + tmp12 = vadd_s16(tmp12, z5); + + tmp6 = vsub_s16(tmp12, tmp7); /* phase 2 */ + tmp5 = vsub_s16(tmp11, tmp6); + tmp4 = vadd_s16(tmp10, tmp5); + + row0 = vcombine_s16(dcval, vadd_s16(tmp0, tmp7)); + row7 = vcombine_s16(dcval, vsub_s16(tmp0, tmp7)); + row1 
= vcombine_s16(dcval, vadd_s16(tmp1, tmp6)); + row6 = vcombine_s16(dcval, vsub_s16(tmp1, tmp6)); + row2 = vcombine_s16(dcval, vadd_s16(tmp2, tmp5)); + row5 = vcombine_s16(dcval, vsub_s16(tmp2, tmp5)); + row4 = vcombine_s16(dcval, vadd_s16(tmp3, tmp4)); + row3 = vcombine_s16(dcval, vsub_s16(tmp3, tmp4)); + } else if (right_ac_bitmap == 0) { + /* AC coefficients are zero for columns 4, 5, 6, and 7. + * Use DC values for these columns. + */ + int16x4_t dcval = vget_high_s16(row0); + + /* Commence regular fast IDCT computation for columns 0, 1, 2, and 3. */ + + /* Load quantization table. */ + int16x4_t quant_row1 = vld1_s16(quantptr + 1 * DCTSIZE); + int16x4_t quant_row2 = vld1_s16(quantptr + 2 * DCTSIZE); + int16x4_t quant_row3 = vld1_s16(quantptr + 3 * DCTSIZE); + int16x4_t quant_row4 = vld1_s16(quantptr + 4 * DCTSIZE); + int16x4_t quant_row5 = vld1_s16(quantptr + 5 * DCTSIZE); + int16x4_t quant_row6 = vld1_s16(quantptr + 6 * DCTSIZE); + int16x4_t quant_row7 = vld1_s16(quantptr + 7 * DCTSIZE); + + /* Even part: dequantize DCT coefficients. */ + int16x4_t tmp0 = vget_low_s16(row0); + int16x4_t tmp1 = vmul_s16(vget_low_s16(row2), quant_row2); + int16x4_t tmp2 = vmul_s16(vget_low_s16(row4), quant_row4); + int16x4_t tmp3 = vmul_s16(vget_low_s16(row6), quant_row6); + + int16x4_t tmp10 = vadd_s16(tmp0, tmp2); /* phase 3 */ + int16x4_t tmp11 = vsub_s16(tmp0, tmp2); + + int16x4_t tmp13 = vadd_s16(tmp1, tmp3); /* phases 5-3 */ + int16x4_t tmp1_sub_tmp3 = vsub_s16(tmp1, tmp3); + int16x4_t tmp12 = vqdmulh_lane_s16(tmp1_sub_tmp3, consts, 1); + tmp12 = vadd_s16(tmp12, tmp1_sub_tmp3); + tmp12 = vsub_s16(tmp12, tmp13); + + tmp0 = vadd_s16(tmp10, tmp13); /* phase 2 */ + tmp3 = vsub_s16(tmp10, tmp13); + tmp1 = vadd_s16(tmp11, tmp12); + tmp2 = vsub_s16(tmp11, tmp12); + + /* Odd part: dequantize DCT coefficients. 
*/ + int16x4_t tmp4 = vmul_s16(vget_low_s16(row1), quant_row1); + int16x4_t tmp5 = vmul_s16(vget_low_s16(row3), quant_row3); + int16x4_t tmp6 = vmul_s16(vget_low_s16(row5), quant_row5); + int16x4_t tmp7 = vmul_s16(vget_low_s16(row7), quant_row7); + + int16x4_t z13 = vadd_s16(tmp6, tmp5); /* phase 6 */ + int16x4_t neg_z10 = vsub_s16(tmp5, tmp6); + int16x4_t z11 = vadd_s16(tmp4, tmp7); + int16x4_t z12 = vsub_s16(tmp4, tmp7); + + tmp7 = vadd_s16(z11, z13); /* phase 5 */ + int16x4_t z11_sub_z13 = vsub_s16(z11, z13); + tmp11 = vqdmulh_lane_s16(z11_sub_z13, consts, 1); + tmp11 = vadd_s16(tmp11, z11_sub_z13); + + int16x4_t z10_add_z12 = vsub_s16(z12, neg_z10); + int16x4_t z5 = vqdmulh_lane_s16(z10_add_z12, consts, 2); + z5 = vadd_s16(z5, z10_add_z12); + tmp10 = vqdmulh_lane_s16(z12, consts, 0); + tmp10 = vadd_s16(tmp10, z12); + tmp10 = vsub_s16(tmp10, z5); + tmp12 = vqdmulh_lane_s16(neg_z10, consts, 3); + tmp12 = vadd_s16(tmp12, vadd_s16(neg_z10, neg_z10)); + tmp12 = vadd_s16(tmp12, z5); + + tmp6 = vsub_s16(tmp12, tmp7); /* phase 2 */ + tmp5 = vsub_s16(tmp11, tmp6); + tmp4 = vadd_s16(tmp10, tmp5); + + row0 = vcombine_s16(vadd_s16(tmp0, tmp7), dcval); + row7 = vcombine_s16(vsub_s16(tmp0, tmp7), dcval); + row1 = vcombine_s16(vadd_s16(tmp1, tmp6), dcval); + row6 = vcombine_s16(vsub_s16(tmp1, tmp6), dcval); + row2 = vcombine_s16(vadd_s16(tmp2, tmp5), dcval); + row5 = vcombine_s16(vsub_s16(tmp2, tmp5), dcval); + row4 = vcombine_s16(vadd_s16(tmp3, tmp4), dcval); + row3 = vcombine_s16(vsub_s16(tmp3, tmp4), dcval); + } else { + /* Some AC coefficients are non-zero; full IDCT calculation required. */ + + /* Load quantization table. 
*/ + int16x8_t quant_row1 = vld1q_s16(quantptr + 1 * DCTSIZE); + int16x8_t quant_row2 = vld1q_s16(quantptr + 2 * DCTSIZE); + int16x8_t quant_row3 = vld1q_s16(quantptr + 3 * DCTSIZE); + int16x8_t quant_row4 = vld1q_s16(quantptr + 4 * DCTSIZE); + int16x8_t quant_row5 = vld1q_s16(quantptr + 5 * DCTSIZE); + int16x8_t quant_row6 = vld1q_s16(quantptr + 6 * DCTSIZE); + int16x8_t quant_row7 = vld1q_s16(quantptr + 7 * DCTSIZE); + + /* Even part: dequantize DCT coefficients. */ + int16x8_t tmp0 = row0; + int16x8_t tmp1 = vmulq_s16(row2, quant_row2); + int16x8_t tmp2 = vmulq_s16(row4, quant_row4); + int16x8_t tmp3 = vmulq_s16(row6, quant_row6); + + int16x8_t tmp10 = vaddq_s16(tmp0, tmp2); /* phase 3 */ + int16x8_t tmp11 = vsubq_s16(tmp0, tmp2); + + int16x8_t tmp13 = vaddq_s16(tmp1, tmp3); /* phases 5-3 */ + int16x8_t tmp1_sub_tmp3 = vsubq_s16(tmp1, tmp3); + int16x8_t tmp12 = vqdmulhq_lane_s16(tmp1_sub_tmp3, consts, 1); + tmp12 = vaddq_s16(tmp12, tmp1_sub_tmp3); + tmp12 = vsubq_s16(tmp12, tmp13); + + tmp0 = vaddq_s16(tmp10, tmp13); /* phase 2 */ + tmp3 = vsubq_s16(tmp10, tmp13); + tmp1 = vaddq_s16(tmp11, tmp12); + tmp2 = vsubq_s16(tmp11, tmp12); + + /* Odd part: dequantize DCT coefficients. 
*/ + int16x8_t tmp4 = vmulq_s16(row1, quant_row1); + int16x8_t tmp5 = vmulq_s16(row3, quant_row3); + int16x8_t tmp6 = vmulq_s16(row5, quant_row5); + int16x8_t tmp7 = vmulq_s16(row7, quant_row7); + + int16x8_t z13 = vaddq_s16(tmp6, tmp5); /* phase 6 */ + int16x8_t neg_z10 = vsubq_s16(tmp5, tmp6); + int16x8_t z11 = vaddq_s16(tmp4, tmp7); + int16x8_t z12 = vsubq_s16(tmp4, tmp7); + + tmp7 = vaddq_s16(z11, z13); /* phase 5 */ + int16x8_t z11_sub_z13 = vsubq_s16(z11, z13); + tmp11 = vqdmulhq_lane_s16(z11_sub_z13, consts, 1); + tmp11 = vaddq_s16(tmp11, z11_sub_z13); + + int16x8_t z10_add_z12 = vsubq_s16(z12, neg_z10); + int16x8_t z5 = vqdmulhq_lane_s16(z10_add_z12, consts, 2); + z5 = vaddq_s16(z5, z10_add_z12); + tmp10 = vqdmulhq_lane_s16(z12, consts, 0); + tmp10 = vaddq_s16(tmp10, z12); + tmp10 = vsubq_s16(tmp10, z5); + tmp12 = vqdmulhq_lane_s16(neg_z10, consts, 3); + tmp12 = vaddq_s16(tmp12, vaddq_s16(neg_z10, neg_z10)); + tmp12 = vaddq_s16(tmp12, z5); + + tmp6 = vsubq_s16(tmp12, tmp7); /* phase 2 */ + tmp5 = vsubq_s16(tmp11, tmp6); + tmp4 = vaddq_s16(tmp10, tmp5); + + row0 = vaddq_s16(tmp0, tmp7); + row7 = vsubq_s16(tmp0, tmp7); + row1 = vaddq_s16(tmp1, tmp6); + row6 = vsubq_s16(tmp1, tmp6); + row2 = vaddq_s16(tmp2, tmp5); + row5 = vsubq_s16(tmp2, tmp5); + row4 = vaddq_s16(tmp3, tmp4); + row3 = vsubq_s16(tmp3, tmp4); + } + + /* Transpose rows to work on columns in pass 2. 
*/ + int16x8x2_t rows_01 = vtrnq_s16(row0, row1); + int16x8x2_t rows_23 = vtrnq_s16(row2, row3); + int16x8x2_t rows_45 = vtrnq_s16(row4, row5); + int16x8x2_t rows_67 = vtrnq_s16(row6, row7); + + int32x4x2_t rows_0145_l = vtrnq_s32(vreinterpretq_s32_s16(rows_01.val[0]), + vreinterpretq_s32_s16(rows_45.val[0])); + int32x4x2_t rows_0145_h = vtrnq_s32(vreinterpretq_s32_s16(rows_01.val[1]), + vreinterpretq_s32_s16(rows_45.val[1])); + int32x4x2_t rows_2367_l = vtrnq_s32(vreinterpretq_s32_s16(rows_23.val[0]), + vreinterpretq_s32_s16(rows_67.val[0])); + int32x4x2_t rows_2367_h = vtrnq_s32(vreinterpretq_s32_s16(rows_23.val[1]), + vreinterpretq_s32_s16(rows_67.val[1])); + + int32x4x2_t cols_04 = vzipq_s32(rows_0145_l.val[0], rows_2367_l.val[0]); + int32x4x2_t cols_15 = vzipq_s32(rows_0145_h.val[0], rows_2367_h.val[0]); + int32x4x2_t cols_26 = vzipq_s32(rows_0145_l.val[1], rows_2367_l.val[1]); + int32x4x2_t cols_37 = vzipq_s32(rows_0145_h.val[1], rows_2367_h.val[1]); + + int16x8_t col0 = vreinterpretq_s16_s32(cols_04.val[0]); + int16x8_t col1 = vreinterpretq_s16_s32(cols_15.val[0]); + int16x8_t col2 = vreinterpretq_s16_s32(cols_26.val[0]); + int16x8_t col3 = vreinterpretq_s16_s32(cols_37.val[0]); + int16x8_t col4 = vreinterpretq_s16_s32(cols_04.val[1]); + int16x8_t col5 = vreinterpretq_s16_s32(cols_15.val[1]); + int16x8_t col6 = vreinterpretq_s16_s32(cols_26.val[1]); + int16x8_t col7 = vreinterpretq_s16_s32(cols_37.val[1]); + + /* 1-D IDCT, pass 2 */ + + /* Even part */ + int16x8_t tmp10 = vaddq_s16(col0, col4); + int16x8_t tmp11 = vsubq_s16(col0, col4); + + int16x8_t tmp13 = vaddq_s16(col2, col6); + int16x8_t col2_sub_col6 = vsubq_s16(col2, col6); + int16x8_t tmp12 = vqdmulhq_lane_s16(col2_sub_col6, consts, 1); + tmp12 = vaddq_s16(tmp12, col2_sub_col6); + tmp12 = vsubq_s16(tmp12, tmp13); + + int16x8_t tmp0 = vaddq_s16(tmp10, tmp13); + int16x8_t tmp3 = vsubq_s16(tmp10, tmp13); + int16x8_t tmp1 = vaddq_s16(tmp11, tmp12); + int16x8_t tmp2 = vsubq_s16(tmp11, tmp12); + + /* Odd 
part */ + int16x8_t z13 = vaddq_s16(col5, col3); + int16x8_t neg_z10 = vsubq_s16(col3, col5); + int16x8_t z11 = vaddq_s16(col1, col7); + int16x8_t z12 = vsubq_s16(col1, col7); + + int16x8_t tmp7 = vaddq_s16(z11, z13); /* phase 5 */ + int16x8_t z11_sub_z13 = vsubq_s16(z11, z13); + tmp11 = vqdmulhq_lane_s16(z11_sub_z13, consts, 1); + tmp11 = vaddq_s16(tmp11, z11_sub_z13); + + int16x8_t z10_add_z12 = vsubq_s16(z12, neg_z10); + int16x8_t z5 = vqdmulhq_lane_s16(z10_add_z12, consts, 2); + z5 = vaddq_s16(z5, z10_add_z12); + tmp10 = vqdmulhq_lane_s16(z12, consts, 0); + tmp10 = vaddq_s16(tmp10, z12); + tmp10 = vsubq_s16(tmp10, z5); + tmp12 = vqdmulhq_lane_s16(neg_z10, consts, 3); + tmp12 = vaddq_s16(tmp12, vaddq_s16(neg_z10, neg_z10)); + tmp12 = vaddq_s16(tmp12, z5); + + int16x8_t tmp6 = vsubq_s16(tmp12, tmp7); /* phase 2 */ + int16x8_t tmp5 = vsubq_s16(tmp11, tmp6); + int16x8_t tmp4 = vaddq_s16(tmp10, tmp5); + + col0 = vaddq_s16(tmp0, tmp7); + col7 = vsubq_s16(tmp0, tmp7); + col1 = vaddq_s16(tmp1, tmp6); + col6 = vsubq_s16(tmp1, tmp6); + col2 = vaddq_s16(tmp2, tmp5); + col5 = vsubq_s16(tmp2, tmp5); + col4 = vaddq_s16(tmp3, tmp4); + col3 = vsubq_s16(tmp3, tmp4); + + /* Scale down by a factor of 8, narrowing to 8-bit. */ + int8x16_t cols_01_s8 = vcombine_s8(vqshrn_n_s16(col0, PASS1_BITS + 3), + vqshrn_n_s16(col1, PASS1_BITS + 3)); + int8x16_t cols_45_s8 = vcombine_s8(vqshrn_n_s16(col4, PASS1_BITS + 3), + vqshrn_n_s16(col5, PASS1_BITS + 3)); + int8x16_t cols_23_s8 = vcombine_s8(vqshrn_n_s16(col2, PASS1_BITS + 3), + vqshrn_n_s16(col3, PASS1_BITS + 3)); + int8x16_t cols_67_s8 = vcombine_s8(vqshrn_n_s16(col6, PASS1_BITS + 3), + vqshrn_n_s16(col7, PASS1_BITS + 3)); + /* Clamp to range [0-255]. 
*/ + uint8x16_t cols_01 = + vreinterpretq_u8_s8 + (vaddq_s8(cols_01_s8, vreinterpretq_s8_u8(vdupq_n_u8(CENTERJSAMPLE)))); + uint8x16_t cols_45 = + vreinterpretq_u8_s8 + (vaddq_s8(cols_45_s8, vreinterpretq_s8_u8(vdupq_n_u8(CENTERJSAMPLE)))); + uint8x16_t cols_23 = + vreinterpretq_u8_s8 + (vaddq_s8(cols_23_s8, vreinterpretq_s8_u8(vdupq_n_u8(CENTERJSAMPLE)))); + uint8x16_t cols_67 = + vreinterpretq_u8_s8 + (vaddq_s8(cols_67_s8, vreinterpretq_s8_u8(vdupq_n_u8(CENTERJSAMPLE)))); + + /* Transpose block to prepare for store. */ + uint32x4x2_t cols_0415 = vzipq_u32(vreinterpretq_u32_u8(cols_01), + vreinterpretq_u32_u8(cols_45)); + uint32x4x2_t cols_2637 = vzipq_u32(vreinterpretq_u32_u8(cols_23), + vreinterpretq_u32_u8(cols_67)); + + uint8x16x2_t cols_0145 = vtrnq_u8(vreinterpretq_u8_u32(cols_0415.val[0]), + vreinterpretq_u8_u32(cols_0415.val[1])); + uint8x16x2_t cols_2367 = vtrnq_u8(vreinterpretq_u8_u32(cols_2637.val[0]), + vreinterpretq_u8_u32(cols_2637.val[1])); + uint16x8x2_t rows_0426 = vtrnq_u16(vreinterpretq_u16_u8(cols_0145.val[0]), + vreinterpretq_u16_u8(cols_2367.val[0])); + uint16x8x2_t rows_1537 = vtrnq_u16(vreinterpretq_u16_u8(cols_0145.val[1]), + vreinterpretq_u16_u8(cols_2367.val[1])); + + uint8x16_t rows_04 = vreinterpretq_u8_u16(rows_0426.val[0]); + uint8x16_t rows_15 = vreinterpretq_u8_u16(rows_1537.val[0]); + uint8x16_t rows_26 = vreinterpretq_u8_u16(rows_0426.val[1]); + uint8x16_t rows_37 = vreinterpretq_u8_u16(rows_1537.val[1]); + + JSAMPROW outptr0 = output_buf[0] + output_col; + JSAMPROW outptr1 = output_buf[1] + output_col; + JSAMPROW outptr2 = output_buf[2] + output_col; + JSAMPROW outptr3 = output_buf[3] + output_col; + JSAMPROW outptr4 = output_buf[4] + output_col; + JSAMPROW outptr5 = output_buf[5] + output_col; + JSAMPROW outptr6 = output_buf[6] + output_col; + JSAMPROW outptr7 = output_buf[7] + output_col; + + /* Store DCT block to memory. 
*/ + vst1q_lane_u64((uint64_t *)outptr0, vreinterpretq_u64_u8(rows_04), 0); + vst1q_lane_u64((uint64_t *)outptr1, vreinterpretq_u64_u8(rows_15), 0); + vst1q_lane_u64((uint64_t *)outptr2, vreinterpretq_u64_u8(rows_26), 0); + vst1q_lane_u64((uint64_t *)outptr3, vreinterpretq_u64_u8(rows_37), 0); + vst1q_lane_u64((uint64_t *)outptr4, vreinterpretq_u64_u8(rows_04), 1); + vst1q_lane_u64((uint64_t *)outptr5, vreinterpretq_u64_u8(rows_15), 1); + vst1q_lane_u64((uint64_t *)outptr6, vreinterpretq_u64_u8(rows_26), 1); + vst1q_lane_u64((uint64_t *)outptr7, vreinterpretq_u64_u8(rows_37), 1); +} diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jidctint-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jidctint-neon.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jidctint-neon.c 1970-01-01 01:00:00.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jidctint-neon.c 2021-11-20 03:41:33.400600418 +0000 @@ -0,0 +1,802 @@ +/* + * jidctint-neon.c - accurate integer IDCT (Arm Neon) + * + * Copyright (C) 2020, Arm Limited. All Rights Reserved. + * Copyright (C) 2020, D. R. Commander. All Rights Reserved. + * + * This software is provided 'as-is', without any express or implied + * warranty. In no event will the authors be held liable for any damages + * arising from the use of this software. + * + * Permission is granted to anyone to use this software for any purpose, + * including commercial applications, and to alter it and redistribute it + * freely, subject to the following restrictions: + * + * 1. The origin of this software must not be misrepresented; you must not + * claim that you wrote the original software. If you use this software + * in a product, an acknowledgment in the product documentation would be + * appreciated but is not required. + * 2. Altered source versions must be plainly marked as such, and must not be + * misrepresented as being the original software. + * 3. 
This notice may not be removed or altered from any source distribution. + */ + +#define JPEG_INTERNALS +#include "jconfigint.h" +#include "../../jinclude.h" +#include "../../jpeglib.h" +#include "../../jsimd.h" +#include "../../jdct.h" +#include "../../jsimddct.h" +#include "../jsimd.h" +#include "align.h" +#include "neon-compat.h" + +#include + + +#define CONST_BITS 13 +#define PASS1_BITS 2 + +#define DESCALE_P1 (CONST_BITS - PASS1_BITS) +#define DESCALE_P2 (CONST_BITS + PASS1_BITS + 3) + +/* The computation of the inverse DCT requires the use of constants known at + * compile time. Scaled integer constants are used to avoid floating-point + * arithmetic: + * 0.298631336 = 2446 * 2^-13 + * 0.390180644 = 3196 * 2^-13 + * 0.541196100 = 4433 * 2^-13 + * 0.765366865 = 6270 * 2^-13 + * 0.899976223 = 7373 * 2^-13 + * 1.175875602 = 9633 * 2^-13 + * 1.501321110 = 12299 * 2^-13 + * 1.847759065 = 15137 * 2^-13 + * 1.961570560 = 16069 * 2^-13 + * 2.053119869 = 16819 * 2^-13 + * 2.562915447 = 20995 * 2^-13 + * 3.072711026 = 25172 * 2^-13 + */ + +#define F_0_298 2446 +#define F_0_390 3196 +#define F_0_541 4433 +#define F_0_765 6270 +#define F_0_899 7373 +#define F_1_175 9633 +#define F_1_501 12299 +#define F_1_847 15137 +#define F_1_961 16069 +#define F_2_053 16819 +#define F_2_562 20995 +#define F_3_072 25172 + +#define F_1_175_MINUS_1_961 (F_1_175 - F_1_961) +#define F_1_175_MINUS_0_390 (F_1_175 - F_0_390) +#define F_0_541_MINUS_1_847 (F_0_541 - F_1_847) +#define F_3_072_MINUS_2_562 (F_3_072 - F_2_562) +#define F_0_298_MINUS_0_899 (F_0_298 - F_0_899) +#define F_1_501_MINUS_0_899 (F_1_501 - F_0_899) +#define F_2_053_MINUS_2_562 (F_2_053 - F_2_562) +#define F_0_541_PLUS_0_765 (F_0_541 + F_0_765) + + +ALIGN(16) static const int16_t jsimd_idct_islow_neon_consts[] = { + F_0_899, F_0_541, + F_2_562, F_0_298_MINUS_0_899, + F_1_501_MINUS_0_899, F_2_053_MINUS_2_562, + F_0_541_PLUS_0_765, F_1_175, + F_1_175_MINUS_0_390, F_0_541_MINUS_1_847, + F_3_072_MINUS_2_562, F_1_175_MINUS_1_961, 
+ 0, 0, 0, 0 +}; + + +/* Forward declaration of regular and sparse IDCT helper functions */ + +static INLINE void jsimd_idct_islow_pass1_regular(int16x4_t row0, + int16x4_t row1, + int16x4_t row2, + int16x4_t row3, + int16x4_t row4, + int16x4_t row5, + int16x4_t row6, + int16x4_t row7, + int16x4_t quant_row0, + int16x4_t quant_row1, + int16x4_t quant_row2, + int16x4_t quant_row3, + int16x4_t quant_row4, + int16x4_t quant_row5, + int16x4_t quant_row6, + int16x4_t quant_row7, + int16_t *workspace_1, + int16_t *workspace_2); + +static INLINE void jsimd_idct_islow_pass1_sparse(int16x4_t row0, + int16x4_t row1, + int16x4_t row2, + int16x4_t row3, + int16x4_t quant_row0, + int16x4_t quant_row1, + int16x4_t quant_row2, + int16x4_t quant_row3, + int16_t *workspace_1, + int16_t *workspace_2); + +static INLINE void jsimd_idct_islow_pass2_regular(int16_t *workspace, + JSAMPARRAY output_buf, + JDIMENSION output_col, + unsigned buf_offset); + +static INLINE void jsimd_idct_islow_pass2_sparse(int16_t *workspace, + JSAMPARRAY output_buf, + JDIMENSION output_col, + unsigned buf_offset); + + +/* Perform dequantization and inverse DCT on one block of coefficients. For + * reference, the C implementation (jpeg_idct_slow()) can be found in + * jidctint.c. + * + * Optimization techniques used for fast data access: + * + * In each pass, the inverse DCT is computed for the left and right 4x8 halves + * of the DCT block. This avoids spilling due to register pressure, and the + * increased granularity allows for an optimized calculation depending on the + * values of the DCT coefficients. Between passes, intermediate data is stored + * in 4x8 workspace buffers. + * + * Transposing the 8x8 DCT block after each pass can be achieved by transposing + * each of the four 4x4 quadrants and swapping quadrants 1 and 2 (refer to the + * diagram below.) Swapping quadrants is cheap, since the second pass can just + * swap the workspace buffer pointers. 
+ * + * +-------+-------+ +-------+-------+ + * | | | | | | + * | 0 | 1 | | 0 | 2 | + * | | | transpose | | | + * +-------+-------+ ------> +-------+-------+ + * | | | | | | + * | 2 | 3 | | 1 | 3 | + * | | | | | | + * +-------+-------+ +-------+-------+ + * + * Optimization techniques used to accelerate the inverse DCT calculation: + * + * In a DCT coefficient block, the coefficients are increasingly likely to be 0 + * as you move diagonally from top left to bottom right. If whole rows of + * coefficients are 0, then the inverse DCT calculation can be simplified. On + * the first pass of the inverse DCT, we test for three special cases before + * defaulting to a full "regular" inverse DCT: + * + * 1) Coefficients in rows 4-7 are all zero. In this case, we perform a + * "sparse" simplified inverse DCT on rows 0-3. + * 2) AC coefficients (rows 1-7) are all zero. In this case, the inverse DCT + * result is equal to the dequantized DC coefficients. + * 3) AC and DC coefficients are all zero. In this case, the inverse DCT + * result is all zero. For the left 4x8 half, this is handled identically + * to Case 2 above. For the right 4x8 half, we do no work and signal that + * the "sparse" algorithm is required for the second pass. + * + * In the second pass, only a single special case is tested: whether the AC and + * DC coefficients were all zero in the right 4x8 block during the first pass + * (refer to Case 3 above.) If this is the case, then a "sparse" variant of + * the second pass is performed for both the left and right halves of the DCT + * block. (The transposition after the first pass means that the right 4x8 + * block during the first pass becomes rows 4-7 during the second pass.) 
+ */ + +void jsimd_idct_islow_neon(void *dct_table, JCOEFPTR coef_block, + JSAMPARRAY output_buf, JDIMENSION output_col) +{ + ISLOW_MULT_TYPE *quantptr = dct_table; + + int16_t workspace_l[8 * DCTSIZE / 2]; + int16_t workspace_r[8 * DCTSIZE / 2]; + + /* Compute IDCT first pass on left 4x8 coefficient block. */ + + /* Load DCT coefficients in left 4x8 block. */ + int16x4_t row0 = vld1_s16(coef_block + 0 * DCTSIZE); + int16x4_t row1 = vld1_s16(coef_block + 1 * DCTSIZE); + int16x4_t row2 = vld1_s16(coef_block + 2 * DCTSIZE); + int16x4_t row3 = vld1_s16(coef_block + 3 * DCTSIZE); + int16x4_t row4 = vld1_s16(coef_block + 4 * DCTSIZE); + int16x4_t row5 = vld1_s16(coef_block + 5 * DCTSIZE); + int16x4_t row6 = vld1_s16(coef_block + 6 * DCTSIZE); + int16x4_t row7 = vld1_s16(coef_block + 7 * DCTSIZE); + + /* Load quantization table for left 4x8 block. */ + int16x4_t quant_row0 = vld1_s16(quantptr + 0 * DCTSIZE); + int16x4_t quant_row1 = vld1_s16(quantptr + 1 * DCTSIZE); + int16x4_t quant_row2 = vld1_s16(quantptr + 2 * DCTSIZE); + int16x4_t quant_row3 = vld1_s16(quantptr + 3 * DCTSIZE); + int16x4_t quant_row4 = vld1_s16(quantptr + 4 * DCTSIZE); + int16x4_t quant_row5 = vld1_s16(quantptr + 5 * DCTSIZE); + int16x4_t quant_row6 = vld1_s16(quantptr + 6 * DCTSIZE); + int16x4_t quant_row7 = vld1_s16(quantptr + 7 * DCTSIZE); + + /* Construct bitmap to test if DCT coefficients in left 4x8 block are 0. 
+   */
+  int16x4_t bitmap = vorr_s16(row7, row6);
+  bitmap = vorr_s16(bitmap, row5);
+  bitmap = vorr_s16(bitmap, row4);
+  int64_t bitmap_rows_4567 = vget_lane_s64(vreinterpret_s64_s16(bitmap), 0);
+
+  if (bitmap_rows_4567 == 0) {
+    bitmap = vorr_s16(bitmap, row3);
+    bitmap = vorr_s16(bitmap, row2);
+    bitmap = vorr_s16(bitmap, row1);
+    int64_t left_ac_bitmap = vget_lane_s64(vreinterpret_s64_s16(bitmap), 0);
+
+    if (left_ac_bitmap == 0) {
+      int16x4_t dcval = vshl_n_s16(vmul_s16(row0, quant_row0), PASS1_BITS);
+      int16x4x4_t quadrant = { { dcval, dcval, dcval, dcval } };
+      /* Store 4x4 blocks to workspace, transposing in the process. */
+      vst4_s16(workspace_l, quadrant);
+      vst4_s16(workspace_r, quadrant);
+    } else {
+      jsimd_idct_islow_pass1_sparse(row0, row1, row2, row3, quant_row0,
+                                    quant_row1, quant_row2, quant_row3,
+                                    workspace_l, workspace_r);
+    }
+  } else {
+    jsimd_idct_islow_pass1_regular(row0, row1, row2, row3, row4, row5,
+                                   row6, row7, quant_row0, quant_row1,
+                                   quant_row2, quant_row3, quant_row4,
+                                   quant_row5, quant_row6, quant_row7,
+                                   workspace_l, workspace_r);
+  }
+
+  /* Compute IDCT first pass on right 4x8 coefficient block. */
+
+  /* Load DCT coefficients in right 4x8 block. */
+  row0 = vld1_s16(coef_block + 0 * DCTSIZE + 4);
+  row1 = vld1_s16(coef_block + 1 * DCTSIZE + 4);
+  row2 = vld1_s16(coef_block + 2 * DCTSIZE + 4);
+  row3 = vld1_s16(coef_block + 3 * DCTSIZE + 4);
+  row4 = vld1_s16(coef_block + 4 * DCTSIZE + 4);
+  row5 = vld1_s16(coef_block + 5 * DCTSIZE + 4);
+  row6 = vld1_s16(coef_block + 6 * DCTSIZE + 4);
+  row7 = vld1_s16(coef_block + 7 * DCTSIZE + 4);
+
+  /* Load quantization table for right 4x8 block.
+   */
+  quant_row0 = vld1_s16(quantptr + 0 * DCTSIZE + 4);
+  quant_row1 = vld1_s16(quantptr + 1 * DCTSIZE + 4);
+  quant_row2 = vld1_s16(quantptr + 2 * DCTSIZE + 4);
+  quant_row3 = vld1_s16(quantptr + 3 * DCTSIZE + 4);
+  quant_row4 = vld1_s16(quantptr + 4 * DCTSIZE + 4);
+  quant_row5 = vld1_s16(quantptr + 5 * DCTSIZE + 4);
+  quant_row6 = vld1_s16(quantptr + 6 * DCTSIZE + 4);
+  quant_row7 = vld1_s16(quantptr + 7 * DCTSIZE + 4);
+
+  /* Construct bitmap to test if DCT coefficients in right 4x8 block are 0. */
+  bitmap = vorr_s16(row7, row6);
+  bitmap = vorr_s16(bitmap, row5);
+  bitmap = vorr_s16(bitmap, row4);
+  bitmap_rows_4567 = vget_lane_s64(vreinterpret_s64_s16(bitmap), 0);
+  bitmap = vorr_s16(bitmap, row3);
+  bitmap = vorr_s16(bitmap, row2);
+  bitmap = vorr_s16(bitmap, row1);
+  int64_t right_ac_bitmap = vget_lane_s64(vreinterpret_s64_s16(bitmap), 0);
+
+  /* If this remains non-zero, a "regular" second pass will be performed. */
+  int64_t right_ac_dc_bitmap = 1;
+
+  if (right_ac_bitmap == 0) {
+    bitmap = vorr_s16(bitmap, row0);
+    right_ac_dc_bitmap = vget_lane_s64(vreinterpret_s64_s16(bitmap), 0);
+
+    if (right_ac_dc_bitmap != 0) {
+      int16x4_t dcval = vshl_n_s16(vmul_s16(row0, quant_row0), PASS1_BITS);
+      int16x4x4_t quadrant = { { dcval, dcval, dcval, dcval } };
+      /* Store 4x4 blocks to workspace, transposing in the process. */
+      vst4_s16(workspace_l + 4 * DCTSIZE / 2, quadrant);
+      vst4_s16(workspace_r + 4 * DCTSIZE / 2, quadrant);
+    }
+  } else {
+    if (bitmap_rows_4567 == 0) {
+      jsimd_idct_islow_pass1_sparse(row0, row1, row2, row3, quant_row0,
+                                    quant_row1, quant_row2, quant_row3,
+                                    workspace_l + 4 * DCTSIZE / 2,
+                                    workspace_r + 4 * DCTSIZE / 2);
+    } else {
+      jsimd_idct_islow_pass1_regular(row0, row1, row2, row3, row4, row5,
+                                     row6, row7, quant_row0, quant_row1,
+                                     quant_row2, quant_row3, quant_row4,
+                                     quant_row5, quant_row6, quant_row7,
+                                     workspace_l + 4 * DCTSIZE / 2,
+                                     workspace_r + 4 * DCTSIZE / 2);
+    }
+  }
+
+  /* Second pass: compute IDCT on rows in workspace.
+   */
+
+  /* If all coefficients in right 4x8 block are 0, use "sparse" second pass. */
+  if (right_ac_dc_bitmap == 0) {
+    jsimd_idct_islow_pass2_sparse(workspace_l, output_buf, output_col, 0);
+    jsimd_idct_islow_pass2_sparse(workspace_r, output_buf, output_col, 4);
+  } else {
+    jsimd_idct_islow_pass2_regular(workspace_l, output_buf, output_col, 0);
+    jsimd_idct_islow_pass2_regular(workspace_r, output_buf, output_col, 4);
+  }
+}
+
+
+/* Perform dequantization and the first pass of the accurate inverse DCT on a
+ * 4x8 block of coefficients. (To process the full 8x8 DCT block, this
+ * function-- or some other optimized variant-- needs to be called for both
+ * the left and right 4x8 blocks.)
+ *
+ * This "regular" version assumes that no optimization can be made to the IDCT
+ * calculation, since no useful set of AC coefficients is all 0.
+ *
+ * The original C implementation of the accurate IDCT (jpeg_idct_slow()) can be
+ * found in jidctint.c. Algorithmic changes made here are documented inline.
+ */
+
+static INLINE void jsimd_idct_islow_pass1_regular(int16x4_t row0,
+                                                  int16x4_t row1,
+                                                  int16x4_t row2,
+                                                  int16x4_t row3,
+                                                  int16x4_t row4,
+                                                  int16x4_t row5,
+                                                  int16x4_t row6,
+                                                  int16x4_t row7,
+                                                  int16x4_t quant_row0,
+                                                  int16x4_t quant_row1,
+                                                  int16x4_t quant_row2,
+                                                  int16x4_t quant_row3,
+                                                  int16x4_t quant_row4,
+                                                  int16x4_t quant_row5,
+                                                  int16x4_t quant_row6,
+                                                  int16x4_t quant_row7,
+                                                  int16_t *workspace_1,
+                                                  int16_t *workspace_2)
+{
+  /* Load constants for IDCT computation.
+   */
+#ifdef HAVE_VLD1_S16_X3
+  const int16x4x3_t consts = vld1_s16_x3(jsimd_idct_islow_neon_consts);
+#else
+  const int16x4_t consts1 = vld1_s16(jsimd_idct_islow_neon_consts);
+  const int16x4_t consts2 = vld1_s16(jsimd_idct_islow_neon_consts + 4);
+  const int16x4_t consts3 = vld1_s16(jsimd_idct_islow_neon_consts + 8);
+  const int16x4x3_t consts = { { consts1, consts2, consts3 } };
+#endif
+
+  /* Even part */
+  int16x4_t z2_s16 = vmul_s16(row2, quant_row2);
+  int16x4_t z3_s16 = vmul_s16(row6, quant_row6);
+
+  int32x4_t tmp2 = vmull_lane_s16(z2_s16, consts.val[0], 1);
+  int32x4_t tmp3 = vmull_lane_s16(z2_s16, consts.val[1], 2);
+  tmp2 = vmlal_lane_s16(tmp2, z3_s16, consts.val[2], 1);
+  tmp3 = vmlal_lane_s16(tmp3, z3_s16, consts.val[0], 1);
+
+  z2_s16 = vmul_s16(row0, quant_row0);
+  z3_s16 = vmul_s16(row4, quant_row4);
+
+  int32x4_t tmp0 = vshll_n_s16(vadd_s16(z2_s16, z3_s16), CONST_BITS);
+  int32x4_t tmp1 = vshll_n_s16(vsub_s16(z2_s16, z3_s16), CONST_BITS);
+
+  int32x4_t tmp10 = vaddq_s32(tmp0, tmp3);
+  int32x4_t tmp13 = vsubq_s32(tmp0, tmp3);
+  int32x4_t tmp11 = vaddq_s32(tmp1, tmp2);
+  int32x4_t tmp12 = vsubq_s32(tmp1, tmp2);
+
+  /* Odd part */
+  int16x4_t tmp0_s16 = vmul_s16(row7, quant_row7);
+  int16x4_t tmp1_s16 = vmul_s16(row5, quant_row5);
+  int16x4_t tmp2_s16 = vmul_s16(row3, quant_row3);
+  int16x4_t tmp3_s16 = vmul_s16(row1, quant_row1);
+
+  z3_s16 = vadd_s16(tmp0_s16, tmp2_s16);
+  int16x4_t z4_s16 = vadd_s16(tmp1_s16, tmp3_s16);
+
+  /* Implementation as per jpeg_idct_islow() in jidctint.c:
+   *   z5 = (z3 + z4) * 1.175875602;
+   *   z3 = z3 * -1.961570560;  z4 = z4 * -0.390180644;
+   *   z3 += z5;  z4 += z5;
+   *
+   * This implementation:
+   *   z3 = z3 * (1.175875602 - 1.961570560) + z4 * 1.175875602;
+   *   z4 = z3 * 1.175875602 + z4 * (1.175875602 - 0.390180644);
+   */
+
+  int32x4_t z3 = vmull_lane_s16(z3_s16, consts.val[2], 3);
+  int32x4_t z4 = vmull_lane_s16(z3_s16, consts.val[1], 3);
+  z3 = vmlal_lane_s16(z3, z4_s16, consts.val[1], 3);
+  z4 = vmlal_lane_s16(z4, z4_s16,
+                      consts.val[2], 0);
+
+  /* Implementation as per jpeg_idct_islow() in jidctint.c:
+   *   z1 = tmp0 + tmp3;  z2 = tmp1 + tmp2;
+   *   tmp0 = tmp0 * 0.298631336;  tmp1 = tmp1 * 2.053119869;
+   *   tmp2 = tmp2 * 3.072711026;  tmp3 = tmp3 * 1.501321110;
+   *   z1 = z1 * -0.899976223;  z2 = z2 * -2.562915447;
+   *   tmp0 += z1 + z3;  tmp1 += z2 + z4;
+   *   tmp2 += z2 + z3;  tmp3 += z1 + z4;
+   *
+   * This implementation:
+   *   tmp0 = tmp0 * (0.298631336 - 0.899976223) + tmp3 * -0.899976223;
+   *   tmp1 = tmp1 * (2.053119869 - 2.562915447) + tmp2 * -2.562915447;
+   *   tmp2 = tmp1 * -2.562915447 + tmp2 * (3.072711026 - 2.562915447);
+   *   tmp3 = tmp0 * -0.899976223 + tmp3 * (1.501321110 - 0.899976223);
+   *   tmp0 += z3;  tmp1 += z4;
+   *   tmp2 += z3;  tmp3 += z4;
+   */
+
+  tmp0 = vmull_lane_s16(tmp0_s16, consts.val[0], 3);
+  tmp1 = vmull_lane_s16(tmp1_s16, consts.val[1], 1);
+  tmp2 = vmull_lane_s16(tmp2_s16, consts.val[2], 2);
+  tmp3 = vmull_lane_s16(tmp3_s16, consts.val[1], 0);
+
+  tmp0 = vmlsl_lane_s16(tmp0, tmp3_s16, consts.val[0], 0);
+  tmp1 = vmlsl_lane_s16(tmp1, tmp2_s16, consts.val[0], 2);
+  tmp2 = vmlsl_lane_s16(tmp2, tmp1_s16, consts.val[0], 2);
+  tmp3 = vmlsl_lane_s16(tmp3, tmp0_s16, consts.val[0], 0);
+
+  tmp0 = vaddq_s32(tmp0, z3);
+  tmp1 = vaddq_s32(tmp1, z4);
+  tmp2 = vaddq_s32(tmp2, z3);
+  tmp3 = vaddq_s32(tmp3, z4);
+
+  /* Final output stage: descale and narrow to 16-bit. */
+  int16x4x4_t rows_0123 = { {
+    vrshrn_n_s32(vaddq_s32(tmp10, tmp3), DESCALE_P1),
+    vrshrn_n_s32(vaddq_s32(tmp11, tmp2), DESCALE_P1),
+    vrshrn_n_s32(vaddq_s32(tmp12, tmp1), DESCALE_P1),
+    vrshrn_n_s32(vaddq_s32(tmp13, tmp0), DESCALE_P1)
+  } };
+  int16x4x4_t rows_4567 = { {
+    vrshrn_n_s32(vsubq_s32(tmp13, tmp0), DESCALE_P1),
+    vrshrn_n_s32(vsubq_s32(tmp12, tmp1), DESCALE_P1),
+    vrshrn_n_s32(vsubq_s32(tmp11, tmp2), DESCALE_P1),
+    vrshrn_n_s32(vsubq_s32(tmp10, tmp3), DESCALE_P1)
+  } };
+
+  /* Store 4x4 blocks to the intermediate workspace, ready for the second pass.
+   * (VST4 transposes the blocks.
+   * We need to operate on rows in the next pass.)
+   */
+  vst4_s16(workspace_1, rows_0123);
+  vst4_s16(workspace_2, rows_4567);
+}
+
+
+/* Perform dequantization and the first pass of the accurate inverse DCT on a
+ * 4x8 block of coefficients.
+ *
+ * This "sparse" version assumes that the AC coefficients in rows 4-7 are all
+ * 0. This simplifies the IDCT calculation, accelerating overall performance.
+ */
+
+static INLINE void jsimd_idct_islow_pass1_sparse(int16x4_t row0,
+                                                 int16x4_t row1,
+                                                 int16x4_t row2,
+                                                 int16x4_t row3,
+                                                 int16x4_t quant_row0,
+                                                 int16x4_t quant_row1,
+                                                 int16x4_t quant_row2,
+                                                 int16x4_t quant_row3,
+                                                 int16_t *workspace_1,
+                                                 int16_t *workspace_2)
+{
+  /* Load constants for IDCT computation. */
+#ifdef HAVE_VLD1_S16_X3
+  const int16x4x3_t consts = vld1_s16_x3(jsimd_idct_islow_neon_consts);
+#else
+  const int16x4_t consts1 = vld1_s16(jsimd_idct_islow_neon_consts);
+  const int16x4_t consts2 = vld1_s16(jsimd_idct_islow_neon_consts + 4);
+  const int16x4_t consts3 = vld1_s16(jsimd_idct_islow_neon_consts + 8);
+  const int16x4x3_t consts = { { consts1, consts2, consts3 } };
+#endif
+
+  /* Even part (z3 is all 0) */
+  int16x4_t z2_s16 = vmul_s16(row2, quant_row2);
+
+  int32x4_t tmp2 = vmull_lane_s16(z2_s16, consts.val[0], 1);
+  int32x4_t tmp3 = vmull_lane_s16(z2_s16, consts.val[1], 2);
+
+  z2_s16 = vmul_s16(row0, quant_row0);
+  int32x4_t tmp0 = vshll_n_s16(z2_s16, CONST_BITS);
+  int32x4_t tmp1 = vshll_n_s16(z2_s16, CONST_BITS);
+
+  int32x4_t tmp10 = vaddq_s32(tmp0, tmp3);
+  int32x4_t tmp13 = vsubq_s32(tmp0, tmp3);
+  int32x4_t tmp11 = vaddq_s32(tmp1, tmp2);
+  int32x4_t tmp12 = vsubq_s32(tmp1, tmp2);
+
+  /* Odd part (tmp0 and tmp1 are both all 0) */
+  int16x4_t tmp2_s16 = vmul_s16(row3, quant_row3);
+  int16x4_t tmp3_s16 = vmul_s16(row1, quant_row1);
+
+  int16x4_t z3_s16 = tmp2_s16;
+  int16x4_t z4_s16 = tmp3_s16;
+
+  int32x4_t z3 = vmull_lane_s16(z3_s16, consts.val[2], 3);
+  int32x4_t z4 = vmull_lane_s16(z3_s16, consts.val[1], 3);
+  z3 =
+    vmlal_lane_s16(z3, z4_s16, consts.val[1], 3);
+  z4 = vmlal_lane_s16(z4, z4_s16, consts.val[2], 0);
+
+  tmp0 = vmlsl_lane_s16(z3, tmp3_s16, consts.val[0], 0);
+  tmp1 = vmlsl_lane_s16(z4, tmp2_s16, consts.val[0], 2);
+  tmp2 = vmlal_lane_s16(z3, tmp2_s16, consts.val[2], 2);
+  tmp3 = vmlal_lane_s16(z4, tmp3_s16, consts.val[1], 0);
+
+  /* Final output stage: descale and narrow to 16-bit. */
+  int16x4x4_t rows_0123 = { {
+    vrshrn_n_s32(vaddq_s32(tmp10, tmp3), DESCALE_P1),
+    vrshrn_n_s32(vaddq_s32(tmp11, tmp2), DESCALE_P1),
+    vrshrn_n_s32(vaddq_s32(tmp12, tmp1), DESCALE_P1),
+    vrshrn_n_s32(vaddq_s32(tmp13, tmp0), DESCALE_P1)
+  } };
+  int16x4x4_t rows_4567 = { {
+    vrshrn_n_s32(vsubq_s32(tmp13, tmp0), DESCALE_P1),
+    vrshrn_n_s32(vsubq_s32(tmp12, tmp1), DESCALE_P1),
+    vrshrn_n_s32(vsubq_s32(tmp11, tmp2), DESCALE_P1),
+    vrshrn_n_s32(vsubq_s32(tmp10, tmp3), DESCALE_P1)
+  } };
+
+  /* Store 4x4 blocks to the intermediate workspace, ready for the second pass.
+   * (VST4 transposes the blocks. We need to operate on rows in the next
+   * pass.)
+   */
+  vst4_s16(workspace_1, rows_0123);
+  vst4_s16(workspace_2, rows_4567);
+}
+
+
+/* Perform the second pass of the accurate inverse DCT on a 4x8 block of
+ * coefficients. (To process the full 8x8 DCT block, this function-- or some
+ * other optimized variant-- needs to be called for both the right and left 4x8
+ * blocks.)
+ *
+ * This "regular" version assumes that no optimization can be made to the IDCT
+ * calculation, since no useful set of coefficient values is all 0 after the
+ * first pass.
+ *
+ * Again, the original C implementation of the accurate IDCT (jpeg_idct_slow())
+ * can be found in jidctint.c. Algorithmic changes made here are documented
+ * inline.
+ */
+
+static INLINE void jsimd_idct_islow_pass2_regular(int16_t *workspace,
+                                                  JSAMPARRAY output_buf,
+                                                  JDIMENSION output_col,
+                                                  unsigned buf_offset)
+{
+  /* Load constants for IDCT computation.
+   */
+#ifdef HAVE_VLD1_S16_X3
+  const int16x4x3_t consts = vld1_s16_x3(jsimd_idct_islow_neon_consts);
+#else
+  const int16x4_t consts1 = vld1_s16(jsimd_idct_islow_neon_consts);
+  const int16x4_t consts2 = vld1_s16(jsimd_idct_islow_neon_consts + 4);
+  const int16x4_t consts3 = vld1_s16(jsimd_idct_islow_neon_consts + 8);
+  const int16x4x3_t consts = { { consts1, consts2, consts3 } };
+#endif
+
+  /* Even part */
+  int16x4_t z2_s16 = vld1_s16(workspace + 2 * DCTSIZE / 2);
+  int16x4_t z3_s16 = vld1_s16(workspace + 6 * DCTSIZE / 2);
+
+  int32x4_t tmp2 = vmull_lane_s16(z2_s16, consts.val[0], 1);
+  int32x4_t tmp3 = vmull_lane_s16(z2_s16, consts.val[1], 2);
+  tmp2 = vmlal_lane_s16(tmp2, z3_s16, consts.val[2], 1);
+  tmp3 = vmlal_lane_s16(tmp3, z3_s16, consts.val[0], 1);
+
+  z2_s16 = vld1_s16(workspace + 0 * DCTSIZE / 2);
+  z3_s16 = vld1_s16(workspace + 4 * DCTSIZE / 2);
+
+  int32x4_t tmp0 = vshll_n_s16(vadd_s16(z2_s16, z3_s16), CONST_BITS);
+  int32x4_t tmp1 = vshll_n_s16(vsub_s16(z2_s16, z3_s16), CONST_BITS);
+
+  int32x4_t tmp10 = vaddq_s32(tmp0, tmp3);
+  int32x4_t tmp13 = vsubq_s32(tmp0, tmp3);
+  int32x4_t tmp11 = vaddq_s32(tmp1, tmp2);
+  int32x4_t tmp12 = vsubq_s32(tmp1, tmp2);
+
+  /* Odd part */
+  int16x4_t tmp0_s16 = vld1_s16(workspace + 7 * DCTSIZE / 2);
+  int16x4_t tmp1_s16 = vld1_s16(workspace + 5 * DCTSIZE / 2);
+  int16x4_t tmp2_s16 = vld1_s16(workspace + 3 * DCTSIZE / 2);
+  int16x4_t tmp3_s16 = vld1_s16(workspace + 1 * DCTSIZE / 2);
+
+  z3_s16 = vadd_s16(tmp0_s16, tmp2_s16);
+  int16x4_t z4_s16 = vadd_s16(tmp1_s16, tmp3_s16);
+
+  /* Implementation as per jpeg_idct_islow() in jidctint.c:
+   *   z5 = (z3 + z4) * 1.175875602;
+   *   z3 = z3 * -1.961570560;  z4 = z4 * -0.390180644;
+   *   z3 += z5;  z4 += z5;
+   *
+   * This implementation:
+   *   z3 = z3 * (1.175875602 - 1.961570560) + z4 * 1.175875602;
+   *   z4 = z3 * 1.175875602 + z4 * (1.175875602 - 0.390180644);
+   */
+
+  int32x4_t z3 = vmull_lane_s16(z3_s16, consts.val[2], 3);
+  int32x4_t z4 = vmull_lane_s16(z3_s16, consts.val[1], 3);
+  z3 = vmlal_lane_s16(z3, z4_s16, consts.val[1], 3);
+  z4 = vmlal_lane_s16(z4, z4_s16, consts.val[2], 0);
+
+  /* Implementation as per jpeg_idct_islow() in jidctint.c:
+   *   z1 = tmp0 + tmp3;  z2 = tmp1 + tmp2;
+   *   tmp0 = tmp0 * 0.298631336;  tmp1 = tmp1 * 2.053119869;
+   *   tmp2 = tmp2 * 3.072711026;  tmp3 = tmp3 * 1.501321110;
+   *   z1 = z1 * -0.899976223;  z2 = z2 * -2.562915447;
+   *   tmp0 += z1 + z3;  tmp1 += z2 + z4;
+   *   tmp2 += z2 + z3;  tmp3 += z1 + z4;
+   *
+   * This implementation:
+   *   tmp0 = tmp0 * (0.298631336 - 0.899976223) + tmp3 * -0.899976223;
+   *   tmp1 = tmp1 * (2.053119869 - 2.562915447) + tmp2 * -2.562915447;
+   *   tmp2 = tmp1 * -2.562915447 + tmp2 * (3.072711026 - 2.562915447);
+   *   tmp3 = tmp0 * -0.899976223 + tmp3 * (1.501321110 - 0.899976223);
+   *   tmp0 += z3;  tmp1 += z4;
+   *   tmp2 += z3;  tmp3 += z4;
+   */
+
+  tmp0 = vmull_lane_s16(tmp0_s16, consts.val[0], 3);
+  tmp1 = vmull_lane_s16(tmp1_s16, consts.val[1], 1);
+  tmp2 = vmull_lane_s16(tmp2_s16, consts.val[2], 2);
+  tmp3 = vmull_lane_s16(tmp3_s16, consts.val[1], 0);
+
+  tmp0 = vmlsl_lane_s16(tmp0, tmp3_s16, consts.val[0], 0);
+  tmp1 = vmlsl_lane_s16(tmp1, tmp2_s16, consts.val[0], 2);
+  tmp2 = vmlsl_lane_s16(tmp2, tmp1_s16, consts.val[0], 2);
+  tmp3 = vmlsl_lane_s16(tmp3, tmp0_s16, consts.val[0], 0);
+
+  tmp0 = vaddq_s32(tmp0, z3);
+  tmp1 = vaddq_s32(tmp1, z4);
+  tmp2 = vaddq_s32(tmp2, z3);
+  tmp3 = vaddq_s32(tmp3, z4);
+
+  /* Final output stage: descale and narrow to 16-bit. */
+  int16x8_t cols_02_s16 = vcombine_s16(vaddhn_s32(tmp10, tmp3),
+                                       vaddhn_s32(tmp12, tmp1));
+  int16x8_t cols_13_s16 = vcombine_s16(vaddhn_s32(tmp11, tmp2),
+                                       vaddhn_s32(tmp13, tmp0));
+  int16x8_t cols_46_s16 = vcombine_s16(vsubhn_s32(tmp13, tmp0),
+                                       vsubhn_s32(tmp11, tmp2));
+  int16x8_t cols_57_s16 = vcombine_s16(vsubhn_s32(tmp12, tmp1),
+                                       vsubhn_s32(tmp10, tmp3));
+  /* Descale and narrow to 8-bit.
+   */
+  int8x8_t cols_02_s8 = vqrshrn_n_s16(cols_02_s16, DESCALE_P2 - 16);
+  int8x8_t cols_13_s8 = vqrshrn_n_s16(cols_13_s16, DESCALE_P2 - 16);
+  int8x8_t cols_46_s8 = vqrshrn_n_s16(cols_46_s16, DESCALE_P2 - 16);
+  int8x8_t cols_57_s8 = vqrshrn_n_s16(cols_57_s16, DESCALE_P2 - 16);
+  /* Clamp to range [0-255]. */
+  uint8x8_t cols_02_u8 = vadd_u8(vreinterpret_u8_s8(cols_02_s8),
+                                 vdup_n_u8(CENTERJSAMPLE));
+  uint8x8_t cols_13_u8 = vadd_u8(vreinterpret_u8_s8(cols_13_s8),
+                                 vdup_n_u8(CENTERJSAMPLE));
+  uint8x8_t cols_46_u8 = vadd_u8(vreinterpret_u8_s8(cols_46_s8),
+                                 vdup_n_u8(CENTERJSAMPLE));
+  uint8x8_t cols_57_u8 = vadd_u8(vreinterpret_u8_s8(cols_57_s8),
+                                 vdup_n_u8(CENTERJSAMPLE));
+
+  /* Transpose 4x8 block and store to memory. (Zipping adjacent columns
+   * together allows us to store 16-bit elements.)
+   */
+  uint8x8x2_t cols_01_23 = vzip_u8(cols_02_u8, cols_13_u8);
+  uint8x8x2_t cols_45_67 = vzip_u8(cols_46_u8, cols_57_u8);
+  uint16x4x4_t cols_01_23_45_67 = { {
+    vreinterpret_u16_u8(cols_01_23.val[0]),
+    vreinterpret_u16_u8(cols_01_23.val[1]),
+    vreinterpret_u16_u8(cols_45_67.val[0]),
+    vreinterpret_u16_u8(cols_45_67.val[1])
+  } };
+
+  JSAMPROW outptr0 = output_buf[buf_offset + 0] + output_col;
+  JSAMPROW outptr1 = output_buf[buf_offset + 1] + output_col;
+  JSAMPROW outptr2 = output_buf[buf_offset + 2] + output_col;
+  JSAMPROW outptr3 = output_buf[buf_offset + 3] + output_col;
+  /* VST4 of 16-bit elements completes the transpose. */
+  vst4_lane_u16((uint16_t *)outptr0, cols_01_23_45_67, 0);
+  vst4_lane_u16((uint16_t *)outptr1, cols_01_23_45_67, 1);
+  vst4_lane_u16((uint16_t *)outptr2, cols_01_23_45_67, 2);
+  vst4_lane_u16((uint16_t *)outptr3, cols_01_23_45_67, 3);
+}
+
+
+/* Perform the second pass of the accurate inverse DCT on a 4x8 block
+ * of coefficients.
+ *
+ * This "sparse" version assumes that the coefficient values (after the first
+ * pass) in rows 4-7 are all 0. This simplifies the IDCT calculation,
+ * accelerating overall performance.
+ */
+
+static INLINE void jsimd_idct_islow_pass2_sparse(int16_t *workspace,
+                                                 JSAMPARRAY output_buf,
+                                                 JDIMENSION output_col,
+                                                 unsigned buf_offset)
+{
+  /* Load constants for IDCT computation. */
+#ifdef HAVE_VLD1_S16_X3
+  const int16x4x3_t consts = vld1_s16_x3(jsimd_idct_islow_neon_consts);
+#else
+  const int16x4_t consts1 = vld1_s16(jsimd_idct_islow_neon_consts);
+  const int16x4_t consts2 = vld1_s16(jsimd_idct_islow_neon_consts + 4);
+  const int16x4_t consts3 = vld1_s16(jsimd_idct_islow_neon_consts + 8);
+  const int16x4x3_t consts = { { consts1, consts2, consts3 } };
+#endif
+
+  /* Even part (z3 is all 0) */
+  int16x4_t z2_s16 = vld1_s16(workspace + 2 * DCTSIZE / 2);
+
+  int32x4_t tmp2 = vmull_lane_s16(z2_s16, consts.val[0], 1);
+  int32x4_t tmp3 = vmull_lane_s16(z2_s16, consts.val[1], 2);
+
+  z2_s16 = vld1_s16(workspace + 0 * DCTSIZE / 2);
+  int32x4_t tmp0 = vshll_n_s16(z2_s16, CONST_BITS);
+  int32x4_t tmp1 = vshll_n_s16(z2_s16, CONST_BITS);
+
+  int32x4_t tmp10 = vaddq_s32(tmp0, tmp3);
+  int32x4_t tmp13 = vsubq_s32(tmp0, tmp3);
+  int32x4_t tmp11 = vaddq_s32(tmp1, tmp2);
+  int32x4_t tmp12 = vsubq_s32(tmp1, tmp2);
+
+  /* Odd part (tmp0 and tmp1 are both all 0) */
+  int16x4_t tmp2_s16 = vld1_s16(workspace + 3 * DCTSIZE / 2);
+  int16x4_t tmp3_s16 = vld1_s16(workspace + 1 * DCTSIZE / 2);
+
+  int16x4_t z3_s16 = tmp2_s16;
+  int16x4_t z4_s16 = tmp3_s16;
+
+  int32x4_t z3 = vmull_lane_s16(z3_s16, consts.val[2], 3);
+  z3 = vmlal_lane_s16(z3, z4_s16, consts.val[1], 3);
+  int32x4_t z4 = vmull_lane_s16(z3_s16, consts.val[1], 3);
+  z4 = vmlal_lane_s16(z4, z4_s16, consts.val[2], 0);
+
+  tmp0 = vmlsl_lane_s16(z3, tmp3_s16, consts.val[0], 0);
+  tmp1 = vmlsl_lane_s16(z4, tmp2_s16, consts.val[0], 2);
+  tmp2 = vmlal_lane_s16(z3, tmp2_s16, consts.val[2], 2);
+  tmp3 = vmlal_lane_s16(z4, tmp3_s16, consts.val[1], 0);
+
+  /* Final output stage: descale and narrow to 16-bit.
+   */
+  int16x8_t cols_02_s16 = vcombine_s16(vaddhn_s32(tmp10, tmp3),
+                                       vaddhn_s32(tmp12, tmp1));
+  int16x8_t cols_13_s16 = vcombine_s16(vaddhn_s32(tmp11, tmp2),
+                                       vaddhn_s32(tmp13, tmp0));
+  int16x8_t cols_46_s16 = vcombine_s16(vsubhn_s32(tmp13, tmp0),
+                                       vsubhn_s32(tmp11, tmp2));
+  int16x8_t cols_57_s16 = vcombine_s16(vsubhn_s32(tmp12, tmp1),
+                                       vsubhn_s32(tmp10, tmp3));
+  /* Descale and narrow to 8-bit. */
+  int8x8_t cols_02_s8 = vqrshrn_n_s16(cols_02_s16, DESCALE_P2 - 16);
+  int8x8_t cols_13_s8 = vqrshrn_n_s16(cols_13_s16, DESCALE_P2 - 16);
+  int8x8_t cols_46_s8 = vqrshrn_n_s16(cols_46_s16, DESCALE_P2 - 16);
+  int8x8_t cols_57_s8 = vqrshrn_n_s16(cols_57_s16, DESCALE_P2 - 16);
+  /* Clamp to range [0-255]. */
+  uint8x8_t cols_02_u8 = vadd_u8(vreinterpret_u8_s8(cols_02_s8),
+                                 vdup_n_u8(CENTERJSAMPLE));
+  uint8x8_t cols_13_u8 = vadd_u8(vreinterpret_u8_s8(cols_13_s8),
+                                 vdup_n_u8(CENTERJSAMPLE));
+  uint8x8_t cols_46_u8 = vadd_u8(vreinterpret_u8_s8(cols_46_s8),
+                                 vdup_n_u8(CENTERJSAMPLE));
+  uint8x8_t cols_57_u8 = vadd_u8(vreinterpret_u8_s8(cols_57_s8),
+                                 vdup_n_u8(CENTERJSAMPLE));
+
+  /* Transpose 4x8 block and store to memory. (Zipping adjacent columns
+   * together allows us to store 16-bit elements.)
+   */
+  uint8x8x2_t cols_01_23 = vzip_u8(cols_02_u8, cols_13_u8);
+  uint8x8x2_t cols_45_67 = vzip_u8(cols_46_u8, cols_57_u8);
+  uint16x4x4_t cols_01_23_45_67 = { {
+    vreinterpret_u16_u8(cols_01_23.val[0]),
+    vreinterpret_u16_u8(cols_01_23.val[1]),
+    vreinterpret_u16_u8(cols_45_67.val[0]),
+    vreinterpret_u16_u8(cols_45_67.val[1])
+  } };
+
+  JSAMPROW outptr0 = output_buf[buf_offset + 0] + output_col;
+  JSAMPROW outptr1 = output_buf[buf_offset + 1] + output_col;
+  JSAMPROW outptr2 = output_buf[buf_offset + 2] + output_col;
+  JSAMPROW outptr3 = output_buf[buf_offset + 3] + output_col;
+  /* VST4 of 16-bit elements completes the transpose.
+   */
+  vst4_lane_u16((uint16_t *)outptr0, cols_01_23_45_67, 0);
+  vst4_lane_u16((uint16_t *)outptr1, cols_01_23_45_67, 1);
+  vst4_lane_u16((uint16_t *)outptr2, cols_01_23_45_67, 2);
+  vst4_lane_u16((uint16_t *)outptr3, cols_01_23_45_67, 3);
+}
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jidctred-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jidctred-neon.c
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jidctred-neon.c	1970-01-01 01:00:00.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jidctred-neon.c	2021-11-20 03:41:33.401600402 +0000
@@ -0,0 +1,486 @@
+/*
+ * jidctred-neon.c - reduced-size IDCT (Arm Neon)
+ *
+ * Copyright (C) 2020, Arm Limited.  All Rights Reserved.
+ * Copyright (C) 2020, D. R. Commander.  All Rights Reserved.
+ *
+ * This software is provided 'as-is', without any express or implied
+ * warranty.  In no event will the authors be held liable for any damages
+ * arising from the use of this software.
+ *
+ * Permission is granted to anyone to use this software for any purpose,
+ * including commercial applications, and to alter it and redistribute it
+ * freely, subject to the following restrictions:
+ *
+ * 1. The origin of this software must not be misrepresented; you must not
+ *    claim that you wrote the original software. If you use this software
+ *    in a product, an acknowledgment in the product documentation would be
+ *    appreciated but is not required.
+ * 2. Altered source versions must be plainly marked as such, and must not be
+ *    misrepresented as being the original software.
+ * 3. This notice may not be removed or altered from any source distribution.
+ */
+
+#define JPEG_INTERNALS
+#include "../../jinclude.h"
+#include "../../jpeglib.h"
+#include "../../jsimd.h"
+#include "../../jdct.h"
+#include "../../jsimddct.h"
+#include "../jsimd.h"
+#include "align.h"
+#include "neon-compat.h"
+
+#include <arm_neon.h>
+
+
+#define CONST_BITS 13
+#define PASS1_BITS 2
+
+#define F_0_211 1730
+#define F_0_509 4176
+#define F_0_601 4926
+#define F_0_720 5906
+#define F_0_765 6270
+#define F_0_850 6967
+#define F_0_899 7373
+#define F_1_061 8697
+#define F_1_272 10426
+#define F_1_451 11893
+#define F_1_847 15137
+#define F_2_172 17799
+#define F_2_562 20995
+#define F_3_624 29692
+
+
+/* jsimd_idct_2x2_neon() is an inverse DCT function that produces reduced-size
+ * 2x2 output from an 8x8 DCT block.  It uses the same calculations and
+ * produces exactly the same output as IJG's original jpeg_idct_2x2() function
+ * from jpeg-6b, which can be found in jidctred.c.
+ *
+ * Scaled integer constants are used to avoid floating-point arithmetic:
+ *    0.720959822 = 5906 * 2^-13
+ *    0.850430095 = 6967 * 2^-13
+ *    1.272758580 = 10426 * 2^-13
+ *    3.624509785 = 29692 * 2^-13
+ *
+ * See jidctred.c for further details of the 2x2 IDCT algorithm.  Where
+ * possible, the variable names and comments here in jsimd_idct_2x2_neon()
+ * match up with those in jpeg_idct_2x2().
+ */
+
+ALIGN(16) static const int16_t jsimd_idct_2x2_neon_consts[] = {
+  -F_0_720, F_0_850, -F_1_272, F_3_624
+};
+
+void jsimd_idct_2x2_neon(void *dct_table, JCOEFPTR coef_block,
+                         JSAMPARRAY output_buf, JDIMENSION output_col)
+{
+  ISLOW_MULT_TYPE *quantptr = dct_table;
+
+  /* Load DCT coefficients. */
+  int16x8_t row0 = vld1q_s16(coef_block + 0 * DCTSIZE);
+  int16x8_t row1 = vld1q_s16(coef_block + 1 * DCTSIZE);
+  int16x8_t row3 = vld1q_s16(coef_block + 3 * DCTSIZE);
+  int16x8_t row5 = vld1q_s16(coef_block + 5 * DCTSIZE);
+  int16x8_t row7 = vld1q_s16(coef_block + 7 * DCTSIZE);
+
+  /* Load quantization table values.
+   */
+  int16x8_t quant_row0 = vld1q_s16(quantptr + 0 * DCTSIZE);
+  int16x8_t quant_row1 = vld1q_s16(quantptr + 1 * DCTSIZE);
+  int16x8_t quant_row3 = vld1q_s16(quantptr + 3 * DCTSIZE);
+  int16x8_t quant_row5 = vld1q_s16(quantptr + 5 * DCTSIZE);
+  int16x8_t quant_row7 = vld1q_s16(quantptr + 7 * DCTSIZE);
+
+  /* Dequantize DCT coefficients. */
+  row0 = vmulq_s16(row0, quant_row0);
+  row1 = vmulq_s16(row1, quant_row1);
+  row3 = vmulq_s16(row3, quant_row3);
+  row5 = vmulq_s16(row5, quant_row5);
+  row7 = vmulq_s16(row7, quant_row7);
+
+  /* Load IDCT conversion constants. */
+  const int16x4_t consts = vld1_s16(jsimd_idct_2x2_neon_consts);
+
+  /* Pass 1: process columns from input, put results in vectors row0 and
+   * row1.
+   */
+
+  /* Even part */
+  int32x4_t tmp10_l = vshll_n_s16(vget_low_s16(row0), CONST_BITS + 2);
+  int32x4_t tmp10_h = vshll_n_s16(vget_high_s16(row0), CONST_BITS + 2);
+
+  /* Odd part */
+  int32x4_t tmp0_l = vmull_lane_s16(vget_low_s16(row1), consts, 3);
+  tmp0_l = vmlal_lane_s16(tmp0_l, vget_low_s16(row3), consts, 2);
+  tmp0_l = vmlal_lane_s16(tmp0_l, vget_low_s16(row5), consts, 1);
+  tmp0_l = vmlal_lane_s16(tmp0_l, vget_low_s16(row7), consts, 0);
+  int32x4_t tmp0_h = vmull_lane_s16(vget_high_s16(row1), consts, 3);
+  tmp0_h = vmlal_lane_s16(tmp0_h, vget_high_s16(row3), consts, 2);
+  tmp0_h = vmlal_lane_s16(tmp0_h, vget_high_s16(row5), consts, 1);
+  tmp0_h = vmlal_lane_s16(tmp0_h, vget_high_s16(row7), consts, 0);
+
+  /* Final output stage: descale and narrow to 16-bit. */
+  row0 = vcombine_s16(vrshrn_n_s32(vaddq_s32(tmp10_l, tmp0_l), CONST_BITS),
+                      vrshrn_n_s32(vaddq_s32(tmp10_h, tmp0_h), CONST_BITS));
+  row1 = vcombine_s16(vrshrn_n_s32(vsubq_s32(tmp10_l, tmp0_l), CONST_BITS),
+                      vrshrn_n_s32(vsubq_s32(tmp10_h, tmp0_h), CONST_BITS));
+
+  /* Transpose two rows, ready for second pass.
+   */
+  int16x8x2_t cols_0246_1357 = vtrnq_s16(row0, row1);
+  int16x8_t cols_0246 = cols_0246_1357.val[0];
+  int16x8_t cols_1357 = cols_0246_1357.val[1];
+  /* Duplicate columns such that each is accessible in its own vector. */
+  int32x4x2_t cols_1155_3377 = vtrnq_s32(vreinterpretq_s32_s16(cols_1357),
+                                         vreinterpretq_s32_s16(cols_1357));
+  int16x8_t cols_1155 = vreinterpretq_s16_s32(cols_1155_3377.val[0]);
+  int16x8_t cols_3377 = vreinterpretq_s16_s32(cols_1155_3377.val[1]);
+
+  /* Pass 2: process two rows, store to output array. */
+
+  /* Even part: we're only interested in col0; the top half of tmp10 is "don't
+   * care."
+   */
+  int32x4_t tmp10 = vshll_n_s16(vget_low_s16(cols_0246), CONST_BITS + 2);
+
+  /* Odd part: we're only interested in the bottom half of tmp0. */
+  int32x4_t tmp0 = vmull_lane_s16(vget_low_s16(cols_1155), consts, 3);
+  tmp0 = vmlal_lane_s16(tmp0, vget_low_s16(cols_3377), consts, 2);
+  tmp0 = vmlal_lane_s16(tmp0, vget_high_s16(cols_1155), consts, 1);
+  tmp0 = vmlal_lane_s16(tmp0, vget_high_s16(cols_3377), consts, 0);
+
+  /* Final output stage: descale and clamp to range [0-255]. */
+  int16x8_t output_s16 = vcombine_s16(vaddhn_s32(tmp10, tmp0),
+                                      vsubhn_s32(tmp10, tmp0));
+  output_s16 = vrsraq_n_s16(vdupq_n_s16(CENTERJSAMPLE), output_s16,
+                            CONST_BITS + PASS1_BITS + 3 + 2 - 16);
+  /* Narrow to 8-bit and convert to unsigned. */
+  uint8x8_t output_u8 = vqmovun_s16(output_s16);
+
+  /* Store 2x2 block to memory. */
+  vst1_lane_u8(output_buf[0] + output_col, output_u8, 0);
+  vst1_lane_u8(output_buf[1] + output_col, output_u8, 1);
+  vst1_lane_u8(output_buf[0] + output_col + 1, output_u8, 4);
+  vst1_lane_u8(output_buf[1] + output_col + 1, output_u8, 5);
+}
+
+
+/* jsimd_idct_4x4_neon() is an inverse DCT function that produces reduced-size
+ * 4x4 output from an 8x8 DCT block.  It uses the same calculations and
+ * produces exactly the same output as IJG's original jpeg_idct_4x4() function
+ * from jpeg-6b, which can be found in jidctred.c.
+ *
+ * Scaled integer constants are used to avoid floating-point arithmetic:
+ *    0.211164243 = 1730 * 2^-13
+ *    0.509795579 = 4176 * 2^-13
+ *    0.601344887 = 4926 * 2^-13
+ *    0.765366865 = 6270 * 2^-13
+ *    0.899976223 = 7373 * 2^-13
+ *    1.061594337 = 8697 * 2^-13
+ *    1.451774981 = 11893 * 2^-13
+ *    1.847759065 = 15137 * 2^-13
+ *    2.172734803 = 17799 * 2^-13
+ *    2.562915447 = 20995 * 2^-13
+ *
+ * See jidctred.c for further details of the 4x4 IDCT algorithm.  Where
+ * possible, the variable names and comments here in jsimd_idct_4x4_neon()
+ * match up with those in jpeg_idct_4x4().
+ */
+
+ALIGN(16) static const int16_t jsimd_idct_4x4_neon_consts[] = {
+  F_1_847, -F_0_765, -F_0_211, F_1_451,
+  -F_2_172, F_1_061, -F_0_509, -F_0_601,
+  F_0_899, F_2_562, 0, 0
+};
+
+void jsimd_idct_4x4_neon(void *dct_table, JCOEFPTR coef_block,
+                         JSAMPARRAY output_buf, JDIMENSION output_col)
+{
+  ISLOW_MULT_TYPE *quantptr = dct_table;
+
+  /* Load DCT coefficients. */
+  int16x8_t row0 = vld1q_s16(coef_block + 0 * DCTSIZE);
+  int16x8_t row1 = vld1q_s16(coef_block + 1 * DCTSIZE);
+  int16x8_t row2 = vld1q_s16(coef_block + 2 * DCTSIZE);
+  int16x8_t row3 = vld1q_s16(coef_block + 3 * DCTSIZE);
+  int16x8_t row5 = vld1q_s16(coef_block + 5 * DCTSIZE);
+  int16x8_t row6 = vld1q_s16(coef_block + 6 * DCTSIZE);
+  int16x8_t row7 = vld1q_s16(coef_block + 7 * DCTSIZE);
+
+  /* Load quantization table values for DC coefficients. */
+  int16x8_t quant_row0 = vld1q_s16(quantptr + 0 * DCTSIZE);
+  /* Dequantize DC coefficients. */
+  row0 = vmulq_s16(row0, quant_row0);
+
+  /* Construct bitmap to test if all AC coefficients are 0.
+   */
+  int16x8_t bitmap = vorrq_s16(row1, row2);
+  bitmap = vorrq_s16(bitmap, row3);
+  bitmap = vorrq_s16(bitmap, row5);
+  bitmap = vorrq_s16(bitmap, row6);
+  bitmap = vorrq_s16(bitmap, row7);
+
+  int64_t left_ac_bitmap = vgetq_lane_s64(vreinterpretq_s64_s16(bitmap), 0);
+  int64_t right_ac_bitmap = vgetq_lane_s64(vreinterpretq_s64_s16(bitmap), 1);
+
+  /* Load constants for IDCT computation. */
+#ifdef HAVE_VLD1_S16_X3
+  const int16x4x3_t consts = vld1_s16_x3(jsimd_idct_4x4_neon_consts);
+#else
+  /* GCC does not currently support the intrinsic vld1_<type>_x3(). */
+  const int16x4_t consts1 = vld1_s16(jsimd_idct_4x4_neon_consts);
+  const int16x4_t consts2 = vld1_s16(jsimd_idct_4x4_neon_consts + 4);
+  const int16x4_t consts3 = vld1_s16(jsimd_idct_4x4_neon_consts + 8);
+  const int16x4x3_t consts = { { consts1, consts2, consts3 } };
+#endif
+
+  if (left_ac_bitmap == 0 && right_ac_bitmap == 0) {
+    /* All AC coefficients are zero.
+     * Compute DC values and duplicate into row vectors 0, 1, 2, and 3.
+     */
+    int16x8_t dcval = vshlq_n_s16(row0, PASS1_BITS);
+    row0 = dcval;
+    row1 = dcval;
+    row2 = dcval;
+    row3 = dcval;
+  } else if (left_ac_bitmap == 0) {
+    /* AC coefficients are zero for columns 0, 1, 2, and 3.
+     * Compute DC values for these columns.
+     */
+    int16x4_t dcval = vshl_n_s16(vget_low_s16(row0), PASS1_BITS);
+
+    /* Commence regular IDCT computation for columns 4, 5, 6, and 7. */
+
+    /* Load quantization table.
*/ + int16x4_t quant_row1 = vld1_s16(quantptr + 1 * DCTSIZE + 4); + int16x4_t quant_row2 = vld1_s16(quantptr + 2 * DCTSIZE + 4); + int16x4_t quant_row3 = vld1_s16(quantptr + 3 * DCTSIZE + 4); + int16x4_t quant_row5 = vld1_s16(quantptr + 5 * DCTSIZE + 4); + int16x4_t quant_row6 = vld1_s16(quantptr + 6 * DCTSIZE + 4); + int16x4_t quant_row7 = vld1_s16(quantptr + 7 * DCTSIZE + 4); + + /* Even part */ + int32x4_t tmp0 = vshll_n_s16(vget_high_s16(row0), CONST_BITS + 1); + + int16x4_t z2 = vmul_s16(vget_high_s16(row2), quant_row2); + int16x4_t z3 = vmul_s16(vget_high_s16(row6), quant_row6); + + int32x4_t tmp2 = vmull_lane_s16(z2, consts.val[0], 0); + tmp2 = vmlal_lane_s16(tmp2, z3, consts.val[0], 1); + + int32x4_t tmp10 = vaddq_s32(tmp0, tmp2); + int32x4_t tmp12 = vsubq_s32(tmp0, tmp2); + + /* Odd part */ + int16x4_t z1 = vmul_s16(vget_high_s16(row7), quant_row7); + z2 = vmul_s16(vget_high_s16(row5), quant_row5); + z3 = vmul_s16(vget_high_s16(row3), quant_row3); + int16x4_t z4 = vmul_s16(vget_high_s16(row1), quant_row1); + + tmp0 = vmull_lane_s16(z1, consts.val[0], 2); + tmp0 = vmlal_lane_s16(tmp0, z2, consts.val[0], 3); + tmp0 = vmlal_lane_s16(tmp0, z3, consts.val[1], 0); + tmp0 = vmlal_lane_s16(tmp0, z4, consts.val[1], 1); + + tmp2 = vmull_lane_s16(z1, consts.val[1], 2); + tmp2 = vmlal_lane_s16(tmp2, z2, consts.val[1], 3); + tmp2 = vmlal_lane_s16(tmp2, z3, consts.val[2], 0); + tmp2 = vmlal_lane_s16(tmp2, z4, consts.val[2], 1); + + /* Final output stage: descale and narrow to 16-bit. */ + row0 = vcombine_s16(dcval, vrshrn_n_s32(vaddq_s32(tmp10, tmp2), + CONST_BITS - PASS1_BITS + 1)); + row3 = vcombine_s16(dcval, vrshrn_n_s32(vsubq_s32(tmp10, tmp2), + CONST_BITS - PASS1_BITS + 1)); + row1 = vcombine_s16(dcval, vrshrn_n_s32(vaddq_s32(tmp12, tmp0), + CONST_BITS - PASS1_BITS + 1)); + row2 = vcombine_s16(dcval, vrshrn_n_s32(vsubq_s32(tmp12, tmp0), + CONST_BITS - PASS1_BITS + 1)); + } else if (right_ac_bitmap == 0) { + /* AC coefficients are zero for columns 4, 5, 6, and 7. 
+ * Compute DC values for these columns. + */ + int16x4_t dcval = vshl_n_s16(vget_high_s16(row0), PASS1_BITS); + + /* Commence regular IDCT computation for columns 0, 1, 2, and 3. */ + + /* Load quantization table. */ + int16x4_t quant_row1 = vld1_s16(quantptr + 1 * DCTSIZE); + int16x4_t quant_row2 = vld1_s16(quantptr + 2 * DCTSIZE); + int16x4_t quant_row3 = vld1_s16(quantptr + 3 * DCTSIZE); + int16x4_t quant_row5 = vld1_s16(quantptr + 5 * DCTSIZE); + int16x4_t quant_row6 = vld1_s16(quantptr + 6 * DCTSIZE); + int16x4_t quant_row7 = vld1_s16(quantptr + 7 * DCTSIZE); + + /* Even part */ + int32x4_t tmp0 = vshll_n_s16(vget_low_s16(row0), CONST_BITS + 1); + + int16x4_t z2 = vmul_s16(vget_low_s16(row2), quant_row2); + int16x4_t z3 = vmul_s16(vget_low_s16(row6), quant_row6); + + int32x4_t tmp2 = vmull_lane_s16(z2, consts.val[0], 0); + tmp2 = vmlal_lane_s16(tmp2, z3, consts.val[0], 1); + + int32x4_t tmp10 = vaddq_s32(tmp0, tmp2); + int32x4_t tmp12 = vsubq_s32(tmp0, tmp2); + + /* Odd part */ + int16x4_t z1 = vmul_s16(vget_low_s16(row7), quant_row7); + z2 = vmul_s16(vget_low_s16(row5), quant_row5); + z3 = vmul_s16(vget_low_s16(row3), quant_row3); + int16x4_t z4 = vmul_s16(vget_low_s16(row1), quant_row1); + + tmp0 = vmull_lane_s16(z1, consts.val[0], 2); + tmp0 = vmlal_lane_s16(tmp0, z2, consts.val[0], 3); + tmp0 = vmlal_lane_s16(tmp0, z3, consts.val[1], 0); + tmp0 = vmlal_lane_s16(tmp0, z4, consts.val[1], 1); + + tmp2 = vmull_lane_s16(z1, consts.val[1], 2); + tmp2 = vmlal_lane_s16(tmp2, z2, consts.val[1], 3); + tmp2 = vmlal_lane_s16(tmp2, z3, consts.val[2], 0); + tmp2 = vmlal_lane_s16(tmp2, z4, consts.val[2], 1); + + /* Final output stage: descale and narrow to 16-bit. 
*/ + row0 = vcombine_s16(vrshrn_n_s32(vaddq_s32(tmp10, tmp2), + CONST_BITS - PASS1_BITS + 1), dcval); + row3 = vcombine_s16(vrshrn_n_s32(vsubq_s32(tmp10, tmp2), + CONST_BITS - PASS1_BITS + 1), dcval); + row1 = vcombine_s16(vrshrn_n_s32(vaddq_s32(tmp12, tmp0), + CONST_BITS - PASS1_BITS + 1), dcval); + row2 = vcombine_s16(vrshrn_n_s32(vsubq_s32(tmp12, tmp0), + CONST_BITS - PASS1_BITS + 1), dcval); + } else { + /* All AC coefficients are non-zero; full IDCT calculation required. */ + int16x8_t quant_row1 = vld1q_s16(quantptr + 1 * DCTSIZE); + int16x8_t quant_row2 = vld1q_s16(quantptr + 2 * DCTSIZE); + int16x8_t quant_row3 = vld1q_s16(quantptr + 3 * DCTSIZE); + int16x8_t quant_row5 = vld1q_s16(quantptr + 5 * DCTSIZE); + int16x8_t quant_row6 = vld1q_s16(quantptr + 6 * DCTSIZE); + int16x8_t quant_row7 = vld1q_s16(quantptr + 7 * DCTSIZE); + + /* Even part */ + int32x4_t tmp0_l = vshll_n_s16(vget_low_s16(row0), CONST_BITS + 1); + int32x4_t tmp0_h = vshll_n_s16(vget_high_s16(row0), CONST_BITS + 1); + + int16x8_t z2 = vmulq_s16(row2, quant_row2); + int16x8_t z3 = vmulq_s16(row6, quant_row6); + + int32x4_t tmp2_l = vmull_lane_s16(vget_low_s16(z2), consts.val[0], 0); + int32x4_t tmp2_h = vmull_lane_s16(vget_high_s16(z2), consts.val[0], 0); + tmp2_l = vmlal_lane_s16(tmp2_l, vget_low_s16(z3), consts.val[0], 1); + tmp2_h = vmlal_lane_s16(tmp2_h, vget_high_s16(z3), consts.val[0], 1); + + int32x4_t tmp10_l = vaddq_s32(tmp0_l, tmp2_l); + int32x4_t tmp10_h = vaddq_s32(tmp0_h, tmp2_h); + int32x4_t tmp12_l = vsubq_s32(tmp0_l, tmp2_l); + int32x4_t tmp12_h = vsubq_s32(tmp0_h, tmp2_h); + + /* Odd part */ + int16x8_t z1 = vmulq_s16(row7, quant_row7); + z2 = vmulq_s16(row5, quant_row5); + z3 = vmulq_s16(row3, quant_row3); + int16x8_t z4 = vmulq_s16(row1, quant_row1); + + tmp0_l = vmull_lane_s16(vget_low_s16(z1), consts.val[0], 2); + tmp0_l = vmlal_lane_s16(tmp0_l, vget_low_s16(z2), consts.val[0], 3); + tmp0_l = vmlal_lane_s16(tmp0_l, vget_low_s16(z3), consts.val[1], 0); + tmp0_l = 
vmlal_lane_s16(tmp0_l, vget_low_s16(z4), consts.val[1], 1); + tmp0_h = vmull_lane_s16(vget_high_s16(z1), consts.val[0], 2); + tmp0_h = vmlal_lane_s16(tmp0_h, vget_high_s16(z2), consts.val[0], 3); + tmp0_h = vmlal_lane_s16(tmp0_h, vget_high_s16(z3), consts.val[1], 0); + tmp0_h = vmlal_lane_s16(tmp0_h, vget_high_s16(z4), consts.val[1], 1); + + tmp2_l = vmull_lane_s16(vget_low_s16(z1), consts.val[1], 2); + tmp2_l = vmlal_lane_s16(tmp2_l, vget_low_s16(z2), consts.val[1], 3); + tmp2_l = vmlal_lane_s16(tmp2_l, vget_low_s16(z3), consts.val[2], 0); + tmp2_l = vmlal_lane_s16(tmp2_l, vget_low_s16(z4), consts.val[2], 1); + tmp2_h = vmull_lane_s16(vget_high_s16(z1), consts.val[1], 2); + tmp2_h = vmlal_lane_s16(tmp2_h, vget_high_s16(z2), consts.val[1], 3); + tmp2_h = vmlal_lane_s16(tmp2_h, vget_high_s16(z3), consts.val[2], 0); + tmp2_h = vmlal_lane_s16(tmp2_h, vget_high_s16(z4), consts.val[2], 1); + + /* Final output stage: descale and narrow to 16-bit. */ + row0 = vcombine_s16(vrshrn_n_s32(vaddq_s32(tmp10_l, tmp2_l), + CONST_BITS - PASS1_BITS + 1), + vrshrn_n_s32(vaddq_s32(tmp10_h, tmp2_h), + CONST_BITS - PASS1_BITS + 1)); + row3 = vcombine_s16(vrshrn_n_s32(vsubq_s32(tmp10_l, tmp2_l), + CONST_BITS - PASS1_BITS + 1), + vrshrn_n_s32(vsubq_s32(tmp10_h, tmp2_h), + CONST_BITS - PASS1_BITS + 1)); + row1 = vcombine_s16(vrshrn_n_s32(vaddq_s32(tmp12_l, tmp0_l), + CONST_BITS - PASS1_BITS + 1), + vrshrn_n_s32(vaddq_s32(tmp12_h, tmp0_h), + CONST_BITS - PASS1_BITS + 1)); + row2 = vcombine_s16(vrshrn_n_s32(vsubq_s32(tmp12_l, tmp0_l), + CONST_BITS - PASS1_BITS + 1), + vrshrn_n_s32(vsubq_s32(tmp12_h, tmp0_h), + CONST_BITS - PASS1_BITS + 1)); + } + + /* Transpose 8x4 block to perform IDCT on rows in second pass. 
*/ + int16x8x2_t row_01 = vtrnq_s16(row0, row1); + int16x8x2_t row_23 = vtrnq_s16(row2, row3); + + int32x4x2_t cols_0426 = vtrnq_s32(vreinterpretq_s32_s16(row_01.val[0]), + vreinterpretq_s32_s16(row_23.val[0])); + int32x4x2_t cols_1537 = vtrnq_s32(vreinterpretq_s32_s16(row_01.val[1]), + vreinterpretq_s32_s16(row_23.val[1])); + + int16x4_t col0 = vreinterpret_s16_s32(vget_low_s32(cols_0426.val[0])); + int16x4_t col1 = vreinterpret_s16_s32(vget_low_s32(cols_1537.val[0])); + int16x4_t col2 = vreinterpret_s16_s32(vget_low_s32(cols_0426.val[1])); + int16x4_t col3 = vreinterpret_s16_s32(vget_low_s32(cols_1537.val[1])); + int16x4_t col5 = vreinterpret_s16_s32(vget_high_s32(cols_1537.val[0])); + int16x4_t col6 = vreinterpret_s16_s32(vget_high_s32(cols_0426.val[1])); + int16x4_t col7 = vreinterpret_s16_s32(vget_high_s32(cols_1537.val[1])); + + /* Commence second pass of IDCT. */ + + /* Even part */ + int32x4_t tmp0 = vshll_n_s16(col0, CONST_BITS + 1); + int32x4_t tmp2 = vmull_lane_s16(col2, consts.val[0], 0); + tmp2 = vmlal_lane_s16(tmp2, col6, consts.val[0], 1); + + int32x4_t tmp10 = vaddq_s32(tmp0, tmp2); + int32x4_t tmp12 = vsubq_s32(tmp0, tmp2); + + /* Odd part */ + tmp0 = vmull_lane_s16(col7, consts.val[0], 2); + tmp0 = vmlal_lane_s16(tmp0, col5, consts.val[0], 3); + tmp0 = vmlal_lane_s16(tmp0, col3, consts.val[1], 0); + tmp0 = vmlal_lane_s16(tmp0, col1, consts.val[1], 1); + + tmp2 = vmull_lane_s16(col7, consts.val[1], 2); + tmp2 = vmlal_lane_s16(tmp2, col5, consts.val[1], 3); + tmp2 = vmlal_lane_s16(tmp2, col3, consts.val[2], 0); + tmp2 = vmlal_lane_s16(tmp2, col1, consts.val[2], 1); + + /* Final output stage: descale and clamp to range [0-255]. 
*/ + int16x8_t output_cols_02 = vcombine_s16(vaddhn_s32(tmp10, tmp2), + vsubhn_s32(tmp12, tmp0)); + int16x8_t output_cols_13 = vcombine_s16(vaddhn_s32(tmp12, tmp0), + vsubhn_s32(tmp10, tmp2)); + output_cols_02 = vrsraq_n_s16(vdupq_n_s16(CENTERJSAMPLE), output_cols_02, + CONST_BITS + PASS1_BITS + 3 + 1 - 16); + output_cols_13 = vrsraq_n_s16(vdupq_n_s16(CENTERJSAMPLE), output_cols_13, + CONST_BITS + PASS1_BITS + 3 + 1 - 16); + /* Narrow to 8-bit and convert to unsigned while zipping 8-bit elements. + * An interleaving store completes the transpose. + */ + uint8x8x2_t output_0123 = vzip_u8(vqmovun_s16(output_cols_02), + vqmovun_s16(output_cols_13)); + uint16x4x2_t output_01_23 = { { + vreinterpret_u16_u8(output_0123.val[0]), + vreinterpret_u16_u8(output_0123.val[1]) + } }; + + /* Store 4x4 block to memory. */ + JSAMPROW outptr0 = output_buf[0] + output_col; + JSAMPROW outptr1 = output_buf[1] + output_col; + JSAMPROW outptr2 = output_buf[2] + output_col; + JSAMPROW outptr3 = output_buf[3] + output_col; + vst2_lane_u16((uint16_t *)outptr0, output_01_23, 0); + vst2_lane_u16((uint16_t *)outptr1, output_01_23, 1); + vst2_lane_u16((uint16_t *)outptr2, output_01_23, 2); + vst2_lane_u16((uint16_t *)outptr3, output_01_23, 3); +} diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jquanti-neon.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jquanti-neon.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jquanti-neon.c 1970-01-01 01:00:00.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/jquanti-neon.c 2021-11-20 03:41:33.401600402 +0000 @@ -0,0 +1,190 @@ +/* + * jquanti-neon.c - sample data conversion and quantization (Arm Neon) + * + * Copyright (C) 2020, Arm Limited. All Rights Reserved. + * + * This software is provided 'as-is', without any express or implied + * warranty. In no event will the authors be held liable for any damages + * arising from the use of this software. 
+ * + * Permission is granted to anyone to use this software for any purpose, + * including commercial applications, and to alter it and redistribute it + * freely, subject to the following restrictions: + * + * 1. The origin of this software must not be misrepresented; you must not + * claim that you wrote the original software. If you use this software + * in a product, an acknowledgment in the product documentation would be + * appreciated but is not required. + * 2. Altered source versions must be plainly marked as such, and must not be + * misrepresented as being the original software. + * 3. This notice may not be removed or altered from any source distribution. + */ + +#define JPEG_INTERNALS +#include "../../jinclude.h" +#include "../../jpeglib.h" +#include "../../jsimd.h" +#include "../../jdct.h" +#include "../../jsimddct.h" +#include "../jsimd.h" + +#include <arm_neon.h> + + +/* After downsampling, the resulting sample values are in the range [0, 255], + * but the Discrete Cosine Transform (DCT) operates on values centered around + * 0. + * + * To prepare sample values for the DCT, load samples into a DCT workspace, + * subtracting CENTERJSAMPLE (128). The samples, now in the range [-128, 127], + * are also widened from 8- to 16-bit. + * + * The equivalent scalar C function convsamp() can be found in jcdctmgr.c. 
+ */ + +void jsimd_convsamp_neon(JSAMPARRAY sample_data, JDIMENSION start_col, + DCTELEM *workspace) +{ + uint8x8_t samp_row0 = vld1_u8(sample_data[0] + start_col); + uint8x8_t samp_row1 = vld1_u8(sample_data[1] + start_col); + uint8x8_t samp_row2 = vld1_u8(sample_data[2] + start_col); + uint8x8_t samp_row3 = vld1_u8(sample_data[3] + start_col); + uint8x8_t samp_row4 = vld1_u8(sample_data[4] + start_col); + uint8x8_t samp_row5 = vld1_u8(sample_data[5] + start_col); + uint8x8_t samp_row6 = vld1_u8(sample_data[6] + start_col); + uint8x8_t samp_row7 = vld1_u8(sample_data[7] + start_col); + + int16x8_t row0 = + vreinterpretq_s16_u16(vsubl_u8(samp_row0, vdup_n_u8(CENTERJSAMPLE))); + int16x8_t row1 = + vreinterpretq_s16_u16(vsubl_u8(samp_row1, vdup_n_u8(CENTERJSAMPLE))); + int16x8_t row2 = + vreinterpretq_s16_u16(vsubl_u8(samp_row2, vdup_n_u8(CENTERJSAMPLE))); + int16x8_t row3 = + vreinterpretq_s16_u16(vsubl_u8(samp_row3, vdup_n_u8(CENTERJSAMPLE))); + int16x8_t row4 = + vreinterpretq_s16_u16(vsubl_u8(samp_row4, vdup_n_u8(CENTERJSAMPLE))); + int16x8_t row5 = + vreinterpretq_s16_u16(vsubl_u8(samp_row5, vdup_n_u8(CENTERJSAMPLE))); + int16x8_t row6 = + vreinterpretq_s16_u16(vsubl_u8(samp_row6, vdup_n_u8(CENTERJSAMPLE))); + int16x8_t row7 = + vreinterpretq_s16_u16(vsubl_u8(samp_row7, vdup_n_u8(CENTERJSAMPLE))); + + vst1q_s16(workspace + 0 * DCTSIZE, row0); + vst1q_s16(workspace + 1 * DCTSIZE, row1); + vst1q_s16(workspace + 2 * DCTSIZE, row2); + vst1q_s16(workspace + 3 * DCTSIZE, row3); + vst1q_s16(workspace + 4 * DCTSIZE, row4); + vst1q_s16(workspace + 5 * DCTSIZE, row5); + vst1q_s16(workspace + 6 * DCTSIZE, row6); + vst1q_s16(workspace + 7 * DCTSIZE, row7); +} + + +/* After the DCT, the resulting array of coefficient values needs to be divided + * by an array of quantization values. + * + * To avoid a slow division operation, the DCT coefficients are multiplied by + * the (scaled) reciprocals of the quantization values and then right-shifted. 
+ * + * The equivalent scalar C function quantize() can be found in jcdctmgr.c. + */ + +void jsimd_quantize_neon(JCOEFPTR coef_block, DCTELEM *divisors, + DCTELEM *workspace) +{ + JCOEFPTR out_ptr = coef_block; + UDCTELEM *recip_ptr = (UDCTELEM *)divisors; + UDCTELEM *corr_ptr = (UDCTELEM *)divisors + DCTSIZE2; + DCTELEM *shift_ptr = divisors + 3 * DCTSIZE2; + int i; + + for (i = 0; i < DCTSIZE; i += DCTSIZE / 2) { + /* Load DCT coefficients. */ + int16x8_t row0 = vld1q_s16(workspace + (i + 0) * DCTSIZE); + int16x8_t row1 = vld1q_s16(workspace + (i + 1) * DCTSIZE); + int16x8_t row2 = vld1q_s16(workspace + (i + 2) * DCTSIZE); + int16x8_t row3 = vld1q_s16(workspace + (i + 3) * DCTSIZE); + /* Load reciprocals of quantization values. */ + uint16x8_t recip0 = vld1q_u16(recip_ptr + (i + 0) * DCTSIZE); + uint16x8_t recip1 = vld1q_u16(recip_ptr + (i + 1) * DCTSIZE); + uint16x8_t recip2 = vld1q_u16(recip_ptr + (i + 2) * DCTSIZE); + uint16x8_t recip3 = vld1q_u16(recip_ptr + (i + 3) * DCTSIZE); + uint16x8_t corr0 = vld1q_u16(corr_ptr + (i + 0) * DCTSIZE); + uint16x8_t corr1 = vld1q_u16(corr_ptr + (i + 1) * DCTSIZE); + uint16x8_t corr2 = vld1q_u16(corr_ptr + (i + 2) * DCTSIZE); + uint16x8_t corr3 = vld1q_u16(corr_ptr + (i + 3) * DCTSIZE); + int16x8_t shift0 = vld1q_s16(shift_ptr + (i + 0) * DCTSIZE); + int16x8_t shift1 = vld1q_s16(shift_ptr + (i + 1) * DCTSIZE); + int16x8_t shift2 = vld1q_s16(shift_ptr + (i + 2) * DCTSIZE); + int16x8_t shift3 = vld1q_s16(shift_ptr + (i + 3) * DCTSIZE); + + /* Extract sign from coefficients. */ + int16x8_t sign_row0 = vshrq_n_s16(row0, 15); + int16x8_t sign_row1 = vshrq_n_s16(row1, 15); + int16x8_t sign_row2 = vshrq_n_s16(row2, 15); + int16x8_t sign_row3 = vshrq_n_s16(row3, 15); + /* Get absolute value of DCT coefficients. 
*/ + uint16x8_t abs_row0 = vreinterpretq_u16_s16(vabsq_s16(row0)); + uint16x8_t abs_row1 = vreinterpretq_u16_s16(vabsq_s16(row1)); + uint16x8_t abs_row2 = vreinterpretq_u16_s16(vabsq_s16(row2)); + uint16x8_t abs_row3 = vreinterpretq_u16_s16(vabsq_s16(row3)); + /* Add correction. */ + abs_row0 = vaddq_u16(abs_row0, corr0); + abs_row1 = vaddq_u16(abs_row1, corr1); + abs_row2 = vaddq_u16(abs_row2, corr2); + abs_row3 = vaddq_u16(abs_row3, corr3); + + /* Multiply DCT coefficients by quantization reciprocals. */ + int32x4_t row0_l = vreinterpretq_s32_u32(vmull_u16(vget_low_u16(abs_row0), + vget_low_u16(recip0))); + int32x4_t row0_h = vreinterpretq_s32_u32(vmull_u16(vget_high_u16(abs_row0), + vget_high_u16(recip0))); + int32x4_t row1_l = vreinterpretq_s32_u32(vmull_u16(vget_low_u16(abs_row1), + vget_low_u16(recip1))); + int32x4_t row1_h = vreinterpretq_s32_u32(vmull_u16(vget_high_u16(abs_row1), + vget_high_u16(recip1))); + int32x4_t row2_l = vreinterpretq_s32_u32(vmull_u16(vget_low_u16(abs_row2), + vget_low_u16(recip2))); + int32x4_t row2_h = vreinterpretq_s32_u32(vmull_u16(vget_high_u16(abs_row2), + vget_high_u16(recip2))); + int32x4_t row3_l = vreinterpretq_s32_u32(vmull_u16(vget_low_u16(abs_row3), + vget_low_u16(recip3))); + int32x4_t row3_h = vreinterpretq_s32_u32(vmull_u16(vget_high_u16(abs_row3), + vget_high_u16(recip3))); + /* Narrow back to 16-bit. */ + row0 = vcombine_s16(vshrn_n_s32(row0_l, 16), vshrn_n_s32(row0_h, 16)); + row1 = vcombine_s16(vshrn_n_s32(row1_l, 16), vshrn_n_s32(row1_h, 16)); + row2 = vcombine_s16(vshrn_n_s32(row2_l, 16), vshrn_n_s32(row2_h, 16)); + row3 = vcombine_s16(vshrn_n_s32(row3_l, 16), vshrn_n_s32(row3_h, 16)); + + /* Since VSHR only supports an immediate as its second argument, negate the + * shift value and shift left. 
+ */ + row0 = vreinterpretq_s16_u16(vshlq_u16(vreinterpretq_u16_s16(row0), + vnegq_s16(shift0))); + row1 = vreinterpretq_s16_u16(vshlq_u16(vreinterpretq_u16_s16(row1), + vnegq_s16(shift1))); + row2 = vreinterpretq_s16_u16(vshlq_u16(vreinterpretq_u16_s16(row2), + vnegq_s16(shift2))); + row3 = vreinterpretq_s16_u16(vshlq_u16(vreinterpretq_u16_s16(row3), + vnegq_s16(shift3))); + + /* Restore sign to original product. */ + row0 = veorq_s16(row0, sign_row0); + row0 = vsubq_s16(row0, sign_row0); + row1 = veorq_s16(row1, sign_row1); + row1 = vsubq_s16(row1, sign_row1); + row2 = veorq_s16(row2, sign_row2); + row2 = vsubq_s16(row2, sign_row2); + row3 = veorq_s16(row3, sign_row3); + row3 = vsubq_s16(row3, sign_row3); + + /* Store quantized coefficients to memory. */ + vst1q_s16(out_ptr + (i + 0) * DCTSIZE, row0); + vst1q_s16(out_ptr + (i + 1) * DCTSIZE, row1); + vst1q_s16(out_ptr + (i + 2) * DCTSIZE, row2); + vst1q_s16(out_ptr + (i + 3) * DCTSIZE, row3); + } +} diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/neon-compat.h b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/neon-compat.h --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/neon-compat.h 1970-01-01 01:00:00.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/neon-compat.h 2021-11-20 03:41:33.401600402 +0000 @@ -0,0 +1,39 @@ +/* + * Copyright (C) 2020, D. R. Commander. All Rights Reserved. + * Copyright (C) 2020-2021, Arm Limited. All Rights Reserved. + * + * This software is provided 'as-is', without any express or implied + * warranty. In no event will the authors be held liable for any damages + * arising from the use of this software. + * + * Permission is granted to anyone to use this software for any purpose, + * including commercial applications, and to alter it and redistribute it + * freely, subject to the following restrictions: + * + * 1. 
The origin of this software must not be misrepresented; you must not + * claim that you wrote the original software. If you use this software + * in a product, an acknowledgment in the product documentation would be + * appreciated but is not required. + * 2. Altered source versions must be plainly marked as such, and must not be + * misrepresented as being the original software. + * 3. This notice may not be removed or altered from any source distribution. + */ + +#if defined(__clang__) || defined(_MSC_VER) +#define HAVE_VLD1_S16_X3 +#define HAVE_VLD1_U16_X2 +#define HAVE_VLD1Q_U8_X4 +#endif + +/* Define compiler-independent count-leading-zeros and byte-swap macros */ +#if defined(_MSC_VER) && !defined(__clang__) +#define BUILTIN_CLZ(x) _CountLeadingZeros(x) +#define BUILTIN_CLZLL(x) _CountLeadingZeros64(x) +#define BUILTIN_BSWAP64(x) _byteswap_uint64(x) +#elif defined(__clang__) || defined(__GNUC__) +#define BUILTIN_CLZ(x) __builtin_clz(x) +#define BUILTIN_CLZLL(x) __builtin_clzll(x) +#define BUILTIN_BSWAP64(x) __builtin_bswap64(x) +#else +#error "Unknown compiler" +#endif diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/neon-compat.h.in b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/neon-compat.h.in --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/neon-compat.h.in 1970-01-01 01:00:00.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/arm/neon-compat.h.in 2021-11-20 03:41:33.401600402 +0000 @@ -0,0 +1,37 @@ +/* + * Copyright (C) 2020, D. R. Commander. All Rights Reserved. + * Copyright (C) 2020-2021, Arm Limited. All Rights Reserved. + * + * This software is provided 'as-is', without any express or implied + * warranty. In no event will the authors be held liable for any damages + * arising from the use of this software. 
+ * + * Permission is granted to anyone to use this software for any purpose, + * including commercial applications, and to alter it and redistribute it + * freely, subject to the following restrictions: + * + * 1. The origin of this software must not be misrepresented; you must not + * claim that you wrote the original software. If you use this software + * in a product, an acknowledgment in the product documentation would be + * appreciated but is not required. + * 2. Altered source versions must be plainly marked as such, and must not be + * misrepresented as being the original software. + * 3. This notice may not be removed or altered from any source distribution. + */ + +#cmakedefine HAVE_VLD1_S16_X3 +#cmakedefine HAVE_VLD1_U16_X2 +#cmakedefine HAVE_VLD1Q_U8_X4 + +/* Define compiler-independent count-leading-zeros and byte-swap macros */ +#if defined(_MSC_VER) && !defined(__clang__) +#define BUILTIN_CLZ(x) _CountLeadingZeros(x) +#define BUILTIN_CLZLL(x) _CountLeadingZeros64(x) +#define BUILTIN_BSWAP64(x) _byteswap_uint64(x) +#elif defined(__clang__) || defined(__GNUC__) +#define BUILTIN_CLZ(x) __builtin_clz(x) +#define BUILTIN_CLZLL(x) __builtin_clzll(x) +#define BUILTIN_BSWAP64(x) __builtin_bswap64(x) +#else +#error "Unknown compiler" +#endif diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/CMakeLists.txt b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/CMakeLists.txt --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/CMakeLists.txt 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/CMakeLists.txt 2021-11-20 03:41:33.397600466 +0000 @@ -30,6 +30,9 @@ if(CYGWIN) set(CMAKE_ASM_NASM_OBJECT_FORMAT win64) endif() + if(CMAKE_C_COMPILER_ABI MATCHES "ELF X32") + set(CMAKE_ASM_NASM_OBJECT_FORMAT elfx32) + endif() elseif(CPU_TYPE STREQUAL "i386") if(BORLAND) set(CMAKE_ASM_NASM_OBJECT_FORMAT obj) @@ -205,64 +208,76 @@ ############################################################################### 
-# ARM (GAS) +# Arm (Intrinsics or GAS) ############################################################################### elseif(CPU_TYPE STREQUAL "arm64" OR CPU_TYPE STREQUAL "arm") -enable_language(ASM) - -set(CMAKE_ASM_FLAGS "${CMAKE_C_FLAGS} ${CMAKE_ASM_FLAGS}") - -string(TOUPPER ${CMAKE_BUILD_TYPE} CMAKE_BUILD_TYPE_UC) -set(EFFECTIVE_ASM_FLAGS "${CMAKE_ASM_FLAGS} ${CMAKE_ASM_FLAGS_${CMAKE_BUILD_TYPE_UC}}") -message(STATUS "CMAKE_ASM_FLAGS = ${EFFECTIVE_ASM_FLAGS}") - -# Test whether we need gas-preprocessor.pl -if(CPU_TYPE STREQUAL "arm") - file(WRITE ${CMAKE_CURRENT_BINARY_DIR}/gastest.S " - .text - .fpu neon - .arch armv7a - .object_arch armv4 - .arm - pld [r0] - vmovn.u16 d0, q0") +include(CheckSymbolExists) +if(BITS EQUAL 32) + set(CMAKE_REQUIRED_FLAGS -mfpu=neon) +endif() +check_symbol_exists(vld1_s16_x3 arm_neon.h HAVE_VLD1_S16_X3) +check_symbol_exists(vld1_u16_x2 arm_neon.h HAVE_VLD1_U16_X2) +check_symbol_exists(vld1q_u8_x4 arm_neon.h HAVE_VLD1Q_U8_X4) +if(BITS EQUAL 32) + unset(CMAKE_REQUIRED_FLAGS) +endif() +configure_file(arm/neon-compat.h.in arm/neon-compat.h @ONLY) +include_directories(${CMAKE_CURRENT_BINARY_DIR}/arm) + +# GCC (as of this writing) and some older versions of Clang do not have a full +# or optimal set of Neon intrinsics, so for performance reasons, when using +# those compilers, we default to using the older GAS implementation of the Neon +# SIMD extensions for certain algorithms. The presence or absence of the three +# intrinsics we tested above is a reasonable proxy for this. We always default +# to using the full Neon intrinsics implementation when building for macOS or +# iOS, to avoid the need for gas-preprocessor. 
+if((HAVE_VLD1_S16_X3 AND HAVE_VLD1_U16_X2 AND HAVE_VLD1Q_U8_X4) OR APPLE) + set(DEFAULT_NEON_INTRINSICS 1) else() - file(WRITE ${CMAKE_CURRENT_BINARY_DIR}/gastest.S " - .text - MYVAR .req x0 - movi v0.16b, #100 - mov MYVAR, #100 - .unreq MYVAR") -endif() - -separate_arguments(CMAKE_ASM_FLAGS_SEP UNIX_COMMAND "${CMAKE_ASM_FLAGS}") - -execute_process(COMMAND ${CMAKE_ASM_COMPILER} ${CMAKE_ASM_FLAGS_SEP} - -x assembler-with-cpp -c ${CMAKE_CURRENT_BINARY_DIR}/gastest.S - RESULT_VARIABLE RESULT OUTPUT_VARIABLE OUTPUT ERROR_VARIABLE ERROR) -if(NOT RESULT EQUAL 0) - message(STATUS "GAS appears to be broken. Trying gas-preprocessor.pl ...") - execute_process(COMMAND gas-preprocessor.pl ${CMAKE_ASM_COMPILER} - ${CMAKE_ASM_FLAGS_SEP} -x assembler-with-cpp -c - ${CMAKE_CURRENT_BINARY_DIR}/gastest.S - RESULT_VARIABLE RESULT OUTPUT_VARIABLE OUTPUT ERROR_VARIABLE ERROR) - if(NOT RESULT EQUAL 0) - simd_fail("SIMD extensions disabled: GAS is not working properly") - return() - else() - message(STATUS "Using gas-preprocessor.pl") - configure_file(gas-preprocessor.in gas-preprocessor @ONLY) - set(CMAKE_ASM_COMPILER ${CMAKE_CURRENT_BINARY_DIR}/gas-preprocessor) - endif() + set(DEFAULT_NEON_INTRINSICS 0) +endif() +option(NEON_INTRINSICS + "Because GCC (as of this writing) and some older versions of Clang do not have a full or optimal set of Neon intrinsics, for performance reasons, the default when building libjpeg-turbo with those compilers is to continue using the older GAS implementation of the Neon SIMD extensions for certain algorithms. Setting this option forces the full Neon intrinsics implementation to be used with all compilers. Unsetting this option forces the hybrid GAS/intrinsics implementation to be used with all compilers." 
+ ${DEFAULT_NEON_INTRINSICS}) +boolean_number(NEON_INTRINSICS PARENT_SCOPE) +if(NEON_INTRINSICS) + add_definitions(-DNEON_INTRINSICS) + message(STATUS "Use full Neon SIMD intrinsics implementation (NEON_INTRINSICS = ${NEON_INTRINSICS})") else() - message(STATUS "GAS is working properly") + message(STATUS "Use partial Neon SIMD intrinsics implementation (NEON_INTRINSICS = ${NEON_INTRINSICS})") endif() -file(REMOVE ${CMAKE_CURRENT_BINARY_DIR}/gastest.S) +set(SIMD_SOURCES arm/jcgray-neon.c arm/jcphuff-neon.c arm/jcsample-neon.c + arm/jdmerge-neon.c arm/jdsample-neon.c arm/jfdctfst-neon.c + arm/jidctred-neon.c arm/jquanti-neon.c) +if(NEON_INTRINSICS) + set(SIMD_SOURCES ${SIMD_SOURCES} arm/jccolor-neon.c arm/jidctint-neon.c) +endif() +if(NEON_INTRINSICS OR BITS EQUAL 64) + set(SIMD_SOURCES ${SIMD_SOURCES} arm/jidctfst-neon.c) +endif() +if(NEON_INTRINSICS OR BITS EQUAL 32) + set(SIMD_SOURCES ${SIMD_SOURCES} arm/aarch${BITS}/jchuff-neon.c + arm/jdcolor-neon.c arm/jfdctint-neon.c) +endif() +if(BITS EQUAL 32) + set_source_files_properties(${SIMD_SOURCES} COMPILE_FLAGS -mfpu=neon) +endif() +if(NOT NEON_INTRINSICS) + enable_language(ASM) -add_library(simd OBJECT ${CPU_TYPE}/jsimd_neon.S ${CPU_TYPE}/jsimd.c) + set(CMAKE_ASM_FLAGS "${CMAKE_C_FLAGS} ${CMAKE_ASM_FLAGS}") + + string(TOUPPER ${CMAKE_BUILD_TYPE} CMAKE_BUILD_TYPE_UC) + set(EFFECTIVE_ASM_FLAGS "${CMAKE_ASM_FLAGS} ${CMAKE_ASM_FLAGS_${CMAKE_BUILD_TYPE_UC}}") + message(STATUS "CMAKE_ASM_FLAGS = ${EFFECTIVE_ASM_FLAGS}") + + set(SIMD_SOURCES ${SIMD_SOURCES} arm/aarch${BITS}/jsimd_neon.S) +endif() + +add_library(simd OBJECT ${SIMD_SOURCES} arm/aarch${BITS}/jsimd.c) if(CMAKE_POSITION_INDEPENDENT_CODE OR ENABLE_SHARED) set_target_properties(simd PROPERTIES POSITION_INDEPENDENT_CODE 1) @@ -311,14 +326,35 @@ endif() ############################################################################### -# Loongson (Intrinsics) +# MIPS64 (Intrinsics) ############################################################################### 
-elseif(CPU_TYPE STREQUAL "loongson") +elseif(CPU_TYPE STREQUAL "loongson" OR CPU_TYPE MATCHES "mips64*") -set(SIMD_SOURCES loongson/jccolor-mmi.c loongson/jcsample-mmi.c - loongson/jdcolor-mmi.c loongson/jdsample-mmi.c loongson/jfdctint-mmi.c - loongson/jidctint-mmi.c loongson/jquanti-mmi.c) +set(CMAKE_REQUIRED_FLAGS -Wa,-mloongson-mmi,-mloongson-ext) + +check_c_source_compiles(" + int main(void) { + int c = 0, a = 0, b = 0; + asm ( + \"paddb %0, %1, %2\" + : \"=f\" (c) + : \"f\" (a), \"f\" (b) + ); + return c; + }" HAVE_MMI) + +unset(CMAKE_REQUIRED_FLAGS) + +if(NOT HAVE_MMI) + simd_fail("SIMD extensions not available for this CPU") + return() +endif() + +set(SIMD_SOURCES mips64/jccolor-mmi.c mips64/jcgray-mmi.c mips64/jcsample-mmi.c + mips64/jdcolor-mmi.c mips64/jdmerge-mmi.c mips64/jdsample-mmi.c + mips64/jfdctfst-mmi.c mips64/jfdctint-mmi.c mips64/jidctfst-mmi.c + mips64/jidctint-mmi.c mips64/jquanti-mmi.c) if(CMAKE_COMPILER_IS_GNUCC) foreach(file ${SIMD_SOURCES}) @@ -326,8 +362,12 @@ " -fno-strict-aliasing") endforeach() endif() +foreach(file ${SIMD_SOURCES}) + set_property(SOURCE ${file} APPEND_STRING PROPERTY COMPILE_FLAGS + " -Wa,-mloongson-mmi,-mloongson-ext") +endforeach() -add_library(simd OBJECT ${SIMD_SOURCES} loongson/jsimd.c) +add_library(simd OBJECT ${SIMD_SOURCES} mips64/jsimd.c) if(CMAKE_POSITION_INDEPENDENT_CODE OR ENABLE_SHARED) set_target_properties(simd PROPERTIES POSITION_INDEPENDENT_CODE 1) diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jchuff-sse2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jchuff-sse2.asm --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jchuff-sse2.asm 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jchuff-sse2.asm 2021-11-20 03:41:33.401600402 +0000 @@ -1,8 +1,9 @@ ; ; jchuff-sse2.asm - Huffman entropy encoding (SSE2) ; -; Copyright (C) 2009-2011, 2014-2017, D. R. Commander. 
+; Copyright (C) 2009-2011, 2014-2017, 2019, D. R. Commander. ; Copyright (C) 2015, Matthieu Darbois. +; Copyright (C) 2018, Matthias Räncker. ; ; Based on the x86 SIMD extension for IJG JPEG library ; Copyright (C) 1999-2006, MIYASAKA Masaru. @@ -15,133 +16,255 @@ ; http://sourceforge.net/project/showfiles.php?group_id=6208 ; ; This file contains an SSE2 implementation for Huffman coding of one block. -; The following code is based directly on jchuff.c; see jchuff.c for more -; details. +; The following code is based on jchuff.c; see jchuff.c for more details. %include "jsimdext.inc" +struc working_state +.next_output_byte: resp 1 ; => next byte to write in buffer +.free_in_buffer: resp 1 ; # of byte spaces remaining in buffer +.cur.put_buffer.simd resq 1 ; current bit accumulation buffer +.cur.free_bits resd 1 ; # of bits available in it +.cur.last_dc_val resd 4 ; last DC coef for each component +.cinfo: resp 1 ; dump_buffer needs access to this +endstruc + +struc c_derived_tbl +.ehufco: resd 256 ; code for each symbol +.ehufsi: resb 256 ; length of code for each symbol +; If no code has been allocated for a symbol S, ehufsi[S] contains 0 +endstruc + ; -------------------------------------------------------------------------- SECTION SEG_CONST - alignz 32 GLOBAL_DATA(jconst_huff_encode_one_block) - EXTERN EXTN(jpeg_nbits_table) EXTN(jconst_huff_encode_one_block): alignz 32 +jpeg_mask_bits dq 0x0000, 0x0001, 0x0003, 0x0007 + dq 0x000f, 0x001f, 0x003f, 0x007f + dq 0x00ff, 0x01ff, 0x03ff, 0x07ff + dq 0x0fff, 0x1fff, 0x3fff, 0x7fff + +times 1 << 14 db 15 +times 1 << 13 db 14 +times 1 << 12 db 13 +times 1 << 11 db 12 +times 1 << 10 db 11 +times 1 << 9 db 10 +times 1 << 8 db 9 +times 1 << 7 db 8 +times 1 << 6 db 7 +times 1 << 5 db 6 +times 1 << 4 db 5 +times 1 << 3 db 4 +times 1 << 2 db 3 +times 1 << 1 db 2 +times 1 << 0 db 1 +times 1 db 0 +jpeg_nbits_table: +times 1 db 0 +times 1 << 0 db 1 +times 1 << 1 db 2 +times 1 << 2 db 3 +times 1 << 3 db 4 +times 1 << 4 db 5 
+times 1 << 5 db 6 +times 1 << 6 db 7 +times 1 << 7 db 8 +times 1 << 8 db 9 +times 1 << 9 db 10 +times 1 << 10 db 11 +times 1 << 11 db 12 +times 1 << 12 db 13 +times 1 << 13 db 14 +times 1 << 14 db 15 + + alignz 32 + +%ifdef PIC +%define NBITS(x) nbits_base + x +%else +%define NBITS(x) jpeg_nbits_table + x +%endif +%define MASK_BITS(x) NBITS((x) * 8) + (jpeg_mask_bits - jpeg_nbits_table) + ; -------------------------------------------------------------------------- SECTION SEG_TEXT BITS 32 -; These macros perform the same task as the emit_bits() function in the -; original libjpeg code. In addition to reducing overhead by explicitly -; inlining the code, additional performance is achieved by taking into -; account the size of the bit buffer and waiting until it is almost full -; before emptying it. This mostly benefits 64-bit platforms, since 6 -; bytes can be stored in a 64-bit bit buffer before it has to be emptied. - -%macro EMIT_BYTE 0 - sub put_bits, 8 ; put_bits -= 8; - mov edx, put_buffer - mov ecx, put_bits - shr edx, cl ; c = (JOCTET)GETJOCTET(put_buffer >> put_bits); - mov byte [eax], dl ; *buffer++ = c; - add eax, 1 - cmp dl, 0xFF ; need to stuff a zero byte? - jne %%.EMIT_BYTE_END - mov byte [eax], 0 ; *buffer++ = 0; - add eax, 1 -%%.EMIT_BYTE_END: -%endmacro +%define mm_put_buffer mm0 +%define mm_all_0xff mm1 +%define mm_temp mm2 +%define mm_nbits mm3 +%define mm_code_bits mm3 +%define mm_code mm4 +%define mm_overflow_bits mm5 +%define mm_save_nbits mm6 + +; Shorthand used to describe SIMD operations: +; wN: xmmN treated as eight signed 16-bit values +; wN[i]: perform the same operation on all eight signed 16-bit values, i=0..7 +; bN: xmmN treated as 16 unsigned 8-bit values, or +; mmN treated as eight unsigned 8-bit values +; bN[i]: perform the same operation on all unsigned 8-bit values, +; i=0..15 (SSE register) or i=0..7 (MMX register) +; Contents of SIMD registers are shown in memory order. 
+ +; Fill the bit buffer to capacity with the leading bits from code, then output +; the bit buffer and put the remaining bits from code into the bit buffer. +; +; Usage: +; code - contains the bits to shift into the bit buffer (LSB-aligned) +; %1 - temp register +; %2 - low byte of temp register +; %3 - second byte of temp register +; %4-%8 (optional) - extra instructions to execute before the macro completes +; %9 - the label to which to jump when the macro completes +; +; Upon completion, free_bits will be set to the number of remaining bits from +; code, and put_buffer will contain those remaining bits. temp and code will +; be clobbered. +; +; This macro encodes any 0xFF bytes as 0xFF 0x00, as does the EMIT_BYTE() +; macro in jchuff.c. -%macro PUT_BITS 1 - add put_bits, ecx ; put_bits += size; - shl put_buffer, cl ; put_buffer = (put_buffer << size); - or put_buffer, %1 +%macro EMIT_QWORD 9 +%define %%temp %1 +%define %%tempb %2 +%define %%temph %3 + add nbits, free_bits ; nbits += free_bits; + neg free_bits ; free_bits = -free_bits; + movq mm_temp, mm_code ; temp = code; + movd mm_nbits, nbits ; nbits --> MMX register + movd mm_overflow_bits, free_bits ; overflow_bits (temp register) = free_bits; + neg free_bits ; free_bits = -free_bits; + psllq mm_put_buffer, mm_nbits ; put_buffer <<= nbits; + psrlq mm_temp, mm_overflow_bits ; temp >>= overflow_bits; + add free_bits, 64 ; free_bits += 64; + por mm_temp, mm_put_buffer ; temp |= put_buffer; +%ifidn %%temp, nbits_base + movd mm_save_nbits, nbits_base ; save nbits_base +%endif + movq mm_code_bits, mm_temp ; code_bits (temp register) = temp; + movq mm_put_buffer, mm_code ; put_buffer = code; + pcmpeqb mm_temp, mm_all_0xff ; b_temp[i] = (b_temp[i] == 0xFF ? 
0xFF : 0); + movq mm_code, mm_code_bits ; code = code_bits; + psrlq mm_code_bits, 32 ; code_bits >>= 32; + pmovmskb nbits, mm_temp ; nbits = 0; nbits |= ((b_temp[i] >> 7) << i); + movd %%temp, mm_code_bits ; temp = code_bits; + bswap %%temp ; temp = htonl(temp); + test nbits, nbits ; if (nbits != 0) /* Some 0xFF bytes */ + jnz %%.SLOW ; goto %%.SLOW + mov dword [buffer], %%temp ; *(uint32_t)buffer = temp; +%ifidn %%temp, nbits_base + movd nbits_base, mm_save_nbits ; restore nbits_base +%endif + %4 + movd nbits, mm_code ; nbits = (uint32_t)(code); + %5 + bswap nbits ; nbits = htonl(nbits); + mov dword [buffer + 4], nbits ; *(uint32_t)(buffer + 4) = nbits; + lea buffer, [buffer + 8] ; buffer += 8; + %6 + %7 + %8 + jmp %9 ; return +%%.SLOW: + ; Execute the equivalent of the EMIT_BYTE() macro in jchuff.c for all 8 + ; bytes in the qword. + mov byte [buffer], %%tempb ; buffer[0] = temp[0]; + cmp %%tempb, 0xFF ; Set CF if temp[0] < 0xFF + mov byte [buffer+1], 0 ; buffer[1] = 0; + sbb buffer, -2 ; buffer -= (-2 + (temp[0] < 0xFF ? 1 : 0)); + mov byte [buffer], %%temph ; buffer[0] = temp[1]; + cmp %%temph, 0xFF ; Set CF if temp[1] < 0xFF + mov byte [buffer+1], 0 ; buffer[1] = 0; + sbb buffer, -2 ; buffer -= (-2 + (temp[1] < 0xFF ? 1 : 0)); + shr %%temp, 16 ; temp >>= 16; + mov byte [buffer], %%tempb ; buffer[0] = temp[0]; + cmp %%tempb, 0xFF ; Set CF if temp[0] < 0xFF + mov byte [buffer+1], 0 ; buffer[1] = 0; + sbb buffer, -2 ; buffer -= (-2 + (temp[0] < 0xFF ? 1 : 0)); + mov byte [buffer], %%temph ; buffer[0] = temp[1]; + cmp %%temph, 0xFF ; Set CF if temp[1] < 0xFF + mov byte [buffer+1], 0 ; buffer[1] = 0; + sbb buffer, -2 ; buffer -= (-2 + (temp[1] < 0xFF ? 
1 : 0)); + movd nbits, mm_code ; nbits (temp register) = (uint32_t)(code) +%ifidn %%temp, nbits_base + movd nbits_base, mm_save_nbits ; restore nbits_base +%endif + bswap nbits ; nbits = htonl(nbits) + mov byte [buffer], nbitsb ; buffer[0] = nbits[0]; + cmp nbitsb, 0xFF ; Set CF if nbits[0] < 0xFF + mov byte [buffer+1], 0 ; buffer[1] = 0; + sbb buffer, -2 ; buffer -= (-2 + (nbits[0] < 0xFF ? 1 : 0)); + mov byte [buffer], nbitsh ; buffer[0] = nbits[1]; + cmp nbitsh, 0xFF ; Set CF if nbits[1] < 0xFF + mov byte [buffer+1], 0 ; buffer[1] = 0; + sbb buffer, -2 ; buffer -= (-2 + (nbits[1] < 0xFF ? 1 : 0)); + shr nbits, 16 ; nbits >>= 16; + mov byte [buffer], nbitsb ; buffer[0] = nbits[0]; + cmp nbitsb, 0xFF ; Set CF if nbits[0] < 0xFF + mov byte [buffer+1], 0 ; buffer[1] = 0; + sbb buffer, -2 ; buffer -= (-2 + (nbits[0] < 0xFF ? 1 : 0)); + mov byte [buffer], nbitsh ; buffer[0] = nbits[1]; + %4 + cmp nbitsh, 0xFF ; Set CF if nbits[1] < 0xFF + mov byte [buffer+1], 0 ; buffer[1] = 0; + sbb buffer, -2 ; buffer -= (-2 + (nbits[1] < 0xFF ? 
1 : 0)); + %5 + %6 + %7 + %8 + jmp %9 ; return; %endmacro -%macro CHECKBUF15 0 - cmp put_bits, 16 ; if (put_bits > 31) { - jl %%.CHECKBUF15_END - mov eax, POINTER [esp+buffer] - EMIT_BYTE - EMIT_BYTE - mov POINTER [esp+buffer], eax -%%.CHECKBUF15_END: +%macro PUSH 1 + push %1 +%assign stack_offset stack_offset + 4 %endmacro -%macro EMIT_BITS 1 - PUT_BITS %1 - CHECKBUF15 +%macro POP 1 + pop %1 +%assign stack_offset stack_offset - 4 %endmacro -%macro kloop_prepare 37 ;(ko, jno0, ..., jno31, xmm0, xmm1, xmm2, xmm3) - pxor xmm4, xmm4 ; __m128i neg = _mm_setzero_si128(); - pxor xmm5, xmm5 ; __m128i neg = _mm_setzero_si128(); - pxor xmm6, xmm6 ; __m128i neg = _mm_setzero_si128(); - pxor xmm7, xmm7 ; __m128i neg = _mm_setzero_si128(); - pinsrw %34, word [esi + %2 * SIZEOF_WORD], 0 ; xmm_shadow[0] = block[jno0]; - pinsrw %35, word [esi + %10 * SIZEOF_WORD], 0 ; xmm_shadow[8] = block[jno8]; - pinsrw %36, word [esi + %18 * SIZEOF_WORD], 0 ; xmm_shadow[16] = block[jno16]; - pinsrw %37, word [esi + %26 * SIZEOF_WORD], 0 ; xmm_shadow[24] = block[jno24]; - pinsrw %34, word [esi + %3 * SIZEOF_WORD], 1 ; xmm_shadow[1] = block[jno1]; - pinsrw %35, word [esi + %11 * SIZEOF_WORD], 1 ; xmm_shadow[9] = block[jno9]; - pinsrw %36, word [esi + %19 * SIZEOF_WORD], 1 ; xmm_shadow[17] = block[jno17]; - pinsrw %37, word [esi + %27 * SIZEOF_WORD], 1 ; xmm_shadow[25] = block[jno25]; - pinsrw %34, word [esi + %4 * SIZEOF_WORD], 2 ; xmm_shadow[2] = block[jno2]; - pinsrw %35, word [esi + %12 * SIZEOF_WORD], 2 ; xmm_shadow[10] = block[jno10]; - pinsrw %36, word [esi + %20 * SIZEOF_WORD], 2 ; xmm_shadow[18] = block[jno18]; - pinsrw %37, word [esi + %28 * SIZEOF_WORD], 2 ; xmm_shadow[26] = block[jno26]; - pinsrw %34, word [esi + %5 * SIZEOF_WORD], 3 ; xmm_shadow[3] = block[jno3]; - pinsrw %35, word [esi + %13 * SIZEOF_WORD], 3 ; xmm_shadow[11] = block[jno11]; - pinsrw %36, word [esi + %21 * SIZEOF_WORD], 3 ; xmm_shadow[19] = block[jno19]; - pinsrw %37, word [esi + %29 * SIZEOF_WORD], 3 ; 
xmm_shadow[27] = block[jno27]; - pinsrw %34, word [esi + %6 * SIZEOF_WORD], 4 ; xmm_shadow[4] = block[jno4]; - pinsrw %35, word [esi + %14 * SIZEOF_WORD], 4 ; xmm_shadow[12] = block[jno12]; - pinsrw %36, word [esi + %22 * SIZEOF_WORD], 4 ; xmm_shadow[20] = block[jno20]; - pinsrw %37, word [esi + %30 * SIZEOF_WORD], 4 ; xmm_shadow[28] = block[jno28]; - pinsrw %34, word [esi + %7 * SIZEOF_WORD], 5 ; xmm_shadow[5] = block[jno5]; - pinsrw %35, word [esi + %15 * SIZEOF_WORD], 5 ; xmm_shadow[13] = block[jno13]; - pinsrw %36, word [esi + %23 * SIZEOF_WORD], 5 ; xmm_shadow[21] = block[jno21]; - pinsrw %37, word [esi + %31 * SIZEOF_WORD], 5 ; xmm_shadow[29] = block[jno29]; - pinsrw %34, word [esi + %8 * SIZEOF_WORD], 6 ; xmm_shadow[6] = block[jno6]; - pinsrw %35, word [esi + %16 * SIZEOF_WORD], 6 ; xmm_shadow[14] = block[jno14]; - pinsrw %36, word [esi + %24 * SIZEOF_WORD], 6 ; xmm_shadow[22] = block[jno22]; - pinsrw %37, word [esi + %32 * SIZEOF_WORD], 6 ; xmm_shadow[30] = block[jno30]; - pinsrw %34, word [esi + %9 * SIZEOF_WORD], 7 ; xmm_shadow[7] = block[jno7]; - pinsrw %35, word [esi + %17 * SIZEOF_WORD], 7 ; xmm_shadow[15] = block[jno15]; - pinsrw %36, word [esi + %25 * SIZEOF_WORD], 7 ; xmm_shadow[23] = block[jno23]; -%if %1 != 32 - pinsrw %37, word [esi + %33 * SIZEOF_WORD], 7 ; xmm_shadow[31] = block[jno31]; +; If PIC is defined, load the address of a symbol defined in this file into a +; register. Equivalent to +; get_GOT %1 +; lea %1, [GOTOFF(%1, %2)] +; without using the GOT. +; +; Usage: +; %1 - register into which to load the address of the symbol +; %2 - symbol whose address should be loaded +; %3 - optional multi-line macro to execute before the symbol address is loaded +; %4 - optional multi-line macro to execute after the symbol address is loaded +; +; If PIC is not defined, then %3 and %4 are executed in order. 
+ +%macro GET_SYM 2-4 +%ifdef PIC + call %%.geteip +%%.ref: + %4 + add %1, %2 - %%.ref + jmp short %%.done + align 32 +%%.geteip: + %3 4 ; must adjust stack pointer because of call + mov %1, POINTER [esp] + ret + align 32 +%%.done: %else - pinsrw %37, ecx, 7 ; xmm_shadow[31] = block[jno31]; + %3 0 + %4 %endif - pcmpgtw xmm4, %34 ; neg = _mm_cmpgt_epi16(neg, x1); - pcmpgtw xmm5, %35 ; neg = _mm_cmpgt_epi16(neg, x1); - pcmpgtw xmm6, %36 ; neg = _mm_cmpgt_epi16(neg, x1); - pcmpgtw xmm7, %37 ; neg = _mm_cmpgt_epi16(neg, x1); - paddw %34, xmm4 ; x1 = _mm_add_epi16(x1, neg); - paddw %35, xmm5 ; x1 = _mm_add_epi16(x1, neg); - paddw %36, xmm6 ; x1 = _mm_add_epi16(x1, neg); - paddw %37, xmm7 ; x1 = _mm_add_epi16(x1, neg); - pxor %34, xmm4 ; x1 = _mm_xor_si128(x1, neg); - pxor %35, xmm5 ; x1 = _mm_xor_si128(x1, neg); - pxor %36, xmm6 ; x1 = _mm_xor_si128(x1, neg); - pxor %37, xmm7 ; x1 = _mm_xor_si128(x1, neg); - pxor xmm4, %34 ; neg = _mm_xor_si128(neg, x1); - pxor xmm5, %35 ; neg = _mm_xor_si128(neg, x1); - pxor xmm6, %36 ; neg = _mm_xor_si128(neg, x1); - pxor xmm7, %37 ; neg = _mm_xor_si128(neg, x1); - movdqa XMMWORD [esp + t1 + %1 * SIZEOF_WORD], %34 ; _mm_storeu_si128((__m128i *)(t1 + ko), x1); - movdqa XMMWORD [esp + t1 + (%1 + 8) * SIZEOF_WORD], %35 ; _mm_storeu_si128((__m128i *)(t1 + ko + 8), x1); - movdqa XMMWORD [esp + t1 + (%1 + 16) * SIZEOF_WORD], %36 ; _mm_storeu_si128((__m128i *)(t1 + ko + 16), x1); - movdqa XMMWORD [esp + t1 + (%1 + 24) * SIZEOF_WORD], %37 ; _mm_storeu_si128((__m128i *)(t1 + ko + 24), x1); - movdqa XMMWORD [esp + t2 + %1 * SIZEOF_WORD], xmm4 ; _mm_storeu_si128((__m128i *)(t2 + ko), neg); - movdqa XMMWORD [esp + t2 + (%1 + 8) * SIZEOF_WORD], xmm5 ; _mm_storeu_si128((__m128i *)(t2 + ko + 8), neg); - movdqa XMMWORD [esp + t2 + (%1 + 16) * SIZEOF_WORD], xmm6 ; _mm_storeu_si128((__m128i *)(t2 + ko + 16), neg); - movdqa XMMWORD [esp + t2 + (%1 + 24) * SIZEOF_WORD], xmm7 ; _mm_storeu_si128((__m128i *)(t2 + ko + 24), neg); %endmacro ; @@ -152,272 
+275,487 @@ ; JCOEFPTR block, int last_dc_val, ; c_derived_tbl *dctbl, c_derived_tbl *actbl) ; - -; eax + 8 = working_state *state -; eax + 12 = JOCTET *buffer -; eax + 16 = JCOEFPTR block -; eax + 20 = int last_dc_val -; eax + 24 = c_derived_tbl *dctbl -; eax + 28 = c_derived_tbl *actbl - -%define pad 6 * SIZEOF_DWORD ; Align to 16 bytes -%define t1 pad -%define t2 t1 + (DCTSIZE2 * SIZEOF_WORD) -%define block t2 + (DCTSIZE2 * SIZEOF_WORD) -%define actbl block + SIZEOF_DWORD -%define buffer actbl + SIZEOF_DWORD -%define temp buffer + SIZEOF_DWORD -%define temp2 temp + SIZEOF_DWORD -%define temp3 temp2 + SIZEOF_DWORD -%define temp4 temp3 + SIZEOF_DWORD -%define temp5 temp4 + SIZEOF_DWORD -%define gotptr temp5 + SIZEOF_DWORD ; void *gotptr -%define put_buffer ebx -%define put_bits edi +; Stack layout: +; Function args +; Return address +; Saved ebx +; Saved ebp +; Saved esi +; Saved edi <-- esp_save +; ... +; esp_save +; t_ 64*2 bytes (aligned to 128 bytes) +; +; esp is used (as t) to point into t_ (data in lower indices is not used once +; esp passes over them, so this is signal-safe.) Aligning to 128 bytes allows +; us to find the rest of the data again. +; +; NOTES: +; When shuffling data, we try to avoid pinsrw as much as possible, since it is +; slow on many CPUs. Its reciprocal throughput (issue latency) is 1 even on +; modern CPUs, so chains of pinsrw instructions (even with different outputs) +; can limit performance. pinsrw is a VectorPath instruction on AMD K8 and +; requires 2 µops (with memory operand) on Intel. In either case, only one +; pinsrw instruction can be decoded per cycle (and nothing else if they are +; back-to-back), so out-of-order execution cannot be used to work around long +; pinsrw chains (though for Sandy Bridge and later, this may be less of a +; problem if the code runs from the µop cache.) +; +; We use tzcnt instead of bsf without checking for support. 
The instruction is +; executed as bsf on CPUs that don't support tzcnt (encoding is equivalent to +; rep bsf.) The destination (first) operand of bsf (and tzcnt on some CPUs) is +; an input dependency (although the behavior is not formally defined, Intel +; CPUs usually leave the destination unmodified if the source is zero.) This +; can prevent out-of-order execution, so we clear the destination before +; invoking tzcnt. +; +; Initial register allocation +; eax - frame --> buffer +; ebx - nbits_base (PIC) / emit_temp +; ecx - dctbl --> size --> state +; edx - block --> nbits +; esi - code_temp --> state --> actbl +; edi - index_temp --> free_bits +; esp - t +; ebp - index + +%define frame eax +%ifdef PIC +%define nbits_base ebx +%endif +%define emit_temp ebx +%define emit_tempb bl +%define emit_temph bh +%define dctbl ecx +%define block edx +%define code_temp esi +%define index_temp edi +%define t esp +%define index ebp + +%assign save_frame DCTSIZE2 * SIZEOF_WORD + +; Step 1: Re-arrange input data according to jpeg_natural_order +; xx 01 02 03 04 05 06 07 xx 01 08 16 09 02 03 10 +; 08 09 10 11 12 13 14 15 17 24 32 25 18 11 04 05 +; 16 17 18 19 20 21 22 23 12 19 26 33 40 48 41 34 +; 24 25 26 27 28 29 30 31 ==> 27 20 13 06 07 14 21 28 +; 32 33 34 35 36 37 38 39 35 42 49 56 57 50 43 36 +; 40 41 42 43 44 45 46 47 29 22 15 23 30 37 44 51 +; 48 49 50 51 52 53 54 55 58 59 52 45 38 31 39 46 +; 56 57 58 59 60 61 62 63 53 60 61 54 47 55 62 63 align 32 GLOBAL_FUNCTION(jsimd_huff_encode_one_block_sse2) EXTN(jsimd_huff_encode_one_block_sse2): - push ebp - mov eax, esp ; eax = original ebp - sub esp, byte 4 - and esp, byte (-SIZEOF_XMMWORD) ; align to 128 bits - mov [esp], eax - mov ebp, esp ; ebp = aligned ebp - sub esp, temp5+9*SIZEOF_DWORD-pad - push ebx - push ecx -; push edx ; need not be preserved - push esi - push edi - push ebp - - mov esi, POINTER [eax+8] ; (working_state *state) - mov put_buffer, dword [esi+8] ; put_buffer = state->cur.put_buffer; - mov put_bits, 
dword [esi+12] ; put_bits = state->cur.put_bits; - push esi ; esi is now scratch - - get_GOT edx ; get GOT address - movpic POINTER [esp+gotptr], edx ; save GOT address - - mov ecx, POINTER [eax+28] - mov edx, POINTER [eax+16] - mov esi, POINTER [eax+12] - mov POINTER [esp+actbl], ecx - mov POINTER [esp+block], edx - mov POINTER [esp+buffer], esi - - ; Encode the DC coefficient difference per section F.1.2.1 - mov esi, POINTER [esp+block] ; block - movsx ecx, word [esi] ; temp = temp2 = block[0] - last_dc_val; - sub ecx, dword [eax+20] - mov esi, ecx - - ; This is a well-known technique for obtaining the absolute value - ; with out a branch. It is derived from an assembly language technique - ; presented in "How to Optimize for the Pentium Processors", - ; Copyright (c) 1996, 1997 by Agner Fog. - mov edx, ecx - sar edx, 31 ; temp3 = temp >> (CHAR_BIT * sizeof(int) - 1); - xor ecx, edx ; temp ^= temp3; - sub ecx, edx ; temp -= temp3; - - ; For a negative input, want temp2 = bitwise complement of abs(input) - ; This code assumes we are on a two's complement machine - add esi, edx ; temp2 += temp3; - mov dword [esp+temp], esi ; backup temp2 in temp - - ; Find the number of bits needed for the magnitude of the coefficient - movpic ebp, POINTER [esp+gotptr] ; load GOT address (ebp) - movzx edx, byte [GOTOFF(ebp, EXTN(jpeg_nbits_table) + ecx)] ; nbits = JPEG_NBITS(temp); - mov dword [esp+temp2], edx ; backup nbits in temp2 - - ; Emit the Huffman-coded symbol for the number of bits - mov ebp, POINTER [eax+24] ; After this point, arguments are not accessible anymore - mov eax, INT [ebp + edx * 4] ; code = dctbl->ehufco[nbits]; - movzx ecx, byte [ebp + edx + 1024] ; size = dctbl->ehufsi[nbits]; - EMIT_BITS eax ; EMIT_BITS(code, size) - - mov ecx, dword [esp+temp2] ; restore nbits - - ; Mask off any extra bits in code - mov eax, 1 - shl eax, cl - dec eax - and eax, dword [esp+temp] ; temp2 &= (((JLONG)1)<>= r; - mov dword [esp+temp3], edx -.BRLOOP: - cmp ecx, 16 ; while (r > 
15) { - jl near .ERLOOP - sub ecx, 16 ; r -= 16; - mov dword [esp+temp], ecx - mov eax, INT [ebp + 240 * 4] ; code_0xf0 = actbl->ehufco[0xf0]; - movzx ecx, byte [ebp + 1024 + 240] ; size_0xf0 = actbl->ehufsi[0xf0]; - EMIT_BITS eax ; EMIT_BITS(code_0xf0, size_0xf0) - mov ecx, dword [esp+temp] - jmp .BRLOOP -.ERLOOP: - movsx eax, word [esi] ; temp = t1[k]; - movpic edx, POINTER [esp+gotptr] ; load GOT address (edx) - movzx eax, byte [GOTOFF(edx, EXTN(jpeg_nbits_table) + eax)] ; nbits = JPEG_NBITS(temp); - mov dword [esp+temp2], eax - ; Emit Huffman symbol for run length / number of bits - shl ecx, 4 ; temp3 = (r << 4) + nbits; - add ecx, eax - mov eax, INT [ebp + ecx * 4] ; code = actbl->ehufco[temp3]; - movzx ecx, byte [ebp + ecx + 1024] ; size = actbl->ehufsi[temp3]; - EMIT_BITS eax - - movsx edx, word [esi+DCTSIZE2*2] ; temp2 = t2[k]; - ; Mask off any extra bits in code - mov ecx, dword [esp+temp2] - mov eax, 1 - shl eax, cl - dec eax - and eax, edx ; temp2 &= (((JLONG)1)<>= 1; - - jmp .BLOOP -.ELOOP: - movdqa xmm0, XMMWORD [esp + t1 + 32 * SIZEOF_WORD] ; __m128i tmp0 = _mm_loadu_si128((__m128i *)(t1 + 0)); - movdqa xmm1, XMMWORD [esp + t1 + 40 * SIZEOF_WORD] ; __m128i tmp1 = _mm_loadu_si128((__m128i *)(t1 + 8)); - movdqa xmm2, XMMWORD [esp + t1 + 48 * SIZEOF_WORD] ; __m128i tmp2 = _mm_loadu_si128((__m128i *)(t1 + 16)); - movdqa xmm3, XMMWORD [esp + t1 + 56 * SIZEOF_WORD] ; __m128i tmp3 = _mm_loadu_si128((__m128i *)(t1 + 24)); - pcmpeqw xmm0, xmm7 ; tmp0 = _mm_cmpeq_epi16(tmp0, zero); - pcmpeqw xmm1, xmm7 ; tmp1 = _mm_cmpeq_epi16(tmp1, zero); - pcmpeqw xmm2, xmm7 ; tmp2 = _mm_cmpeq_epi16(tmp2, zero); - pcmpeqw xmm3, xmm7 ; tmp3 = _mm_cmpeq_epi16(tmp3, zero); - packsswb xmm0, xmm1 ; tmp0 = _mm_packs_epi16(tmp0, tmp1); - packsswb xmm2, xmm3 ; tmp2 = _mm_packs_epi16(tmp2, tmp3); - pmovmskb edx, xmm0 ; index = ((uint64_t)_mm_movemask_epi8(tmp0)) << 0; - pmovmskb ecx, xmm2 ; index = ((uint64_t)_mm_movemask_epi8(tmp2)) << 16; - shl ecx, 16 - or edx, ecx - not edx ; 
index = ~index; - - lea eax, [esp + t1 + (DCTSIZE2/2) * 2] - sub eax, esi - shr eax, 1 - bsf ecx, edx ; r = __builtin_ctzl(index); - jz near .ELOOP2 - shr edx, cl ; index >>= r; - add ecx, eax - lea esi, [esi+ecx*2] ; k += r; - mov dword [esp+temp3], edx - jmp .BRLOOP2 -.BLOOP2: - bsf ecx, edx ; r = __builtin_ctzl(index); - jz near .ELOOP2 - lea esi, [esi+ecx*2] ; k += r; - shr edx, cl ; index >>= r; - mov dword [esp+temp3], edx -.BRLOOP2: - cmp ecx, 16 ; while (r > 15) { - jl near .ERLOOP2 - sub ecx, 16 ; r -= 16; - mov dword [esp+temp], ecx - mov eax, INT [ebp + 240 * 4] ; code_0xf0 = actbl->ehufco[0xf0]; - movzx ecx, byte [ebp + 1024 + 240] ; size_0xf0 = actbl->ehufsi[0xf0]; - EMIT_BITS eax ; EMIT_BITS(code_0xf0, size_0xf0) - mov ecx, dword [esp+temp] - jmp .BRLOOP2 -.ERLOOP2: - movsx eax, word [esi] ; temp = t1[k]; - bsr eax, eax ; nbits = 32 - __builtin_clz(temp); - inc eax - mov dword [esp+temp2], eax - ; Emit Huffman symbol for run length / number of bits - shl ecx, 4 ; temp3 = (r << 4) + nbits; - add ecx, eax - mov eax, INT [ebp + ecx * 4] ; code = actbl->ehufco[temp3]; - movzx ecx, byte [ebp + ecx + 1024] ; size = actbl->ehufsi[temp3]; - EMIT_BITS eax - - movsx edx, word [esi+DCTSIZE2*2] ; temp2 = t2[k]; - ; Mask off any extra bits in code - mov ecx, dword [esp+temp2] - mov eax, 1 - shl eax, cl - dec eax - and eax, edx ; temp2 &= (((JLONG)1)<>= 1; - - jmp .BLOOP2 -.ELOOP2: - ; If the last coef(s) were zero, emit an end-of-block code - lea edx, [esp + t1 + (DCTSIZE2-1) * 2] ; r = DCTSIZE2-1-k; - cmp edx, esi ; if (r > 0) { - je .EFN - mov eax, INT [ebp] ; code = actbl->ehufco[0]; - movzx ecx, byte [ebp + 1024] ; size = actbl->ehufsi[0]; - EMIT_BITS eax -.EFN: - mov eax, [esp+buffer] - pop esi - ; Save put_buffer & put_bits - mov dword [esi+8], put_buffer ; state->cur.put_buffer = put_buffer; - mov dword [esi+12], put_bits ; state->cur.put_bits = put_bits; - - pop ebp - pop edi - pop esi -; pop edx ; need not be preserved - pop ecx - pop ebx - mov esp, ebp ; 
esp <- aligned ebp - pop esp ; esp <- original ebp - pop ebp + +%assign stack_offset 0 +%define arg_state 4 + stack_offset +%define arg_buffer 8 + stack_offset +%define arg_block 12 + stack_offset +%define arg_last_dc_val 16 + stack_offset +%define arg_dctbl 20 + stack_offset +%define arg_actbl 24 + stack_offset + + ;X: X = code stream + mov block, [esp + arg_block] + PUSH ebx + PUSH ebp + movups xmm3, XMMWORD [block + 0 * SIZEOF_WORD] ;D: w3 = xx 01 02 03 04 05 06 07 + PUSH esi + PUSH edi + movdqa xmm0, xmm3 ;A: w0 = xx 01 02 03 04 05 06 07 + mov frame, esp + lea t, [frame - (save_frame + 4)] + movups xmm1, XMMWORD [block + 8 * SIZEOF_WORD] ;B: w1 = 08 09 10 11 12 13 14 15 + and t, -DCTSIZE2 * SIZEOF_WORD ; t = &t_[0] + mov [t + save_frame], frame + pxor xmm4, xmm4 ;A: w4[i] = 0; + punpckldq xmm0, xmm1 ;A: w0 = xx 01 08 09 02 03 10 11 + pshuflw xmm0, xmm0, 11001001b ;A: w0 = 01 08 xx 09 02 03 10 11 + pinsrw xmm0, word [block + 16 * SIZEOF_WORD], 2 ;A: w0 = 01 08 16 09 02 03 10 11 + punpckhdq xmm3, xmm1 ;D: w3 = 04 05 12 13 06 07 14 15 + punpcklqdq xmm1, xmm3 ;B: w1 = 08 09 10 11 04 05 12 13 + pinsrw xmm0, word [block + 17 * SIZEOF_WORD], 7 ;A: w0 = 01 08 16 09 02 03 10 17 + ;A: (Row 0, offset 1) + pcmpgtw xmm4, xmm0 ;A: w4[i] = (w0[i] < 0 ? 
-1 : 0); + paddw xmm0, xmm4 ;A: w0[i] += w4[i]; + movaps XMMWORD [t + 0 * SIZEOF_WORD], xmm0 ;A: t[i] = w0[i]; + + movq xmm2, qword [block + 24 * SIZEOF_WORD] ;B: w2 = 24 25 26 27 -- -- -- -- + pshuflw xmm2, xmm2, 11011000b ;B: w2 = 24 26 25 27 -- -- -- -- + pslldq xmm1, 1 * SIZEOF_WORD ;B: w1 = -- 08 09 10 11 04 05 12 + movups xmm5, XMMWORD [block + 48 * SIZEOF_WORD] ;H: w5 = 48 49 50 51 52 53 54 55 + movsd xmm1, xmm2 ;B: w1 = 24 26 25 27 11 04 05 12 + punpcklqdq xmm2, xmm5 ;C: w2 = 24 26 25 27 48 49 50 51 + pinsrw xmm1, word [block + 32 * SIZEOF_WORD], 1 ;B: w1 = 24 32 25 27 11 04 05 12 + pxor xmm4, xmm4 ;A: w4[i] = 0; + psrldq xmm3, 2 * SIZEOF_WORD ;D: w3 = 12 13 06 07 14 15 -- -- + pcmpeqw xmm0, xmm4 ;A: w0[i] = (w0[i] == 0 ? -1 : 0); + pinsrw xmm1, word [block + 18 * SIZEOF_WORD], 3 ;B: w1 = 24 32 25 18 11 04 05 12 + ; (Row 1, offset 1) + pcmpgtw xmm4, xmm1 ;B: w4[i] = (w1[i] < 0 ? -1 : 0); + paddw xmm1, xmm4 ;B: w1[i] += w4[i]; + movaps XMMWORD [t + 8 * SIZEOF_WORD], xmm1 ;B: t[i+8] = w1[i]; + pxor xmm4, xmm4 ;B: w4[i] = 0; + pcmpeqw xmm1, xmm4 ;B: w1[i] = (w1[i] == 0 ? -1 : 0); + + packsswb xmm0, xmm1 ;AB: b0[i] = w0[i], b0[i+8] = w1[i] + ; w/ signed saturation + + pinsrw xmm3, word [block + 20 * SIZEOF_WORD], 0 ;D: w3 = 20 13 06 07 14 15 -- -- + pinsrw xmm3, word [block + 21 * SIZEOF_WORD], 5 ;D: w3 = 20 13 06 07 14 21 -- -- + pinsrw xmm3, word [block + 28 * SIZEOF_WORD], 6 ;D: w3 = 20 13 06 07 14 21 28 -- + pinsrw xmm3, word [block + 35 * SIZEOF_WORD], 7 ;D: w3 = 20 13 06 07 14 21 28 35 + ; (Row 3, offset 1) + pcmpgtw xmm4, xmm3 ;D: w4[i] = (w3[i] < 0 ? -1 : 0); + paddw xmm3, xmm4 ;D: w3[i] += w4[i]; + movaps XMMWORD [t + 24 * SIZEOF_WORD], xmm3 ;D: t[i+24] = w3[i]; + pxor xmm4, xmm4 ;D: w4[i] = 0; + pcmpeqw xmm3, xmm4 ;D: w3[i] = (w3[i] == 0 ? 
-1 : 0); + + pinsrw xmm2, word [block + 19 * SIZEOF_WORD], 0 ;C: w2 = 19 26 25 27 48 49 50 51 + pinsrw xmm2, word [block + 33 * SIZEOF_WORD], 2 ;C: w2 = 19 26 33 27 48 49 50 51 + pinsrw xmm2, word [block + 40 * SIZEOF_WORD], 3 ;C: w2 = 19 26 33 40 48 49 50 51 + pinsrw xmm2, word [block + 41 * SIZEOF_WORD], 5 ;C: w2 = 19 26 33 40 48 41 50 51 + pinsrw xmm2, word [block + 34 * SIZEOF_WORD], 6 ;C: w2 = 19 26 33 40 48 41 34 51 + pinsrw xmm2, word [block + 27 * SIZEOF_WORD], 7 ;C: w2 = 19 26 33 40 48 41 34 27 + ; (Row 2, offset 1) + pcmpgtw xmm4, xmm2 ;C: w4[i] = (w2[i] < 0 ? -1 : 0); + paddw xmm2, xmm4 ;C: w2[i] += w4[i]; + movsx code_temp, word [block] ;Z: code_temp = block[0]; + +; %1 - stack pointer adjustment +%macro GET_SYM_BEFORE 1 + movaps XMMWORD [t + 16 * SIZEOF_WORD + %1], xmm2 + ;C: t[i+16] = w2[i]; + pxor xmm4, xmm4 ;C: w4[i] = 0; + pcmpeqw xmm2, xmm4 ;C: w2[i] = (w2[i] == 0 ? -1 : 0); + sub code_temp, [frame + arg_last_dc_val] ;Z: code_temp -= last_dc_val; + + packsswb xmm2, xmm3 ;CD: b2[i] = w2[i], b2[i+8] = w3[i] + ; w/ signed saturation + + movdqa xmm3, xmm5 ;H: w3 = 48 49 50 51 52 53 54 55 + pmovmskb index_temp, xmm2 ;Z: index_temp = 0; index_temp |= ((b2[i] >> 7) << i); + pmovmskb index, xmm0 ;Z: index = 0; index |= ((b0[i] >> 7) << i); + movups xmm0, XMMWORD [block + 56 * SIZEOF_WORD] ;H: w0 = 56 57 58 59 60 61 62 63 + punpckhdq xmm3, xmm0 ;H: w3 = 52 53 60 61 54 55 62 63 + shl index_temp, 16 ;Z: index_temp <<= 16; + psrldq xmm3, 1 * SIZEOF_WORD ;H: w3 = 53 60 61 54 55 62 63 -- + pxor xmm2, xmm2 ;H: w2[i] = 0; + pshuflw xmm3, xmm3, 00111001b ;H: w3 = 60 61 54 53 55 62 63 -- + or index, index_temp ;Z: index |= index_temp; +%undef index_temp +%define free_bits edi +%endmacro + +%macro GET_SYM_AFTER 0 + movq xmm1, qword [block + 44 * SIZEOF_WORD] ;G: w1 = 44 45 46 47 -- -- -- -- + unpcklps xmm5, xmm0 ;E: w5 = 48 49 56 57 50 51 58 59 + pxor xmm0, xmm0 ;H: w0[i] = 0; + not index ;Z: index = ~index; + pinsrw xmm3, word [block + 47 * SIZEOF_WORD], 3 ;H: w3 = 
60 61 54 47 55 62 63 -- + ; (Row 7, offset 1) + pcmpgtw xmm2, xmm3 ;H: w2[i] = (w3[i] < 0 ? -1 : 0); + mov dctbl, [frame + arg_dctbl] + paddw xmm3, xmm2 ;H: w3[i] += w2[i]; + movaps XMMWORD [t + 56 * SIZEOF_WORD], xmm3 ;H: t[i+56] = w3[i]; + movq xmm4, qword [block + 36 * SIZEOF_WORD] ;G: w4 = 36 37 38 39 -- -- -- -- + pcmpeqw xmm3, xmm0 ;H: w3[i] = (w3[i] == 0 ? -1 : 0); + punpckldq xmm4, xmm1 ;G: w4 = 36 37 44 45 38 39 46 47 + movdqa xmm1, xmm4 ;F: w1 = 36 37 44 45 38 39 46 47 + pcmpeqw mm_all_0xff, mm_all_0xff ;Z: all_0xff[i] = 0xFF; +%endmacro + + GET_SYM nbits_base, jpeg_nbits_table, GET_SYM_BEFORE, GET_SYM_AFTER + + psrldq xmm4, 1 * SIZEOF_WORD ;G: w4 = 37 44 45 38 39 46 47 -- + shufpd xmm1, xmm5, 10b ;F: w1 = 36 37 44 45 50 51 58 59 + pshufhw xmm4, xmm4, 11010011b ;G: w4 = 37 44 45 38 -- 39 46 -- + pslldq xmm1, 1 * SIZEOF_WORD ;F: w1 = -- 36 37 44 45 50 51 58 + pinsrw xmm4, word [block + 59 * SIZEOF_WORD], 0 ;G: w4 = 59 44 45 38 -- 39 46 -- + pshufd xmm1, xmm1, 11011000b ;F: w1 = -- 36 45 50 37 44 51 58 + cmp code_temp, 1 << 31 ;Z: Set CF if code_temp < 0x80000000, + ;Z: i.e. if code_temp is positive + pinsrw xmm4, word [block + 52 * SIZEOF_WORD], 1 ;G: w4 = 59 52 45 38 -- 39 46 -- + movlps xmm1, qword [block + 20 * SIZEOF_WORD] ;F: w1 = 20 21 22 23 37 44 51 58 + pinsrw xmm4, word [block + 31 * SIZEOF_WORD], 4 ;G: w4 = 59 52 45 38 31 39 46 -- + pshuflw xmm1, xmm1, 01110010b ;F: w1 = 22 20 23 21 37 44 51 58 + pinsrw xmm4, word [block + 53 * SIZEOF_WORD], 7 ;G: w4 = 59 52 45 38 31 39 46 53 + ; (Row 6, offset 1) + adc code_temp, -1 ;Z: code_temp += -1 + (code_temp >= 0 ? 1 : 0); + pxor xmm2, xmm2 ;G: w2[i] = 0; + pcmpgtw xmm0, xmm4 ;G: w0[i] = (w4[i] < 0 ? 
-1 : 0); + pinsrw xmm1, word [block + 15 * SIZEOF_WORD], 1 ;F: w1 = 22 15 23 21 37 44 51 58 + paddw xmm4, xmm0 ;G: w4[i] += w0[i]; + movaps XMMWORD [t + 48 * SIZEOF_WORD], xmm4 ;G: t[48+i] = w4[i]; + movd mm_temp, code_temp ;Z: temp = code_temp + pinsrw xmm1, word [block + 30 * SIZEOF_WORD], 3 ;F: w1 = 22 15 23 30 37 44 51 58 + ; (Row 5, offset 1) + pcmpeqw xmm4, xmm2 ;G: w4[i] = (w4[i] == 0 ? -1 : 0); + + packsswb xmm4, xmm3 ;GH: b4[i] = w4[i], b4[i+8] = w3[i] + ; w/ signed saturation + + lea t, [t - SIZEOF_WORD] ;Z: t = &t[-1] + pxor xmm0, xmm0 ;F: w0[i] = 0; + pcmpgtw xmm2, xmm1 ;F: w2[i] = (w1[i] < 0 ? -1 : 0); + paddw xmm1, xmm2 ;F: w1[i] += w2[i]; + movaps XMMWORD [t + (40+1) * SIZEOF_WORD], xmm1 ;F: t[40+i] = w1[i]; + pcmpeqw xmm1, xmm0 ;F: w1[i] = (w1[i] == 0 ? -1 : 0); + pinsrw xmm5, word [block + 42 * SIZEOF_WORD], 0 ;E: w5 = 42 49 56 57 50 51 58 59 + pinsrw xmm5, word [block + 43 * SIZEOF_WORD], 5 ;E: w5 = 42 49 56 57 50 43 58 59 + pinsrw xmm5, word [block + 36 * SIZEOF_WORD], 6 ;E: w5 = 42 49 56 57 50 43 36 59 + pinsrw xmm5, word [block + 29 * SIZEOF_WORD], 7 ;E: w5 = 42 49 56 57 50 43 36 29 + ; (Row 4, offset 1) +%undef block +%define nbits edx +%define nbitsb dl +%define nbitsh dh + movzx nbits, byte [NBITS(code_temp)] ;Z: nbits = JPEG_NBITS(code_temp); +%undef code_temp +%define state esi + pxor xmm2, xmm2 ;E: w2[i] = 0; + mov state, [frame + arg_state] + movd mm_nbits, nbits ;Z: nbits --> MMX register + pcmpgtw xmm0, xmm5 ;E: w0[i] = (w5[i] < 0 ? -1 : 0); + movd mm_code, dword [dctbl + c_derived_tbl.ehufco + nbits * 4] + ;Z: code = dctbl->ehufco[nbits]; +%define size ecx +%define sizeb cl +%define sizeh ch + paddw xmm5, xmm0 ;E: w5[i] += w0[i]; + movaps XMMWORD [t + (32+1) * SIZEOF_WORD], xmm5 ;E: t[32+i] = w5[i]; + movzx size, byte [dctbl + c_derived_tbl.ehufsi + nbits] + ;Z: size = dctbl->ehufsi[nbits]; +%undef dctbl + pcmpeqw xmm5, xmm2 ;E: w5[i] = (w5[i] == 0 ? 
-1 : 0); + + packsswb xmm5, xmm1 ;EF: b5[i] = w5[i], b5[i+8] = w1[i] + ; w/ signed saturation + + movq mm_put_buffer, [state + working_state.cur.put_buffer.simd] + ;Z: put_buffer = state->cur.put_buffer.simd; + mov free_bits, [state + working_state.cur.free_bits] + ;Z: free_bits = state->cur.free_bits; +%undef state +%define actbl esi + mov actbl, [frame + arg_actbl] +%define buffer eax + mov buffer, [frame + arg_buffer] +%undef frame + jmp .BEGIN + +; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + align 16 +; size <= 32, so this is not really a loop +.BRLOOP1: ; .BRLOOP1: + movzx nbits, byte [actbl + c_derived_tbl.ehufsi + 0xf0] + ; nbits = actbl->ehufsi[0xf0]; + movd mm_code, dword [actbl + c_derived_tbl.ehufco + 0xf0 * 4] + ; code = actbl->ehufco[0xf0]; + and index, 0x7ffffff ; clear index if size == 32 + sub size, 16 ; size -= 16; + sub free_bits, nbits ; if ((free_bits -= nbits) <= 0) + jle .EMIT_BRLOOP1 ; goto .EMIT_BRLOOP1; + movd mm_nbits, nbits ; nbits --> MMX register + psllq mm_put_buffer, mm_nbits ; put_buffer <<= nbits; + por mm_put_buffer, mm_code ; put_buffer |= code; + jmp .ERLOOP1 ; goto .ERLOOP1; + +; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + align 16 +%ifdef PIC + times 6 nop +%else + times 2 nop +%endif +.BLOOP1: ; do { /* size = # of zero bits/elements to skip */ +; if size == 32, index remains unchanged. Correct in .BRLOOP. 
+ shr index, sizeb ; index >>= size; + lea t, [t + size * SIZEOF_WORD] ; t += size; + cmp size, 16 ; if (size > 16) + jg .BRLOOP1 ; goto .BRLOOP1; +.ERLOOP1: ; .ERLOOP1: + movsx nbits, word [t] ; nbits = *t; +%ifdef PIC + add size, size ; size += size; +%else + lea size, [size * 2] ; size += size; +%endif + movd mm_temp, nbits ; temp = nbits; + movzx nbits, byte [NBITS(nbits)] ; nbits = JPEG_NBITS(nbits); + lea size, [size * 8 + nbits] ; size = size * 8 + nbits; + movd mm_nbits, nbits ; nbits --> MMX register + movd mm_code, dword [actbl + c_derived_tbl.ehufco + (size - 16) * 4] + ; code = actbl->ehufco[size-16]; + movzx size, byte [actbl + c_derived_tbl.ehufsi + (size - 16)] + ; size = actbl->ehufsi[size-16]; +.BEGIN: ; .BEGIN: + pand mm_temp, [MASK_BITS(nbits)] ; temp &= (1 << nbits) - 1; + psllq mm_code, mm_nbits ; code <<= nbits; + add nbits, size ; nbits += size; + por mm_code, mm_temp ; code |= temp; + sub free_bits, nbits ; if ((free_bits -= nbits) <= 0) + jle .EMIT_ERLOOP1 ; insert code, flush buffer, init size, goto .BLOOP1 + xor size, size ; size = 0; /* kill tzcnt input dependency */ + tzcnt size, index ; size = # of trailing 0 bits in index + movd mm_nbits, nbits ; nbits --> MMX register + psllq mm_put_buffer, mm_nbits ; put_buffer <<= nbits; + inc size ; ++size; + por mm_put_buffer, mm_code ; put_buffer |= code; + test index, index + jnz .BLOOP1 ; } while (index != 0); +; Round 2 +; t points to the last used word, possibly below t_ if the previous index had 32 zero bits. 
+.ELOOP1: ; .ELOOP1: + pmovmskb size, xmm4 ; size = 0; size |= ((b4[i] >> 7) << i); + pmovmskb index, xmm5 ; index = 0; index |= ((b5[i] >> 7) << i); + shl size, 16 ; size <<= 16; + or index, size ; index |= size; + not index ; index = ~index; + lea nbits, [t + (1 + DCTSIZE2) * SIZEOF_WORD] + ; nbits = t + 1 + 64; + and nbits, -DCTSIZE2 * SIZEOF_WORD ; nbits &= -128; /* now points to &t_[64] */ + sub nbits, t ; nbits -= t; + shr nbits, 1 ; nbits >>= 1; /* # of leading 0 bits in old index + 33 */ + tzcnt size, index ; size = # of trailing 0 bits in index + inc size ; ++size; + test index, index ; if (index == 0) + jz .ELOOP2 ; goto .ELOOP2; +; NOTE: size == 32 cannot happen, since the last element is always 0. + shr index, sizeb ; index >>= size; + lea size, [size + nbits - 33] ; size = size + nbits - 33; + lea t, [t + size * SIZEOF_WORD] ; t += size; + cmp size, 16 ; if (size <= 16) + jle .ERLOOP2 ; goto .ERLOOP2; +.BRLOOP2: ; do { + movzx nbits, byte [actbl + c_derived_tbl.ehufsi + 0xf0] + ; nbits = actbl->ehufsi[0xf0]; + sub size, 16 ; size -= 16; + movd mm_code, dword [actbl + c_derived_tbl.ehufco + 0xf0 * 4] + ; code = actbl->ehufco[0xf0]; + sub free_bits, nbits ; if ((free_bits -= nbits) <= 0) + jle .EMIT_BRLOOP2 ; insert code and flush put_buffer + movd mm_nbits, nbits ; else { nbits --> MMX register + psllq mm_put_buffer, mm_nbits ; put_buffer <<= nbits; + por mm_put_buffer, mm_code ; put_buffer |= code; + cmp size, 16 ; if (size <= 16) + jle .ERLOOP2 ; goto .ERLOOP2; + jmp .BRLOOP2 ; } while (1); + +; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + align 16 +.BLOOP2: ; do { /* size = # of zero bits/elements to skip */ + shr index, sizeb ; index >>= size; + lea t, [t + size * SIZEOF_WORD] ; t += size; + cmp size, 16 ; if (size > 16) + jg .BRLOOP2 ; goto .BRLOOP2; +.ERLOOP2: ; .ERLOOP2: + movsx nbits, word [t] ; nbits = *t; + add size, size ; size += size; + movd mm_temp, nbits ; temp = nbits; + movzx nbits, byte [NBITS(nbits)] ; nbits = JPEG_NBITS(nbits); + movd 
mm_nbits, nbits ; nbits --> MMX register + lea size, [size * 8 + nbits] ; size = size * 8 + nbits; + movd mm_code, dword [actbl + c_derived_tbl.ehufco + (size - 16) * 4] + ; code = actbl->ehufco[size-16]; + movzx size, byte [actbl + c_derived_tbl.ehufsi + (size - 16)] + ; size = actbl->ehufsi[size-16]; + psllq mm_code, mm_nbits ; code <<= nbits; + pand mm_temp, [MASK_BITS(nbits)] ; temp &= (1 << nbits) - 1; + lea nbits, [nbits + size] ; nbits += size; + por mm_code, mm_temp ; code |= temp; + xor size, size ; size = 0; /* kill tzcnt input dependency */ + sub free_bits, nbits ; if ((free_bits -= nbits) <= 0) + jle .EMIT_ERLOOP2 ; insert code, flush buffer, init size, goto .BLOOP2 + tzcnt size, index ; size = # of trailing 0 bits in index + movd mm_nbits, nbits ; nbits --> MMX register + psllq mm_put_buffer, mm_nbits ; put_buffer <<= nbits; + inc size ; ++size; + por mm_put_buffer, mm_code ; put_buffer |= code; + test index, index + jnz .BLOOP2 ; } while (index != 0); +.ELOOP2: ; .ELOOP2: + mov nbits, t ; nbits = t; + lea t, [t + SIZEOF_WORD] ; t = &t[1]; + and nbits, DCTSIZE2 * SIZEOF_WORD - 1 ; nbits &= 127; + and t, -DCTSIZE2 * SIZEOF_WORD ; t &= -128; /* t = &t_[0]; */ + cmp nbits, (DCTSIZE2 - 2) * SIZEOF_WORD ; if (nbits != 62 * 2) + je .EFN ; { + movd mm_code, dword [actbl + c_derived_tbl.ehufco + 0] + ; code = actbl->ehufco[0]; + movzx nbits, byte [actbl + c_derived_tbl.ehufsi + 0] + ; nbits = actbl->ehufsi[0]; + sub free_bits, nbits ; if ((free_bits -= nbits) <= 0) + jg .EFN_SKIP_EMIT_CODE ; { + EMIT_QWORD size, sizeb, sizeh, , , , , , .EFN ; insert code, flush put_buffer + align 16 +.EFN_SKIP_EMIT_CODE: ; } else { + movd mm_nbits, nbits ; nbits --> MMX register + psllq mm_put_buffer, mm_nbits ; put_buffer <<= nbits; + por mm_put_buffer, mm_code ; put_buffer |= code; +.EFN: ; } } +%define frame esp + mov frame, [t + save_frame] +%define state ecx + mov state, [frame + arg_state] + movq [state + working_state.cur.put_buffer.simd], mm_put_buffer + ; 
state->cur.put_buffer.simd = put_buffer; + emms + mov [state + working_state.cur.free_bits], free_bits + ; state->cur.free_bits = free_bits; + POP edi + POP esi + POP ebp + POP ebx ret +; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + align 16 +.EMIT_BRLOOP1: + EMIT_QWORD emit_temp, emit_tempb, emit_temph, , , , , , \ + .ERLOOP1 + +; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + align 16 +.EMIT_ERLOOP1: + EMIT_QWORD size, sizeb, sizeh, \ + { xor size, size }, \ + { tzcnt size, index }, \ + { inc size }, \ + { test index, index }, \ + { jnz .BLOOP1 }, \ + .ELOOP1 + +; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + align 16 +.EMIT_BRLOOP2: + EMIT_QWORD emit_temp, emit_tempb, emit_temph, , , , \ + { cmp size, 16 }, \ + { jle .ERLOOP2 }, \ + .BRLOOP2 + +; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + align 16 +.EMIT_ERLOOP2: + EMIT_QWORD size, sizeb, sizeh, \ + { xor size, size }, \ + { tzcnt size, index }, \ + { inc size }, \ + { test index, index }, \ + { jnz .BLOOP2 }, \ + .ELOOP2 + ; For some reason, the OS X linker does not honor the request to align the ; segment unless we do this. 
align 32 diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jcphuff-sse2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jcphuff-sse2.asm --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jcphuff-sse2.asm 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jcphuff-sse2.asm 2021-11-20 03:41:33.401600402 +0000 @@ -523,6 +523,8 @@ add KK, 2 dec K jnz .BLOOPR16 + test LEN, 15 + je .PADDINGR .ELOOPR16: mov LENEND, LEN diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jfdctint-avx2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jfdctint-avx2.asm --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jfdctint-avx2.asm 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jfdctint-avx2.asm 2021-11-20 03:41:33.401600402 +0000 @@ -2,7 +2,7 @@ ; jfdctint.asm - accurate integer FDCT (AVX2) ; ; Copyright 2009 Pierre Ossman for Cendio AB -; Copyright (C) 2009, 2016, 2018, D. R. Commander. +; Copyright (C) 2009, 2016, 2018, 2020, D. R. Commander. ; ; Based on the x86 SIMD extension for IJG JPEG library ; Copyright (C) 1999-2006, MIYASAKA Masaru. @@ -14,7 +14,7 @@ ; NASM is available from http://nasm.sourceforge.net/ or ; http://sourceforge.net/project/showfiles.php?group_id=6208 ; -; This file contains a slow-but-accurate integer implementation of the +; This file contains a slower but more accurate integer implementation of the ; forward DCT (Discrete Cosine Transform). The following code is based ; directly on the IJG's original jfdctint.c; see the jfdctint.c for ; more details. 
@@ -103,7 +103,7 @@ %endmacro ; -------------------------------------------------------------------------- -; In-place 8x8x16-bit slow integer forward DCT using AVX2 instructions +; In-place 8x8x16-bit accurate integer forward DCT using AVX2 instructions ; %1-%4: Input/output registers ; %5-%8: Temp registers ; %9: Pass (1 or 2) diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jfdctint-mmx.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jfdctint-mmx.asm --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jfdctint-mmx.asm 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jfdctint-mmx.asm 2021-11-20 03:41:33.401600402 +0000 @@ -2,7 +2,7 @@ ; jfdctint.asm - accurate integer FDCT (MMX) ; ; Copyright 2009 Pierre Ossman for Cendio AB -; Copyright (C) 2016, D. R. Commander. +; Copyright (C) 2016, 2020, D. R. Commander. ; ; Based on the x86 SIMD extension for IJG JPEG library ; Copyright (C) 1999-2006, MIYASAKA Masaru. @@ -14,7 +14,7 @@ ; NASM is available from http://nasm.sourceforge.net/ or ; http://sourceforge.net/project/showfiles.php?group_id=6208 ; -; This file contains a slow-but-accurate integer implementation of the +; This file contains a slower but more accurate integer implementation of the ; forward DCT (Discrete Cosine Transform). The following code is based ; directly on the IJG's original jfdctint.c; see the jfdctint.c for ; more details. 
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jfdctint-sse2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jfdctint-sse2.asm --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jfdctint-sse2.asm 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jfdctint-sse2.asm 2021-11-20 03:41:33.402600386 +0000 @@ -2,7 +2,7 @@ ; jfdctint.asm - accurate integer FDCT (SSE2) ; ; Copyright 2009 Pierre Ossman for Cendio AB -; Copyright (C) 2016, D. R. Commander. +; Copyright (C) 2016, 2020, D. R. Commander. ; ; Based on the x86 SIMD extension for IJG JPEG library ; Copyright (C) 1999-2006, MIYASAKA Masaru. @@ -14,7 +14,7 @@ ; NASM is available from http://nasm.sourceforge.net/ or ; http://sourceforge.net/project/showfiles.php?group_id=6208 ; -; This file contains a slow-but-accurate integer implementation of the +; This file contains a slower but more accurate integer implementation of the ; forward DCT (Discrete Cosine Transform). The following code is based ; directly on the IJG's original jfdctint.c; see the jfdctint.c for ; more details. diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jidctint-avx2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jidctint-avx2.asm --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jidctint-avx2.asm 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jidctint-avx2.asm 2021-11-20 03:41:33.402600386 +0000 @@ -2,7 +2,7 @@ ; jidctint.asm - accurate integer IDCT (AVX2) ; ; Copyright 2009 Pierre Ossman for Cendio AB -; Copyright (C) 2009, 2016, 2018, D. R. Commander. +; Copyright (C) 2009, 2016, 2018, 2020, D. R. Commander. ; ; Based on the x86 SIMD extension for IJG JPEG library ; Copyright (C) 1999-2006, MIYASAKA Masaru. 
@@ -14,7 +14,7 @@ ; NASM is available from http://nasm.sourceforge.net/ or ; http://sourceforge.net/project/showfiles.php?group_id=6208 ; -; This file contains a slow-but-accurate integer implementation of the +; This file contains a slower but more accurate integer implementation of the ; inverse DCT (Discrete Cosine Transform). The following code is based ; directly on the IJG's original jidctint.c; see the jidctint.c for ; more details. @@ -113,7 +113,7 @@ %endmacro ; -------------------------------------------------------------------------- -; In-place 8x8x16-bit slow integer inverse DCT using AVX2 instructions +; In-place 8x8x16-bit accurate integer inverse DCT using AVX2 instructions ; %1-%4: Input/output registers ; %5-%12: Temp registers ; %9: Pass (1 or 2) diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jidctint-mmx.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jidctint-mmx.asm --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jidctint-mmx.asm 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jidctint-mmx.asm 2021-11-20 03:41:33.402600386 +0000 @@ -2,7 +2,7 @@ ; jidctint.asm - accurate integer IDCT (MMX) ; ; Copyright 2009 Pierre Ossman for Cendio AB -; Copyright (C) 2016, D. R. Commander. +; Copyright (C) 2016, 2020, D. R. Commander. ; ; Based on the x86 SIMD extension for IJG JPEG library ; Copyright (C) 1999-2006, MIYASAKA Masaru. @@ -14,7 +14,7 @@ ; NASM is available from http://nasm.sourceforge.net/ or ; http://sourceforge.net/project/showfiles.php?group_id=6208 ; -; This file contains a slow-but-accurate integer implementation of the +; This file contains a slower but more accurate integer implementation of the ; inverse DCT (Discrete Cosine Transform). The following code is based ; directly on the IJG's original jidctint.c; see the jidctint.c for ; more details. 
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jidctint-sse2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jidctint-sse2.asm --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jidctint-sse2.asm 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jidctint-sse2.asm 2021-11-20 03:41:33.402600386 +0000 @@ -2,7 +2,7 @@ ; jidctint.asm - accurate integer IDCT (SSE2) ; ; Copyright 2009 Pierre Ossman for Cendio AB -; Copyright (C) 2016, D. R. Commander. +; Copyright (C) 2016, 2020, D. R. Commander. ; ; Based on the x86 SIMD extension for IJG JPEG library ; Copyright (C) 1999-2006, MIYASAKA Masaru. @@ -14,7 +14,7 @@ ; NASM is available from http://nasm.sourceforge.net/ or ; http://sourceforge.net/project/showfiles.php?group_id=6208 ; -; This file contains a slow-but-accurate integer implementation of the +; This file contains a slower but more accurate integer implementation of the ; inverse DCT (Discrete Cosine Transform). The following code is based ; directly on the IJG's original jidctint.c; see the jidctint.c for ; more details. 
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jsimd.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jsimd.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jsimd.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/i386/jsimd.c 2021-11-20 03:41:33.402600386 +0000 @@ -543,12 +543,6 @@ return 0; } -GLOBAL(int) -jsimd_can_h1v2_fancy_upsample(void) -{ - return 0; -} - GLOBAL(void) jsimd_h2v2_fancy_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr, JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr) @@ -585,12 +579,6 @@ output_data_ptr); } -GLOBAL(void) -jsimd_h1v2_fancy_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr, - JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr) -{ -} - GLOBAL(int) jsimd_can_h2v2_merged_upsample(void) { diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/jsimd.h b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/jsimd.h --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/jsimd.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/jsimd.h 2021-11-20 03:41:33.402600386 +0000 @@ -2,11 +2,12 @@ * simd/jsimd.h * * Copyright 2009 Pierre Ossman for Cendio AB - * Copyright (C) 2011, 2014-2016, 2018, D. R. Commander. + * Copyright (C) 2011, 2014-2016, 2018, 2020, D. R. Commander. * Copyright (C) 2013-2014, MIPS Technologies, Inc., California. * Copyright (C) 2014, Linaro Limited. * Copyright (C) 2015-2016, 2018, Matthieu Darbois. - * Copyright (C) 2016-2017, Loongson Technology Corporation Limited, BeiJing. + * Copyright (C) 2016-2018, Loongson Technology Corporation Limited, BeiJing. + * Copyright (C) 2020, Arm Limited. * * Based on the x86 SIMD extension for IJG JPEG library, * Copyright (C) 1999-2006, MIYASAKA Masaru. 
@@ -121,6 +122,17 @@ (JDIMENSION img_width, JSAMPARRAY input_buf, JSAMPIMAGE output_buf, JDIMENSION output_row, int num_rows); +#ifndef NEON_INTRINSICS + +EXTERN(void) jsimd_extrgb_ycc_convert_neon_slowld3 + (JDIMENSION img_width, JSAMPARRAY input_buf, JSAMPIMAGE output_buf, + JDIMENSION output_row, int num_rows); +EXTERN(void) jsimd_extbgr_ycc_convert_neon_slowld3 + (JDIMENSION img_width, JSAMPARRAY input_buf, JSAMPIMAGE output_buf, + JDIMENSION output_row, int num_rows); + +#endif + EXTERN(void) jsimd_rgb_ycc_convert_dspr2 (JDIMENSION img_width, JSAMPARRAY input_buf, JSAMPIMAGE output_buf, JDIMENSION output_row, int num_rows); @@ -300,6 +312,28 @@ (JDIMENSION img_width, JSAMPARRAY input_buf, JSAMPIMAGE output_buf, JDIMENSION output_row, int num_rows); +EXTERN(void) jsimd_rgb_gray_convert_mmi + (JDIMENSION img_width, JSAMPARRAY input_buf, JSAMPIMAGE output_buf, + JDIMENSION output_row, int num_rows); +EXTERN(void) jsimd_extrgb_gray_convert_mmi + (JDIMENSION img_width, JSAMPARRAY input_buf, JSAMPIMAGE output_buf, + JDIMENSION output_row, int num_rows); +EXTERN(void) jsimd_extrgbx_gray_convert_mmi + (JDIMENSION img_width, JSAMPARRAY input_buf, JSAMPIMAGE output_buf, + JDIMENSION output_row, int num_rows); +EXTERN(void) jsimd_extbgr_gray_convert_mmi + (JDIMENSION img_width, JSAMPARRAY input_buf, JSAMPIMAGE output_buf, + JDIMENSION output_row, int num_rows); +EXTERN(void) jsimd_extbgrx_gray_convert_mmi + (JDIMENSION img_width, JSAMPARRAY input_buf, JSAMPIMAGE output_buf, + JDIMENSION output_row, int num_rows); +EXTERN(void) jsimd_extxbgr_gray_convert_mmi + (JDIMENSION img_width, JSAMPARRAY input_buf, JSAMPIMAGE output_buf, + JDIMENSION output_row, int num_rows); +EXTERN(void) jsimd_extxrgb_gray_convert_mmi + (JDIMENSION img_width, JSAMPARRAY input_buf, JSAMPIMAGE output_buf, + JDIMENSION output_row, int num_rows); + EXTERN(void) jsimd_rgb_gray_convert_altivec (JDIMENSION img_width, JSAMPARRAY input_buf, JSAMPIMAGE output_buf, JDIMENSION output_row, int num_rows); @@ 
-416,6 +450,17 @@ (JDIMENSION out_width, JSAMPIMAGE input_buf, JDIMENSION input_row, JSAMPARRAY output_buf, int num_rows); +#ifndef NEON_INTRINSICS + +EXTERN(void) jsimd_ycc_extrgb_convert_neon_slowst3 + (JDIMENSION out_width, JSAMPIMAGE input_buf, JDIMENSION input_row, + JSAMPARRAY output_buf, int num_rows); +EXTERN(void) jsimd_ycc_extbgr_convert_neon_slowst3 + (JDIMENSION out_width, JSAMPIMAGE input_buf, JDIMENSION input_row, + JSAMPARRAY output_buf, int num_rows); + +#endif + EXTERN(void) jsimd_ycc_rgb_convert_dspr2 (JDIMENSION out_width, JSAMPIMAGE input_buf, JDIMENSION input_row, JSAMPARRAY output_buf, int num_rows); @@ -637,6 +682,9 @@ (int max_v_samp_factor, JDIMENSION downsampled_width, JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr); +EXTERN(void) jsimd_h2v1_fancy_upsample_mmi + (int max_v_samp_factor, JDIMENSION downsampled_width, JSAMPARRAY input_data, + JSAMPARRAY *output_data_ptr); EXTERN(void) jsimd_h2v2_fancy_upsample_mmi (int max_v_samp_factor, JDIMENSION downsampled_width, JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr); @@ -871,6 +919,50 @@ (JDIMENSION output_width, JSAMPIMAGE input_buf, JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf, JSAMPLE *range); +EXTERN(void) jsimd_h2v1_merged_upsample_mmi + (JDIMENSION output_width, JSAMPIMAGE input_buf, JDIMENSION in_row_group_ctr, + JSAMPARRAY output_buf); +EXTERN(void) jsimd_h2v1_extrgb_merged_upsample_mmi + (JDIMENSION output_width, JSAMPIMAGE input_buf, JDIMENSION in_row_group_ctr, + JSAMPARRAY output_buf); +EXTERN(void) jsimd_h2v1_extrgbx_merged_upsample_mmi + (JDIMENSION output_width, JSAMPIMAGE input_buf, JDIMENSION in_row_group_ctr, + JSAMPARRAY output_buf); +EXTERN(void) jsimd_h2v1_extbgr_merged_upsample_mmi + (JDIMENSION output_width, JSAMPIMAGE input_buf, JDIMENSION in_row_group_ctr, + JSAMPARRAY output_buf); +EXTERN(void) jsimd_h2v1_extbgrx_merged_upsample_mmi + (JDIMENSION output_width, JSAMPIMAGE input_buf, JDIMENSION in_row_group_ctr, + JSAMPARRAY output_buf); +EXTERN(void) 
jsimd_h2v1_extxbgr_merged_upsample_mmi + (JDIMENSION output_width, JSAMPIMAGE input_buf, JDIMENSION in_row_group_ctr, + JSAMPARRAY output_buf); +EXTERN(void) jsimd_h2v1_extxrgb_merged_upsample_mmi + (JDIMENSION output_width, JSAMPIMAGE input_buf, JDIMENSION in_row_group_ctr, + JSAMPARRAY output_buf); + +EXTERN(void) jsimd_h2v2_merged_upsample_mmi + (JDIMENSION output_width, JSAMPIMAGE input_buf, JDIMENSION in_row_group_ctr, + JSAMPARRAY output_buf); +EXTERN(void) jsimd_h2v2_extrgb_merged_upsample_mmi + (JDIMENSION output_width, JSAMPIMAGE input_buf, JDIMENSION in_row_group_ctr, + JSAMPARRAY output_buf); +EXTERN(void) jsimd_h2v2_extrgbx_merged_upsample_mmi + (JDIMENSION output_width, JSAMPIMAGE input_buf, JDIMENSION in_row_group_ctr, + JSAMPARRAY output_buf); +EXTERN(void) jsimd_h2v2_extbgr_merged_upsample_mmi + (JDIMENSION output_width, JSAMPIMAGE input_buf, JDIMENSION in_row_group_ctr, + JSAMPARRAY output_buf); +EXTERN(void) jsimd_h2v2_extbgrx_merged_upsample_mmi + (JDIMENSION output_width, JSAMPIMAGE input_buf, JDIMENSION in_row_group_ctr, + JSAMPARRAY output_buf); +EXTERN(void) jsimd_h2v2_extxbgr_merged_upsample_mmi + (JDIMENSION output_width, JSAMPIMAGE input_buf, JDIMENSION in_row_group_ctr, + JSAMPARRAY output_buf); +EXTERN(void) jsimd_h2v2_extxrgb_merged_upsample_mmi + (JDIMENSION output_width, JSAMPIMAGE input_buf, JDIMENSION in_row_group_ctr, + JSAMPARRAY output_buf); + EXTERN(void) jsimd_h2v1_merged_upsample_altivec (JDIMENSION output_width, JSAMPIMAGE input_buf, JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf); @@ -947,7 +1039,7 @@ EXTERN(void) jsimd_convsamp_float_dspr2 (JSAMPARRAY sample_data, JDIMENSION start_col, FAST_FLOAT *workspace); -/* Slow Integer Forward DCT */ +/* Accurate Integer Forward DCT */ EXTERN(void) jsimd_fdct_islow_mmx(DCTELEM *data); extern const int jconst_fdct_islow_sse2[]; @@ -974,6 +1066,8 @@ EXTERN(void) jsimd_fdct_ifast_dspr2(DCTELEM *data); +EXTERN(void) jsimd_fdct_ifast_mmi(DCTELEM *data); + EXTERN(void) 
jsimd_fdct_ifast_altivec(DCTELEM *data); /* Floating Point Forward DCT */ @@ -1054,7 +1148,7 @@ EXTERN(void) jsimd_idct_12x12_pass2_dspr2 (int *workspace, int *output); -/* Slow Integer Inverse DCT */ +/* Accurate Integer Inverse DCT */ EXTERN(void) jsimd_idct_islow_mmx (void *dct_table, JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col); @@ -1105,6 +1199,10 @@ (DCTELEM *wsptr, JSAMPARRAY output_buf, JDIMENSION output_col, const int *idct_coefs); +EXTERN(void) jsimd_idct_ifast_mmi + (void *dct_table, JCOEFPTR coef_block, JSAMPARRAY output_buf, + JDIMENSION output_col); + EXTERN(void) jsimd_idct_ifast_altivec (void *dct_table, JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col); @@ -1134,15 +1232,27 @@ (void *state, JOCTET *buffer, JCOEFPTR block, int last_dc_val, c_derived_tbl *dctbl, c_derived_tbl *actbl); +#ifndef NEON_INTRINSICS + EXTERN(JOCTET *) jsimd_huff_encode_one_block_neon_slowtbl (void *state, JOCTET *buffer, JCOEFPTR block, int last_dc_val, c_derived_tbl *dctbl, c_derived_tbl *actbl); +#endif + /* Progressive Huffman encoding */ EXTERN(void) jsimd_encode_mcu_AC_first_prepare_sse2 (const JCOEF *block, const int *jpeg_natural_order_start, int Sl, int Al, JCOEF *values, size_t *zerobits); +EXTERN(void) jsimd_encode_mcu_AC_first_prepare_neon + (const JCOEF *block, const int *jpeg_natural_order_start, int Sl, int Al, + JCOEF *values, size_t *zerobits); + EXTERN(int) jsimd_encode_mcu_AC_refine_prepare_sse2 (const JCOEF *block, const int *jpeg_natural_order_start, int Sl, int Al, JCOEF *absvalues, size_t *bits); + +EXTERN(int) jsimd_encode_mcu_AC_refine_prepare_neon + (const JCOEF *block, const int *jpeg_natural_order_start, int Sl, int Al, + JCOEF *absvalues, size_t *bits); diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/nasm/jsimdcfg.inc.h b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/nasm/jsimdcfg.inc.h --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/nasm/jsimdcfg.inc.h 2021-08-24 
12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/nasm/jsimdcfg.inc.h 2021-11-20 03:41:33.402600386 +0000 @@ -1,8 +1,10 @@ -// This file generates the include file for the assembly -// implementations by abusing the C preprocessor. -// -// Note: Some things are manually defined as they need to -// be mapped to NASM types. +/* + * This file generates the include file for the assembly + * implementations by abusing the C preprocessor. + * + * Note: Some things are manually defined as they need to + * be mapped to NASM types. + */ ; ; Automatically generated include file from jsimdcfg.inc.h diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/nasm/jsimdext.inc b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/nasm/jsimdext.inc --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/nasm/jsimdext.inc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/nasm/jsimdext.inc 2021-11-20 03:41:33.402600386 +0000 @@ -2,8 +2,9 @@ ; jsimdext.inc - common declarations ; ; Copyright 2009 Pierre Ossman for Cendio AB -; Copyright (C) 2010, 2016, 2019, D. R. Commander. +; Copyright (C) 2010, 2016, 2018-2019, D. R. Commander. ; Copyright (C) 2018, Matthieu Darbois. +; Copyright (C) 2018, Matthias Räncker. 
; ; Based on the x86 SIMD extension for IJG JPEG library - version 1.02 ; @@ -130,13 +131,53 @@ ; Common types ; %ifdef __x86_64__ +%ifnidn __OUTPUT_FORMAT__, elfx32 %define POINTER qword ; general pointer type %define SIZEOF_POINTER SIZEOF_QWORD ; sizeof(POINTER) %define POINTER_BIT QWORD_BIT ; sizeof(POINTER)*BYTE_BIT -%else +%define resp resq +%define dp dq +%define raxp rax +%define rbxp rbx +%define rcxp rcx +%define rdxp rdx +%define rsip rsi +%define rdip rdi +%define rbpp rbp +%define rspp rsp +%define r8p r8 +%define r9p r9 +%define r10p r10 +%define r11p r11 +%define r12p r12 +%define r13p r13 +%define r14p r14 +%define r15p r15 +%endif +%endif +%ifndef raxp %define POINTER dword ; general pointer type %define SIZEOF_POINTER SIZEOF_DWORD ; sizeof(POINTER) %define POINTER_BIT DWORD_BIT ; sizeof(POINTER)*BYTE_BIT +%define resp resd +%define dp dd +; x86_64 ILP32 ABI (x32) +%define raxp eax +%define rbxp ebx +%define rcxp ecx +%define rdxp edx +%define rsip esi +%define rdip edi +%define rbpp ebp +%define rspp esp +%define r8p r8d +%define r9p r9d +%define r10p r10d +%define r11p r11d +%define r12p r12d +%define r13p r13d +%define r14p r14d +%define r15p r15d %endif %define INT dword ; signed integer type diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jccolext-avx2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jccolext-avx2.asm --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jccolext-avx2.asm 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jccolext-avx2.asm 2021-11-20 03:41:33.402600386 +0000 @@ -3,6 +3,7 @@ ; ; Copyright (C) 2009, 2016, D. R. Commander. ; Copyright (C) 2015, Intel Corporation. +; Copyright (C) 2018, Matthias Räncker. ; ; Based on the x86 SIMD extension for IJG JPEG library ; Copyright (C) 1999-2006, MIYASAKA Masaru. 
@@ -57,9 +58,9 @@ mov rsi, r12 mov ecx, r13d - mov rdi, JSAMPARRAY [rsi+0*SIZEOF_JSAMPARRAY] - mov rbx, JSAMPARRAY [rsi+1*SIZEOF_JSAMPARRAY] - mov rdx, JSAMPARRAY [rsi+2*SIZEOF_JSAMPARRAY] + mov rdip, JSAMPARRAY [rsi+0*SIZEOF_JSAMPARRAY] + mov rbxp, JSAMPARRAY [rsi+1*SIZEOF_JSAMPARRAY] + mov rdxp, JSAMPARRAY [rsi+2*SIZEOF_JSAMPARRAY] lea rdi, [rdi+rcx*SIZEOF_JSAMPROW] lea rbx, [rbx+rcx*SIZEOF_JSAMPROW] lea rdx, [rdx+rcx*SIZEOF_JSAMPROW] @@ -77,10 +78,10 @@ push rsi push rcx ; col - mov rsi, JSAMPROW [rsi] ; inptr - mov rdi, JSAMPROW [rdi] ; outptr0 - mov rbx, JSAMPROW [rbx] ; outptr1 - mov rdx, JSAMPROW [rdx] ; outptr2 + mov rsip, JSAMPROW [rsi] ; inptr + mov rdip, JSAMPROW [rdi] ; outptr0 + mov rbxp, JSAMPROW [rbx] ; outptr1 + mov rdxp, JSAMPROW [rdx] ; outptr2 cmp rcx, byte SIZEOF_YMMWORD jae near .columnloop diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jccolext-sse2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jccolext-sse2.asm --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jccolext-sse2.asm 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jccolext-sse2.asm 2021-11-20 03:41:33.402600386 +0000 @@ -2,6 +2,7 @@ ; jccolext.asm - colorspace conversion (64-bit SSE2) ; ; Copyright (C) 2009, 2016, D. R. Commander. +; Copyright (C) 2018, Matthias Räncker. ; ; Based on the x86 SIMD extension for IJG JPEG library ; Copyright (C) 1999-2006, MIYASAKA Masaru. 
@@ -56,9 +57,9 @@ mov rsi, r12 mov ecx, r13d - mov rdi, JSAMPARRAY [rsi+0*SIZEOF_JSAMPARRAY] - mov rbx, JSAMPARRAY [rsi+1*SIZEOF_JSAMPARRAY] - mov rdx, JSAMPARRAY [rsi+2*SIZEOF_JSAMPARRAY] + mov rdip, JSAMPARRAY [rsi+0*SIZEOF_JSAMPARRAY] + mov rbxp, JSAMPARRAY [rsi+1*SIZEOF_JSAMPARRAY] + mov rdxp, JSAMPARRAY [rsi+2*SIZEOF_JSAMPARRAY] lea rdi, [rdi+rcx*SIZEOF_JSAMPROW] lea rbx, [rbx+rcx*SIZEOF_JSAMPROW] lea rdx, [rdx+rcx*SIZEOF_JSAMPROW] @@ -76,10 +77,10 @@ push rsi push rcx ; col - mov rsi, JSAMPROW [rsi] ; inptr - mov rdi, JSAMPROW [rdi] ; outptr0 - mov rbx, JSAMPROW [rbx] ; outptr1 - mov rdx, JSAMPROW [rdx] ; outptr2 + mov rsip, JSAMPROW [rsi] ; inptr + mov rdip, JSAMPROW [rdi] ; outptr0 + mov rbxp, JSAMPROW [rbx] ; outptr1 + mov rdxp, JSAMPROW [rdx] ; outptr2 cmp rcx, byte SIZEOF_XMMWORD jae near .columnloop diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jcgryext-avx2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jcgryext-avx2.asm --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jcgryext-avx2.asm 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jcgryext-avx2.asm 2021-11-20 03:41:33.403600370 +0000 @@ -3,6 +3,7 @@ ; ; Copyright (C) 2011, 2016, D. R. Commander. ; Copyright (C) 2015, Intel Corporation. +; Copyright (C) 2018, Matthias Räncker. ; ; Based on the x86 SIMD extension for IJG JPEG library ; Copyright (C) 1999-2006, MIYASAKA Masaru. 
@@ -57,7 +58,7 @@ mov rsi, r12 mov ecx, r13d - mov rdi, JSAMPARRAY [rsi+0*SIZEOF_JSAMPARRAY] + mov rdip, JSAMPARRAY [rsi+0*SIZEOF_JSAMPARRAY] lea rdi, [rdi+rcx*SIZEOF_JSAMPROW] pop rcx @@ -71,8 +72,8 @@ push rsi push rcx ; col - mov rsi, JSAMPROW [rsi] ; inptr - mov rdi, JSAMPROW [rdi] ; outptr0 + mov rsip, JSAMPROW [rsi] ; inptr + mov rdip, JSAMPROW [rdi] ; outptr0 cmp rcx, byte SIZEOF_YMMWORD jae near .columnloop diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jcgryext-sse2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jcgryext-sse2.asm --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jcgryext-sse2.asm 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jcgryext-sse2.asm 2021-11-20 03:41:33.403600370 +0000 @@ -2,6 +2,7 @@ ; jcgryext.asm - grayscale colorspace conversion (64-bit SSE2) ; ; Copyright (C) 2011, 2016, D. R. Commander. +; Copyright (C) 2018, Matthias Räncker. ; ; Based on the x86 SIMD extension for IJG JPEG library ; Copyright (C) 1999-2006, MIYASAKA Masaru. 
@@ -56,7 +57,7 @@ mov rsi, r12 mov ecx, r13d - mov rdi, JSAMPARRAY [rsi+0*SIZEOF_JSAMPARRAY] + mov rdip, JSAMPARRAY [rsi+0*SIZEOF_JSAMPARRAY] lea rdi, [rdi+rcx*SIZEOF_JSAMPROW] pop rcx @@ -70,8 +71,8 @@ push rsi push rcx ; col - mov rsi, JSAMPROW [rsi] ; inptr - mov rdi, JSAMPROW [rdi] ; outptr0 + mov rsip, JSAMPROW [rsi] ; inptr + mov rdip, JSAMPROW [rdi] ; outptr0 cmp rcx, byte SIZEOF_XMMWORD jae near .columnloop diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jchuff-sse2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jchuff-sse2.asm --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jchuff-sse2.asm 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jchuff-sse2.asm 2021-11-20 03:41:33.403600370 +0000 @@ -1,8 +1,9 @@ ; ; jchuff-sse2.asm - Huffman entropy encoding (64-bit SSE2) ; -; Copyright (C) 2009-2011, 2014-2016, D. R. Commander. +; Copyright (C) 2009-2011, 2014-2016, 2019, 2021, D. R. Commander. ; Copyright (C) 2015, Matthieu Darbois. +; Copyright (C) 2018, Matthias Räncker. ; ; Based on the x86 SIMD extension for IJG JPEG library ; Copyright (C) 1999-2006, MIYASAKA Masaru. @@ -15,146 +16,165 @@ ; http://sourceforge.net/project/showfiles.php?group_id=6208 ; ; This file contains an SSE2 implementation for Huffman coding of one block. -; The following code is based directly on jchuff.c; see jchuff.c for more -; details. +; The following code is based on jchuff.c; see jchuff.c for more details. 
%include "jsimdext.inc" +struc working_state +.next_output_byte: resp 1 ; => next byte to write in buffer +.free_in_buffer: resp 1 ; # of byte spaces remaining in buffer +.cur.put_buffer.simd resq 1 ; current bit accumulation buffer +.cur.free_bits resd 1 ; # of bits available in it +.cur.last_dc_val resd 4 ; last DC coef for each component +.cinfo: resp 1 ; dump_buffer needs access to this +endstruc + +struc c_derived_tbl +.ehufco: resd 256 ; code for each symbol +.ehufsi: resb 256 ; length of code for each symbol +; If no code has been allocated for a symbol S, ehufsi[S] contains 0 +endstruc + ; -------------------------------------------------------------------------- SECTION SEG_CONST alignz 32 GLOBAL_DATA(jconst_huff_encode_one_block) - EXTERN EXTN(jpeg_nbits_table) EXTN(jconst_huff_encode_one_block): +jpeg_mask_bits dd 0x0000, 0x0001, 0x0003, 0x0007 + dd 0x000f, 0x001f, 0x003f, 0x007f + dd 0x00ff, 0x01ff, 0x03ff, 0x07ff + dd 0x0fff, 0x1fff, 0x3fff, 0x7fff + alignz 32 -; -------------------------------------------------------------------------- - SECTION SEG_TEXT - BITS 64 +times 1 << 14 db 15 +times 1 << 13 db 14 +times 1 << 12 db 13 +times 1 << 11 db 12 +times 1 << 10 db 11 +times 1 << 9 db 10 +times 1 << 8 db 9 +times 1 << 7 db 8 +times 1 << 6 db 7 +times 1 << 5 db 6 +times 1 << 4 db 5 +times 1 << 3 db 4 +times 1 << 2 db 3 +times 1 << 1 db 2 +times 1 << 0 db 1 +times 1 db 0 +jpeg_nbits_table: +times 1 db 0 +times 1 << 0 db 1 +times 1 << 1 db 2 +times 1 << 2 db 3 +times 1 << 3 db 4 +times 1 << 4 db 5 +times 1 << 5 db 6 +times 1 << 6 db 7 +times 1 << 7 db 8 +times 1 << 8 db 9 +times 1 << 9 db 10 +times 1 << 10 db 11 +times 1 << 11 db 12 +times 1 << 12 db 13 +times 1 << 13 db 14 +times 1 << 14 db 15 +times 1 << 15 db 16 -; These macros perform the same task as the emit_bits() function in the -; original libjpeg code. 
In addition to reducing overhead by explicitly -; inlining the code, additional performance is achieved by taking into -; account the size of the bit buffer and waiting until it is almost full -; before emptying it. This mostly benefits 64-bit platforms, since 6 -; bytes can be stored in a 64-bit bit buffer before it has to be emptied. - -%macro EMIT_BYTE 0 - sub put_bits, 8 ; put_bits -= 8; - mov rdx, put_buffer - mov ecx, put_bits - shr rdx, cl ; c = (JOCTET)GETJOCTET(put_buffer >> put_bits); - mov byte [buffer], dl ; *buffer++ = c; - add buffer, 1 - cmp dl, 0xFF ; need to stuff a zero byte? - jne %%.EMIT_BYTE_END - mov byte [buffer], 0 ; *buffer++ = 0; - add buffer, 1 -%%.EMIT_BYTE_END: -%endmacro + alignz 32 -%macro PUT_BITS 1 - add put_bits, ecx ; put_bits += size; - shl put_buffer, cl ; put_buffer = (put_buffer << size); - or put_buffer, %1 -%endmacro +%define NBITS(x) nbits_base + x +%define MASK_BITS(x) NBITS((x) * 4) + (jpeg_mask_bits - jpeg_nbits_table) -%macro CHECKBUF31 0 - cmp put_bits, 32 ; if (put_bits > 31) { - jl %%.CHECKBUF31_END - EMIT_BYTE - EMIT_BYTE - EMIT_BYTE - EMIT_BYTE -%%.CHECKBUF31_END: -%endmacro +; -------------------------------------------------------------------------- + SECTION SEG_TEXT + BITS 64 -%macro CHECKBUF47 0 - cmp put_bits, 48 ; if (put_bits > 47) { - jl %%.CHECKBUF47_END - EMIT_BYTE - EMIT_BYTE - EMIT_BYTE - EMIT_BYTE - EMIT_BYTE - EMIT_BYTE -%%.CHECKBUF47_END: -%endmacro +; Shorthand used to describe SIMD operations: +; wN: xmmN treated as eight signed 16-bit values +; wN[i]: perform the same operation on all eight signed 16-bit values, i=0..7 +; bN: xmmN treated as 16 unsigned 8-bit values +; bN[i]: perform the same operation on all 16 unsigned 8-bit values, i=0..15 +; Contents of SIMD registers are shown in memory order. 
-%macro EMIT_BITS 2 - CHECKBUF47 - mov ecx, %2 - PUT_BITS %1 -%endmacro +; Fill the bit buffer to capacity with the leading bits from code, then output +; the bit buffer and put the remaining bits from code into the bit buffer. +; +; Usage: +; code - contains the bits to shift into the bit buffer (LSB-aligned) +; %1 - the label to which to jump when the macro completes +; %2 (optional) - extra instructions to execute after nbits has been set +; +; Upon completion, free_bits will be set to the number of remaining bits from +; code, and put_buffer will contain those remaining bits. temp and code will +; be clobbered. +; +; This macro encodes any 0xFF bytes as 0xFF 0x00, as does the EMIT_BYTE() +; macro in jchuff.c. -%macro kloop_prepare 37 ;(ko, jno0, ..., jno31, xmm0, xmm1, xmm2, xmm3) - pxor xmm8, xmm8 ; __m128i neg = _mm_setzero_si128(); - pxor xmm9, xmm9 ; __m128i neg = _mm_setzero_si128(); - pxor xmm10, xmm10 ; __m128i neg = _mm_setzero_si128(); - pxor xmm11, xmm11 ; __m128i neg = _mm_setzero_si128(); - pinsrw %34, word [r12 + %2 * SIZEOF_WORD], 0 ; xmm_shadow[0] = block[jno0]; - pinsrw %35, word [r12 + %10 * SIZEOF_WORD], 0 ; xmm_shadow[8] = block[jno8]; - pinsrw %36, word [r12 + %18 * SIZEOF_WORD], 0 ; xmm_shadow[16] = block[jno16]; - pinsrw %37, word [r12 + %26 * SIZEOF_WORD], 0 ; xmm_shadow[24] = block[jno24]; - pinsrw %34, word [r12 + %3 * SIZEOF_WORD], 1 ; xmm_shadow[1] = block[jno1]; - pinsrw %35, word [r12 + %11 * SIZEOF_WORD], 1 ; xmm_shadow[9] = block[jno9]; - pinsrw %36, word [r12 + %19 * SIZEOF_WORD], 1 ; xmm_shadow[17] = block[jno17]; - pinsrw %37, word [r12 + %27 * SIZEOF_WORD], 1 ; xmm_shadow[25] = block[jno25]; - pinsrw %34, word [r12 + %4 * SIZEOF_WORD], 2 ; xmm_shadow[2] = block[jno2]; - pinsrw %35, word [r12 + %12 * SIZEOF_WORD], 2 ; xmm_shadow[10] = block[jno10]; - pinsrw %36, word [r12 + %20 * SIZEOF_WORD], 2 ; xmm_shadow[18] = block[jno18]; - pinsrw %37, word [r12 + %28 * SIZEOF_WORD], 2 ; xmm_shadow[26] = block[jno26]; - pinsrw %34, word 
[r12 + %5 * SIZEOF_WORD], 3 ; xmm_shadow[3] = block[jno3]; - pinsrw %35, word [r12 + %13 * SIZEOF_WORD], 3 ; xmm_shadow[11] = block[jno11]; - pinsrw %36, word [r12 + %21 * SIZEOF_WORD], 3 ; xmm_shadow[19] = block[jno19]; - pinsrw %37, word [r12 + %29 * SIZEOF_WORD], 3 ; xmm_shadow[27] = block[jno27]; - pinsrw %34, word [r12 + %6 * SIZEOF_WORD], 4 ; xmm_shadow[4] = block[jno4]; - pinsrw %35, word [r12 + %14 * SIZEOF_WORD], 4 ; xmm_shadow[12] = block[jno12]; - pinsrw %36, word [r12 + %22 * SIZEOF_WORD], 4 ; xmm_shadow[20] = block[jno20]; - pinsrw %37, word [r12 + %30 * SIZEOF_WORD], 4 ; xmm_shadow[28] = block[jno28]; - pinsrw %34, word [r12 + %7 * SIZEOF_WORD], 5 ; xmm_shadow[5] = block[jno5]; - pinsrw %35, word [r12 + %15 * SIZEOF_WORD], 5 ; xmm_shadow[13] = block[jno13]; - pinsrw %36, word [r12 + %23 * SIZEOF_WORD], 5 ; xmm_shadow[21] = block[jno21]; - pinsrw %37, word [r12 + %31 * SIZEOF_WORD], 5 ; xmm_shadow[29] = block[jno29]; - pinsrw %34, word [r12 + %8 * SIZEOF_WORD], 6 ; xmm_shadow[6] = block[jno6]; - pinsrw %35, word [r12 + %16 * SIZEOF_WORD], 6 ; xmm_shadow[14] = block[jno14]; - pinsrw %36, word [r12 + %24 * SIZEOF_WORD], 6 ; xmm_shadow[22] = block[jno22]; - pinsrw %37, word [r12 + %32 * SIZEOF_WORD], 6 ; xmm_shadow[30] = block[jno30]; - pinsrw %34, word [r12 + %9 * SIZEOF_WORD], 7 ; xmm_shadow[7] = block[jno7]; - pinsrw %35, word [r12 + %17 * SIZEOF_WORD], 7 ; xmm_shadow[15] = block[jno15]; - pinsrw %36, word [r12 + %25 * SIZEOF_WORD], 7 ; xmm_shadow[23] = block[jno23]; -%if %1 != 32 - pinsrw %37, word [r12 + %33 * SIZEOF_WORD], 7 ; xmm_shadow[31] = block[jno31]; -%else - pinsrw %37, ebx, 7 ; xmm_shadow[31] = block[jno31]; -%endif - pcmpgtw xmm8, %34 ; neg = _mm_cmpgt_epi16(neg, x1); - pcmpgtw xmm9, %35 ; neg = _mm_cmpgt_epi16(neg, x1); - pcmpgtw xmm10, %36 ; neg = _mm_cmpgt_epi16(neg, x1); - pcmpgtw xmm11, %37 ; neg = _mm_cmpgt_epi16(neg, x1); - paddw %34, xmm8 ; x1 = _mm_add_epi16(x1, neg); - paddw %35, xmm9 ; x1 = _mm_add_epi16(x1, neg); - paddw %36, 
xmm10 ; x1 = _mm_add_epi16(x1, neg); - paddw %37, xmm11 ; x1 = _mm_add_epi16(x1, neg); - pxor %34, xmm8 ; x1 = _mm_xor_si128(x1, neg); - pxor %35, xmm9 ; x1 = _mm_xor_si128(x1, neg); - pxor %36, xmm10 ; x1 = _mm_xor_si128(x1, neg); - pxor %37, xmm11 ; x1 = _mm_xor_si128(x1, neg); - pxor xmm8, %34 ; neg = _mm_xor_si128(neg, x1); - pxor xmm9, %35 ; neg = _mm_xor_si128(neg, x1); - pxor xmm10, %36 ; neg = _mm_xor_si128(neg, x1); - pxor xmm11, %37 ; neg = _mm_xor_si128(neg, x1); - movdqa XMMWORD [t1 + %1 * SIZEOF_WORD], %34 ; _mm_storeu_si128((__m128i *)(t1 + ko), x1); - movdqa XMMWORD [t1 + (%1 + 8) * SIZEOF_WORD], %35 ; _mm_storeu_si128((__m128i *)(t1 + ko + 8), x1); - movdqa XMMWORD [t1 + (%1 + 16) * SIZEOF_WORD], %36 ; _mm_storeu_si128((__m128i *)(t1 + ko + 16), x1); - movdqa XMMWORD [t1 + (%1 + 24) * SIZEOF_WORD], %37 ; _mm_storeu_si128((__m128i *)(t1 + ko + 24), x1); - movdqa XMMWORD [t2 + %1 * SIZEOF_WORD], xmm8 ; _mm_storeu_si128((__m128i *)(t2 + ko), neg); - movdqa XMMWORD [t2 + (%1 + 8) * SIZEOF_WORD], xmm9 ; _mm_storeu_si128((__m128i *)(t2 + ko + 8), neg); - movdqa XMMWORD [t2 + (%1 + 16) * SIZEOF_WORD], xmm10 ; _mm_storeu_si128((__m128i *)(t2 + ko + 16), neg); - movdqa XMMWORD [t2 + (%1 + 24) * SIZEOF_WORD], xmm11 ; _mm_storeu_si128((__m128i *)(t2 + ko + 24), neg); +%macro EMIT_QWORD 1-2 + add nbitsb, free_bitsb ; nbits += free_bits; + neg free_bitsb ; free_bits = -free_bits; + mov tempd, code ; temp = code; + shl put_buffer, nbitsb ; put_buffer <<= nbits; + mov nbitsb, free_bitsb ; nbits = free_bits; + neg free_bitsb ; free_bits = -free_bits; + shr tempd, nbitsb ; temp >>= nbits; + or tempq, put_buffer ; temp |= put_buffer; + movq xmm0, tempq ; xmm0.u64 = { temp, 0 }; + bswap tempq ; temp = htonl(temp); + mov put_buffer, codeq ; put_buffer = code; + pcmpeqb xmm0, xmm1 ; b0[i] = (b0[i] == 0xFF ? 
0xFF : 0); + %2 + pmovmskb code, xmm0 ; code = 0; code |= ((b0[i] >> 7) << i); + mov qword [buffer], tempq ; memcpy(buffer, &temp, 8); + ; (speculative; will be overwritten if + ; code contains any 0xFF bytes) + add free_bitsb, 64 ; free_bits += 64; + add bufferp, 8 ; buffer += 8; + test code, code ; if (code == 0) /* No 0xFF bytes */ + jz %1 ; return; + ; Execute the equivalent of the EMIT_BYTE() macro in jchuff.c for all 8 + ; bytes in the qword. + cmp tempb, 0xFF ; Set CF if temp[0] < 0xFF + mov byte [buffer-7], 0 ; buffer[-7] = 0; + sbb bufferp, 6 ; buffer -= (6 + (temp[0] < 0xFF ? 1 : 0)); + mov byte [buffer], temph ; buffer[0] = temp[1]; + cmp temph, 0xFF ; Set CF if temp[1] < 0xFF + mov byte [buffer+1], 0 ; buffer[1] = 0; + sbb bufferp, -2 ; buffer -= (-2 + (temp[1] < 0xFF ? 1 : 0)); + shr tempq, 16 ; temp >>= 16; + mov byte [buffer], tempb ; buffer[0] = temp[0]; + cmp tempb, 0xFF ; Set CF if temp[0] < 0xFF + mov byte [buffer+1], 0 ; buffer[1] = 0; + sbb bufferp, -2 ; buffer -= (-2 + (temp[0] < 0xFF ? 1 : 0)); + mov byte [buffer], temph ; buffer[0] = temp[1]; + cmp temph, 0xFF ; Set CF if temp[1] < 0xFF + mov byte [buffer+1], 0 ; buffer[1] = 0; + sbb bufferp, -2 ; buffer -= (-2 + (temp[1] < 0xFF ? 1 : 0)); + shr tempq, 16 ; temp >>= 16; + mov byte [buffer], tempb ; buffer[0] = temp[0]; + cmp tempb, 0xFF ; Set CF if temp[0] < 0xFF + mov byte [buffer+1], 0 ; buffer[1] = 0; + sbb bufferp, -2 ; buffer -= (-2 + (temp[0] < 0xFF ? 1 : 0)); + mov byte [buffer], temph ; buffer[0] = temp[1]; + cmp temph, 0xFF ; Set CF if temp[1] < 0xFF + mov byte [buffer+1], 0 ; buffer[1] = 0; + sbb bufferp, -2 ; buffer -= (-2 + (temp[1] < 0xFF ? 1 : 0)); + shr tempd, 16 ; temp >>= 16; + mov byte [buffer], tempb ; buffer[0] = temp[0]; + cmp tempb, 0xFF ; Set CF if temp[0] < 0xFF + mov byte [buffer+1], 0 ; buffer[1] = 0; + sbb bufferp, -2 ; buffer -= (-2 + (temp[0] < 0xFF ? 
1 : 0)); + mov byte [buffer], temph ; buffer[0] = temp[1]; + cmp temph, 0xFF ; Set CF if temp[1] < 0xFF + mov byte [buffer+1], 0 ; buffer[1] = 0; + sbb bufferp, -2 ; buffer -= (-2 + (temp[1] < 0xFF ? 1 : 0)); + jmp %1 ; return; %endmacro ; @@ -165,181 +185,399 @@ ; JCOEFPTR block, int last_dc_val, ; c_derived_tbl *dctbl, c_derived_tbl *actbl) ; - -; r10 = working_state *state -; r11 = JOCTET *buffer -; r12 = JCOEFPTR block -; r13d = int last_dc_val -; r14 = c_derived_tbl *dctbl -; r15 = c_derived_tbl *actbl - -%define t1 rbp - (DCTSIZE2 * SIZEOF_WORD) -%define t2 t1 - (DCTSIZE2 * SIZEOF_WORD) -%define put_buffer r8 -%define put_bits r9d -%define buffer rax +; NOTES: +; When shuffling data, we try to avoid pinsrw as much as possible, since it is +; slow on many CPUs. Its reciprocal throughput (issue latency) is 1 even on +; modern CPUs, so chains of pinsrw instructions (even with different outputs) +; can limit performance. pinsrw is a VectorPath instruction on AMD K8 and +; requires 2 µops (with memory operand) on Intel. In either case, only one +; pinsrw instruction can be decoded per cycle (and nothing else if they are +; back-to-back), so out-of-order execution cannot be used to work around long +; pinsrw chains (though for Sandy Bridge and later, this may be less of a +; problem if the code runs from the µop cache.) +; +; We use tzcnt instead of bsf without checking for support. The instruction is +; executed as bsf on CPUs that don't support tzcnt (encoding is equivalent to +; rep bsf.) The destination (first) operand of bsf (and tzcnt on some CPUs) is +; an input dependency (although the behavior is not formally defined, Intel +; CPUs usually leave the destination unmodified if the source is zero.) This +; can prevent out-of-order execution, so we clear the destination before +; invoking tzcnt. 
+; +; Initial register allocation +; rax - buffer +; rbx - temp +; rcx - nbits +; rdx - block --> free_bits +; rsi - nbits_base +; rdi - t +; rbp - code +; r8 - dctbl --> code_temp +; r9 - actbl +; r10 - state +; r11 - index +; r12 - put_buffer + +%define buffer rax +%ifdef WIN64 +%define bufferp rax +%else +%define bufferp raxp +%endif +%define tempq rbx +%define tempd ebx +%define tempb bl +%define temph bh +%define nbitsq rcx +%define nbits ecx +%define nbitsb cl +%define block rdx +%define nbits_base rsi +%define t rdi +%define td edi +%define codeq rbp +%define code ebp +%define dctbl r8 +%define actbl r9 +%define state r10 +%define index r11 +%define indexd r11d +%define put_buffer r12 +%define put_bufferd r12d + +; Step 1: Re-arrange input data according to jpeg_natural_order +; xx 01 02 03 04 05 06 07 xx 01 08 16 09 02 03 10 +; 08 09 10 11 12 13 14 15 17 24 32 25 18 11 04 05 +; 16 17 18 19 20 21 22 23 12 19 26 33 40 48 41 34 +; 24 25 26 27 28 29 30 31 ==> 27 20 13 06 07 14 21 28 +; 32 33 34 35 36 37 38 39 35 42 49 56 57 50 43 36 +; 40 41 42 43 44 45 46 47 29 22 15 23 30 37 44 51 +; 48 49 50 51 52 53 54 55 58 59 52 45 38 31 39 46 +; 56 57 58 59 60 61 62 63 53 60 61 54 47 55 62 63 align 32 GLOBAL_FUNCTION(jsimd_huff_encode_one_block_sse2) EXTN(jsimd_huff_encode_one_block_sse2): - push rbp - mov rax, rsp ; rax = original rbp - sub rsp, byte 4 - and rsp, byte (-SIZEOF_XMMWORD) ; align to 128 bits - mov [rsp], rax - mov rbp, rsp ; rbp = aligned rbp - lea rsp, [t2] - push_xmm 4 - collect_args 6 + +%ifdef WIN64 + +; rcx = working_state *state +; rdx = JOCTET *buffer +; r8 = JCOEFPTR block +; r9 = int last_dc_val +; [rax+48] = c_derived_tbl *dctbl +; [rax+56] = c_derived_tbl *actbl + + ;X: X = code stream + mov buffer, rdx + mov block, r8 + movups xmm3, XMMWORD [block + 0 * SIZEOF_WORD] ;D: w3 = xx 01 02 03 04 05 06 07 push rbx + push rbp + movdqa xmm0, xmm3 ;A: w0 = xx 01 02 03 04 05 06 07 + push rsi + push rdi + push r12 + movups xmm1, XMMWORD [block + 8 * 
SIZEOF_WORD] ;B: w1 = 08 09 10 11 12 13 14 15 + mov state, rcx + movsx code, word [block] ;Z: code = block[0]; + pxor xmm4, xmm4 ;A: w4[i] = 0; + sub code, r9d ;Z: code -= last_dc_val; + mov dctbl, POINTER [rsp+6*8+4*8] + mov actbl, POINTER [rsp+6*8+5*8] + punpckldq xmm0, xmm1 ;A: w0 = xx 01 08 09 02 03 10 11 + lea nbits_base, [rel jpeg_nbits_table] + add rsp, -DCTSIZE2 * SIZEOF_WORD + mov t, rsp + +%else - mov buffer, r11 ; r11 is now sratch +; rdi = working_state *state +; rsi = JOCTET *buffer +; rdx = JCOEFPTR block +; rcx = int last_dc_val +; r8 = c_derived_tbl *dctbl +; r9 = c_derived_tbl *actbl - mov put_buffer, MMWORD [r10+16] ; put_buffer = state->cur.put_buffer; - mov put_bits, dword [r10+24] ; put_bits = state->cur.put_bits; - push r10 ; r10 is now scratch - - ; Encode the DC coefficient difference per section F.1.2.1 - movsx edi, word [r12] ; temp = temp2 = block[0] - last_dc_val; - sub edi, r13d ; r13 is not used anymore - mov ebx, edi - - ; This is a well-known technique for obtaining the absolute value - ; without a branch. It is derived from an assembly language technique - ; presented in "How to Optimize for the Pentium Processors", - ; Copyright (c) 1996, 1997 by Agner Fog. 
- mov esi, edi - sar esi, 31 ; temp3 = temp >> (CHAR_BIT * sizeof(int) - 1); - xor edi, esi ; temp ^= temp3; - sub edi, esi ; temp -= temp3; - - ; For a negative input, want temp2 = bitwise complement of abs(input) - ; This code assumes we are on a two's complement machine - add ebx, esi ; temp2 += temp3; - - ; Find the number of bits needed for the magnitude of the coefficient - lea r11, [rel EXTN(jpeg_nbits_table)] - movzx rdi, byte [r11 + rdi] ; nbits = JPEG_NBITS(temp); - ; Emit the Huffman-coded symbol for the number of bits - mov r11d, INT [r14 + rdi * 4] ; code = dctbl->ehufco[nbits]; - movzx esi, byte [r14 + rdi + 1024] ; size = dctbl->ehufsi[nbits]; - EMIT_BITS r11, esi ; EMIT_BITS(code, size) - - ; Mask off any extra bits in code - mov esi, 1 - mov ecx, edi - shl esi, cl - dec esi - and ebx, esi ; temp2 &= (((JLONG)1)<ehufco[0xf0]; - movzx r14d, byte [r15 + 1024 + 240] ; size_0xf0 = actbl->ehufsi[0xf0]; - lea rsi, [t1] -.BLOOP: - bsf r12, r11 ; r = __builtin_ctzl(index); - jz .ELOOP - mov rcx, r12 - lea rsi, [rsi+r12*2] ; k += r; - shr r11, cl ; index >>= r; - movzx rdi, word [rsi] ; temp = t1[k]; - lea rbx, [rel EXTN(jpeg_nbits_table)] - movzx rdi, byte [rbx + rdi] ; nbits = JPEG_NBITS(temp); -.BRLOOP: - cmp r12, 16 ; while (r > 15) { - jl .ERLOOP - EMIT_BITS r13, r14d ; EMIT_BITS(code_0xf0, size_0xf0) - sub r12, 16 ; r -= 16; - jmp .BRLOOP -.ERLOOP: - ; Emit Huffman symbol for run length / number of bits - CHECKBUF31 ; uses rcx, rdx - - shl r12, 4 ; temp3 = (r << 4) + nbits; - add r12, rdi - mov ebx, INT [r15 + r12 * 4] ; code = actbl->ehufco[temp3]; - movzx ecx, byte [r15 + r12 + 1024] ; size = actbl->ehufsi[temp3]; - PUT_BITS rbx - - ;EMIT_CODE(code, size) - - movsx ebx, word [rsi-DCTSIZE2*2] ; temp2 = t2[k]; - ; Mask off any extra bits in code - mov rcx, rdi - mov rdx, 1 - shl rdx, cl - dec rdx - and rbx, rdx ; temp2 &= (((JLONG)1)<>= 1; - add rsi, 2 ; ++k; - jmp .BLOOP -.ELOOP: - ; If the last coef(s) were zero, emit an end-of-block code - lea rdi, 
[t1 + (DCTSIZE2-1) * 2] ; r = DCTSIZE2-1-k; - cmp rdi, rsi ; if (r > 0) { - je .EFN - mov ebx, INT [r15] ; code = actbl->ehufco[0]; - movzx r12d, byte [r15 + 1024] ; size = actbl->ehufsi[0]; - EMIT_BITS rbx, r12d -.EFN: - pop r10 - ; Save put_buffer & put_bits - mov MMWORD [r10+16], put_buffer ; state->cur.put_buffer = put_buffer; - mov dword [r10+24], put_bits ; state->cur.put_bits = put_bits; + ;X: X = code stream + movups xmm3, XMMWORD [block + 0 * SIZEOF_WORD] ;D: w3 = xx 01 02 03 04 05 06 07 + push rbx + push rbp + movdqa xmm0, xmm3 ;A: w0 = xx 01 02 03 04 05 06 07 + push r12 + mov state, rdi + mov buffer, rsi + movups xmm1, XMMWORD [block + 8 * SIZEOF_WORD] ;B: w1 = 08 09 10 11 12 13 14 15 + movsx codeq, word [block] ;Z: code = block[0]; + lea nbits_base, [rel jpeg_nbits_table] + pxor xmm4, xmm4 ;A: w4[i] = 0; + sub codeq, rcx ;Z: code -= last_dc_val; + punpckldq xmm0, xmm1 ;A: w0 = xx 01 08 09 02 03 10 11 + lea t, [rsp - DCTSIZE2 * SIZEOF_WORD] ; use red zone for t_ +%endif + + pshuflw xmm0, xmm0, 11001001b ;A: w0 = 01 08 xx 09 02 03 10 11 + pinsrw xmm0, word [block + 16 * SIZEOF_WORD], 2 ;A: w0 = 01 08 16 09 02 03 10 11 + punpckhdq xmm3, xmm1 ;D: w3 = 04 05 12 13 06 07 14 15 + punpcklqdq xmm1, xmm3 ;B: w1 = 08 09 10 11 04 05 12 13 + pinsrw xmm0, word [block + 17 * SIZEOF_WORD], 7 ;A: w0 = 01 08 16 09 02 03 10 17 + ;A: (Row 0, offset 1) + pcmpgtw xmm4, xmm0 ;A: w4[i] = (w0[i] < 0 ? 
-1 : 0); + paddw xmm0, xmm4 ;A: w0[i] += w4[i]; + movaps XMMWORD [t + 0 * SIZEOF_WORD], xmm0 ;A: t[i] = w0[i]; + + movq xmm2, qword [block + 24 * SIZEOF_WORD] ;B: w2 = 24 25 26 27 -- -- -- -- + pshuflw xmm2, xmm2, 11011000b ;B: w2 = 24 26 25 27 -- -- -- -- + pslldq xmm1, 1 * SIZEOF_WORD ;B: w1 = -- 08 09 10 11 04 05 12 + movups xmm5, XMMWORD [block + 48 * SIZEOF_WORD] ;H: w5 = 48 49 50 51 52 53 54 55 + movsd xmm1, xmm2 ;B: w1 = 24 26 25 27 11 04 05 12 + punpcklqdq xmm2, xmm5 ;C: w2 = 24 26 25 27 48 49 50 51 + pinsrw xmm1, word [block + 32 * SIZEOF_WORD], 1 ;B: w1 = 24 32 25 27 11 04 05 12 + pxor xmm4, xmm4 ;A: w4[i] = 0; + psrldq xmm3, 2 * SIZEOF_WORD ;D: w3 = 12 13 06 07 14 15 -- -- + pcmpeqw xmm0, xmm4 ;A: w0[i] = (w0[i] == 0 ? -1 : 0); + pinsrw xmm1, word [block + 18 * SIZEOF_WORD], 3 ;B: w1 = 24 32 25 18 11 04 05 12 + ; (Row 1, offset 1) + pcmpgtw xmm4, xmm1 ;B: w4[i] = (w1[i] < 0 ? -1 : 0); + paddw xmm1, xmm4 ;B: w1[i] += w4[i]; + movaps XMMWORD [t + 8 * SIZEOF_WORD], xmm1 ;B: t[i+8] = w1[i]; + pxor xmm4, xmm4 ;B: w4[i] = 0; + pcmpeqw xmm1, xmm4 ;B: w1[i] = (w1[i] == 0 ? -1 : 0); + + packsswb xmm0, xmm1 ;AB: b0[i] = w0[i], b0[i+8] = w1[i] + ; w/ signed saturation + + pinsrw xmm3, word [block + 20 * SIZEOF_WORD], 0 ;D: w3 = 20 13 06 07 14 15 -- -- + pinsrw xmm3, word [block + 21 * SIZEOF_WORD], 5 ;D: w3 = 20 13 06 07 14 21 -- -- + pinsrw xmm3, word [block + 28 * SIZEOF_WORD], 6 ;D: w3 = 20 13 06 07 14 21 28 -- + pinsrw xmm3, word [block + 35 * SIZEOF_WORD], 7 ;D: w3 = 20 13 06 07 14 21 28 35 + ; (Row 3, offset 1) + pcmpgtw xmm4, xmm3 ;D: w4[i] = (w3[i] < 0 ? -1 : 0); + paddw xmm3, xmm4 ;D: w3[i] += w4[i]; + movaps XMMWORD [t + 24 * SIZEOF_WORD], xmm3 ;D: t[i+24] = w3[i]; + pxor xmm4, xmm4 ;D: w4[i] = 0; + pcmpeqw xmm3, xmm4 ;D: w3[i] = (w3[i] == 0 ? -1 : 0); + + pinsrw xmm2, word [block + 19 * SIZEOF_WORD], 0 ;C: w2 = 19 26 25 27 48 49 50 51 + cmp code, 1 << 31 ;Z: Set CF if code < 0x80000000, + ;Z: i.e. 
if code is positive + pinsrw xmm2, word [block + 33 * SIZEOF_WORD], 2 ;C: w2 = 19 26 33 27 48 49 50 51 + pinsrw xmm2, word [block + 40 * SIZEOF_WORD], 3 ;C: w2 = 19 26 33 40 48 49 50 51 + adc code, -1 ;Z: code += -1 + (code >= 0 ? 1 : 0); + pinsrw xmm2, word [block + 41 * SIZEOF_WORD], 5 ;C: w2 = 19 26 33 40 48 41 50 51 + pinsrw xmm2, word [block + 34 * SIZEOF_WORD], 6 ;C: w2 = 19 26 33 40 48 41 34 51 + movsxd codeq, code ;Z: sign extend code + pinsrw xmm2, word [block + 27 * SIZEOF_WORD], 7 ;C: w2 = 19 26 33 40 48 41 34 27 + ; (Row 2, offset 1) + pcmpgtw xmm4, xmm2 ;C: w4[i] = (w2[i] < 0 ? -1 : 0); + paddw xmm2, xmm4 ;C: w2[i] += w4[i]; + movaps XMMWORD [t + 16 * SIZEOF_WORD], xmm2 ;C: t[i+16] = w2[i]; + pxor xmm4, xmm4 ;C: w4[i] = 0; + pcmpeqw xmm2, xmm4 ;C: w2[i] = (w2[i] == 0 ? -1 : 0); + + packsswb xmm2, xmm3 ;CD: b2[i] = w2[i], b2[i+8] = w3[i] + ; w/ signed saturation + + movzx nbitsq, byte [NBITS(codeq)] ;Z: nbits = JPEG_NBITS(code); + movdqa xmm3, xmm5 ;H: w3 = 48 49 50 51 52 53 54 55 + pmovmskb tempd, xmm2 ;Z: temp = 0; temp |= ((b2[i] >> 7) << i); + pmovmskb put_bufferd, xmm0 ;Z: put_buffer = 0; put_buffer |= ((b0[i] >> 7) << i); + movups xmm0, XMMWORD [block + 56 * SIZEOF_WORD] ;H: w0 = 56 57 58 59 60 61 62 63 + punpckhdq xmm3, xmm0 ;H: w3 = 52 53 60 61 54 55 62 63 + shl tempd, 16 ;Z: temp <<= 16; + psrldq xmm3, 1 * SIZEOF_WORD ;H: w3 = 53 60 61 54 55 62 63 -- + pxor xmm2, xmm2 ;H: w2[i] = 0; + or put_bufferd, tempd ;Z: put_buffer |= temp; + pshuflw xmm3, xmm3, 00111001b ;H: w3 = 60 61 54 53 55 62 63 -- + movq xmm1, qword [block + 44 * SIZEOF_WORD] ;G: w1 = 44 45 46 47 -- -- -- -- + unpcklps xmm5, xmm0 ;E: w5 = 48 49 56 57 50 51 58 59 + pxor xmm0, xmm0 ;H: w0[i] = 0; + pinsrw xmm3, word [block + 47 * SIZEOF_WORD], 3 ;H: w3 = 60 61 54 47 55 62 63 -- + ; (Row 7, offset 1) + pcmpgtw xmm2, xmm3 ;H: w2[i] = (w3[i] < 0 ? 
-1 : 0); + paddw xmm3, xmm2 ;H: w3[i] += w2[i]; + movaps XMMWORD [t + 56 * SIZEOF_WORD], xmm3 ;H: t[i+56] = w3[i]; + movq xmm4, qword [block + 36 * SIZEOF_WORD] ;G: w4 = 36 37 38 39 -- -- -- -- + pcmpeqw xmm3, xmm0 ;H: w3[i] = (w3[i] == 0 ? -1 : 0); + punpckldq xmm4, xmm1 ;G: w4 = 36 37 44 45 38 39 46 47 + mov tempd, [dctbl + c_derived_tbl.ehufco + nbitsq * 4] + ;Z: temp = dctbl->ehufco[nbits]; + movdqa xmm1, xmm4 ;F: w1 = 36 37 44 45 38 39 46 47 + psrldq xmm4, 1 * SIZEOF_WORD ;G: w4 = 37 44 45 38 39 46 47 -- + shufpd xmm1, xmm5, 10b ;F: w1 = 36 37 44 45 50 51 58 59 + and code, dword [MASK_BITS(nbitsq)] ;Z: code &= (1 << nbits) - 1; + pshufhw xmm4, xmm4, 11010011b ;G: w4 = 37 44 45 38 -- 39 46 -- + pslldq xmm1, 1 * SIZEOF_WORD ;F: w1 = -- 36 37 44 45 50 51 58 + shl tempq, nbitsb ;Z: temp <<= nbits; + pinsrw xmm4, word [block + 59 * SIZEOF_WORD], 0 ;G: w4 = 59 44 45 38 -- 39 46 -- + pshufd xmm1, xmm1, 11011000b ;F: w1 = -- 36 45 50 37 44 51 58 + pinsrw xmm4, word [block + 52 * SIZEOF_WORD], 1 ;G: w4 = 59 52 45 38 -- 39 46 -- + or code, tempd ;Z: code |= temp; + movlps xmm1, qword [block + 20 * SIZEOF_WORD] ;F: w1 = 20 21 22 23 37 44 51 58 + pinsrw xmm4, word [block + 31 * SIZEOF_WORD], 4 ;G: w4 = 59 52 45 38 31 39 46 -- + pshuflw xmm1, xmm1, 01110010b ;F: w1 = 22 20 23 21 37 44 51 58 + pinsrw xmm4, word [block + 53 * SIZEOF_WORD], 7 ;G: w4 = 59 52 45 38 31 39 46 53 + ; (Row 6, offset 1) + pxor xmm2, xmm2 ;G: w2[i] = 0; + pcmpgtw xmm0, xmm4 ;G: w0[i] = (w4[i] < 0 ? -1 : 0); + pinsrw xmm1, word [block + 15 * SIZEOF_WORD], 1 ;F: w1 = 22 15 23 21 37 44 51 58 + paddw xmm4, xmm0 ;G: w4[i] += w0[i]; + movaps XMMWORD [t + 48 * SIZEOF_WORD], xmm4 ;G: t[48+i] = w4[i]; + pinsrw xmm1, word [block + 30 * SIZEOF_WORD], 3 ;F: w1 = 22 15 23 30 37 44 51 58 + ; (Row 5, offset 1) + pcmpeqw xmm4, xmm2 ;G: w4[i] = (w4[i] == 0 ? 
-1 : 0); + pinsrw xmm5, word [block + 42 * SIZEOF_WORD], 0 ;E: w5 = 42 49 56 57 50 51 58 59 + + packsswb xmm4, xmm3 ;GH: b4[i] = w4[i], b4[i+8] = w3[i] + ; w/ signed saturation + + pxor xmm0, xmm0 ;F: w0[i] = 0; + pinsrw xmm5, word [block + 43 * SIZEOF_WORD], 5 ;E: w5 = 42 49 56 57 50 43 58 59 + pcmpgtw xmm2, xmm1 ;F: w2[i] = (w1[i] < 0 ? -1 : 0); + pmovmskb tempd, xmm4 ;Z: temp = 0; temp |= ((b4[i] >> 7) << i); + pinsrw xmm5, word [block + 36 * SIZEOF_WORD], 6 ;E: w5 = 42 49 56 57 50 43 36 59 + paddw xmm1, xmm2 ;F: w1[i] += w2[i]; + movaps XMMWORD [t + 40 * SIZEOF_WORD], xmm1 ;F: t[40+i] = w1[i]; + pinsrw xmm5, word [block + 29 * SIZEOF_WORD], 7 ;E: w5 = 42 49 56 57 50 43 36 29 + ; (Row 4, offset 1) +%undef block +%define free_bitsq rdx +%define free_bitsd edx +%define free_bitsb dl + pcmpeqw xmm1, xmm0 ;F: w1[i] = (w1[i] == 0 ? -1 : 0); + shl tempq, 48 ;Z: temp <<= 48; + pxor xmm2, xmm2 ;E: w2[i] = 0; + pcmpgtw xmm0, xmm5 ;E: w0[i] = (w5[i] < 0 ? -1 : 0); + paddw xmm5, xmm0 ;E: w5[i] += w0[i]; + or tempq, put_buffer ;Z: temp |= put_buffer; + movaps XMMWORD [t + 32 * SIZEOF_WORD], xmm5 ;E: t[32+i] = w5[i]; + lea t, [dword t - 2] ;Z: t = &t[-1]; + pcmpeqw xmm5, xmm2 ;E: w5[i] = (w5[i] == 0 ? 
-1 : 0); + + packsswb xmm5, xmm1 ;EF: b5[i] = w5[i], b5[i+8] = w1[i] + ; w/ signed saturation + + add nbitsb, byte [dctbl + c_derived_tbl.ehufsi + nbitsq] + ;Z: nbits += dctbl->ehufsi[nbits]; +%undef dctbl +%define code_temp r8d + pmovmskb indexd, xmm5 ;Z: index = 0; index |= ((b5[i] >> 7) << i); + mov free_bitsd, [state+working_state.cur.free_bits] + ;Z: free_bits = state->cur.free_bits; + pcmpeqw xmm1, xmm1 ;Z: b1[i] = 0xFF; + shl index, 32 ;Z: index <<= 32; + mov put_buffer, [state+working_state.cur.put_buffer.simd] + ;Z: put_buffer = state->cur.put_buffer.simd; + or index, tempq ;Z: index |= temp; + not index ;Z: index = ~index; + sub free_bitsb, nbitsb ;Z: if ((free_bits -= nbits) >= 0) + jnl .ENTRY_SKIP_EMIT_CODE ;Z: goto .ENTRY_SKIP_EMIT_CODE; + align 16 +.EMIT_CODE: ;Z: .EMIT_CODE: + EMIT_QWORD .BLOOP_COND ;Z: insert code, flush buffer, goto .BLOOP_COND + +; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + align 16 +.BRLOOP: ; do { + lea code_temp, [nbitsq - 16] ; code_temp = nbits - 16; + movzx nbits, byte [actbl + c_derived_tbl.ehufsi + 0xf0] + ; nbits = actbl->ehufsi[0xf0]; + mov code, [actbl + c_derived_tbl.ehufco + 0xf0 * 4] + ; code = actbl->ehufco[0xf0]; + sub free_bitsb, nbitsb ; if ((free_bits -= nbits) <= 0) + jle .EMIT_BRLOOP_CODE ; goto .EMIT_BRLOOP_CODE; + shl put_buffer, nbitsb ; put_buffer <<= nbits; + mov nbits, code_temp ; nbits = code_temp; + or put_buffer, codeq ; put_buffer |= code; + cmp nbits, 16 ; if (nbits <= 16) + jle .ERLOOP ; break; + jmp .BRLOOP ; } while (1); + +; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + align 16 + times 5 nop +.ENTRY_SKIP_EMIT_CODE: ; .ENTRY_SKIP_EMIT_CODE: + shl put_buffer, nbitsb ; put_buffer <<= nbits; + or put_buffer, codeq ; put_buffer |= code; +.BLOOP_COND: ; .BLOOP_COND: + test index, index ; if (index != 0) + jz .ELOOP ; { +.BLOOP: ; do { + xor nbits, nbits ; nbits = 0; /* kill tzcnt input dependency */ + tzcnt nbitsq, index ; nbits = # of trailing 0 bits in index + inc nbits ; ++nbits; + lea t, [t + nbitsq * 
2] ; t = &t[nbits]; + shr index, nbitsb ; index >>= nbits; +.EMIT_BRLOOP_CODE_END: ; .EMIT_BRLOOP_CODE_END: + cmp nbits, 16 ; if (nbits > 16) + jg .BRLOOP ; goto .BRLOOP; +.ERLOOP: ; .ERLOOP: + movsx codeq, word [t] ; code = *t; + lea tempd, [nbitsq * 2] ; temp = nbits * 2; + movzx nbits, byte [NBITS(codeq)] ; nbits = JPEG_NBITS(code); + lea tempd, [nbitsq + tempq * 8] ; temp = temp * 8 + nbits; + mov code_temp, [actbl + c_derived_tbl.ehufco + (tempq - 16) * 4] + ; code_temp = actbl->ehufco[temp-16]; + shl code_temp, nbitsb ; code_temp <<= nbits; + and code, dword [MASK_BITS(nbitsq)] ; code &= (1 << nbits) - 1; + add nbitsb, [actbl + c_derived_tbl.ehufsi + (tempq - 16)] + ; free_bits -= actbl->ehufsi[temp-16]; + or code, code_temp ; code |= code_temp; + sub free_bitsb, nbitsb ; if ((free_bits -= nbits) <= 0) + jle .EMIT_CODE ; goto .EMIT_CODE; + shl put_buffer, nbitsb ; put_buffer <<= nbits; + or put_buffer, codeq ; put_buffer |= code; + test index, index + jnz .BLOOP ; } while (index != 0); +.ELOOP: ; } /* index != 0 */ + sub td, esp ; t -= (WIN64: &t_[0], UNIX: &t_[64]); +%ifdef WIN64 + cmp td, (DCTSIZE2 - 2) * SIZEOF_WORD ; if (t != 62) +%else + cmp td, -2 * SIZEOF_WORD ; if (t != -2) +%endif + je .EFN ; { + movzx nbits, byte [actbl + c_derived_tbl.ehufsi + 0] + ; nbits = actbl->ehufsi[0]; + mov code, [actbl + c_derived_tbl.ehufco + 0] ; code = actbl->ehufco[0]; + sub free_bitsb, nbitsb ; if ((free_bits -= nbits) <= 0) + jg .EFN_SKIP_EMIT_CODE ; { + EMIT_QWORD .EFN ; insert code, flush buffer + align 16 +.EFN_SKIP_EMIT_CODE: ; } else { + shl put_buffer, nbitsb ; put_buffer <<= nbits; + or put_buffer, codeq ; put_buffer |= code; +.EFN: ; } } + mov [state + working_state.cur.put_buffer.simd], put_buffer + ; state->cur.put_buffer.simd = put_buffer; + mov byte [state + working_state.cur.free_bits], free_bitsb + ; state->cur.free_bits = free_bits; +%ifdef WIN64 + sub rsp, -DCTSIZE2 * SIZEOF_WORD + pop r12 + pop rdi + pop rsi + pop rbp pop rbx - uncollect_args 6 - 
pop_xmm 4 - mov rsp, rbp ; rsp <- aligned rbp - pop rsp ; rsp <- original rbp +%else + pop r12 pop rbp + pop rbx +%endif ret +; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + align 16 +.EMIT_BRLOOP_CODE: + EMIT_QWORD .EMIT_BRLOOP_CODE_END, { mov nbits, code_temp } + ; insert code, flush buffer, + ; nbits = code_temp, goto .EMIT_BRLOOP_CODE_END + ; For some reason, the OS X linker does not honor the request to align the ; segment unless we do this. align 32 diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jcphuff-sse2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jcphuff-sse2.asm --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jcphuff-sse2.asm 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jcphuff-sse2.asm 2021-11-20 03:41:33.403600370 +0000 @@ -504,6 +504,8 @@ add KK, 16 dec K jnz .BLOOPR16 + test LEN, 15 + je .PADDINGR .ELOOPR16: test LEN, 8 jz .TRYR7 diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jcsample-avx2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jcsample-avx2.asm --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jcsample-avx2.asm 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jcsample-avx2.asm 2021-11-20 03:41:33.403600370 +0000 @@ -4,6 +4,7 @@ ; Copyright 2009 Pierre Ossman for Cendio AB ; Copyright (C) 2009, 2016, D. R. Commander. ; Copyright (C) 2015, Intel Corporation. +; Copyright (C) 2018, Matthias Räncker. ; ; Based on the x86 SIMD extension for IJG JPEG library ; Copyright (C) 1999-2006, MIYASAKA Masaru. 
@@ -71,7 +72,7 @@
     push    rax
     push    rcx
 
-    mov     rdi, JSAMPROW [rsi]
+    mov     rdip, JSAMPROW [rsi]
     add     rdi, rdx
     mov     al, JSAMPLE [rdi-1]
@@ -107,8 +108,8 @@
     push    rdi
     push    rsi
 
-    mov     rsi, JSAMPROW [rsi]        ; inptr
-    mov     rdi, JSAMPROW [rdi]        ; outptr
+    mov     rsip, JSAMPROW [rsi]       ; inptr
+    mov     rdip, JSAMPROW [rdi]       ; outptr
 
     cmp     rcx, byte SIZEOF_YMMWORD
     jae     short .columnloop
@@ -233,7 +234,7 @@
     push    rax
     push    rcx
 
-    mov     rdi, JSAMPROW [rsi]
+    mov     rdip, JSAMPROW [rsi]
     add     rdi, rdx
     mov     al, JSAMPLE [rdi-1]
@@ -269,9 +270,9 @@
     push    rdi
     push    rsi
 
-    mov     rdx, JSAMPROW [rsi+0*SIZEOF_JSAMPROW]  ; inptr0
-    mov     rsi, JSAMPROW [rsi+1*SIZEOF_JSAMPROW]  ; inptr1
-    mov     rdi, JSAMPROW [rdi]                    ; outptr
+    mov     rdxp, JSAMPROW [rsi+0*SIZEOF_JSAMPROW] ; inptr0
+    mov     rsip, JSAMPROW [rsi+1*SIZEOF_JSAMPROW] ; inptr1
+    mov     rdip, JSAMPROW [rdi]                   ; outptr
 
     cmp     rcx, byte SIZEOF_YMMWORD
     jae     short .columnloop
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jcsample-sse2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jcsample-sse2.asm
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jcsample-sse2.asm 2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jcsample-sse2.asm 2021-11-20 03:41:33.403600370 +0000
@@ -3,6 +3,7 @@
 ;
 ; Copyright 2009 Pierre Ossman for Cendio AB
 ; Copyright (C) 2009, 2016, D. R. Commander.
+; Copyright (C) 2018, Matthias Räncker.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -70,7 +71,7 @@
     push    rax
     push    rcx
 
-    mov     rdi, JSAMPROW [rsi]
+    mov     rdip, JSAMPROW [rsi]
    add     rdi, rdx
     mov     al, JSAMPLE [rdi-1]
@@ -105,8 +106,8 @@
     push    rdi
     push    rsi
 
-    mov     rsi, JSAMPROW [rsi]        ; inptr
-    mov     rdi, JSAMPROW [rdi]        ; outptr
+    mov     rsip, JSAMPROW [rsi]       ; inptr
+    mov     rdip, JSAMPROW [rdi]       ; outptr
 
     cmp     rcx, byte SIZEOF_XMMWORD
     jae     short .columnloop
@@ -215,7 +216,7 @@
     push    rax
     push    rcx
 
-    mov     rdi, JSAMPROW [rsi]
+    mov     rdip, JSAMPROW [rsi]
     add     rdi, rdx
     mov     al, JSAMPLE [rdi-1]
@@ -250,9 +251,9 @@
     push    rdi
     push    rsi
 
-    mov     rdx, JSAMPROW [rsi+0*SIZEOF_JSAMPROW]  ; inptr0
-    mov     rsi, JSAMPROW [rsi+1*SIZEOF_JSAMPROW]  ; inptr1
-    mov     rdi, JSAMPROW [rdi]                    ; outptr
+    mov     rdxp, JSAMPROW [rsi+0*SIZEOF_JSAMPROW] ; inptr0
+    mov     rsip, JSAMPROW [rsi+1*SIZEOF_JSAMPROW] ; inptr1
+    mov     rdip, JSAMPROW [rdi]                   ; outptr
 
     cmp     rcx, byte SIZEOF_XMMWORD
     jae     short .columnloop
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jdcolext-avx2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jdcolext-avx2.asm
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jdcolext-avx2.asm 2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jdcolext-avx2.asm 2021-11-20 03:41:33.403600370 +0000
@@ -4,6 +4,7 @@
 ; Copyright 2009, 2012 Pierre Ossman for Cendio AB
 ; Copyright (C) 2009, 2012, 2016, D. R. Commander.
 ; Copyright (C) 2015, Intel Corporation.
+; Copyright (C) 2018, Matthias Räncker.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -58,9 +59,9 @@
     mov     rdi, r11
     mov     ecx, r12d
 
-    mov     rsi, JSAMPARRAY [rdi+0*SIZEOF_JSAMPARRAY]
-    mov     rbx, JSAMPARRAY [rdi+1*SIZEOF_JSAMPARRAY]
-    mov     rdx, JSAMPARRAY [rdi+2*SIZEOF_JSAMPARRAY]
+    mov     rsip, JSAMPARRAY [rdi+0*SIZEOF_JSAMPARRAY]
+    mov     rbxp, JSAMPARRAY [rdi+1*SIZEOF_JSAMPARRAY]
+    mov     rdxp, JSAMPARRAY [rdi+2*SIZEOF_JSAMPARRAY]
     lea     rsi, [rsi+rcx*SIZEOF_JSAMPROW]
     lea     rbx, [rbx+rcx*SIZEOF_JSAMPROW]
     lea     rdx, [rdx+rcx*SIZEOF_JSAMPROW]
@@ -79,10 +80,10 @@
     push    rsi
     push    rcx                        ; col
 
-    mov     rsi, JSAMPROW [rsi]        ; inptr0
-    mov     rbx, JSAMPROW [rbx]        ; inptr1
-    mov     rdx, JSAMPROW [rdx]        ; inptr2
-    mov     rdi, JSAMPROW [rdi]        ; outptr
+    mov     rsip, JSAMPROW [rsi]       ; inptr0
+    mov     rbxp, JSAMPROW [rbx]       ; inptr1
+    mov     rdxp, JSAMPROW [rdx]       ; inptr2
+    mov     rdip, JSAMPROW [rdi]       ; outptr
 .columnloop:
 
     vmovdqu ymm5, YMMWORD [rbx]        ; ymm5=Cb(0123456789ABCDEFGHIJKLMNOPQRSTUV)
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jdcolext-sse2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jdcolext-sse2.asm
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jdcolext-sse2.asm 2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jdcolext-sse2.asm 2021-11-20 03:41:33.403600370 +0000
@@ -3,6 +3,7 @@
 ;
 ; Copyright 2009, 2012 Pierre Ossman for Cendio AB
 ; Copyright (C) 2009, 2012, 2016, D. R. Commander.
+; Copyright (C) 2018, Matthias Räncker.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -57,9 +58,9 @@
     mov     rdi, r11
     mov     ecx, r12d
 
-    mov     rsi, JSAMPARRAY [rdi+0*SIZEOF_JSAMPARRAY]
-    mov     rbx, JSAMPARRAY [rdi+1*SIZEOF_JSAMPARRAY]
-    mov     rdx, JSAMPARRAY [rdi+2*SIZEOF_JSAMPARRAY]
+    mov     rsip, JSAMPARRAY [rdi+0*SIZEOF_JSAMPARRAY]
+    mov     rbxp, JSAMPARRAY [rdi+1*SIZEOF_JSAMPARRAY]
+    mov     rdxp, JSAMPARRAY [rdi+2*SIZEOF_JSAMPARRAY]
     lea     rsi, [rsi+rcx*SIZEOF_JSAMPROW]
     lea     rbx, [rbx+rcx*SIZEOF_JSAMPROW]
     lea     rdx, [rdx+rcx*SIZEOF_JSAMPROW]
@@ -78,10 +79,10 @@
     push    rsi
     push    rcx                        ; col
 
-    mov     rsi, JSAMPROW [rsi]        ; inptr0
-    mov     rbx, JSAMPROW [rbx]        ; inptr1
-    mov     rdx, JSAMPROW [rdx]        ; inptr2
-    mov     rdi, JSAMPROW [rdi]        ; outptr
+    mov     rsip, JSAMPROW [rsi]       ; inptr0
+    mov     rbxp, JSAMPROW [rbx]       ; inptr1
+    mov     rdxp, JSAMPROW [rdx]       ; inptr2
+    mov     rdip, JSAMPROW [rdi]       ; outptr
 .columnloop:
 
     movdqa  xmm5, XMMWORD [rbx]        ; xmm5=Cb(0123456789ABCDEF)
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jdmrgext-avx2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jdmrgext-avx2.asm
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jdmrgext-avx2.asm 2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jdmrgext-avx2.asm 2021-11-20 03:41:33.403600370 +0000
@@ -4,6 +4,7 @@
 ; Copyright 2009, 2012 Pierre Ossman for Cendio AB
 ; Copyright (C) 2009, 2012, 2016, D. R. Commander.
 ; Copyright (C) 2015, Intel Corporation.
+; Copyright (C) 2018, Matthias Räncker.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -58,14 +59,14 @@
     mov     rdi, r11
     mov     ecx, r12d
 
-    mov     rsi, JSAMPARRAY [rdi+0*SIZEOF_JSAMPARRAY]
-    mov     rbx, JSAMPARRAY [rdi+1*SIZEOF_JSAMPARRAY]
-    mov     rdx, JSAMPARRAY [rdi+2*SIZEOF_JSAMPARRAY]
+    mov     rsip, JSAMPARRAY [rdi+0*SIZEOF_JSAMPARRAY]
+    mov     rbxp, JSAMPARRAY [rdi+1*SIZEOF_JSAMPARRAY]
+    mov     rdxp, JSAMPARRAY [rdi+2*SIZEOF_JSAMPARRAY]
     mov     rdi, r13
 
-    mov     rsi, JSAMPROW [rsi+rcx*SIZEOF_JSAMPROW]  ; inptr0
-    mov     rbx, JSAMPROW [rbx+rcx*SIZEOF_JSAMPROW]  ; inptr1
-    mov     rdx, JSAMPROW [rdx+rcx*SIZEOF_JSAMPROW]  ; inptr2
-    mov     rdi, JSAMPROW [rdi]                      ; outptr
+    mov     rsip, JSAMPROW [rsi+rcx*SIZEOF_JSAMPROW] ; inptr0
+    mov     rbxp, JSAMPROW [rbx+rcx*SIZEOF_JSAMPROW] ; inptr1
+    mov     rdxp, JSAMPROW [rdx+rcx*SIZEOF_JSAMPROW] ; inptr2
+    mov     rdip, JSAMPROW [rdi]                     ; outptr
 
     pop     rcx                        ; col
@@ -514,15 +515,16 @@
     mov     rdi, r11
     mov     ecx, r12d
 
-    mov     rsi, JSAMPARRAY [rdi+0*SIZEOF_JSAMPARRAY]
-    mov     rbx, JSAMPARRAY [rdi+1*SIZEOF_JSAMPARRAY]
-    mov     rdx, JSAMPARRAY [rdi+2*SIZEOF_JSAMPARRAY]
+    mov     rsip, JSAMPARRAY [rdi+0*SIZEOF_JSAMPARRAY]
+    mov     rbxp, JSAMPARRAY [rdi+1*SIZEOF_JSAMPARRAY]
+    mov     rdxp, JSAMPARRAY [rdi+2*SIZEOF_JSAMPARRAY]
     mov     rdi, r13
     lea     rsi, [rsi+rcx*SIZEOF_JSAMPROW]
 
-    push    rdx                        ; inptr2
-    push    rbx                        ; inptr1
-    push    rsi                        ; inptr00
+    sub     rsp, SIZEOF_JSAMPARRAY*4
+    mov     JSAMPARRAY [rsp+0*SIZEOF_JSAMPARRAY], rsip  ; intpr00
+    mov     JSAMPARRAY [rsp+1*SIZEOF_JSAMPARRAY], rbxp  ; intpr1
+    mov     JSAMPARRAY [rsp+2*SIZEOF_JSAMPARRAY], rdxp  ; intpr2
     mov     rbx, rsp
 
     push    rdi
@@ -546,16 +548,16 @@
     pop     rax
     pop     rcx
     pop     rdi
-    pop     rsi
-    pop     rbx
-    pop     rdx
+    mov     rsip, JSAMPARRAY [rsp+0*SIZEOF_JSAMPARRAY]
+    mov     rbxp, JSAMPARRAY [rsp+1*SIZEOF_JSAMPARRAY]
+    mov     rdxp, JSAMPARRAY [rsp+2*SIZEOF_JSAMPARRAY]
 
     add     rdi, byte SIZEOF_JSAMPROW  ; outptr1
     add     rsi, byte SIZEOF_JSAMPROW  ; inptr01
 
-    push    rdx                        ; inptr2
-    push    rbx                        ; inptr1
-    push    rsi                        ; inptr00
+    mov     JSAMPARRAY [rsp+0*SIZEOF_JSAMPARRAY], rsip  ; intpr00
+    mov     JSAMPARRAY [rsp+1*SIZEOF_JSAMPARRAY], rbxp  ; intpr1
+    mov     JSAMPARRAY [rsp+2*SIZEOF_JSAMPARRAY], rdxp  ; intpr2
     mov     rbx, rsp
     push    rdi
@@ -579,9 +581,10 @@
     pop     rax
     pop     rcx
     pop     rdi
-    pop     rsi
-    pop     rbx
-    pop     rdx
+    mov     rsip, JSAMPARRAY [rsp+0*SIZEOF_JSAMPARRAY]
+    mov     rbxp, JSAMPARRAY [rsp+1*SIZEOF_JSAMPARRAY]
+    mov     rdxp, JSAMPARRAY [rsp+2*SIZEOF_JSAMPARRAY]
+    add     rsp, SIZEOF_JSAMPARRAY*4
 
     pop     rbx
     uncollect_args 4
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jdmrgext-sse2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jdmrgext-sse2.asm
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jdmrgext-sse2.asm 2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jdmrgext-sse2.asm 2021-11-20 03:41:33.404600354 +0000
@@ -3,6 +3,7 @@
 ;
 ; Copyright 2009, 2012 Pierre Ossman for Cendio AB
 ; Copyright (C) 2009, 2012, 2016, D. R. Commander.
+; Copyright (C) 2018, Matthias Räncker.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -57,14 +58,14 @@
     mov     rdi, r11
     mov     ecx, r12d
 
-    mov     rsi, JSAMPARRAY [rdi+0*SIZEOF_JSAMPARRAY]
-    mov     rbx, JSAMPARRAY [rdi+1*SIZEOF_JSAMPARRAY]
-    mov     rdx, JSAMPARRAY [rdi+2*SIZEOF_JSAMPARRAY]
+    mov     rsip, JSAMPARRAY [rdi+0*SIZEOF_JSAMPARRAY]
+    mov     rbxp, JSAMPARRAY [rdi+1*SIZEOF_JSAMPARRAY]
+    mov     rdxp, JSAMPARRAY [rdi+2*SIZEOF_JSAMPARRAY]
     mov     rdi, r13
 
-    mov     rsi, JSAMPROW [rsi+rcx*SIZEOF_JSAMPROW]  ; inptr0
-    mov     rbx, JSAMPROW [rbx+rcx*SIZEOF_JSAMPROW]  ; inptr1
-    mov     rdx, JSAMPROW [rdx+rcx*SIZEOF_JSAMPROW]  ; inptr2
-    mov     rdi, JSAMPROW [rdi]                      ; outptr
+    mov     rsip, JSAMPROW [rsi+rcx*SIZEOF_JSAMPROW] ; inptr0
+    mov     rbxp, JSAMPROW [rbx+rcx*SIZEOF_JSAMPROW] ; inptr1
+    mov     rdxp, JSAMPROW [rdx+rcx*SIZEOF_JSAMPROW] ; inptr2
+    mov     rdip, JSAMPROW [rdi]                     ; outptr
 
     pop     rcx                        ; col
@@ -456,15 +457,16 @@
     mov     rdi, r11
     mov     ecx, r12d
 
-    mov     rsi, JSAMPARRAY [rdi+0*SIZEOF_JSAMPARRAY]
-    mov     rbx, JSAMPARRAY [rdi+1*SIZEOF_JSAMPARRAY]
-    mov     rdx, JSAMPARRAY [rdi+2*SIZEOF_JSAMPARRAY]
+    mov     rsip, JSAMPARRAY [rdi+0*SIZEOF_JSAMPARRAY]
+    mov     rbxp, JSAMPARRAY [rdi+1*SIZEOF_JSAMPARRAY]
+    mov     rdxp, JSAMPARRAY [rdi+2*SIZEOF_JSAMPARRAY]
     mov     rdi, r13
     lea     rsi, [rsi+rcx*SIZEOF_JSAMPROW]
 
-    push    rdx                        ; inptr2
-    push    rbx                        ; inptr1
-    push    rsi                        ; inptr00
+    sub     rsp, SIZEOF_JSAMPARRAY*4
+    mov     JSAMPARRAY [rsp+0*SIZEOF_JSAMPARRAY], rsip  ; intpr00
+    mov     JSAMPARRAY [rsp+1*SIZEOF_JSAMPARRAY], rbxp  ; intpr1
+    mov     JSAMPARRAY [rsp+2*SIZEOF_JSAMPARRAY], rdxp  ; intpr2
     mov     rbx, rsp
 
     push    rdi
@@ -488,16 +490,16 @@
     pop     rax
     pop     rcx
     pop     rdi
-    pop     rsi
-    pop     rbx
-    pop     rdx
+    mov     rsip, JSAMPARRAY [rsp+0*SIZEOF_JSAMPARRAY]
+    mov     rbxp, JSAMPARRAY [rsp+1*SIZEOF_JSAMPARRAY]
+    mov     rdxp, JSAMPARRAY [rsp+2*SIZEOF_JSAMPARRAY]
 
     add     rdi, byte SIZEOF_JSAMPROW  ; outptr1
     add     rsi, byte SIZEOF_JSAMPROW  ; inptr01
 
-    push    rdx                        ; inptr2
-    push    rbx                        ; inptr1
-    push    rsi                        ; inptr00
+    mov     JSAMPARRAY [rsp+0*SIZEOF_JSAMPARRAY], rsip  ; intpr00
+    mov     JSAMPARRAY [rsp+1*SIZEOF_JSAMPARRAY], rbxp  ; intpr1
+    mov     JSAMPARRAY [rsp+2*SIZEOF_JSAMPARRAY], rdxp  ; intpr2
     mov     rbx, rsp
 
     push    rdi
@@ -521,9 +523,10 @@
     pop     rax
     pop     rcx
     pop     rdi
-    pop     rsi
-    pop     rbx
-    pop     rdx
+    mov     rsip, JSAMPARRAY [rsp+0*SIZEOF_JSAMPARRAY]
+    mov     rbxp, JSAMPARRAY [rsp+1*SIZEOF_JSAMPARRAY]
+    mov     rdxp, JSAMPARRAY [rsp+2*SIZEOF_JSAMPARRAY]
+    add     rsp, SIZEOF_JSAMPARRAY*4
 
     pop     rbx
     uncollect_args 4
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jdsample-avx2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jdsample-avx2.asm
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jdsample-avx2.asm 2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jdsample-avx2.asm 2021-11-20 03:41:33.404600354 +0000
@@ -4,6 +4,7 @@
 ; Copyright 2009 Pierre Ossman for Cendio AB
 ; Copyright (C) 2009, 2016, D. R. Commander.
 ; Copyright (C) 2015, Intel Corporation.
+; Copyright (C) 2018, Matthias Räncker.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -76,7 +77,7 @@
     mov     rsi, r12                   ; input_data
     mov     rdi, r13
-    mov     rdi, JSAMPARRAY [rdi]      ; output_data
+    mov     rdip, JSAMPARRAY [rdi]     ; output_data
 
     vpxor   ymm0, ymm0, ymm0           ; ymm0=(all 0's)
     vpcmpeqb xmm9, xmm9, xmm9
@@ -90,8 +91,8 @@
     push    rdi
     push    rsi
 
-    mov     rsi, JSAMPROW [rsi]        ; inptr
-    mov     rdi, JSAMPROW [rdi]        ; outptr
+    mov     rsip, JSAMPROW [rsi]       ; inptr
+    mov     rdip, JSAMPROW [rdi]       ; outptr
 
     test    rax, SIZEOF_YMMWORD-1
     jz      short .skip
@@ -235,18 +236,18 @@
     mov     rsi, r12                   ; input_data
     mov     rdi, r13
-    mov     rdi, JSAMPARRAY [rdi]      ; output_data
+    mov     rdip, JSAMPARRAY [rdi]     ; output_data
 .rowloop:
     push    rax                        ; colctr
     push    rcx
     push    rdi
     push    rsi
 
-    mov     rcx, JSAMPROW [rsi-1*SIZEOF_JSAMPROW]  ; inptr1(above)
-    mov     rbx, JSAMPROW [rsi+0*SIZEOF_JSAMPROW]  ; inptr0
-    mov     rsi, JSAMPROW [rsi+1*SIZEOF_JSAMPROW]  ; inptr1(below)
-    mov     rdx, JSAMPROW [rdi+0*SIZEOF_JSAMPROW]  ; outptr0
-    mov     rdi, JSAMPROW [rdi+1*SIZEOF_JSAMPROW]  ; outptr1
+    mov     rcxp, JSAMPROW [rsi-1*SIZEOF_JSAMPROW] ; inptr1(above)
+    mov     rbxp, JSAMPROW [rsi+0*SIZEOF_JSAMPROW] ; inptr0
+    mov     rsip, JSAMPROW [rsi+1*SIZEOF_JSAMPROW] ; inptr1(below)
+    mov     rdxp, JSAMPROW [rdi+0*SIZEOF_JSAMPROW] ; outptr0
+    mov     rdip, JSAMPROW [rdi+1*SIZEOF_JSAMPROW] ; outptr1
 
     vpxor   ymm8, ymm8, ymm8           ; ymm8=(all 0's)
     vpcmpeqb xmm9, xmm9, xmm9
@@ -539,13 +540,13 @@
     mov     rsi, r12                   ; input_data
     mov     rdi, r13
-    mov     rdi, JSAMPARRAY [rdi]      ; output_data
+    mov     rdip, JSAMPARRAY [rdi]     ; output_data
 .rowloop:
     push    rdi
     push    rsi
 
-    mov     rsi, JSAMPROW [rsi]        ; inptr
-    mov     rdi, JSAMPROW [rdi]        ; outptr
+    mov     rsip, JSAMPROW [rsi]       ; inptr
+    mov     rdip, JSAMPROW [rdi]       ; outptr
 
     mov     rax, rdx                   ; colctr
 .columnloop:
@@ -629,14 +630,14 @@
     mov     rsi, r12                   ; input_data
     mov     rdi, r13
-    mov     rdi, JSAMPARRAY [rdi]      ; output_data
+    mov     rdip, JSAMPARRAY [rdi]     ; output_data
 .rowloop:
     push    rdi
     push    rsi
 
-    mov     rsi, JSAMPROW [rsi]                    ; inptr
-    mov     rbx, JSAMPROW [rdi+0*SIZEOF_JSAMPROW]  ; outptr0
-    mov     rdi, JSAMPROW [rdi+1*SIZEOF_JSAMPROW]  ; outptr1
+    mov     rsip, JSAMPROW [rsi]                   ; inptr
+    mov     rbxp, JSAMPROW [rdi+0*SIZEOF_JSAMPROW] ; outptr0
+    mov     rdip, JSAMPROW [rdi+1*SIZEOF_JSAMPROW] ; outptr1
 
     mov     rax, rdx                   ; colctr
 .columnloop:
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jdsample-sse2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jdsample-sse2.asm
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jdsample-sse2.asm 2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jdsample-sse2.asm 2021-11-20 03:41:33.404600354 +0000
@@ -3,6 +3,7 @@
 ;
 ; Copyright 2009 Pierre Ossman for Cendio AB
 ; Copyright (C) 2009, 2016, D. R. Commander.
+; Copyright (C) 2018, Matthias Räncker.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -74,14 +75,14 @@
     mov     rsi, r12                   ; input_data
     mov     rdi, r13
-    mov     rdi, JSAMPARRAY [rdi]      ; output_data
+    mov     rdip, JSAMPARRAY [rdi]     ; output_data
 .rowloop:
     push    rax                        ; colctr
     push    rdi
     push    rsi
 
-    mov     rsi, JSAMPROW [rsi]        ; inptr
-    mov     rdi, JSAMPROW [rdi]        ; outptr
+    mov     rsip, JSAMPROW [rsi]       ; inptr
+    mov     rdip, JSAMPROW [rdi]       ; outptr
 
     test    rax, SIZEOF_XMMWORD-1
     jz      short .skip
@@ -221,18 +222,18 @@
     mov     rsi, r12                   ; input_data
     mov     rdi, r13
-    mov     rdi, JSAMPARRAY [rdi]      ; output_data
+    mov     rdip, JSAMPARRAY [rdi]     ; output_data
 .rowloop:
     push    rax                        ; colctr
     push    rcx
     push    rdi
     push    rsi
 
-    mov     rcx, JSAMPROW [rsi-1*SIZEOF_JSAMPROW]  ; inptr1(above)
-    mov     rbx, JSAMPROW [rsi+0*SIZEOF_JSAMPROW]  ; inptr0
-    mov     rsi, JSAMPROW [rsi+1*SIZEOF_JSAMPROW]  ; inptr1(below)
-    mov     rdx, JSAMPROW [rdi+0*SIZEOF_JSAMPROW]  ; outptr0
-    mov     rdi, JSAMPROW [rdi+1*SIZEOF_JSAMPROW]  ; outptr1
+    mov     rcxp, JSAMPROW [rsi-1*SIZEOF_JSAMPROW] ; inptr1(above)
+    mov     rbxp, JSAMPROW [rsi+0*SIZEOF_JSAMPROW] ; inptr0
+    mov     rsip, JSAMPROW [rsi+1*SIZEOF_JSAMPROW] ; inptr1(below)
+    mov     rdxp, JSAMPROW [rdi+0*SIZEOF_JSAMPROW] ; outptr0
+    mov     rdip, JSAMPROW [rdi+1*SIZEOF_JSAMPROW] ; outptr1
 
     test    rax, SIZEOF_XMMWORD-1
     jz      short .skip
@@ -512,13 +513,13 @@
     mov     rsi, r12                   ; input_data
     mov     rdi, r13
-    mov     rdi, JSAMPARRAY [rdi]      ; output_data
+    mov     rdip, JSAMPARRAY [rdi]     ; output_data
 .rowloop:
     push    rdi
     push    rsi
 
-    mov     rsi, JSAMPROW [rsi]        ; inptr
-    mov     rdi, JSAMPROW [rdi]        ; outptr
+    mov     rsip, JSAMPROW [rsi]       ; inptr
+    mov     rdip, JSAMPROW [rdi]       ; outptr
 
     mov     rax, rdx                   ; colctr
 .columnloop:
@@ -600,14 +601,14 @@
     mov     rsi, r12                   ; input_data
     mov     rdi, r13
-    mov     rdi, JSAMPARRAY [rdi]      ; output_data
+    mov     rdip, JSAMPARRAY [rdi]     ; output_data
 .rowloop:
     push    rdi
     push    rsi
 
-    mov     rsi, JSAMPROW [rsi]                    ; inptr
-    mov     rbx, JSAMPROW [rdi+0*SIZEOF_JSAMPROW]  ; outptr0
-    mov     rdi, JSAMPROW [rdi+1*SIZEOF_JSAMPROW]  ; outptr1
+    mov     rsip, JSAMPROW [rsi]                   ; inptr
+    mov     rbxp, JSAMPROW [rdi+0*SIZEOF_JSAMPROW] ; outptr0
+    mov     rdip, JSAMPROW [rdi+1*SIZEOF_JSAMPROW] ; outptr1
 
     mov     rax, rdx                   ; colctr
 .columnloop:
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jfdctint-avx2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jfdctint-avx2.asm
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jfdctint-avx2.asm 2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jfdctint-avx2.asm 2021-11-20 03:41:33.404600354 +0000
@@ -2,7 +2,7 @@
 ; jfdctint.asm - accurate integer FDCT (64-bit AVX2)
 ;
 ; Copyright 2009 Pierre Ossman for Cendio AB
-; Copyright (C) 2009, 2016, 2018, D. R. Commander.
+; Copyright (C) 2009, 2016, 2018, 2020, D. R. Commander.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -14,7 +14,7 @@
 ; NASM is available from http://nasm.sourceforge.net/ or
 ; http://sourceforge.net/project/showfiles.php?group_id=6208
 ;
-; This file contains a slow-but-accurate integer implementation of the
+; This file contains a slower but more accurate integer implementation of the
 ; forward DCT (Discrete Cosine Transform). The following code is based
 ; directly on the IJG's original jfdctint.c; see the jfdctint.c for
 ; more details.
@@ -103,7 +103,7 @@
 %endmacro
 
 ; --------------------------------------------------------------------------
-; In-place 8x8x16-bit slow integer forward DCT using AVX2 instructions
+; In-place 8x8x16-bit accurate integer forward DCT using AVX2 instructions
 ; %1-%4: Input/output registers
 ; %5-%8: Temp registers
 ; %9: Pass (1 or 2)
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jfdctint-sse2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jfdctint-sse2.asm
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jfdctint-sse2.asm 2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jfdctint-sse2.asm 2021-11-20 03:41:33.404600354 +0000
@@ -2,7 +2,7 @@
 ; jfdctint.asm - accurate integer FDCT (64-bit SSE2)
 ;
 ; Copyright 2009 Pierre Ossman for Cendio AB
-; Copyright (C) 2009, 2016, D. R. Commander.
+; Copyright (C) 2009, 2016, 2020, D. R. Commander.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -14,7 +14,7 @@
 ; NASM is available from http://nasm.sourceforge.net/ or
 ; http://sourceforge.net/project/showfiles.php?group_id=6208
 ;
-; This file contains a slow-but-accurate integer implementation of the
+; This file contains a slower but more accurate integer implementation of the
 ; forward DCT (Discrete Cosine Transform). The following code is based
 ; directly on the IJG's original jfdctint.c; see the jfdctint.c for
 ; more details.
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jidctflt-sse2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jidctflt-sse2.asm
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jidctflt-sse2.asm 2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jidctflt-sse2.asm 2021-11-20 03:41:33.404600354 +0000
@@ -3,6 +3,7 @@
 ;
 ; Copyright 2009 Pierre Ossman for Cendio AB
 ; Copyright (C) 2009, 2016, D. R. Commander.
+; Copyright (C) 2018, Matthias Räncker.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -455,12 +456,12 @@
     pshufd  xmm5, xmm6, 0x4E  ; xmm5=(10 11 12 13 14 15 16 17 00 01 02 03 04 05 06 07)
     pshufd  xmm3, xmm7, 0x4E  ; xmm3=(30 31 32 33 34 35 36 37 20 21 22 23 24 25 26 27)
 
-    mov     rdx, JSAMPROW [rdi+0*SIZEOF_JSAMPROW]
-    mov     rbx, JSAMPROW [rdi+2*SIZEOF_JSAMPROW]
+    mov     rdxp, JSAMPROW [rdi+0*SIZEOF_JSAMPROW]
+    mov     rbxp, JSAMPROW [rdi+2*SIZEOF_JSAMPROW]
     movq    XMM_MMWORD [rdx+rax*SIZEOF_JSAMPLE], xmm6
     movq    XMM_MMWORD [rbx+rax*SIZEOF_JSAMPLE], xmm7
-    mov     rdx, JSAMPROW [rdi+1*SIZEOF_JSAMPROW]
-    mov     rbx, JSAMPROW [rdi+3*SIZEOF_JSAMPROW]
+    mov     rdxp, JSAMPROW [rdi+1*SIZEOF_JSAMPROW]
+    mov     rbxp, JSAMPROW [rdi+3*SIZEOF_JSAMPROW]
     movq    XMM_MMWORD [rdx+rax*SIZEOF_JSAMPLE], xmm5
     movq    XMM_MMWORD [rbx+rax*SIZEOF_JSAMPLE], xmm3
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jidctfst-sse2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jidctfst-sse2.asm
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jidctfst-sse2.asm 2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jidctfst-sse2.asm 2021-11-20 03:41:33.404600354 +0000
@@ -3,6 +3,7 @@
 ;
 ; Copyright 2009 Pierre Ossman for Cendio AB
 ; Copyright (C) 2009, 2016, D. R. Commander.
+; Copyright (C) 2018, Matthias Räncker.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -460,21 +461,21 @@
     pshufd  xmm6, xmm4, 0x4E  ; xmm6=(50 51 52 53 54 55 56 57 40 41 42 43 44 45 46 47)
     pshufd  xmm2, xmm7, 0x4E  ; xmm2=(70 71 72 73 74 75 76 77 60 61 62 63 64 65 66 67)
 
-    mov     rdx, JSAMPROW [rdi+0*SIZEOF_JSAMPROW]
-    mov     rsi, JSAMPROW [rdi+2*SIZEOF_JSAMPROW]
+    mov     rdxp, JSAMPROW [rdi+0*SIZEOF_JSAMPROW]
+    mov     rsip, JSAMPROW [rdi+2*SIZEOF_JSAMPROW]
     movq    XMM_MMWORD [rdx+rax*SIZEOF_JSAMPLE], xmm1
     movq    XMM_MMWORD [rsi+rax*SIZEOF_JSAMPLE], xmm3
-    mov     rdx, JSAMPROW [rdi+4*SIZEOF_JSAMPROW]
-    mov     rsi, JSAMPROW [rdi+6*SIZEOF_JSAMPROW]
+    mov     rdxp, JSAMPROW [rdi+4*SIZEOF_JSAMPROW]
+    mov     rsip, JSAMPROW [rdi+6*SIZEOF_JSAMPROW]
     movq    XMM_MMWORD [rdx+rax*SIZEOF_JSAMPLE], xmm4
     movq    XMM_MMWORD [rsi+rax*SIZEOF_JSAMPLE], xmm7
-    mov     rdx, JSAMPROW [rdi+1*SIZEOF_JSAMPROW]
-    mov     rsi, JSAMPROW [rdi+3*SIZEOF_JSAMPROW]
+    mov     rdxp, JSAMPROW [rdi+1*SIZEOF_JSAMPROW]
+    mov     rsip, JSAMPROW [rdi+3*SIZEOF_JSAMPROW]
     movq    XMM_MMWORD [rdx+rax*SIZEOF_JSAMPLE], xmm5
     movq    XMM_MMWORD [rsi+rax*SIZEOF_JSAMPLE], xmm0
-    mov     rdx, JSAMPROW [rdi+5*SIZEOF_JSAMPROW]
-    mov     rsi, JSAMPROW [rdi+7*SIZEOF_JSAMPROW]
+    mov     rdxp, JSAMPROW [rdi+5*SIZEOF_JSAMPROW]
+    mov     rsip, JSAMPROW [rdi+7*SIZEOF_JSAMPROW]
     movq    XMM_MMWORD [rdx+rax*SIZEOF_JSAMPLE], xmm6
     movq    XMM_MMWORD [rsi+rax*SIZEOF_JSAMPLE], xmm2
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jidctint-avx2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jidctint-avx2.asm
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jidctint-avx2.asm 2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jidctint-avx2.asm 2021-11-20 03:41:33.404600354 +0000
@@ -2,7 +2,8 @@
 ; jidctint.asm - accurate integer IDCT (64-bit AVX2)
 ;
 ; Copyright 2009 Pierre Ossman for Cendio AB
-; Copyright (C) 2009, 2016, 2018, D. R. Commander.
+; Copyright (C) 2009, 2016, 2018, 2020, D. R. Commander.
+; Copyright (C) 2018, Matthias Räncker.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -14,7 +15,7 @@
 ; NASM is available from http://nasm.sourceforge.net/ or
 ; http://sourceforge.net/project/showfiles.php?group_id=6208
 ;
-; This file contains a slow-but-accurate integer implementation of the
+; This file contains a slower but more accurate integer implementation of the
 ; inverse DCT (Discrete Cosine Transform). The following code is based
 ; directly on the IJG's original jidctint.c; see the jidctint.c for
 ; more details.
@@ -113,7 +114,7 @@
 %endmacro
 
 ; --------------------------------------------------------------------------
-; In-place 8x8x16-bit slow integer inverse DCT using AVX2 instructions
+; In-place 8x8x16-bit accurate integer inverse DCT using AVX2 instructions
 ; %1-%4: Input/output registers
 ; %5-%12: Temp registers
 ; %9: Pass (1 or 2)
@@ -387,23 +388,23 @@
 
     mov     eax, r13d
 
-    mov     rdx, JSAMPROW [r12+0*SIZEOF_JSAMPROW]  ; (JSAMPLE *)
-    mov     rsi, JSAMPROW [r12+1*SIZEOF_JSAMPROW]  ; (JSAMPLE *)
+    mov     rdxp, JSAMPROW [r12+0*SIZEOF_JSAMPROW] ; (JSAMPLE *)
+    mov     rsip, JSAMPROW [r12+1*SIZEOF_JSAMPROW] ; (JSAMPLE *)
     movq    XMM_MMWORD [rdx+rax*SIZEOF_JSAMPLE], xmm0
     movq    XMM_MMWORD [rsi+rax*SIZEOF_JSAMPLE], xmm1
-    mov     rdx, JSAMPROW [r12+2*SIZEOF_JSAMPROW]  ; (JSAMPLE *)
-    mov     rsi, JSAMPROW [r12+3*SIZEOF_JSAMPROW]  ; (JSAMPLE *)
+    mov     rdxp, JSAMPROW [r12+2*SIZEOF_JSAMPROW] ; (JSAMPLE *)
+    mov     rsip, JSAMPROW [r12+3*SIZEOF_JSAMPROW] ; (JSAMPLE *)
     movq    XMM_MMWORD [rdx+rax*SIZEOF_JSAMPLE], xmm2
     movq    XMM_MMWORD [rsi+rax*SIZEOF_JSAMPLE], xmm3
-    mov     rdx, JSAMPROW [r12+4*SIZEOF_JSAMPROW]  ; (JSAMPLE *)
-    mov     rsi, JSAMPROW [r12+5*SIZEOF_JSAMPROW]  ; (JSAMPLE *)
+    mov     rdxp, JSAMPROW [r12+4*SIZEOF_JSAMPROW] ; (JSAMPLE *)
+    mov     rsip, JSAMPROW [r12+5*SIZEOF_JSAMPROW] ; (JSAMPLE *)
    movq    XMM_MMWORD [rdx+rax*SIZEOF_JSAMPLE], xmm4
     movq    XMM_MMWORD [rsi+rax*SIZEOF_JSAMPLE], xmm5
-    mov     rdx, JSAMPROW [r12+6*SIZEOF_JSAMPROW]  ; (JSAMPLE *)
-    mov     rsi, JSAMPROW [r12+7*SIZEOF_JSAMPROW]  ; (JSAMPLE *)
+    mov     rdxp, JSAMPROW [r12+6*SIZEOF_JSAMPROW] ; (JSAMPLE *)
+    mov     rsip, JSAMPROW [r12+7*SIZEOF_JSAMPROW] ; (JSAMPLE *)
     movq    XMM_MMWORD [rdx+rax*SIZEOF_JSAMPLE], xmm6
     movq    XMM_MMWORD [rsi+rax*SIZEOF_JSAMPLE], xmm7
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jidctint-sse2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jidctint-sse2.asm
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jidctint-sse2.asm 2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jidctint-sse2.asm 2021-11-20 03:41:33.404600354 +0000
@@ -2,7 +2,8 @@
 ; jidctint.asm - accurate integer IDCT (64-bit SSE2)
 ;
 ; Copyright 2009 Pierre Ossman for Cendio AB
-; Copyright (C) 2009, 2016, D. R. Commander.
+; Copyright (C) 2009, 2016, 2020, D. R. Commander.
+; Copyright (C) 2018, Matthias Räncker.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -14,7 +15,7 @@
 ; NASM is available from http://nasm.sourceforge.net/ or
 ; http://sourceforge.net/project/showfiles.php?group_id=6208
 ;
-; This file contains a slow-but-accurate integer implementation of the
+; This file contains a slower but more accurate integer implementation of the
 ; inverse DCT (Discrete Cosine Transform). The following code is based
 ; directly on the IJG's original jidctint.c; see the jidctint.c for
 ; more details.
@@ -817,21 +818,21 @@
     pshufd  xmm2, xmm4, 0x4E  ; xmm2=(50 51 52 53 54 55 56 57 40 41 42 43 44 45 46 47)
     pshufd  xmm5, xmm3, 0x4E  ; xmm5=(70 71 72 73 74 75 76 77 60 61 62 63 64 65 66 67)
 
-    mov     rdx, JSAMPROW [rdi+0*SIZEOF_JSAMPROW]
-    mov     rsi, JSAMPROW [rdi+2*SIZEOF_JSAMPROW]
+    mov     rdxp, JSAMPROW [rdi+0*SIZEOF_JSAMPROW]
+    mov     rsip, JSAMPROW [rdi+2*SIZEOF_JSAMPROW]
     movq    XMM_MMWORD [rdx+rax*SIZEOF_JSAMPLE], xmm7
     movq    XMM_MMWORD [rsi+rax*SIZEOF_JSAMPLE], xmm1
-    mov     rdx, JSAMPROW [rdi+4*SIZEOF_JSAMPROW]
-    mov     rsi, JSAMPROW [rdi+6*SIZEOF_JSAMPROW]
+    mov     rdxp, JSAMPROW [rdi+4*SIZEOF_JSAMPROW]
+    mov     rsip, JSAMPROW [rdi+6*SIZEOF_JSAMPROW]
     movq    XMM_MMWORD [rdx+rax*SIZEOF_JSAMPLE], xmm4
     movq    XMM_MMWORD [rsi+rax*SIZEOF_JSAMPLE], xmm3
-    mov     rdx, JSAMPROW [rdi+1*SIZEOF_JSAMPROW]
-    mov     rsi, JSAMPROW [rdi+3*SIZEOF_JSAMPROW]
+    mov     rdxp, JSAMPROW [rdi+1*SIZEOF_JSAMPROW]
+    mov     rsip, JSAMPROW [rdi+3*SIZEOF_JSAMPROW]
     movq    XMM_MMWORD [rdx+rax*SIZEOF_JSAMPLE], xmm6
     movq    XMM_MMWORD [rsi+rax*SIZEOF_JSAMPLE], xmm0
-    mov     rdx, JSAMPROW [rdi+5*SIZEOF_JSAMPROW]
-    mov     rsi, JSAMPROW [rdi+7*SIZEOF_JSAMPROW]
+    mov     rdxp, JSAMPROW [rdi+5*SIZEOF_JSAMPROW]
+    mov     rsip, JSAMPROW [rdi+7*SIZEOF_JSAMPROW]
     movq    XMM_MMWORD [rdx+rax*SIZEOF_JSAMPLE], xmm2
     movq    XMM_MMWORD [rsi+rax*SIZEOF_JSAMPLE], xmm5
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jidctred-sse2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jidctred-sse2.asm
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jidctred-sse2.asm 2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jidctred-sse2.asm 2021-11-20 03:41:33.404600354 +0000
@@ -3,6 +3,7 @@
 ;
 ; Copyright 2009 Pierre Ossman for Cendio AB
 ; Copyright (C) 2009, 2016, D. R. Commander.
+; Copyright (C) 2018, Matthias Räncker.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -379,12 +380,12 @@
     pshufd  xmm1, xmm4, 0x4E  ; xmm1=(20 21 22 23 30 31 32 33 00 ..)
     pshufd  xmm3, xmm4, 0x93  ; xmm3=(30 31 32 33 00 01 02 03 10 ..)
 
-    mov     rdx, JSAMPROW [rdi+0*SIZEOF_JSAMPROW]
-    mov     rsi, JSAMPROW [rdi+1*SIZEOF_JSAMPROW]
+    mov     rdxp, JSAMPROW [rdi+0*SIZEOF_JSAMPROW]
+    mov     rsip, JSAMPROW [rdi+1*SIZEOF_JSAMPROW]
     movd    XMM_DWORD [rdx+rax*SIZEOF_JSAMPLE], xmm4
     movd    XMM_DWORD [rsi+rax*SIZEOF_JSAMPLE], xmm2
-    mov     rdx, JSAMPROW [rdi+2*SIZEOF_JSAMPROW]
-    mov     rsi, JSAMPROW [rdi+3*SIZEOF_JSAMPROW]
+    mov     rdxp, JSAMPROW [rdi+2*SIZEOF_JSAMPROW]
+    mov     rsip, JSAMPROW [rdi+3*SIZEOF_JSAMPROW]
     movd    XMM_DWORD [rdx+rax*SIZEOF_JSAMPLE], xmm1
     movd    XMM_DWORD [rsi+rax*SIZEOF_JSAMPLE], xmm3
@@ -558,8 +559,8 @@
     pextrw  ebx, xmm6, 0x00            ; ebx=(C0 D0 -- --)
     pextrw  ecx, xmm6, 0x01            ; ecx=(C1 D1 -- --)
 
-    mov     rdx, JSAMPROW [rdi+0*SIZEOF_JSAMPROW]
-    mov     rsi, JSAMPROW [rdi+1*SIZEOF_JSAMPROW]
+    mov     rdxp, JSAMPROW [rdi+0*SIZEOF_JSAMPROW]
+    mov     rsip, JSAMPROW [rdi+1*SIZEOF_JSAMPROW]
     mov     word [rdx+rax*SIZEOF_JSAMPLE], bx
     mov     word [rsi+rax*SIZEOF_JSAMPLE], cx
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jquantf-sse2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jquantf-sse2.asm
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jquantf-sse2.asm 2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jquantf-sse2.asm 2021-11-20 03:41:33.404600354 +0000
@@ -3,6 +3,7 @@
 ;
 ; Copyright 2009 Pierre Ossman for Cendio AB
 ; Copyright (C) 2009, 2016, D. R. Commander.
+; Copyright (C) 2018, Matthias Räncker.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -51,8 +52,8 @@
     mov     rdi, r12
     mov     rcx, DCTSIZE/2
 .convloop:
-    mov     rbx, JSAMPROW [rsi+0*SIZEOF_JSAMPROW]  ; (JSAMPLE *)
-    mov     rdx, JSAMPROW [rsi+1*SIZEOF_JSAMPROW]  ; (JSAMPLE *)
+    mov     rbxp, JSAMPROW [rsi+0*SIZEOF_JSAMPROW] ; (JSAMPLE *)
+    mov     rdxp, JSAMPROW [rsi+1*SIZEOF_JSAMPROW] ; (JSAMPLE *)
 
     movq    xmm0, XMM_MMWORD [rbx+rax*SIZEOF_JSAMPLE]
     movq    xmm1, XMM_MMWORD [rdx+rax*SIZEOF_JSAMPLE]
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jquanti-avx2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jquanti-avx2.asm
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jquanti-avx2.asm 2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jquanti-avx2.asm 2021-11-20 03:41:33.404600354 +0000
@@ -4,6 +4,7 @@
 ; Copyright 2009 Pierre Ossman for Cendio AB
 ; Copyright (C) 2009, 2016, 2018, D. R. Commander.
 ; Copyright (C) 2016, Matthieu Darbois.
+; Copyright (C) 2018, Matthias Räncker.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -44,23 +45,23 @@
 
     mov     eax, r11d
 
-    mov     rsi, JSAMPROW [r10+0*SIZEOF_JSAMPROW]  ; (JSAMPLE *)
-    mov     rdi, JSAMPROW [r10+1*SIZEOF_JSAMPROW]  ; (JSAMPLE *)
+    mov     rsip, JSAMPROW [r10+0*SIZEOF_JSAMPROW] ; (JSAMPLE *)
+    mov     rdip, JSAMPROW [r10+1*SIZEOF_JSAMPROW] ; (JSAMPLE *)
     movq    xmm0, XMM_MMWORD [rsi+rax*SIZEOF_JSAMPLE]
     pinsrq  xmm0, XMM_MMWORD [rdi+rax*SIZEOF_JSAMPLE], 1
-    mov     rsi, JSAMPROW [r10+2*SIZEOF_JSAMPROW]  ; (JSAMPLE *)
-    mov     rdi, JSAMPROW [r10+3*SIZEOF_JSAMPROW]  ; (JSAMPLE *)
+    mov     rsip, JSAMPROW [r10+2*SIZEOF_JSAMPROW] ; (JSAMPLE *)
+    mov     rdip, JSAMPROW [r10+3*SIZEOF_JSAMPROW] ; (JSAMPLE *)
     movq    xmm1, XMM_MMWORD [rsi+rax*SIZEOF_JSAMPLE]
     pinsrq  xmm1, XMM_MMWORD [rdi+rax*SIZEOF_JSAMPLE], 1
-    mov     rsi, JSAMPROW [r10+4*SIZEOF_JSAMPROW]  ; (JSAMPLE *)
-    mov     rdi, JSAMPROW [r10+5*SIZEOF_JSAMPROW]  ; (JSAMPLE *)
+    mov     rsip, JSAMPROW [r10+4*SIZEOF_JSAMPROW] ; (JSAMPLE *)
+    mov     rdip, JSAMPROW [r10+5*SIZEOF_JSAMPROW] ; (JSAMPLE *)
     movq    xmm2, XMM_MMWORD [rsi+rax*SIZEOF_JSAMPLE]
     pinsrq  xmm2, XMM_MMWORD [rdi+rax*SIZEOF_JSAMPLE], 1
-    mov     rsi, JSAMPROW [r10+6*SIZEOF_JSAMPROW]  ; (JSAMPLE *)
-    mov     rdi, JSAMPROW [r10+7*SIZEOF_JSAMPROW]  ; (JSAMPLE *)
+    mov     rsip, JSAMPROW [r10+6*SIZEOF_JSAMPROW] ; (JSAMPLE *)
+    mov     rdip, JSAMPROW [r10+7*SIZEOF_JSAMPROW] ; (JSAMPLE *)
     movq    xmm3, XMM_MMWORD [rsi+rax*SIZEOF_JSAMPLE]
     pinsrq  xmm3, XMM_MMWORD [rdi+rax*SIZEOF_JSAMPLE], 1
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jquanti-sse2.asm b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jquanti-sse2.asm
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jquanti-sse2.asm 2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jquanti-sse2.asm 2021-11-20 03:41:33.405600338 +0000
@@ -3,6 +3,7 @@
 ;
 ; Copyright 2009 Pierre Ossman for Cendio AB
 ; Copyright (C) 2009, 2016, D. R. Commander.
+; Copyright (C) 2018, Matthias Räncker.
; ; Based on the x86 SIMD extension for IJG JPEG library ; Copyright (C) 1999-2006, MIYASAKA Masaru. @@ -51,14 +52,14 @@ mov rdi, r12 mov rcx, DCTSIZE/4 .convloop: - mov rbx, JSAMPROW [rsi+0*SIZEOF_JSAMPROW] ; (JSAMPLE *) - mov rdx, JSAMPROW [rsi+1*SIZEOF_JSAMPROW] ; (JSAMPLE *) + mov rbxp, JSAMPROW [rsi+0*SIZEOF_JSAMPROW] ; (JSAMPLE *) + mov rdxp, JSAMPROW [rsi+1*SIZEOF_JSAMPROW] ; (JSAMPLE *) movq xmm0, XMM_MMWORD [rbx+rax*SIZEOF_JSAMPLE] ; xmm0=(01234567) movq xmm1, XMM_MMWORD [rdx+rax*SIZEOF_JSAMPLE] ; xmm1=(89ABCDEF) - mov rbx, JSAMPROW [rsi+2*SIZEOF_JSAMPROW] ; (JSAMPLE *) - mov rdx, JSAMPROW [rsi+3*SIZEOF_JSAMPROW] ; (JSAMPLE *) + mov rbxp, JSAMPROW [rsi+2*SIZEOF_JSAMPROW] ; (JSAMPLE *) + mov rdxp, JSAMPROW [rsi+3*SIZEOF_JSAMPROW] ; (JSAMPLE *) movq xmm2, XMM_MMWORD [rbx+rax*SIZEOF_JSAMPLE] ; xmm2=(GHIJKLMN) movq xmm3, XMM_MMWORD [rdx+rax*SIZEOF_JSAMPLE] ; xmm3=(OPQRSTUV) diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jsimd.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jsimd.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jsimd.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/simd/x86_64/jsimd.c 2021-11-20 03:41:33.405600338 +0000 @@ -472,12 +472,6 @@ return 0; } -GLOBAL(int) -jsimd_can_h1v2_fancy_upsample(void) -{ - return 0; -} - GLOBAL(void) jsimd_h2v2_fancy_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr, JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr) @@ -506,12 +500,6 @@ output_data_ptr); } -GLOBAL(void) -jsimd_h1v2_fancy_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr, - JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr) -{ -} - GLOBAL(int) jsimd_can_h2v2_merged_upsample(void) { @@ -1043,8 +1031,6 @@ return 0; if (sizeof(JCOEF) != 2) return 0; - if (SIZEOF_SIZE_T != 8) - return 0; if (simd_support & JSIMD_SSE2) return 1; @@ -1069,8 +1055,6 @@ return 0; if (sizeof(JCOEF) != 2) return 0; - if (SIZEOF_SIZE_T != 8) - 
return 0; if (simd_support & JSIMD_SSE2) return 1; diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/structure.txt b/src/3rdparty/chromium/third_party/libjpeg_turbo/structure.txt --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/structure.txt 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/structure.txt 2021-11-20 03:41:33.405600338 +0000 @@ -548,13 +548,9 @@ typedef JSAMPROW *JSAMPARRAY; ptr to a list of rows typedef JSAMPARRAY *JSAMPIMAGE; ptr to a list of color-component arrays -The basic element type JSAMPLE will typically be one of unsigned char, -(signed) char, or short. Short will be used if samples wider than 8 bits are -to be supported (this is a compile-time option). Otherwise, unsigned char is -used if possible. If the compiler only supports signed chars, then it is -necessary to mask off the value when reading. Thus, all reads of JSAMPLE -values must be coded as "GETJSAMPLE(value)", where the macro will be defined -as "((value) & 0xFF)" on signed-char machines and "((int) (value))" elsewhere. +The basic element type JSAMPLE will be one of unsigned char or short. Short +will be used if samples wider than 8 bits are to be supported (this is a +compile-time option). Otherwise, unsigned char is used. With these conventions, JSAMPLE values can be assumed to be >= 0. This helps simplify correct rounding during downsampling, etc. The JPEG standard's @@ -587,7 +583,7 @@ is helpful when dealing with noninterleaved JPEG files. In general, a specific sample value is accessed by code such as - GETJSAMPLE(image[colorcomponent][row][col]) + image[colorcomponent][row][col] where col is measured from the image left edge, but row is measured from the first sample row currently in memory. Either of the first two indexings can be precomputed by copying the relevant pointer. 
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/testimages/test.scan b/src/3rdparty/chromium/third_party/libjpeg_turbo/testimages/test.scan
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/testimages/test.scan	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/testimages/test.scan	2021-11-20 03:41:33.405600338 +0000
@@ -1,5 +1,8 @@
 0 1 2: 0 0 0 0;
-0: 1 16 0 0;
-0: 17 63 0 0;
+0: 1 9 0 0;
+0: 10 41 0 2;
+0: 10 41 2 1;
+0: 10 41 1 0;
+0: 42 63 0 0;
 1: 1 63 0 0;
 2: 1 63 0 0;
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/tjbench.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/tjbench.c
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/tjbench.c	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/tjbench.c	2021-11-20 03:41:33.405600338 +0000
@@ -1,5 +1,5 @@
 /*
- * Copyright (C)2009-2019 D. R. Commander.  All Rights Reserved.
+ * Copyright (C)2009-2019, 2021 D. R. Commander.  All Rights Reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
@@ -816,6 +816,8 @@
   printf("-componly = Stop after running compression tests.  Do not test decompression.\n");
   printf("-nowrite = Do not write reference or output images (improves consistency of\n");
   printf("     performance measurements.)\n");
+  printf("-limitscans = Refuse to decompress or transform progressive JPEG images that\n");
+  printf("     have an unreasonably large number of scans\n");
   printf("-stoponwarning = Immediately discontinue the current\n");
   printf("     compression/decompression/transform operation if the underlying codec\n");
   printf("     throws a warning (non-fatal error)\n\n");
@@ -974,6 +976,8 @@
       compOnly = 1;
     else if (!strcasecmp(argv[i], "-nowrite")) doWrite = 0;
+    else if (!strcasecmp(argv[i], "-limitscans"))
+      flags |= TJFLAG_LIMITSCANS;
     else if (!strcasecmp(argv[i], "-stoponwarning"))
       flags |= TJFLAG_STOPONWARNING;
     else usage(argv[0]);
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/tjutil.h b/src/3rdparty/chromium/third_party/libjpeg_turbo/tjutil.h
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/tjutil.h	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/tjutil.h	2021-11-20 03:41:33.405600338 +0000
@@ -30,7 +30,7 @@
 #ifndef __MINGW32__
 #include
 #define snprintf(str, n, format, ...) \
-  _snprintf_s(str, n, _TRUNCATE, format, __VA_ARGS__)
+  _snprintf_s(str, n, _TRUNCATE, format, ##__VA_ARGS__)
 #endif
 #define strcasecmp  stricmp
 #define strncasecmp  strnicmp
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/transupp.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/transupp.c
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/transupp.c	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/transupp.c	2021-11-20 03:41:33.405600338 +0000
@@ -2,9 +2,9 @@
  * transupp.c
 *
 * This file was part of the Independent JPEG Group's software:
- * Copyright (C) 1997-2011, Thomas G. Lane, Guido Vollbeding.
+ * Copyright (C) 1997-2019, Thomas G. Lane, Guido Vollbeding.
 * libjpeg-turbo Modifications:
- * Copyright (C) 2010, 2017, D. R. Commander.
+ * Copyright (C) 2010, 2017, 2021, D. R. Commander.
 * For conditions of distribution and use, see the accompanying README.ijg
 * file.
 *
@@ -89,6 +89,189 @@
 LOCAL(void)
+dequant_comp(j_decompress_ptr cinfo, jpeg_component_info *compptr,
+             jvirt_barray_ptr coef_array, JQUANT_TBL *qtblptr1)
+{
+  JDIMENSION blk_x, blk_y;
+  int offset_y, k;
+  JQUANT_TBL *qtblptr;
+  JBLOCKARRAY buffer;
+  JBLOCKROW block;
+  JCOEFPTR ptr;
+
+  qtblptr = compptr->quant_table;
+  for (blk_y = 0; blk_y < compptr->height_in_blocks;
+       blk_y += compptr->v_samp_factor) {
+    buffer = (*cinfo->mem->access_virt_barray)
+      ((j_common_ptr)cinfo, coef_array, blk_y,
+       (JDIMENSION)compptr->v_samp_factor, TRUE);
+    for (offset_y = 0; offset_y < compptr->v_samp_factor; offset_y++) {
+      block = buffer[offset_y];
+      for (blk_x = 0; blk_x < compptr->width_in_blocks; blk_x++) {
+        ptr = block[blk_x];
+        for (k = 0; k < DCTSIZE2; k++)
+          if (qtblptr->quantval[k] != qtblptr1->quantval[k])
+            ptr[k] *= qtblptr->quantval[k] / qtblptr1->quantval[k];
+      }
+    }
+  }
+}
+
+
+LOCAL(void)
+requant_comp(j_decompress_ptr cinfo, jpeg_component_info *compptr,
+             jvirt_barray_ptr coef_array, JQUANT_TBL *qtblptr1)
+{
+  JDIMENSION blk_x, blk_y;
+  int offset_y, k;
+  JQUANT_TBL *qtblptr;
+  JBLOCKARRAY buffer;
+  JBLOCKROW block;
+  JCOEFPTR ptr;
+  JCOEF temp, qval;
+
+  qtblptr = compptr->quant_table;
+  for (blk_y = 0; blk_y < compptr->height_in_blocks;
+       blk_y += compptr->v_samp_factor) {
+    buffer = (*cinfo->mem->access_virt_barray)
+      ((j_common_ptr)cinfo, coef_array, blk_y,
+       (JDIMENSION)compptr->v_samp_factor, TRUE);
+    for (offset_y = 0; offset_y < compptr->v_samp_factor; offset_y++) {
+      block = buffer[offset_y];
+      for (blk_x = 0; blk_x < compptr->width_in_blocks; blk_x++) {
+        ptr = block[blk_x];
+        for (k = 0; k < DCTSIZE2; k++) {
+          temp = qtblptr->quantval[k];
+          qval = qtblptr1->quantval[k];
+          if (temp != qval) {
+            temp *= ptr[k];
+            /* The following quantization code is copied from jcdctmgr.c */
+#ifdef FAST_DIVIDE
+#define DIVIDE_BY(a, b)  a /= b
+#else
+#define DIVIDE_BY(a, b)  if (a >= b) a /= b;  else a = 0
+#endif
+            if (temp < 0) {
+              temp = -temp;
+              temp += qval >> 1;  /* for rounding */
+              DIVIDE_BY(temp, qval);
+              temp = -temp;
+            } else {
+              temp += qval >> 1;  /* for rounding */
+              DIVIDE_BY(temp, qval);
+            }
+            ptr[k] = temp;
+          }
+        }
+      }
+    }
+  }
+}
+
+
+/*
+ * Calculate largest common denominator using Euclid's algorithm.
+ */
+LOCAL(JCOEF)
+largest_common_denominator(JCOEF a, JCOEF b)
+{
+  JCOEF c;
+
+  do {
+    c = a % b;
+    a = b;
+    b = c;
+  } while (c);
+
+  return a;
+}
+
+
+LOCAL(void)
+adjust_quant(j_decompress_ptr srcinfo, jvirt_barray_ptr *src_coef_arrays,
+             j_decompress_ptr dropinfo, jvirt_barray_ptr *drop_coef_arrays,
+             boolean trim, j_compress_ptr dstinfo)
+{
+  jpeg_component_info *compptr1, *compptr2;
+  JQUANT_TBL *qtblptr1, *qtblptr2, *qtblptr3;
+  int ci, k;
+
+  for (ci = 0; ci < dstinfo->num_components && ci < dropinfo->num_components;
+       ci++) {
+    compptr1 = srcinfo->comp_info + ci;
+    compptr2 = dropinfo->comp_info + ci;
+    qtblptr1 = compptr1->quant_table;
+    qtblptr2 = compptr2->quant_table;
+    for (k = 0; k < DCTSIZE2; k++) {
+      if (qtblptr1->quantval[k] != qtblptr2->quantval[k]) {
+        if (trim)
+          requant_comp(dropinfo, compptr2, drop_coef_arrays[ci], qtblptr1);
+        else {
+          qtblptr3 = dstinfo->quant_tbl_ptrs[compptr1->quant_tbl_no];
+          for (k = 0; k < DCTSIZE2; k++)
+            if (qtblptr1->quantval[k] != qtblptr2->quantval[k])
+              qtblptr3->quantval[k] =
+                largest_common_denominator(qtblptr1->quantval[k],
+                                           qtblptr2->quantval[k]);
+          dequant_comp(srcinfo, compptr1, src_coef_arrays[ci], qtblptr3);
+          dequant_comp(dropinfo, compptr2, drop_coef_arrays[ci], qtblptr3);
+        }
+        break;
+      }
+    }
+  }
+}
+
+
+LOCAL(void)
+do_drop(j_decompress_ptr srcinfo, j_compress_ptr dstinfo,
+        JDIMENSION x_crop_offset, JDIMENSION y_crop_offset,
+        jvirt_barray_ptr *src_coef_arrays,
+        j_decompress_ptr dropinfo, jvirt_barray_ptr *drop_coef_arrays,
+        JDIMENSION drop_width, JDIMENSION drop_height)
+/* Drop (insert) the contents of another image into the source image.  If the
+ * number of components in the drop image is smaller than the number of
+ * components in the destination image, then we fill in the remaining
+ * components with zero.  This allows for dropping the contents of grayscale
+ * images into (arbitrarily sampled) color images.
+ */
+{
+  JDIMENSION comp_width, comp_height;
+  JDIMENSION blk_y, x_drop_blocks, y_drop_blocks;
+  int ci, offset_y;
+  JBLOCKARRAY src_buffer, dst_buffer;
+  jpeg_component_info *compptr;
+
+  for (ci = 0; ci < dstinfo->num_components; ci++) {
+    compptr = dstinfo->comp_info + ci;
+    comp_width = drop_width * compptr->h_samp_factor;
+    comp_height = drop_height * compptr->v_samp_factor;
+    x_drop_blocks = x_crop_offset * compptr->h_samp_factor;
+    y_drop_blocks = y_crop_offset * compptr->v_samp_factor;
+    for (blk_y = 0; blk_y < comp_height; blk_y += compptr->v_samp_factor) {
+      dst_buffer = (*srcinfo->mem->access_virt_barray)
+        ((j_common_ptr)srcinfo, src_coef_arrays[ci], blk_y + y_drop_blocks,
+         (JDIMENSION)compptr->v_samp_factor, TRUE);
+      if (ci < dropinfo->num_components) {
+        src_buffer = (*dropinfo->mem->access_virt_barray)
+          ((j_common_ptr)dropinfo, drop_coef_arrays[ci], blk_y,
+           (JDIMENSION)compptr->v_samp_factor, FALSE);
+        for (offset_y = 0; offset_y < compptr->v_samp_factor; offset_y++) {
+          jcopy_block_row(src_buffer[offset_y],
+                          dst_buffer[offset_y] + x_drop_blocks, comp_width);
+        }
+      } else {
+        for (offset_y = 0; offset_y < compptr->v_samp_factor; offset_y++) {
+          MEMZERO(dst_buffer[offset_y] + x_drop_blocks,
+                  comp_width * sizeof(JBLOCK));
+        }
+      }
+    }
+  }
+}
+
+
+LOCAL(void)
 do_crop(j_decompress_ptr srcinfo, j_compress_ptr dstinfo,
         JDIMENSION x_crop_offset, JDIMENSION y_crop_offset,
         jvirt_barray_ptr *src_coef_arrays,
@@ -113,13 +296,422 @@
         ((j_common_ptr)srcinfo, dst_coef_arrays[ci], dst_blk_y,
          (JDIMENSION)compptr->v_samp_factor, TRUE);
       src_buffer = (*srcinfo->mem->access_virt_barray)
-        ((j_common_ptr)srcinfo, src_coef_arrays[ci],
-         dst_blk_y + y_crop_blocks,
+        ((j_common_ptr)srcinfo, src_coef_arrays[ci], dst_blk_y + y_crop_blocks,
          (JDIMENSION)compptr->v_samp_factor, FALSE);
       for (offset_y = 0; offset_y < compptr->v_samp_factor; offset_y++) {
         jcopy_block_row(src_buffer[offset_y] + x_crop_blocks,
-                        dst_buffer[offset_y],
-                        compptr->width_in_blocks);
+                        dst_buffer[offset_y], compptr->width_in_blocks);
       }
     }
   }
 }
+
+
+LOCAL(void)
+do_crop_ext_zero(j_decompress_ptr srcinfo, j_compress_ptr dstinfo,
+                 JDIMENSION x_crop_offset, JDIMENSION y_crop_offset,
+                 jvirt_barray_ptr *src_coef_arrays,
+                 jvirt_barray_ptr *dst_coef_arrays)
+/* Crop.  This is only used when no rotate/flip is requested with the crop.
+ * Extension: If the destination size is larger than the source, we fill in the
+ * expanded region with zero (neutral gray).  Note that we also have to zero
+ * partial iMCUs at the right and bottom edge of the source image area in this
+ * case.
+ */
+{
+  JDIMENSION MCU_cols, MCU_rows, comp_width, comp_height;
+  JDIMENSION dst_blk_y, x_crop_blocks, y_crop_blocks;
+  int ci, offset_y;
+  JBLOCKARRAY src_buffer, dst_buffer;
+  jpeg_component_info *compptr;
+
+  MCU_cols = srcinfo->output_width /
+             (dstinfo->max_h_samp_factor * dstinfo_min_DCT_h_scaled_size);
+  MCU_rows = srcinfo->output_height /
+             (dstinfo->max_v_samp_factor * dstinfo_min_DCT_v_scaled_size);
+
+  for (ci = 0; ci < dstinfo->num_components; ci++) {
+    compptr = dstinfo->comp_info + ci;
+    comp_width = MCU_cols * compptr->h_samp_factor;
+    comp_height = MCU_rows * compptr->v_samp_factor;
+    x_crop_blocks = x_crop_offset * compptr->h_samp_factor;
+    y_crop_blocks = y_crop_offset * compptr->v_samp_factor;
+    for (dst_blk_y = 0; dst_blk_y < compptr->height_in_blocks;
+         dst_blk_y += compptr->v_samp_factor) {
+      dst_buffer = (*srcinfo->mem->access_virt_barray)
+        ((j_common_ptr)srcinfo, dst_coef_arrays[ci], dst_blk_y,
+         (JDIMENSION)compptr->v_samp_factor, TRUE);
+      if (dstinfo->_jpeg_height > srcinfo->output_height) {
+        if (dst_blk_y < y_crop_blocks ||
+            dst_blk_y >= y_crop_blocks + comp_height) {
+          for (offset_y = 0; offset_y < compptr->v_samp_factor; offset_y++) {
+            MEMZERO(dst_buffer[offset_y],
+                    compptr->width_in_blocks * sizeof(JBLOCK));
+          }
+          continue;
+        }
+        src_buffer = (*srcinfo->mem->access_virt_barray)
+          ((j_common_ptr)srcinfo, src_coef_arrays[ci],
+           dst_blk_y - y_crop_blocks, (JDIMENSION)compptr->v_samp_factor,
+           FALSE);
+      } else {
+        src_buffer = (*srcinfo->mem->access_virt_barray)
+          ((j_common_ptr)srcinfo, src_coef_arrays[ci],
+           dst_blk_y + y_crop_blocks, (JDIMENSION)compptr->v_samp_factor,
+           FALSE);
+      }
+      for (offset_y = 0; offset_y < compptr->v_samp_factor; offset_y++) {
+        if (dstinfo->_jpeg_width > srcinfo->output_width) {
+          if (x_crop_blocks > 0) {
+            MEMZERO(dst_buffer[offset_y], x_crop_blocks * sizeof(JBLOCK));
+          }
+          jcopy_block_row(src_buffer[offset_y],
+                          dst_buffer[offset_y] + x_crop_blocks, comp_width);
+          if (compptr->width_in_blocks > x_crop_blocks + comp_width) {
+            MEMZERO(dst_buffer[offset_y] + x_crop_blocks + comp_width,
+                    (compptr->width_in_blocks - x_crop_blocks - comp_width) *
+                    sizeof(JBLOCK));
+          }
+        } else {
+          jcopy_block_row(src_buffer[offset_y] + x_crop_blocks,
+                          dst_buffer[offset_y], compptr->width_in_blocks);
+        }
+      }
+    }
+  }
+}
+
+
+LOCAL(void)
+do_crop_ext_flat(j_decompress_ptr srcinfo, j_compress_ptr dstinfo,
+                 JDIMENSION x_crop_offset, JDIMENSION y_crop_offset,
+                 jvirt_barray_ptr *src_coef_arrays,
+                 jvirt_barray_ptr *dst_coef_arrays)
+/* Crop.  This is only used when no rotate/flip is requested with the crop.
+ * Extension: The destination width is larger than the source, and we fill in
+ * the expanded region with the DC coefficient of the adjacent block.  Note
+ * that we also have to fill partial iMCUs at the right and bottom edge of the
+ * source image area in this case.
+ */
+{
+  JDIMENSION MCU_cols, MCU_rows, comp_width, comp_height;
+  JDIMENSION dst_blk_x, dst_blk_y, x_crop_blocks, y_crop_blocks;
+  int ci, offset_y;
+  JCOEF dc;
+  JBLOCKARRAY src_buffer, dst_buffer;
+  jpeg_component_info *compptr;
+
+  MCU_cols = srcinfo->output_width /
+             (dstinfo->max_h_samp_factor * dstinfo_min_DCT_h_scaled_size);
+  MCU_rows = srcinfo->output_height /
+             (dstinfo->max_v_samp_factor * dstinfo_min_DCT_v_scaled_size);
+
+  for (ci = 0; ci < dstinfo->num_components; ci++) {
+    compptr = dstinfo->comp_info + ci;
+    comp_width = MCU_cols * compptr->h_samp_factor;
+    comp_height = MCU_rows * compptr->v_samp_factor;
+    x_crop_blocks = x_crop_offset * compptr->h_samp_factor;
+    y_crop_blocks = y_crop_offset * compptr->v_samp_factor;
+    for (dst_blk_y = 0; dst_blk_y < compptr->height_in_blocks;
+         dst_blk_y += compptr->v_samp_factor) {
+      dst_buffer = (*srcinfo->mem->access_virt_barray)
+        ((j_common_ptr)srcinfo, dst_coef_arrays[ci], dst_blk_y,
+         (JDIMENSION)compptr->v_samp_factor, TRUE);
+      if (dstinfo->_jpeg_height > srcinfo->output_height) {
+        if (dst_blk_y < y_crop_blocks ||
+            dst_blk_y >= y_crop_blocks + comp_height) {
+          for (offset_y = 0; offset_y < compptr->v_samp_factor; offset_y++) {
+            MEMZERO(dst_buffer[offset_y],
+                    compptr->width_in_blocks * sizeof(JBLOCK));
+          }
+          continue;
+        }
+        src_buffer = (*srcinfo->mem->access_virt_barray)
+          ((j_common_ptr)srcinfo, src_coef_arrays[ci],
+           dst_blk_y - y_crop_blocks, (JDIMENSION)compptr->v_samp_factor,
+           FALSE);
+      } else {
+        src_buffer = (*srcinfo->mem->access_virt_barray)
+          ((j_common_ptr)srcinfo, src_coef_arrays[ci],
+           dst_blk_y + y_crop_blocks, (JDIMENSION)compptr->v_samp_factor,
+           FALSE);
+      }
+      for (offset_y = 0; offset_y < compptr->v_samp_factor; offset_y++) {
+        if (x_crop_blocks > 0) {
+          MEMZERO(dst_buffer[offset_y], x_crop_blocks * sizeof(JBLOCK));
+          dc = src_buffer[offset_y][0][0];
+          for (dst_blk_x = 0; dst_blk_x < x_crop_blocks; dst_blk_x++) {
+            dst_buffer[offset_y][dst_blk_x][0] = dc;
+          }
+        }
+        jcopy_block_row(src_buffer[offset_y],
+                        dst_buffer[offset_y] + x_crop_blocks, comp_width);
+        if (compptr->width_in_blocks > x_crop_blocks + comp_width) {
+          MEMZERO(dst_buffer[offset_y] + x_crop_blocks + comp_width,
+                  (compptr->width_in_blocks - x_crop_blocks - comp_width) *
+                  sizeof(JBLOCK));
+          dc = src_buffer[offset_y][comp_width - 1][0];
+          for (dst_blk_x = x_crop_blocks + comp_width;
+               dst_blk_x < compptr->width_in_blocks; dst_blk_x++) {
+            dst_buffer[offset_y][dst_blk_x][0] = dc;
+          }
+        }
+      }
+    }
+  }
+}
+
+
+LOCAL(void)
+do_crop_ext_reflect(j_decompress_ptr srcinfo, j_compress_ptr dstinfo,
+                    JDIMENSION x_crop_offset, JDIMENSION y_crop_offset,
+                    jvirt_barray_ptr *src_coef_arrays,
+                    jvirt_barray_ptr *dst_coef_arrays)
+/* Crop.  This is only used when no rotate/flip is requested with the crop.
+ * Extension: The destination width is larger than the source, and we fill in
+ * the expanded region with repeated reflections of the source image.  Note
+ * that we also have to fill partial iMCUs at the right and bottom edge of the
+ * source image area in this case.
+ */
+{
+  JDIMENSION MCU_cols, MCU_rows, comp_width, comp_height, src_blk_x;
+  JDIMENSION dst_blk_x, dst_blk_y, x_crop_blocks, y_crop_blocks;
+  int ci, k, offset_y;
+  JBLOCKARRAY src_buffer, dst_buffer;
+  JBLOCKROW src_row_ptr, dst_row_ptr;
+  JCOEFPTR src_ptr, dst_ptr;
+  jpeg_component_info *compptr;
+
+  MCU_cols = srcinfo->output_width /
+             (dstinfo->max_h_samp_factor * dstinfo_min_DCT_h_scaled_size);
+  MCU_rows = srcinfo->output_height /
+             (dstinfo->max_v_samp_factor * dstinfo_min_DCT_v_scaled_size);
+
+  for (ci = 0; ci < dstinfo->num_components; ci++) {
+    compptr = dstinfo->comp_info + ci;
+    comp_width = MCU_cols * compptr->h_samp_factor;
+    comp_height = MCU_rows * compptr->v_samp_factor;
+    x_crop_blocks = x_crop_offset * compptr->h_samp_factor;
+    y_crop_blocks = y_crop_offset * compptr->v_samp_factor;
+    for (dst_blk_y = 0; dst_blk_y < compptr->height_in_blocks;
+         dst_blk_y += compptr->v_samp_factor) {
+      dst_buffer = (*srcinfo->mem->access_virt_barray)
+        ((j_common_ptr)srcinfo, dst_coef_arrays[ci], dst_blk_y,
+         (JDIMENSION)compptr->v_samp_factor, TRUE);
+      if (dstinfo->_jpeg_height > srcinfo->output_height) {
+        if (dst_blk_y < y_crop_blocks ||
+            dst_blk_y >= y_crop_blocks + comp_height) {
+          for (offset_y = 0; offset_y < compptr->v_samp_factor; offset_y++) {
+            MEMZERO(dst_buffer[offset_y],
+                    compptr->width_in_blocks * sizeof(JBLOCK));
+          }
+          continue;
+        }
+        src_buffer = (*srcinfo->mem->access_virt_barray)
+          ((j_common_ptr)srcinfo, src_coef_arrays[ci],
+           dst_blk_y - y_crop_blocks, (JDIMENSION)compptr->v_samp_factor,
+           FALSE);
+      } else {
+        src_buffer = (*srcinfo->mem->access_virt_barray)
+          ((j_common_ptr)srcinfo, src_coef_arrays[ci],
+           dst_blk_y + y_crop_blocks, (JDIMENSION)compptr->v_samp_factor,
+           FALSE);
+      }
+      for (offset_y = 0; offset_y < compptr->v_samp_factor; offset_y++) {
+        /* Copy source region */
+        jcopy_block_row(src_buffer[offset_y],
+                        dst_buffer[offset_y] + x_crop_blocks, comp_width);
+        if (x_crop_blocks > 0) {
+          /* Reflect to left */
+          dst_row_ptr = dst_buffer[offset_y] + x_crop_blocks;
+          for (dst_blk_x = x_crop_blocks; dst_blk_x > 0;) {
+            src_row_ptr = dst_row_ptr;  /* (re)set axis of reflection */
+            for (src_blk_x = comp_width; src_blk_x > 0 && dst_blk_x > 0;
+                 src_blk_x--, dst_blk_x--) {
+              dst_ptr = *(--dst_row_ptr);  /* destination goes left */
+              src_ptr = *src_row_ptr++;    /* source goes right */
+              /* This unrolled loop doesn't need to know which row it's on. */
+              for (k = 0; k < DCTSIZE2; k += 2) {
+                *dst_ptr++ = *src_ptr++;     /* copy even column */
+                *dst_ptr++ = -(*src_ptr++);  /* copy odd column with sign
+                                                change */
+              }
+            }
+          }
+        }
+        if (compptr->width_in_blocks > x_crop_blocks + comp_width) {
+          /* Reflect to right */
+          dst_row_ptr = dst_buffer[offset_y] + x_crop_blocks + comp_width;
+          for (dst_blk_x = compptr->width_in_blocks - x_crop_blocks - comp_width;
+               dst_blk_x > 0;) {
+            src_row_ptr = dst_row_ptr;  /* (re)set axis of reflection */
+            for (src_blk_x = comp_width; src_blk_x > 0 && dst_blk_x > 0;
+                 src_blk_x--, dst_blk_x--) {
+              dst_ptr = *dst_row_ptr++;    /* destination goes right */
+              src_ptr = *(--src_row_ptr);  /* source goes left */
+              /* This unrolled loop doesn't need to know which row it's on. */
+              for (k = 0; k < DCTSIZE2; k += 2) {
+                *dst_ptr++ = *src_ptr++;     /* copy even column */
+                *dst_ptr++ = -(*src_ptr++);  /* copy odd column with sign
+                                                change */
+              }
+            }
+          }
+        }
+      }
+    }
+  }
+}
+
+
+LOCAL(void)
+do_wipe(j_decompress_ptr srcinfo, j_compress_ptr dstinfo,
+        JDIMENSION x_crop_offset, JDIMENSION y_crop_offset,
+        jvirt_barray_ptr *src_coef_arrays,
+        JDIMENSION drop_width, JDIMENSION drop_height)
+/* Wipe - discard image contents of specified region and fill with zero
+ * (neutral gray)
+ */
+{
+  JDIMENSION x_wipe_blocks, wipe_width;
+  JDIMENSION y_wipe_blocks, wipe_bottom;
+  int ci, offset_y;
+  JBLOCKARRAY buffer;
+  jpeg_component_info *compptr;
+
+  for (ci = 0; ci < dstinfo->num_components; ci++) {
+    compptr = dstinfo->comp_info + ci;
+    x_wipe_blocks = x_crop_offset * compptr->h_samp_factor;
+    wipe_width = drop_width * compptr->h_samp_factor;
+    y_wipe_blocks = y_crop_offset * compptr->v_samp_factor;
+    wipe_bottom = drop_height * compptr->v_samp_factor + y_wipe_blocks;
+    for (; y_wipe_blocks < wipe_bottom;
+         y_wipe_blocks += compptr->v_samp_factor) {
+      buffer = (*srcinfo->mem->access_virt_barray)
+        ((j_common_ptr)srcinfo, src_coef_arrays[ci], y_wipe_blocks,
+         (JDIMENSION)compptr->v_samp_factor, TRUE);
+      for (offset_y = 0; offset_y < compptr->v_samp_factor; offset_y++) {
+        MEMZERO(buffer[offset_y] + x_wipe_blocks, wipe_width * sizeof(JBLOCK));
+      }
+    }
+  }
+}
+
+
+LOCAL(void)
+do_flatten(j_decompress_ptr srcinfo, j_compress_ptr dstinfo,
+           JDIMENSION x_crop_offset, JDIMENSION y_crop_offset,
+           jvirt_barray_ptr *src_coef_arrays,
+           JDIMENSION drop_width, JDIMENSION drop_height)
+/* Flatten - discard image contents of specified region, similarly to wipe,
+ * but fill with the average of adjacent blocks instead of zero.
+ */
+{
+  JDIMENSION x_wipe_blocks, wipe_width, wipe_right;
+  JDIMENSION y_wipe_blocks, wipe_bottom, blk_x;
+  int ci, offset_y, dc_left_value, dc_right_value, average;
+  JBLOCKARRAY buffer;
+  jpeg_component_info *compptr;
+
+  for (ci = 0; ci < dstinfo->num_components; ci++) {
+    compptr = dstinfo->comp_info + ci;
+    x_wipe_blocks = x_crop_offset * compptr->h_samp_factor;
+    wipe_width = drop_width * compptr->h_samp_factor;
+    wipe_right = wipe_width + x_wipe_blocks;
+    y_wipe_blocks = y_crop_offset * compptr->v_samp_factor;
+    wipe_bottom = drop_height * compptr->v_samp_factor + y_wipe_blocks;
+    for (; y_wipe_blocks < wipe_bottom;
+         y_wipe_blocks += compptr->v_samp_factor) {
+      buffer = (*srcinfo->mem->access_virt_barray)
+        ((j_common_ptr)srcinfo, src_coef_arrays[ci], y_wipe_blocks,
+         (JDIMENSION)compptr->v_samp_factor, TRUE);
+      for (offset_y = 0; offset_y < compptr->v_samp_factor; offset_y++) {
+        MEMZERO(buffer[offset_y] + x_wipe_blocks, wipe_width * sizeof(JBLOCK));
+        if (x_wipe_blocks > 0) {
+          dc_left_value = buffer[offset_y][x_wipe_blocks - 1][0];
+          if (wipe_right < compptr->width_in_blocks) {
+            dc_right_value = buffer[offset_y][wipe_right][0];
+            average = (dc_left_value + dc_right_value) >> 1;
+          } else {
+            average = dc_left_value;
+          }
+        } else if (wipe_right < compptr->width_in_blocks) {
+          average = buffer[offset_y][wipe_right][0];
+        } else continue;
+        for (blk_x = x_wipe_blocks; blk_x < wipe_right; blk_x++) {
+          buffer[offset_y][blk_x][0] = (JCOEF)average;
+        }
+      }
+    }
+  }
+}
+
+
+LOCAL(void)
+do_reflect(j_decompress_ptr srcinfo, j_compress_ptr dstinfo,
+           JDIMENSION x_crop_offset, jvirt_barray_ptr *src_coef_arrays,
+           JDIMENSION drop_width, JDIMENSION drop_height)
+/* Reflect - discard image contents of specified region, similarly to wipe,
+ * but fill with repeated reflections of the outside region instead of zero.
+ * NB: y_crop_offset is assumed to be zero.
+ */
+{
+  JDIMENSION x_wipe_blocks, wipe_width;
+  JDIMENSION y_wipe_blocks, wipe_bottom;
+  JDIMENSION src_blk_x, dst_blk_x;
+  int ci, k, offset_y;
+  JBLOCKARRAY buffer;
+  JBLOCKROW src_row_ptr, dst_row_ptr;
+  JCOEFPTR src_ptr, dst_ptr;
+  jpeg_component_info *compptr;
+
+  for (ci = 0; ci < dstinfo->num_components; ci++) {
+    compptr = dstinfo->comp_info + ci;
+    x_wipe_blocks = x_crop_offset * compptr->h_samp_factor;
+    wipe_width = drop_width * compptr->h_samp_factor;
+    wipe_bottom = drop_height * compptr->v_samp_factor;
+    for (y_wipe_blocks = 0; y_wipe_blocks < wipe_bottom;
+         y_wipe_blocks += compptr->v_samp_factor) {
+      buffer = (*srcinfo->mem->access_virt_barray)
+        ((j_common_ptr)srcinfo, src_coef_arrays[ci], y_wipe_blocks,
+         (JDIMENSION)compptr->v_samp_factor, TRUE);
+      for (offset_y = 0; offset_y < compptr->v_samp_factor; offset_y++) {
+        if (x_wipe_blocks > 0) {
+          /* Reflect from left */
+          dst_row_ptr = buffer[offset_y] + x_wipe_blocks;
+          for (dst_blk_x = wipe_width; dst_blk_x > 0;) {
+            src_row_ptr = dst_row_ptr;  /* (re)set axis of reflection */
+            for (src_blk_x = x_wipe_blocks;
+                 src_blk_x > 0 && dst_blk_x > 0; src_blk_x--, dst_blk_x--) {
+              dst_ptr = *dst_row_ptr++;    /* destination goes right */
+              src_ptr = *(--src_row_ptr);  /* source goes left */
+              /* this unrolled loop doesn't need to know which row it's on... */
+              for (k = 0; k < DCTSIZE2; k += 2) {
+                *dst_ptr++ = *src_ptr++;     /* copy even column */
+                *dst_ptr++ = -(*src_ptr++);  /* copy odd column with sign change */
+              }
+            }
+          }
+        } else if (compptr->width_in_blocks > x_wipe_blocks + wipe_width) {
+          /* Reflect from right */
+          dst_row_ptr = buffer[offset_y] + x_wipe_blocks + wipe_width;
+          for (dst_blk_x = wipe_width; dst_blk_x > 0;) {
+            src_row_ptr = dst_row_ptr;  /* (re)set axis of reflection */
+            src_blk_x = compptr->width_in_blocks - x_wipe_blocks - wipe_width;
+            for (; src_blk_x > 0 && dst_blk_x > 0; src_blk_x--, dst_blk_x--) {
+              dst_ptr = *(--dst_row_ptr);  /* destination goes left */
+              src_ptr = *src_row_ptr++;    /* source goes right */
+              /* this unrolled loop doesn't need to know which row it's on... */
+              for (k = 0; k < DCTSIZE2; k += 2) {
+                *dst_ptr++ = *src_ptr++;     /* copy even column */
+                *dst_ptr++ = -(*src_ptr++);  /* copy odd column with sign change */
+              }
+            }
+          }
+        } else {
+          MEMZERO(buffer[offset_y] + x_wipe_blocks,
+                  wipe_width * sizeof(JBLOCK));
+        }
+      }
+    }
+  }
+}
@@ -224,8 +816,7 @@
         ((j_common_ptr)srcinfo, dst_coef_arrays[ci], dst_blk_y,
          (JDIMENSION)compptr->v_samp_factor, TRUE);
       src_buffer = (*srcinfo->mem->access_virt_barray)
-        ((j_common_ptr)srcinfo, src_coef_arrays[ci],
-         dst_blk_y + y_crop_blocks,
+        ((j_common_ptr)srcinfo, src_coef_arrays[ci], dst_blk_y + y_crop_blocks,
          (JDIMENSION)compptr->v_samp_factor, FALSE);
       for (offset_y = 0; offset_y < compptr->v_samp_factor; offset_y++) {
         dst_row_ptr = dst_buffer[offset_y];
@@ -238,8 +829,9 @@
           src_ptr = src_row_ptr[comp_width - x_crop_blocks - dst_blk_x - 1];
           /* this unrolled loop doesn't need to know which row it's on... */
           for (k = 0; k < DCTSIZE2; k += 2) {
-            *dst_ptr++ = *src_ptr++; /* copy even column */
-            *dst_ptr++ = - *src_ptr++; /* copy odd column with sign change */
+            *dst_ptr++ = *src_ptr++;     /* copy even column */
+            *dst_ptr++ = -(*src_ptr++);  /* copy odd column with sign
+                                            change */
           }
         } else {
           /* Copy last partial block(s) verbatim */
@@ -318,14 +910,13 @@
             *dst_ptr++ = *src_ptr++;
           /* copy odd row with sign change */
           for (j = 0; j < DCTSIZE; j++)
-            *dst_ptr++ = - *src_ptr++;
+            *dst_ptr++ = -(*src_ptr++);
         }
       }
       } else {
         /* Just copy row verbatim. */
         jcopy_block_row(src_buffer[offset_y] + x_crop_blocks,
-                        dst_buffer[offset_y],
-                        compptr->width_in_blocks);
+                        dst_buffer[offset_y], compptr->width_in_blocks);
       }
     }
   }
@@ -599,11 +1190,11 @@
           /* For even row, negate every odd column. */
           for (j = 0; j < DCTSIZE; j += 2) {
             *dst_ptr++ = *src_ptr++;
-            *dst_ptr++ = - *src_ptr++;
+            *dst_ptr++ = -(*src_ptr++);
           }
           /* For odd row, negate every even column. */
           for (j = 0; j < DCTSIZE; j += 2) {
-            *dst_ptr++ = - *src_ptr++;
+            *dst_ptr++ = -(*src_ptr++);
             *dst_ptr++ = *src_ptr++;
           }
         }
@@ -614,7 +1205,7 @@
           for (j = 0; j < DCTSIZE; j++)
             *dst_ptr++ = *src_ptr++;
           for (j = 0; j < DCTSIZE; j++)
-            *dst_ptr++ = - *src_ptr++;
+            *dst_ptr++ = -(*src_ptr++);
         }
       }
     }
@@ -630,7 +1221,7 @@
           src_row_ptr[comp_width - x_crop_blocks - dst_blk_x - 1];
         for (i = 0; i < DCTSIZE2; i += 2) {
           *dst_ptr++ = *src_ptr++;
-          *dst_ptr++ = - *src_ptr++;
+          *dst_ptr++ = -(*src_ptr++);
         }
       } else {
         /* Any remaining right-edge blocks are only copied. */
@@ -786,7 +1377,7 @@
 * The routine returns TRUE if the spec string is valid, FALSE if not.
 *
 * The crop spec string should have the format
- *	[f]x[f]{+-}{+-}
+ *	[{fr}]x[{fr}]{+-}{+-}
 * where width, height, xoffset, and yoffset are unsigned integers.
 * Each of the elements can be omitted to indicate a default value.
 * (A weakness of this style is that it is not possible to omit xoffset
@@ -811,6 +1402,9 @@
    if (*spec == 'f' || *spec == 'F') {
      spec++;
      info->crop_width_set = JCROP_FORCE;
+    } else if (*spec == 'r' || *spec == 'R') {
+      spec++;
+      info->crop_width_set = JCROP_REFLECT;
    } else
      info->crop_width_set = JCROP_POS;
  }
@@ -822,6 +1416,9 @@
    if (*spec == 'f' || *spec == 'F') {
      spec++;
      info->crop_height_set = JCROP_FORCE;
+    } else if (*spec == 'r' || *spec == 'R') {
+      spec++;
+      info->crop_height_set = JCROP_REFLECT;
    } else
      info->crop_height_set = JCROP_POS;
  }
@@ -896,10 +1493,10 @@
  jvirt_barray_ptr *coef_arrays;
  boolean need_workspace, transpose_it;
  jpeg_component_info *compptr;
-  JDIMENSION xoffset, yoffset;
+  JDIMENSION xoffset, yoffset, dtemp;
  JDIMENSION width_in_iMCUs, height_in_iMCUs;
  JDIMENSION width_in_blocks, height_in_blocks;
-  int ci, h_samp_factor, v_samp_factor;
+  int itemp, ci, h_samp_factor, v_samp_factor;
 
  /* Determine number of components in output image */
  if (info->force_grayscale &&
@@ -985,39 +1582,129 @@
    info->crop_xoffset = 0;	/* default to +0 */
  if (info->crop_yoffset_set == JCROP_UNSET)
    info->crop_yoffset = 0;	/* default to +0 */
-  if (info->crop_xoffset >= info->output_width ||
-      info->crop_yoffset >= info->output_height)
-    ERREXIT(srcinfo, JERR_BAD_CROP_SPEC);
-  if (info->crop_width_set == JCROP_UNSET)
+  if (info->crop_width_set == JCROP_UNSET) {
+    if (info->crop_xoffset >= info->output_width)
+      ERREXIT(srcinfo, JERR_BAD_CROP_SPEC);
    info->crop_width = info->output_width - info->crop_xoffset;
-  if (info->crop_height_set == JCROP_UNSET)
+  } else {
+    /* Check for crop extension */
+    if (info->crop_width > info->output_width) {
+      /* Crop extension does not work when transforming! */
+      if (info->transform != JXFORM_NONE ||
+          info->crop_xoffset >= info->crop_width ||
+          info->crop_xoffset > info->crop_width - info->output_width)
+        ERREXIT(srcinfo, JERR_BAD_CROP_SPEC);
+    } else {
+      if (info->crop_xoffset >= info->output_width ||
+          info->crop_width <= 0 ||
+          info->crop_xoffset > info->output_width - info->crop_width)
+        ERREXIT(srcinfo, JERR_BAD_CROP_SPEC);
+    }
+  }
+  if (info->crop_height_set == JCROP_UNSET) {
+    if (info->crop_yoffset >= info->output_height)
+      ERREXIT(srcinfo, JERR_BAD_CROP_SPEC);
    info->crop_height = info->output_height - info->crop_yoffset;
-  /* Ensure parameters are valid */
-  if (info->crop_width <= 0 || info->crop_width > info->output_width ||
-      info->crop_height <= 0 || info->crop_height > info->output_height ||
-      info->crop_xoffset > info->output_width - info->crop_width ||
-      info->crop_yoffset > info->output_height - info->crop_height)
-    ERREXIT(srcinfo, JERR_BAD_CROP_SPEC);
+  } else {
+    /* Check for crop extension */
+    if (info->crop_height > info->output_height) {
+      /* Crop extension does not work when transforming! */
+      if (info->transform != JXFORM_NONE ||
+          info->crop_yoffset >= info->crop_height ||
+          info->crop_yoffset > info->crop_height - info->output_height)
+        ERREXIT(srcinfo, JERR_BAD_CROP_SPEC);
+    } else {
+      if (info->crop_yoffset >= info->output_height ||
+          info->crop_height <= 0 ||
+          info->crop_yoffset > info->output_height - info->crop_height)
+        ERREXIT(srcinfo, JERR_BAD_CROP_SPEC);
+    }
+  }
  /* Convert negative crop offsets into regular offsets */
-  if (info->crop_xoffset_set == JCROP_NEG)
-    xoffset = info->output_width - info->crop_width - info->crop_xoffset;
-  else
+  if (info->crop_xoffset_set != JCROP_NEG)
    xoffset = info->crop_xoffset;
-  if (info->crop_yoffset_set == JCROP_NEG)
-    yoffset = info->output_height - info->crop_height - info->crop_yoffset;
+  else if (info->crop_width > info->output_width)	/* crop extension */
+    xoffset = info->crop_width - info->output_width - info->crop_xoffset;
  else
+    xoffset = info->output_width - info->crop_width - info->crop_xoffset;
+  if (info->crop_yoffset_set != JCROP_NEG)
    yoffset = info->crop_yoffset;
-  /* Now adjust so that upper left corner falls at an iMCU boundary */
-  if (info->crop_width_set == JCROP_FORCE)
-    info->output_width = info->crop_width;
-  else
-    info->output_width =
-      info->crop_width + (xoffset % info->iMCU_sample_width);
-  if (info->crop_height_set == JCROP_FORCE)
-    info->output_height = info->crop_height;
+  else if (info->crop_height > info->output_height)	/* crop extension */
+    yoffset = info->crop_height - info->output_height - info->crop_yoffset;
  else
-    info->output_height =
-      info->crop_height + (yoffset % info->iMCU_sample_height);
+    yoffset = info->output_height - info->crop_height - info->crop_yoffset;
+  /* Now adjust so that upper left corner falls at an iMCU boundary */
+  switch (info->transform) {
+  case JXFORM_DROP:
+    /* Ensure the effective drop region will not exceed the requested */
+    itemp = info->iMCU_sample_width;
+    dtemp = itemp - 1 - ((xoffset + itemp - 1) % itemp);
+    xoffset += dtemp;
+    if (info->crop_width <= dtemp)
+      info->drop_width = 0;
+    else if (xoffset + info->crop_width - dtemp == info->output_width)
+      /* Matching right edge: include partial iMCU */
+      info->drop_width = (info->crop_width - dtemp + itemp - 1) / itemp;
+    else
+      info->drop_width = (info->crop_width - dtemp) / itemp;
+    itemp = info->iMCU_sample_height;
+    dtemp = itemp - 1 - ((yoffset + itemp - 1) % itemp);
+    yoffset += dtemp;
+    if (info->crop_height <= dtemp)
+      info->drop_height = 0;
+    else if (yoffset + info->crop_height - dtemp == info->output_height)
+      /* Matching bottom edge: include partial iMCU */
+      info->drop_height = (info->crop_height - dtemp + itemp - 1) / itemp;
+    else
+      info->drop_height = (info->crop_height - dtemp) / itemp;
+    /* Check if sampling factors match for dropping */
+    if (info->drop_width != 0 && info->drop_height != 0)
+      for (ci = 0; ci < info->num_components &&
+                   ci < info->drop_ptr->num_components; ci++) {
+        if (info->drop_ptr->comp_info[ci].h_samp_factor *
+            srcinfo->max_h_samp_factor !=
+            srcinfo->comp_info[ci].h_samp_factor *
+            info->drop_ptr->max_h_samp_factor)
+          ERREXIT6(srcinfo, JERR_BAD_DROP_SAMPLING, ci,
+                   info->drop_ptr->comp_info[ci].h_samp_factor,
+                   info->drop_ptr->max_h_samp_factor,
+                   srcinfo->comp_info[ci].h_samp_factor,
+                   srcinfo->max_h_samp_factor, 'h');
+        if (info->drop_ptr->comp_info[ci].v_samp_factor *
+            srcinfo->max_v_samp_factor !=
+            srcinfo->comp_info[ci].v_samp_factor *
+            info->drop_ptr->max_v_samp_factor)
+          ERREXIT6(srcinfo, JERR_BAD_DROP_SAMPLING, ci,
+                   info->drop_ptr->comp_info[ci].v_samp_factor,
+                   info->drop_ptr->max_v_samp_factor,
+                   srcinfo->comp_info[ci].v_samp_factor,
+                   srcinfo->max_v_samp_factor, 'v');
+      }
+    break;
+  case JXFORM_WIPE:
+    /* Ensure the effective wipe region will cover the requested */
+    info->drop_width = (JDIMENSION)jdiv_round_up
+      ((long)(info->crop_width + (xoffset % info->iMCU_sample_width)),
+       (long)info->iMCU_sample_width);
+    info->drop_height = (JDIMENSION)jdiv_round_up
+      ((long)(info->crop_height + (yoffset % info->iMCU_sample_height)),
+       (long)info->iMCU_sample_height);
+    break;
+  default:
+    /* Ensure the effective crop region will cover the requested */
+    if (info->crop_width_set == JCROP_FORCE ||
+        info->crop_width > info->output_width)
+      info->output_width = info->crop_width;
+    else
+      info->output_width =
+        info->crop_width + (xoffset % info->iMCU_sample_width);
+    if (info->crop_height_set == JCROP_FORCE ||
+        info->crop_height > info->output_height)
+      info->output_height = info->crop_height;
+    else
+      info->output_height =
+        info->crop_height + (yoffset % info->iMCU_sample_height);
+  }
  /* Save x/y offsets measured in iMCUs */
  info->x_crop_offset = xoffset / info->iMCU_sample_width;
  info->y_crop_offset = yoffset / info->iMCU_sample_height;
@@ -1033,7 +1720,9 @@
  transpose_it = FALSE;
  switch (info->transform) {
  case JXFORM_NONE:
-    if (info->x_crop_offset != 0 || info->y_crop_offset != 0)
+    if (info->x_crop_offset != 0 || info->y_crop_offset != 0 ||
+        info->output_width > srcinfo->output_width ||
+        info->output_height > srcinfo->output_height)
      need_workspace = TRUE;
    /* No workspace needed if neither cropping nor transforming */
    break;
@@ -1087,6 +1776,10 @@
    need_workspace = TRUE;
    transpose_it = TRUE;
    break;
+  case JXFORM_WIPE:
+    break;
+  case JXFORM_DROP:
+    break;
  }
 
  /* Allocate workspace if needed.
@@ -1190,47 +1883,47 @@
  if (length < 12) return;	/* Length of an IFD entry */
 
  /* Discover byte order */
-  if (GETJOCTET(data[0]) == 0x49 && GETJOCTET(data[1]) == 0x49)
+  if (data[0] == 0x49 && data[1] == 0x49)
    is_motorola = FALSE;
-  else if (GETJOCTET(data[0]) == 0x4D && GETJOCTET(data[1]) == 0x4D)
+  else if (data[0] == 0x4D && data[1] == 0x4D)
    is_motorola = TRUE;
  else
    return;
 
  /* Check Tag Mark */
  if (is_motorola) {
-    if (GETJOCTET(data[2]) != 0) return;
-    if (GETJOCTET(data[3]) != 0x2A) return;
+    if (data[2] != 0) return;
+    if (data[3] != 0x2A) return;
  } else {
-    if (GETJOCTET(data[3]) != 0) return;
-    if (GETJOCTET(data[2]) != 0x2A) return;
+    if (data[3] != 0) return;
+    if (data[2] != 0x2A) return;
  }
 
  /* Get first IFD offset (offset to IFD0) */
  if (is_motorola) {
-    if (GETJOCTET(data[4]) != 0) return;
-    if (GETJOCTET(data[5]) != 0) return;
-    firstoffset = GETJOCTET(data[6]);
+    if (data[4] != 0) return;
+    if (data[5] != 0) return;
+    firstoffset = data[6];
    firstoffset <<= 8;
-    firstoffset += GETJOCTET(data[7]);
+    firstoffset += data[7];
  } else {
-    if (GETJOCTET(data[7]) != 0) return;
-    if (GETJOCTET(data[6]) != 0) return;
-    firstoffset = GETJOCTET(data[5]);
+    if (data[7] != 0) return;
+    if (data[6] != 0) return;
+    firstoffset = data[5];
    firstoffset <<= 8;
-    firstoffset += GETJOCTET(data[4]);
+    firstoffset += data[4];
  }
  if (firstoffset > length - 2) return; /* check end of data segment */
 
  /* Get the number of directory entries contained in this IFD */
  if (is_motorola) {
-    number_of_tags = GETJOCTET(data[firstoffset]);
+    number_of_tags = data[firstoffset];
    number_of_tags <<= 8;
-    number_of_tags += GETJOCTET(data[firstoffset + 1]);
+    number_of_tags += data[firstoffset + 1];
  } else {
-    number_of_tags = GETJOCTET(data[firstoffset + 1]);
+    number_of_tags = data[firstoffset + 1];
    number_of_tags <<= 8;
-    number_of_tags += GETJOCTET(data[firstoffset]);
+    number_of_tags += data[firstoffset];
  }
  if (number_of_tags == 0) return;
  firstoffset += 2;
@@ -1240,13 +1933,13 @@
    if (firstoffset > length - 12) return; /* check end of data segment */
    /* Get Tag number */
    if (is_motorola) {
-      tagnum = GETJOCTET(data[firstoffset]);
+      tagnum = data[firstoffset];
      tagnum <<= 8;
-      tagnum += GETJOCTET(data[firstoffset + 1]);
+      tagnum += data[firstoffset + 1];
    } else {
-      tagnum = GETJOCTET(data[firstoffset + 1]);
+      tagnum = data[firstoffset + 1];
      tagnum <<= 8;
-      tagnum += GETJOCTET(data[firstoffset]);
+      tagnum += data[firstoffset];
    }
    if (tagnum == 0x8769) break; /* found ExifSubIFD offset Tag */
    if (--number_of_tags == 0) return;
@@ -1255,29 +1948,29 @@
 
  /* Get the ExifSubIFD offset */
  if (is_motorola) {
-    if (GETJOCTET(data[firstoffset + 8]) != 0) return;
-    if (GETJOCTET(data[firstoffset + 9]) != 0) return;
-    offset = GETJOCTET(data[firstoffset + 10]);
+    if (data[firstoffset + 8] != 0) return;
+    if (data[firstoffset + 9] != 0) return;
+    offset = data[firstoffset + 10];
    offset <<= 8;
-    offset += GETJOCTET(data[firstoffset + 11]);
+    offset += data[firstoffset + 11];
  } else {
-    if (GETJOCTET(data[firstoffset + 11]) != 0) return;
-    if (GETJOCTET(data[firstoffset + 10]) != 0) return;
-    offset = GETJOCTET(data[firstoffset + 9]);
+    if (data[firstoffset + 11] != 0) return;
+    if (data[firstoffset + 10] != 0) return;
+    offset = data[firstoffset + 9];
    offset <<= 8;
-    offset += GETJOCTET(data[firstoffset + 8]);
+    offset += data[firstoffset + 8];
  }
  if (offset > length - 2) return; /* check end of data segment */
 
  /* Get the number of directory entries contained in this SubIFD */
  if (is_motorola) {
-    number_of_tags = GETJOCTET(data[offset]);
+    number_of_tags = data[offset];
    number_of_tags <<= 8;
-    number_of_tags += GETJOCTET(data[offset + 1]);
+    number_of_tags += data[offset + 1];
  } else {
-    number_of_tags = GETJOCTET(data[offset + 1]);
+    number_of_tags = data[offset + 1];
    number_of_tags <<= 8;
-    number_of_tags += GETJOCTET(data[offset]);
+    number_of_tags += data[offset];
  }
  if (number_of_tags < 2) return;
  offset += 2;
@@ -1287,13 +1980,13 @@
    if (offset > length - 12) return; /* check end of data segment */
    /* Get Tag number */
    if (is_motorola) {
-      tagnum = GETJOCTET(data[offset]);
+      tagnum = data[offset];
      tagnum <<= 8;
-      tagnum += GETJOCTET(data[offset + 1]);
+      tagnum += data[offset + 1];
    } else {
-      tagnum = GETJOCTET(data[offset + 1]);
+      tagnum = data[offset + 1];
      tagnum <<= 8;
-      tagnum += GETJOCTET(data[offset]);
+      tagnum += data[offset];
    }
    if (tagnum == 0xA002 || tagnum == 0xA003) {
      if (tagnum == 0xA002)
@@ -1387,7 +2080,7 @@
  dstinfo->jpeg_height = info->output_height;
#endif
 
-  /* Transpose destination image parameters */
+  /* Transpose destination image parameters, adjust quantization */
  switch (info->transform) {
  case JXFORM_TRANSPOSE:
  case JXFORM_TRANSVERSE:
@@ -1399,6 +2092,12 @@
#endif
    transpose_critical_parameters(dstinfo);
    break;
+  case JXFORM_DROP:
+    if (info->drop_width != 0 && info->drop_height != 0)
+      adjust_quant(srcinfo, src_coef_arrays,
+                   info->drop_ptr, info->drop_coef_arrays,
+                   info->trim, dstinfo);
+    break;
  default:
#if JPEG_LIB_VERSION < 80
    dstinfo->image_width = info->output_width;
@@ -1411,12 +2110,12 @@
  if (srcinfo->marker_list != NULL &&
      srcinfo->marker_list->marker == JPEG_APP0 + 1 &&
      srcinfo->marker_list->data_length >= 6 &&
-      GETJOCTET(srcinfo->marker_list->data[0]) == 0x45 &&
-      GETJOCTET(srcinfo->marker_list->data[1]) == 0x78 &&
-      GETJOCTET(srcinfo->marker_list->data[2]) == 0x69 &&
-      GETJOCTET(srcinfo->marker_list->data[3]) == 0x66 &&
-      GETJOCTET(srcinfo->marker_list->data[4]) == 0 &&
-      GETJOCTET(srcinfo->marker_list->data[5]) == 0) {
+      srcinfo->marker_list->data[0] == 0x45 &&
+      srcinfo->marker_list->data[1] == 0x78 &&
+      srcinfo->marker_list->data[2] == 0x69 &&
+      srcinfo->marker_list->data[3] == 0x66 &&
+      srcinfo->marker_list->data[4] == 0 &&
+      srcinfo->marker_list->data[5] == 0) {
    /* Suppress output of JFIF marker */
    dstinfo->write_JFIF_header = FALSE;
    /* Adjust Exif image parameters */
@@ -1465,7 +2164,23 @@
   */
  switch (info->transform) {
  case JXFORM_NONE:
-    if (info->x_crop_offset != 0 || info->y_crop_offset != 0)
+    if (info->output_width > srcinfo->output_width ||
+        info->output_height > srcinfo->output_height) {
+      if (info->output_width > srcinfo->output_width &&
+          info->crop_width_set == JCROP_REFLECT)
+        do_crop_ext_reflect(srcinfo, dstinfo,
+                            info->x_crop_offset, info->y_crop_offset,
+                            src_coef_arrays, dst_coef_arrays);
+      else if (info->output_width > srcinfo->output_width &&
+               info->crop_width_set == JCROP_FORCE)
+        do_crop_ext_flat(srcinfo, dstinfo,
+                         info->x_crop_offset, info->y_crop_offset,
+                         src_coef_arrays, dst_coef_arrays);
+      else
+        do_crop_ext_zero(srcinfo, dstinfo,
+                         info->x_crop_offset, info->y_crop_offset,
+                         src_coef_arrays, dst_coef_arrays);
+    } else if (info->x_crop_offset != 0 || info->y_crop_offset != 0)
      do_crop(srcinfo, dstinfo, info->x_crop_offset, info->y_crop_offset,
              src_coef_arrays, dst_coef_arrays);
    break;
@@ -1501,6 +2216,30 @@
    do_rot_270(srcinfo, dstinfo, info->x_crop_offset, info->y_crop_offset,
               src_coef_arrays, dst_coef_arrays);
    break;
+  case JXFORM_WIPE:
+    if (info->crop_width_set == JCROP_REFLECT &&
+        info->y_crop_offset == 0 && info->drop_height ==
+        (JDIMENSION)jdiv_round_up
+          ((long)info->output_height, (long)info->iMCU_sample_height) &&
+        (info->x_crop_offset == 0 ||
+         info->x_crop_offset + info->drop_width ==
+         (JDIMENSION)jdiv_round_up
+           ((long)info->output_width, (long)info->iMCU_sample_width)))
+      do_reflect(srcinfo, dstinfo, info->x_crop_offset,
+                 src_coef_arrays, info->drop_width, info->drop_height);
+    else if (info->crop_width_set == JCROP_FORCE)
+      do_flatten(srcinfo, dstinfo, info->x_crop_offset, info->y_crop_offset,
+                 src_coef_arrays, info->drop_width, info->drop_height);
+    else
+      do_wipe(srcinfo, dstinfo, info->x_crop_offset, info->y_crop_offset,
+              src_coef_arrays, info->drop_width, info->drop_height);
+    break;
+  case JXFORM_DROP:
+    if (info->drop_width != 0 && info->drop_height != 0)
+      do_drop(srcinfo, dstinfo, info->x_crop_offset, info->y_crop_offset,
+              src_coef_arrays, info->drop_ptr, info->drop_coef_arrays,
+              info->drop_width, info->drop_height);
+    break;
  }
}
@@ -1571,7 +2310,7 @@
  int m;
 
  /* Save comments except under NONE option */
-  if (option != JCOPYOPT_NONE) {
+  if (option != JCOPYOPT_NONE && option != JCOPYOPT_ICC) {
    jpeg_save_markers(srcinfo, JPEG_COM, 0xFFFF);
  }
  /* Save all types of APPn markers iff ALL option */
@@ -1582,6 +2321,10 @@
      jpeg_save_markers(srcinfo, JPEG_APP0 + m, 0xFFFF);
    }
  }
+  /* Save only APP2 markers if ICC option selected */
+  if (option == JCOPYOPT_ICC) {
+    jpeg_save_markers(srcinfo, JPEG_APP0 + 2, 0xFFFF);
+  }
#endif /* SAVE_MARKERS_SUPPORTED */
}
 
@@ -1607,20 +2350,20 @@
    if (dstinfo->write_JFIF_header &&
        marker->marker == JPEG_APP0 &&
        marker->data_length >= 5 &&
-        GETJOCTET(marker->data[0]) == 0x4A &&
-        GETJOCTET(marker->data[1]) == 0x46 &&
-        GETJOCTET(marker->data[2]) == 0x49 &&
-        GETJOCTET(marker->data[3]) == 0x46 &&
-        GETJOCTET(marker->data[4]) == 0)
+        marker->data[0] == 0x4A &&
+        marker->data[1] == 0x46 &&
+        marker->data[2] == 0x49 &&
+        marker->data[3] == 0x46 &&
+        marker->data[4] == 0)
      continue;			/* reject duplicate JFIF */
    if (dstinfo->write_Adobe_marker &&
        marker->marker == JPEG_APP0 + 14 &&
        marker->data_length >= 5 &&
-        GETJOCTET(marker->data[0]) == 0x41 &&
-        GETJOCTET(marker->data[1]) == 0x64 &&
-        GETJOCTET(marker->data[2]) == 0x6F &&
-        GETJOCTET(marker->data[3]) == 0x62 &&
-        GETJOCTET(marker->data[4]) == 0x65)
+        marker->data[0] == 0x41 &&
+        marker->data[1] == 0x64 &&
+        marker->data[2] == 0x6F &&
+        marker->data[3] == 0x62 &&
+        marker->data[4] == 0x65)
      continue;			/* reject duplicate Adobe */
    jpeg_write_marker(dstinfo, marker->marker,
                      marker->data, marker->data_length);
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/transupp.h b/src/3rdparty/chromium/third_party/libjpeg_turbo/transupp.h
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/transupp.h	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/transupp.h	2021-11-20 03:41:33.406600322 +0000
@@ -2,9 +2,9 @@
 * transupp.h
 *
 * This file was part of the Independent JPEG Group's software:
- * Copyright (C) 1997-2011, Thomas G. Lane, Guido Vollbeding.
+ * Copyright (C) 1997-2019, Thomas G. Lane, Guido Vollbeding.
 * libjpeg-turbo Modifications:
- * Copyright (C) 2017, D. R. Commander.
+ * Copyright (C) 2017, 2021, D. R. Commander.
 * For conditions of distribution and use, see the accompanying README.ijg
 * file.
 *
@@ -62,6 +62,17 @@
 * output image covers at least the requested region, but may cover more.)
 * The adjustment of the region dimensions may be optionally disabled.
 *
+ * A complementary lossless wipe option is provided to discard (gray out) data
+ * inside a given image region while losslessly preserving what is outside.
+ * A lossless drop option is also provided, which allows another JPEG image to
+ * be inserted ("dropped") into the source image data at a given position,
+ * replacing the existing image data at that position.  Both the source image
+ * and the drop image must have the same subsampling level.  It is best if they
+ * also have the same quantization (quality.)  Otherwise, the quantization of
+ * the output image will be adapted to accommodate the higher of the source
+ * image quality and the drop image quality.  The trim option can be used with
+ * the drop option to requantize the drop image to match the source image.
+ *
 * We also provide a lossless-resize option, which is kind of a lossless-crop
 * operation in the DCT coefficient block domain - it discards higher-order
 * coefficients and losslessly preserves lower-order coefficients of a
@@ -92,20 +103,23 @@
  JXFORM_TRANSVERSE,      /* transpose across UR-to-LL axis */
  JXFORM_ROT_90,          /* 90-degree clockwise rotation */
  JXFORM_ROT_180,         /* 180-degree rotation */
-  JXFORM_ROT_270          /* 270-degree clockwise (or 90 ccw) */
+  JXFORM_ROT_270,         /* 270-degree clockwise (or 90 ccw) */
+  JXFORM_WIPE,            /* wipe */
+  JXFORM_DROP             /* drop */
} JXFORM_CODE;
 
/*
 * Codes for crop parameters, which can individually be unspecified,
 * positive or negative for xoffset or yoffset,
- * positive or forced for width or height.
+ * positive or force or reflect for width or height.
 */
 
typedef enum {
  JCROP_UNSET,
  JCROP_POS,
  JCROP_NEG,
-  JCROP_FORCE
+  JCROP_FORCE,
+  JCROP_REFLECT
} JCROP_CODE;
 
/*
@@ -120,7 +134,7 @@
  boolean perfect;         /* if TRUE, fail if partial MCUs are requested */
  boolean trim;            /* if TRUE, trim partial MCUs as needed */
  boolean force_grayscale; /* if TRUE, convert color image to grayscale */
-  boolean crop;            /* if TRUE, crop source image */
+  boolean crop;            /* if TRUE, crop or wipe source image, or drop */
  boolean slow_hflip;  /* For best performance, the JXFORM_FLIP_H transform
                          normally modifies the source coefficients in place.
                          Setting this to TRUE will instead use a slower,
@@ -133,14 +147,18 @@
   * These can be filled in by jtransform_parse_crop_spec().
   */
  JDIMENSION crop_width;        /* Width of selected region */
-  JCROP_CODE crop_width_set;    /* (forced disables adjustment) */
+  JCROP_CODE crop_width_set;    /* (force-disables adjustment) */
  JDIMENSION crop_height;       /* Height of selected region */
-  JCROP_CODE crop_height_set;   /* (forced disables adjustment) */
+  JCROP_CODE crop_height_set;   /* (force-disables adjustment) */
  JDIMENSION crop_xoffset;      /* X offset of selected region */
  JCROP_CODE crop_xoffset_set;  /* (negative measures from right edge) */
  JDIMENSION crop_yoffset;      /* Y offset of selected region */
  JCROP_CODE crop_yoffset_set;  /* (negative measures from bottom edge) */
 
+  /* Drop parameters: set by caller for drop request */
+  j_decompress_ptr drop_ptr;
+  jvirt_barray_ptr *drop_coef_arrays;
+
  /* Internal workspace: caller should not touch these */
  int num_components;           /* # of components in workspace */
  jvirt_barray_ptr *workspace_coef_arrays; /* workspace for transformations */
@@ -148,6 +166,8 @@
  JDIMENSION output_height;
  JDIMENSION x_crop_offset;     /* destination crop offsets measured in iMCUs */
  JDIMENSION y_crop_offset;
+  JDIMENSION drop_width;        /* drop/wipe dimensions measured in iMCUs */
+  JDIMENSION drop_height;
  int iMCU_sample_width;        /* destination iMCU size */
  int iMCU_sample_height;
} jpeg_transform_info;
@@ -193,10 +213,11 @@
 */
 
typedef enum {
-  JCOPYOPT_NONE,          /* copy no optional markers */
-  JCOPYOPT_COMMENTS,      /* copy only comment (COM) markers */
-  JCOPYOPT_ALL,           /* copy all optional markers */
-  JCOPYOPT_ALL_EXCEPT_ICC /* copy all optional markers except APP2 */
+  JCOPYOPT_NONE,           /* copy no optional markers */
+  JCOPYOPT_COMMENTS,       /* copy only comment (COM) markers */
+  JCOPYOPT_ALL,            /* copy all optional markers */
+  JCOPYOPT_ALL_EXCEPT_ICC, /* copy all optional markers except APP2 */
+  JCOPYOPT_ICC             /* copy only ICC profile (APP2) markers */
} JCOPY_OPTION;
 
#define JCOPYOPT_DEFAULT  JCOPYOPT_COMMENTS  /* recommended default */
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/turbojpeg.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/turbojpeg.c
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/turbojpeg.c	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/turbojpeg.c	2021-11-20 03:41:33.406600322 +0000
@@ -1,5 +1,6 @@
/*
- * Copyright (C)2009-2020 D. R. Commander.  All Rights Reserved.
+ * Copyright (C)2009-2021 D. R. Commander.  All Rights Reserved.
+ * Copyright (C)2021 Alex Richardson.  All Rights Reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
@@ -112,6 +113,32 @@
  boolean isInstanceError;
} tjinstance;
 
+struct my_progress_mgr {
+  struct jpeg_progress_mgr pub;
+  tjinstance *this;
+};
+typedef struct my_progress_mgr *my_progress_ptr;
+
+static void my_progress_monitor(j_common_ptr dinfo)
+{
+  my_error_ptr myerr = (my_error_ptr)dinfo->err;
+  my_progress_ptr myprog = (my_progress_ptr)dinfo->progress;
+
+  if (dinfo->is_decompressor) {
+    int scan_no = ((j_decompress_ptr)dinfo)->input_scan_number;
+
+    if (scan_no > 500) {
+      snprintf(myprog->this->errStr, JMSG_LENGTH_MAX,
+               "Progressive JPEG image has more than 500 scans");
+      snprintf(errStr, JMSG_LENGTH_MAX,
+               "Progressive JPEG image has more than 500 scans");
+      myprog->this->isInstanceError = TRUE;
+      myerr->warning = FALSE;
+      longjmp(myerr->setjmp_buffer, 1);
+    }
+  }
+}
+
static const int pixelsize[TJ_NUMSAMP] = { 3, 3, 3, 1, 3, 3 };
 
static const JXFORM_CODE xformtypes[TJ_NUMXOP] = {
@@ -178,6 +205,11 @@
  this->isInstanceError = TRUE;  THROWG(m) \
}
 
+#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
+/* Private flag that triggers different TurboJPEG API behavior when fuzzing */
+#define TJFLAG_FUZZING  (1 << 30)
+#endif
+
#define GET_INSTANCE(handle) \
  tjinstance *this = (tjinstance *)handle; \
  j_compress_ptr cinfo = NULL; \
@@ -234,10 +266,10 @@
  return -1;
}
 
-static int setCompDefaults(struct jpeg_compress_struct *cinfo, int pixelFormat,
-                           int subsamp, int jpegQual, int flags)
+static void setCompDefaults(struct jpeg_compress_struct *cinfo,
+                            int pixelFormat, int subsamp, int jpegQual,
+                            int flags)
{
-  int retval = 0;
#ifndef NO_GETENV
  char *env = NULL;
#endif
@@ -300,8 +332,6 @@
    cinfo->comp_info[2].v_samp_factor = 1;
    if (cinfo->num_components > 3)
      cinfo->comp_info[3].v_samp_factor = tjMCUHeight[subsamp] / 8;
-
-  return retval;
}
 
@@ -676,8 +706,7 @@
    alloc = 0;  *jpegSize = tjBufSize(width, height, jpegSubsamp);
  }
  jpeg_mem_dest_tj(cinfo, jpegBuf, jpegSize, alloc);
-  if (setCompDefaults(cinfo, pixelFormat, jpegSubsamp, jpegQual, flags) == -1)
-    return -1;
+  setCompDefaults(cinfo, pixelFormat, jpegSubsamp, jpegQual, flags);
 
  jpeg_start_compress(cinfo, TRUE);
  for (i = 0; i < height; i++) {
@@ -692,7 +721,10 @@
  jpeg_finish_compress(cinfo);
 
bailout:
-  if (cinfo->global_state > CSTATE_START) jpeg_abort_compress(cinfo);
+  if (cinfo->global_state > CSTATE_START) {
+    if (alloc) (*cinfo->dest->term_destination) (cinfo);
+    jpeg_abort_compress(cinfo);
+  }
  free(row_pointer);
  if (this->jerr.warning) retval = -1;
  this->jerr.stopOnWarning = FALSE;
@@ -772,7 +804,7 @@
  else if (flags & TJFLAG_FORCESSE2) putenv("JSIMD_FORCESSE2=1");
#endif
 
-  if (setCompDefaults(cinfo, pixelFormat, subsamp, -1, flags) == -1) return -1;
+  setCompDefaults(cinfo, pixelFormat, subsamp, -1, flags);
 
  /* Execute only the parts of jpeg_start_compress() that we need.  If we
     were to call the whole jpeg_start_compress() function, then it would try
@@ -814,7 +846,7 @@
        THROW("tjEncodeYUVPlanes(): Memory allocation failure");
      for (row = 0; row < cinfo->max_v_samp_factor; row++) {
        unsigned char *_tmpbuf_aligned =
-          (unsigned char *)PAD((size_t)_tmpbuf[i], 32);
+          (unsigned char *)PAD((JUINTPTR)_tmpbuf[i], 32);
 
        tmpbuf[i][row] = &_tmpbuf_aligned[
          PAD((compptr->width_in_blocks * cinfo->max_h_samp_factor * DCTSIZE) /
@@ -830,7 +862,7 @@
        THROW("tjEncodeYUVPlanes(): Memory allocation failure");
      for (row = 0; row < compptr->v_samp_factor; row++) {
        unsigned char *_tmpbuf2_aligned =
-          (unsigned char *)PAD((size_t)_tmpbuf2[i], 32);
+          (unsigned char *)PAD((JUINTPTR)_tmpbuf2[i], 32);
 
        tmpbuf2[i][row] =
          &_tmpbuf2_aligned[PAD(compptr->width_in_blocks * DCTSIZE, 32) * row];
@@ -986,8 +1018,7 @@
    alloc = 0;  *jpegSize = tjBufSize(width, height, subsamp);
  }
  jpeg_mem_dest_tj(cinfo, jpegBuf, jpegSize, alloc);
-  if (setCompDefaults(cinfo, TJPF_RGB, subsamp, jpegQual, flags) == -1)
-    return -1;
+  setCompDefaults(cinfo, TJPF_RGB, subsamp, jpegQual, flags);
  cinfo->raw_data_in = TRUE;
 
  jpeg_start_compress(cinfo, TRUE);
@@ -1061,7 +1092,10 @@
  jpeg_finish_compress(cinfo);
 
bailout:
-  if (cinfo->global_state > CSTATE_START) jpeg_abort_compress(cinfo);
+  if (cinfo->global_state > CSTATE_START) {
+    if (alloc) (*cinfo->dest->term_destination) (cinfo);
+    jpeg_abort_compress(cinfo);
+  }
  for (i = 0; i < MAX_COMPONENTS; i++) {
    free(tmpbuf[i]);
    free(inbuf[i]);
@@ -1249,6 +1283,7 @@
{
  JSAMPROW *row_pointer = NULL;
  int i, retval = 0, jpegwidth, jpegheight, scaledw, scaledh;
+  struct my_progress_mgr progress;
 
  GET_DINSTANCE(handle);
  this->jerr.stopOnWarning = (flags & TJFLAG_STOPONWARNING) ? TRUE : FALSE;
@@ -1265,6 +1300,14 @@
  else if (flags & TJFLAG_FORCESSE2) putenv("JSIMD_FORCESSE2=1");
#endif
 
+  if (flags & TJFLAG_LIMITSCANS) {
+    MEMZERO(&progress, sizeof(struct my_progress_mgr));
+    progress.pub.progress_monitor = my_progress_monitor;
+    progress.this = this;
+    dinfo->progress = &progress.pub;
+  } else
+    dinfo->progress = NULL;
+
  if (setjmp(this->jerr.setjmp_buffer)) {
    /* If we get here, the JPEG code has signaled an error. */
    retval = -1;  goto bailout;
@@ -1482,7 +1525,7 @@
        THROW("tjDecodeYUVPlanes(): Memory allocation failure");
      for (row = 0; row < compptr->v_samp_factor; row++) {
        unsigned char *_tmpbuf_aligned =
-          (unsigned char *)PAD((size_t)_tmpbuf[i], 32);
+          (unsigned char *)PAD((JUINTPTR)_tmpbuf[i], 32);
 
        tmpbuf[i][row] =
          &_tmpbuf_aligned[PAD(compptr->width_in_blocks * DCTSIZE, 32) * row];
@@ -1583,6 +1626,7 @@
  JSAMPLE *_tmpbuf = NULL, *ptr;
  JSAMPROW *outbuf[MAX_COMPONENTS], *tmpbuf[MAX_COMPONENTS];
  int dctsize;
+  struct my_progress_mgr progress;
 
  GET_DINSTANCE(handle);
  this->jerr.stopOnWarning = (flags & TJFLAG_STOPONWARNING) ? TRUE : FALSE;
@@ -1604,6 +1648,14 @@
  else if (flags & TJFLAG_FORCESSE2) putenv("JSIMD_FORCESSE2=1");
#endif
 
+  if (flags & TJFLAG_LIMITSCANS) {
+    MEMZERO(&progress, sizeof(struct my_progress_mgr));
+    progress.pub.progress_monitor = my_progress_monitor;
+    progress.this = this;
+    dinfo->progress = &progress.pub;
+  } else
+    dinfo->progress = NULL;
+
  if (setjmp(this->jerr.setjmp_buffer)) {
    /* If we get here, the JPEG code has signaled an error. */
    retval = -1;  goto bailout;
@@ -1841,7 +1893,8 @@
{
  jpeg_transform_info *xinfo = NULL;
  jvirt_barray_ptr *srccoefs, *dstcoefs;
-  int retval = 0, i, jpegSubsamp, saveMarkers = 0;
+  int retval = 0, alloc = 1, i, jpegSubsamp, saveMarkers = 0;
+  struct my_progress_mgr progress;
 
  GET_INSTANCE(handle);
  this->jerr.stopOnWarning = (flags & TJFLAG_STOPONWARNING) ? TRUE : FALSE;
@@ -1858,6 +1911,14 @@
  else if (flags & TJFLAG_FORCESSE2) putenv("JSIMD_FORCESSE2=1");
#endif
 
+  if (flags & TJFLAG_LIMITSCANS) {
+    MEMZERO(&progress, sizeof(struct my_progress_mgr));
+    progress.pub.progress_monitor = my_progress_monitor;
+    progress.this = this;
+    dinfo->progress = &progress.pub;
+  } else
+    dinfo->progress = NULL;
+
  if ((xinfo =
       (jpeg_transform_info *)malloc(sizeof(jpeg_transform_info) * n)) == NULL)
    THROW("tjTransform(): Memory allocation failure");
@@ -1920,7 +1981,7 @@
  srccoefs = jpeg_read_coefficients(dinfo);
 
  for (i = 0; i < n; i++) {
-    int w, h, alloc = 1;
+    int w, h;
 
    if (!xinfo[i].crop) {
      w = dinfo->image_width;  h = dinfo->image_height;
@@ -1978,7 +2039,10 @@
  jpeg_finish_decompress(dinfo);
 
bailout:
-  if (cinfo->global_state > CSTATE_START) jpeg_abort_compress(cinfo);
+  if (cinfo->global_state > CSTATE_START) {
+    if (alloc) (*cinfo->dest->term_destination) (cinfo);
+    jpeg_abort_compress(cinfo);
+  }
  if (dinfo->global_state > DSTATE_START) jpeg_abort_decompress(dinfo);
  free(xinfo);
  if (this->jerr.warning) retval = -1;
@@ -2038,6 +2102,11 @@
    THROWG("tjLoadImage(): Unsupported file type");
 
  src->input_file = file;
+#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
+  /* Refuse to load images larger than 1 Megapixel when fuzzing. */
+  if (flags & TJFLAG_FUZZING)
+    src->max_pixels = 1048576;
+#endif
  (*src->start_input) (cinfo, src);
  (*cinfo->mem->realize_virt_arrays) ((j_common_ptr)cinfo);
 
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/turbojpeg.h b/src/3rdparty/chromium/third_party/libjpeg_turbo/turbojpeg.h
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/turbojpeg.h	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/turbojpeg.h	2021-11-20 03:41:33.406600322 +0000
@@ -1,5 +1,6 @@
/*
- * Copyright (C)2009-2015, 2017, 2020 D. R. Commander.  All Rights Reserved.
+ * Copyright (C)2009-2015, 2017, 2020-2021 D. R. Commander.
+ * All Rights Reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
@@ -418,6 +419,16 @@
 * reduce compression and decompression performance considerably.
 */
#define TJFLAG_PROGRESSIVE  16384
+/**
+ * Limit the number of progressive JPEG scans that the decompression and
+ * transform functions will process.  If a progressive JPEG image contains an
+ * unreasonably large number of scans, then this flag will cause the
+ * decompression and transform functions to return an error.  The primary
+ * purpose of this is to allow security-critical applications to guard against
+ * an exploit of the progressive JPEG format described in
+ * this report.
+ */
+#define TJFLAG_LIMITSCANS  32768
 
/**
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/turbojpeg-jni.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/turbojpeg-jni.c
--- a/src/3rdparty/chromium/third_party/libjpeg_turbo/turbojpeg-jni.c	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/turbojpeg-jni.c	2021-11-20 03:41:33.406600322 +0000
@@ -1,5 +1,5 @@
/*
- * Copyright (C)2011-2019 D. R. Commander.  All Rights Reserved.
+ * Copyright (C)2011-2020 D. R. Commander.  All Rights Reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
@@ -326,9 +326,11 @@
  tjhandle handle = 0;
  unsigned long jpegSize = 0;
  jbyteArray jSrcPlanes[3] = { NULL, NULL, NULL };
-  const unsigned char *srcPlanes[3];
+  const unsigned char *srcPlanesTmp[3] = { NULL, NULL, NULL };
+  const unsigned char *srcPlanes[3] = { NULL, NULL, NULL };
+  int *srcOffsetsTmp = NULL, srcOffsets[3] = { 0, 0, 0 };
+  int *srcStridesTmp = NULL, srcStrides[3] = { 0, 0, 0 };
  unsigned char *jpegBuf = NULL;
-  int *srcOffsets = NULL, *srcStrides = NULL;
  int nc = (subsamp == org_libjpegturbo_turbojpeg_TJ_SAMP_GRAY ? 1 : 3), i;
 
  GET_HANDLE();
@@ -351,56 +353,49 @@
 
  if (ProcessSystemProperties(env) < 0) goto bailout;
 
-#define RELEASE_ARRAYS_COMPRESSFROMYUV() { \
-  SAFE_RELEASE(dst, jpegBuf); \
-  for (i = 0; i < nc; i++) \
-    SAFE_RELEASE(jSrcPlanes[i], srcPlanes[i]); \
-  SAFE_RELEASE(jSrcStrides, srcStrides); \
-  SAFE_RELEASE(jSrcOffsets, srcOffsets); \
-}
+  BAILIF0(srcOffsetsTmp =
+          (*env)->GetPrimitiveArrayCritical(env, jSrcOffsets, 0));
+  for (i = 0; i < nc; i++) srcOffsets[i] = srcOffsetsTmp[i];
+  SAFE_RELEASE(jSrcOffsets, srcOffsetsTmp);
+
+  BAILIF0(srcStridesTmp =
+          (*env)->GetPrimitiveArrayCritical(env, jSrcStrides, 0));
+  for (i = 0; i < nc; i++) srcStrides[i] = srcStridesTmp[i];
+  SAFE_RELEASE(jSrcStrides, srcStridesTmp);
 
-  BAILIF0(srcOffsets = (*env)->GetPrimitiveArrayCritical(env, jSrcOffsets, 0));
-  BAILIF0(srcStrides = (*env)->GetPrimitiveArrayCritical(env, jSrcStrides, 0));
  for (i = 0; i < nc; i++) {
    int planeSize = tjPlaneSizeYUV(i, width, srcStrides[i], height, subsamp);
    int pw = tjPlaneWidth(i, width, subsamp);
 
-    if (planeSize < 0 || pw < 0) {
-      RELEASE_ARRAYS_COMPRESSFROMYUV();
+    if (planeSize < 0 || pw < 0)
      THROW_ARG(tjGetErrorStr());
-    }
 
-    if (srcOffsets[i] < 0) {
-      RELEASE_ARRAYS_COMPRESSFROMYUV();
+    if (srcOffsets[i] < 0)
      THROW_ARG("Invalid argument in compressFromYUV()");
-    }
-    if (srcStrides[i] < 0 && srcOffsets[i] - planeSize + pw < 0) {
-      RELEASE_ARRAYS_COMPRESSFROMYUV();
+    if (srcStrides[i] < 0 && srcOffsets[i] - planeSize + pw < 0)
      THROW_ARG("Negative plane stride would cause memory to be accessed below plane boundary");
-    }
 
    BAILIF0(jSrcPlanes[i] = (*env)->GetObjectArrayElement(env, srcobjs, i));
    if ((*env)->GetArrayLength(env, jSrcPlanes[i]) <
-        srcOffsets[i] + planeSize) {
-      RELEASE_ARRAYS_COMPRESSFROMYUV();
+        srcOffsets[i] + planeSize)
      THROW_ARG("Source plane is not large enough");
-    }
 
-    BAILIF0(srcPlanes[i] =
+    BAILIF0(srcPlanesTmp[i] =
            (*env)->GetPrimitiveArrayCritical(env, jSrcPlanes[i], 0));
-    srcPlanes[i] = &srcPlanes[i][srcOffsets[i]];
+    srcPlanes[i] = &srcPlanesTmp[i][srcOffsets[i]];
+    SAFE_RELEASE(jSrcPlanes[i], srcPlanesTmp[i]);
  }
  BAILIF0(jpegBuf = (*env)->GetPrimitiveArrayCritical(env, dst, 0));
 
  if (tjCompressFromYUVPlanes(handle, srcPlanes, width, srcStrides, height,
                              subsamp, &jpegBuf, &jpegSize, jpegQual,
                              flags | TJFLAG_NOREALLOC) == -1) {
-    RELEASE_ARRAYS_COMPRESSFROMYUV();
+    SAFE_RELEASE(dst, jpegBuf);
    THROW_TJ();
  }
 
bailout:
-  RELEASE_ARRAYS_COMPRESSFROMYUV();
+  SAFE_RELEASE(dst, jpegBuf);
  return (jint)jpegSize;
}
 
@@ -411,9 +406,12 @@
{
  tjhandle handle = 0;
  jsize arraySize = 0, actualPitch;
+  unsigned char *srcBuf = NULL;
  jbyteArray jDstPlanes[3] = { NULL, NULL, NULL };
-  unsigned char *srcBuf = NULL, *dstPlanes[3];
-  int *dstOffsets = NULL, *dstStrides = NULL;
+  unsigned char *dstPlanesTmp[3] = { NULL, NULL, NULL };
+  unsigned char *dstPlanes[3] = { NULL, NULL, NULL };
+  int *dstOffsetsTmp = NULL, dstOffsets[3] = { 0, 0, 0 };
+  int *dstStridesTmp = NULL, dstStrides[3] = { 0, 0, 0 };
  int nc = (subsamp == org_libjpegturbo_turbojpeg_TJ_SAMP_GRAY ?
1 : 3), i; GET_HANDLE(); @@ -438,56 +436,49 @@ if ((*env)->GetArrayLength(env, src) * srcElementSize < arraySize) THROW_ARG("Source buffer is not large enough"); -#define RELEASE_ARRAYS_ENCODEYUV() { \ - SAFE_RELEASE(src, srcBuf); \ - for (i = 0; i < nc; i++) \ - SAFE_RELEASE(jDstPlanes[i], dstPlanes[i]); \ - SAFE_RELEASE(jDstStrides, dstStrides); \ - SAFE_RELEASE(jDstOffsets, dstOffsets); \ -} + BAILIF0(dstOffsetsTmp = + (*env)->GetPrimitiveArrayCritical(env, jDstOffsets, 0)); + for (i = 0; i < nc; i++) dstOffsets[i] = dstOffsetsTmp[i]; + SAFE_RELEASE(jDstOffsets, dstOffsetsTmp); + + BAILIF0(dstStridesTmp = + (*env)->GetPrimitiveArrayCritical(env, jDstStrides, 0)); + for (i = 0; i < nc; i++) dstStrides[i] = dstStridesTmp[i]; + SAFE_RELEASE(jDstStrides, dstStridesTmp); - BAILIF0(dstOffsets = (*env)->GetPrimitiveArrayCritical(env, jDstOffsets, 0)); - BAILIF0(dstStrides = (*env)->GetPrimitiveArrayCritical(env, jDstStrides, 0)); for (i = 0; i < nc; i++) { int planeSize = tjPlaneSizeYUV(i, width, dstStrides[i], height, subsamp); int pw = tjPlaneWidth(i, width, subsamp); - if (planeSize < 0 || pw < 0) { - RELEASE_ARRAYS_ENCODEYUV(); + if (planeSize < 0 || pw < 0) THROW_ARG(tjGetErrorStr()); - } - if (dstOffsets[i] < 0) { - RELEASE_ARRAYS_ENCODEYUV(); + if (dstOffsets[i] < 0) THROW_ARG("Invalid argument in encodeYUV()"); - } - if (dstStrides[i] < 0 && dstOffsets[i] - planeSize + pw < 0) { - RELEASE_ARRAYS_ENCODEYUV(); + if (dstStrides[i] < 0 && dstOffsets[i] - planeSize + pw < 0) THROW_ARG("Negative plane stride would cause memory to be accessed below plane boundary"); - } BAILIF0(jDstPlanes[i] = (*env)->GetObjectArrayElement(env, dstobjs, i)); if ((*env)->GetArrayLength(env, jDstPlanes[i]) < - dstOffsets[i] + planeSize) { - RELEASE_ARRAYS_ENCODEYUV(); + dstOffsets[i] + planeSize) THROW_ARG("Destination plane is not large enough"); - } - BAILIF0(dstPlanes[i] = + BAILIF0(dstPlanesTmp[i] = (*env)->GetPrimitiveArrayCritical(env, jDstPlanes[i], 0)); - dstPlanes[i] = 
&dstPlanes[i][dstOffsets[i]]; + dstPlanes[i] = &dstPlanesTmp[i][dstOffsets[i]]; + SAFE_RELEASE(jDstPlanes[i], dstPlanesTmp[i]); } BAILIF0(srcBuf = (*env)->GetPrimitiveArrayCritical(env, src, 0)); if (tjEncodeYUVPlanes(handle, &srcBuf[y * actualPitch + x * tjPixelSize[pf]], width, pitch, height, pf, dstPlanes, dstStrides, subsamp, flags) == -1) { - RELEASE_ARRAYS_ENCODEYUV(); + SAFE_RELEASE(src, srcBuf); THROW_TJ(); } bailout: - RELEASE_ARRAYS_ENCODEYUV(); + SAFE_RELEASE(src, srcBuf); } /* TurboJPEG 1.4.x: TJCompressor::encodeYUV() byte source */ @@ -785,9 +776,12 @@ jintArray jDstStrides, jint desiredHeight, jint flags) { tjhandle handle = 0; + unsigned char *jpegBuf = NULL; jbyteArray jDstPlanes[3] = { NULL, NULL, NULL }; - unsigned char *jpegBuf = NULL, *dstPlanes[3]; - int *dstOffsets = NULL, *dstStrides = NULL; + unsigned char *dstPlanesTmp[3] = { NULL, NULL, NULL }; + unsigned char *dstPlanes[3] = { NULL, NULL, NULL }; + int *dstOffsetsTmp = NULL, dstOffsets[3] = { 0, 0, 0 }; + int *dstStridesTmp = NULL, dstStrides[3] = { 0, 0, 0 }; int jpegSubsamp = -1, jpegWidth = 0, jpegHeight = 0; int nc = 0, i, width, height, scaledWidth, scaledHeight, nsf = 0; tjscalingfactor *sf; @@ -821,57 +815,50 @@ if (i >= nsf) THROW_ARG("Could not scale down to desired image dimensions"); -#define RELEASE_ARRAYS_DECOMPRESSTOYUV() { \ - SAFE_RELEASE(src, jpegBuf); \ - for (i = 0; i < nc; i++) \ - SAFE_RELEASE(jDstPlanes[i], dstPlanes[i]); \ - SAFE_RELEASE(jDstStrides, dstStrides); \ - SAFE_RELEASE(jDstOffsets, dstOffsets); \ -} + BAILIF0(dstOffsetsTmp = + (*env)->GetPrimitiveArrayCritical(env, jDstOffsets, 0)); + for (i = 0; i < nc; i++) dstOffsets[i] = dstOffsetsTmp[i]; + SAFE_RELEASE(jDstOffsets, dstOffsetsTmp); + + BAILIF0(dstStridesTmp = + (*env)->GetPrimitiveArrayCritical(env, jDstStrides, 0)); + for (i = 0; i < nc; i++) dstStrides[i] = dstStridesTmp[i]; + SAFE_RELEASE(jDstStrides, dstStridesTmp); - BAILIF0(dstOffsets = (*env)->GetPrimitiveArrayCritical(env, jDstOffsets, 0)); - 
BAILIF0(dstStrides = (*env)->GetPrimitiveArrayCritical(env, jDstStrides, 0)); for (i = 0; i < nc; i++) { int planeSize = tjPlaneSizeYUV(i, scaledWidth, dstStrides[i], scaledHeight, jpegSubsamp); int pw = tjPlaneWidth(i, scaledWidth, jpegSubsamp); - if (planeSize < 0 || pw < 0) { - RELEASE_ARRAYS_DECOMPRESSTOYUV(); + if (planeSize < 0 || pw < 0) THROW_ARG(tjGetErrorStr()); - } - if (dstOffsets[i] < 0) { - RELEASE_ARRAYS_DECOMPRESSTOYUV(); + if (dstOffsets[i] < 0) THROW_ARG("Invalid argument in decompressToYUV()"); - } - if (dstStrides[i] < 0 && dstOffsets[i] - planeSize + pw < 0) { - RELEASE_ARRAYS_DECOMPRESSTOYUV(); + if (dstStrides[i] < 0 && dstOffsets[i] - planeSize + pw < 0) THROW_ARG("Negative plane stride would cause memory to be accessed below plane boundary"); - } BAILIF0(jDstPlanes[i] = (*env)->GetObjectArrayElement(env, dstobjs, i)); if ((*env)->GetArrayLength(env, jDstPlanes[i]) < - dstOffsets[i] + planeSize) { - RELEASE_ARRAYS_DECOMPRESSTOYUV(); + dstOffsets[i] + planeSize) THROW_ARG("Destination plane is not large enough"); - } - BAILIF0(dstPlanes[i] = + BAILIF0(dstPlanesTmp[i] = (*env)->GetPrimitiveArrayCritical(env, jDstPlanes[i], 0)); - dstPlanes[i] = &dstPlanes[i][dstOffsets[i]]; + dstPlanes[i] = &dstPlanesTmp[i][dstOffsets[i]]; + SAFE_RELEASE(jDstPlanes[i], dstPlanesTmp[i]); } BAILIF0(jpegBuf = (*env)->GetPrimitiveArrayCritical(env, src, 0)); if (tjDecompressToYUVPlanes(handle, jpegBuf, (unsigned long)jpegSize, dstPlanes, desiredWidth, dstStrides, desiredHeight, flags) == -1) { - RELEASE_ARRAYS_DECOMPRESSTOYUV(); + SAFE_RELEASE(src, jpegBuf); THROW_TJ(); } bailout: - RELEASE_ARRAYS_DECOMPRESSTOYUV(); + SAFE_RELEASE(src, jpegBuf); } /* TurboJPEG 1.2.x: TJDecompressor::decompressToYUV() */ @@ -920,9 +907,11 @@ tjhandle handle = 0; jsize arraySize = 0, actualPitch; jbyteArray jSrcPlanes[3] = { NULL, NULL, NULL }; - const unsigned char *srcPlanes[3]; + const unsigned char *srcPlanesTmp[3] = { NULL, NULL, NULL }; + const unsigned char *srcPlanes[3] = { 
NULL, NULL, NULL }; + int *srcOffsetsTmp = NULL, srcOffsets[3] = { 0, 0, 0 }; + int *srcStridesTmp = NULL, srcStrides[3] = { 0, 0, 0 }; unsigned char *dstBuf = NULL; - int *srcOffsets = NULL, *srcStrides = NULL; int nc = (subsamp == org_libjpegturbo_turbojpeg_TJ_SAMP_GRAY ? 1 : 3), i; GET_HANDLE(); @@ -946,56 +935,49 @@ if ((*env)->GetArrayLength(env, dst) * dstElementSize < arraySize) THROW_ARG("Destination buffer is not large enough"); -#define RELEASE_ARRAYS_DECODEYUV() { \ - SAFE_RELEASE(dst, dstBuf); \ - for (i = 0; i < nc; i++) \ - SAFE_RELEASE(jSrcPlanes[i], srcPlanes[i]); \ - SAFE_RELEASE(jSrcStrides, srcStrides); \ - SAFE_RELEASE(jSrcOffsets, srcOffsets); \ -} + BAILIF0(srcOffsetsTmp = + (*env)->GetPrimitiveArrayCritical(env, jSrcOffsets, 0)); + for (i = 0; i < nc; i++) srcOffsets[i] = srcOffsetsTmp[i]; + SAFE_RELEASE(jSrcOffsets, srcOffsetsTmp); + + BAILIF0(srcStridesTmp = + (*env)->GetPrimitiveArrayCritical(env, jSrcStrides, 0)); + for (i = 0; i < nc; i++) srcStrides[i] = srcStridesTmp[i]; + SAFE_RELEASE(jSrcStrides, srcStridesTmp); - BAILIF0(srcOffsets = (*env)->GetPrimitiveArrayCritical(env, jSrcOffsets, 0)); - BAILIF0(srcStrides = (*env)->GetPrimitiveArrayCritical(env, jSrcStrides, 0)); for (i = 0; i < nc; i++) { int planeSize = tjPlaneSizeYUV(i, width, srcStrides[i], height, subsamp); int pw = tjPlaneWidth(i, width, subsamp); - if (planeSize < 0 || pw < 0) { - RELEASE_ARRAYS_DECODEYUV(); + if (planeSize < 0 || pw < 0) THROW_ARG(tjGetErrorStr()); - } - if (srcOffsets[i] < 0) { - RELEASE_ARRAYS_DECODEYUV(); + if (srcOffsets[i] < 0) THROW_ARG("Invalid argument in decodeYUV()"); - } - if (srcStrides[i] < 0 && srcOffsets[i] - planeSize + pw < 0) { - RELEASE_ARRAYS_DECODEYUV(); + if (srcStrides[i] < 0 && srcOffsets[i] - planeSize + pw < 0) THROW_ARG("Negative plane stride would cause memory to be accessed below plane boundary"); - } BAILIF0(jSrcPlanes[i] = (*env)->GetObjectArrayElement(env, srcobjs, i)); if ((*env)->GetArrayLength(env, jSrcPlanes[i]) < - 
srcOffsets[i] + planeSize) { - RELEASE_ARRAYS_DECODEYUV(); + srcOffsets[i] + planeSize) THROW_ARG("Source plane is not large enough"); - } - BAILIF0(srcPlanes[i] = + BAILIF0(srcPlanesTmp[i] = (*env)->GetPrimitiveArrayCritical(env, jSrcPlanes[i], 0)); - srcPlanes[i] = &srcPlanes[i][srcOffsets[i]]; + srcPlanes[i] = &srcPlanesTmp[i][srcOffsets[i]]; + SAFE_RELEASE(jSrcPlanes[i], srcPlanesTmp[i]); } BAILIF0(dstBuf = (*env)->GetPrimitiveArrayCritical(env, dst, 0)); if (tjDecodeYUVPlanes(handle, srcPlanes, srcStrides, subsamp, &dstBuf[y * actualPitch + x * tjPixelSize[pf]], width, pitch, height, pf, flags) == -1) { - RELEASE_ARRAYS_DECODEYUV(); + SAFE_RELEASE(dst, dstBuf); THROW_TJ(); } bailout: - RELEASE_ARRAYS_DECODEYUV(); + SAFE_RELEASE(dst, dstBuf); } /* TurboJPEG 1.4.x: TJDecompressor::decodeYUV() byte destination */ diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/usage.txt b/src/3rdparty/chromium/third_party/libjpeg_turbo/usage.txt --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/usage.txt 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/usage.txt 2021-11-20 03:41:33.406600322 +0000 @@ -50,11 +50,9 @@ This syntax works on all systems, so it is useful for scripts. The currently supported image file formats are: PPM (PBMPLUS color format), -PGM (PBMPLUS grayscale format), BMP, Targa, and RLE (Utah Raster Toolkit -format). (RLE is supported only if the URT library is available, which it -isn't on most non-Unix systems.) cjpeg recognizes the input image format -automatically, with the exception of some Targa files. You have to tell djpeg -which format to generate. +PGM (PBMPLUS grayscale format), BMP, GIF, and Targa. cjpeg recognizes the +input image format automatically, with the exception of some Targa files. You +have to tell djpeg which format to generate. JPEG files are in the defacto standard JFIF file format. There are other, less widely used JPEG-based file formats, but we don't support them. 
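The `TJFLAG_LIMITSCANS` hunks above install a libjpeg progress monitor that aborts decompression when a progressive JPEG declares an unreasonable number of scans (the CVE-2021-37972-adjacent hardening). The following is a minimal self-contained sketch of that guard with no libjpeg dependency; the struct names, the `longjmp` bailout, and the 500-scan cap are illustrative assumptions, not the library's actual code:

```c
/* Sketch of the scan-limiting idea behind TJFLAG_LIMITSCANS.
 * libjpeg-turbo installs a progress monitor on the decompress object and
 * bails out via longjmp when a progressive image declares too many scans;
 * the SCAN_LIMIT value here is an assumed cap for illustration only. */
#include <setjmp.h>

#define SCAN_LIMIT 500          /* assumed cap, not the library's constant */

struct fake_decoder {
    int input_scan_number;      /* scans consumed so far */
    void (*progress_monitor)(struct fake_decoder *);
    jmp_buf bailout;
    int aborted;
};

static void limit_scans_monitor(struct fake_decoder *d)
{
    if (d->input_scan_number > SCAN_LIMIT) {
        d->aborted = 1;
        longjmp(d->bailout, 1); /* unwind out of the decode loop */
    }
}

/* Returns 0 on success, -1 if the scan limit fired. */
int decode_with_limit(struct fake_decoder *d, int total_scans)
{
    d->progress_monitor = limit_scans_monitor;
    d->aborted = 0;
    if (setjmp(d->bailout))
        return -1;              /* the monitor bailed out */
    for (d->input_scan_number = 1; d->input_scan_number <= total_scans;
         d->input_scan_number++)
        d->progress_monitor(d); /* invoked per scan, as libjpeg would */
    return 0;
}
```

A well-formed image with a sane scan count decodes normally; a crafted image with thousands of scans trips the monitor instead of consuming unbounded CPU time.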
@@ -76,10 +74,10 @@ -grayscale Create monochrome JPEG file from color input. Be sure to use this switch when compressing a grayscale - BMP file, because cjpeg isn't bright enough to notice - whether a BMP file uses only shades of gray. By - saying -grayscale, you'll get a smaller JPEG file that - takes less time to process. + BMP or GIF file, because cjpeg isn't bright enough to + notice whether a BMP or GIF file uses only shades of + gray. By saying -grayscale, you'll get a smaller JPEG + file that takes less time to process. -rgb Create RGB JPEG file. Using this switch suppresses the conversion from RGB @@ -170,35 +168,43 @@ be unable to view an arithmetic coded JPEG file at all. - -dct int Use integer DCT method (default). - -dct fast Use fast integer DCT (less accurate). - In libjpeg-turbo, the fast method is generally about - 5-15% faster than the int method when using the - x86/x86-64 SIMD extensions (results may vary with other - SIMD implementations, or when using libjpeg-turbo - without SIMD extensions.) For quality levels of 90 and - below, there should be little or no perceptible - difference between the two algorithms. For quality - levels above 90, however, the difference between - the fast and the int methods becomes more pronounced. - With quality=97, for instance, the fast method incurs - generally about a 1-3 dB loss (in PSNR) relative to - the int method, but this can be larger for some images. - Do not use the fast method with quality levels above - 97. The algorithm often degenerates at quality=98 and - above and can actually produce a more lossy image than - if lower quality levels had been used. Also, in - libjpeg-turbo, the fast method is not fully accerated - for quality levels above 97, so it will be slower than - the int method. - -dct float Use floating-point DCT method. - The float method is mainly a legacy feature. It does - not produce significantly more accurate results than - the int method, and it is much slower. 
The float - method may also give different results on different - machines due to varying roundoff behavior, whereas the - integer methods should give the same results on all - machines. + -dct int Use accurate integer DCT method (default). + -dct fast Use less accurate integer DCT method [legacy feature]. + When the Independent JPEG Group's software was first + released in 1991, the compression time for a + 1-megapixel JPEG image on a mainstream PC was measured + in minutes. Thus, the fast integer DCT algorithm + provided noticeable performance benefits. On modern + CPUs running libjpeg-turbo, however, the compression + time for a 1-megapixel JPEG image is measured in + milliseconds, and thus the performance benefits of the + fast algorithm are much less noticeable. On modern + x86/x86-64 CPUs that support AVX2 instructions, the + fast and int methods have similar performance. On + other types of CPUs, the fast method is generally about + 5-15% faster than the int method. + + For quality levels of 90 and below, there should be + little or no perceptible quality difference between the + two algorithms. For quality levels above 90, however, + the difference between the fast and int methods becomes + more pronounced. With quality=97, for instance, the + fast method incurs generally about a 1-3 dB loss in + PSNR relative to the int method, but this can be larger + for some images. Do not use the fast method with + quality levels above 97. The algorithm often + degenerates at quality=98 and above and can actually + produce a more lossy image than if lower quality levels + had been used. Also, in libjpeg-turbo, the fast method + is not fully accelerated for quality levels above 97, + so it will be slower than the int method. + -dct float Use floating-point DCT method [legacy feature]. + The float method does not produce significantly more + accurate results than the int method, and it is much + slower. 
The float method may also give different + results on different machines due to varying roundoff + behavior, whereas the integer methods should give the + same results on all machines. -restart N Emit a JPEG restart marker every N MCU rows, or every N MCU blocks if "B" is attached to the number. @@ -290,10 +296,17 @@ is specified, or if the JPEG file is grayscale; otherwise, 24-bit full-color format is emitted. - -gif Select GIF output format. Since GIF does not support - more than 256 colors, -colors 256 is assumed (unless - you specify a smaller number of colors). If you - specify -fast, the default number of colors is 216. + -gif Select GIF output format (LZW-compressed). Since GIF + does not support more than 256 colors, -colors 256 is + assumed (unless you specify a smaller number of + colors). If you specify -fast, the default number of + colors is 216. + + -gif0 Select GIF output format (uncompressed). Since GIF + does not support more than 256 colors, -colors 256 is + assumed (unless you specify a smaller number of + colors). If you specify -fast, the default number of + colors is 216. -os2 Select BMP output format (OS/2 1.x flavor). 8-bit colormapped format is emitted if -colors or -grayscale @@ -305,8 +318,6 @@ grayscale or if -grayscale is specified; otherwise PPM is emitted. - -rle Select RLE output format. (Requires URT library.) - -targa Select Targa output format. Grayscale format is emitted if the JPEG file is grayscale or if -grayscale is specified; otherwise, colormapped format @@ -315,36 +326,45 @@ Switches for advanced users: - -dct int Use integer DCT method (default). - -dct fast Use fast integer DCT (less accurate). - In libjpeg-turbo, the fast method is generally about - 5-15% faster than the int method when using the - x86/x86-64 SIMD extensions (results may vary with other - SIMD implementations, or when using libjpeg-turbo - without SIMD extensions.) 
If the JPEG image was - compressed using a quality level of 85 or below, then - there should be little or no perceptible difference - between the two algorithms. When decompressing images - that were compressed using quality levels above 85, - however, the difference between the fast and int - methods becomes more pronounced. With images - compressed using quality=97, for instance, the fast - method incurs generally about a 4-6 dB loss (in PSNR) - relative to the int method, but this can be larger for - some images. If you can avoid it, do not use the fast - method when decompressing images that were compressed - using quality levels above 97. The algorithm often - degenerates for such images and can actually produce - a more lossy output image than if the JPEG image had - been compressed using lower quality levels. - -dct float Use floating-point DCT method. - The float method is mainly a legacy feature. It does - not produce significantly more accurate results than - the int method, and it is much slower. The float - method may also give different results on different - machines due to varying roundoff behavior, whereas the - integer methods should give the same results on all - machines. + -dct int Use accurate integer DCT method (default). + -dct fast Use less accurate integer DCT method [legacy feature]. + When the Independent JPEG Group's software was first + released in 1991, the decompression time for a + 1-megapixel JPEG image on a mainstream PC was measured + in minutes. Thus, the fast integer DCT algorithm + provided noticeable performance benefits. On modern + CPUs running libjpeg-turbo, however, the decompression + time for a 1-megapixel JPEG image is measured in + milliseconds, and thus the performance benefits of the + fast algorithm are much less noticeable. On modern + x86/x86-64 CPUs that support AVX2 instructions, the + fast and int methods have similar performance. 
On + other types of CPUs, the fast method is generally about + 5-15% faster than the int method. + + If the JPEG image was compressed using a quality level + of 85 or below, then there should be little or no + perceptible quality difference between the two + algorithms. When decompressing images that were + compressed using quality levels above 85, however, the + difference between the fast and int methods becomes + more pronounced. With images compressed using + quality=97, for instance, the fast method incurs + generally about a 4-6 dB loss in PSNR relative to the + int method, but this can be larger for some images. If + you can avoid it, do not use the fast method when + decompressing images that were compressed using quality + levels above 97. The algorithm often degenerates for + such images and can actually produce a more lossy + output image than if the JPEG image had been compressed + using lower quality levels. + -dct float Use floating-point DCT method [legacy feature]. + The float method does not produce significantly more + accurate results than the int method, and it is much + slower. The float method may also give different + results on different machines due to varying roundoff + behavior, whereas the integer methods should give the + same results on all machines. -dither fs Use Floyd-Steinberg dithering in color quantization. -dither ordered Use ordered dithering in color quantization. @@ -404,11 +424,6 @@ is often a lot more than it is on larger files. (At present, -optimize mode is always selected when generating progressive JPEG files.) -Support for GIF input files was removed in cjpeg v6b due to concerns over -the Unisys LZW patent. Although this patent expired in 2006, cjpeg still -lacks GIF support, for these historical reasons. (Conversion of GIF files to -JPEG is usually a bad idea anyway.) - HINTS FOR DJPEG @@ -423,10 +438,6 @@ much lower quality than the default behavior. 
"-dither none" may give acceptable results in two-pass mode, but is seldom tolerable in one-pass mode. -To avoid the Unisys LZW patent (now expired), djpeg produces uncompressed GIF -files. These are larger than they should be, but are readable by standard GIF -decoders. - HINTS FOR BOTH PROGRAMS @@ -533,6 +544,43 @@ -crop WxH+X+Y Crop to a rectangular region of width W and height H, starting at point X,Y. +If W or H is larger than the width/height of the input image, then the output +image is expanded in size, and the expanded region is filled in with zeros +(neutral gray). Attaching an 'f' character ("flatten") to the width number +will cause each block in the expanded region to be filled in with the DC +coefficient of the nearest block in the input image rather than grayed out. +Attaching an 'r' character ("reflect") to the width number will cause the +expanded region to be filled in with repeated reflections of the input image +rather than grayed out. + +A complementary lossless wipe option is provided to discard (gray out) data +inside a given image region while losslessly preserving what is outside: + -wipe WxH+X+Y Wipe (gray out) a rectangular region of width W and + height H from the input image, starting at point X,Y. + +Attaching an 'f' character ("flatten") to the width number will cause the +region to be filled with the average of adjacent blocks rather than grayed out. +If the wipe region and the region outside the wipe region, when adjusted to the +nearest iMCU boundary, form two horizontally adjacent rectangles, then +attaching an 'r' character ("reflect") to the width number will cause the wipe +region to be filled with repeated reflections of the outside region rather than +grayed out. 
+ +A lossless drop option is also provided, which allows another JPEG image to be +inserted ("dropped") into the input image data at a given position, replacing +the existing image data at that position: + -drop +X+Y filename Drop (insert) another image at point X,Y + +Both the input image and the drop image must have the same subsampling level. +It is best if they also have the same quantization (quality.) Otherwise, the +quantization of the output image will be adapted to accommodate the higher of +the input image quality and the drop image quality. The trim option can be +used with the drop option to requantize the drop image to match the input +image. Note that a grayscale image can be dropped into a full-color image or +vice versa, as long as the full-color image has no vertical subsampling. If +the input image is grayscale and the drop image is full-color, then the +chrominance channels from the drop image will be discarded. + Other not-strictly-lossless transformation switches are: -grayscale Force grayscale output. @@ -553,6 +601,9 @@ -copy comments Copy only comment markers. This setting copies comments from the source file but discards any other metadata. + -copy icc Copy only ICC profile markers. This setting copies the + ICC profile from the source file but discards any other + metadata. -copy all Copy all extra markers. This setting preserves miscellaneous markers found in the source file, such as JFIF thumbnails, Exif data, and Photoshop settings. 
diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/wrbmp.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/wrbmp.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/wrbmp.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/wrbmp.c 2021-11-20 03:41:33.407600306 +0000 @@ -141,7 +141,6 @@ } } else if (cinfo->out_color_space == JCS_CMYK) { for (col = cinfo->output_width; col > 0; col--) { - /* can omit GETJSAMPLE() safely */ JSAMPLE c = *inptr++, m = *inptr++, y = *inptr++, k = *inptr++; cmyk_to_rgb(c, m, y, k, outptr + 2, outptr + 1, outptr); outptr += 3; @@ -153,7 +152,6 @@ register int ps = rgb_pixelsize[cinfo->out_color_space]; for (col = cinfo->output_width; col > 0; col--) { - /* can omit GETJSAMPLE() safely */ outptr[0] = inptr[bindex]; outptr[1] = inptr[gindex]; outptr[2] = inptr[rindex]; @@ -372,18 +370,18 @@ if (cinfo->out_color_components == 3) { /* Normal case with RGB colormap */ for (i = 0; i < num_colors; i++) { - putc(GETJSAMPLE(colormap[2][i]), outfile); - putc(GETJSAMPLE(colormap[1][i]), outfile); - putc(GETJSAMPLE(colormap[0][i]), outfile); + putc(colormap[2][i], outfile); + putc(colormap[1][i], outfile); + putc(colormap[0][i], outfile); if (map_entry_size == 4) putc(0, outfile); } } else { /* Grayscale colormap (only happens with grayscale quantization) */ for (i = 0; i < num_colors; i++) { - putc(GETJSAMPLE(colormap[0][i]), outfile); - putc(GETJSAMPLE(colormap[0][i]), outfile); - putc(GETJSAMPLE(colormap[0][i]), outfile); + putc(colormap[0][i], outfile); + putc(colormap[0][i], outfile); + putc(colormap[0][i], outfile); if (map_entry_size == 4) putc(0, outfile); } @@ -438,7 +436,6 @@ JSAMPARRAY image_ptr; register JSAMPROW data_ptr; JDIMENSION row; - register JDIMENSION col; cd_progress_ptr progress = (cd_progress_ptr)cinfo->progress; if (dest->use_inversion_array) { @@ -459,10 +456,7 @@ ((j_common_ptr)cinfo, dest->whole_image, row - 1, (JDIMENSION)1, FALSE); data_ptr = image_ptr[0]; - for (col 
= dest->row_width; col > 0; col--) { - putc(GETJSAMPLE(*data_ptr), outfile); - data_ptr++; - } + (void)JFWRITE(outfile, data_ptr, dest->row_width); } if (progress != NULL) progress->completed_extra_passes++; diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/wrgif.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/wrgif.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/wrgif.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/wrgif.c 2021-11-20 03:41:33.407600306 +0000 @@ -3,6 +3,7 @@ * * This file was part of the Independent JPEG Group's software: * Copyright (C) 1991-1997, Thomas G. Lane. + * Modified 2015-2019 by Guido Vollbeding. * libjpeg-turbo Modifications: * Copyright (C) 2015, 2017, D. R. Commander. * For conditions of distribution and use, see the accompanying README.ijg @@ -10,12 +11,6 @@ * * This file contains routines to write output images in GIF format. * - ************************************************************************** - * NOTE: to avoid entanglements with Unisys' patent on LZW compression, * - * this code has been modified to output "uncompressed GIF" files. * - * There is no trace of the LZW algorithm in this file. * - ************************************************************************** - * * These routines may need modification for non-Unix environments or * specialized applications. As they stand, they assume output to * an ordinary stdio stream. @@ -33,11 +28,6 @@ * copyright notice and this permission notice appear in supporting * documentation. This software is provided "as is" without express or * implied warranty. - * - * We are also required to state that - * "The Graphics Interchange Format(c) is the Copyright property of - * CompuServe Incorporated. GIF(sm) is a Service Mark property of - * CompuServe Incorporated." 
*/ #include "cdjpeg.h" /* Common decls for cjpeg/djpeg applications */ @@ -45,6 +35,37 @@ #ifdef GIF_SUPPORTED +#define MAX_LZW_BITS 12 /* maximum LZW code size (4096 symbols) */ + +typedef INT16 code_int; /* must hold -1 .. 2**MAX_LZW_BITS */ + +#define LZW_TABLE_SIZE ((code_int)1 << MAX_LZW_BITS) + +#define HSIZE 5003 /* hash table size for 80% occupancy */ + +typedef int hash_int; /* must hold -2*HSIZE..2*HSIZE */ + +#define MAXCODE(n_bits) (((code_int)1 << (n_bits)) - 1) + + +/* + * The LZW hash table consists of two parallel arrays: + * hash_code[i] code of symbol in slot i, or 0 if empty slot + * hash_value[i] symbol's value; undefined if empty slot + * where slot values (i) range from 0 to HSIZE-1. The symbol value is + * its prefix symbol's code concatenated with its suffix character. + * + * Algorithm: use open addressing double hashing (no chaining) on the + * prefix code / suffix character combination. We do a variant of Knuth's + * algorithm D (vol. 3, sec. 6.4) along with G. Knott's relatively-prime + * secondary probe. + */ + +typedef int hash_entry; /* must hold (code_int << 8) | byte */ + +#define HASH_ENTRY(prefix, suffix) ((((hash_entry)(prefix)) << 8) | (suffix)) + + /* Private version of data destination object */ typedef struct { @@ -54,14 +75,24 @@ /* State for packing variable-width codes into a bitstream */ int n_bits; /* current number of bits/code */ - int maxcode; /* maximum code, given n_bits */ - long cur_accum; /* holds bits not yet output */ + code_int maxcode; /* maximum code, given n_bits */ + int init_bits; /* initial n_bits ... 
restored after clear */ + int cur_accum; /* holds bits not yet output */ int cur_bits; /* # of bits in cur_accum */ + /* LZW string construction */ + code_int waiting_code; /* symbol not yet output; may be extendable */ + boolean first_byte; /* if TRUE, waiting_code is not valid */ + /* State for GIF code assignment */ - int ClearCode; /* clear code (doesn't change) */ - int EOFCode; /* EOF code (ditto) */ - int code_counter; /* counts output symbols */ + code_int ClearCode; /* clear code (doesn't change) */ + code_int EOFCode; /* EOF code (ditto) */ + code_int free_code; /* LZW: first not-yet-used symbol code */ + code_int code_counter; /* not LZW: counts output symbols */ + + /* LZW hash table */ + code_int *hash_code; /* => hash table of symbol codes */ + hash_entry *hash_value; /* => hash table of symbol values */ /* GIF data packet construction buffer */ int bytesinpkt; /* # of bytes in current packet */ @@ -71,9 +102,6 @@ typedef gif_dest_struct *gif_dest_ptr; -/* Largest value that will fit in N bits */ -#define MAXCODE(n_bits) ((1 << (n_bits)) - 1) - /* * Routines to package finished data bytes into GIF data blocks. @@ -105,7 +133,7 @@ /* Routine to convert variable-width codes into a byte stream */ LOCAL(void) -output(gif_dest_ptr dinfo, int code) +output(gif_dest_ptr dinfo, code_int code) /* Emit a code of n_bits bits */ /* Uses cur_accum and cur_bits to reblock into 8-bit bytes */ { @@ -117,74 +145,76 @@ dinfo->cur_accum >>= 8; dinfo->cur_bits -= 8; } + + /* + * If the next entry is going to be too big for the code size, + * then increase it, if possible. We do this here to ensure + * that it's done in sync with the decoder's codesize increases. + */ + if (dinfo->free_code > dinfo->maxcode) { + dinfo->n_bits++; + if (dinfo->n_bits == MAX_LZW_BITS) + dinfo->maxcode = LZW_TABLE_SIZE; /* free_code will never exceed this */ + else + dinfo->maxcode = MAXCODE(dinfo->n_bits); + } } -/* The pseudo-compression algorithm. 
- * - * In this module we simply output each pixel value as a separate symbol; - * thus, no compression occurs. In fact, there is expansion of one bit per - * pixel, because we use a symbol width one bit wider than the pixel width. - * - * GIF ordinarily uses variable-width symbols, and the decoder will expect - * to ratchet up the symbol width after a fixed number of symbols. - * To simplify the logic and keep the expansion penalty down, we emit a - * GIF Clear code to reset the decoder just before the width would ratchet up. - * Thus, all the symbols in the output file will have the same bit width. - * Note that emitting the Clear codes at the right times is a mere matter of - * counting output symbols and is in no way dependent on the LZW patent. - * - * With a small basic pixel width (low color count), Clear codes will be - * needed very frequently, causing the file to expand even more. So this - * simplistic approach wouldn't work too well on bilevel images, for example. - * But for output of JPEG conversions the pixel width will usually be 8 bits - * (129 to 256 colors), so the overhead added by Clear symbols is only about - * one symbol in every 256. 
- */ +/* Compression initialization & termination */ + + +LOCAL(void) +clear_hash(gif_dest_ptr dinfo) +/* Fill the hash table with empty entries */ +{ + /* It's sufficient to zero hash_code[] */ + MEMZERO(dinfo->hash_code, HSIZE * sizeof(code_int)); +} + + +LOCAL(void) +clear_block(gif_dest_ptr dinfo) +/* Reset compressor and issue a Clear code */ +{ + clear_hash(dinfo); /* delete all the symbols */ + dinfo->free_code = dinfo->ClearCode + 2; + output(dinfo, dinfo->ClearCode); /* inform decoder */ + dinfo->n_bits = dinfo->init_bits; /* reset code size */ + dinfo->maxcode = MAXCODE(dinfo->n_bits); +} + LOCAL(void) compress_init(gif_dest_ptr dinfo, int i_bits) -/* Initialize pseudo-compressor */ +/* Initialize compressor */ { /* init all the state variables */ - dinfo->n_bits = i_bits; + dinfo->n_bits = dinfo->init_bits = i_bits; dinfo->maxcode = MAXCODE(dinfo->n_bits); - dinfo->ClearCode = (1 << (i_bits - 1)); + dinfo->ClearCode = ((code_int) 1 << (i_bits - 1)); dinfo->EOFCode = dinfo->ClearCode + 1; - dinfo->code_counter = dinfo->ClearCode + 2; + dinfo->code_counter = dinfo->free_code = dinfo->ClearCode + 2; + dinfo->first_byte = TRUE; /* no waiting symbol yet */ /* init output buffering vars */ dinfo->bytesinpkt = 0; dinfo->cur_accum = 0; dinfo->cur_bits = 0; + /* clear hash table */ + if (dinfo->hash_code != NULL) + clear_hash(dinfo); /* GIF specifies an initial Clear code */ output(dinfo, dinfo->ClearCode); } LOCAL(void) -compress_pixel(gif_dest_ptr dinfo, int c) -/* Accept and "compress" one pixel value. - * The given value must be less than n_bits wide. - */ -{ - /* Output the given pixel value as a symbol. */ - output(dinfo, c); - /* Issue Clear codes often enough to keep the reader from ratcheting up - * its symbol size. 
- */ - if (dinfo->code_counter < dinfo->maxcode) { - dinfo->code_counter++; - } else { - output(dinfo, dinfo->ClearCode); - dinfo->code_counter = dinfo->ClearCode + 2; /* reset the counter */ - } -} - - -LOCAL(void) compress_term(gif_dest_ptr dinfo) /* Clean up at end */ { + /* Flush out the buffered LZW code */ + if (!dinfo->first_byte) + output(dinfo, dinfo->waiting_code); /* Send an EOF code */ output(dinfo, dinfo->EOFCode); /* Flush the bit-packing buffer */ @@ -221,7 +251,7 @@ LOCAL(void) emit_header(gif_dest_ptr dinfo, int num_colors, JSAMPARRAY colormap) /* Output the GIF file header, including color map */ -/* If colormap==NULL, synthesize a grayscale colormap */ +/* If colormap == NULL, synthesize a grayscale colormap */ { int BitsPerPixel, ColorMapSize, InitCodeSize, FlagByte; int cshift = dinfo->cinfo->data_precision - 8; @@ -265,12 +295,12 @@ if (colormap != NULL) { if (dinfo->cinfo->out_color_space == JCS_RGB) { /* Normal case: RGB color map */ - putc(GETJSAMPLE(colormap[0][i]) >> cshift, dinfo->pub.output_file); - putc(GETJSAMPLE(colormap[1][i]) >> cshift, dinfo->pub.output_file); - putc(GETJSAMPLE(colormap[2][i]) >> cshift, dinfo->pub.output_file); + putc(colormap[0][i] >> cshift, dinfo->pub.output_file); + putc(colormap[1][i] >> cshift, dinfo->pub.output_file); + putc(colormap[2][i] >> cshift, dinfo->pub.output_file); } else { /* Grayscale "color map": possible if quantizing grayscale image */ - put_3bytes(dinfo, GETJSAMPLE(colormap[0][i]) >> cshift); + put_3bytes(dinfo, colormap[0][i] >> cshift); } } else { /* Create a grayscale map of num_colors values, range 0..255 */ @@ -278,7 +308,7 @@ } } else { /* fill out the map to a power of 2 */ - put_3bytes(dinfo, 0); + put_3bytes(dinfo, CENTERJSAMPLE >> cshift); } } /* Write image separator and Image Descriptor */ @@ -292,7 +322,7 @@ /* Write Initial Code Size byte */ putc(InitCodeSize, dinfo->pub.output_file); - /* Initialize for "compression" of image data */ + /* Initialize for compression of image 
data */ compress_init(dinfo, InitCodeSize + 1); } @@ -318,17 +348,139 @@ * In this module rows_supplied will always be 1. */ + +/* + * The LZW algorithm proper + */ + METHODDEF(void) -put_pixel_rows(j_decompress_ptr cinfo, djpeg_dest_ptr dinfo, - JDIMENSION rows_supplied) +put_LZW_pixel_rows(j_decompress_ptr cinfo, djpeg_dest_ptr dinfo, + JDIMENSION rows_supplied) { gif_dest_ptr dest = (gif_dest_ptr)dinfo; register JSAMPROW ptr; register JDIMENSION col; + code_int c; + register hash_int i; + register hash_int disp; + register hash_entry probe_value; ptr = dest->pub.buffer[0]; for (col = cinfo->output_width; col > 0; col--) { - compress_pixel(dest, GETJSAMPLE(*ptr++)); + /* Accept and compress one 8-bit byte */ + c = (code_int)(*ptr++); + + if (dest->first_byte) { /* need to initialize waiting_code */ + dest->waiting_code = c; + dest->first_byte = FALSE; + continue; + } + + /* Probe hash table to see if a symbol exists for + * waiting_code followed by c. + * If so, replace waiting_code by that symbol and continue. + */ + i = ((hash_int)c << (MAX_LZW_BITS - 8)) + dest->waiting_code; + /* i is less than twice 2**MAX_LZW_BITS, therefore less than twice HSIZE */ + if (i >= HSIZE) + i -= HSIZE; + + probe_value = HASH_ENTRY(dest->waiting_code, c); + + if (dest->hash_code[i] == 0) { + /* hit empty slot; desired symbol not in table */ + output(dest, dest->waiting_code); + if (dest->free_code < LZW_TABLE_SIZE) { + dest->hash_code[i] = dest->free_code++; /* add symbol to hashtable */ + dest->hash_value[i] = probe_value; + } else + clear_block(dest); + dest->waiting_code = c; + continue; + } + if (dest->hash_value[i] == probe_value) { + dest->waiting_code = dest->hash_code[i]; + continue; + } + + if (i == 0) /* secondary hash (after G. 
Knott) */ + disp = 1; + else + disp = HSIZE - i; + for (;;) { + i -= disp; + if (i < 0) + i += HSIZE; + if (dest->hash_code[i] == 0) { + /* hit empty slot; desired symbol not in table */ + output(dest, dest->waiting_code); + if (dest->free_code < LZW_TABLE_SIZE) { + dest->hash_code[i] = dest->free_code++; /* add symbol to hashtable */ + dest->hash_value[i] = probe_value; + } else + clear_block(dest); + dest->waiting_code = c; + break; + } + if (dest->hash_value[i] == probe_value) { + dest->waiting_code = dest->hash_code[i]; + break; + } + } + } +} + + +/* + * The pseudo-compression algorithm. + * + * In this version we simply output each pixel value as a separate symbol; + * thus, no compression occurs. In fact, there is expansion of one bit per + * pixel, because we use a symbol width one bit wider than the pixel width. + * + * GIF ordinarily uses variable-width symbols, and the decoder will expect + * to ratchet up the symbol width after a fixed number of symbols. + * To simplify the logic and keep the expansion penalty down, we emit a + * GIF Clear code to reset the decoder just before the width would ratchet up. + * Thus, all the symbols in the output file will have the same bit width. + * Note that emitting the Clear codes at the right times is a mere matter of + * counting output symbols and is in no way dependent on the LZW algorithm. + * + * With a small basic pixel width (low color count), Clear codes will be + * needed very frequently, causing the file to expand even more. So this + * simplistic approach wouldn't work too well on bilevel images, for example. + * But for output of JPEG conversions the pixel width will usually be 8 bits + * (129 to 256 colors), so the overhead added by Clear symbols is only about + * one symbol in every 256. 
+ */ + +METHODDEF(void) +put_raw_pixel_rows(j_decompress_ptr cinfo, djpeg_dest_ptr dinfo, + JDIMENSION rows_supplied) +{ + gif_dest_ptr dest = (gif_dest_ptr)dinfo; + register JSAMPROW ptr; + register JDIMENSION col; + code_int c; + + ptr = dest->pub.buffer[0]; + for (col = cinfo->output_width; col > 0; col--) { + c = (code_int)(*ptr++); + /* Accept and output one pixel value. + * The given value must be less than n_bits wide. + */ + + /* Output the given pixel value as a symbol. */ + output(dest, c); + /* Issue Clear codes often enough to keep the reader from ratcheting up + * its symbol size. + */ + if (dest->code_counter < dest->maxcode) { + dest->code_counter++; + } else { + output(dest, dest->ClearCode); + dest->code_counter = dest->ClearCode + 2; /* reset the counter */ + } } } @@ -342,7 +494,7 @@ { gif_dest_ptr dest = (gif_dest_ptr)dinfo; - /* Flush "compression" mechanism */ + /* Flush compression mechanism */ compress_term(dest); /* Write a zero-length data block to end the series */ putc(0, dest->pub.output_file); @@ -370,7 +522,7 @@ */ GLOBAL(djpeg_dest_ptr) -jinit_write_gif(j_decompress_ptr cinfo) +jinit_write_gif(j_decompress_ptr cinfo, boolean is_lzw) { gif_dest_ptr dest; @@ -380,7 +532,6 @@ sizeof(gif_dest_struct)); dest->cinfo = cinfo; /* make back link for subroutines */ dest->pub.start_output = start_output_gif; - dest->pub.put_pixel_rows = put_pixel_rows; dest->pub.finish_output = finish_output_gif; dest->pub.calc_buffer_dimensions = calc_buffer_dimensions_gif; @@ -407,6 +558,22 @@ ((j_common_ptr)cinfo, JPOOL_IMAGE, cinfo->output_width, (JDIMENSION)1); dest->pub.buffer_height = 1; + if (is_lzw) { + dest->pub.put_pixel_rows = put_LZW_pixel_rows; + /* Allocate space for hash table */ + dest->hash_code = (code_int *) + (*cinfo->mem->alloc_small) ((j_common_ptr)cinfo, JPOOL_IMAGE, + HSIZE * sizeof(code_int)); + dest->hash_value = (hash_entry *) + (*cinfo->mem->alloc_large) ((j_common_ptr)cinfo, JPOOL_IMAGE, + HSIZE * sizeof(hash_entry)); + } else { + 
dest->pub.put_pixel_rows = put_raw_pixel_rows; + /* Mark tables unused */ + dest->hash_code = NULL; + dest->hash_value = NULL; + } + return (djpeg_dest_ptr)dest; } diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/wrppm.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/wrppm.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/wrppm.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/wrppm.c 2021-11-20 03:41:33.407600306 +0000 @@ -5,7 +5,7 @@ * Copyright (C) 1991-1996, Thomas G. Lane. * Modified 2009 by Guido Vollbeding. * libjpeg-turbo Modifications: - * Copyright (C) 2017, 2019, D. R. Commander. + * Copyright (C) 2017, 2019-2020, D. R. Commander. * For conditions of distribution and use, see the accompanying README.ijg * file. * @@ -108,17 +108,17 @@ ppm_dest_ptr dest = (ppm_dest_ptr)dinfo; register char *bufferptr; register JSAMPROW ptr; -#if BITS_IN_JSAMPLE != 8 || (!defined(HAVE_UNSIGNED_CHAR) && !defined(__CHAR_UNSIGNED__)) +#if BITS_IN_JSAMPLE != 8 register JDIMENSION col; #endif ptr = dest->pub.buffer[0]; bufferptr = dest->iobuffer; -#if BITS_IN_JSAMPLE == 8 && (defined(HAVE_UNSIGNED_CHAR) || defined(__CHAR_UNSIGNED__)) +#if BITS_IN_JSAMPLE == 8 MEMCOPY(bufferptr, ptr, dest->samples_per_row); #else for (col = dest->samples_per_row; col > 0; col--) { - PUTPPMSAMPLE(bufferptr, GETJSAMPLE(*ptr++)); + PUTPPMSAMPLE(bufferptr, *ptr++); } #endif (void)JFWRITE(dest->pub.output_file, dest->iobuffer, dest->buffer_width); @@ -200,10 +200,10 @@ ptr = dest->pub.buffer[0]; bufferptr = dest->iobuffer; for (col = cinfo->output_width; col > 0; col--) { - pixval = GETJSAMPLE(*ptr++); - PUTPPMSAMPLE(bufferptr, GETJSAMPLE(color_map0[pixval])); - PUTPPMSAMPLE(bufferptr, GETJSAMPLE(color_map1[pixval])); - PUTPPMSAMPLE(bufferptr, GETJSAMPLE(color_map2[pixval])); + pixval = *ptr++; + PUTPPMSAMPLE(bufferptr, color_map0[pixval]); + PUTPPMSAMPLE(bufferptr, color_map1[pixval]); + PUTPPMSAMPLE(bufferptr, color_map2[pixval]); } 
(void)JFWRITE(dest->pub.output_file, dest->iobuffer, dest->buffer_width); } @@ -222,7 +222,7 @@ ptr = dest->pub.buffer[0]; bufferptr = dest->iobuffer; for (col = cinfo->output_width; col > 0; col--) { - PUTPPMSAMPLE(bufferptr, GETJSAMPLE(color_map[GETJSAMPLE(*ptr++)])); + PUTPPMSAMPLE(bufferptr, color_map[*ptr++]); } (void)JFWRITE(dest->pub.output_file, dest->iobuffer, dest->buffer_width); } @@ -326,11 +326,12 @@ if (cinfo->quantize_colors || BITS_IN_JSAMPLE != 8 || sizeof(JSAMPLE) != sizeof(char) || - (cinfo->out_color_space != JCS_EXT_RGB #if RGB_RED == 0 && RGB_GREEN == 1 && RGB_BLUE == 2 && RGB_PIXELSIZE == 3 - && cinfo->out_color_space != JCS_RGB + (cinfo->out_color_space != JCS_EXT_RGB && + cinfo->out_color_space != JCS_RGB)) { +#else + cinfo->out_color_space != JCS_EXT_RGB) { #endif - )) { /* When quantizing, we need an output buffer for colormap indexes * that's separate from the physical I/O buffer. We also need a * separate buffer if pixel format translation must take place. diff -Naur a/src/3rdparty/chromium/third_party/libjpeg_turbo/wrtarga.c b/src/3rdparty/chromium/third_party/libjpeg_turbo/wrtarga.c --- a/src/3rdparty/chromium/third_party/libjpeg_turbo/wrtarga.c 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/libjpeg_turbo/wrtarga.c 2021-11-20 03:41:33.407600306 +0000 @@ -4,7 +4,7 @@ * This file was part of the Independent JPEG Group's software: * Copyright (C) 1991-1996, Thomas G. Lane. * libjpeg-turbo Modifications: - * Copyright (C) 2017, D. R. Commander. + * Copyright (C) 2017, 2019, D. R. Commander. * For conditions of distribution and use, see the accompanying README.ijg * file. 
* @@ -102,9 +102,9 @@ inptr = dest->pub.buffer[0]; outptr = dest->iobuffer; for (col = cinfo->output_width; col > 0; col--) { - outptr[0] = (char)GETJSAMPLE(inptr[2]); /* RGB to BGR order */ - outptr[1] = (char)GETJSAMPLE(inptr[1]); - outptr[2] = (char)GETJSAMPLE(inptr[0]); + outptr[0] = inptr[2]; /* RGB to BGR order */ + outptr[1] = inptr[1]; + outptr[2] = inptr[0]; inptr += 3, outptr += 3; } (void)JFWRITE(dest->pub.output_file, dest->iobuffer, dest->buffer_width); @@ -118,13 +118,10 @@ tga_dest_ptr dest = (tga_dest_ptr)dinfo; register JSAMPROW inptr; register char *outptr; - register JDIMENSION col; inptr = dest->pub.buffer[0]; outptr = dest->iobuffer; - for (col = cinfo->output_width; col > 0; col--) { - *outptr++ = (char)GETJSAMPLE(*inptr++); - } + MEMCOPY(outptr, inptr, cinfo->output_width); (void)JFWRITE(dest->pub.output_file, dest->iobuffer, dest->buffer_width); } @@ -147,7 +144,7 @@ inptr = dest->pub.buffer[0]; outptr = dest->iobuffer; for (col = cinfo->output_width; col > 0; col--) { - *outptr++ = (char)GETJSAMPLE(color_map0[GETJSAMPLE(*inptr++)]); + *outptr++ = color_map0[*inptr++]; } (void)JFWRITE(dest->pub.output_file, dest->iobuffer, dest->buffer_width); } @@ -182,9 +179,9 @@ /* Write the colormap. 
 Note Targa uses BGR byte order */
     outfile = dest->pub.output_file;
     for (i = 0; i < num_colors; i++) {
-      putc(GETJSAMPLE(cinfo->colormap[2][i]), outfile);
-      putc(GETJSAMPLE(cinfo->colormap[1][i]), outfile);
-      putc(GETJSAMPLE(cinfo->colormap[0][i]), outfile);
+      putc(cinfo->colormap[2][i], outfile);
+      putc(cinfo->colormap[1][i], outfile);
+      putc(cinfo->colormap[0][i], outfile);
     }
     dest->pub.put_pixel_rows = put_gray_rows;
   } else {
diff -Naur a/src/3rdparty/chromium/third_party/skia/src/pdf/SkPDFSubsetFont.cpp b/src/3rdparty/chromium/third_party/skia/src/pdf/SkPDFSubsetFont.cpp
--- a/src/3rdparty/chromium/third_party/skia/src/pdf/SkPDFSubsetFont.cpp	2021-08-24 12:54:05.000000000 +0100
+++ b/src/3rdparty/chromium/third_party/skia/src/pdf/SkPDFSubsetFont.cpp	2021-11-20 03:42:30.009703768 +0000
@@ -49,6 +49,37 @@
                    blob.release());
 }
 
+template <typename...> using void_t = void;
+template <typename T, typename = void>
+struct SkPDFHarfBuzzSubset {
+    // This is the HarfBuzz 3.0 interface.
+    // hb_subset_flags_t does not exist in 2.0. It isn't dependent on T, so inline the value of
+    // HB_SUBSET_FLAGS_RETAIN_GIDS until 2.0 is no longer supported.
+    static HBFace Make(T input, hb_face_t* face) {
+        // TODO: When possible, check if a font is 'tricky' with FT_IS_TRICKY.
+        // If it isn't known if a font is 'tricky', retain the hints.
+        hb_subset_input_set_flags(input, 2/*HB_SUBSET_FLAGS_RETAIN_GIDS*/);
+        return HBFace(hb_subset_or_fail(face, input));
+    }
+};
+template <typename T>
+struct SkPDFHarfBuzzSubset<T, void_t<
+    decltype(hb_subset_input_set_retain_gids(std::declval<T>(), std::declval<bool>())),
+    decltype(hb_subset_input_set_drop_hints(std::declval<T>(), std::declval<bool>())),
+    decltype(hb_subset(std::declval<hb_face_t*>(), std::declval<T>()))
+    >>
+{
+    // This is the HarfBuzz 2.0 (non-public) interface, used if it exists.
+    // This code should be removed as soon as all users are migrated to the newer API.
+    static HBFace Make(T input, hb_face_t* face) {
+        hb_subset_input_set_retain_gids(input, true);
+        // TODO: When possible, check if a font is 'tricky' with FT_IS_TRICKY.
+ // If it isn't known if a font is 'tricky', retain the hints. + hb_subset_input_set_drop_hints(input, false); + return HBFace(hb_subset(face, input)); + } +}; + static sk_sp subset_harfbuzz(sk_sp fontData, const SkPDFGlyphUse& glyphUsage, int ttcIndex) { @@ -71,11 +102,10 @@ hb_set_t* glyphs = hb_subset_input_glyph_set(input.get()); glyphUsage.getSetValues([&glyphs](unsigned gid) { hb_set_add(glyphs, gid);}); - hb_subset_input_set_retain_gids(input.get(), true); - // TODO: When possible, check if a font is 'tricky' with FT_IS_TRICKY. - // If it isn't known if a font is 'tricky', retain the hints. - hb_subset_input_set_drop_hints(input.get(), false); - HBFace subset(hb_subset(face.get(), input.get())); + HBFace subset = SkPDFHarfBuzzSubset::Make(input.get(), face.get()); + if (!subset) { + return nullptr; + } HBBlob result(hb_face_reference_blob(subset.get())); return to_data(std::move(result)); } diff -Naur a/src/3rdparty/chromium/third_party/webrtc/api/frame_transformer_interface.h b/src/3rdparty/chromium/third_party/webrtc/api/frame_transformer_interface.h --- a/src/3rdparty/chromium/third_party/webrtc/api/frame_transformer_interface.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/webrtc/api/frame_transformer_interface.h 2021-11-20 03:40:27.217660897 +0000 @@ -35,6 +35,16 @@ virtual uint32_t GetTimestamp() const = 0; virtual uint32_t GetSsrc() const = 0; + + enum class Direction { + kUnknown, + kReceiver, + kSender, + }; + // TODO(crbug.com/1250638): Remove this distinction between receiver and + // sender frames to allow received frames to be directly re-transmitted on + // other PeerConnectionss. 
+ virtual Direction GetDirection() const { return Direction::kUnknown; } }; class TransformableVideoFrameInterface : public TransformableFrameInterface { diff -Naur a/src/3rdparty/chromium/third_party/webrtc/api/test/mock_transformable_video_frame.h b/src/3rdparty/chromium/third_party/webrtc/api/test/mock_transformable_video_frame.h --- a/src/3rdparty/chromium/third_party/webrtc/api/test/mock_transformable_video_frame.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/webrtc/api/test/mock_transformable_video_frame.h 2021-11-20 03:40:27.217660897 +0000 @@ -31,6 +31,10 @@ GetMetadata, (), (const, override)); + MOCK_METHOD(webrtc::TransformableFrameInterface::Direction, + GetDirection, + (), + (const, override)); }; } // namespace webrtc diff -Naur a/src/3rdparty/chromium/third_party/webrtc/audio/channel_receive_frame_transformer_delegate.cc b/src/3rdparty/chromium/third_party/webrtc/audio/channel_receive_frame_transformer_delegate.cc --- a/src/3rdparty/chromium/third_party/webrtc/audio/channel_receive_frame_transformer_delegate.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/webrtc/audio/channel_receive_frame_transformer_delegate.cc 2021-11-20 03:40:27.217660897 +0000 @@ -18,15 +18,16 @@ namespace webrtc { namespace { -class TransformableAudioFrame : public TransformableAudioFrameInterface { +class TransformableIncomingAudioFrame + : public TransformableAudioFrameInterface { public: - TransformableAudioFrame(rtc::ArrayView payload, - const RTPHeader& header, - uint32_t ssrc) + TransformableIncomingAudioFrame(rtc::ArrayView payload, + const RTPHeader& header, + uint32_t ssrc) : payload_(payload.data(), payload.size()), header_(header), ssrc_(ssrc) {} - ~TransformableAudioFrame() override = default; + ~TransformableIncomingAudioFrame() override = default; rtc::ArrayView GetData() const override { return payload_; } void SetData(rtc::ArrayView data) override { @@ -36,6 +37,7 @@ uint32_t GetTimestamp() const 
override { return header_.timestamp; } uint32_t GetSsrc() const override { return ssrc_; } const RTPHeader& GetHeader() const override { return header_; } + Direction GetDirection() const override { return Direction::kReceiver; } private: rtc::Buffer payload_; @@ -71,7 +73,7 @@ uint32_t ssrc) { RTC_DCHECK_RUN_ON(&sequence_checker_); frame_transformer_->Transform( - std::make_unique(packet, header, ssrc)); + std::make_unique(packet, header, ssrc)); } void ChannelReceiveFrameTransformerDelegate::OnTransformedFrame( @@ -88,7 +90,10 @@ RTC_DCHECK_RUN_ON(&sequence_checker_); if (!receive_frame_callback_) return; - auto* transformed_frame = static_cast(frame.get()); + RTC_CHECK_EQ(frame->GetDirection(), + TransformableFrameInterface::Direction::kReceiver); + auto* transformed_frame = + static_cast(frame.get()); receive_frame_callback_(transformed_frame->GetData(), transformed_frame->GetHeader()); } diff -Naur a/src/3rdparty/chromium/third_party/webrtc/audio/channel_send_frame_transformer_delegate.cc b/src/3rdparty/chromium/third_party/webrtc/audio/channel_send_frame_transformer_delegate.cc --- a/src/3rdparty/chromium/third_party/webrtc/audio/channel_send_frame_transformer_delegate.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/webrtc/audio/channel_send_frame_transformer_delegate.cc 2021-11-20 03:40:27.218660881 +0000 @@ -15,16 +15,16 @@ namespace webrtc { namespace { -class TransformableAudioFrame : public TransformableFrameInterface { +class TransformableOutgoingAudioFrame : public TransformableFrameInterface { public: - TransformableAudioFrame(AudioFrameType frame_type, - uint8_t payload_type, - uint32_t rtp_timestamp, - uint32_t rtp_start_timestamp, - const uint8_t* payload_data, - size_t payload_size, - int64_t absolute_capture_timestamp_ms, - uint32_t ssrc) + TransformableOutgoingAudioFrame(AudioFrameType frame_type, + uint8_t payload_type, + uint32_t rtp_timestamp, + uint32_t rtp_start_timestamp, + const uint8_t* payload_data, + 
size_t payload_size, + int64_t absolute_capture_timestamp_ms, + uint32_t ssrc) : frame_type_(frame_type), payload_type_(payload_type), rtp_timestamp_(rtp_timestamp), @@ -32,7 +32,7 @@ payload_(payload_data, payload_size), absolute_capture_timestamp_ms_(absolute_capture_timestamp_ms), ssrc_(ssrc) {} - ~TransformableAudioFrame() override = default; + ~TransformableOutgoingAudioFrame() override = default; rtc::ArrayView GetData() const override { return payload_; } void SetData(rtc::ArrayView data) override { payload_.SetData(data.data(), data.size()); @@ -48,6 +48,7 @@ int64_t GetAbsoluteCaptureTimestampMs() const { return absolute_capture_timestamp_ms_; } + Direction GetDirection() const override { return Direction::kSender; } private: AudioFrameType frame_type_; @@ -90,9 +91,10 @@ size_t payload_size, int64_t absolute_capture_timestamp_ms, uint32_t ssrc) { - frame_transformer_->Transform(std::make_unique( - frame_type, payload_type, rtp_timestamp, rtp_start_timestamp, - payload_data, payload_size, absolute_capture_timestamp_ms, ssrc)); + frame_transformer_->Transform( + std::make_unique( + frame_type, payload_type, rtp_timestamp, rtp_start_timestamp, + payload_data, payload_size, absolute_capture_timestamp_ms, ssrc)); } void ChannelSendFrameTransformerDelegate::OnTransformedFrame( @@ -111,9 +113,12 @@ std::unique_ptr frame) const { MutexLock lock(&send_lock_); RTC_DCHECK_RUN_ON(encoder_queue_); + RTC_CHECK_EQ(frame->GetDirection(), + TransformableFrameInterface::Direction::kSender); if (!send_frame_callback_) return; - auto* transformed_frame = static_cast(frame.get()); + auto* transformed_frame = + static_cast(frame.get()); send_frame_callback_(transformed_frame->GetFrameType(), transformed_frame->GetPayloadType(), transformed_frame->GetTimestamp() - diff -Naur a/src/3rdparty/chromium/third_party/webrtc/modules/rtp_rtcp/source/rtp_sender_video_frame_transformer_delegate.cc 
b/src/3rdparty/chromium/third_party/webrtc/modules/rtp_rtcp/source/rtp_sender_video_frame_transformer_delegate.cc --- a/src/3rdparty/chromium/third_party/webrtc/modules/rtp_rtcp/source/rtp_sender_video_frame_transformer_delegate.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/webrtc/modules/rtp_rtcp/source/rtp_sender_video_frame_transformer_delegate.cc 2021-11-20 03:40:27.218660881 +0000 @@ -75,6 +75,8 @@ return expected_retransmission_time_ms_; } + Direction GetDirection() const override { return Direction::kSender; } + private: rtc::scoped_refptr encoded_data_; const RTPVideoHeader header_; @@ -143,6 +145,8 @@ void RTPSenderVideoFrameTransformerDelegate::SendVideo( std::unique_ptr transformed_frame) const { RTC_CHECK(encoder_queue_->IsCurrent()); + RTC_CHECK_EQ(transformed_frame->GetDirection(), + TransformableFrameInterface::Direction::kSender); MutexLock lock(&sender_lock_); if (!sender_) return; diff -Naur a/src/3rdparty/chromium/third_party/webrtc/video/rtp_video_stream_receiver_frame_transformer_delegate.cc b/src/3rdparty/chromium/third_party/webrtc/video/rtp_video_stream_receiver_frame_transformer_delegate.cc --- a/src/3rdparty/chromium/third_party/webrtc/video/rtp_video_stream_receiver_frame_transformer_delegate.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/third_party/webrtc/video/rtp_video_stream_receiver_frame_transformer_delegate.cc 2021-11-20 03:40:27.218660881 +0000 @@ -59,6 +59,8 @@ return std::move(frame_); } + Direction GetDirection() const override { return Direction::kReceiver; } + private: std::unique_ptr frame_; const VideoFrameMetadata metadata_; @@ -111,6 +113,8 @@ void RtpVideoStreamReceiverFrameTransformerDelegate::ManageFrame( std::unique_ptr frame) { RTC_DCHECK_RUN_ON(&network_sequence_checker_); + RTC_CHECK_EQ(frame->GetDirection(), + TransformableFrameInterface::Direction::kReceiver); if (!receiver_) return; auto transformed_frame = absl::WrapUnique( diff -Naur 
a/src/3rdparty/chromium/ui/webui/webui_allowlist.cc b/src/3rdparty/chromium/ui/webui/webui_allowlist.cc --- a/src/3rdparty/chromium/ui/webui/webui_allowlist.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/ui/webui/webui_allowlist.cc 2021-11-20 03:37:46.626204780 +0000 @@ -6,7 +6,11 @@ #include +#include "base/memory/scoped_refptr.h" +#include "base/sequence_checker.h" +#include "base/supports_user_data.h" #include "content/public/browser/browser_context.h" +#include "content/public/browser/browser_thread.h" #include "content/public/common/url_constants.h" #include "ui/webui/webui_allowlist_provider.h" #include "url/gurl.h" @@ -19,15 +23,27 @@ using MapType = std::map; public: - explicit AllowlistRuleIterator(const MapType& map) - : it_(map.cbegin()), end_(map.cend()) {} + // Hold a reference to `allowlist` to keep it alive during iteration. + explicit AllowlistRuleIterator(scoped_refptr allowlist, + const MapType& map, + std::unique_ptr auto_lock) + : auto_lock_(std::move(auto_lock)), + allowlist_(std::move(allowlist)), + it_(map.cbegin()), + end_(map.cend()) {} AllowlistRuleIterator(const AllowlistRuleIterator&) = delete; void operator=(const AllowlistRuleIterator&) = delete; - ~AllowlistRuleIterator() override = default; + ~AllowlistRuleIterator() override { + DCHECK_CALLED_ON_VALID_SEQUENCE(sequence_checker_); + } - bool HasNext() const override { return it_ != end_; } + bool HasNext() const override { + DCHECK_CALLED_ON_VALID_SEQUENCE(sequence_checker_); + return it_ != end_; + } content_settings::Rule Next() override { + DCHECK_CALLED_ON_VALID_SEQUENCE(sequence_checker_); const auto& origin = it_->first; const auto& setting = it_->second; it_++; @@ -38,8 +54,18 @@ } private: - MapType::const_iterator it_; - const MapType::const_iterator end_; + const std::unique_ptr auto_lock_; + const scoped_refptr allowlist_; + + SEQUENCE_CHECKER(sequence_checker_); + MapType::const_iterator it_ GUARDED_BY_CONTEXT(sequence_checker_); + 
MapType::const_iterator end_ GUARDED_BY_CONTEXT(sequence_checker_); +}; + +struct WebUIAllowlistHolder : base::SupportsUserData::Data { + explicit WebUIAllowlistHolder(scoped_refptr list) + : allow_list(std::move(list)) {} + const scoped_refptr allow_list; }; } // namespace @@ -48,11 +74,14 @@ WebUIAllowlist* WebUIAllowlist::GetOrCreate( content::BrowserContext* browser_context) { if (!browser_context->GetUserData(kWebUIAllowlistKeyName)) { - browser_context->SetUserData(kWebUIAllowlistKeyName, - std::make_unique()); + auto list = base::MakeRefCounted(); + browser_context->SetUserData( + kWebUIAllowlistKeyName, + std::make_unique(std::move(list))); } - return static_cast( - browser_context->GetUserData(kWebUIAllowlistKeyName)); + return static_cast( + browser_context->GetUserData(kWebUIAllowlistKeyName)) + ->allow_list.get(); } WebUIAllowlist::WebUIAllowlist() = default; @@ -62,6 +91,9 @@ void WebUIAllowlist::RegisterAutoGrantedPermission(const url::Origin& origin, ContentSettingsType type, ContentSetting setting) { + DCHECK_CURRENTLY_ON(content::BrowserThread::UI); + DCHECK_CALLED_ON_VALID_THREAD(thread_checker_); + // It doesn't make sense to grant a default content setting. DCHECK_NE(CONTENT_SETTING_DEFAULT, setting); @@ -70,13 +102,16 @@ DCHECK(origin.scheme() == content::kChromeUIScheme || origin.scheme() == content::kChromeUIUntrustedScheme || origin.scheme() == content::kChromeDevToolsScheme); + { + base::AutoLock auto_lock(lock_); - // If the same permission is already registered, do nothing. We don't want to - // notify the provider of ContentSettingChange when it is unnecessary. - if (permissions_[type][origin] == setting) - return; + // If the same permission is already registered, do nothing. We don't want + // to notify the provider of ContentSettingChange when it is unnecessary. + if (permissions_[type][origin] == setting) + return; - permissions_[type][origin] = setting; + permissions_[type][origin] = setting; + } // Notify the provider. 
|provider_| can be nullptr if // HostContentSettingsRegistry is shutting down i.e. when Chrome shuts down. @@ -92,25 +127,36 @@ void WebUIAllowlist::RegisterAutoGrantedPermissions( const url::Origin& origin, std::initializer_list types) { + DCHECK_CURRENTLY_ON(content::BrowserThread::UI); + DCHECK_CALLED_ON_VALID_THREAD(thread_checker_); + for (const ContentSettingsType& type : types) RegisterAutoGrantedPermission(origin, type); } void WebUIAllowlist::SetWebUIAllowlistProvider( WebUIAllowlistProvider* provider) { + DCHECK_CURRENTLY_ON(content::BrowserThread::UI); + DCHECK_CALLED_ON_VALID_THREAD(thread_checker_); + provider_ = provider; } void WebUIAllowlist::ResetWebUIAllowlistProvider() { + DCHECK_CURRENTLY_ON(content::BrowserThread::UI); + DCHECK_CALLED_ON_VALID_THREAD(thread_checker_); + provider_ = nullptr; } std::unique_ptr WebUIAllowlist::GetRuleIterator( ContentSettingsType content_type) const { - const auto& type_to_origin_rules = permissions_.find(content_type); - if (type_to_origin_rules != permissions_.cend()) { - return std::make_unique( - type_to_origin_rules->second); + auto auto_lock_ = std::make_unique(lock_); + + auto permissions_it = permissions_.find(content_type); + if (permissions_it != permissions_.end()) { + return std::make_unique(this, permissions_it->second, + std::move(auto_lock_)); } return nullptr; diff -Naur a/src/3rdparty/chromium/ui/webui/webui_allowlist.h b/src/3rdparty/chromium/ui/webui/webui_allowlist.h --- a/src/3rdparty/chromium/ui/webui/webui_allowlist.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/ui/webui/webui_allowlist.h 2021-11-20 03:37:46.627204764 +0000 @@ -8,7 +8,9 @@ #include #include -#include "base/supports_user_data.h" +#include "base/memory/ref_counted.h" +#include "base/thread_annotations.h" +#include "base/threading/thread_checker.h" #include "components/content_settings/core/browser/content_settings_rule.h" #include "components/content_settings/core/common/content_settings.h" #include 
"components/content_settings/core/common/content_settings_types.h" @@ -23,14 +25,13 @@ // list of origins and permissions to be auto-granted to WebUIs. This class is // created before HostContentSettingsMap is registered and has the same lifetime // as the profile it's attached to. It outlives WebUIAllowlistProvider. -class WebUIAllowlist : public base::SupportsUserData::Data { +class WebUIAllowlist : public base::RefCountedThreadSafe { public: static WebUIAllowlist* GetOrCreate(content::BrowserContext* browser_context); WebUIAllowlist(); WebUIAllowlist(const WebUIAllowlist&) = delete; void operator=(const WebUIAllowlist&) = delete; - ~WebUIAllowlist() override; // Register auto-granted |type| permission for |origin|. // @@ -53,16 +54,29 @@ const url::Origin& origin, std::initializer_list types); + // Returns a content_settings::RuleIterator, this method is thread-safe. + // + // This method acquires `lock_` and transfers it to the returned iterator. + // NO_THREAD_SAFETY_ANALYSIS because the analyzer doesn't recognize acquiring + // the lock in a unique_ptr. 
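Editor's note (not part of the patch): the GetRuleIterator() change above acquires `lock_` and hands ownership of the `base::AutoLock` to the returned iterator, so the permissions map stays locked for exactly as long as the iterator is alive. A minimal stand-alone sketch of that idiom using `std::unique_lock`; all names here are illustrative stand-ins, not the Chromium types:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// The iterator owns the lock for its whole lifetime; destroying the iterator
// releases it, just as the patched RuleIterator releases its base::AutoLock.
class RuleIterator {
 public:
  RuleIterator(const std::map<std::string, int>& rules,
               std::unique_lock<std::mutex> lock)
      : it_(rules.begin()), end_(rules.end()), lock_(std::move(lock)) {}

  bool HasNext() const { return it_ != end_; }
  std::pair<std::string, int> Next() { return *it_++; }

 private:
  std::map<std::string, int>::const_iterator it_;
  std::map<std::string, int>::const_iterator end_;
  std::unique_lock<std::mutex> lock_;  // unlocked by ~RuleIterator()
};

class Allowlist {
 public:
  void Register(const std::string& origin, int setting) {
    std::lock_guard<std::mutex> guard(lock_);
    rules_[origin] = setting;
  }

  // Acquires lock_ and transfers ownership into the returned iterator,
  // mirroring the lock transfer in the hunk above.
  std::unique_ptr<RuleIterator> GetRuleIterator() const {
    std::unique_lock<std::mutex> lock(lock_);
    return std::make_unique<RuleIterator>(rules_, std::move(lock));
  }

 private:
  mutable std::mutex lock_;
  std::map<std::string, int> rules_;
};
```

This is also why the real method is annotated NO_THREAD_SAFETY_ANALYSIS: the analyzer cannot see a lock whose release happens in another object's destructor.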
std::unique_ptr GetRuleIterator( - ContentSettingsType content_type) const; + ContentSettingsType content_type) const NO_THREAD_SAFETY_ANALYSIS; void SetWebUIAllowlistProvider(WebUIAllowlistProvider* provider); void ResetWebUIAllowlistProvider(); private: + friend class base::RefCountedThreadSafe; + ~WebUIAllowlist(); + + THREAD_CHECKER(thread_checker_); + + mutable base::Lock lock_; std::map> - permissions_; - WebUIAllowlistProvider* provider_ = nullptr; + permissions_ GUARDED_BY(lock_); + + WebUIAllowlistProvider* provider_ GUARDED_BY_CONTEXT(thread_checker_) = + nullptr; }; #endif // UI_WEBUI_WEBUI_ALLOWLIST_H_ diff -Naur a/src/3rdparty/chromium/ui/webui/webui_allowlist_provider.cc b/src/3rdparty/chromium/ui/webui/webui_allowlist_provider.cc --- a/src/3rdparty/chromium/ui/webui/webui_allowlist_provider.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/ui/webui/webui_allowlist_provider.cc 2021-11-20 03:37:46.627204764 +0000 @@ -7,8 +7,9 @@ #include "components/content_settings/core/common/content_settings_pattern.h" #include "ui/webui/webui_allowlist.h" -WebUIAllowlistProvider::WebUIAllowlistProvider(WebUIAllowlist* allowlist) - : allowlist_(allowlist) { +WebUIAllowlistProvider::WebUIAllowlistProvider( + scoped_refptr allowlist) + : allowlist_(std::move(allowlist)) { DCHECK(allowlist_); allowlist_->SetWebUIAllowlistProvider(this); } @@ -20,8 +21,6 @@ ContentSettingsType content_type, const content_settings::ResourceIdentifier& /*resource_identifier*/, bool incognito) const { - if (!allowlist_) - return nullptr; return allowlist_->GetRuleIterator(content_type); } @@ -51,7 +50,7 @@ } void WebUIAllowlistProvider::ShutdownOnUIThread() { + DCHECK(CalledOnValidThread()); RemoveAllObservers(); allowlist_->ResetWebUIAllowlistProvider(); - allowlist_ = nullptr; } diff -Naur a/src/3rdparty/chromium/ui/webui/webui_allowlist_provider.h b/src/3rdparty/chromium/ui/webui/webui_allowlist_provider.h --- a/src/3rdparty/chromium/ui/webui/webui_allowlist_provider.h 
2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/ui/webui/webui_allowlist_provider.h 2021-11-20 03:37:46.627204764 +0000 @@ -5,6 +5,8 @@ #ifndef UI_WEBUI_WEBUI_ALLOWLIST_PROVIDER_H_ #define UI_WEBUI_WEBUI_ALLOWLIST_PROVIDER_H_ +#include "base/synchronization/lock.h" +#include "base/thread_annotations.h" #include "components/content_settings/core/browser/content_settings_observable_provider.h" #include "components/content_settings/core/common/content_settings.h" #include "ui/webui/webui_allowlist.h" @@ -15,8 +17,7 @@ // permissions from the underlying WebUIAllowlist. class WebUIAllowlistProvider : public content_settings::ObservableProvider { public: - // Note, |allowlist| must outlive this instance. - explicit WebUIAllowlistProvider(WebUIAllowlist* allowlist); + explicit WebUIAllowlistProvider(scoped_refptr allowlist); WebUIAllowlistProvider(const WebUIAllowlistProvider&) = delete; void operator=(const WebUIAllowlistProvider&) = delete; ~WebUIAllowlistProvider() override; @@ -27,6 +28,7 @@ ContentSettingsType content_type); // content_settings::ObservableProvider: + // The following methods are thread-safe. std::unique_ptr GetRuleIterator( ContentSettingsType content_type, const content_settings::ResourceIdentifier& /*resource_identifier*/, @@ -42,7 +44,7 @@ void ClearAllContentSettingsRules(ContentSettingsType content_type) override; private: - WebUIAllowlist* allowlist_; + const scoped_refptr allowlist_; }; #endif // UI_WEBUI_WEBUI_ALLOWLIST_PROVIDER_H_ diff -Naur a/src/3rdparty/chromium/v8/include/v8-version.h b/src/3rdparty/chromium/v8/include/v8-version.h --- a/src/3rdparty/chromium/v8/include/v8-version.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/v8/include/v8-version.h 2021-11-20 03:42:17.697898746 +0000 @@ -11,7 +11,7 @@ #define V8_MAJOR_VERSION 8 #define V8_MINOR_VERSION 7 #define V8_BUILD_NUMBER 220 -#define V8_PATCH_LEVEL 33 +#define V8_PATCH_LEVEL 34 // Use 1 for candidates and 0 otherwise. 
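Editor's note (not part of the patch): the WebUIAllowlistHolder hunk near the start of this diff exists because BrowserContext user data must derive from base::SupportsUserData::Data, so the now ref-counted allowlist is wrapped in a small holder rather than stored directly. A sketch of the same pattern with `std::shared_ptr` standing in for `scoped_refptr`; all names are hypothetical:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct Data {  // stands in for base::SupportsUserData::Data
  virtual ~Data() = default;
};

struct Allowlist {  // the ref-counted payload
  int granted = 0;
};

// The holder is what actually lives in user data; it pins a reference to the
// payload, much like WebUIAllowlistHolder pins its scoped_refptr.
struct AllowlistHolder : Data {
  explicit AllowlistHolder(std::shared_ptr<Allowlist> list)
      : allow_list(std::move(list)) {}
  const std::shared_ptr<Allowlist> allow_list;
};

// Stand-in for BrowserContext's keyed Get/SetUserData storage.
std::map<std::string, std::unique_ptr<Data>> user_data;

Allowlist* GetOrCreate() {
  if (user_data.find("allowlist") == user_data.end()) {
    auto list = std::make_shared<Allowlist>();
    user_data["allowlist"] = std::make_unique<AllowlistHolder>(std::move(list));
  }
  return static_cast<AllowlistHolder*>(user_data["allowlist"].get())
      ->allow_list.get();
}
```

Repeated calls return the same payload, so other holders of the reference keep it alive even after the user data is destroyed.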
// (Boolean macro values are not supported by all preprocessors.) diff -Naur a/src/3rdparty/chromium/v8/src/heap/concurrent-marking.cc b/src/3rdparty/chromium/v8/src/heap/concurrent-marking.cc --- a/src/3rdparty/chromium/v8/src/heap/concurrent-marking.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/v8/src/heap/concurrent-marking.cc 2021-11-20 03:41:00.674124691 +0000 @@ -403,7 +403,7 @@ isolate->PrintWithTimestamp("Starting concurrent marking task %d\n", task_id); } - bool ephemeron_marked = false; + bool another_ephemeron_iteration = false; { TimedScope scope(&time_ms); @@ -413,7 +413,7 @@ while (weak_objects_->current_ephemerons.Pop(task_id, &ephemeron)) { if (visitor.ProcessEphemeron(ephemeron.key, ephemeron.value)) { - ephemeron_marked = true; + another_ephemeron_iteration = true; } } } @@ -454,6 +454,7 @@ current_marked_bytes += visited_size; } } + if (objects_processed > 0) another_ephemeron_iteration = true; marked_bytes += current_marked_bytes; base::AsAtomicWord::Relaxed_Store(&task_state->marked_bytes, marked_bytes); @@ -469,7 +470,7 @@ while (weak_objects_->discovered_ephemerons.Pop(task_id, &ephemeron)) { if (visitor.ProcessEphemeron(ephemeron.key, ephemeron.value)) { - ephemeron_marked = true; + another_ephemeron_iteration = true; } } } @@ -489,8 +490,8 @@ base::AsAtomicWord::Relaxed_Store(&task_state->marked_bytes, 0); total_marked_bytes_ += marked_bytes; - if (ephemeron_marked) { - set_ephemeron_marked(true); + if (another_ephemeron_iteration) { + set_another_ephemeron_iteration(true); } { diff -Naur a/src/3rdparty/chromium/v8/src/heap/concurrent-marking.h b/src/3rdparty/chromium/v8/src/heap/concurrent-marking.h --- a/src/3rdparty/chromium/v8/src/heap/concurrent-marking.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/v8/src/heap/concurrent-marking.h 2021-11-20 03:41:00.674124691 +0000 @@ -96,10 +96,12 @@ size_t TotalMarkedBytes(); - void set_ephemeron_marked(bool ephemeron_marked) { - 
ephemeron_marked_.store(ephemeron_marked); + void set_another_ephemeron_iteration(bool another_ephemeron_iteration) { + another_ephemeron_iteration_.store(another_ephemeron_iteration); + } + bool another_ephemeron_iteration() { + return another_ephemeron_iteration_.load(); } - bool ephemeron_marked() { return ephemeron_marked_.load(); } private: struct TaskState { @@ -121,7 +123,7 @@ WeakObjects* const weak_objects_; TaskState task_state_[kMaxTasks + 1]; std::atomic total_marked_bytes_{0}; - std::atomic ephemeron_marked_{false}; + std::atomic another_ephemeron_iteration_{false}; base::Mutex pending_lock_; base::ConditionVariable pending_condition_; int pending_task_count_ = 0; diff -Naur a/src/3rdparty/chromium/v8/src/heap/incremental-marking.cc b/src/3rdparty/chromium/v8/src/heap/incremental-marking.cc --- a/src/3rdparty/chromium/v8/src/heap/incremental-marking.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/v8/src/heap/incremental-marking.cc 2021-11-20 03:41:00.674124691 +0000 @@ -1074,7 +1074,8 @@ // This ignores that case where the embedder finds new V8-side objects. The // assumption is that large graphs are well connected and can mostly be // processed on their own. For small graphs, helping is not necessary. - v8_bytes_processed = collector_->ProcessMarkingWorklist(bytes_to_process); + std::tie(v8_bytes_processed, std::ignore) = + collector_->ProcessMarkingWorklist(bytes_to_process); StepResult v8_result = local_marking_worklists()->IsEmpty() ? 
StepResult::kNoImmediateWork : StepResult::kMoreWorkRemaining; diff -Naur a/src/3rdparty/chromium/v8/src/heap/mark-compact.cc b/src/3rdparty/chromium/v8/src/heap/mark-compact.cc --- a/src/3rdparty/chromium/v8/src/heap/mark-compact.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/v8/src/heap/mark-compact.cc 2021-11-20 03:41:00.675124675 +0000 @@ -1641,24 +1641,24 @@ descriptors, number_of_own_descriptors); } -void MarkCompactCollector::ProcessEphemeronsUntilFixpoint() { - bool work_to_do = true; +bool MarkCompactCollector::ProcessEphemeronsUntilFixpoint() { int iterations = 0; int max_iterations = FLAG_ephemeron_fixpoint_iterations; - while (work_to_do) { + bool another_ephemeron_iteration_main_thread; + + do { PerformWrapperTracing(); if (iterations >= max_iterations) { // Give up fixpoint iteration and switch to linear algorithm. - ProcessEphemeronsLinear(); - break; + return false; } // Move ephemerons from next_ephemerons into current_ephemerons to // drain them in this iteration. 
weak_objects_.current_ephemerons.Swap(weak_objects_.next_ephemerons); - heap()->concurrent_marking()->set_ephemeron_marked(false); + heap()->concurrent_marking()->set_another_ephemeron_iteration(false); { TRACE_GC(heap()->tracer(), @@ -1668,7 +1668,7 @@ heap_->concurrent_marking()->RescheduleTasksIfNeeded(); } - work_to_do = ProcessEphemerons(); + another_ephemeron_iteration_main_thread = ProcessEphemerons(); FinishConcurrentMarking( ConcurrentMarking::StopRequest::COMPLETE_ONGOING_TASKS); } @@ -1676,40 +1676,47 @@ CHECK(weak_objects_.current_ephemerons.IsEmpty()); CHECK(weak_objects_.discovered_ephemerons.IsEmpty()); - work_to_do = work_to_do || !local_marking_worklists()->IsEmpty() || - heap()->concurrent_marking()->ephemeron_marked() || - !local_marking_worklists()->IsEmbedderEmpty() || - !heap()->local_embedder_heap_tracer()->IsRemoteTracingDone(); ++iterations; - } + } while (another_ephemeron_iteration_main_thread || + heap()->concurrent_marking()->another_ephemeron_iteration() || + !local_marking_worklists()->IsEmpty() || + !local_marking_worklists()->IsEmbedderEmpty() || + !heap()->local_embedder_heap_tracer()->IsRemoteTracingDone()); CHECK(local_marking_worklists()->IsEmpty()); CHECK(weak_objects_.current_ephemerons.IsEmpty()); CHECK(weak_objects_.discovered_ephemerons.IsEmpty()); + return true; } bool MarkCompactCollector::ProcessEphemerons() { Ephemeron ephemeron; - bool ephemeron_marked = false; + bool another_ephemeron_iteration = false; // Drain current_ephemerons and push ephemerons where key and value are still // unreachable into next_ephemerons. while (weak_objects_.current_ephemerons.Pop(kMainThreadTask, &ephemeron)) { if (ProcessEphemeron(ephemeron.key, ephemeron.value)) { - ephemeron_marked = true; + another_ephemeron_iteration = true; } } // Drain marking worklist and push discovered ephemerons into // discovered_ephemerons. 
- DrainMarkingWorklist(); + size_t objects_processed; + std::tie(std::ignore, objects_processed) = ProcessMarkingWorklist(0); + + // As soon as a single object was processed and potentially marked another + object we need another iteration. Otherwise we might fail to apply + ephemeron semantics to it. + if (objects_processed > 0) another_ephemeron_iteration = true; // Drain discovered_ephemerons (filled in the drain MarkingWorklist-phase // before) and push ephemerons where key and value are still unreachable into // next_ephemerons. while (weak_objects_.discovered_ephemerons.Pop(kMainThreadTask, &ephemeron)) { if (ProcessEphemeron(ephemeron.key, ephemeron.value)) { - ephemeron_marked = true; + another_ephemeron_iteration = true; } } @@ -1717,7 +1724,7 @@ weak_objects_.ephemeron_hash_tables.FlushToGlobal(kMainThreadTask); weak_objects_.next_ephemerons.FlushToGlobal(kMainThreadTask); - return ephemeron_marked; + return another_ephemeron_iteration; } void MarkCompactCollector::ProcessEphemeronsLinear() { @@ -1803,6 +1810,12 @@ ephemeron_marking_.newly_discovered.shrink_to_fit(); CHECK(local_marking_worklists()->IsEmpty()); + CHECK(weak_objects_.current_ephemerons.IsEmpty()); + CHECK(weak_objects_.discovered_ephemerons.IsEmpty()); + + // Flush local ephemerons for main task to global pool.
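Editor's note (not part of the patch): the rename from ephemeron_marked to another_ephemeron_iteration reflects the fixpoint nature of ephemeron marking. An ephemeron's value is marked only once its key is reachable, and marking one value can make further ephemerons eligible, so passes repeat until a pass makes no progress. A toy model of that loop; the types are illustrative, not V8's:

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

using Ephemeron = std::pair<int, int>;  // {key object, value object}

std::set<int> MarkWithEphemerons(std::set<int> marked,
                                 const std::vector<Ephemeron>& ephemerons) {
  bool another_iteration = true;  // mirrors another_ephemeron_iteration
  while (another_iteration) {
    another_iteration = false;
    for (const Ephemeron& e : ephemerons) {
      // Key reachable but value not yet marked: mark the value and iterate
      // again, since it may itself be the key of another ephemeron.
      if (marked.count(e.first) && !marked.count(e.second)) {
        marked.insert(e.second);
        another_iteration = true;
      }
    }
  }
  return marked;
}
```

The bug class the patch addresses is stopping this loop too early: if any object was processed in a pass, another pass is needed, or ephemeron semantics may be missed.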
+ weak_objects_.ephemeron_hash_tables.FlushToGlobal(kMainThreadTask); + weak_objects_.next_ephemerons.FlushToGlobal(kMainThreadTask); } void MarkCompactCollector::PerformWrapperTracing() { @@ -1824,9 +1837,11 @@ void MarkCompactCollector::DrainMarkingWorklist() { ProcessMarkingWorklist(0); } template -size_t MarkCompactCollector::ProcessMarkingWorklist(size_t bytes_to_process) { +std::pair MarkCompactCollector::ProcessMarkingWorklist( + size_t bytes_to_process) { HeapObject object; size_t bytes_processed = 0; + size_t objects_processed = 0; bool is_per_context_mode = local_marking_worklists()->IsPerContextMode(); Isolate* isolate = heap()->isolate(); while (local_marking_worklists()->Pop(&object) || @@ -1866,18 +1881,19 @@ map, object, visited_size); } bytes_processed += visited_size; + objects_processed++; if (bytes_to_process && bytes_processed >= bytes_to_process) { break; } } - return bytes_processed; + return std::make_pair(bytes_processed, objects_processed); } // Generate definitions for use in other files. -template size_t MarkCompactCollector::ProcessMarkingWorklist< +template std::pair MarkCompactCollector::ProcessMarkingWorklist< MarkCompactCollector::MarkingWorklistProcessingMode::kDefault>( size_t bytes_to_process); -template size_t MarkCompactCollector::ProcessMarkingWorklist< +template std::pair MarkCompactCollector::ProcessMarkingWorklist< MarkCompactCollector::MarkingWorklistProcessingMode:: kTrackNewlyDiscoveredObjects>(size_t bytes_to_process); @@ -1902,7 +1918,23 @@ // buffer, flush it into global pool. weak_objects_.next_ephemerons.FlushToGlobal(kMainThreadTask); - ProcessEphemeronsUntilFixpoint(); + if (!ProcessEphemeronsUntilFixpoint()) { + // Fixpoint iteration needed too many iterations and was cancelled. Use the + // guaranteed linear algorithm. 
+ ProcessEphemeronsLinear(); + } + +#ifdef VERIFY_HEAP + if (FLAG_verify_heap) { + Ephemeron ephemeron; + + weak_objects_.current_ephemerons.Swap(weak_objects_.next_ephemerons); + + while (weak_objects_.current_ephemerons.Pop(kMainThreadTask, &ephemeron)) { + CHECK(!ProcessEphemeron(ephemeron.key, ephemeron.value)); + } + } +#endif CHECK(local_marking_worklists()->IsEmpty()); CHECK(heap()->local_embedder_heap_tracer()->IsRemoteTracingDone()); diff -Naur a/src/3rdparty/chromium/v8/src/heap/mark-compact.h b/src/3rdparty/chromium/v8/src/heap/mark-compact.h --- a/src/3rdparty/chromium/v8/src/heap/mark-compact.h 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/v8/src/heap/mark-compact.h 2021-11-20 03:41:00.675124675 +0000 @@ -590,7 +590,7 @@ // is drained until it is empty. template - size_t ProcessMarkingWorklist(size_t bytes_to_process); + std::pair ProcessMarkingWorklist(size_t bytes_to_process); private: void ComputeEvacuationHeuristics(size_t area_size, @@ -636,8 +636,9 @@ bool ProcessEphemeron(HeapObject key, HeapObject value); // Marks ephemerons and drains marking worklist iteratively - // until a fixpoint is reached. - void ProcessEphemeronsUntilFixpoint(); + // until a fixpoint is reached. Returns false if too many iterations have been + // tried and the linear approach should be used. + bool ProcessEphemeronsUntilFixpoint(); // Drains ephemeron and marking worklists. Single iteration of the // fixpoint iteration. 
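Editor's note (not part of the patch): ProcessMarkingWorklist now reports both bytes and objects processed, and callers pick the half they need via std::tie with std::ignore. A stand-alone model of that return-type change, simplified from the hunks above; as in the patched code, a byte budget of 0 means "drain everything":

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

// Returns {bytes_processed, objects_processed}, like the widened
// ProcessMarkingWorklist signature in the patch.
std::pair<size_t, size_t> ProcessWorklist(const std::vector<size_t>& sizes,
                                          size_t bytes_to_process) {
  size_t bytes_processed = 0, objects_processed = 0;
  for (size_t visited_size : sizes) {
    bytes_processed += visited_size;
    objects_processed++;
    // Stop once the budget is exhausted; a budget of 0 disables the check.
    if (bytes_to_process && bytes_processed >= bytes_to_process)
      break;
  }
  return {bytes_processed, objects_processed};
}
```

incremental-marking.cc only wants the byte count, while ProcessEphemerons only wants to know whether any object was processed; both read the one half they need and discard the other with std::ignore.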
diff -Naur a/src/3rdparty/chromium/v8/src/objects/js-function.cc b/src/3rdparty/chromium/v8/src/objects/js-function.cc --- a/src/3rdparty/chromium/v8/src/objects/js-function.cc 2021-08-24 12:54:05.000000000 +0100 +++ b/src/3rdparty/chromium/v8/src/objects/js-function.cc 2021-11-20 03:35:29.683326507 +0000 @@ -291,6 +291,14 @@ void JSFunction::InitializeFeedbackCell(Handle function, IsCompiledScope* is_compiled_scope) { Isolate* const isolate = function->GetIsolate(); +#if V8_ENABLE_WEBASSEMBLY + // The following checks ensure that the feedback vectors are compatible with + // the feedback metadata. For Asm / Wasm functions we never allocate / use + // feedback vectors, so a mismatch between the metadata and feedback vector is + // harmless. The checks could fail for functions that have has_asm_wasm_broken + // set at runtime (for example, failed instantiation). + if (function->shared().HasAsmWasmData()) return; +#endif // V8_ENABLE_WEBASSEMBLY if (function->has_feedback_vector()) { CHECK_EQ(function->feedback_vector().length(), diff -Naur a/src/buildtools/config/windows.pri b/src/buildtools/config/windows.pri --- a/src/buildtools/config/windows.pri 2021-08-24 13:35:32.000000000 +0100 +++ b/src/buildtools/config/windows.pri 2021-11-20 03:27:42.386569236 +0000 @@ -71,7 +71,7 @@ msvc { equals(MSVC_VER, 15.0) { MSVS_VERSION = 2017 - } else: equals(MSVC_VER, 16.0) { + } else: versionAtLeast(MSVC_VER, 16.0) { MSVS_VERSION = 2019 } else { error("Visual Studio compiler version \"$$MSVC_VER\" is not supported by gn.") diff -Naur a/src/core/media_capture_devices_dispatcher.cpp b/src/core/media_capture_devices_dispatcher.cpp --- a/src/core/media_capture_devices_dispatcher.cpp 2021-08-24 13:35:32.000000000 +0100 +++ b/src/core/media_capture_devices_dispatcher.cpp 2021-11-20 03:32:41.112942241 +0000 @@ -70,6 +70,7 @@ #endif #include +#include #if defined(WEBRTC_USE_X11) #include @@ -197,6 +198,7 @@ int numMonitors = 0; XRRMonitorInfo *monitors = getMonitors(display, rootWindow, true,
&numMonitors); + auto cleanup = qScopeGuard([&] () { freeMonitors(monitors); }); if (numMonitors > 0) return content::DesktopMediaID(content::DesktopMediaID::TYPE_SCREEN, monitors[0].name); #endif // !defined(WEBRTC_USE_X11) diff -Naur a/src/core/net/proxying_url_loader_factory_qt.cpp b/src/core/net/proxying_url_loader_factory_qt.cpp --- a/src/core/net/proxying_url_loader_factory_qt.cpp 2021-08-24 13:35:32.000000000 +0100 +++ b/src/core/net/proxying_url_loader_factory_qt.cpp 2021-11-20 03:32:13.929364759 +0000 @@ -47,8 +47,11 @@ #include "content/public/browser/browser_task_traits.h" #include "content/public/browser/browser_thread.h" #include "content/public/browser/web_contents.h" +#include "content/public/common/content_switches.h" #include "net/http/http_status_code.h" +#include "services/network/public/cpp/cors/cors.h" #include "third_party/blink/public/mojom/loader/resource_load_info.mojom-shared.h" +#include "url/url_util.h" #include "api/qwebengineurlrequestinfo_p.h" #include "type_conversion.h" @@ -162,6 +165,7 @@ const uint64_t request_id_; const int32_t routing_id_; const uint32_t options_; + bool allowed_cors_ = true; // If the |target_loader_| called OnComplete with an error this stores it. // That way the destructor can send it to OnReceivedError if safe browsing @@ -204,12 +208,37 @@ , target_factory_(std::move(target_factory)) , weak_factory_(this) { + const bool disable_web_security = base::CommandLine::ForCurrentProcess()->HasSwitch(switches::kDisableWebSecurity); current_response_ = network::mojom::URLResponseHead::New(); + current_response_->response_type = network::cors::CalculateResponseType( + request_.mode, + disable_web_security || ( + request_.request_initiator && request_.request_initiator->IsSameOriginWith(url::Origin::Create(request_.url)))); // If there is a client error, clean up the request. 
target_client_.set_disconnect_handler( - base::BindOnce(&InterceptedRequest::OnURLLoaderClientError, weak_factory_.GetWeakPtr())); + base::BindOnce(&InterceptedRequest::OnURLLoaderClientError, base::Unretained(this))); proxied_loader_receiver_.set_disconnect_with_reason_handler( - base::BindOnce(&InterceptedRequest::OnURLLoaderError, weak_factory_.GetWeakPtr())); + base::BindOnce(&InterceptedRequest::OnURLLoaderError, base::Unretained(this))); + if (!disable_web_security && request_.request_initiator) { + const std::vector &localSchemes = url::GetLocalSchemes(); + std::string fromScheme = request_.request_initiator->GetTupleOrPrecursorTupleIfOpaque().scheme(); + if (base::Contains(localSchemes, fromScheme)) { + content::WebContents *wc = webContents(); + std::string toScheme = request_.url.scheme(); + // local schemes must have universal access, or be accessing something local and have local access. + if (fromScheme != toScheme) { + // note allow_file_access_from_file_urls maps to LocalContentCanAccessFileUrls in our API + // and allow_universal_access_from_file_urls to LocalContentCanAccessRemoteUrls, so we are + // using them as proxies for our API here. 
+ if (toScheme == "file") + allowed_cors_ = wc && wc->GetOrCreateWebPreferences().allow_file_access_from_file_urls; + else if (!base::Contains(localSchemes, toScheme)) + allowed_cors_ = wc && wc->GetOrCreateWebPreferences().allow_universal_access_from_file_urls; + else + allowed_cors_ = true; // We should think about this for future patches + } + } + } } InterceptedRequest::~InterceptedRequest() @@ -246,6 +275,14 @@ { DCHECK_CURRENTLY_ON(content::BrowserThread::UI); + // This is a CORS check on the from URL, the normal check on the to URL is applied later + if (!allowed_cors_ && current_response_->response_type == network::mojom::FetchResponseType::kCors) { + target_client_->OnComplete(network::URLLoaderCompletionStatus( + network::CorsErrorStatus(network::mojom::CorsError::kCorsDisabledScheme))); + delete this; + return; + } + // MEMO since all codepatch leading to Restart scheduled and executed as asynchronous tasks in main thread, // interceptors may change in meantime and also during intercept call, so they should be resolved anew. // Set here only profile's interceptor since it runs first without going to user code. 
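Editor's note (not part of the patch): the allowed_cors_ logic added above gates cross-scheme requests originating from local schemes on the allow_file_access_from_file_urls / allow_universal_access_from_file_urls preferences. A distilled, self-contained model of that decision; the scheme list and parameter names are simplified stand-ins for the Qt/Chromium ones in the hunk:

```cpp
#include <cassert>
#include <set>
#include <string>

bool AllowedCors(const std::string& from_scheme, const std::string& to_scheme,
                 bool local_can_access_file_urls,
                 bool local_can_access_remote_urls) {
  static const std::set<std::string> local_schemes = {"file", "qrc"};
  // Only cross-scheme requests from a local scheme are restricted.
  if (!local_schemes.count(from_scheme) || from_scheme == to_scheme)
    return true;
  if (to_scheme == "file")
    return local_can_access_file_urls;    // local -> file
  if (!local_schemes.count(to_scheme))
    return local_can_access_remote_urls;  // local -> remote (e.g. https)
  return true;                            // local -> another local scheme
}
```

Requests failing this check are completed with a kCorsDisabledScheme error before reaching the network, matching the OnComplete path added in the Restart hunk.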
diff -Naur a/src/core/ozone/ozone_platform_qt.cpp b/src/core/ozone/ozone_platform_qt.cpp --- a/src/core/ozone/ozone_platform_qt.cpp 2021-08-24 13:35:32.000000000 +0100 +++ b/src/core/ozone/ozone_platform_qt.cpp 2021-11-20 03:32:51.938773974 +0000 @@ -164,7 +164,17 @@ if (XkbGetState(dpy, XkbUseCoreKbd, &state) != 0) return std::string(); - XkbRF_VarDefsRec vdr; + XkbRF_VarDefsRec vdr {}; // zero initialize it + struct Cleanup { + XkbRF_VarDefsRec &vdr; + Cleanup(XkbRF_VarDefsRec &vdr) : vdr(vdr) { } + ~Cleanup() { + free (vdr.model); + free (vdr.layout); + free (vdr.variant); + free (vdr.options); + } + } cleanup(vdr); if (XkbRF_GetNamesProp(dpy, nullptr, &vdr) == 0) return std::string(); diff -Naur a/src/core/render_widget_host_view_qt.cpp b/src/core/render_widget_host_view_qt.cpp --- a/src/core/render_widget_host_view_qt.cpp 2021-08-24 13:35:32.000000000 +0100 +++ b/src/core/render_widget_host_view_qt.cpp 2021-11-20 03:33:03.760590226 +0000 @@ -1662,7 +1662,8 @@ { const Qt::NativeGestureType type = ev->gestureType(); // These are the only supported gestures by Chromium so far. 
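Editor's note (not part of the patch): the ozone_platform_qt.cpp hunk above plugs a leak by zero-initializing XkbRF_VarDefsRec and freeing its strings from a local RAII struct, so every early return releases them. The same idiom with a hypothetical C-style record; names here are not the Xkb API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>

struct VarDefs {  // stands in for XkbRF_VarDefsRec
  char* model;
  char* layout;
};

// Heap-allocated copy, as the Xkb library does for the record's strings.
static char* Dup(const char* s) {
  char* p = static_cast<char*>(std::malloc(std::strlen(s) + 1));
  std::strcpy(p, s);
  return p;
}

static bool FillVarDefs(VarDefs* v) {  // stands in for XkbRF_GetNamesProp
  v->model = Dup("pc105");
  v->layout = Dup("us");
  return true;
}

const char* QueryLayout(char* out, size_t out_size) {
  VarDefs vdr{};  // zero-initialize so free(nullptr) is safe on early return
  struct Cleanup {
    VarDefs& vdr;
    ~Cleanup() { std::free(vdr.model); std::free(vdr.layout); }
  } cleanup{vdr};
  if (!FillVarDefs(&vdr))
    return nullptr;  // any filled-in strings are still freed by ~Cleanup
  std::snprintf(out, out_size, "%s", vdr.layout);
  return out;
}
```

Zero-initialization is what makes the guard correct: if the fill function fails partway, the unfilled fields are null and free() on them is a no-op.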
- if (type == Qt::ZoomNativeGesture || type == Qt::SmartZoomNativeGesture) { + if (type == Qt::ZoomNativeGesture || type == Qt::SmartZoomNativeGesture + || type == Qt::BeginNativeGesture || type == Qt::EndNativeGesture) { if (host()->delegate() && host()->delegate()->GetInputEventRouter()) { auto webEvent = WebEventFactory::toWebGestureEvent(ev); host()->delegate()->GetInputEventRouter()->RouteGestureEvent(this, &webEvent, ui::LatencyInfo()); diff -Naur a/src/core/web_event_factory.cpp b/src/core/web_event_factory.cpp --- a/src/core/web_event_factory.cpp 2021-08-24 13:35:32.000000000 +0100 +++ b/src/core/web_event_factory.cpp 2021-11-20 03:33:03.760590226 +0000 @@ -1540,7 +1540,13 @@ webKitEvent.data.tap.tap_count = 1; break; case Qt::BeginNativeGesture: + webKitEvent.SetType(WebInputEvent::Type::kGesturePinchBegin); + webKitEvent.SetNeedsWheelEvent(true); + break; case Qt::EndNativeGesture: + webKitEvent.SetType(WebInputEvent::Type::kGesturePinchEnd); + webKitEvent.SetNeedsWheelEvent(true); + break; case Qt::RotateNativeGesture: case Qt::PanNativeGesture: case Qt::SwipeNativeGesture: diff -Naur a/src/webenginewidgets/doc/src/qwebenginesettings_lgpl.qdoc b/src/webenginewidgets/doc/src/qwebenginesettings_lgpl.qdoc --- a/src/webenginewidgets/doc/src/qwebenginesettings_lgpl.qdoc 2021-08-24 13:35:32.000000000 +0100 +++ b/src/webenginewidgets/doc/src/qwebenginesettings_lgpl.qdoc 2021-11-20 03:32:13.929364759 +0000 @@ -106,13 +106,11 @@ Enables support for the HTML 5 local storage feature. Enabled by default. \value LocalContentCanAccessRemoteUrls Allows locally loaded documents to ignore cross-origin rules so that they can access - remote resources that would normally be blocked, because all remote resources are - considered cross-origin for a local file. Remote access that would not be blocked by + remote resources that would normally be blocked, since remote resources are + considered cross-origin for a local document. 
Remote access that would not be blocked by cross-origin rules is still possible when this setting is disabled (default). - Note that disabling this setting does not stop XMLHttpRequests or media elements in - local files from accessing remote content. Basically, it only stops some HTML - subresources, such as scripts, and therefore disabling this setting is not a safety - mechanism. + Note that disabling this setting does not prevent media elements in local files from + accessing remote content. Disabled by default. \value XSSAuditingEnabled Obsolete and has no effect. \value SpatialNavigationEnabled @@ -123,7 +121,8 @@ trying to reach towards the right and which element they probably want. Disabled by default. \value LocalContentCanAccessFileUrls - Allows locally loaded documents to access other local URLs. Enabled by default. + Allows locally loaded documents to access other local URLs. Disabling this makes QtWebEngine + behave more like Chrome and Firefox do by default. Enabled by default. \value HyperlinkAuditingEnabled Enables support for the \c ping attribute for hyperlinks. Disabled by default.
\value ScrollAnimatorEnabled diff -Naur a/tests/auto/quick/qmltests/BLACKLIST b/tests/auto/quick/qmltests/BLACKLIST --- a/tests/auto/quick/qmltests/BLACKLIST 2021-08-24 13:35:32.000000000 +0100 +++ b/tests/auto/quick/qmltests/BLACKLIST 2021-11-20 03:32:28.281141688 +0000 @@ -1,2 +1,5 @@ [NewViewRequest::test_loadNewViewRequest] macos + +[CertificateError::test_fatalError] +* diff -Naur a/tests/auto/widgets/certificateerror/BLACKLIST b/tests/auto/widgets/certificateerror/BLACKLIST --- a/tests/auto/widgets/certificateerror/BLACKLIST 1970-01-01 01:00:00.000000000 +0100 +++ b/tests/auto/widgets/certificateerror/BLACKLIST 2021-11-20 03:32:28.281141688 +0000 @@ -0,0 +1,2 @@ +[fatalError] +* diff -Naur a/tests/auto/widgets/origins/tst_origins.cpp b/tests/auto/widgets/origins/tst_origins.cpp --- a/tests/auto/widgets/origins/tst_origins.cpp 2021-08-24 13:35:32.000000000 +0100 +++ b/tests/auto/widgets/origins/tst_origins.cpp 2021-11-20 03:32:13.929364759 +0000 @@ -657,7 +657,7 @@ << QVariant(QString("ok")); QTest::newRow("file->cors") << QString("file:" THIS_DIR "resources/mixedXHR.html") << QString("sendXHR('cors:/resources/mixedXHR.txt')") - << QVariant(QString("ok")); + << QVariant(QString("error")); QTest::newRow("qrc->file") << QString("qrc:/resources/mixedXHR.html") << QString("sendXHR('file:" THIS_DIR "resources/mixedXHR.txt')")